CH-Annotator Quick Reference
ID: ch_annotator-v1_7_0
Full Name: CH-Annotator (Cultural Heritage Annotator)
Status: PRODUCTION
What is CH-Annotator?
CH-Annotator is the comprehensive entity annotation convention for this project. It covers:
- Named Entity Recognition (NER)
- Property Extraction
- Entity Resolution and Linking
- Claim Validation
- Document Structure Annotation
Quick Start
Using CH-Annotator in Extraction
When extracting entities, record the convention ID in the provenance block:
```yaml
provenance:
  extraction_method: ch_annotator-v1_7_0
  extraction_date: "2025-12-06T10:00:00Z"
```
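If provenance is assembled in code before serialization, a minimal Python sketch could look like the following. The helper name `make_provenance` is hypothetical, and it assumes the block is built as a plain dict. It writes the timestamp in the same UTC `Z` form as the example.

```python
from datetime import datetime, timezone
from typing import Optional

# Convention ID as given in this quick reference.
CONVENTION_ID = "ch_annotator-v1_7_0"


def make_provenance(now: Optional[datetime] = None) -> dict:
    """Build the extraction provenance block (hypothetical helper).

    The timestamp is rendered in UTC as YYYY-MM-DDTHH:MM:SSZ.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC; astimezone() treats naive datetimes as local time.
    now = now.astimezone(timezone.utc)
    return {
        "provenance": {
            "extraction_method": CONVENTION_ID,
            "extraction_date": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


# Usage: a fixed timestamp reproduces the example values.
print(make_provenance(datetime(2025, 12, 6, 10, 0, tzinfo=timezone.utc)))
```

Pinning the clock through the `now` parameter keeps the output reproducible in tests.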
Finding the Convention
| Location | Description |
|---|---|
| data/entity_annotation/ch_annotator-v1_7_0.yaml | Complete convention (2500+ lines) |
| data/entity_annotation/modules/ | Modular hypernym definitions |
| .opencode/CH_ANNOTATOR_CONVENTION.md | Full documentation |
| AGENTS.md Rule 10 | Agent usage rules |
Hypernym Codes (9 Categories)
| Code | Name | Use For |
|---|---|---|
| AGT | AGENT | People, AI, animals, fictional beings |
| GRP | GROUP | Organizations, institutions, movements |
| TOP | TOPONYM | Place names (cities, countries, regions) |
| GEO | GEOMETRY | Coordinates, polygons, spatial data |
| TMP | TEMPORAL | Dates, times, periods, durations |
| APP | APPELLATION | Names, titles, identifiers |
| ROL | ROLE | Occupations, positions, honorifics |
| WRK | WORK | Publications, artworks, records |
| QTY | QUANTITY | Numbers, measurements, currencies |
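Dotted codes such as GRP.HER suggest that type codes are hierarchical, with one of these nine hypernyms as the first segment. Here is a minimal Python sketch that assumes that dotted layout. `Hypernym` and `top_level` are hypothetical names and are not taken from the convention file.

```python
from enum import Enum


class Hypernym(Enum):
    """The nine top-level hypernym codes from the table above."""
    AGT = "AGENT"
    GRP = "GROUP"
    TOP = "TOPONYM"
    GEO = "GEOMETRY"
    TMP = "TEMPORAL"
    APP = "APPELLATION"
    ROL = "ROLE"
    WRK = "WORK"
    QTY = "QUANTITY"


def top_level(type_code: str) -> Hypernym:
    """Return the hypernym for a code such as "GRP" or "GRP.HER.MUS".

    Assumes the first dot-separated segment is the hypernym code;
    raises KeyError if it is not one of the nine codes.
    """
    head = type_code.split(".", 1)[0]
    return Hypernym[head]  # Enum lookup by member name


print(top_level("GRP.HER.MUS").value)  # prints GROUP
```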
Heritage Institution Subtypes (GRP.HER)
GRP.HER.GAL → Gallery
GRP.HER.LIB → Library
GRP.HER.ARC → Archive
GRP.HER.MUS → Museum
GRP.HER.OFF → Official Institution
GRP.HER.RES → Research Center
GRP.HER.COR → Corporation
GRP.HER.UNK → Unknown
GRP.HER.BOT → Botanical/Zoo
GRP.HER.EDU → Education Provider
GRP.HER.SOC → Collecting Society
...
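The subtype labels can be resolved with a small lookup. This Python sketch assumes that heritage codes always have exactly three segments (GRP.HER.XXX). Because the list above is truncated ("..."), the mapping is deliberately partial. `heritage_label` is a hypothetical helper.

```python
from typing import Optional

# Labels copied from the subtype list above; the source list is
# truncated, so codes not listed here return None.
HERITAGE_SUBTYPES = {
    "GAL": "Gallery",
    "LIB": "Library",
    "ARC": "Archive",
    "MUS": "Museum",
    "OFF": "Official Institution",
    "RES": "Research Center",
    "COR": "Corporation",
    "UNK": "Unknown",
    "BOT": "Botanical/Zoo",
    "EDU": "Education Provider",
    "SOC": "Collecting Society",
}


def heritage_label(type_code: str) -> Optional[str]:
    """Return the label for a GRP.HER.* code, else None.

    Assumes exactly three segments; anything else returns None.
    """
    parts = type_code.split(".")
    if len(parts) != 3 or parts[:2] != ["GRP", "HER"]:
        return None
    return HERITAGE_SUBTYPES.get(parts[2])


print(heritage_label("GRP.HER.MUS"))  # prints Museum
```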
Claim Provenance (5 Required Components)
Every claim MUST carry a provenance block with all five components:
```yaml
claim:
  claim_type: full_name
  claim_value: "Rijksmuseum"
  provenance:
    namespace: skos                          # 1. Ontology prefix
    path: /html/body/h1[1]                   # 2. Source path
    timestamp: "2025-12-06T10:00:00Z"        # 3. Timestamp
    agent: ch_annotator-v1_7_0               # 4. Model/agent
    context_convention: ch_annotator-v1_7_0  # 5. Convention
```
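A pre-serialization check for the five required components might look like this Python sketch. It assumes each claim is a dict with its provenance mapping nested under a "provenance" key, and that empty values count as missing. `missing_provenance` is a hypothetical helper and not part of the convention.

```python
# The five provenance components this reference marks as required.
REQUIRED_PROVENANCE_KEYS = (
    "namespace",           # 1. ontology prefix
    "path",                # 2. source path
    "timestamp",           # 3. timestamp
    "agent",               # 4. model/agent
    "context_convention",  # 5. convention
)


def missing_provenance(claim: dict) -> list:
    """Return required provenance keys that are absent or empty.

    An empty list means the claim passes this check.
    """
    provenance = claim.get("provenance") or {}
    return [k for k in REQUIRED_PROVENANCE_KEYS if not provenance.get(k)]


example = {
    "claim_type": "full_name",
    "claim_value": "Rijksmuseum",
    "provenance": {"namespace": "skos", "path": "/html/body/h1[1]"},
}
print(missing_provenance(example))  # prints ['timestamp', 'agent', 'context_convention']
```

The check returns the missing keys in the order listed here, so error messages match the numbered components.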
Authority Stack
Use These (Digital Humanities):
- TEI P5
- CIDOC-CRM 7.1.3
- TimeML/TIMEX3
- FRBR/LRM
- GeoSPARQL
Avoid (Web-centric):
- NERD (deprecated, interchange only)
Naming History
| Date | Name |
|---|---|
| Pre-2025-12-06 | GLAM-NER v1.7.0-unified |
| 2025-12-06+ | CH-Annotator v1.7.0 (ch_annotator-v1_7_0) |
See also: .opencode/CH_ANNOTATOR_CONVENTION.md for the full documentation.