glam/docs/CH_ANNOTATOR_QUICK_REFERENCE.md
2025-12-07 00:26:01 +01:00


CH-Annotator Quick Reference

ID: ch_annotator-v1_7_0
Full Name: CH-Annotator (Cultural Heritage Annotator)
Status: PRODUCTION


What is CH-Annotator?

CH-Annotator is the comprehensive entity annotation convention for this project. It covers:

  • Named Entity Recognition (NER)
  • Property Extraction
  • Entity Resolution and Linking
  • Claim Validation
  • Document Structure Annotation

Quick Start

Using CH-Annotator in Extraction

When extracting entities, cite the convention ID in the provenance block:

```yaml
provenance:
  extraction_method: ch_annotator-v1_7_0
  extraction_date: "2025-12-06T10:00:00Z"
```
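
A minimal Python sketch of building this provenance block programmatically. It assumes the extraction date is recorded in UTC with second precision and a trailing `Z`, as in the example above. The helper name `make_extraction_provenance` and the constant `CH_ANNOTATOR_ID` are hypothetical and not part of the convention.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Convention identifier, as given in this quick reference.
CH_ANNOTATOR_ID = "ch_annotator-v1_7_0"


def make_extraction_provenance(when: Optional[datetime] = None) -> Dict[str, str]:
    """Build an extraction provenance block (hypothetical helper).

    Timestamps are rendered as ISO 8601 in UTC with a trailing "Z",
    matching the example value "2025-12-06T10:00:00Z".
    """
    if when is None:
        when = datetime.now(timezone.utc)
    elif when.tzinfo is None:
        # Refuse naive datetimes rather than guess their time zone.
        raise ValueError("timestamp must be timezone-aware")
    # Convert to UTC and drop sub-second precision.
    when = when.astimezone(timezone.utc).replace(microsecond=0)
    return {
        "extraction_method": CH_ANNOTATOR_ID,
        "extraction_date": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


print(make_extraction_provenance(datetime(2025, 12, 6, 10, 0, tzinfo=timezone.utc)))
# {'extraction_method': 'ch_annotator-v1_7_0', 'extraction_date': '2025-12-06T10:00:00Z'}
```

Rejecting naive datetimes is a design choice of this sketch. It avoids silently stamping local time as UTC.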

Finding the Convention

| Location | Description |
|---|---|
| data/entity_annotation/ch_annotator-v1_7_0.yaml | Complete convention (2,500+ lines) |
| data/entity_annotation/modules/ | Modular hypernym definitions |
| .opencode/CH_ANNOTATOR_CONVENTION.md | Full documentation |
| AGENTS.md, Rule 10 | Agent usage rules |

Hypernym Codes (9 Categories)

| Code | Name | Use For |
|---|---|---|
| AGT | AGENT | People, AI, animals, fictional beings |
| GRP | GROUP | Organizations, institutions, movements |
| TOP | TOPONYM | Place names (cities, countries, regions) |
| GEO | GEOMETRY | Coordinates, polygons, spatial data |
| TMP | TEMPORAL | Dates, times, periods, durations |
| APP | APPELLATION | Names, titles, identifiers |
| ROL | ROLE | Occupations, positions, honorifics |
| WRK | WORK | Publications, artworks, records |
| QTY | QUANTITY | Numbers, measurements, currencies |
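
For tooling, the nine top-level codes can be kept in a lookup table. The Python sketch below mirrors the table above. The names `HYPERNYMS` and `hypernym_of` are illustrative assumptions. The sketch also assumes, based on the GRP.HER.* codes below, that the first dot-separated segment of any type code is its hypernym.

```python
# The nine hypernym codes from the table above (illustrative constant name).
HYPERNYMS = {
    "AGT": "AGENT",
    "GRP": "GROUP",
    "TOP": "TOPONYM",
    "GEO": "GEOMETRY",
    "TMP": "TEMPORAL",
    "APP": "APPELLATION",
    "ROL": "ROLE",
    "WRK": "WORK",
    "QTY": "QUANTITY",
}


def hypernym_of(type_code: str) -> str:
    """Return the hypernym name for a code such as "GRP" or "GRP.HER.MUS".

    Assumes the first dot-separated segment is always the hypernym code.
    Raises KeyError for codes outside the nine categories.
    """
    head = type_code.split(".", 1)[0]
    return HYPERNYMS[head]


print(hypernym_of("GRP.HER.MUS"))  # GROUP
```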

Heritage Institution Subtypes (GRP.HER)

GRP.HER.GAL  → Gallery
GRP.HER.LIB  → Library
GRP.HER.ARC  → Archive
GRP.HER.MUS  → Museum
GRP.HER.OFF  → Official Institution
GRP.HER.RES  → Research Center
GRP.HER.COR  → Corporation
GRP.HER.UNK  → Unknown
GRP.HER.BOT  → Botanical/Zoo
GRP.HER.EDU  → Education Provider
GRP.HER.SOC  → Collecting Society
...
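
Each heritage institution code above has three dot-separated segments: the hypernym (GRP), a second segment (HER, heritage institution), and a three-letter subtype. The sketch below splits such a code and resolves its label. It assumes exactly this three-segment shape. `HERITAGE_SUBTYPES` contains only the subtypes listed above, since the list is not exhaustive. `parse_heritage_code` and `HeritageType` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Only the subtypes listed in this quick reference; the full convention
# defines more (see data/entity_annotation/ch_annotator-v1_7_0.yaml).
HERITAGE_SUBTYPES = {
    "GAL": "Gallery",
    "LIB": "Library",
    "ARC": "Archive",
    "MUS": "Museum",
    "OFF": "Official Institution",
    "RES": "Research Center",
    "COR": "Corporation",
    "UNK": "Unknown",
    "BOT": "Botanical/Zoo",
    "EDU": "Education Provider",
    "SOC": "Collecting Society",
}


class HeritageType(NamedTuple):
    hypernym: str          # always "GRP" for heritage institutions
    branch: str            # always "HER"
    subtype: str           # three-letter subtype, e.g. "MUS"
    label: Optional[str]   # human-readable label, or None if not listed here


def parse_heritage_code(code: str) -> HeritageType:
    """Split a code such as "GRP.HER.MUS" (assumes exactly three segments)."""
    parts = code.split(".")
    if len(parts) != 3 or parts[0] != "GRP" or parts[1] != "HER":
        raise ValueError(f"not a GRP.HER.* code: {code!r}")
    subtype = parts[2]
    return HeritageType(parts[0], parts[1], subtype, HERITAGE_SUBTYPES.get(subtype))


print(parse_heritage_code("GRP.HER.MUS"))
# HeritageType(hypernym='GRP', branch='HER', subtype='MUS', label='Museum')
```

Returning None for an unlisted subtype, instead of raising, reflects the truncated list here. The convention file remains the authority on which subtypes exist.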

Claim Provenance (5 Required Components)

Every claim MUST include all five provenance components:

```yaml
claim:
  claim_type: full_name
  claim_value: "Rijksmuseum"
  provenance:
    namespace: skos                          # 1. Ontology prefix
    path: /html/body/h1[1]                   # 2. Source path
    timestamp: "2025-12-06T10:00:00Z"        # 3. Timestamp
    agent: ch_annotator-v1_7_0               # 4. Model/agent
    context_convention: ch_annotator-v1_7_0  # 5. Convention
```
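
A claim can be checked mechanically for the five required components. The Python sketch below assumes each claim is loaded as a plain dictionary holding the mapping under `claim:` shown above, for example after parsing the YAML. `REQUIRED_PROVENANCE_KEYS` and `missing_provenance` are hypothetical names. Treating None and empty strings as missing is a choice of this sketch, not a rule stated by the convention.

```python
from typing import Any, Dict, List

# The five required provenance components, in the order listed above.
REQUIRED_PROVENANCE_KEYS = (
    "namespace",           # 1. ontology prefix
    "path",                # 2. source path (e.g. an XPath)
    "timestamp",           # 3. timestamp
    "agent",               # 4. model/agent
    "context_convention",  # 5. convention
)


def missing_provenance(claim: Dict[str, Any]) -> List[str]:
    """Return the required provenance keys that are absent or empty.

    An empty list means the claim satisfies the five-component rule.
    """
    provenance = claim.get("provenance") or {}
    return [
        key
        for key in REQUIRED_PROVENANCE_KEYS
        if provenance.get(key) in (None, "")
    ]


claim = {
    "claim_type": "full_name",
    "claim_value": "Rijksmuseum",
    "provenance": {
        "namespace": "skos",
        "path": "/html/body/h1[1]",
        "timestamp": "2025-12-06T10:00:00Z",
        "agent": "ch_annotator-v1_7_0",
        "context_convention": "ch_annotator-v1_7_0",
    },
}
print(missing_provenance(claim))  # []
```

Returning the list of missing keys, rather than a bare boolean, makes validation reports easier to act on.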

Authority Stack

Use these (digital humanities standards):

  • TEI P5
  • CIDOC-CRM 7.1.3
  • TimeML/TIMEX3
  • FRBR/LRM
  • GeoSPARQL

Avoid (web-centric):

  • NERD (deprecated, interchange only)

Naming History

| Date | Name |
|---|---|
| Before 2025-12-06 | GLAM-NER v1.7.0-unified |
| 2025-12-06 onward | CH-Annotator v1.7.0 (ch_annotator-v1_7_0) |

See also: .opencode/CH_ANNOTATOR_CONVENTION.md for the full documentation.