# CH-Annotator Quick Reference

**ID**: `ch_annotator-v1_7_0`  
**Full Name**: CH-Annotator (Cultural Heritage Annotator)  
**Status**: PRODUCTION

---

## What is CH-Annotator?

CH-Annotator is the comprehensive entity annotation convention for this project. It covers:
- Named Entity Recognition (NER)
- Property Extraction
- Entity Resolution and Linking
- Claim Validation
- Document Structure Annotation

---

## Quick Start

### Using CH-Annotator in Extraction

When extracting entities, reference the convention ID in the `provenance` block:

```yaml
provenance:
  extraction_method: ch_annotator-v1_7_0
  extraction_date: "2025-12-06T10:00:00Z"
```
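As a minimal sketch, an extraction script might assemble the provenance block above like this. The helper name `build_provenance` is hypothetical, and the second-precision UTC timestamp format is inferred from the example rather than confirmed by the convention.

```python
from datetime import datetime, timezone
from typing import Optional

CONVENTION_ID = "ch_annotator-v1_7_0"  # convention ID from this quick reference


def build_provenance(now: Optional[datetime] = None) -> dict:
    """Return a provenance mapping that references the CH-Annotator convention.

    Hypothetical helper for illustration; not part of the convention itself.
    """
    moment = now or datetime.now(timezone.utc)
    # Assumed format: second-precision UTC with a trailing "Z", as in the example.
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "provenance": {
            "extraction_method": CONVENTION_ID,
            "extraction_date": stamp,
        }
    }
```

Calling `build_provenance(datetime(2025, 12, 6, 10, 0, tzinfo=timezone.utc))` yields the same two fields as the YAML example. Pass a timezone-aware `datetime`; a naive one would be interpreted as local time.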

### Finding the Convention

| Location | Description |
|----------|-------------|
| `data/entity_annotation/ch_annotator-v1_7_0.yaml` | Complete 2500+ line convention |
| `data/entity_annotation/modules/` | Modular hypernym definitions |
| `.opencode/CH_ANNOTATOR_CONVENTION.md` | Full documentation |
| `AGENTS.md` Rule 10 | Agent usage rules |

---

## Hypernym Codes (9 Categories)

| Code | Name | Use For |
|------|------|---------|
| **AGT** | AGENT | People, AI, animals, fictional beings |
| **GRP** | GROUP | Organizations, institutions, movements |
| **TOP** | TOPONYM | Place names (cities, countries, regions) |
| **GEO** | GEOMETRY | Coordinates, polygons, spatial data |
| **TMP** | TEMPORAL | Dates, times, periods, durations |
| **APP** | APPELLATION | Names, titles, identifiers |
| **ROL** | ROLE | Occupations, positions, honorifics |
| **WRK** | WORK | Publications, artworks, records |
| **QTY** | QUANTITY | Numbers, measurements, currencies |

---

## Heritage Institution Subtypes (GRP.HER)

```
GRP.HER.GAL → Gallery
GRP.HER.LIB → Library
GRP.HER.ARC → Archive
GRP.HER.MUS → Museum
GRP.HER.OFF → Official Institution
GRP.HER.RES → Research Center
GRP.HER.COR → Corporation
GRP.HER.UNK → Unknown
GRP.HER.BOT → Botanical/Zoo
GRP.HER.EDU → Education Provider
GRP.HER.SOC → Collecting Society
...
```
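The dotted codes above can be split into a top-level hypernym and its subtype segments. The sketch below assumes codes are dot-separated with the hypernym first, which is inferred from the `GRP.HER.*` examples. `parse_code`, `heritage_label`, and both lookup tables are hypothetical names, and the subtype table holds only the entries listed in this reference, since the list here is elided. Under those assumptions, `heritage_label("GRP.HER.MUS")` returns `"Museum"`.

```python
from typing import NamedTuple, Tuple

# The nine top-level hypernym codes from the table in this reference.
HYPERNYMS = {
    "AGT": "AGENT", "GRP": "GROUP", "TOP": "TOPONYM",
    "GEO": "GEOMETRY", "TMP": "TEMPORAL", "APP": "APPELLATION",
    "ROL": "ROLE", "WRK": "WORK", "QTY": "QUANTITY",
}

# Only the GRP.HER subtypes listed in this reference; the source list is elided.
HERITAGE_SUBTYPES = {
    "GAL": "Gallery", "LIB": "Library", "ARC": "Archive", "MUS": "Museum",
    "OFF": "Official Institution", "RES": "Research Center",
    "COR": "Corporation", "UNK": "Unknown", "BOT": "Botanical/Zoo",
    "EDU": "Education Provider", "SOC": "Collecting Society",
}


class ParsedCode(NamedTuple):
    hypernym: str              # e.g. "GRP"
    hypernym_name: str         # e.g. "GROUP"
    subtypes: Tuple[str, ...]  # e.g. ("HER", "MUS")


def parse_code(code: str) -> ParsedCode:
    """Split a dotted code such as "GRP.HER.MUS" (assumed format)."""
    head, *rest = code.strip().upper().split(".")
    if head not in HYPERNYMS:
        raise ValueError(f"unknown hypernym code: {head!r}")
    return ParsedCode(head, HYPERNYMS[head], tuple(rest))


def heritage_label(code: str) -> str:
    """Return the listed label for a GRP.HER.* code, or raise ValueError."""
    parsed = parse_code(code)
    is_heritage = parsed.hypernym == "GRP" and parsed.subtypes[:1] == ("HER",)
    if not is_heritage or len(parsed.subtypes) != 2:
        raise ValueError(f"not a GRP.HER subtype code: {code!r}")
    label = HERITAGE_SUBTYPES.get(parsed.subtypes[1])
    if label is None:
        raise ValueError(f"subtype not listed in this reference: {code!r}")
    return label
```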

---

## Claim Provenance (5 Required Components)

Every claim MUST carry all five provenance components:

```yaml
claim:
  claim_type: full_name
  claim_value: "Rijksmuseum"
  provenance:
    namespace: skos                           # 1. Ontology prefix
    path: /html/body/h1[1]                    # 2. Source path
    timestamp: "2025-12-06T10:00:00Z"         # 3. Timestamp
    agent: ch_annotator-v1_7_0                # 4. Model/agent
    context_convention: ch_annotator-v1_7_0   # 5. Convention
```
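A minimal sketch of checking the five components before a claim is accepted. The key names come from the YAML above; the function name `missing_provenance` and the choice to treat empty values as missing are assumptions made for illustration.

```python
# The five required provenance keys, in the order numbered in the example above.
REQUIRED_PROVENANCE_KEYS = (
    "namespace", "path", "timestamp", "agent", "context_convention",
)


def missing_provenance(claim: dict) -> list:
    """Return the required provenance keys that are absent or empty.

    Hypothetical helper; `claim` is the mapping nested under the `claim:` key.
    """
    provenance = claim.get("provenance")
    if not isinstance(provenance, dict):
        return list(REQUIRED_PROVENANCE_KEYS)
    # Assumption: an empty string or null counts as a missing component.
    return [key for key in REQUIRED_PROVENANCE_KEYS if not provenance.get(key)]


# Example: the claim from the YAML above, expressed as a Python mapping.
example_claim = {
    "claim_type": "full_name",
    "claim_value": "Rijksmuseum",
    "provenance": {
        "namespace": "skos",
        "path": "/html/body/h1[1]",
        "timestamp": "2025-12-06T10:00:00Z",
        "agent": "ch_annotator-v1_7_0",
        "context_convention": "ch_annotator-v1_7_0",
    },
}
assert missing_provenance(example_claim) == []
```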

---

## Authority Stack

**Use These** (Digital Humanities):
- TEI P5
- CIDOC-CRM 7.1.3
- TimeML/TIMEX3
- FRBR/LRM
- GeoSPARQL

**Avoid** (Web-centric):
- NERD (deprecated, interchange only)

---

## Naming History

| Date | Name |
|------|------|
| Pre-2025-12-06 | GLAM-NER v1.7.0-unified |
| 2025-12-06+ | CH-Annotator v1.7.0 (`ch_annotator-v1_7_0`) |

---

**See Also**: `.opencode/CH_ANNOTATOR_CONVENTION.md` for comprehensive documentation