# Timeline Event Provenance Policy

## Overview

This document clarifies the provenance model for `CustodianTimelineEvent` data (renamed from `LinkupTimelineEvent` in January 2026).

**Key Change**: The class is now source-agnostic. Detailed provenance about the data source (Linkup, Wikidata, web scraping, etc.) belongs in observation classes, not in the event itself.

## Architectural Principle: Provenance Separation (Rule 37)

Domain classes model WHAT happened, not HOW we know:

| Layer | Purpose | Classes |
|-------|---------|---------|
| **Domain** | What happened (events, entities) | `CustodianTimelineEvent` |
| **Observation** | How we observed it (provenance) | `WebObservation`, `CustodianObservation` |

See `.opencode/PROVENANCE_SEPARATION_RULE.md` for the full rule.

## Rule 6 Scope Clarification

**AGENTS.md Rule 6** ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:
- `WebClaim` class
- `WebObservation` class
- `PersonWebClaim` class

**Rule 6 does NOT apply to**:
- `CustodianTimelineEvent` class (source-agnostic design)
- `WikidataEnrichment` (uses entity URI provenance)
- Other API-based enrichments
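The scope above amounts to a membership check on the class name. The following Python sketch is illustrative only: `RULE_6_CLASSES` and `requires_xpath_provenance` are hypothetical names and are not part of the project's tooling.

```python
# Classes covered by AGENTS.md Rule 6, copied from the list above.
# The constant and function names are hypothetical, chosen for this sketch.
RULE_6_CLASSES = frozenset({"WebClaim", "WebObservation", "PersonWebClaim"})


def requires_xpath_provenance(class_name: str) -> bool:
    """Return True if Rule 6 (XPath provenance) applies to the named class."""
    return class_name in RULE_6_CLASSES
```

Under this sketch, `requires_xpath_provenance("CustodianTimelineEvent")` is false, which matches the source-agnostic design.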

## CustodianTimelineEvent Provenance Model

The `CustodianTimelineEvent` class uses **source-agnostic provenance fields**:

### Required Fields

| Field | Purpose |
|-------|---------|
| `event_type` | What kind of event (FOUNDING, MERGER, etc.) |
| `date_precision` | How specific is the date (day, year, decade) |
| `approximate` | Is the date approximate (circa, roughly) |
| `description` | Human-readable summary of the event |
| `extraction_method` | How was the event discovered |
| `extraction_timestamp` | When was the event extracted |
| `data_tier` | Quality tier (TIER_1 to TIER_4) |

### Optional Fields

| Field | Purpose |
|-------|---------|
| `event_date` | When the event occurred (if known) |
| `source_urls` | URLs documenting the event |
| `extraction_notes` | Free-text notes for source-specific details |
| `archive_path` | Path to archived source material |
| `observation_ref` | Link to observation class for detailed provenance |
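To illustrate the required/optional split, here is a minimal Python sketch that reports which required fields are missing from an event loaded from YAML as a plain dict. The function name `missing_required_fields` is hypothetical; the authoritative constraints are the ones in the LinkML schema, not this check.

```python
from typing import Any, Mapping

# Required field names, copied from the "Required Fields" table.
REQUIRED_FIELDS = (
    "event_type",
    "date_precision",
    "approximate",
    "description",
    "extraction_method",
    "extraction_timestamp",
    "data_tier",
)


def missing_required_fields(event: Mapping[str, Any]) -> list[str]:
    """Return the required field names that are absent or null in `event`.

    The check compares against None rather than truthiness, so a legitimate
    `approximate: false` is not reported as missing.
    """
    return [name for name in REQUIRED_FIELDS if event.get(name) is None]
```

An empty result means every required field is present; optional fields such as `event_date` are deliberately not checked.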

### Extraction Methods

The `TimelineExtractionMethodEnum` records how an event was extracted or verified:

| Method | Description |
|--------|-------------|
| `api_response_regex` | Date extracted via regex from API response |
| `api_response_llm` | Date extracted using LLM analysis |
| `web_scrape_xpath` | Date extracted via XPath from archived HTML |
| `wikidata_sparql` | Date extracted from Wikidata SPARQL |
| `manual_research` | Event discovered through manual research |
| `manual_verification` | Event manually verified and corrected |
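When handling events in code, the values above can be mirrored as a Python enum so that unknown `extraction_method` strings are caught early. This is a sketch under assumptions: the class `TimelineExtractionMethod` and the helper `is_known_method` are hypothetical, and the LinkML schema remains the source of truth for the permitted values.

```python
from enum import Enum


class TimelineExtractionMethod(str, Enum):
    """Hypothetical Python mirror of the values listed in the table above."""

    API_RESPONSE_REGEX = "api_response_regex"
    API_RESPONSE_LLM = "api_response_llm"
    WEB_SCRAPE_XPATH = "web_scrape_xpath"
    WIKIDATA_SPARQL = "wikidata_sparql"
    MANUAL_RESEARCH = "manual_research"
    MANUAL_VERIFICATION = "manual_verification"


def is_known_method(value: str) -> bool:
    """Return True if `value` is one of the extraction methods listed above."""
    try:
        TimelineExtractionMethod(value)
    except ValueError:
        return False
    return True
```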

## Source-Specific Details in extraction_notes

Use the `extraction_notes` field to capture source-specific details that don't fit elsewhere:

### For API-sourced data (e.g., Linkup)
```yaml
extraction_notes: |
  Query: "Drents Archief" Assen opgericht OR gesticht
  API: Linkup. Answer: "Het RHC Drents Archief werd opgericht op 30 april 2005..."
  Sources cited: nl.wikipedia.org, bizzy.ai
archive_path: web/0002/linkup/linkup_founding_20251215T160438Z.json
```

### For web-scraped data
```yaml
extraction_notes: |
  XPath: /html/body/main/section[2]/div/p[3]
  Source page: https://www.rijksmuseum.nl/en/about-us/history
archive_path: web/0001/rijksmuseum.nl/about-us/rendered.html
```

### For Wikidata-sourced data
```yaml
extraction_notes: |
  Wikidata: Q190804
  Property: P571 (inception date)
  SPARQL timestamp: 2025-12-20T14:30:00Z
```

## Linking to Observation Classes

For detailed provenance, use `observation_ref` to link to a `WebObservation`:

```yaml
timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    # ... other fields ...
    observation_ref: "https://nde.nl/ontology/hc/observation/web/2025-12-15/drents-archief"
```

The referenced `WebObservation` contains:
- Full API response details
- XPath provenance (if applicable)
- HTTP response metadata
- Archived content hash

## Data Quality Tiers

Events should have their `data_tier` set appropriately:

| Tier | Description | Typical Source |
|------|-------------|----------------|
| `TIER_4_INFERRED` | Unverified, possibly from LLM | Initial API extraction |
| `TIER_3_CROWD_SOURCED` | Verified against Wikipedia/Wikidata | Cross-referenced |
| `TIER_2_VERIFIED` | Verified against institutional website | Official source |
| `TIER_1_AUTHORITATIVE` | Verified against official registry | Government records |
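Because the tier names carry an ordering (1 is the most authoritative, 4 the least), code that compares events may want an explicit rank. The sketch below is one possible way to do that in Python; `TIER_RANK` and `most_authoritative` are hypothetical names, and the policy itself does not prescribe how conflicting events are resolved.

```python
from typing import Any, Iterable, Mapping

# Rank per tier name from the table above; a lower rank is more authoritative.
TIER_RANK = {
    "TIER_1_AUTHORITATIVE": 1,
    "TIER_2_VERIFIED": 2,
    "TIER_3_CROWD_SOURCED": 3,
    "TIER_4_INFERRED": 4,
}


def most_authoritative(events: Iterable[Mapping[str, Any]]) -> Mapping[str, Any]:
    """Return the event with the best (lowest-ranked) `data_tier`.

    Raises KeyError for an unknown tier name and ValueError for an empty input.
    """
    return min(events, key=lambda event: TIER_RANK[event["data_tier"]])
```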

## Migration from LinkupTimelineEvent

The following were removed (three fields and one enum); use the listed migration path for each:

| Old Name | Migration Path |
|----------|----------------|
| `linkup_query` | Put in `extraction_notes` |
| `linkup_answer` | Put in `extraction_notes` |
| `fetch_timestamp` | Use `extraction_timestamp` |
| `LinkupExtractionMethodEnum` | Use `TimelineExtractionMethodEnum` |

Data files with `timeline_enrichment.timeline_events` continue to work; the events are now instances of `CustodianTimelineEvent`.
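The field mapping in the table can be sketched as a small Python function operating on an event loaded as a dict. This is an assumption-laden illustration, not the project's migration script: the name `migrate_linkup_event` is hypothetical, and the note layout simply imitates the Linkup example shown earlier in this document.

```python
from typing import Any


def migrate_linkup_event(old: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a legacy event dict using the new field names.

    The input dict is left unmodified.
    """
    event = dict(old)
    notes: list[str] = []

    # Keep any notes that already exist, then append the Linkup details.
    existing = event.get("extraction_notes")
    if existing:
        notes.append(str(existing).rstrip())
    query = event.pop("linkup_query", None)
    if query:
        notes.append(f"Query: {query}")
    answer = event.pop("linkup_answer", None)
    if answer:
        notes.append(f'API: Linkup. Answer: "{answer}"')
    if notes:
        event["extraction_notes"] = "\n".join(notes)

    # fetch_timestamp becomes extraction_timestamp unless one is already set.
    fetched = event.pop("fetch_timestamp", None)
    if fetched is not None and "extraction_timestamp" not in event:
        event["extraction_timestamp"] = fetched
    return event
```

Values of `extraction_method` are not remapped here, because the table does not list a value-by-value correspondence between the old and new enums.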

## Current Statistics (January 2026)

- **Total events**: ~1,199 across ~862 custodian files
- **Event types**: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
- **Data tier**: Mostly TIER_4_INFERRED (pending verification)

## Schema Reference

The formal schema is defined in:
- `schemas/20251121/linkml/modules/classes/CustodianTimelineEvent.yaml`

## Related Documentation

- `AGENTS.md` Rule 6 - WebObservation XPath requirements
- `.opencode/PROVENANCE_SEPARATION_RULE.md` - Rule 37 on provenance separation
- `.opencode/WEB_OBSERVATION_PROVENANCE_RULES.md` - WebClaim details
- `schemas/20251121/linkml/modules/classes/WebObservation.yaml` - WebObservation schema
- `schemas/20251121/linkml/modules/classes/CustodianObservation.yaml` - CustodianObservation schema

---

**Created**: 2025-12-16
**Updated**: 2026-01-01 (Renamed class to CustodianTimelineEvent, source-agnostic design)
**Status**: ACTIVE
**Applies to**: Heritage Custodian Timeline Events