Timeline Event Provenance Policy
Overview
This document clarifies the provenance model for CustodianTimelineEvent data (renamed from LinkupTimelineEvent in January 2026).
Key Change: The class is now source-agnostic. Detailed provenance about the data source (Linkup, Wikidata, web scraping, etc.) belongs in observation classes, not in the event itself.
Architectural Principle: Provenance Separation (Rule 37)
Domain classes model WHAT happened, not HOW we know:
| Layer | Purpose | Classes |
|---|---|---|
| Domain | What happened (events, entities) | CustodianTimelineEvent |
| Observation | How we observed it (provenance) | WebObservation, CustodianObservation |
See .opencode/PROVENANCE_SEPARATION_RULE.md for the full rule.
Rule 6 Scope Clarification
AGENTS.md Rule 6 ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:
- WebClaim class
- WebObservation class
- PersonWebClaim class
Rule 6 does NOT apply to:
- CustodianTimelineEvent class (source-agnostic design)
- WikidataEnrichment (uses entity URI provenance)
- Other API-based enrichments
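The scope rule above can be expressed as a simple membership check. This is a minimal sketch: the function name `requires_xpath_provenance` is hypothetical, and only the three class names come from this document.

```python
# Classes that AGENTS.md Rule 6 applies to, as listed in this document.
RULE_6_CLASSES = frozenset({"WebClaim", "WebObservation", "PersonWebClaim"})


def requires_xpath_provenance(class_name: str) -> bool:
    """Return True only for classes that Rule 6 explicitly covers.

    Hypothetical helper: because Rule 6 applies ONLY to the listed classes,
    every other class (CustodianTimelineEvent, WikidataEnrichment, other
    API-based enrichments) is treated as out of scope.
    """
    return class_name in RULE_6_CLASSES


if __name__ == "__main__":
    for name in ("WebClaim", "CustodianTimelineEvent", "WikidataEnrichment"):
        print(name, requires_xpath_provenance(name))
```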
CustodianTimelineEvent Provenance Model
The CustodianTimelineEvent class uses source-agnostic provenance fields:
Required Fields
| Field | Purpose |
|---|---|
| event_type | What kind of event (FOUNDING, MERGER, etc.) |
| date_precision | How specific is the date (day, year, decade) |
| approximate | Is the date approximate (circa, roughly) |
| description | Human-readable summary of the event |
| extraction_method | How was the event discovered |
| extraction_timestamp | When was the event extracted |
| data_tier | Quality tier (TIER_1 to TIER_4) |
Optional Fields
| Field | Purpose |
|---|---|
| event_date | When the event occurred (if known) |
| source_urls | URLs documenting the event |
| extraction_notes | Free-text notes for source-specific details |
| archive_path | Path to archived source material |
| observation_ref | Link to observation class for detailed provenance |
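The required/optional split above lends itself to a quick pre-flight check on event records loaded from data files. The sketch below assumes events arrive as plain dictionaries. The helper names `missing_required` and `unknown_fields` are hypothetical and are not part of the LinkML schema or its tooling.

```python
# Field names are taken from the two tables above.
REQUIRED_FIELDS = (
    "event_type",
    "date_precision",
    "approximate",
    "description",
    "extraction_method",
    "extraction_timestamp",
    "data_tier",
)
OPTIONAL_FIELDS = (
    "event_date",
    "source_urls",
    "extraction_notes",
    "archive_path",
    "observation_ref",
)


def missing_required(event: dict) -> list:
    """List required fields that are absent or null.

    The check is `is None`, not truthiness, because `approximate: false`
    is a legitimate value for a required field.
    """
    return [name for name in REQUIRED_FIELDS if event.get(name) is None]


def unknown_fields(event: dict) -> list:
    """List keys that appear in neither table (e.g. removed Linkup fields)."""
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    return sorted(key for key in event if key not in known)
```

A record that still carries `linkup_query` would show up in `unknown_fields`, which makes this kind of check a cheap way to spot files that have not been migrated yet.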
Extraction Methods
The TimelineExtractionMethodEnum covers various sources:
| Method | Description |
|---|---|
| api_response_regex | Date extracted via regex from API response |
| api_response_llm | Date extracted using LLM analysis |
| web_scrape_xpath | Date extracted via XPath from archived HTML |
| wikidata_sparql | Date extracted from Wikidata SPARQL |
| manual_research | Event discovered through manual research |
| manual_verification | Event manually verified and corrected |
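For scripts that read or write events, the method values above can be mirrored in a Python enum so that typos fail early. This is a sketch only: the Python class and the `parse_method` helper are assumptions, and the LinkML schema remains the authoritative definition of TimelineExtractionMethodEnum.

```python
from enum import Enum


class TimelineExtractionMethod(Enum):
    """Python mirror of the method values listed in the table above."""

    API_RESPONSE_REGEX = "api_response_regex"
    API_RESPONSE_LLM = "api_response_llm"
    WEB_SCRAPE_XPATH = "web_scrape_xpath"
    WIKIDATA_SPARQL = "wikidata_sparql"
    MANUAL_RESEARCH = "manual_research"
    MANUAL_VERIFICATION = "manual_verification"


def parse_method(value: str) -> TimelineExtractionMethod:
    """Convert a raw string to the enum, with a readable error on failure."""
    try:
        return TimelineExtractionMethod(value)
    except ValueError:
        allowed = ", ".join(m.value for m in TimelineExtractionMethod)
        raise ValueError(
            f"Unknown extraction_method {value!r}; expected one of: {allowed}"
        ) from None
```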
Source-Specific Details in extraction_notes
Use the extraction_notes field to capture source-specific details that don't fit elsewhere:
For API-sourced data (e.g., Linkup)
extraction_notes: |
  Query: "Drents Archief" Assen opgericht OR gesticht
  API: Linkup. Answer: "Het RHC Drents Archief werd opgericht op 30 april 2005..."
  Sources cited: nl.wikipedia.org, bizzy.ai
archive_path: web/0002/linkup/linkup_founding_20251215T160438Z.json
For web-scraped data
extraction_notes: |
  XPath: /html/body/main/section[2]/div/p[3]
  Source page: https://www.rijksmuseum.nl/en/about-us/history
archive_path: web/0001/rijksmuseum.nl/about-us/rendered.html
For Wikidata-sourced data
extraction_notes: |
  Wikidata: Q190804
  Property: P571 (inception date)
  SPARQL timestamp: 2025-12-20T14:30:00Z
Linking to Observation Classes
For detailed provenance, use observation_ref to link to a WebObservation:
timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    # ... other fields ...
    observation_ref: "https://nde.nl/ontology/hc/observation/web/2025-12-15/drents-archief"
The referenced WebObservation contains:
- Full API response details
- XPath provenance (if applicable)
- HTTP response metadata
- Archived content hash
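One way to follow an observation_ref in tooling is sketched below. It assumes observations have already been loaded into a dictionary keyed by their URI. The `WebObservationSketch` class, its field names, and `resolve_observation` are all hypothetical stand-ins; the real WebObservation schema is defined in its own LinkML module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WebObservationSketch:
    """Hypothetical stand-in for a loaded WebObservation record."""

    uri: str
    content_hash: str            # assumed name for the archived content hash
    xpath: Optional[str] = None  # only set for XPath-based extraction


def resolve_observation(event: dict, observations: dict) -> Optional[WebObservationSketch]:
    """Return the observation an event points at, or None if it has no ref.

    A ref that points at an unknown observation is treated as an error,
    since it means the detailed provenance cannot be found.
    """
    ref = event.get("observation_ref")
    if ref is None:
        return None
    if ref not in observations:
        raise LookupError(f"observation_ref points at unknown observation: {ref}")
    return observations[ref]
```

Because observation_ref is optional, callers should expect `None` for events whose provenance lives only in extraction_notes.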
Data Quality Tiers
Events should have their data_tier set appropriately:
| Tier | Description | Typical Source |
|---|---|---|
| TIER_4_INFERRED | Unverified, possibly from LLM | Initial API extraction |
| TIER_3_CROWD_SOURCED | Verified against Wikipedia/Wikidata | Cross-referenced |
| TIER_2_VERIFIED | Verified against institutional website | Official source |
| TIER_1_AUTHORITATIVE | Verified against official registry | Government records |
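Because the tiers form an ordered scale, code that filters events by quality can compare them numerically. The sketch below assumes TIER_1 is the strongest tier and TIER_4 the weakest, as the table suggests. The `DataTier` class and `meets_minimum` helper are illustrative names, not part of the schema.

```python
from enum import IntEnum


class DataTier(IntEnum):
    """Tier names from the table above; a lower number means stronger evidence."""

    TIER_1_AUTHORITATIVE = 1
    TIER_2_VERIFIED = 2
    TIER_3_CROWD_SOURCED = 3
    TIER_4_INFERRED = 4


def meets_minimum(tier_name: str, minimum_name: str) -> bool:
    """Return True if `tier_name` is at least as strong as `minimum_name`."""
    return DataTier[tier_name] <= DataTier[minimum_name]


def filter_events(events: list, minimum_name: str) -> list:
    """Keep only events whose data_tier meets the requested minimum."""
    return [e for e in events if meets_minimum(e["data_tier"], minimum_name)]
```

For example, `filter_events(events, "TIER_3_CROWD_SOURCED")` keeps tiers 1 to 3 and drops unverified TIER_4_INFERRED events.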
Migration from LinkupTimelineEvent
The following fields were removed from the class (use alternatives):
| Old Field | Migration Path |
|---|---|
| linkup_query | Put in extraction_notes |
| linkup_answer | Put in extraction_notes |
| fetch_timestamp | Use extraction_timestamp |
| LinkupExtractionMethodEnum | Use TimelineExtractionMethodEnum |
Data files with timeline_enrichment.timeline_events continue to work - the events are now instances of CustodianTimelineEvent.
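The field-level part of the migration table can be sketched as a small transformation over an event dictionary. This is an illustration under assumptions: `migrate_event` is a hypothetical helper, and the layout of the generated notes simply imitates the Linkup example shown earlier in this document. Enum values are left untouched because the old LinkupExtractionMethodEnum values are not listed here.

```python
REMOVED_FIELDS = ("linkup_query", "linkup_answer", "fetch_timestamp")


def migrate_event(old: dict) -> dict:
    """Return a copy of `old` with removed Linkup fields folded into new ones."""
    new = {key: value for key, value in old.items() if key not in REMOVED_FIELDS}

    # linkup_query and linkup_answer move into the free-text extraction_notes.
    notes = []
    if old.get("extraction_notes"):
        notes.append(old["extraction_notes"].rstrip())
    if old.get("linkup_query"):
        notes.append("Query: " + old["linkup_query"])
    if old.get("linkup_answer"):
        notes.append('API: Linkup. Answer: "' + old["linkup_answer"] + '"')
    if notes:
        new["extraction_notes"] = "\n".join(notes)

    # fetch_timestamp becomes extraction_timestamp unless one is already set.
    if "fetch_timestamp" in old and "extraction_timestamp" not in new:
        new["extraction_timestamp"] = old["fetch_timestamp"]
    return new
```

The function returns a new dictionary rather than mutating its input, so the original record stays available for comparison or rollback.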
Current Statistics (January 2026)
- Total events: ~1,199 across ~862 custodian files
- Event types: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
- Data tier: Mostly TIER_4_INFERRED (pending verification)
Schema Reference
The formal schema is defined in:
schemas/20251121/linkml/modules/classes/CustodianTimelineEvent.yaml
Related Documentation
- AGENTS.md Rule 6 - WebObservation XPath requirements
- .opencode/PROVENANCE_SEPARATION_RULE.md - Rule 37 on provenance separation
- .opencode/WEB_OBSERVATION_PROVENANCE_RULES.md - WebClaim details
- schemas/20251121/linkml/modules/classes/WebObservation.yaml - WebObservation schema
- schemas/20251121/linkml/modules/classes/CustodianObservation.yaml - CustodianObservation schema
Created: 2025-12-16
Updated: 2026-01-01 (Renamed class to CustodianTimelineEvent, source-agnostic design)
Status: ACTIVE
Applies to: Heritage Custodian Timeline Events