# Timeline Event Provenance Policy

## Overview

This document clarifies the provenance model for `CustodianTimelineEvent` data (renamed from `LinkupTimelineEvent` in January 2026).

**Key Change**: The class is now source-agnostic. Detailed provenance about the data source (Linkup, Wikidata, web scraping, etc.) belongs in observation classes, not in the event itself.

## Architectural Principle: Provenance Separation (Rule 37)

Domain classes model WHAT happened, not HOW we know:

| Layer | Purpose | Classes |
|-------|---------|---------|
| **Domain** | What happened (events, entities) | `CustodianTimelineEvent` |
| **Observation** | How we observed it (provenance) | `WebObservation`, `CustodianObservation` |

See `.opencode/PROVENANCE_SEPARATION_RULE.md` for the full rule.

## Rule 6 Scope Clarification

**AGENTS.md Rule 6** ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:

- `WebClaim` class
- `WebObservation` class
- `PersonWebClaim` class

**Rule 6 does NOT apply to**:

- `CustodianTimelineEvent` class (source-agnostic design)
- `WikidataEnrichment` (uses entity URI provenance)
- Other API-based enrichments

## CustodianTimelineEvent Provenance Model

The `CustodianTimelineEvent` class uses **source-agnostic provenance fields**:

### Required Fields

| Field | Purpose |
|-------|---------|
| `event_type` | What kind of event (FOUNDING, MERGER, etc.) |
| `date_precision` | How specific is the date (day, year, decade) |
| `approximate` | Is the date approximate (circa, roughly) |
| `description` | Human-readable summary of the event |
| `extraction_method` | How was the event discovered |
| `extraction_timestamp` | When was the event extracted |
| `data_tier` | Quality tier (TIER_1 to TIER_4) |

### Optional Fields

| Field | Purpose |
|-------|---------|
| `event_date` | When the event occurred (if known) |
| `source_urls` | URLs documenting the event |
| `extraction_notes` | Free-text notes for source-specific details |
| `archive_path` | Path to archived source material |
| `observation_ref` | Link to observation class for detailed provenance |

### Extraction Methods

The `TimelineExtractionMethodEnum` covers various sources:

| Method | Description |
|--------|-------------|
| `api_response_regex` | Date extracted via regex from API response |
| `api_response_llm` | Date extracted using LLM analysis |
| `web_scrape_xpath` | Date extracted via XPath from archived HTML |
| `wikidata_sparql` | Date extracted from Wikidata SPARQL |
| `manual_research` | Event discovered through manual research |
| `manual_verification` | Event manually verified and corrected |

## Source-Specific Details in extraction_notes

Use the `extraction_notes` field to capture source-specific details that don't fit elsewhere:

### For API-sourced data (e.g., Linkup)

```yaml
extraction_notes: |
  Query: "Drents Archief" Assen opgericht OR gesticht
  API: Linkup. Answer: "Het RHC Drents Archief werd opgericht op 30 april 2005..."
  Sources cited: nl.wikipedia.org, bizzy.ai
archive_path: web/0002/linkup/linkup_founding_20251215T160438Z.json
```

### For web-scraped data

```yaml
extraction_notes: |
  XPath: /html/body/main/section[2]/div/p[3]
  Source page: https://www.rijksmuseum.nl/en/about-us/history
archive_path: web/0001/rijksmuseum.nl/about-us/rendered.html
```

### For Wikidata-sourced data

```yaml
extraction_notes: |
  Wikidata: Q190804
  Property: P571 (inception date)
  SPARQL timestamp: 2025-12-20T14:30:00Z
```

## Linking to Observation Classes

For detailed provenance, use `observation_ref` to link to a `WebObservation`:

```yaml
timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    # ... other fields ...
    observation_ref: "https://nde.nl/ontology/hc/observation/web/2025-12-15/drents-archief"
```

The referenced `WebObservation` contains:

- Full API response details
- XPath provenance (if applicable)
- HTTP response metadata
- Archived content hash

## Data Quality Tiers

Events should have their `data_tier` set appropriately:

| Tier | Description | Typical Source |
|------|-------------|----------------|
| `TIER_4_INFERRED` | Unverified, possibly from LLM | Initial API extraction |
| `TIER_3_CROWD_SOURCED` | Verified against Wikipedia/Wikidata | Cross-referenced |
| `TIER_2_VERIFIED` | Verified against institutional website | Official source |
| `TIER_1_AUTHORITATIVE` | Verified against official registry | Government records |

## Migration from LinkupTimelineEvent

The following fields were removed from the class (use alternatives):

| Old Field | Migration Path |
|-----------|----------------|
| `linkup_query` | Put in `extraction_notes` |
| `linkup_answer` | Put in `extraction_notes` |
| `fetch_timestamp` | Use `extraction_timestamp` |
| `LinkupExtractionMethodEnum` | Use `TimelineExtractionMethodEnum` |

Data files with `timeline_enrichment.timeline_events` continue to work - the events are now instances of `CustodianTimelineEvent`.
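
The field migration above can be sketched in Python as follows. This is a minimal illustration, not the project's actual migration script: the helper name `migrate_event`, the treatment of events as plain dicts, the exact layout of the generated `extraction_notes` text, and the sample timestamp value are all assumptions. The enum rename is not handled here because the old enum's values are not listed in this document.

```python
from copy import deepcopy


def migrate_event(old_event: dict) -> dict:
    """Return a copy of an old-style event dict using the new field names.

    Hypothetical helper: assumes events are plain dicts loaded from YAML.
    """
    event = deepcopy(old_event)  # never mutate the caller's data

    # linkup_query / linkup_answer move into the free-text extraction_notes.
    notes = []
    existing = event.get("extraction_notes")
    if existing:
        notes.append(existing.rstrip())
    query = event.pop("linkup_query", None)
    answer = event.pop("linkup_answer", None)
    if query is not None or answer is not None:
        notes.append("API: Linkup")  # assumed note format
    if query is not None:
        notes.append(f"Query: {query}")
    if answer is not None:
        notes.append(f"Answer: {answer}")
    if notes:
        event["extraction_notes"] = "\n".join(notes) + "\n"

    # fetch_timestamp is replaced by extraction_timestamp; an existing
    # extraction_timestamp is kept rather than overwritten.
    fetched = event.pop("fetch_timestamp", None)
    if fetched is not None:
        event.setdefault("extraction_timestamp", fetched)

    return event


# Illustrative input only; the timestamp value is an assumption.
old = {
    "event_type": "FOUNDING",
    "event_date": "2005-04-30",
    "linkup_query": '"Drents Archief" Assen opgericht OR gesticht',
    "linkup_answer": "Het RHC Drents Archief werd opgericht op 30 april 2005...",
    "fetch_timestamp": "2025-12-15T16:04:38Z",
}
new = migrate_event(old)
```

Working on a deep copy keeps the original record available for comparison, which may be useful while the migrated events are still pending verification.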
## Current Statistics (January 2026)

- **Total events**: ~1,199 across ~862 custodian files
- **Event types**: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
- **Data tier**: Mostly TIER_4_INFERRED (pending verification)

## Schema Reference

The formal schema is defined in:

- `schemas/20251121/linkml/modules/classes/CustodianTimelineEvent.yaml`

## Related Documentation

- `AGENTS.md` Rule 6 - WebObservation XPath requirements
- `.opencode/PROVENANCE_SEPARATION_RULE.md` - Rule 37 on provenance separation
- `.opencode/WEB_OBSERVATION_PROVENANCE_RULES.md` - WebClaim details
- `schemas/20251121/linkml/modules/classes/WebObservation.yaml` - WebObservation schema
- `schemas/20251121/linkml/modules/classes/CustodianObservation.yaml` - CustodianObservation schema

---

**Created**: 2025-12-16

**Updated**: 2026-01-01 (Renamed class to CustodianTimelineEvent, source-agnostic design)

**Status**: ACTIVE

**Applies to**: Heritage Custodian Timeline Events