glam/.opencode/TIMELINE_EVENT_PROVENANCE_POLICY.md

Timeline Event Provenance Policy

Overview

This document clarifies the provenance model for CustodianTimelineEvent data (renamed from LinkupTimelineEvent in January 2026).

Key Change: The class is now source-agnostic. Detailed provenance about the data source (Linkup, Wikidata, web scraping, etc.) belongs in observation classes, not in the event itself.

Architectural Principle: Provenance Separation (Rule 37)

Domain classes model WHAT happened, not HOW we know:

| Layer | Purpose | Classes |
|-------|---------|---------|
| Domain | What happened (events, entities) | CustodianTimelineEvent |
| Observation | How we observed it (provenance) | WebObservation, CustodianObservation |

See .opencode/PROVENANCE_SEPARATION_RULE.md for the full rule.

Rule 6 Scope Clarification

AGENTS.md Rule 6 ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:

  • WebClaim class
  • WebObservation class
  • PersonWebClaim class

Rule 6 does NOT apply to:

  • CustodianTimelineEvent class (source-agnostic design)
  • WikidataEnrichment (uses entity URI provenance)
  • Other API-based enrichments
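The scope lists above can be expressed as a simple membership check. This is a minimal sketch: the class names come from this policy, but `RULE_6_CLASSES` and `rule_6_applies` are hypothetical names introduced for illustration and are not part of the schema or tooling.

```python
# Classes that AGENTS.md Rule 6 covers, per the scope list above.
# RULE_6_CLASSES and rule_6_applies are illustrative names (assumptions).
RULE_6_CLASSES = frozenset({"WebClaim", "WebObservation", "PersonWebClaim"})


def rule_6_applies(class_name: str) -> bool:
    """Return True if instances of class_name must carry XPath provenance."""
    return class_name in RULE_6_CLASSES
```

With this check, `rule_6_applies("CustodianTimelineEvent")` is false, which matches the source-agnostic design described here.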

CustodianTimelineEvent Provenance Model

The CustodianTimelineEvent class uses source-agnostic provenance fields:

Required Fields

| Field | Purpose |
|-------|---------|
| event_type | What kind of event (FOUNDING, MERGER, etc.) |
| date_precision | How specific is the date (day, year, decade) |
| approximate | Is the date approximate (circa, roughly) |
| description | Human-readable summary of the event |
| extraction_method | How was the event discovered |
| extraction_timestamp | When was the event extracted |
| data_tier | Quality tier (TIER_1 to TIER_4) |

Optional Fields

| Field | Purpose |
|-------|---------|
| event_date | When the event occurred (if known) |
| source_urls | URLs documenting the event |
| extraction_notes | Free-text notes for source-specific details |
| archive_path | Path to archived source material |
| observation_ref | Link to observation class for detailed provenance |
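As a rough illustration of the required/optional split above, the following sketch checks a single event (loaded as a plain dictionary) for missing required fields. The field names are taken from this policy; the function name `missing_required_fields` is hypothetical, and the LinkML schema remains the authoritative definition.

```python
# Required field names as listed in this policy.
REQUIRED_FIELDS = (
    "event_type",
    "date_precision",
    "approximate",
    "description",
    "extraction_method",
    "extraction_timestamp",
    "data_tier",
)


def missing_required_fields(event: dict) -> list[str]:
    """Return the required field names that are absent or set to None.

    A value of False (e.g. approximate: false) counts as present.
    """
    return [name for name in REQUIRED_FIELDS if event.get(name) is None]
```

An event is considered complete under this sketch when the returned list is empty; optional fields such as `event_date` or `observation_ref` are not checked.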

Extraction Methods

The TimelineExtractionMethodEnum covers various sources:

| Method | Description |
|--------|-------------|
| api_response_regex | Date extracted via regex from API response |
| api_response_llm | Date extracted using LLM analysis |
| web_scrape_xpath | Date extracted via XPath from archived HTML |
| wikidata_sparql | Date extracted from Wikidata SPARQL |
| manual_research | Event discovered through manual research |
| manual_verification | Event manually verified and corrected |
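For scripts that read event data, the methods above could be mirrored in a Python enum so that unexpected values fail early. This is a sketch under assumptions: the member values are the method names listed in this policy, while the Python class name `TimelineExtractionMethod` is illustrative and the LinkML enum may define additional values.

```python
from enum import Enum


class TimelineExtractionMethod(Enum):
    """Illustrative mirror of the methods listed in this policy."""

    API_RESPONSE_REGEX = "api_response_regex"
    API_RESPONSE_LLM = "api_response_llm"
    WEB_SCRAPE_XPATH = "web_scrape_xpath"
    WIKIDATA_SPARQL = "wikidata_sparql"
    MANUAL_RESEARCH = "manual_research"
    MANUAL_VERIFICATION = "manual_verification"


def parse_extraction_method(value: str) -> TimelineExtractionMethod:
    """Convert a raw string to an enum member; raises ValueError if unknown."""
    return TimelineExtractionMethod(value)
```

Looking up by value (`TimelineExtractionMethod("wikidata_sparql")`) keeps the strings in data files unchanged while giving code a closed set to switch on.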

Source-Specific Details in extraction_notes

Use the extraction_notes field to capture source-specific details that don't fit elsewhere:

For API-sourced data (e.g., Linkup)

extraction_notes: |
  Query: "Drents Archief" Assen opgericht OR gesticht
  API: Linkup. Answer: "Het RHC Drents Archief werd opgericht op 30 april 2005..."
  Sources cited: nl.wikipedia.org, bizzy.ai  
archive_path: web/0002/linkup/linkup_founding_20251215T160438Z.json

For web-scraped data

extraction_notes: |
  XPath: /html/body/main/section[2]/div/p[3]
  Source page: https://www.rijksmuseum.nl/en/about-us/history  
archive_path: web/0001/rijksmuseum.nl/about-us/rendered.html

For Wikidata-sourced data

extraction_notes: |
  Wikidata: Q190804
  Property: P571 (inception date)
  SPARQL timestamp: 2025-12-20T14:30:00Z  

Linking to Observation Classes

For detailed provenance, use observation_ref to link to a WebObservation:

timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    # ... other fields ...
    observation_ref: "https://nde.nl/ontology/hc/observation/web/2025-12-15/drents-archief"

The referenced WebObservation contains:

  • Full API response details
  • XPath provenance (if applicable)
  • HTTP response metadata
  • Archived content hash

Data Quality Tiers

Set each event's data_tier according to how far the event has been verified:

| Tier | Description | Typical Source |
|------|-------------|----------------|
| TIER_4_INFERRED | Unverified, possibly from LLM | Initial API extraction |
| TIER_3_CROWD_SOURCED | Verified against Wikipedia/Wikidata | Cross-referenced |
| TIER_2_VERIFIED | Verified against institutional website | Official source |
| TIER_1_AUTHORITATIVE | Verified against official registry | Government records |
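Because the tiers form an ordering from TIER_1 (most trusted) to TIER_4 (least trusted), a consumer could compare them numerically. The sketch below assumes integer ranks 1 to 4 derived from the tier names; the ranks, the `DataTier` class, and `meets_tier` are illustrative assumptions, not part of the schema.

```python
from enum import IntEnum


class DataTier(IntEnum):
    """Tier names from this policy; the integer ranks are an assumption."""

    TIER_1_AUTHORITATIVE = 1
    TIER_2_VERIFIED = 2
    TIER_3_CROWD_SOURCED = 3
    TIER_4_INFERRED = 4


def meets_tier(actual: str, required: str) -> bool:
    """Return True if `actual` is at least as trusted as `required`.

    Lower rank means higher trust, so TIER_1 meets every requirement.
    Raises KeyError for a tier name that is not defined above.
    """
    return DataTier[actual] <= DataTier[required]
```

For example, a report that only wants cross-referenced events or better could filter with `meets_tier(event["data_tier"], "TIER_3_CROWD_SOURCED")`, which would exclude events still at TIER_4_INFERRED.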

Migration from LinkupTimelineEvent

The following fields were removed from the class (use alternatives):

| Old Field | Migration Path |
|-----------|----------------|
| linkup_query | Put in extraction_notes |
| linkup_answer | Put in extraction_notes |
| fetch_timestamp | Use extraction_timestamp |
| LinkupExtractionMethodEnum | Use TimelineExtractionMethodEnum |

Data files with timeline_enrichment.timeline_events continue to work; the events are now instances of CustodianTimelineEvent.
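The field migration described above could be applied to a single event dictionary roughly as follows. This is a hedged sketch, not the project's migration script: the function name `migrate_linkup_event` and the exact wording of the generated notes are assumptions, and mapping old LinkupExtractionMethodEnum values to TimelineExtractionMethodEnum values is not attempted because this policy does not list the old values.

```python
def migrate_linkup_event(old_event: dict) -> dict:
    """Return a copy of old_event using CustodianTimelineEvent field names."""
    event = dict(old_event)  # shallow copy; the input is left untouched

    # linkup_query / linkup_answer -> appended to extraction_notes
    additions = []
    query = event.pop("linkup_query", None)
    if query:
        additions.append(f"Query: {query}")
    answer = event.pop("linkup_answer", None)
    if answer:
        additions.append(f"API: Linkup. Answer: {answer}")
    if additions:
        existing = event.get("extraction_notes")
        parts = ([existing.rstrip()] if existing else []) + additions
        event["extraction_notes"] = "\n".join(parts)

    # fetch_timestamp -> extraction_timestamp (an existing value wins)
    fetched = event.pop("fetch_timestamp", None)
    if fetched is not None:
        event.setdefault("extraction_timestamp", fetched)

    return event
```

Working on a copy means the function can be run as a dry run over existing data files and the result compared with the original before anything is written back.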

Current Statistics (January 2026)

  • Total events: ~1,199 across ~862 custodian files
  • Event types: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
  • Data tier: Mostly TIER_4_INFERRED (pending verification)

Schema Reference

The formal schema and the related rules are defined in:

  • schemas/20251121/linkml/modules/classes/CustodianTimelineEvent.yaml
  • AGENTS.md Rule 6 - WebObservation XPath requirements
  • .opencode/PROVENANCE_SEPARATION_RULE.md - Rule 37 on provenance separation
  • .opencode/WEB_OBSERVATION_PROVENANCE_RULES.md - WebClaim details
  • schemas/20251121/linkml/modules/classes/WebObservation.yaml - WebObservation schema
  • schemas/20251121/linkml/modules/classes/CustodianObservation.yaml - CustodianObservation schema

Created: 2025-12-16
Updated: 2026-01-01 (Renamed class to CustodianTimelineEvent, source-agnostic design)
Status: ACTIVE
Applies to: Heritage Custodian Timeline Events