# Timeline Event Provenance Policy

## Overview

This document clarifies the provenance model for `CustodianTimelineEvent` data (renamed from `LinkupTimelineEvent` in January 2026).

**Key Change**: The class is now source-agnostic. Detailed provenance about the data source (Linkup, Wikidata, web scraping, etc.) belongs in observation classes, not in the event itself.

## Architectural Principle: Provenance Separation (Rule 37)

Domain classes model WHAT happened, not HOW we know:

| Layer | Purpose | Classes |
|-------|---------|---------|
| **Domain** | What happened (events, entities) | `CustodianTimelineEvent` |
| **Observation** | How we observed it (provenance) | `WebObservation`, `CustodianObservation` |

See `.opencode/PROVENANCE_SEPARATION_RULE.md` for the full rule.

## Rule 6 Scope Clarification

**AGENTS.md Rule 6** ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:
- `WebClaim` class
- `WebObservation` class
- `PersonWebClaim` class

**Rule 6 does NOT apply to**:
- `CustodianTimelineEvent` class (source-agnostic design)
- `WikidataEnrichment` (uses entity URI provenance)
- Other API-based enrichments
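The scope above can be expressed as a simple membership check. This is a minimal sketch: the names `RULE_6_CLASSES` and `requires_xpath_provenance` are hypothetical and are not part of the schema or of `AGENTS.md`.

```python
# Classes that AGENTS.md Rule 6 covers, taken from the list above.
# The constant and function names are illustrative assumptions.
RULE_6_CLASSES = frozenset({"WebClaim", "WebObservation", "PersonWebClaim"})


def requires_xpath_provenance(class_name: str) -> bool:
    """Return True if Rule 6 (XPath provenance) applies to the given class."""
    return class_name in RULE_6_CLASSES


if __name__ == "__main__":
    for name in ("WebClaim", "CustodianTimelineEvent", "WikidataEnrichment"):
        print(name, requires_xpath_provenance(name))
```

Any class not in the set, including `CustodianTimelineEvent`, falls outside Rule 6 under this check.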

## CustodianTimelineEvent Provenance Model

The `CustodianTimelineEvent` class uses **source-agnostic fields** to describe both the event and how it was extracted:

### Required Fields

| Field | Purpose |
|-------|---------|
| `event_type` | What kind of event (FOUNDING, MERGER, etc.) |
| `date_precision` | How specific the date is (day, year, decade) |
| `approximate` | Whether the date is approximate (circa, roughly) |
| `description` | Human-readable summary of the event |
| `extraction_method` | How the event was discovered |
| `extraction_timestamp` | When the event was extracted |
| `data_tier` | Quality tier (TIER_1 to TIER_4) |

### Optional Fields

| Field | Purpose |
|-------|---------|
| `event_date` | When the event occurred (if known) |
| `source_urls` | URLs documenting the event |
| `extraction_notes` | Free-text notes for source-specific details |
| `archive_path` | Path to archived source material |
| `observation_ref` | Link to observation class for detailed provenance |
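A lightweight pre-check for the required fields might look like the sketch below. It is not a substitute for validating against the LinkML schema; the function name `missing_required_fields` is hypothetical, and the sample values (including the timestamp) are illustrative only.

```python
# Required field names, copied from the "Required Fields" table above.
REQUIRED_FIELDS = (
    "event_type",
    "date_precision",
    "approximate",
    "description",
    "extraction_method",
    "extraction_timestamp",
    "data_tier",
)


def missing_required_fields(event: dict) -> list:
    """Return the required field names that are absent or None.

    The check uses `is None` rather than truthiness so that
    `approximate: False` counts as present.
    """
    return [name for name in REQUIRED_FIELDS if event.get(name) is None]


# Illustrative record; the values are assumptions, not real data.
example_event = {
    "event_type": "FOUNDING",
    "date_precision": "day",
    "approximate": False,
    "description": "Founding of the archive",
    "extraction_method": "api_response_regex",
    "extraction_timestamp": "2025-12-15T16:04:38Z",
    "data_tier": "TIER_4_INFERRED",
}

if __name__ == "__main__":
    print(missing_required_fields(example_event))
    print(missing_required_fields({"event_type": "FOUNDING"}))
```

Optional fields such as `event_date` or `observation_ref` are deliberately ignored by this check.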

### Extraction Methods

The `TimelineExtractionMethodEnum` defines one value per extraction route:

| Method | Description |
|--------|-------------|
| `api_response_regex` | Date extracted via regex from API response |
| `api_response_llm` | Date extracted using LLM analysis |
| `web_scrape_xpath` | Date extracted via XPath from archived HTML |
| `wikidata_sparql` | Date extracted from Wikidata SPARQL |
| `manual_research` | Event discovered through manual research |
| `manual_verification` | Event manually verified and corrected |
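For code that consumes event records, the values in the table can be mirrored in a Python enum so that unknown methods fail fast. This is a sketch only: the class name `TimelineExtractionMethod` is an assumption, and the authoritative definition remains the LinkML schema.

```python
from enum import Enum


class TimelineExtractionMethod(Enum):
    """Python mirror of the values listed in the table above (illustrative)."""

    API_RESPONSE_REGEX = "api_response_regex"
    API_RESPONSE_LLM = "api_response_llm"
    WEB_SCRAPE_XPATH = "web_scrape_xpath"
    WIKIDATA_SPARQL = "wikidata_sparql"
    MANUAL_RESEARCH = "manual_research"
    MANUAL_VERIFICATION = "manual_verification"


if __name__ == "__main__":
    # Lookup by value returns the matching member.
    print(TimelineExtractionMethod("wikidata_sparql"))
    # An unknown value raises ValueError.
    try:
        TimelineExtractionMethod("linkup_magic")
    except ValueError as exc:
        print("rejected:", exc)
```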

## Source-Specific Details in extraction_notes

Use the `extraction_notes` field to capture source-specific details that don't fit elsewhere:

### For API-sourced data (e.g., Linkup)
```yaml
extraction_notes: |
  Query: "Drents Archief" Assen opgericht OR gesticht
  API: Linkup. Answer: "Het RHC Drents Archief werd opgericht op 30 april 2005..."
  Sources cited: nl.wikipedia.org, bizzy.ai
archive_path: web/0002/linkup/linkup_founding_20251215T160438Z.json
```

### For web-scraped data
```yaml
extraction_notes: |
  XPath: /html/body/main/section[2]/div/p[3]
  Source page: https://www.rijksmuseum.nl/en/about-us/history
archive_path: web/0001/rijksmuseum.nl/about-us/rendered.html
```

### For Wikidata-sourced data
```yaml
extraction_notes: |
  Wikidata: Q190804
  Property: P571 (inception date)
  SPARQL timestamp: 2025-12-20T14:30:00Z
```
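The examples above all follow a `Key: value` line layout. A small helper can produce that layout from a mapping, as sketched here; the function name `build_extraction_notes` is hypothetical, and nothing in the schema requires this format, since `extraction_notes` is free text.

```python
def build_extraction_notes(details: dict) -> str:
    """Join source-specific details into one 'Key: value' line per entry.

    Python dicts preserve insertion order, so lines appear in the
    order the caller supplied them.
    """
    return "\n".join(f"{key}: {value}" for key, value in details.items())


if __name__ == "__main__":
    notes = build_extraction_notes(
        {
            "Wikidata": "Q190804",
            "Property": "P571 (inception date)",
            "SPARQL timestamp": "2025-12-20T14:30:00Z",
        }
    )
    print(notes)
```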

## Linking to Observation Classes

For detailed provenance, use `observation_ref` to link to an observation class instance such as a `WebObservation`:

```yaml
timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    # ... other fields ...
    observation_ref: "https://nde.nl/ontology/hc/observation/web/2025-12-15/drents-archief"
```

The referenced `WebObservation` contains:
- Full API response details
- XPath provenance (if applicable)
- HTTP response metadata
- Archived content hash

## Data Quality Tiers

Events should have their `data_tier` set appropriately:

| Tier | Description | Typical Source |
|------|-------------|----------------|
| `TIER_4_INFERRED` | Unverified, possibly from LLM | Initial API extraction |
| `TIER_3_CROWD_SOURCED` | Verified against Wikipedia/Wikidata | Cross-referenced |
| `TIER_2_VERIFIED` | Verified against institutional website | Official source |
| `TIER_1_AUTHORITATIVE` | Verified against official registry | Government records |
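Because a lower tier number means higher quality, comparing tiers by name is easy to get backwards. The sketch below makes the ordering explicit; `TIER_RANK` and `meets_minimum_tier` are hypothetical names introduced for illustration.

```python
# Rank of each tier, taken from the table above: 1 is the most trusted.
TIER_RANK = {
    "TIER_1_AUTHORITATIVE": 1,
    "TIER_2_VERIFIED": 2,
    "TIER_3_CROWD_SOURCED": 3,
    "TIER_4_INFERRED": 4,
}


def meets_minimum_tier(tier: str, minimum: str) -> bool:
    """Return True if `tier` is at least as trusted as `minimum`.

    Raises KeyError for tier names that are not in the table.
    """
    return TIER_RANK[tier] <= TIER_RANK[minimum]


if __name__ == "__main__":
    print(meets_minimum_tier("TIER_2_VERIFIED", "TIER_3_CROWD_SOURCED"))
    print(meets_minimum_tier("TIER_4_INFERRED", "TIER_2_VERIFIED"))
```

A filter such as "only events at `TIER_2_VERIFIED` or better" then reads `meets_minimum_tier(event["data_tier"], "TIER_2_VERIFIED")`.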

## Migration from LinkupTimelineEvent

The following fields and enum were removed from the schema; use the listed alternatives:

| Old Field | Migration Path |
|-----------|----------------|
| `linkup_query` | Put in `extraction_notes` |
| `linkup_answer` | Put in `extraction_notes` |
| `fetch_timestamp` | Use `extraction_timestamp` |
| `LinkupExtractionMethodEnum` | Use `TimelineExtractionMethodEnum` |

Data files with `timeline_enrichment.timeline_events` continue to work; the events are now instances of `CustodianTimelineEvent`.
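The field mappings in the table could be applied to an old-style record roughly as follows. This is a sketch under stated assumptions: the function name `migrate_linkup_event` is hypothetical, the note layout copies the Linkup example earlier in this document, and old `LinkupExtractionMethodEnum` values are left untouched because their mapping to the new enum is not specified here.

```python
REMOVED_FIELDS = ("linkup_query", "linkup_answer", "fetch_timestamp")


def migrate_linkup_event(old: dict) -> dict:
    """Return a copy of an old-style event with removed fields remapped."""
    # Copy everything except the fields that no longer exist.
    new = {k: v for k, v in old.items() if k not in REMOVED_FIELDS}

    # Fold the Linkup query and answer into extraction_notes,
    # keeping any notes that were already present.
    notes = []
    if old.get("extraction_notes"):
        notes.append(old["extraction_notes"])
    if old.get("linkup_query"):
        notes.append(f"Query: {old['linkup_query']}")
    if old.get("linkup_answer"):
        notes.append(f"API: Linkup. Answer: {old['linkup_answer']}")
    if notes:
        new["extraction_notes"] = "\n".join(notes)

    # fetch_timestamp becomes extraction_timestamp unless one already exists.
    if "fetch_timestamp" in old:
        new.setdefault("extraction_timestamp", old["fetch_timestamp"])
    return new


if __name__ == "__main__":
    # Illustrative record; the timestamp value is an assumption.
    legacy = {
        "event_type": "FOUNDING",
        "linkup_query": '"Drents Archief" Assen opgericht OR gesticht',
        "fetch_timestamp": "2025-12-15T16:04:38Z",
    }
    print(migrate_linkup_event(legacy))
```

The input record is not modified, so the migration can be run as a dry pass and compared with the original before files are rewritten.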

## Current Statistics (January 2026)

- **Total events**: ~1,199 across ~862 custodian files
- **Event types**: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
- **Data tier**: Mostly TIER_4_INFERRED (pending verification)

## Schema Reference

The formal schema is defined in:
- `schemas/20251121/linkml/modules/classes/CustodianTimelineEvent.yaml`

## Related Documentation

- `AGENTS.md` Rule 6 - WebObservation XPath requirements
- `.opencode/PROVENANCE_SEPARATION_RULE.md` - Rule 37 on provenance separation
- `.opencode/WEB_OBSERVATION_PROVENANCE_RULES.md` - WebClaim details
- `schemas/20251121/linkml/modules/classes/WebObservation.yaml` - WebObservation schema
- `schemas/20251121/linkml/modules/classes/CustodianObservation.yaml` - CustodianObservation schema

---

**Created**: 2025-12-16  
**Updated**: 2026-01-01 (Renamed class to CustodianTimelineEvent, source-agnostic design)  
**Status**: ACTIVE  
**Applies to**: Heritage Custodian Timeline Events