# Provenance Separation Rule

## Rule 37: Domain Classes MUST NOT Contain Data-Source-Specific Provenance

### Overview

Domain classes that model heritage custodian entities (events, identifiers, locations, etc.) MUST NOT contain provenance fields specific to any particular data source or API.

### Rationale

1. **Separation of Concerns**: Domain semantics (what happened) should be separate from provenance (how we know).
2. **Source Flexibility**: The same event can be discovered via multiple sources (Linkup, Wikidata, manual research).
3. **Schema Stability**: Adding new data sources should not require modifying domain classes.
4. **Provenance Reuse**: Observation classes can be reused across different domain entities.

### Rule

**NEVER put data-source-specific fields in domain classes:**

```yaml
# WRONG - Domain class with source-specific fields
CustodianTimelineEvent:
  slots:
    - event_type
    - event_date
    - description
    - linkup_query      # Source-specific!
    - linkup_answer     # Source-specific!
    - fetch_timestamp   # Source-specific!
```

**CORRECT - Separate domain and provenance:**

```yaml
# Domain class - source-agnostic
CustodianTimelineEvent:
  slots:
    - event_type
    - event_date
    - description
    - data_tier        # Quality indicator (not source-specific)
    - observation_ref  # Reference to observation (optional)

# Provenance classes - source-specific
WebObservation:        # For web-scraped data with XPath provenance
CustodianObservation:  # For institutional observations
LinkupObservation:     # NEW - if needed for Linkup-specific provenance
  slots:
    - linkup_query
    - linkup_answer
    - source_urls
    - fetch_timestamp
    - archive_path
```

### Application to Timeline Events

The `CustodianTimelineEvent` class models organizational change events (founding, merger, dissolution, etc.) as **domain entities**.

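
The rule can also be checked mechanically. The following is a minimal sketch, not part of the existing schema tooling: the function name `find_source_specific_slots` and both denylists are hypothetical, and the denylist entries are simply taken from the slots marked "Source-specific!" in the WRONG example.

```python
# Hypothetical lint for Rule 37. The names and denylists below are
# illustrative assumptions, not part of the actual schema tooling.

# Assumption: slots tied to one data source share a recognizable prefix.
SOURCE_SPECIFIC_PREFIXES = ("linkup_",)
# Slot flagged as source-specific in the WRONG example.
SOURCE_SPECIFIC_NAMES = {"fetch_timestamp"}


def find_source_specific_slots(slots):
    """Return the slots that look source-specific, preserving input order."""
    return [
        slot
        for slot in slots
        if slot.startswith(SOURCE_SPECIFIC_PREFIXES) or slot in SOURCE_SPECIFIC_NAMES
    ]


# Slot lists copied from the WRONG and CORRECT examples.
wrong_slots = [
    "event_type", "event_date", "description",
    "linkup_query", "linkup_answer", "fetch_timestamp",
]
correct_slots = [
    "event_type", "event_date", "description",
    "data_tier", "observation_ref",
]

violations_wrong = find_source_specific_slots(wrong_slots)
violations_correct = find_source_specific_slots(correct_slots)

print("WRONG example violations:", violations_wrong)
print("CORRECT example violations:", violations_correct)
```

A check like this only catches names that are already on the denylist, so it complements review of new domain classes rather than replacing it.
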
**Timeline events can be discovered from multiple sources:**

| Source | Provenance Class |
|--------|------------------|
| Linkup API | `WebObservation` (with API-specific metadata in `extraction_notes`) |
| Web scraping | `WebObservation` (with XPath provenance in `claims`) |
| Wikidata SPARQL | `WebObservation` (with SPARQL query provenance) |
| Manual research | `CustodianObservation` (with source document reference) |
| Institutional records | `CustodianObservation` (with official source) |

### Provenance Flow

```
SourceDocument/API Response
        ↓
WebObservation / CustodianObservation (provenance record)
        ↓
CustodianTimelineEvent (domain entity)
        ↓
references_observation → Observation (backlink for audit)
```

### Existing Provenance Classes

Use these existing classes for different provenance needs:

| Class | Purpose | Location |
|-------|---------|----------|
| `CustodianObservation` | Source-based evidence of custodian existence | `schemas/.../classes/CustodianObservation.yaml` |
| `WebObservation` | Web retrieval provenance with claims | `schemas/.../classes/WebObservation.yaml` |
| `WebClaim` | Individual claims with XPath provenance | `schemas/.../classes/WebClaim.yaml` |
| `SourceDocument` | Reference to source documents | `schemas/.../classes/SourceDocument.yaml` |

### Migration Note

The former `LinkupTimelineEvent` class contained Linkup-specific provenance fields. These have been moved to:

- `extraction_notes` field for API-specific metadata
- `archive_path` field for archived API responses
- The class was renamed to `CustodianTimelineEvent` to be source-agnostic

### Data Tier Always Required

Even without source-specific provenance, domain classes MUST indicate data quality:

```yaml
CustodianTimelineEvent:
  slots:
    - data_tier  # REQUIRED: TIER_1 through TIER_4
```

This allows consumers to understand trustworthiness without needing source-specific knowledge.

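
The `data_tier` requirement could be enforced with a small validation step. This is a sketch under stated assumptions: the helper `check_data_tier` is hypothetical, and the accepted pattern (`TIER_1` through `TIER_4`, optionally followed by an uppercase suffix such as `_INFERRED`) is inferred from the values that appear in this rule, not taken from the schema's actual enumeration.

```python
import re

# Assumption: valid tiers are TIER_1 .. TIER_4, optionally followed by an
# uppercase suffix (e.g. TIER_4_INFERRED). The real enum may be stricter.
TIER_PATTERN = re.compile(r"^TIER_[1-4](_[A-Z_]+)?$")


def check_data_tier(event):
    """Return an error message, or None when the event has a valid data_tier."""
    tier = event.get("data_tier")
    if tier is None:
        return "missing data_tier"
    if not isinstance(tier, str) or not TIER_PATTERN.match(tier):
        return f"unrecognized data_tier: {tier!r}"
    return None


# Hypothetical event records, represented as plain dicts.
valid_event = {"event_type": "FOUNDING", "data_tier": "TIER_4_INFERRED"}
untiered_event = {"event_type": "FOUNDING"}
bad_tier_event = {"event_type": "FOUNDING", "data_tier": "TIER_9"}

for event in (valid_event, untiered_event, bad_tier_event):
    print(check_data_tier(event))
```

Returning a message instead of raising lets a caller collect every problem in a file before reporting.
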
### Examples

**Founding event from Linkup:**

```yaml
timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    date_precision: day
    description: "Founded on 30 April 2005"
    data_tier: TIER_4_INFERRED  # LLM-extracted
    extraction_notes: |
      Source: Linkup API query "Drents Archief opgericht"
      Verified against: nl.wikipedia.org/wiki/Drents_Archief
```

**Founding event from institutional website:**

```yaml
timeline_events:
  - event_type: FOUNDING
    event_date: "2005-04-30"
    date_precision: day
    description: "Founded on 30 April 2005"
    data_tier: TIER_2_VERIFIED  # Verified from official source
    extraction_notes: |
      Source: Official website about page
      XPath: /html/body/div[2]/section[1]/p[3]
```

### Related Rules

- **Rule 6**: WebObservation Claims MUST Have XPath Provenance (for web-scraped claims)
- **Rule 35**: Provenance Statements MUST Have Dual Timestamps
- **Rule 22**: Custodian YAML Files Are the Single Source of Truth

### Related Documentation

- `schemas/20251121/linkml/modules/classes/CustodianObservation.yaml`
- `schemas/20251121/linkml/modules/classes/WebObservation.yaml`
- `schemas/20251121/linkml/modules/classes/CustodianTimelineEvent.yaml`
- `.opencode/WEB_OBSERVATION_PROVENANCE_RULES.md`

---

**Created**: 2026-01-01
**Status**: ACTIVE
**Applies to**: All domain classes in the Heritage Custodian Ontology