Linkup Provenance Policy
Overview
This document clarifies the provenance requirements for LinkupTimelineEvent data and explains why they differ from WebClaim provenance.
Rule 6 Scope Clarification
AGENTS.md Rule 6 ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:
- WebClaim class
- WebObservation class
- PersonWebClaim class
Rule 6 does NOT apply to:
- LinkupTimelineEvent class (has its own provenance model)
- WikidataEnrichment (uses entity URI provenance)
- Other API-based enrichments
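The scope above can be expressed as a simple membership check. This is a minimal sketch, not the project's actual tooling: the constant and function names are hypothetical, and only the three class names come from this policy.

```python
# Classes this policy lists as subject to AGENTS.md Rule 6 (XPath provenance).
# The set name and helper below are illustrative assumptions.
XPATH_REQUIRED_CLASSES = frozenset({"WebClaim", "WebObservation", "PersonWebClaim"})


def requires_xpath_provenance(class_name: str) -> bool:
    """Return True if Rule 6 applies to the given schema class name."""
    return class_name in XPATH_REQUIRED_CLASSES
```

Anything outside the set, such as LinkupTimelineEvent or WikidataEnrichment, falls under its own provenance model instead.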
Why Linkup Provenance is Different
Data Flow Comparison
WebClaim (XPath required):
Webpage HTML → Archive to file → Parse HTML → Extract XPath → Claim value
↓ ↓
Verifiable: Can check XPath points to value in HTML
LinkupTimelineEvent (API provenance):
Query → Linkup API → LLM Answer + Source URLs → Archive JSON → Regex Extract → Event
↓
NOT directly verifiable: LLM may hallucinate, sources may be misquoted
Fundamental Difference
| Aspect | WebClaim | LinkupTimelineEvent |
|---|---|---|
| Source | HTML file (static) | LLM answer (generated) |
| Verification | Automated (XPath lookup) | Manual (check source_urls) |
| Trust Model | High (direct extraction) | Low (LLM intermediary) |
| Data Tier | TIER_2 or higher | TIER_4_INFERRED (always) |
LinkupTimelineEvent Provenance Requirements
All events MUST have these fields (per schema LinkupTimelineEvent.yaml):
Required Fields
| Field | Purpose |
|---|---|
| linkup_query | The exact query sent to the API (reproducibility) |
| linkup_answer | Full LLM response (audit trail) |
| fetch_timestamp | When the API was called |
| archive_path | Path to archived JSON (evidence) |
| extraction_method | How the event was extracted |
| extraction_timestamp | When extraction occurred |
| data_tier | Always TIER_4_INFERRED initially |
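A completeness check for the required fields might look like the following. It is a sketch under assumptions: events are treated as plain dictionaries, the helper name is hypothetical, and an empty string counts as missing. It does not replace validation against LinkupTimelineEvent.yaml.

```python
# Required provenance fields, as listed in this policy.
REQUIRED_FIELDS = (
    "linkup_query",
    "linkup_answer",
    "fetch_timestamp",
    "archive_path",
    "extraction_method",
    "extraction_timestamp",
    "data_tier",
)


def missing_provenance_fields(event: dict) -> list[str]:
    """Return the required field names that are absent or empty in `event`."""
    return [name for name in REQUIRED_FIELDS if event.get(name) in (None, "")]
```

An event is acceptable under this policy only when the returned list is empty. The check deliberately does not pin data_tier to TIER_4_INFERRED, since events may later be promoted.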
Optional but Recommended
| Field | Purpose |
|---|---|
| source_urls | URLs cited by Linkup for manual verification |
Verification Pathway
Timeline events can be promoted from TIER_4 to higher tiers through verification:
TIER_4_INFERRED (initial)
↓ Verify against source_urls
TIER_3_CROWD_SOURCED (if verified against Wikipedia)
↓ Verify against institutional website
TIER_2_VERIFIED (if institutional source confirms)
↓ Verify against official registry/document
TIER_1_AUTHORITATIVE (rare for events)
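The promotion ladder can be sketched as a mapping from the kind of confirming evidence to the tier it supports. The tier names come from this document; the evidence labels ("wikipedia", "institutional_website", "official_registry") and the function name are hypothetical, chosen only for illustration.

```python
from enum import IntEnum


class DataTier(IntEnum):
    """Data tiers from this policy; a lower number means higher trust."""
    TIER_1_AUTHORITATIVE = 1
    TIER_2_VERIFIED = 2
    TIER_3_CROWD_SOURCED = 3
    TIER_4_INFERRED = 4


# Assumed evidence labels mapped to the tier each kind of confirmation supports.
EVIDENCE_TIER = {
    "wikipedia": DataTier.TIER_3_CROWD_SOURCED,
    "institutional_website": DataTier.TIER_2_VERIFIED,
    "official_registry": DataTier.TIER_1_AUTHORITATIVE,
}


def tier_after_verification(current: DataTier, evidence: str) -> DataTier:
    """Return the event's tier after a successful manual verification.

    The tier never gets worse: weaker evidence leaves a stronger tier unchanged.
    """
    confirmed = EVIDENCE_TIER[evidence]  # KeyError for unknown evidence labels
    return min(current, confirmed)
```

For example, an event at TIER_4_INFERRED that is confirmed against Wikipedia moves to TIER_3_CROWD_SOURCED, while an event already at TIER_2_VERIFIED stays there.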
Implementation
Current Statistics (December 2025)
- Total events: 1,199 across 862 custodian files
- Event types: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
- Data tier: 100% TIER_4_INFERRED
- Provenance fields: 100% complete
Archived JSON Location
All Linkup API responses are archived at:
data/custodian/web/{entry_number}/linkup/linkup_{event_type}_{timestamp}.json
Example:
data/custodian/web/1071/linkup/linkup_founding_20251215T215802Z.json
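The path pattern can be reproduced with a small helper. This sketch makes two assumptions that the document only implies: the event type is lowercased in the filename (the example uses "founding"), and the timestamp is UTC (suggested by the trailing "Z"). The function name is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def linkup_archive_path(entry_number: int, event_type: str, fetched_at: datetime) -> str:
    """Build the archive path for a Linkup API response.

    `fetched_at` must be timezone-aware; it is converted to UTC before formatting.
    """
    if fetched_at.tzinfo is None:
        raise ValueError("fetched_at must be timezone-aware")
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    filename = f"linkup_{event_type.lower()}_{stamp}.json"
    return str(PurePosixPath("data/custodian/web") / str(entry_number) / "linkup" / filename)
```

Called with entry 1071, event type FOUNDING, and a fetch time of 2025-12-15 21:58:02 UTC, it yields the example path shown above.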
Schema Reference
The formal schema is defined in:
schemas/20251121/linkml/modules/classes/LinkupTimelineEvent.yaml
Key documentation in schema (lines 6-15):
# Key principle:
# Linkup API returns LLM-generated answers with source URLs, not XPath locations.
# Therefore, provenance is different from WebClaim:
# - Store the query that was sent to Linkup
# - Store the LLM answer (which may contain hallucinations)
# - Store source URLs (for manual verification)
# - Archive the complete API response JSON
#
# This acknowledges that Linkup data is TIER_4_INFERRED (LLM-generated)
# and requires manual verification before promotion to higher tiers.
Related Documentation
- AGENTS.md Rule 6 - WebObservation XPath requirements
- .opencode/WEB_OBSERVATION_PROVENANCE_RULES.md - WebClaim details
- schemas/20251121/linkml/modules/classes/WebClaim.yaml - WebClaim schema
- schemas/20251121/linkml/modules/classes/LinkupTimelineEvent.yaml - LinkupTimelineEvent schema
Created: 2025-12-16
Status: ACTIVE
Applies to: Dutch GLAM Timeline Event Enrichment project