# Linkup Provenance Policy

## Overview

This document clarifies the provenance requirements for `LinkupTimelineEvent` data and explains why they differ from the requirements for `WebClaim` provenance.

## Rule 6 Scope Clarification

**AGENTS.md Rule 6** ("WebObservation Claims MUST Have XPath Provenance") applies ONLY to:
- `WebClaim` class
- `WebObservation` class
- `PersonWebClaim` class

**Rule 6 does NOT apply to**:
- `LinkupTimelineEvent` class (has its own provenance model)
- `WikidataEnrichment` (uses entity URI provenance)
- Other API-based enrichments

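
The scope rule above can be expressed as a simple membership check. The following is a minimal sketch, not code from the project: the class names come from the lists above, while the constant `XPATH_REQUIRED_CLASSES` and the function `requires_xpath_provenance` are hypothetical names introduced for illustration.

```python
# Hypothetical helper: only the three class names are taken from this document.
XPATH_REQUIRED_CLASSES = frozenset({"WebClaim", "WebObservation", "PersonWebClaim"})


def requires_xpath_provenance(class_name: str) -> bool:
    """Return True if AGENTS.md Rule 6 (XPath provenance) applies to the class."""
    return class_name in XPATH_REQUIRED_CLASSES
```

A validator could call this before demanding an XPath, so that `LinkupTimelineEvent` and other API-based enrichments are not rejected for lacking one.
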
## Why Linkup Provenance is Different

### Data Flow Comparison

**WebClaim** (XPath required):
```
Webpage HTML → Archive to file → Parse HTML → Extract XPath → Claim value
                     ↓                              ↓
Verifiable: Can check XPath points to value in HTML
```

**LinkupTimelineEvent** (API provenance):
```
Query → Linkup API → LLM Answer + Source URLs → Archive JSON → Regex Extract → Event
                          ↓
NOT directly verifiable: LLM may hallucinate, sources may be misquoted
```

### Fundamental Difference

| Aspect | WebClaim | LinkupTimelineEvent |
|--------|----------|---------------------|
| **Source** | HTML file (static) | LLM answer (generated) |
| **Verification** | Automated (XPath lookup) | Manual (check source_urls) |
| **Trust Model** | High (direct extraction) | Low (LLM intermediary) |
| **Data Tier** | TIER_2 or higher | TIER_4_INFERRED (always, until verified) |

## LinkupTimelineEvent Provenance Requirements

All events MUST have these fields (per schema `LinkupTimelineEvent.yaml`):

### Required Fields

| Field | Purpose |
|-------|---------|
| `linkup_query` | The exact query sent to the Linkup API (reproducibility) |
| `linkup_answer` | Full LLM response (audit trail) |
| `fetch_timestamp` | When the API was called |
| `archive_path` | Path to archived JSON (evidence) |
| `extraction_method` | How the event was extracted |
| `extraction_timestamp` | When extraction occurred |
| `data_tier` | `TIER_4_INFERRED` at creation |

### Optional but Recommended

| Field | Purpose |
|-------|---------|
| `source_urls` | URLs cited by Linkup for manual verification |

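
The field requirements above lend themselves to an automated completeness check. The sketch below assumes each event is available as a flat Python `dict` keyed by the field names in the tables. That representation, the function name `check_provenance`, and all values in `example_event` are illustrative assumptions, not the project's actual tooling or data.

```python
# Field names are taken from the tables above; everything else is hypothetical.
REQUIRED_FIELDS = (
    "linkup_query",
    "linkup_answer",
    "fetch_timestamp",
    "archive_path",
    "extraction_method",
    "extraction_timestamp",
    "data_tier",
)
RECOMMENDED_FIELDS = ("source_urls",)


def check_provenance(event: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) describing missing provenance fields."""
    # A required field counts as missing when absent, None, or an empty string.
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if event.get(name) in (None, "")
    ]
    # A recommended field counts as missing when absent or empty (e.g. []).
    warnings = [
        f"missing recommended field: {name}"
        for name in RECOMMENDED_FIELDS
        if not event.get(name)
    ]
    return errors, warnings


# Illustrative record only; the query, answer, and method values are made up.
example_event = {
    "linkup_query": "When was the example institution founded?",
    "linkup_answer": "The example institution was founded in 1900.",
    "fetch_timestamp": "2025-12-15T21:58:02Z",
    "archive_path": "data/custodian/web/1071/linkup/linkup_founding_20251215T215802Z.json",
    "extraction_method": "regex",
    "extraction_timestamp": "2025-12-15T22:00:00Z",
    "data_tier": "TIER_4_INFERRED",
}

errors, warnings = check_provenance(example_event)
```

Missing required fields are reported as errors, while a missing `source_urls` is reported only as a warning, mirroring the required versus recommended split.
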
## Verification Pathway

Timeline events can be promoted from TIER_4 to higher tiers through verification:

```
TIER_4_INFERRED (initial)
    ↓ Verify against source_urls
TIER_3_CROWD_SOURCED (if verified against Wikipedia)
    ↓ Verify against institutional website
TIER_2_VERIFIED (if institutional source confirms)
    ↓ Verify against official registry/document
TIER_1_AUTHORITATIVE (rare for events)
```

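
One way to model the promotion pathway is as an ordered enumeration plus a mapping from evidence type to the tier it supports. This is a sketch under stated assumptions: the tier names come from the diagram above, but the evidence labels (`"wikipedia"`, `"institutional_website"`, `"official_registry"`), the `promote` function, and the choice to let an event move directly to the tier its evidence supports are all hypothetical. A stricter policy might require passing through each tier in turn.

```python
from enum import IntEnum


class DataTier(IntEnum):
    """Tier names from the diagram above; a lower number means higher trust."""
    TIER_1_AUTHORITATIVE = 1
    TIER_2_VERIFIED = 2
    TIER_3_CROWD_SOURCED = 3
    TIER_4_INFERRED = 4


# Hypothetical evidence labels mapped to the tier each kind of source supports.
EVIDENCE_TIER = {
    "wikipedia": DataTier.TIER_3_CROWD_SOURCED,
    "institutional_website": DataTier.TIER_2_VERIFIED,
    "official_registry": DataTier.TIER_1_AUTHORITATIVE,
}


def promote(current: DataTier, evidence: str) -> DataTier:
    """Return the tier after verification; unknown evidence leaves it unchanged."""
    target = EVIDENCE_TIER.get(evidence)
    if target is None:
        return current
    # min() keeps the more trusted tier, so weaker evidence never demotes an event.
    return min(current, target)
```

Because the function never demotes, re-verifying a `TIER_2_VERIFIED` event against Wikipedia leaves it at `TIER_2_VERIFIED`.
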
## Implementation

### Current Statistics (December 2025)

- **Total events**: 1,199 across 862 custodian files
- **Event types**: FOUNDING (927), TRANSFER (190), MERGER (57), DISSOLUTION (10), RENAMING (10)
- **Data tier**: 100% TIER_4_INFERRED
- **Provenance fields**: 100% complete

### Archived JSON Location

All Linkup API responses are archived at:
```
data/custodian/web/{entry_number}/linkup/linkup_{event_type}_{timestamp}.json
```

Example:
```
data/custodian/web/1071/linkup/linkup_founding_20251215T215802Z.json
```

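
The path template can be filled in programmatically. The sketch below is illustrative only: the function name `linkup_archive_path` is hypothetical, and it assumes the timestamp is UTC in compact `YYYYMMDDTHHMMSSZ` form and that the event type is lowercased, both inferred from the single example above.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def linkup_archive_path(entry_number: int, event_type: str, fetched_at: datetime) -> str:
    """Build an archive path following the template shown above.

    `fetched_at` should be timezone-aware; it is converted to UTC before formatting.
    """
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    filename = f"linkup_{event_type.lower()}_{stamp}.json"
    return str(PurePosixPath("data/custodian/web") / str(entry_number) / "linkup" / filename)


# Reproduces the example path from this section.
example_path = linkup_archive_path(
    1071, "FOUNDING", datetime(2025, 12, 15, 21, 58, 2, tzinfo=timezone.utc)
)
```
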
## Schema Reference

The formal schema is defined in:
- `schemas/20251121/linkml/modules/classes/LinkupTimelineEvent.yaml`

Key documentation in schema (lines 6-15):
```yaml
# Key principle:
# Linkup API returns LLM-generated answers with source URLs, not XPath locations.
# Therefore, provenance is different from WebClaim:
# - Store the query that was sent to Linkup
# - Store the LLM answer (which may contain hallucinations)
# - Store source URLs (for manual verification)
# - Archive the complete API response JSON
#
# This acknowledges that Linkup data is TIER_4_INFERRED (LLM-generated)
# and requires manual verification before promotion to higher tiers.
```

## Related Documentation

- `AGENTS.md` Rule 6 - WebObservation XPath requirements
- `.opencode/WEB_OBSERVATION_PROVENANCE_RULES.md` - WebClaim details
- `schemas/20251121/linkml/modules/classes/WebClaim.yaml` - WebClaim schema
- `schemas/20251121/linkml/modules/classes/LinkupTimelineEvent.yaml` - LinkupTimelineEvent schema

---

**Created**: 2025-12-16
**Status**: ACTIVE
**Applies to**: Dutch GLAM Timeline Event Enrichment project