Schema v0.2.2 Changelog
Release Date: 2025-11-10
Previous Version: v0.2.1
Summary
Schema v0.2.2 introduces structured enrichment history tracking to replace unstructured provenance.notes strings. The new structure provides machine-readable, queryable metadata for data quality activities (Wikidata enrichment, geocoding, identifier verification, etc.), with each slot mapped to a PROV-O, ADMS, RDF, or Dublin Core term.
Changes
New Classes
EnrichmentHistoryEntry (schemas/provenance.yaml)
Tracks individual data enrichment activities with structured metadata:
EnrichmentHistoryEntry:
  slots:
    - enrichment_date    # When enrichment performed (datetime, required)
    - enrichment_method  # Method used (string, required)
    - enrichment_type    # Type of enrichment (EnrichmentTypeEnum, required)
    - match_score        # Fuzzy match confidence 0.0-1.0 (float, optional)
    - verified           # Manual verification status (boolean, required)
    - enrichment_source  # Data source URI (uri, optional)
    - enrichment_notes   # Human-readable details (string, optional)
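To make the required/optional split concrete, the slot list can be mirrored as a Python dataclass. This is a minimal sketch, not generated schema code: the range check in __post_init__ and the use of a plain string for enrichment_type are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EnrichmentHistoryEntry:
    """Illustrative mirror of the EnrichmentHistoryEntry slots (not generated code)."""

    enrichment_date: datetime                 # required
    enrichment_method: str                    # required
    enrichment_type: str                      # required; an EnrichmentTypeEnum value
    verified: bool                            # required
    match_score: Optional[float] = None       # optional, documented as 0.0-1.0
    enrichment_source: Optional[str] = None   # optional URI
    enrichment_notes: Optional[str] = None    # optional free text

    def __post_init__(self) -> None:
        # Assumed check: reject scores outside the documented 0.0-1.0 range.
        if self.match_score is not None and not 0.0 <= self.match_score <= 1.0:
            raise ValueError(f"match_score out of range: {self.match_score}")


entry = EnrichmentHistoryEntry(
    enrichment_date=datetime.fromisoformat("2025-11-10T14:30:00+00:00"),
    enrichment_method="Wikidata SPARQL fuzzy matching",
    enrichment_type="WIKIDATA_IDENTIFIER",
    verified=True,
    match_score=1.0,
)
```

Omitting any of the four required fields raises a TypeError at construction time, which is roughly the behaviour a schema validator would be expected to enforce.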
Ontology Mappings:
- enrichment_date → prov:atTime (PROV-O)
- enrichment_method → prov:hadPlan (PROV-O)
- enrichment_type → rdf:type (RDF)
- match_score → adms:confidence (ADMS)
- verified → adms:status (ADMS)
- enrichment_source → dcterms:source (Dublin Core)
- enrichment_notes → dcterms:description (Dublin Core)
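As a rough illustration of how these mappings could be consumed, the sketch below pairs each populated slot with its mapped property name. The helper entry_to_statements and the blank-node label are hypothetical, and the output is a list of plain tuples rather than valid RDF (no IRIs, datatypes, or serialization).

```python
# Slot-to-property mapping copied from the ontology mappings listed above.
SLOT_TO_PROPERTY = {
    "enrichment_date": "prov:atTime",
    "enrichment_method": "prov:hadPlan",
    "enrichment_type": "rdf:type",
    "match_score": "adms:confidence",
    "verified": "adms:status",
    "enrichment_source": "dcterms:source",
    "enrichment_notes": "dcterms:description",
}


def entry_to_statements(subject, entry):
    """Return (subject, property, value) tuples for the slots present in entry.

    Hypothetical helper: unmapped keys and None values are skipped.
    """
    return [
        (subject, SLOT_TO_PROPERTY[slot], value)
        for slot, value in entry.items()
        if slot in SLOT_TO_PROPERTY and value is not None
    ]


statements = entry_to_statements(
    "_:enrichment1",
    {"enrichment_type": "GEOCODING", "match_score": 0.95, "verified": False},
)
```

A real exporter would still need to decide how each value is typed and how entries are linked to their parent record; none of that is specified in this changelog.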
New Enumerations
EnrichmentTypeEnum (schemas/enums.yaml)
15 controlled vocabulary values for enrichment activity types:
- WIKIDATA_IDENTIFIER - Wikidata Q-number added
- GEOCODING - Lat/lon coordinates added
- VIAF_IDENTIFIER - VIAF identifier added
- ISIL_CODE - ISIL code assigned
- GHCID_GENERATION - GHCID identifier generated
- FALSE_POSITIVE_REMOVAL - Incorrect enrichment removed
- NAME_NORMALIZATION - Institution name normalized
- IDENTIFIER_VERIFICATION - Existing identifier verified
- INSTITUTION_TYPE_CLASSIFICATION - Institution type classified
- ADDRESS_STANDARDIZATION - Physical address standardized
- WEBSITE_URL_VALIDATION - Website URL validated
- COLLECTION_METADATA - Collection metadata added
- ORGANIZATIONAL_RELATIONSHIP - Org relationships identified
- DIGITAL_PLATFORM_DETECTION - Digital platforms identified
- OTHER - Other enrichment activity
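For scripts that handle enrichment types as raw strings, the controlled vocabulary could be mirrored in Python along these lines. The class name EnrichmentType and the helper parse_enrichment_type are hypothetical, and the lenient handling of case and surrounding whitespace is an assumption rather than schema behaviour.

```python
from enum import Enum


class EnrichmentType(Enum):
    """Illustrative Python mirror of EnrichmentTypeEnum (names from the list above)."""

    WIKIDATA_IDENTIFIER = "WIKIDATA_IDENTIFIER"
    GEOCODING = "GEOCODING"
    VIAF_IDENTIFIER = "VIAF_IDENTIFIER"
    ISIL_CODE = "ISIL_CODE"
    GHCID_GENERATION = "GHCID_GENERATION"
    FALSE_POSITIVE_REMOVAL = "FALSE_POSITIVE_REMOVAL"
    NAME_NORMALIZATION = "NAME_NORMALIZATION"
    IDENTIFIER_VERIFICATION = "IDENTIFIER_VERIFICATION"
    INSTITUTION_TYPE_CLASSIFICATION = "INSTITUTION_TYPE_CLASSIFICATION"
    ADDRESS_STANDARDIZATION = "ADDRESS_STANDARDIZATION"
    WEBSITE_URL_VALIDATION = "WEBSITE_URL_VALIDATION"
    COLLECTION_METADATA = "COLLECTION_METADATA"
    ORGANIZATIONAL_RELATIONSHIP = "ORGANIZATIONAL_RELATIONSHIP"
    DIGITAL_PLATFORM_DETECTION = "DIGITAL_PLATFORM_DETECTION"
    OTHER = "OTHER"


def parse_enrichment_type(raw: str) -> EnrichmentType:
    """Map a raw string to an enum member, rejecting values outside the vocabulary."""
    try:
        return EnrichmentType[raw.strip().upper()]
    except KeyError:
        raise ValueError(f"Unknown enrichment type: {raw!r}") from None
```

Rejecting unknown strings outright, rather than silently mapping them to OTHER, keeps typos from being recorded as legitimate activity types.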
Modified Classes
Provenance (schemas/provenance.yaml)
Added new slot:
enrichment_history:
  range: EnrichmentHistoryEntry
  multivalued: true
  inlined_as_list: true
  description: >-
    Chronological log of data enrichment activities performed on this record
Note: The provenance.notes field remains for backward compatibility but is deprecated. Use enrichment_history for new data.
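Code that reads records during the transition may encounter either field. One possible way to handle that is sketched below; the helper get_enrichment_history and its warning text are hypothetical and not part of the schema or its tooling.

```python
import warnings


def get_enrichment_history(provenance: dict) -> list:
    """Return structured entries, warning if only the deprecated notes exist.

    Hypothetical helper: it never parses the notes string itself.
    """
    history = provenance.get("enrichment_history") or []
    if not history and provenance.get("notes"):
        warnings.warn(
            "provenance.notes is deprecated; migrate to enrichment_history",
            DeprecationWarning,
            stacklevel=2,
        )
    return list(history)
```

Records that carry only a notes string yield an empty list plus a DeprecationWarning, so unmigrated instances can be spotted without breaking existing pipelines.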
New Ontology Prefixes
Added to support enrichment metadata:
- foaf: - Friend of a Friend (agent/contact information)
- adms: - Asset Description Metadata Schema (verification/confidence)
Benefits
Before (v0.2.1): Unstructured Notes
provenance:
  notes: "Wikidata enriched 2025-11-10 (Q3330723, match: 100%). Geocoded to (36.806495, 10.181532) via Nominatim."
Problems:
- ❌ Hard to parse programmatically
- ❌ Not queryable (can't filter by type, date, confidence)
- ❌ No ontology alignment
- ❌ Mixed concerns (multiple activities in one string)
After (v0.2.2): Structured History
provenance:
  enrichment_history:
    - enrichment_date: "2025-11-10T14:30:00Z"
      enrichment_method: "Wikidata SPARQL fuzzy matching"
      enrichment_type: WIKIDATA_IDENTIFIER
      match_score: 1.0
      verified: true
      enrichment_source: "https://www.wikidata.org"
      enrichment_notes: "Perfect name match, city verified: Tunis"
    - enrichment_date: "2025-11-10T14:35:00Z"
      enrichment_method: "Nominatim geocoding API"
      enrichment_type: GEOCODING
      match_score: 0.95
      verified: false
      enrichment_source: "https://nominatim.openstreetmap.org"
      enrichment_notes: "Geocoded from city name"
Benefits:
- ✅ Machine-readable structured data
- ✅ Queryable (filter by type, confidence, verification status)
- ✅ Ontology-aligned (PROV-O, ADMS, DCTerms, FOAF)
- ✅ Separation of concerns (one entry per activity)
- ✅ Chronological audit log
Query Examples
from collections import Counter

# Find all unverified enrichments needing manual review
unverified = [
    e for e in institution['provenance']['enrichment_history']
    if not e['verified']
]

# Find low-confidence enrichments (< 0.85). match_score is optional, so
# entries without one are skipped; a score of 0.0 still counts as low.
low_confidence = [
    e for e in institution['provenance']['enrichment_history']
    if e.get('match_score') is not None and e['match_score'] < 0.85
]

# Count enrichments by type
type_counts = Counter(
    e['enrichment_type']
    for e in institution['provenance']['enrichment_history']
)

# Timeline of enrichment activities (assumes uniform ISO 8601 UTC strings,
# which sort chronologically as plain text)
timeline = sorted(
    institution['provenance']['enrichment_history'],
    key=lambda e: e['enrichment_date']
)
Migration Requirements
Existing instances with provenance.notes strings need migration:
- Parse notes patterns:
  - "Wikidata enriched YYYY-MM-DD (Qnumber, match: XX%)"
  - "Geocoded to (lat, lon) via Service"
  - "False Wikidata match Qnumber removed YYYY-MM-DD"
- Extract structured data:
  - Date, method, type, match score
  - Convert to EnrichmentHistoryEntry objects
- Migration script: scripts/migrate_enrichment_notes_to_history.py (TO BE CREATED)
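Since the migration script does not exist yet, the following is only a sketch of how the three notes patterns might be parsed with regular expressions. Everything beyond the patterns themselves is an assumption: the function name parse_notes, marking migrated entries as verified=False, expanding dates to midnight UTC, and the fallback_date parameter for geocoding notes, which carry no date of their own.

```python
import re

WIKIDATA_RE = re.compile(
    r"Wikidata enriched (\d{4}-\d{2}-\d{2}) \((Q\d+), match: (\d+(?:\.\d+)?)%\)"
)
GEOCODED_RE = re.compile(
    r"Geocoded to \((-?\d+(?:\.\d+)?), (-?\d+(?:\.\d+)?)\) via (\w+)"
)
FALSE_MATCH_RE = re.compile(
    r"False Wikidata match (Q\d+) removed (\d{4}-\d{2}-\d{2})"
)


def parse_notes(notes, fallback_date=None):
    """Convert a legacy notes string into enrichment_history-style dicts.

    Entries are grouped by pattern, not by their position in the string.
    """
    entries = []
    for date, qid, pct in WIKIDATA_RE.findall(notes):
        entries.append({
            "enrichment_date": f"{date}T00:00:00Z",
            "enrichment_method": "Migrated from provenance.notes",
            "enrichment_type": "WIKIDATA_IDENTIFIER",
            "match_score": float(pct) / 100,  # "100%" becomes 1.0
            "verified": False,
            "enrichment_notes": f"Wikidata {qid}",
        })
    for lat, lon, service in GEOCODED_RE.findall(notes):
        entries.append({
            "enrichment_date": fallback_date,  # the pattern has no date
            "enrichment_method": f"{service} geocoding",
            "enrichment_type": "GEOCODING",
            "verified": False,
            "enrichment_notes": f"Geocoded to ({lat}, {lon})",
        })
    for qid, date in FALSE_MATCH_RE.findall(notes):
        entries.append({
            "enrichment_date": f"{date}T00:00:00Z",
            "enrichment_method": "Migrated from provenance.notes",
            "enrichment_type": "FALSE_POSITIVE_REMOVAL",
            "verified": False,
            "enrichment_notes": f"Removed false Wikidata match {qid}",
        })
    return entries


legacy = (
    "Wikidata enriched 2025-11-10 (Q3330723, match: 100%). "
    "Geocoded to (36.806495, 10.181532) via Nominatim."
)
migrated = parse_notes(legacy, fallback_date="2025-11-10T00:00:00Z")
```

Because enrichment_date is a required slot, a real migration would need an explicit policy for undated geocoding notes; passing a fallback date is just one option, and notes that match none of the patterns would need manual review.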
Files Modified
- schemas/provenance.yaml
  - Version: 0.2.1 → 0.2.2
  - Added EnrichmentHistoryEntry class (7 slots)
  - Added enrichment_history slot to Provenance class
  - Added ontology mappings (PROV-O, ADMS, FOAF, DCTerms)
- schemas/enums.yaml
  - Version: 0.2.1 → 0.2.2
  - Added EnrichmentTypeEnum (15 values)
- schemas/heritage_custodian.yaml
  - Version: 0.2.1 → 0.2.2
  - Version bump to match module versions
Backward Compatibility
- ✅ Fully backward compatible
- provenance.notes field remains available (deprecated)
- Existing instances continue to work without changes
- New instances should use enrichment_history
Testing
Demonstration script: scripts/demo_enrichment_history.py
python scripts/demo_enrichment_history.py
Shows before/after comparison, query examples, and ontology mappings.
Next Steps
- ✅ Schema enhancement complete (v0.2.2)
- ⏳ Create migration script for existing instances
- ⏳ Test with Phase 3 (Chile enrichment workflow)
- ⏳ Update data quality reports to query enrichment_history
- ⏳ Update RDF exporter to serialize enrichment metadata with PROV-O/ADMS
Related Documentation
- Schema Modules: /docs/SCHEMA_MODULES.md
- Ontology Extensions: /docs/ONTOLOGY_EXTENSIONS.md (to be updated)
- Phase 2 Completion Report: /data/instances/north_africa/PHASE2_COMPLETION_REPORT.md
- Agent Instructions: /AGENTS.md (to be updated)
Contributors
- Schema design: OpenCode AI Agent
- Ontology alignment: Based on W3C PROV-O, ADMS, Dublin Core
- Testing: Demonstration script with query examples
Schema Version: v0.2.2
Release: 2025-11-10
Status: Production-ready