# Schema v0.2.2 Changelog

**Release Date**: 2025-11-10
**Previous Version**: v0.2.1

## Summary

Schema v0.2.2 introduces **structured enrichment history tracking** to replace unstructured `provenance.notes` strings. This enhancement provides machine-readable, queryable metadata for data quality activities (Wikidata enrichment, geocoding, identifier verification, etc.) with full ontology alignment.

## Changes

### New Classes

#### `EnrichmentHistoryEntry` (schemas/provenance.yaml)

Tracks individual data enrichment activities with structured metadata:

```yaml
EnrichmentHistoryEntry:
  slots:
    - enrichment_date    # When enrichment performed (datetime, required)
    - enrichment_method  # Method used (string, required)
    - enrichment_type    # Type of enrichment (EnrichmentTypeEnum, required)
    - match_score        # Fuzzy match confidence 0.0-1.0 (float, optional)
    - verified           # Manual verification status (boolean, required)
    - enrichment_source  # Data source URI (uri, optional)
    - enrichment_notes   # Human-readable details (string, optional)
```

**Ontology Mappings**:

- `enrichment_date` → `prov:atTime` (PROV-O)
- `enrichment_method` → `prov:hadPlan` (PROV-O)
- `enrichment_type` → `rdf:type` (RDF)
- `match_score` → `adms:confidence` (ADMS)
- `verified` → `adms:status` (ADMS)
- `enrichment_source` → `dcterms:source` (Dublin Core)
- `enrichment_notes` → `dcterms:description` (Dublin Core)

### New Enumerations

#### `EnrichmentTypeEnum` (schemas/enums.yaml)

15 controlled vocabulary values for enrichment activity types:

1. `WIKIDATA_IDENTIFIER` - Wikidata Q-number added
2. `GEOCODING` - Lat/lon coordinates added
3. `VIAF_IDENTIFIER` - VIAF identifier added
4. `ISIL_CODE` - ISIL code assigned
5. `GHCID_GENERATION` - GHCID identifier generated
6. `FALSE_POSITIVE_REMOVAL` - Incorrect enrichment removed
7. `NAME_NORMALIZATION` - Institution name normalized
8. `IDENTIFIER_VERIFICATION` - Existing identifier verified
9. `INSTITUTION_TYPE_CLASSIFICATION` - Institution type classified
10. `ADDRESS_STANDARDIZATION` - Physical address standardized
11. `WEBSITE_URL_VALIDATION` - Website URL validated
12. `COLLECTION_METADATA` - Collection metadata added
13. `ORGANIZATIONAL_RELATIONSHIP` - Organizational relationships identified
14. `DIGITAL_PLATFORM_DETECTION` - Digital platforms identified
15. `OTHER` - Other enrichment activity

### Modified Classes

#### `Provenance` (schemas/provenance.yaml)

Added new slot:

```yaml
enrichment_history:
  range: EnrichmentHistoryEntry
  multivalued: true
  inlined_as_list: true
  description: >-
    Chronological log of data enrichment activities performed on this record
```

**Note**: The `provenance.notes` field remains for backward compatibility but is **deprecated**. Use `enrichment_history` for new data.

### New Ontology Prefixes

Added to support enrichment metadata:

- `foaf:` - Friend of a Friend (agent/contact information)
- `adms:` - Asset Description Metadata Schema (verification/confidence)

## Benefits

### Before (v0.2.1): Unstructured Notes

```yaml
provenance:
  notes: "Wikidata enriched 2025-11-10 (Q3330723, match: 100%). Geocoded to (36.806495, 10.181532) via Nominatim."
```

**Problems**:

- ❌ Hard to parse programmatically
- ❌ Not queryable (can't filter by type, date, confidence)
- ❌ No ontology alignment
- ❌ Mixed concerns (multiple activities in one string)

### After (v0.2.2): Structured History

```yaml
provenance:
  enrichment_history:
    - enrichment_date: "2025-11-10T14:30:00Z"
      enrichment_method: "Wikidata SPARQL fuzzy matching"
      enrichment_type: WIKIDATA_IDENTIFIER
      match_score: 1.0
      verified: true
      enrichment_source: "https://www.wikidata.org"
      enrichment_notes: "Perfect name match, city verified: Tunis"
    - enrichment_date: "2025-11-10T14:35:00Z"
      enrichment_method: "Nominatim geocoding API"
      enrichment_type: GEOCODING
      match_score: 0.95
      verified: false
      enrichment_source: "https://nominatim.openstreetmap.org"
      enrichment_notes: "Geocoded from city name"
```

**Benefits**:

- ✅ Machine-readable structured data
- ✅ Queryable (filter by type, confidence, verification status)
- ✅ Ontology-aligned (PROV-O, ADMS, DCTerms, FOAF)
- ✅ Separation of concerns (one entry per activity)
- ✅ Chronological audit log

## Query Examples

```python
from collections import Counter

# `institution` is one loaded record (a dict) that has a provenance block
history = institution['provenance']['enrichment_history']

# Find all unverified enrichments needing manual review
unverified = [e for e in history if not e['verified']]

# Find low-confidence enrichments (< 0.85).
# match_score is optional, so read it with .get() and compare against None;
# a plain truthiness check would also skip a legitimate score of 0.0.
low_confidence = [
    e for e in history
    if e.get('match_score') is not None and e['match_score'] < 0.85
]

# Count enrichments by type
type_counts = Counter(e['enrichment_type'] for e in history)

# Timeline of enrichment activities
# (ISO 8601 UTC strings in a single format sort chronologically)
timeline = sorted(history, key=lambda e: e['enrichment_date'])
```

## Migration Requirements

Existing instances with `provenance.notes` strings need migration:

1. **Parse notes patterns**:
   - `"Wikidata enriched YYYY-MM-DD (Qnumber, match: XX%)"`
   - `"Geocoded to (lat, lon) via Service"`
   - `"False Wikidata match Qnumber removed YYYY-MM-DD"`
2. **Extract structured data**:
   - Date, method, type, match score
   - Convert to `EnrichmentHistoryEntry` objects
3. **Migration script**: `scripts/migrate_enrichment_notes_to_history.py` (TO BE CREATED)

## Files Modified

1. **`schemas/provenance.yaml`**
   - Version: 0.2.1 → 0.2.2
   - Added `EnrichmentHistoryEntry` class (7 slots)
   - Added `enrichment_history` slot to `Provenance` class
   - Added ontology mappings (PROV-O, ADMS, FOAF, DCTerms)
2. **`schemas/enums.yaml`**
   - Version: 0.2.1 → 0.2.2
   - Added `EnrichmentTypeEnum` (15 values)
3. **`schemas/heritage_custodian.yaml`**
   - Version: 0.2.1 → 0.2.2
   - Version bump to match module versions

## Backward Compatibility

- ✅ **Fully backward compatible**
- `provenance.notes` field remains available (deprecated)
- Existing instances continue to work without changes
- New instances should use `enrichment_history`

## Testing

Demonstration script: `scripts/demo_enrichment_history.py`

```bash
python scripts/demo_enrichment_history.py
```

Shows before/after comparison, query examples, and ontology mappings.

## Next Steps

1. ✅ **Schema enhancement complete** (v0.2.2)
2. ⏳ **Create migration script** for existing instances
3. ⏳ **Test with Phase 3** (Chile enrichment workflow)
4. ⏳ **Update data quality reports** to query `enrichment_history`
5. ⏳ **Update RDF exporter** to serialize enrichment metadata with PROV-O/ADMS

## Related Documentation

- **Schema Modules**: `/docs/SCHEMA_MODULES.md`
- **Ontology Extensions**: `/docs/ONTOLOGY_EXTENSIONS.md` (to be updated)
- **Phase 2 Completion Report**: `/data/instances/north_africa/PHASE2_COMPLETION_REPORT.md`
- **Agent Instructions**: `/AGENTS.md` (to be updated)

## Contributors

- Schema design: OpenCode AI Agent
- Ontology alignment: Based on W3C PROV-O, ADMS, Dublin Core
- Testing: Demonstration script with query examples

---

**Schema Version**: v0.2.2
**Release**: 2025-11-10
**Status**: Production-ready
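
The three note patterns listed under Migration Requirements could be parsed roughly as follows. This is a minimal sketch under stated assumptions, not the planned `scripts/migrate_enrichment_notes_to_history.py`: the function name `parse_notes`, the regular expressions, the method strings, and the choice to mark every migrated entry as unverified are all hypothetical. The geocoding pattern carries no date, so the sketch assumes the caller supplies a `fallback_date` for it.

```python
import re

# Hypothetical regexes for the three documented note patterns.
WIKIDATA_RE = re.compile(
    r"Wikidata enriched (\d{4}-\d{2}-\d{2}) \((Q\d+), match: (\d+(?:\.\d+)?)%\)"
)
GEOCODED_RE = re.compile(
    r"Geocoded to \((-?\d+(?:\.\d+)?), (-?\d+(?:\.\d+)?)\) via (\w+)"
)
FALSE_MATCH_RE = re.compile(
    r"False Wikidata match (Q\d+) removed (\d{4}-\d{2}-\d{2})"
)


def _iso(date_str):
    """Expand YYYY-MM-DD to a midnight-UTC timestamp (time of day is unknown)."""
    return f"{date_str}T00:00:00Z"


def parse_notes(notes, fallback_date):
    """Convert a legacy notes string into a list of enrichment-history dicts.

    fallback_date (YYYY-MM-DD) is used for geocoding notes, which have no date.
    All entries get verified=False (an assumption: migrated data is unreviewed).
    """
    entries = []

    for date, qid, pct in WIKIDATA_RE.findall(notes):
        entries.append({
            "enrichment_date": _iso(date),
            "enrichment_method": "Migrated from provenance.notes",
            "enrichment_type": "WIKIDATA_IDENTIFIER",
            "match_score": float(pct) / 100,  # "100%" -> 1.0
            "verified": False,
            "enrichment_source": "https://www.wikidata.org",
            "enrichment_notes": f"Wikidata identifier {qid}",
        })

    for lat, lon, service in GEOCODED_RE.findall(notes):
        entries.append({
            "enrichment_date": _iso(fallback_date),
            "enrichment_method": f"{service} geocoding (migrated from provenance.notes)",
            "enrichment_type": "GEOCODING",
            "verified": False,
            "enrichment_notes": f"Geocoded to ({lat}, {lon})",
        })

    for qid, date in FALSE_MATCH_RE.findall(notes):
        entries.append({
            "enrichment_date": _iso(date),
            "enrichment_method": "Migrated from provenance.notes",
            "enrichment_type": "FALSE_POSITIVE_REMOVAL",
            "verified": False,
            "enrichment_notes": f"False Wikidata match {qid} removed",
        })

    # Keep the log chronological; sorted() is stable, so ties keep insertion order.
    return sorted(entries, key=lambda e: e["enrichment_date"])
```

Calling `parse_notes(record["provenance"]["notes"], fallback_date="2025-11-10")` on the v0.2.1 example above would yield one `WIKIDATA_IDENTIFIER` entry and one `GEOCODING` entry. Notes that match none of the patterns produce an empty list, so a real migration would likely want to log those for manual handling.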