NDE Wikidata Enrichment Report
Date: 2025-11-17
Status: Test Batch Complete (10 records)
Next Phase: Scale to Full Dataset (1,351 records)
Executive Summary
Wikidata enrichment of the first 10 records from the NDE Dutch Heritage Organizations dataset is complete. Eight of the 10 organizations (80%) were matched to Wikidata entities.
Key Achievements
✅ Test Batch Processed: 10 organizations from Drenthe province
✅ YAML File Updated: Added wikidata_id fields to matched records
✅ LinkML Schema Updated: Added wikidata_id and wikidata_enrichment_status fields
✅ Enrichment Logs Created: Complete SPARQL query history and results
✅ Backup Created: Full backup before modifications
Enrichment Results
Summary Statistics
| Metric | Count | Percentage |
|---|---|---|
| Total Records Processed | 10 | 100% |
| Successfully Matched | 8 | 80% |
| No Match Found | 2 | 20% |
| ISIL Codes Present | 9 | 90% |
| Museums | 4 | 40% |
| Archives | 6 | 60% |
| Historical Societies | 0 | 0% |
Detailed Results
| # | Organization | Type | Wikidata ID | Status |
|---|---|---|---|---|
| 1 | Stichting Herinneringscentrum Kamp Westerbork | museum | Q22246632 | ✓ Matched |
| 2 | Stichting Hunebedcentrum | museum | Q2679819 | ✓ Matched |
| 3 | Regionaal Historisch Centrum (RHC) Drents Archief | archief | Q1978308 | ✓ Matched |
| 4 | Stichting Drents Museum | museum | Q1258370 | ✓ Matched |
| 5 | Stichting Drents Museum De Buitenplaats | museum | - | ✗ No match |
| 6 | Gemeente Aa en Hunze | archief | Q300665 | ✓ Matched |
| 7 | Gemeente Borger-Odoorn | archief | Q835118 | ✓ Matched |
| 8 | Gemeente Coevorden | archief | Q60453 | ✓ Matched |
| 9 | Gemeente De Wolden | archief | Q835108 | ✓ Matched |
| 10 | Samenwerkingsorganisatie De Wolden/Hoogeveen | archief | - | ✗ No match |
Enrichment Methodology
Tools Used
- Wikidata MCP Service (authenticated)
  - wikidata-authenticated_search_entity - Entity search by name
  - wikidata-authenticated_get_metadata - Verification of Q-numbers
  - wikidata-authenticated_execute_sparql - Custom SPARQL queries
- SPARQL Queries - For precise matching:

    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q2039348 .  # Instance of: Dutch municipality
      ?item rdfs:label ?label .
      FILTER(CONTAINS(LCASE(?label), "coevorden"))
      SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
    }
    LIMIT 5
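The municipality query above can be parameterized so the same pattern is reused for each municipal archive. The following is a minimal sketch; the helper name `build_municipality_query` is hypothetical and not part of the project's scripts, and it only builds the query string without contacting any endpoint.

```python
def build_municipality_query(name: str, limit: int = 5) -> str:
    """Return a SPARQL query for Dutch municipalities whose label contains `name`."""
    # Escape backslashes and double quotes so the name is a safe SPARQL string literal.
    needle = name.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        "  ?item wdt:P31 wd:Q2039348 .  # instance of: Dutch municipality\n"
        "  ?item rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(?label), "{needle}"))\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_municipality_query("Coevorden"))
```

Because an entity usually carries labels in several languages, the same item may appear more than once in the results; adding `DISTINCT` to the `SELECT` clause is one way to collapse those rows.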
Search Strategy
- Initial Search: Direct entity search by organization name
- Verification: Retrieve metadata (label, description) to confirm match
- Fallback Strategy: SPARQL queries for entities not found via search
- Quality Check: Manual verification of all Q-numbers
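The search, verification, and fallback steps can be sketched as a small function. This is an illustration under stated assumptions, not the project's actual script: the three callables (`search`, `sparql_fallback`, `verify`) are hypothetical stand-ins for the MCP tools listed above, injected so the control flow can be exercised without network access.

```python
from typing import Callable, List, Optional


def find_wikidata_id(
    name: str,
    search: Callable[[str], List[str]],
    sparql_fallback: Callable[[str], List[str]],
    verify: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first verified Q-number for `name`, or None if nothing matches."""
    # Try the direct entity search first; fall back to SPARQL only if the
    # search produced no candidate that passes verification.
    for lookup in (search, sparql_fallback):
        for qid in lookup(name):
            # Verification step: accept a candidate only after its metadata is confirmed.
            if verify(name, qid):
                return qid
    return None


if __name__ == "__main__":
    # Stub callables for illustration only; real ones would call Wikidata.
    result = find_wikidata_id(
        "Gemeente Coevorden",
        search=lambda n: [],
        sparql_fallback=lambda n: ["Q60453"],
        verify=lambda n, q: True,
    )
    print(result)
```

A `None` result corresponds to the "no match" outcome; the manual quality check still applies to every Q-number this function returns.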
Success Factors
- Museums with international recognition: 100% match rate (Drents Museum, Hunebedcentrum, Westerbork)
- Municipal archives: 100% match rate (all municipalities found)
- Regional archives: 100% match rate (Drents Archief)
Challenges
- Branch locations: "De Buitenplaats" (branch of Drents Museum) not in Wikidata
- Collaborative organizations: Inter-municipal partnerships not typically in Wikidata
- Specialized organizations: Small local societies less likely to have Wikidata entries
Technical Implementation
YAML File Structure Update
Before Enrichment:
- plaatsnaam_bezoekadres: Assen
  straat_en_huisnummer_bezoekadres: Brink 1
  organisatie: Stichting Drents Museum
  webadres_organisatie: https://drentsmuseum.nl/
  type_organisatie: museum
  isil-code_na: NL-AsnDM
After Enrichment:
- plaatsnaam_bezoekadres: Assen
  straat_en_huisnummer_bezoekadres: Brink 1
  organisatie: Stichting Drents Museum
  webadres_organisatie: https://drentsmuseum.nl/
  type_organisatie: museum
  isil-code_na: NL-AsnDM
  wikidata_id: Q1258370  # ← NEW FIELD
Records with No Match:
- plaatsnaam_bezoekadres: Eelde
  organisatie: Stichting Drents Museum De Buitenplaats
  type_organisatie: museum
  wikidata_enrichment_status: no_match_found  # ← Status tracking
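The record update shown in these examples amounts to setting one of two fields. A minimal sketch, assuming each YAML record is loaded as a plain dictionary; the function name `apply_enrichment` is hypothetical, and clearing a stale status on a later match is an assumption rather than documented behavior.

```python
from typing import Optional


def apply_enrichment(record: dict, qid: Optional[str]) -> dict:
    """Return a copy of `record` with a wikidata_id or a no-match status set."""
    updated = dict(record)  # shallow copy so the input record is left untouched
    if qid:
        updated["wikidata_id"] = qid
        # Assumption: a stale "no match" status is dropped once a match exists.
        updated.pop("wikidata_enrichment_status", None)
    else:
        updated["wikidata_enrichment_status"] = "no_match_found"
    return updated


if __name__ == "__main__":
    matched = apply_enrichment({"organisatie": "Stichting Drents Museum"}, "Q1258370")
    unmatched = apply_enrichment(
        {"organisatie": "Stichting Drents Museum De Buitenplaats"}, None
    )
    print(matched)
    print(unmatched)
```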
LinkML Schema Updates
Added two new slots to nde_yaml_target.yaml:
slots:
  wikidata_id:
    description: Wikidata entity identifier (Q-number)
    range: string
    required: false
    pattern: "^Q[0-9]+$"
    comments:
      - "Added through Wikidata enrichment process"
      - "Links organization to Wikidata knowledge graph"
  wikidata_enrichment_status:
    description: Status of Wikidata enrichment process
    range: string
    required: false
    comments:
      - "Values: 'no_match_found', 'pending', 'verified', 'manual_review_needed'"
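The two slots can also be checked outside LinkML before a batch is written back. The sketch below mirrors the `pattern` from the schema and, as an assumption, treats the status values listed in the schema comment as a closed set, even though the schema itself only declares `range: string`. The function name `validate_record` is hypothetical.

```python
import re

QID_PATTERN = re.compile(r"^Q[0-9]+$")  # same pattern as the wikidata_id slot
# Assumption: the values named in the schema comment are the only valid ones.
ALLOWED_STATUSES = ("no_match_found", "pending", "verified", "manual_review_needed")


def validate_record(record: dict) -> list:
    """Return a list of problems in the two enrichment slots (empty if none)."""
    problems = []
    qid = record.get("wikidata_id")
    if qid is not None and not (isinstance(qid, str) and QID_PATTERN.fullmatch(qid)):
        problems.append(f"invalid wikidata_id: {qid!r}")
    status = record.get("wikidata_enrichment_status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown wikidata_enrichment_status: {status!r}")
    return problems


if __name__ == "__main__":
    print(validate_record({"wikidata_id": "Q1258370"}))
    print(validate_record({"wikidata_id": "1258370"}))
```

If the status values are meant to be enforced, declaring them as a LinkML enum instead of a comment would let the schema validator do this check.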
Enrichment Log Files
All queries are logged in /data/nde/sparql/:
- Prepared queries: 10 JSON files with search parameters
- Master query log: Consolidated history of all SPARQL queries
- Enrichment log: Results summary with timestamps and success rates
Data Quality Assessment
Match Confidence Levels
| Wikidata ID | Organization | Confidence | Verification Method |
|---|---|---|---|
| Q22246632 | Herinneringscentrum Kamp Westerbork | High | Direct match, ISIL verified |
| Q2679819 | Hunebedcentrum | High | Direct match, exact name |
| Q1978308 | Drents Archief | High | Direct match, ISIL verified |
| Q1258370 | Drents Museum | High | Direct match, ISIL verified |
| Q300665 | Gemeente Aa en Hunze | High | SPARQL query, verified label |
| Q835118 | Gemeente Borger-Odoorn | High | SPARQL query, verified label |
| Q60453 | Gemeente Coevorden | High | SPARQL query, exact match |
| Q835108 | Gemeente De Wolden | High | SPARQL query, exact match |
All matches verified with wikidata-authenticated_get_metadata to confirm:
- ✅ Correct Dutch label
- ✅ Appropriate entity description
- ✅ Geographic location (municipality in Drenthe)
No-Match Analysis
1. Stichting Drents Museum De Buitenplaats
- Reason: Branch location/extension of main museum
- Parent Institution: Stichting Drents Museum (Q1258370)
- Recommendation: Manual Wikidata entry creation OR link to parent institution
- Website: https://dmdebuitenplaats.nl/
2. Samenwerkingsorganisatie De Wolden/Hoogeveen
- Reason: Inter-municipal administrative partnership
- Nature: Shared services organization
- Recommendation: May not warrant Wikidata entry (administrative entity, not public-facing heritage institution)
- ISIL Code: NL-HgvSWO (indicates archival function)
Next Steps
Immediate Tasks (Complete by Next Session)
- Scale to Full Dataset (1,341 remaining records)
  - Batch processing script with rate limiting
  - Estimated time: 2-3 hours
  - Wikidata API limits: 1,000 requests/hour (authenticated)
- Handle Ambiguous Matches
  - Fuzzy matching threshold: 85% similarity
  - Manual review queue for low-confidence matches
  - Document decision criteria
- Manual Wikidata Entry Creation
  - Identify high-priority organizations without Q-numbers
  - Create Wikidata entries for significant institutions
  - Follow Wikidata notability guidelines
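Rate limiting for the batch script could be as simple as spacing requests evenly. The following is a rough sketch, not the planned implementation: the names `paced` and `estimated_hours` are hypothetical, the 1,000 requests/hour figure is taken from the list above, and the assumption of two requests per record (one search, one metadata check) is illustrative only.

```python
import time
from typing import Callable, Iterable, Iterator


def paced(
    items: Iterable,
    max_per_hour: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator:
    """Yield items no faster than `max_per_hour`, sleeping between them as needed."""
    min_interval = 3600.0 / max_per_hour  # 3.6 seconds at 1,000 requests/hour
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


def estimated_hours(records: int, requests_per_record: int, max_per_hour: int = 1000) -> float:
    """Lower bound on run time when every request counts against the hourly limit."""
    return records * requests_per_record / max_per_hour


if __name__ == "__main__":
    # Assumption: two requests per record (search + metadata verification).
    print(f"{estimated_hours(1341, 2):.2f} hours")
```

Under the two-requests-per-record assumption, the 1,341 remaining records need about 2.7 hours, which would be consistent with the 2-3 hour estimate; records that also need a SPARQL fallback would push the total higher.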
Quality Assurance
- Verify all Q-numbers resolve to correct entities
- Check for duplicate Q-numbers across dataset
- Validate ISIL code alignment with Wikidata identifiers
- Cross-reference with existing Wikidata ISIL properties (P791)
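The duplicate check in this list is straightforward to script. A minimal sketch, assuming records are dictionaries with the `organisatie` and `wikidata_id` keys used in the YAML file; the function name `find_duplicate_qids` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_qids(records: list) -> dict:
    """Map each Q-number used by more than one record to the organizations using it."""
    by_qid = defaultdict(list)
    for record in records:
        qid = record.get("wikidata_id")
        if qid:  # skip unmatched records
            by_qid[qid].append(record.get("organisatie", "<unnamed>"))
    return {qid: names for qid, names in by_qid.items() if len(names) > 1}


if __name__ == "__main__":
    sample = [
        {"organisatie": "Gemeente Coevorden", "wikidata_id": "Q60453"},
        {"organisatie": "Stichting Drents Museum", "wikidata_id": "Q1258370"},
    ]
    print(find_duplicate_qids(sample))
```

A shared Q-number is not necessarily an error: because municipal archives are matched to the municipality entity, two services of the same municipality would legitimately share one. The output is best treated as a review list.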
Documentation Updates
- Update data/nde/README.md with enrichment status
- Create batch processing statistics report
- Document manual review decisions
- Update LinkML schema version to v0.2.1
Integration with Main GLAM Project
- Map NDE organizations to HeritageCustodian class
- Generate GHCIDs for all organizations
- Export to JSON-LD with Wikidata links
- Validate against main project LinkML schema
Lessons Learned
What Worked Well
✅ SPARQL queries: Highly effective for municipalities and government organizations
✅ Metadata verification: Prevented false positives
✅ Incremental approach: Test batch revealed patterns before scaling
✅ Logging everything: Complete audit trail for reproducibility
What Needs Improvement
⚠️ Branch locations: Need strategy for sub-organizations
⚠️ Collaborative entities: May require manual curation
⚠️ Historical societies: Lower Wikidata coverage, may need enrichment workflow
Recommendations for Full Dataset
- Batch by organization type: Process museums → archives → libraries → societies separately
- ISIL code leverage: Use ISIL codes (P791) in SPARQL queries for higher precision
- Municipality fallback: For municipal archives, always search for municipality entity
- Manual review threshold: Flag matches with confidence < 85% for human verification
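The 85% review threshold needs a concrete similarity measure, which this report does not specify. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as one possible choice; the function names and the choice of measure are assumptions, not the project's documented method.

```python
from difflib import SequenceMatcher

REVIEW_THRESHOLD = 0.85  # from the recommendation above


def label_similarity(name: str, label: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    a = name.casefold().strip()
    b = label.casefold().strip()
    return SequenceMatcher(None, a, b).ratio()


def needs_manual_review(name: str, label: str, threshold: float = REVIEW_THRESHOLD) -> bool:
    """True when the dataset name and the Wikidata label are too dissimilar."""
    return label_similarity(name, label) < threshold


if __name__ == "__main__":
    for name, label in [
        ("Hunebedcentrum", "Hunebedcentrum"),
        ("Gemeente Coevorden", "Coevorden"),
    ]:
        print(name, "->", label, needs_manual_review(name, label))
```

Prefixes such as "Gemeente" or "Stichting" lower the raw score, so a name like "Gemeente Coevorden" is flagged against the label "Coevorden" even though the match is correct. Stripping such prefixes before comparison, or combining the score with an ISIL (P791) check, would likely reduce the manual review queue.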
Files Modified
Updated Files
- /data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml (259 KB → 259 KB)
- /data/nde/linkml/nde_yaml_target.yaml (223 lines → 242 lines)
New Files Created
- /scripts/update_nde_yaml_with_wikidata_test_batch.py (enrichment script)
- /data/nde/sparql/enrichment_log_test_batch_20251117_115941.json (results log)
- /data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.backup.20251117_115940.yaml (backup)
Existing Files
- 10 prepared SPARQL query JSONs (from previous session)
- Master query log (from previous session)
Appendix: Sample Wikidata Entries
Example 1: Museum (Herinneringscentrum Kamp Westerbork)
Wikidata ID: Q22246632
Label (NL): Herinneringscentrum Kamp Westerbork
Description: memorial center in Hooghalen, Netherlands
Properties:
- P31 (instance of): Q33506 (museum)
- P17 (country): Q55 (Netherlands)
- P131 (located in): Q242103 (Midden-Drenthe)
- P856 (official website): https://kampwesterbork.nl/
- P791 (ISIL): NL-HhlHCKW ← Matches our data!
Example 2: Archive (Regionaal Historisch Centrum Drents Archief)
Wikidata ID: Q1978308
Label (NL): Drents Archief
Description: regional archive in Assen, Netherlands
Properties:
- P31 (instance of): Q7075 (library) + Q166118 (archive)
- P17 (country): Q55 (Netherlands)
- P131 (located in): Q10013 (Assen)
- P856 (official website): https://www.drentsarchief.nl/
- P791 (ISIL): NL-AsnDA ← Matches our data!
Example 3: Municipality (Gemeente Coevorden)
Wikidata ID: Q60453
Label (NL): Coevorden
Description: gemeente in Drenthe, Nederland
Properties:
- P31 (instance of): Q2039348 (Dutch municipality)
- P17 (country): Q55 (Netherlands)
- P131 (located in): Q770 (Drenthe)
- P856 (official website): https://www.coevorden.nl/
Contact & Support
Project Lead: GLAM Data Extraction Project
Repository: /Users/kempersc/apps/glam
Documentation: /docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md
Questions: See AGENTS.md for AI agent instructions and workflows
End of Report