NDE Wikidata Enrichment - Quick Resume Guide
Last Updated: 2025-11-17
Status: Test batch complete, ready to scale
Current State
- ✅ 10 records enriched from Drenthe province
- ✅ 80% success rate (8 matched, 2 no-match)
- ✅ YAML file updated with wikidata_id fields
- ✅ LinkML schema updated
- ✅ All queries logged
- 📊 1,341 records remaining (99.3% of dataset)
Files Location
/Users/kempersc/apps/glam/
├── data/nde/
│   ├── voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml   # Enriched data
│   ├── linkml/nde_yaml_target.yaml   # Updated schema
│   └── sparql/   # Query logs
├── scripts/
│   ├── update_nde_yaml_with_wikidata_test_batch.py   # Test batch (done)
│   └── enrich_nde_with_wikidata.py   # Full dataset (ready)
└── docs/
    ├── NDE_WIKIDATA_ENRICHMENT_REPORT.md   # Full report
    └── sessions/SESSION_SUMMARY_20251117_NDE_WIKIDATA_TEST_BATCH.md
To Resume Work
Option 1: Scale to Full Dataset
cd /Users/kempersc/apps/glam
python3 scripts/enrich_nde_with_wikidata.py --start-index 10 --batch-size 50
Expected:
- Processing time: 2-3 hours
- Success rate: 70-85% (based on test batch)
- ~950-1,150 organizations will get Wikidata IDs
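The expected figures above follow from simple arithmetic on numbers already in this guide (1,341 remaining records, batches of 50, a 70-85% match rate). A minimal sketch of that calculation; `plan_run` is an illustrative helper, not part of the project scripts:

```python
import math


def plan_run(remaining: int, batch_size: int, low_rate: float, high_rate: float):
    """Return (number of batches, low match estimate, high match estimate)."""
    batches = math.ceil(remaining / batch_size)
    return batches, round(remaining * low_rate), round(remaining * high_rate)


batches, low, high = plan_run(remaining=1341, batch_size=50, low_rate=0.70, high_rate=0.85)
print(f"{batches} batches, roughly {low}-{high} expected matches")
```

The estimate covers only the remaining records, so it lands slightly below the rounded range quoted above.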
Option 2: Check Current Status
python3 -c "
import yaml
with open('data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r') as f:
    orgs = yaml.safe_load(f)
enriched = len([o for o in orgs if 'wikidata_id' in o])
print(f'Enriched: {enriched}/{len(orgs)} ({enriched/len(orgs)*100:.1f}%)')
"
Option 3: Validate Enrichment
# Check for duplicates and verify Q-numbers
python3 scripts/validate_wikidata_enrichment.py
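The contents of the validation script are not shown in this guide. As a rough sketch of what "check for duplicates and verify Q-numbers" could involve, the example below assumes each record is a dict with an optional `wikidata_id` key (the same shape the status check in Option 2 relies on). The function name and the sample records marked as made up are illustrative only:

```python
import re
from collections import Counter

# A Q-number is "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"^Q[1-9]\d*$")


def validate_enrichment(orgs):
    """Return (malformed_ids, duplicated_ids) for a list of organization dicts."""
    ids = [o["wikidata_id"] for o in orgs if "wikidata_id" in o]
    malformed = [i for i in ids if not QID_PATTERN.match(str(i))]
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return malformed, duplicates


sample = [
    {"name": "Drents Archief", "wikidata_id": "Q1978308"},
    {"name": "Drents Museum", "wikidata_id": "Q1258370"},
    {"name": "Example duplicate", "wikidata_id": "Q1258370"},  # made-up record
    {"name": "Example malformed", "wikidata_id": "1258370"},   # made-up record
    {"name": "Example unmatched"},                             # no match, no ID
]
print(validate_enrichment(sample))
```

A duplicate is not necessarily an error: municipal archives are matched to the municipality's Q-number, so a flagged duplicate is a prompt for review, not an automatic rejection.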
Key Commands
Search Wikidata Entity
from wikidata_mcp import search_entity
q_number = search_entity("Rijksmuseum Amsterdam")
# Returns: Q190804
Verify Q-Number
from wikidata_mcp import get_metadata
metadata = get_metadata("Q190804", language="nl")
# Returns: {"Label": "Rijksmuseum", "Description": "museum in Amsterdam"}
SPARQL Query (for municipalities)
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q2039348 .        # Dutch municipality
  ?item rdfs:label "Amsterdam"@nl .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
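To run the municipality query for other names, the query text can be generated from a template. The sketch below only builds the query string and a request URL for the public Wikidata Query Service endpoint; it does not send anything. The helper names are illustrative, and whether the project scripts issue their queries this way is an assumption:

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service


def municipality_query(name: str, lang: str = "nl") -> str:
    """Build the SPARQL query for a Dutch municipality with the given label."""
    # Escape backslashes and double quotes so the name stays a valid SPARQL literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        "  ?item wdt:P31 wd:Q2039348 .\n"
        f'  ?item rdfs:label "{escaped}"@{lang} .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }\n'
        "}"
    )


def request_url(name: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = {"query": municipality_query(name), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


print(municipality_query("Coevorden"))
```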
Enriched Records (Test Batch)
| Organization | Wikidata ID | URL |
|---|---|---|
| Herinneringscentrum Kamp Westerbork | Q22246632 | https://www.wikidata.org/wiki/Q22246632 |
| Hunebedcentrum | Q2679819 | https://www.wikidata.org/wiki/Q2679819 |
| Drents Archief | Q1978308 | https://www.wikidata.org/wiki/Q1978308 |
| Drents Museum | Q1258370 | https://www.wikidata.org/wiki/Q1258370 |
| Gemeente Aa en Hunze | Q300665 | https://www.wikidata.org/wiki/Q300665 |
| Gemeente Borger-Odoorn | Q835118 | https://www.wikidata.org/wiki/Q835118 |
| Gemeente Coevorden | Q60453 | https://www.wikidata.org/wiki/Q60453 |
| Gemeente De Wolden | Q835108 | https://www.wikidata.org/wiki/Q835108 |
Success Patterns
✅ Museums with national/international recognition: 100% match
✅ Municipal archives: 100% match (use municipality Q-number)
✅ Regional archives: 100% match
⚠️ Branch locations: Low coverage (often not in Wikidata)
⚠️ Inter-municipal partnerships: Low coverage
⚠️ Small local societies: Varies (50-70% expected)
Next Milestones
- First 100 records enriched → Review success rate, adjust strategy
- First 500 records enriched → Identify patterns in no-matches
- First 1,000 records enriched → Statistical analysis
- All 1,351 records processed → Final report and integration
Important Notes
- Rate limit: 1,000 requests/hour (Wikidata API)
- Batch size: 50 records per batch (recommended)
- Verification: Always verify Q-numbers with get_metadata
- Backup: Created before enrichment (.backup.*.yaml)
- Logging: All queries logged in /data/nde/sparql/
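One way to stay under the rate limit noted above is to space requests evenly: 1,000 requests per hour works out to one request every 3.6 seconds. A minimal sketch of such a throttle; the `Throttle` class is hypothetical and is not taken from the enrichment script:

```python
import time


class Throttle:
    """Space calls so that at most `max_per_hour` happen per hour."""

    def __init__(self, max_per_hour: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle(max_per_hour=1000)
print(f"Minimum spacing: {throttle.min_interval:.1f} s")
```

Calling `throttle.wait()` before each request enforces the spacing. At one request per record, a batch of 50 would take at least 3 minutes.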
Questions to Address
- Should we create Wikidata entries for missing institutions?
- How to handle branch locations (link to parent or omit)?
- Manual review threshold (currently 85% confidence)?
- When to integrate with main GLAM project schema?
Documentation
📄 Full Report: /docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md
📄 Dataset Guide: /data/nde/README.md
📄 Session Summary: /docs/sessions/SESSION_SUMMARY_20251117_NDE_WIKIDATA_TEST_BATCH.md
📄 Agent Instructions: /AGENTS.md (Wikidata enrichment section)
Ready to Scale: YES ✅
Confidence: HIGH
Expected Success: 70-85%