# Netherlands Phase 2 Enrichment Report

**Date**: 2025-11-11 17:09:26 UTC (Unix timestamp 1762880966.934019)  
**Script**: `scripts/enrich_phase2_netherlands.py`  
**Target**: 622 Dutch institutions  
**Methodology**: SPARQL batch query + fuzzy name matching (Dutch normalization, 70% threshold)

---

## 📊 Overall Results

| Metric | Value |
|--------|-------|
| **Total Dutch Institutions** | 622 |
| **With Wikidata** | 396 (63.7%) |
| **Without Wikidata** | 226 (36.3%) |
| **Phase 2 Enriched** | 203 institutions |
| **Wikidata Pool** | 3,550 Dutch institutions in Wikidata |
| **Match Threshold** | 70% similarity (Dutch normalization) |

### Coverage Progression

- **Before Phase 2**: 193 institutions (31.0%)
- **After Phase 2**: 396 institutions (63.7%)
- **Improvement**: +203 institutions (+32.6 percentage points, computed from the unrounded shares)
- **🎯 Target Achieved**: 62%+ coverage ✅

---

## 🏛️ Coverage by Institution Type

| Type | With Wikidata | Without Wikidata | Total | Coverage |
|------|---------------|------------------|-------|----------|
| ARCHIVE | 135 | 16 | 151 | 89.4% |
| COLLECTING_SOCIETY | 1 | 17 | 18 | 5.6% |
| L | 1 | 0 | 1 | 100.0% |
| LIBRARY | 5 | 2 | 7 | 71.4% |
| MIXED | 176 | 151 | 327 | 53.8% |
| MUSEUM | 71 | 27 | 98 | 72.4% |
| OFFICIAL_INSTITUTION | 4 | 4 | 8 | 50.0% |
| RESEARCH_CENTER | 3 | 8 | 11 | 27.3% |
| UNDEFINED | 0 | 1 | 1 | 0.0% |

### Highlights

- **MUSEUM**: 71/98 (72.4%) - Second-highest coverage among the three largest types
- **ARCHIVE**: 135/151 (89.4%) - Highest coverage among the three largest types, up from 21.2%
- **MIXED**: 176/327 (53.8%) - Largest group and highest absolute count, up from 30.6%

---

## 🏙️ Geographic Distribution

### Top 10 Cities by Institution Count (with Wikidata Coverage)

| City | With Wikidata | Without Wikidata | Total | Coverage |
|------|---------------|------------------|-------|----------|
| Den Haag | 18 | 31 | 49 | 36.7% |
| Amsterdam | 25 | 14 | 39 | 64.1% |
| Utrecht | 22 | 11 | 33 | 66.7% |
| Arnhem | 15 | 2 | 17 | 88.2% |
| Zwolle | 4 | 11 | 15 | 26.7% |
| Rotterdam | 13 | 1 | 14 | 92.9% |
| Leiden | 6 | 5 | 11 | 54.5% |
| Groningen | 5 | 5 | 10 | 50.0% |
| Zeeland | 9 | 1 | 10 | 90.0% |
| Maastricht | 6 | 3 | 9 | 66.7% |

### Top 10 Cities Needing Enrichment

| City | Institutions Without Wikidata |
|------|-------------------------------|
| Den Haag | 31 |
| Amsterdam | 14 |
| Utrecht | 11 |
| Zwolle | 11 |
| Enschede | 5 |
| Groningen | 5 |
| Leiden | 5 |
| Roermond | 5 |
| Deventer | 4 |
| Leeuwarden | 4 |

---

## 🎯 Remaining Work

### Institutions Without Wikidata: 226

**By Type:**

- **MIXED**: 151 institutions
- **MUSEUM**: 27 institutions
- **COLLECTING_SOCIETY**: 17 institutions
- **ARCHIVE**: 16 institutions
- **RESEARCH_CENTER**: 8 institutions
- **OFFICIAL_INSTITUTION**: 4 institutions
- **LIBRARY**: 2 institutions
- **UNDEFINED**: 1 institution

### Recommended Next Steps

1. **Phase 3 Netherlands**: Alternative name search for the remaining 226 institutions
   - Target: COLLECTING_SOCIETY (currently 5.6% coverage, 1 of 18)
   - Target: Generic "Museum" institutions (common names)
   - Target: Regional archives with variant spellings
2. **Manual Curation**: Review institutions with unique names not found in Wikidata
3. **ISIL Code Matching**: Cross-reference the remaining institutions with the Dutch ISIL registry

---

## 🔍 Sample Enriched Institutions

### 1. Regionaal Archief Alkmaar

- **Location**: Alkmaar
- **Type**: ARCHIVE
- **Wikidata**: [Q2189005](https://www.wikidata.org/wiki/Q2189005)
- **Match Score**: 1.000

### 2. Gemeente Almelo

- **Location**: Almelo
- **Type**: MIXED
- **Wikidata**: [Q110891755](https://www.wikidata.org/wiki/Q110891755)
- **Match Score**: 0.811

### 3. Gemeentearchief Alphen aan den Rijn

- **Location**: Alphen aan den Rijn
- **Type**: ARCHIVE
- **Wikidata**: [Q111190988](https://www.wikidata.org/wiki/Q111190988)
- **Match Score**: 1.000

### 4. Huygens Instituut (HI)

- **Location**: Amsterdam
- **Type**: MIXED
- **Wikidata**: [Q487857](https://www.wikidata.org/wiki/Q487857)
- **Match Score**: 0.743

### 5. IHLIA LGBT Heritage

- **Location**: Amsterdam
- **Type**: MIXED
- **Wikidata**: [Q1417841](https://www.wikidata.org/wiki/Q1417841)
- **Match Score**: 0.974

### 6. Nationale Opera & Ballet

- **Location**: Amsterdam
- **Type**: MIXED
- **Wikidata**: [Q110996017](https://www.wikidata.org/wiki/Q110996017)
- **Match Score**: 0.714

### 7. Rijksmuseum

- **Location**: Amsterdam
- **Type**: MUSEUM
- **Wikidata**: [Q124624215](https://www.wikidata.org/wiki/Q124624215)
- **Match Score**: 0.909

### 8. Gemeente Appingedam

- **Location**: Appingedam
- **Type**: ARCHIVE
- **Wikidata**: [Q81181191](https://www.wikidata.org/wiki/Q81181191)
- **Match Score**: 0.844

### 9. Museum Arnhem (MA)

- **Location**: Arnhem
- **Type**: MUSEUM
- **Wikidata**: [Q2114028](https://www.wikidata.org/wiki/Q2114028)
- **Match Score**: 1.000

### 10. Drents Archief

- **Location**: Assen
- **Type**: ARCHIVE
- **Wikidata**: [Q1978308](https://www.wikidata.org/wiki/Q1978308)
- **Match Score**: 1.000

---

## 📈 Performance Metrics

- **Wikidata Query Time**: 58.9 seconds
- **Institutions Matched**: 203
- **Match Rate**: 47.3% (203 matched out of the 429 institutions that had no Wikidata ID before Phase 2)
- **Total Processing Time**: 2.5 minutes
- **Dataset Write Time**: 16.4 seconds

---

## ✅ Success Factors

1. **Strong Dutch Wikipedia Coverage**: The Netherlands has extensive cultural heritage documentation
2. **ISIL Code Integration**: Many institutions already have ISIL codes for validation
3. **Dutch Normalization**: Effective handling of Dutch-specific prefixes/suffixes
4. **High Wikidata Pool**: 3,550 Dutch institutions available (vs. 1,845 for Mexico)
5. **Type Compatibility Checks**: Prevented museum → library mismatches

---

## 🔄 Comparison with Mexico Phase 2

| Metric | Mexico | Netherlands |
|--------|--------|-------------|
| **Total Institutions** | 192 | 622 |
| **Starting Coverage** | 17.7% (34) | 31.0% (193) |
| **Ending Coverage** | 50.0% (96) | 63.7% (396) |
| **Institutions Enriched** | 62 | 203 |
| **Coverage Gain** | +32.3pp | +32.6pp |
| **Match Rate** | 39.2% | 47.3% |
| **Wikidata Pool** | 1,845 | 3,550 |
| **Processing Time** | 2.1 min | 2.5 min |

**Netherlands outperformed Mexico** in absolute numbers (203 vs 62 enriched) and match rate (47.3% vs 39.2%), which suggests that well-documented European heritage institutions are productive enrichment targets.

---

## 📊 Phase 2 Summary Across Countries

| Country | Total | Before | After | Enriched | Coverage Gain |
|---------|-------|--------|-------|----------|---------------|
| 🇧🇷 Brazil | 241 | 13.7% | 32.5% | 45 | +18.8pp |
| 🇲🇽 Mexico | 192 | 17.7% | 50.0% | 62 | +32.3pp |
| 🇳🇱 **Netherlands** | **622** | **31.0%** | **63.7%** | **203** | **+32.6pp** |

**Netherlands Phase 2 is the largest successful enrichment to date**, with 203 institutions enriched, bringing the country's Wikidata coverage to 63.7%.

---

**Generated**: /Users/kempersc/apps/glam  
**Script**: `scripts/enrich_phase2_netherlands.py`  
**Dataset**: `data/instances/all/globalglam-20251111.yaml`
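
---

As a rough illustration of the methodology named at the top of this report (fuzzy name matching with Dutch normalization and a 70% threshold), the sketch below shows one way such matching could work. It is not taken from `scripts/enrich_phase2_netherlands.py`: the `GENERIC_WORDS` list, the `normalize` and `best_match` helpers, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, and the candidate labels are assumed to equal the institution names listed in the samples above.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of generic Dutch words to ignore when comparing names.
# The real script's normalization rules are not shown in this report.
GENERIC_WORDS = {"stichting", "vereniging", "het", "de", "van", "voor"}


def normalize(name: str) -> str:
    """Lowercase, strip diacritics and parenthetical abbreviations, drop generic words."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = re.sub(r"\([^)]*\)", " ", text)   # remove abbreviations such as "(HI)"
    text = re.sub(r"[^a-z0-9]+", " ", text)  # punctuation becomes whitespace
    tokens = [tok for tok in text.split() if tok not in GENERIC_WORDS]
    return " ".join(tokens)


def best_match(name, candidates, threshold=0.70):
    """Return (qid, label, score) for the best candidate at or above the threshold, else None.

    `candidates` maps a Wikidata Q-ID to its label.
    """
    target = normalize(name)
    best = None
    for qid, label in candidates.items():
        score = SequenceMatcher(None, target, normalize(label)).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (qid, label, score)
    return best


# Illustrative candidate pool using two Q-IDs from the samples in this report.
pool = {"Q2114028": "Museum Arnhem", "Q1978308": "Drents Archief"}
print(best_match("Museum Arnhem (MA)", pool))
print(best_match("Stichting Het Drents Archief", pool))
```

A production matcher would likely also apply the type compatibility check listed under Success Factors before accepting a candidate; that step is omitted here to keep the sketch short.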