# Netherlands Phase 2 Enrichment Report

**Date**: 2025-11-11 17:09:26 UTC (Unix timestamp 1762880966.934019)
**Script**: `scripts/enrich_phase2_netherlands.py`
**Target**: 622 Dutch institutions
**Methodology**: SPARQL batch query + fuzzy name matching (Dutch normalization, 70% threshold)

---

## 📊 Overall Results

| Metric | Value |
|--------|-------|
| **Total Dutch Institutions** | 622 |
| **With Wikidata** | 396 (63.7%) |
| **Without Wikidata** | 226 (36.3%) |
| **Phase 2 Enriched** | 203 institutions |
| **Wikidata Pool** | 3,550 Dutch institutions in Wikidata |
| **Match Threshold** | 70% similarity (Dutch normalization) |

### Coverage Progression

- **Before Phase 2**: 193 institutions (31.0%)
- **After Phase 2**: 396 institutions (63.7%)
- **Improvement**: +203 institutions (+32.6 percentage points)
- **🎯 Target Achieved**: 62%+ coverage ✅
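
The progression figures can be re-derived from the raw counts. The sketch below uses illustrative names (`coverage_pct` and the variables are not taken from the enrichment script) to recompute both coverage percentages. It also shows why the gain is reported as +32.6 rather than +32.7 percentage points: the difference is taken on the unrounded values.

```python
def coverage_pct(with_wikidata: int, total: int) -> float:
    """Share of institutions that have a Wikidata link, in percent."""
    return 100.0 * with_wikidata / total


total = 622      # Dutch institutions in the dataset
before = 193     # linked to Wikidata before Phase 2
enriched = 203   # newly linked in Phase 2
after = before + enriched

before_pct = coverage_pct(before, total)
after_pct = coverage_pct(after, total)
gain_pp = after_pct - before_pct  # percentage points, computed before rounding

print(f"{before_pct:.1f}% -> {after_pct:.1f}% (+{gain_pp:.1f} pp)")
# prints: 31.0% -> 63.7% (+32.6 pp)
```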

---

## 🏛️ Coverage by Institution Type

| Type | With Wikidata | Without Wikidata | Total | Coverage |
|------|---------------|------------------|-------|----------|
| ARCHIVE | 135 | 16 | 151 | 89.4% |
| COLLECTING_SOCIETY | 1 | 17 | 18 | 5.6% |
| L | 1 | 0 | 1 | 100.0% |
| LIBRARY | 5 | 2 | 7 | 71.4% |
| MIXED | 176 | 151 | 327 | 53.8% |
| MUSEUM | 71 | 27 | 98 | 72.4% |
| OFFICIAL_INSTITUTION | 4 | 4 | 8 | 50.0% |
| RESEARCH_CENTER | 3 | 8 | 11 | 27.3% |
| UNDEFINED | 0 | 1 | 1 | 0.0% |

### Highlights

- **MUSEUM**: 71/98 (72.4%) - Second-highest coverage among the three largest types
- **ARCHIVE**: 135/151 (89.4%) - Highest coverage among the three largest types, up from 21.2%
- **MIXED**: 176/327 (53.8%) - Largest group and most linked institutions in absolute terms, up from 30.6%

---

## 🏙️ Geographic Distribution

### Top 10 Cities by Institution Count (Wikidata Coverage)

| City | With Wikidata | Without Wikidata | Total | Coverage |
|------|---------------|------------------|-------|----------|
| Den Haag | 18 | 31 | 49 | 36.7% |
| Amsterdam | 25 | 14 | 39 | 64.1% |
| Utrecht | 22 | 11 | 33 | 66.7% |
| Arnhem | 15 | 2 | 17 | 88.2% |
| Zwolle | 4 | 11 | 15 | 26.7% |
| Rotterdam | 13 | 1 | 14 | 92.9% |
| Leiden | 6 | 5 | 11 | 54.5% |
| Groningen | 5 | 5 | 10 | 50.0% |
| Zeeland | 9 | 1 | 10 | 90.0% |
| Maastricht | 6 | 3 | 9 | 66.7% |

### Top 10 Cities Needing Enrichment

| City | Institutions Without Wikidata |
|------|-------------------------------|
| Den Haag | 31 |
| Amsterdam | 14 |
| Utrecht | 11 |
| Zwolle | 11 |
| Enschede | 5 |
| Groningen | 5 |
| Leiden | 5 |
| Roermond | 5 |
| Deventer | 4 |
| Leeuwarden | 4 |

---

## 🎯 Remaining Work

### Institutions Without Wikidata: 226

**By Type:**

- **MIXED**: 151 institutions
- **MUSEUM**: 27 institutions
- **COLLECTING_SOCIETY**: 17 institutions
- **ARCHIVE**: 16 institutions
- **RESEARCH_CENTER**: 8 institutions
- **OFFICIAL_INSTITUTION**: 4 institutions
- **LIBRARY**: 2 institutions
- **UNDEFINED**: 1 institution

### Recommended Next Steps

1. **Phase 3 Netherlands**: Alternative name search for the remaining 226 institutions
   - Target: COLLECTING_SOCIETY (5.6% coverage currently, 1 of 18)
   - Target: Generic "Museum" institutions (common names)
   - Target: Regional archives with variant spellings

2. **Manual Curation**: Review institutions with unique names not found in Wikidata

3. **ISIL Code Matching**: Cross-reference with Dutch ISIL registry for remaining institutions
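
As a rough illustration of step 3, ISIL codes could be looked up in Wikidata in one batch through the ISIL property (`P791`). The helper name `build_isil_query`, the loose format check, and the example codes are placeholders for this sketch; none of them come from the enrichment script or the Dutch ISIL registry. The query is only constructed here, not sent to any endpoint.

```python
import re

# Loose sanity check only (an assumption, not the full ISIL grammar):
# a short country prefix, a hyphen, then letters, digits, ":", "/" or "-".
ISIL_PATTERN = re.compile(r"^[A-Za-z]{1,4}-[A-Za-z0-9:/-]{1,11}$")


def build_isil_query(isil_codes):
    """Build a SPARQL query that maps ISIL codes to Wikidata items via P791."""
    codes = sorted(set(isil_codes))  # de-duplicate and keep the order stable
    bad = [code for code in codes if not ISIL_PATTERN.match(code)]
    if bad:
        raise ValueError(f"unexpected ISIL format: {bad}")
    values = " ".join(f'"{code}"' for code in codes)
    return (
        "SELECT ?item ?isil WHERE {\n"
        f"  VALUES ?isil {{ {values} }}\n"
        "  ?item wdt:P791 ?isil .\n"
        "}"
    )


# Placeholder codes for illustration only.
query = build_isil_query(["NL-Example2", "NL-Example1", "NL-Example1"])
print(query)
```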

---

## 🔍 Sample Enriched Institutions

### 1. Regionaal Archief Alkmaar

- **Location**: Alkmaar
- **Type**: ARCHIVE
- **Wikidata**: [Q2189005](https://www.wikidata.org/wiki/Q2189005)
- **Match Score**: 1.000

### 2. Gemeente Almelo

- **Location**: Almelo
- **Type**: MIXED
- **Wikidata**: [Q110891755](https://www.wikidata.org/wiki/Q110891755)
- **Match Score**: 0.811

### 3. Gemeentearchief Alphen aan den Rijn

- **Location**: Alphen aan den Rijn
- **Type**: ARCHIVE
- **Wikidata**: [Q111190988](https://www.wikidata.org/wiki/Q111190988)
- **Match Score**: 1.000

### 4. Huygens Instituut (HI)

- **Location**: Amsterdam
- **Type**: MIXED
- **Wikidata**: [Q487857](https://www.wikidata.org/wiki/Q487857)
- **Match Score**: 0.743

### 5. IHLIA LGBT Heritage

- **Location**: Amsterdam
- **Type**: MIXED
- **Wikidata**: [Q1417841](https://www.wikidata.org/wiki/Q1417841)
- **Match Score**: 0.974

### 6. Nationale Opera & Ballet

- **Location**: Amsterdam
- **Type**: MIXED
- **Wikidata**: [Q110996017](https://www.wikidata.org/wiki/Q110996017)
- **Match Score**: 0.714

### 7. Rijksmuseum

- **Location**: Amsterdam
- **Type**: MUSEUM
- **Wikidata**: [Q124624215](https://www.wikidata.org/wiki/Q124624215)
- **Match Score**: 0.909

### 8. Gemeente Appingedam

- **Location**: Appingedam
- **Type**: ARCHIVE
- **Wikidata**: [Q81181191](https://www.wikidata.org/wiki/Q81181191)
- **Match Score**: 0.844

### 9. Museum Arnhem (MA)

- **Location**: Arnhem
- **Type**: MUSEUM
- **Wikidata**: [Q2114028](https://www.wikidata.org/wiki/Q2114028)
- **Match Score**: 1.000

### 10. Drents Archief

- **Location**: Assen
- **Type**: ARCHIVE
- **Wikidata**: [Q1978308](https://www.wikidata.org/wiki/Q1978308)
- **Match Score**: 1.000

---

## 📈 Performance Metrics

- **Wikidata Query Time**: 58.9 seconds
- **Institutions Matched**: 203
- **Match Rate**: 47.3% (203 matched out of the 429 institutions that lacked Wikidata before Phase 2)
- **Total Processing Time**: 2.5 minutes
- **Dataset Write Time**: 16.4 seconds
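
The match rate is measured against the institutions that entered Phase 2 without a Wikidata link, not against the full dataset. A minimal sketch of that calculation, using a hypothetical `match_rate` helper and the counts reported in this document (the Mexico figures are those from the comparison table):

```python
def match_rate(matched: int, total: int, linked_before: int) -> float:
    """Percentage of previously unlinked institutions that received a match."""
    unlinked_before = total - linked_before
    return 100.0 * matched / unlinked_before


# Netherlands: 622 institutions, 193 linked before Phase 2, 203 matched.
nl_rate = match_rate(matched=203, total=622, linked_before=193)
# Mexico: 192 institutions, 34 linked before Phase 2, 62 matched.
mx_rate = match_rate(matched=62, total=192, linked_before=34)

print(f"Netherlands: {nl_rate:.1f}%")  # 203 / 429
print(f"Mexico: {mx_rate:.1f}%")       # 62 / 158
```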

---

## ✅ Success Factors

1. **Strong Dutch Wikipedia Coverage**: Netherlands has extensive cultural heritage documentation
2. **ISIL Code Integration**: Many institutions already have ISIL codes for validation
3. **Dutch Normalization**: Effective handling of Dutch-specific prefixes/suffixes
4. **Large Wikidata Pool**: 3,550 Dutch institutions available (vs. 1,845 for Mexico)
5. **Type Compatibility Checks**: Prevented museum → library mismatches
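
Factors 3 and 5 can be pictured together as a small matching routine. The sketch below is an illustration under stated assumptions, not the code in `scripts/enrich_phase2_netherlands.py`. The filler-word list, the rule that MIXED and UNDEFINED accept any type, and the choice of `difflib.SequenceMatcher` as the similarity measure are all hypothetical. Only the 0.70 threshold comes from this report.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical filler words; the real normalization list is not shown in the report.
DUTCH_FILLER = {"stichting", "vereniging", "de", "het", "van", "voor"}
# Assumption: these types are treated as compatible with any other type.
WILDCARD_TYPES = {"MIXED", "UNDEFINED"}


def normalize_dutch(name: str) -> str:
    """Lowercase, strip accents, drop parenthesised abbreviations and filler words."""
    name = re.sub(r"\([^)]*\)", " ", name)  # "Museum Arnhem (MA)" -> "Museum Arnhem"
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", plain.lower())
    return " ".join(t for t in tokens if t not in DUTCH_FILLER)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize_dutch(a), normalize_dutch(b)).ratio()


def types_compatible(local_type: str, candidate_type: str) -> bool:
    """Reject pairs such as MUSEUM vs LIBRARY; wildcard types match anything."""
    if local_type in WILDCARD_TYPES or candidate_type in WILDCARD_TYPES:
        return True
    return local_type == candidate_type


def best_match(name, local_type, candidates, threshold=0.70):
    """candidates: iterable of (qid, label, candidate_type). Returns (qid, score) or None."""
    best = None
    for qid, label, candidate_type in candidates:
        if not types_compatible(local_type, candidate_type):
            continue
        score = similarity(name, label)
        if score >= threshold and (best is None or score > best[1]):
            best = (qid, score)
    return best


# Placeholder QIDs, for illustration only.
pool = [("Q_A", "Museum Arnhem", "MUSEUM"), ("Q_B", "Museum Arnhem", "LIBRARY")]
print(best_match("Museum Arnhem (MA)", "MUSEUM", pool))
```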

---

## 🔄 Comparison with Mexico Phase 2

| Metric | Mexico | Netherlands |
|--------|--------|-------------|
| **Total Institutions** | 192 | 622 |
| **Starting Coverage** | 17.7% (34) | 31.0% (193) |
| **Ending Coverage** | 50.0% (96) | 63.7% (396) |
| **Institutions Enriched** | 62 | 203 |
| **Coverage Gain** | +32.3pp | +32.6pp |
| **Match Rate** | 39.2% | 47.3% |
| **Wikidata Pool** | 1,845 | 3,550 |
| **Processing Time** | 2.1 min | 2.5 min |

**Netherlands outperformed Mexico** in absolute numbers (203 vs 62 institutions enriched) and match rate (47.3% vs 39.2%), while the coverage gain was nearly identical (+32.6pp vs +32.3pp). The higher match rate is consistent with the larger Wikidata pool (3,550 vs 1,845 institutions) and suggests the value of targeting well-documented European heritage institutions.

---

## 📊 Phase 2 Summary Across Countries

| Country | Total | Before | After | Enriched | Coverage Gain |
|---------|-------|--------|-------|----------|---------------|
| 🇧🇷 Brazil | 241 | 13.7% | 32.5% | 45 | +18.8pp |
| 🇲🇽 Mexico | 192 | 17.7% | 50.0% | 62 | +32.3pp |
| 🇳🇱 **Netherlands** | **622** | **31.0%** | **63.7%** | **203** | **+32.6pp** |

**Netherlands Phase 2 is the largest successful enrichment to date** with 203 institutions enriched, bringing total Wikidata coverage to 63.7%.

---

**Generated**: /Users/kempersc/apps/glam
**Script**: `scripts/enrich_phase2_netherlands.py`
**Dataset**: `data/instances/all/globalglam-20251111.yaml`