glam/data/instances/netherlands/NETHERLANDS_PHASE2_ENRICHMENT_REPORT.md
2025-11-19 23:25:22 +01:00

Netherlands Phase 2 Enrichment Report

Date: 2025-11-11 17:09:26 UTC (Unix timestamp 1762880966.934019)
Script: scripts/enrich_phase2_netherlands.py
Target: 622 Dutch institutions
Methodology: SPARQL batch query + fuzzy name matching (Dutch normalization, 70% threshold)
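
A batch query of the kind named in the methodology might look like the sketch below. The script's actual query is not reproduced in this report, so the `build_batch_query` helper, the class list, and the `LIMIT` value are assumptions made for illustration. The Wikidata identifiers are real: P17 (country), P31 (instance of), P279 (subclass of), P791 (ISIL), Q55 (Netherlands), Q33506 (museum), Q166118 (archive), Q7075 (library).

```python
# Hypothetical sketch: build ONE SPARQL query that fetches all Dutch heritage
# institutions in a single batch, instead of issuing one query per name.
# The class selection below is an assumption, not the script's actual list.
HERITAGE_CLASSES = {
    "museum": "Q33506",
    "archive": "Q166118",
    "library": "Q7075",
}


def build_batch_query(country_qid: str = "Q55", limit: int = 5000) -> str:
    """Return a SPARQL query string for heritage institutions in one country."""
    values = " ".join(f"wd:{qid}" for qid in HERITAGE_CLASSES.values())
    return f"""
SELECT DISTINCT ?item ?itemLabel ?isil WHERE {{
  VALUES ?class {{ {values} }}
  ?item wdt:P31/wdt:P279* ?class ;
        wdt:P17 wd:{country_qid} .
  OPTIONAL {{ ?item wdt:P791 ?isil . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en" . }}
}}
LIMIT {limit}
""".strip()


query = build_batch_query()
```

The resulting string could then be sent to the Wikidata Query Service, and the returned labels held in memory as the candidate pool for name matching.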


📊 Overall Results

| Metric | Value |
|---|---|
| Total Dutch Institutions | 622 |
| With Wikidata | 396 (63.7%) |
| Without Wikidata | 226 (36.3%) |
| Phase 2 Enriched | 203 institutions |
| Wikidata Pool | 3,550 Dutch institutions in Wikidata |
| Match Threshold | 70% similarity (Dutch normalization) |

Coverage Progression

  • Before Phase 2: 193 institutions (31.0%)
  • After Phase 2: 396 institutions (63.7%)
  • Improvement: +203 institutions (+32.6 percentage points)
  • 🎯 Target Achieved: 62%+ coverage
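
The progression figures can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative. It also shows why the gain is +32.6 percentage points even though subtracting the rounded percentages (63.7 - 31.0) would suggest 32.7: the gain is computed from the unrounded values.

```python
# Counts taken from this report.
total = 622
before = 193
enriched = 203
after = before + enriched  # institutions with Wikidata after Phase 2

coverage_before = 100 * before / total
coverage_after = 100 * after / total
gain_pp = coverage_after - coverage_before  # percentage points, unrounded

print(f"{coverage_before:.1f}% -> {coverage_after:.1f}% (+{gain_pp:.1f} pp)")
# prints: 31.0% -> 63.7% (+32.6 pp)
```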

🏛️ Coverage by Institution Type

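| Type | With Wikidata | Without Wikidata | Total | Coverage |
|---|---|---|---|---|
| ARCHIVE | 135 | 16 | 151 | 89.4% |
| COLLECTING_SOCIETY | 1 | 17 | 18 | 5.6% |
| L | 1 | 0 | 1 | 100.0% |
| LIBRARY | 5 | 2 | 7 | 71.4% |
| MIXED | 176 | 151 | 327 | 53.8% |
| MUSEUM | 71 | 27 | 98 | 72.4% |
| OFFICIAL_INSTITUTION | 4 | 4 | 8 | 50.0% |
| RESEARCH_CENTER | 3 | 8 | 11 | 27.3% |
| UNDEFINED | 0 | 1 | 1 | 0.0% |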

Highlights

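  • ARCHIVE: 135/151 (89.4%) - Highest coverage among types with more than one institution, up from 21.2%
  • MUSEUM: 71/98 (72.4%) - Second-highest coverage among types with more than one institution
  • MIXED: 176/327 (53.8%) - Largest group and most matches in absolute terms, up from 30.6%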

🏙️ Geographic Distribution

Wikidata Coverage in the 10 Cities with the Most Institutions

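| City | With Wikidata | Without Wikidata | Total | Coverage |
|---|---|---|---|---|
| Den Haag | 18 | 31 | 49 | 36.7% |
| Amsterdam | 25 | 14 | 39 | 64.1% |
| Utrecht | 22 | 11 | 33 | 66.7% |
| Arnhem | 15 | 2 | 17 | 88.2% |
| Zwolle | 4 | 11 | 15 | 26.7% |
| Rotterdam | 13 | 1 | 14 | 92.9% |
| Leiden | 6 | 5 | 11 | 54.5% |
| Groningen | 5 | 5 | 10 | 50.0% |
| Zeeland | 9 | 1 | 10 | 90.0% |
| Maastricht | 6 | 3 | 9 | 66.7% |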

Top 10 Cities Needing Enrichment

| City | Institutions Without Wikidata |
|---|---|
| Den Haag | 31 |
| Amsterdam | 14 |
| Utrecht | 11 |
| Zwolle | 11 |
| Enschede | 5 |
| Groningen | 5 |
| Leiden | 5 |
| Roermond | 5 |
| Deventer | 4 |
| Leeuwarden | 4 |

🎯 Remaining Work

Institutions Without Wikidata: 226

By Type:

  • MIXED: 151 institutions
  • MUSEUM: 27 institutions
  • COLLECTING_SOCIETY: 17 institutions
  • ARCHIVE: 16 institutions
  • RESEARCH_CENTER: 8 institutions
  • OFFICIAL_INSTITUTION: 4 institutions
  • LIBRARY: 2 institutions
  • UNDEFINED: 1 institution

Next Steps:

  1. Phase 3 Netherlands: Alternative name search for the remaining 226 institutions

    • Target: COLLECTING_SOCIETY (5.6% coverage currently, 1 of 18)
    • Target: Generic "Museum" institutions (common names)
    • Target: Regional archives with variant spellings
  2. Manual Curation: Review institutions with unique names not found in Wikidata

  3. ISIL Code Matching: Cross-reference with the Dutch ISIL registry for the remaining institutions
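
The ISIL cross-reference step could be as simple as an exact lookup on the identifier, which avoids fuzzy matching altogether. The following is a minimal sketch under stated assumptions: the `match_by_isil` and `normalize_isil` helpers, the `isil` and `name` field names, and all sample values are hypothetical placeholders rather than real registry data, and treating ISIL codes as case-insensitive is an assumption of this sketch. On the Wikidata side, ISIL codes are stored under property P791.

```python
# Hypothetical sketch of ISIL-based matching. Field names and sample values
# are placeholders for illustration, not real registry or Wikidata data.

def normalize_isil(code: str) -> str:
    """Canonicalize an ISIL code (assumes case-insensitive comparison)."""
    return code.strip().upper()


def match_by_isil(institutions: list, isil_to_qid: dict) -> dict:
    """Map institution names to QIDs wherever the ISIL codes agree exactly."""
    index = {normalize_isil(code): qid for code, qid in isil_to_qid.items()}
    matches = {}
    for inst in institutions:
        isil = inst.get("isil")
        if not isil:
            continue  # no ISIL code recorded, nothing to cross-reference
        qid = index.get(normalize_isil(isil))
        if qid is not None:
            matches[inst["name"]] = qid
    return matches


institutions = [
    {"name": "Example Archive", "isil": "nl-example1 "},
    {"name": "Example Museum", "isil": None},
    {"name": "Example Library", "isil": "NL-Unknown"},
]
isil_to_qid = {"NL-Example1": "Q_PLACEHOLDER_1"}

matches = match_by_isil(institutions, isil_to_qid)
```

Only the first placeholder institution is matched here; the second has no ISIL code and the third has a code that is absent from the index, so both would fall through to name search or manual curation.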


🔍 Sample Enriched Institutions

1. Regionaal Archief Alkmaar

  • Location: Alkmaar
  • Type: ARCHIVE
  • Wikidata: Q2189005
  • Match Score: 1.000

2. Gemeente Almelo

  • Location: Almelo
  • Type: MIXED
  • Wikidata: Q110891755
  • Match Score: 0.811

3. Gemeentearchief Alphen aan den Rijn

  • Location: Alphen aan den Rijn
  • Type: ARCHIVE
  • Wikidata: Q111190988
  • Match Score: 1.000

4. Huygens Instituut (HI)

  • Location: Amsterdam
  • Type: MIXED
  • Wikidata: Q487857
  • Match Score: 0.743

5. IHLIA LGBT Heritage

  • Location: Amsterdam
  • Type: MIXED
  • Wikidata: Q1417841
  • Match Score: 0.974

6. Nationale Opera & Ballet

  • Location: Amsterdam
  • Type: MIXED
  • Wikidata: Q110996017
  • Match Score: 0.714

7. Rijksmuseum

  • Location: Amsterdam
  • Type: MUSEUM
  • Wikidata: Q124624215
  • Match Score: 0.909

8. Gemeente Appingedam

  • Location: Appingedam
  • Type: ARCHIVE
  • Wikidata: Q81181191
  • Match Score: 0.844

9. Museum Arnhem (MA)

  • Location: Arnhem
  • Type: MUSEUM
  • Wikidata: Q2114028
  • Match Score: 1.000

10. Drents Archief

  • Location: Assen
  • Type: ARCHIVE
  • Wikidata: Q1978308
  • Match Score: 1.000

📈 Performance Metrics

  • Wikidata Query Time: 58.9 seconds
  • Institutions Matched: 203
  • Match Rate: 47.3% (203 matched out of 429 without Wikidata)
  • Total Processing Time: 2.5 minutes
  • Dataset Write Time: 16.4 seconds

Success Factors

  1. Strong Dutch Wikipedia Coverage: The Netherlands has extensive cultural heritage documentation
  2. ISIL Code Integration: Many institutions already have ISIL codes for validation
  3. Dutch Normalization: Effective handling of Dutch-specific prefixes/suffixes
  4. High Wikidata Pool: 3,550 Dutch institutions available (vs. 1,845 for Mexico)
  5. Type Compatibility Checks: Prevented museum → library mismatches
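
Factors 3 and 5 can be illustrated together. The sketch below is not the script's implementation: it swaps in `difflib.SequenceMatcher` from the Python standard library as the similarity measure, and the stop-word list, the rule that `MIXED` is compatible with any type, and the `normalize_dutch` and `best_match` helpers are all assumptions. Only the 70% threshold and the idea of a type check come from this report. The candidate label for Q2114028 is assumed for illustration, and `Q_PLACEHOLDER` is not a real identifier.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Assumed list of Dutch organisational words to ignore; the real list is unknown.
STOPWORDS = {"stichting", "vereniging", "gemeente", "het", "de"}


def normalize_dutch(name: str) -> str:
    """Lowercase, drop parenthetical abbreviations, accents, and stop words."""
    name = re.sub(r"\([^)]*\)", " ", name)  # e.g. "(MA)" in "Museum Arnhem (MA)"
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def types_compatible(a: str, b: str) -> bool:
    """Assumed rule: identical types match, and MIXED matches anything."""
    return a == b or "MIXED" in (a, b)


def best_match(name, inst_type, candidates, threshold=0.70):
    """Return (qid, score) of the best compatible candidate, or None."""
    target = normalize_dutch(name)
    best = None
    for cand in candidates:
        if not types_compatible(inst_type, cand["type"]):
            continue  # prevents e.g. museum -> library mismatches
        score = SequenceMatcher(None, target, normalize_dutch(cand["label"])).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (cand["qid"], score)
    return best


candidates = [
    {"qid": "Q_PLACEHOLDER", "label": "Museum Arnhem", "type": "LIBRARY"},
    {"qid": "Q2114028", "label": "Museum Arnhem", "type": "MUSEUM"},
]
result = best_match("Museum Arnhem (MA)", "MUSEUM", candidates)
```

In this example the library candidate is skipped despite its identical label, and the museum candidate is returned with a score of 1.0 once the "(MA)" suffix is normalized away. A name such as "Rijksmuseum" scores well below the threshold against the same pool and yields no match.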

🔄 Comparison with Mexico Phase 2

| Metric | Mexico | Netherlands |
|---|---|---|
| Total Institutions | 192 | 622 |
| Starting Coverage | 17.7% (34) | 31.0% (193) |
| Ending Coverage | 50.0% (96) | 63.7% (396) |
| Institutions Enriched | 62 | 203 |
| Coverage Gain | +32.3pp | +32.6pp |
| Match Rate | 39.2% | 47.3% |
| Wikidata Pool | 1,845 | 3,550 |
| Processing Time | 2.1 min | 2.5 min |

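The Netherlands outperformed Mexico in absolute numbers (203 vs. 62 institutions enriched) and in match rate (47.3% vs. 39.2%), while the coverage gain was nearly identical (+32.6pp vs. +32.3pp). This suggests that well-documented European heritage institutions are productive enrichment targets.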


📊 Phase 2 Summary Across Countries

| Country | Total | Before | After | Enriched | Coverage Gain |
|---|---|---|---|---|---|
| 🇧🇷 Brazil | 241 | 13.7% | 32.5% | 45 | +18.8pp |
| 🇲🇽 Mexico | 192 | 17.7% | 50.0% | 62 | +32.3pp |
| 🇳🇱 Netherlands | 622 | 31.0% | 63.7% | 203 | +32.6pp |

Netherlands Phase 2 is the largest enrichment to date: 203 institutions were enriched, bringing Wikidata coverage of the Dutch institutions to 63.7%.


Generated: /Users/kempersc/apps/glam
Script: scripts/enrich_phase2_netherlands.py
Dataset: data/instances/all/globalglam-20251111.yaml