glam/QUICK_STATUS_20251119.md
2025-11-19 23:25:22 +01:00

GLAM Project Quick Status - 2025-11-19

Completed Today

1. German Data Unification

  • 20,761 institutions after unifying 21,916 source records (ISIL 16,979 + DDB 4,937)
  • File: data/isil/germany/german_institutions_unified_20251119_181857.json (39.2 MB)
  • 82% with ISIL codes, 71.3% geocoded
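A minimal sketch of how an ISIL + DDB cross-reference of this kind could work, assuming each record is a dict and that DDB records carry an optional `isil` field. The function name `unify`, the field names, and the sample values are hypothetical and are not taken from crossreference_german_data.py.

```python
def unify(isil_records, ddb_records):
    """Merge two record lists, matching on ISIL code where both sides have one.

    Assumption: records are plain dicts; 'isil' is the shared key.
    ISIL registry values win; DDB values only fill gaps.
    """
    unified = {}
    unmatched = []

    # Seed the result with every ISIL registry record.
    for rec in isil_records:
        unified[rec["isil"]] = dict(rec, sources=["isil"])

    for rec in ddb_records:
        code = rec.get("isil")
        if code and code in unified:
            merged = unified[code]
            # Copy a DDB value only if the ISIL record has nothing there.
            for key, value in rec.items():
                if merged.get(key) in (None, ""):
                    merged[key] = value
            merged["sources"].append("ddb")
        else:
            # No ISIL match: keep the DDB record as its own institution.
            unmatched.append(dict(rec, sources=["ddb"]))

    return list(unified.values()) + unmatched


if __name__ == "__main__":
    isil = [{"isil": "DE-EXAMPLE", "name": "Example Library", "lat": None}]
    ddb = [
        {"isil": "DE-EXAMPLE", "name": "Example Library (DDB)", "lat": 52.5},
        {"name": "Museum without ISIL"},
    ]
    for row in unify(isil, ddb):
        print(row)
```

Matching on the ISIL code alone leaves DDB records without a code unmatched, which is one way the unified count can end up below the sum of both sources while still exceeding the ISIL count.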

2. Austrian Data Consolidation

  • 4,348 institutions after deduplicating 7,414 source records (ISIL 1,928 + Wikidata 4,859 + OSM 627)
  • File: data/isil/austria/austrian_institutions_consolidated_20251119_181541.json (1.78 MB)
  • 67.5% geocoded, 62.8% with Wikidata IDs
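The multi-source deduplication and the coverage percentages above could be computed roughly as follows. This is a sketch under stated assumptions: records are dicts with `name` and `city` fields, matching uses a normalized name-plus-city key, and earlier sources take priority. The helpers `match_key`, `consolidate`, and `coverage` and all sample values are hypothetical; the actual logic in consolidate_austrian_data.py may differ.

```python
import unicodedata


def match_key(record):
    """Build a crude matching key from name and city (assumed field names)."""
    text = f"{record.get('name', '')} {record.get('city', '')}"
    # Strip diacritics and case so spelling variants of the same name collide.
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse repeated whitespace as well.
    return " ".join(plain.casefold().split())


def consolidate(*sources):
    """Deduplicate records across sources; earlier sources win on conflicts."""
    merged = {}
    for records in sources:
        for rec in records:
            entry = merged.setdefault(match_key(rec), {})
            for field, value in rec.items():
                if value is not None and field not in entry:
                    entry[field] = value
    return list(merged.values())


def coverage(records, field):
    """Percentage of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * filled / len(records)


if __name__ == "__main__":
    isil = [{"name": "Österreichische Beispielbibliothek", "city": "Wien",
             "isil": "AT-EXAMPLE"}]
    wikidata = [{"name": "österreichische  beispielbibliothek", "city": "WIEN",
                 "wikidata_id": "Q-EXAMPLE"}]
    osm = [{"name": "Stadtmuseum", "city": "Graz", "lat": 47.07}]
    result = consolidate(isil, wikidata, osm)
    print(len(result), coverage(result, "wikidata_id"))
```

A name-plus-city key is deliberately simple; it will miss institutions whose names differ between sources, so identifier-based matching (ISIL or Wikidata ID) is usually worth trying first where both sides have one.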

3. Scripts Created

  • scripts/scrapers/harvest_ddb_institutions.py - DDB API harvester
  • scripts/scrapers/consolidate_austrian_data.py - Austrian multi-source merger
  • scripts/scrapers/crossreference_german_data.py - German ISIL+DDB cross-reference

📊 Current Data Status

| Country | Institutions | Status |
|---|---|---|
| 🇩🇪 Germany | 20,761 | Unified |
| 🇨🇿 Czech Republic | 8,694 | Complete |
| 🇦🇹 Austria | 4,348 | Consolidated |
| 🇨🇭 Switzerland | 2,379 | Complete |
| 🇳🇱 Netherlands | ~1,400 | Complete |
| 🇧🇪 Belgium | 438 | Complete |
| Total | ~38,020 | ~39% of global target |

🚀 Next Steps

  1. Denmark ISIL harvest → Complete Phase 1
  2. Data quality audit → Review 100 random samples
  3. LinkML conversion → Export to HeritageCustodian schema
  4. Wikidata enrichment → Add Q-numbers to German institutions
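For the data quality audit in step 2, a reproducible random sample can be drawn with the standard library. The sketch below assumes the records are already loaded as a list; the function name `draw_audit_sample` and the seed value are illustrative choices, not part of the existing scripts.

```python
import random


def draw_audit_sample(records, size=100, seed=20251119):
    """Return up to `size` records chosen uniformly at random without replacement.

    A fixed seed (an arbitrary choice here) makes the sample reproducible,
    so reviewers can re-draw exactly the same records later.
    """
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


if __name__ == "__main__":
    # Stand-in data; in practice this would be the list loaded from a unified JSON file.
    records = [{"id": i} for i in range(500)]
    sample = draw_audit_sample(records)
    print(len(sample))
```

Using a dedicated `random.Random` instance keeps the audit sample independent of any other random state in the calling script.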

🔑 Key Files

/data/isil/germany/
└── german_institutions_unified_20251119_181857.json (39.2 MB)

/data/isil/austria/
└── austrian_institutions_consolidated_20251119_181541.json (1.78 MB)

/scripts/scrapers/
├── harvest_ddb_institutions.py
├── consolidate_austrian_data.py
└── crossreference_german_data.py

Last Updated: 2025-11-19T18:30:00Z
Session: DDB Harvest & Unification Complete
Status: Ready for Phase 1 Completion