glam/QUICK_STATUS_BAVARIA_DECISION.md
2025-11-21 22:12:33 +01:00

# Bavaria Status - Quick Decision Guide
**Last Updated**: 2025-11-20 22:30

**Current State**: Enrichment exploration complete

---
## Current Bayern Dataset
- **1,245 institutions** (99.9% ISIL coverage)
- **699 cities** (best rural coverage in the project)
- **42% completeness** today (all core fields present)
- **64% completeness** projected after enrichment (85% if address parsing is fixed first, see Option A)
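This guide does not define how the completeness percentage is calculated. One plausible reading, assumed here, is the share of populated fields across all records. The sketch below illustrates that reading only; the field list in `FIELDS` and the sample records are hypothetical and not taken from the project schema.

```python
from typing import Any

# Hypothetical field list; the real schema is not given in this document.
FIELDS = ["name", "isil", "city", "street", "postal_code", "website", "phone"]


def is_populated(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, dict)):
        return len(value) > 0
    return True


def completeness(records: list[dict[str, Any]], fields: list[str] = FIELDS) -> float:
    """Return the percentage of populated (record, field) cells."""
    if not records or not fields:
        return 0.0
    filled = sum(is_populated(r.get(f)) for r in records for f in fields)
    return 100.0 * filled / (len(records) * len(fields))


# Placeholder records for illustration only.
sample = [
    {"name": "Museum A", "isil": "DE-EXAMPLE-1", "city": "München"},
    {"name": "Archiv B", "isil": "DE-EXAMPLE-2", "city": "Passau",
     "website": "https://example.org"},
]
print(f"{completeness(sample):.1f}%")  # 7 of 14 cells filled -> 50.0%
```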
---
## Three Paths Forward
### A) Complete Bayern Enrichment (30 min)
```bash
cd /Users/kempersc/apps/glam
# First fix address parsing in scripts/scrapers/enrich_bayern_museums.py, then run the full enrichment:
python3 scripts/scrapers/enrich_bayern_museums.py
```
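The comment above does not say what is wrong with the address parsing. Purely as an illustration, here is a minimal sketch that assumes addresses arrive as one line in the shape "street and house number, five-digit postal code, city". The function name `parse_german_address`, the regular expression, and the input shape are all assumptions and are not taken from `enrich_bayern_museums.py`.

```python
import re
from typing import Optional

# Assumed input shape: "<street> <house no.>, <5-digit postal code> <city>"
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>.+?)\s*,\s*(?P<postal_code>\d{5})\s+(?P<city>.+?)\s*$"
)


def parse_german_address(raw: str) -> Optional[dict[str, str]]:
    """Split a one-line address into street, postal_code, and city.

    Returns None when the string does not match the assumed shape, so the
    caller can count and inspect failures instead of storing bad data.
    """
    match = ADDRESS_RE.match(raw)
    if match is None:
        return None
    return match.groupdict()


# Placeholder address for illustration only.
print(parse_german_address("Musterstraße 1, 80331 München"))
print(parse_german_address("no postal code here"))  # None
```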
**Result**: Bavaria at 85% completeness, then move on to the next state

---
### B) Move to Baden-Württemberg NOW (0 min) ⭐ RECOMMENDED
```bash
cd /Users/kempersc/apps/glam
# Use the same pattern as Bavaria/Saxony
# 1. Extract foundation institutions (archives, libraries)
# 2. Run isil.museum harvester for museums
# 3. Merge and export
```
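The three steps in the comments above name no scripts, so only the last one (merge and export) is sketched here. This assumes both sources yield lists of dictionaries keyed by an `isil` field; the names `merge_by_isil` and `export_json`, the field names, and the sample records are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

Record = dict[str, Any]


def merge_by_isil(foundation: list[Record], museums: list[Record]) -> list[Record]:
    """Merge two record lists, de-duplicating on the ISIL code.

    When both sources contain the same ISIL, non-empty museum fields
    overwrite foundation fields. Records without an ISIL are kept as-is.
    """
    merged: dict[str, Record] = {}
    without_isil: list[Record] = []
    for record in [*foundation, *museums]:
        isil = record.get("isil")
        if not isil:
            without_isil.append(dict(record))
            continue
        target = merged.setdefault(isil, {})
        # Skip empty values so they never overwrite real data.
        target.update({k: v for k, v in record.items() if v not in (None, "")})
    return list(merged.values()) + without_isil


def export_json(records: list[Record], path: Path) -> None:
    """Write records as UTF-8 JSON, keeping umlauts readable."""
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")


# Placeholder records for illustration only.
foundation = [{"isil": "DE-EXAMPLE-1", "name": "Stadtarchiv", "city": ""}]
museums = [
    {"isil": "DE-EXAMPLE-1", "city": "Ulm"},
    {"isil": "DE-EXAMPLE-2", "name": "Museum"},
]
merged = merge_by_isil(foundation, museums)
print(len(merged))  # 2
```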
**Result**: 6/16 states complete, ~6,100 institutions

**Why**: Batch enrichment is more efficient (saves about 3.3 hours, i.e. 200 minutes)

---
### C) Quick Bayern Enrichment (25 min)
```bash
cd /Users/kempersc/apps/glam
# Run as-is: skip the address-parsing fix and accept 64% completeness
python3 scripts/scrapers/enrich_bayern_museums.py
```
**Result**: Bavaria at 64% completeness, then move on to the next state

---
## Recommendation: **Option B**
**Move to Baden-Württemberg extraction using the proven pattern.**

Return to metadata enrichment as a batch operation after all 16 states are complete (saves about 200 minutes, roughly 3.3 hours).

---
## Quick Stats
| State | Institutions | ISIL | Completeness | Status |
|-------|--------------|------|--------------|--------|
| Nordrhein-Westfalen | 1,893 | 99.2% | 68.4% | ✅ |
| **Bayern** | **1,245** | **99.9%** | **42%** | ✅ |
| Thüringen | 1,061 | 97.8% | 66.7% | ✅ |
| Sachsen | 411 | 99.8% | 43.0% | ✅ |
| Sachsen-Anhalt | 317 | 98.4% | 62.8% | ✅ |
| **Total** | **4,927** | **98.8%** | - | **5/16** |

**Next**: Baden-Württemberg (~1,200 institutions, ~1.5 hours)

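The institution totals can be cross-checked with a few lines of arithmetic. The counts below are copied from the Quick Stats table, and the Baden-Württemberg figure is the rough estimate from the "Next" line, so the projected total is approximate.

```python
# Institution counts copied from the Quick Stats table.
institutions = {
    "Nordrhein-Westfalen": 1893,
    "Bayern": 1245,
    "Thüringen": 1061,
    "Sachsen": 411,
    "Sachsen-Anhalt": 317,
}
BADEN_WUERTTEMBERG_ESTIMATE = 1200  # "~1,200 institutions" from the Next line

total = sum(institutions.values())
projected = total + BADEN_WUERTTEMBERG_ESTIMATE

print(f"states done: {len(institutions)}/16")         # states done: 5/16
print(f"current total: {total:,}")                    # current total: 4,927
print(f"projected after next state: ~{projected:,}")  # ~6,127, i.e. roughly 6,100
```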
---
## Files Ready
**Enrichment scripts**:

- `scripts/scrapers/enrich_bayern_museums.py` (full run, 25 min)
- `scripts/scrapers/enrich_bayern_museums_sample.py` (sample run, 1 min)

**Data**:

- `data/isil/germany/bayern_complete_20251120_213349.json` (1,245 institutions)
- `data/isil/germany/bayern_museums_enriched_sample_20251120_221708.json` (sample of 100 enriched records)

**Documentation**:

- `SESSION_SUMMARY_20251120_BAVARIA_ENRICHMENT_COMPLETE.md` (detailed)
- `SESSION_SUMMARY_20251120_BAVARIA_ENRICHMENT.md` (decision guide)
- `GERMAN_HARVEST_STATUS.md` (updated)