# Bavaria Status - Quick Decision Guide

**Last Updated**: 2025-11-20 22:30
**Current State**: Enrichment exploration complete

---

## Current Bayern Dataset

- ✅ **1,245 institutions** (99.9% ISIL coverage)
- ✅ **699 cities** (best rural coverage in the project)
- ✅ **42% completeness** today (all core fields present)
- ✅ **64% completeness** projected after enrichment
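
This guide does not define how the completeness percentages are calculated. As a rough illustration only, assuming completeness means the share of expected fields that are non-empty across all records, it could be computed as in the sketch below. The field list, the `completeness` helper, and the sample records (including their ISIL codes) are hypothetical and not taken from the project.

```python
from typing import Iterable, Mapping, Sequence

# Hypothetical field list; the project's real schema may differ.
EXPECTED_FIELDS = ("name", "isil", "city", "street", "postal_code", "website")


def completeness(records: Iterable[Mapping[str, object]],
                 fields: Sequence[str] = EXPECTED_FIELDS) -> float:
    """Return the share of expected fields that are filled, in percent."""
    filled = 0
    total = 0
    for record in records:
        for field in fields:
            total += 1
            value = record.get(field)
            # Treat None, empty strings and empty containers as missing.
            if value not in (None, "", [], {}):
                filled += 1
    return 100.0 * filled / total if total else 0.0


# Made-up records for illustration only.
sample = [
    {"name": "Stadtmuseum", "isil": "DE-MUS-000001", "city": "Augsburg"},
    {"name": "Stadtarchiv", "isil": "DE-000002", "city": "Passau",
     "website": "https://example.org"},
]
print(f"{completeness(sample):.1f}%")  # 7 of 12 fields filled -> 58.3%
```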

---

## Three Paths Forward

### A) Complete Bayern Enrichment (30 min)
```bash
cd /Users/kempersc/apps/glam
# First fix address parsing in scripts/scrapers/enrich_bayern_museums.py, then run the full enrichment:
python3 scripts/scrapers/enrich_bayern_museums.py
```
**Result**: Bavaria at 85% completeness; then move on to the next state

---

### B) Move to Baden-Württemberg NOW (0 min) ⭐ RECOMMENDED
```bash
cd /Users/kempersc/apps/glam
# Use the same pattern as Bavaria/Saxony
# 1. Extract foundation institutions (archives, libraries)
# 2. Run isil.museum harvester for museums
# 3. Merge and export
```
**Result**: 6/16 states, ~6,100 institutions

**Why**: Batch enrichment is more efficient (saves about 3.3 hours)
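
Step 3 ("Merge and export") is not spelled out in this guide. The following is a minimal sketch of one way it could work, assuming both sources yield dictionaries with an `isil` key and that records should be deduplicated on that code. The `merge_by_isil` helper, the field names, and the sample records are hypothetical and do not describe the project's actual scripts.

```python
import json
from typing import Dict, Iterable, List


def merge_by_isil(*sources: Iterable[dict]) -> List[dict]:
    """Merge record lists, keeping one record per ISIL code.

    Later sources only fill fields that earlier ones left empty.
    Records without an ISIL are kept as-is, since they cannot be matched.
    """
    merged: Dict[str, dict] = {}
    unmatched: List[dict] = []
    for source in sources:
        for record in source:
            isil = record.get("isil")
            if not isil:
                unmatched.append(dict(record))
                continue
            target = merged.setdefault(isil, {})
            for key, value in record.items():
                # Only fill gaps; never overwrite an existing value.
                if target.get(key) in (None, ""):
                    target[key] = value
    return list(merged.values()) + unmatched


# Made-up records for illustration only.
foundation = [{"isil": "DE-X1", "name": "Stadtarchiv Beispiel", "city": ""}]
museums = [
    {"isil": "DE-X1", "name": "Stadtarchiv", "city": "Ulm"},
    {"isil": "DE-MUS-X2", "name": "Heimatmuseum Beispiel", "city": "Ulm"},
]
result = merge_by_isil(foundation, museums)

# Export step: serialize with umlauts preserved.
print(json.dumps(result, ensure_ascii=False, indent=2))
```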

---

### C) Quick Bayern Enrichment (25 min)
```bash
cd /Users/kempersc/apps/glam
# Run as-is without fixing address parsing; accept 64% completeness
python3 scripts/scrapers/enrich_bayern_museums.py
```
**Result**: Bavaria at 64% completeness; then move on to the next state

---

## Recommendation: **Option B**

**Move on to Baden-Württemberg extraction using the proven pattern.**

Return to metadata enrichment as a single batch operation once all 16 states are complete (saves about 200 minutes, roughly 3.3 hours).

---

## Quick Stats

| State | Institutions | ISIL | Completeness | Status |
|-------|--------------|------|--------------|--------|
| Nordrhein-Westfalen | 1,893 | 99.2% | 68.4% | ✅ |
| **Bayern** | **1,245** | **99.9%** | **42%** | ✅ |
| Thüringen | 1,061 | 97.8% | 66.7% | ✅ |
| Sachsen | 411 | 99.8% | 43.0% | ✅ |
| Sachsen-Anhalt | 317 | 98.4% | 62.8% | ✅ |
| **Total** | **4,927** | **98.8%** | - | **5/16** |

**Next**: Baden-Württemberg (~1,200 institutions, ~1.5 hours)
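
As a quick sanity check on the totals, the institution counts from the table can be summed and combined with the rough Baden-Württemberg estimate. The numbers below are copied from this guide; the variable names are only illustrative.

```python
# Institution counts copied from the Quick Stats table.
institutions = {
    "Nordrhein-Westfalen": 1893,
    "Bayern": 1245,
    "Thüringen": 1061,
    "Sachsen": 411,
    "Sachsen-Anhalt": 317,
}
TOTAL_STATES = 16
NEXT_STATE_ESTIMATE = 1200  # approximate Baden-Württemberg figure from this guide

total = sum(institutions.values())
projected = total + NEXT_STATE_ESTIMATE

# Prints: 5/16 states, 4,927 institutions
print(f"{len(institutions)}/{TOTAL_STATES} states, {total:,} institutions")
# Prints: 6/16 states, ~6,127 institutions (the guide rounds this to ~6,100)
print(f"{len(institutions) + 1}/{TOTAL_STATES} states, ~{projected:,} institutions")
```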

---

## Files Ready

**Enrichment scripts**:
- `scripts/scrapers/enrich_bayern_museums.py` (full, 25 min)
- `scripts/scrapers/enrich_bayern_museums_sample.py` (sample, 1 min)

**Data**:
- `data/isil/germany/bayern_complete_20251120_213349.json` (1,245 institutions)
- `data/isil/germany/bayern_museums_enriched_sample_20251120_221708.json` (enriched sample, 100 records)

**Documentation**:
- `SESSION_SUMMARY_20251120_BAVARIA_ENRICHMENT_COMPLETE.md` (detailed)
- `SESSION_SUMMARY_20251120_BAVARIA_ENRICHMENT.md` (decision guide)
- `GERMAN_HARVEST_STATUS.md` (updated)