Next Session: Mexican Institution Geocoding
Quick Start Guide for Next Session
Objective
Geocode 117 Mexican heritage institutions to achieve 60%+ coordinate coverage (70+ institutions).
Current Status
- Input file: data/instances/mexican_institutions_curated.yaml
- Current geocoding: 5.9% (7 of 117 institutions)
- Target: 60%+ (70+ institutions)
Step-by-Step Workflow
1. Create Mexican Geocoding Script (5 minutes)
Copy the Chilean script and adapt for Mexico:
cd /Users/kempersc/apps/glam
cp scripts/geocode_chilean_institutions.py scripts/geocode_mexican_institutions.py
Configuration changes needed (lines 22-26):
INPUT_FILE = Path("data/instances/mexican_institutions_curated.yaml")
OUTPUT_FILE = Path("data/instances/mexican_institutions_geocoded_v2.yaml")
REPORT_FILE = Path("data/instances/mexican_geocoding_report_v2.md")
CACHE_FILE = Path("data/instances/.geocoding_cache_mexico.yaml")
Report title changes (lines 2-3 and line 439):
# Change "Chilean" to "Mexican" in:
# - Docstring title
# - Print statements in main()
# - Report generation
2. Run Geocoding (4-5 minutes first run)
cd /Users/kempersc/apps/glam
python scripts/geocode_mexican_institutions.py
Expected output:
- ~200-250 API calls (117 institutions × ~2 avg queries with fallbacks)
- ~4-5 minutes total time (1.1 sec/request rate limit)
- Target: 70+ institutions geocoded (60%+)
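The call and time estimates above follow from simple arithmetic. The sketch below only reproduces that back-of-the-envelope calculation; the function name `estimate_run` is hypothetical and is not part of the geocoding script.

```python
# Back-of-the-envelope estimate using the figures quoted in this guide.
INSTITUTIONS = 117
AVG_QUERIES_PER_INSTITUTION = 2.0  # rough average, including fallback queries
SECONDS_PER_REQUEST = 1.1          # rate limit between API calls


def estimate_run(institutions, avg_queries, seconds_per_request):
    """Return (expected API calls, expected minutes) for an uncached run."""
    calls = round(institutions * avg_queries)
    minutes = calls * seconds_per_request / 60
    return calls, minutes


calls, minutes = estimate_run(
    INSTITUTIONS, AVG_QUERIES_PER_INSTITUTION, SECONDS_PER_REQUEST
)
print(f"{calls} calls, about {minutes:.1f} minutes")
```

A second run should be faster, since queries already stored in the cache file do not need to hit the API again.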
3. Validate Results (1 minute)
python scripts/validate_yaml_instance.py data/instances/mexican_institutions_geocoded_v2.yaml
Should show: "✅ All instances are valid!"
4. Create Backup (30 seconds)
tar -czf data/instances/backups/2025-11-06_mexican-geocoded-v2.tar.gz \
data/instances/mexican_institutions_geocoded_v2.yaml \
data/instances/mexican_geocoding_report_v2.md \
data/instances/.geocoding_cache_mexico.yaml
5. Review Report
Check data/instances/mexican_geocoding_report_v2.md for:
- Total geocoded percentage (should be ≥60%)
- Failed institutions (for manual review)
- API usage statistics
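The coverage figure in the report can also be cross-checked against the output data. The following is a minimal sketch that assumes each institution record is a mapping with a `locations` list whose entries carry `latitude` and `longitude` keys; those field names are assumptions and should be adjusted to the actual schema. Records could be loaded with `yaml.safe_load` from PyYAML, which is left out here to keep the sketch dependency-free.

```python
TARGET_PCT = 60.0  # coverage target stated in this guide


def has_coordinates(institution):
    """True if any location has both latitude and longitude (assumed keys)."""
    for loc in institution.get("locations") or []:
        if loc.get("latitude") is not None and loc.get("longitude") is not None:
            return True
    return False


def coverage(institutions):
    """Return (geocoded count, total count, percentage geocoded)."""
    total = len(institutions)
    geocoded = sum(1 for inst in institutions if has_coordinates(inst))
    pct = 100.0 * geocoded / total if total else 0.0
    return geocoded, total, pct


# Placeholder records, for illustration only.
sample = [
    {"name": "Institution A", "locations": [{"latitude": 19.43, "longitude": -99.13}]},
    {"name": "Institution B", "locations": []},
    {"name": "Institution C", "locations": [{"city": "Puebla"}]},
]

geocoded, total, pct = coverage(sample)
status = "meets" if pct >= TARGET_PCT else "is below"
print(f"{geocoded}/{total} geocoded ({pct:.1f}%), which {status} the target")
```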
Expected Results
Optimistic Scenario (similar to Chilean success):
- 85%+ geocoded (99+ institutions)
- ~18 failed institutions
- Fallback strategies highly effective
Realistic Scenario:
- 70-80% geocoded (82-93 institutions)
- ~25-35 failed institutions
- Some Mexican regions have lower OSM coverage
Worst Case (below target):
- 50-60% geocoded (58-70 institutions)
- Requires manual geocoding or refined queries for failed cases
Mexican-Specific Considerations
Institution Name Patterns
Mexican institution names may differ from Chilean patterns:
Chilean: "Museo de...", "Archivo Histórico", "Biblioteca Pública"
Mexican: "Museo Nacional de...", "Archivo General del Estado de...", "Biblioteca Pública Municipal"
Potential Adjustments (if needed):
Update fallback query generation (around line 200) to handle:
- "Archivo General del Estado" → "Archivo Estado"
- "Biblioteca Pública Municipal" → "Biblioteca Municipal"
- State names vs. Chilean region names
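The adjustments above could be sketched as a small rewrite table applied before building each query. This is an illustration only: `SIMPLIFICATIONS` and `fallback_queries` are hypothetical names, and the real fallback logic in the copied script may be structured differently.

```python
import re

# Rewrite rules taken from the two examples listed above.
SIMPLIFICATIONS = [
    (r"\bArchivo General del Estado\b", "Archivo Estado"),
    (r"\bBiblioteca Pública Municipal\b", "Biblioteca Municipal"),
]


def fallback_queries(name, city=None, state=None):
    """Return query strings from most specific to most simplified."""
    simplified = name
    for pattern, replacement in SIMPLIFICATIONS:
        simplified = re.sub(pattern, replacement, simplified)

    variants = [name]
    if simplified != name:
        variants.append(simplified)

    queries = []
    for variant in variants:
        # Append city and state (when known) plus the country name.
        parts = [variant] + [p for p in (city, state) if p] + ["Mexico"]
        queries.append(", ".join(parts))
    return queries


for query in fallback_queries("Archivo General del Estado de Oaxaca", state="Oaxaca"):
    print(query)
```

Trying the full name first keeps precise matches when they exist; the simplified variant is only a fallback.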
Mexican State Names (Examples)
Common states in dataset:
- Ciudad de México (CDMX)
- Jalisco (Guadalajara)
- Nuevo León (Monterrey)
- Puebla
- Veracruz
- Oaxaca
- Guanajuato
- Yucatán
OSM Coverage: Generally good in major cities, variable in rural areas.
Troubleshooting
If geocoding rate is < 60%:
- Review failed institutions:
  grep -A3 "No results found (all strategies exhausted)" \
    data/instances/mexican_geocoding_report_v2.md
- Check for patterns in failures:
  - Are they all from one region? (May indicate OSM coverage issue)
  - Are they generic names? (Need more specific queries)
  - Are they university archives? (May need different query pattern)
- Potential fixes:
  - Add Mexico-specific fallback strategies
  - Mine conversation text for street addresses
  - Manual geocoding for critical institutions
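Looking for patterns in the failures is easier with a quick tally. The sketch below assumes the failed institutions have been collected into a list of mappings with `name` and `region` keys; that structure is hypothetical and the records shown are placeholders.

```python
from collections import Counter


def failure_patterns(failed):
    """Tally failed institutions by region and by the first word of the name."""
    by_region = Counter(rec.get("region") or "unknown" for rec in failed)
    by_first_word = Counter(
        rec["name"].split()[0] for rec in failed if rec.get("name")
    )
    return by_region, by_first_word


# Placeholder records, for illustration only.
failed_sample = [
    {"name": "Archivo A", "region": "Oaxaca"},
    {"name": "Archivo B", "region": "Oaxaca"},
    {"name": "Museo C", "region": None},
]

by_region, by_first_word = failure_patterns(failed_sample)
print(by_region.most_common())
print(by_first_word.most_common())
```

A single region dominating the first tally points to a coverage gap, while a single leading word dominating the second suggests a name pattern that needs its own fallback query.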
If script errors occur:
- Check INPUT_FILE exists:
  ls data/instances/mexican_institutions_curated.yaml
- Verify network connection for API calls
- Check cache file permissions:
  ls -la data/instances/.geocoding_cache_mexico.yaml
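The file checks above could also be run as a preflight step before starting the script. This is a minimal sketch using only the standard library; the `preflight` helper is hypothetical and does not test the network connection.

```python
import os
from pathlib import Path


def preflight(input_file, cache_file):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not input_file.is_file():
        problems.append(f"missing input file: {input_file}")
    if cache_file.exists():
        # An existing cache must be readable and writable.
        if not os.access(cache_file, os.R_OK | os.W_OK):
            problems.append(f"cache file not readable/writable: {cache_file}")
    elif not os.access(cache_file.parent, os.W_OK):
        # No cache yet: its directory must exist and allow file creation.
        problems.append(f"cannot create cache in: {cache_file.parent}")
    return problems


issues = preflight(
    Path("data/instances/mexican_institutions_curated.yaml"),
    Path("data/instances/.geocoding_cache_mexico.yaml"),
)
for issue in issues:
    print(issue)
```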
After Mexican Geocoding
Once Mexican institutions are geocoded, we'll have:
Dataset Summary:
- Brazilian institutions: 97 (59.8% geocoded)
- Chilean institutions: 90 (86.7% geocoded) ✅
- Mexican institutions: 117 (target: 60%+ geocoded)
- Total: 304 institutions
Next priorities:
- Create combined geocoding report across all 3 countries
- Generate geographic visualization (map with all institutions)
- Export to multiple formats (GeoJSON, RDF, CSV)
- Update PROGRESS.md with geocoding achievements
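For the GeoJSON export, a minimal sketch is shown below. It assumes the same hypothetical record layout as elsewhere in this guide (a `locations` list with `latitude` and `longitude` keys) and emits one point per institution. GeoJSON stores coordinates in longitude, latitude order.

```python
import json


def to_geojson(institutions):
    """Build a GeoJSON FeatureCollection from geocoded institution records."""
    features = []
    for inst in institutions:
        for loc in inst.get("locations") or []:
            lat, lon = loc.get("latitude"), loc.get("longitude")
            if lat is None or lon is None:
                continue
            features.append({
                "type": "Feature",
                # GeoJSON positions are [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"name": inst.get("name")},
            })
            break  # use only the first location that has coordinates
    return {"type": "FeatureCollection", "features": features}


# Placeholder records, for illustration only.
records = [
    {"name": "Institution A", "locations": [{"latitude": 19.43, "longitude": -99.13}]},
    {"name": "Institution B", "locations": []},
]

collection = to_geojson(records)
print(json.dumps(collection, ensure_ascii=False, indent=2))
```

Institutions without coordinates are skipped rather than written with null geometry, so the feature count matches the geocoded count in the report.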
Files to Create/Review
Created by script:
- scripts/geocode_mexican_institutions.py - Geocoding script
- data/instances/mexican_institutions_geocoded_v2.yaml - Output data
- data/instances/mexican_geocoding_report_v2.md - Statistics
- data/instances/.geocoding_cache_mexico.yaml - API cache
- data/instances/backups/2025-11-06_mexican-geocoded-v2.tar.gz - Backup
For review:
- Mexican geocoding report (coverage %, failed institutions)
- Validation output (should be 0 errors)
Quick Reference: Key Metrics from Chilean Success
For comparison with Mexican results:
| Metric | Chilean Result | Mexican Target |
|---|---|---|
| Total institutions | 90 | 117 |
| Geocoded | 78 (86.7%) | 70+ (60%+) |
| Failed | 12 (13.3%) | < 47 (< 40%) |
| API calls | ~150-200 | ~200-250 |
| Execution time | 3-4 minutes | 4-5 minutes |
| Fallback effectiveness | +33.4 percentage points | Target: +20 percentage points |
Document Created: 2025-11-06
Estimated Time for Next Session: 20 minutes total
Prerequisites: Chilean geocoding complete ✅