OpenStreetMap Enrichment Report
Date: 2025-11-06
Dataset: Latin American GLAM Institutions
Phase: 5 - OpenStreetMap Enrichment
Executive Summary
Successfully enriched 83 out of 186 institutions (44.6%) with OpenStreetMap data, adding precise coordinates, street addresses, contact information, opening hours, and alternative names.
Processing Statistics
- Total institutions processed: 304
- Institutions with OSM IDs: 186 (61.2%)
- OSM records successfully fetched: 152 (81.7% fetch success rate)
- Institutions enriched: 83 (44.6% of institutions with OSM IDs)
- OSM fetch errors: 34 (18.3%)
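The percentages above use two different denominators (all 304 institutions for OSM ID coverage, the 186 institutions with OSM IDs for everything else). A minimal sketch that recomputes them from the counts in this report; the variable and function names are illustrative only:

```python
# Counts taken from the processing statistics in this report.
total = 304
with_osm_id = 186
fetched = 152
enriched = 83


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


rates = {
    "with_osm_id": pct(with_osm_id, total),                   # share of all institutions
    "fetch_success": pct(fetched, with_osm_id),                # share of institutions with OSM IDs
    "enriched": pct(enriched, with_osm_id),                    # share of institutions with OSM IDs
    "fetch_errors": pct(with_osm_id - fetched, with_osm_id),   # 34 failed fetches
    "enriched_of_fetched": pct(enriched, fetched),             # share of fetched records
}
```

Measured against the 152 records that were actually fetched rather than all 186 OSM IDs, the enrichment rate is 54.6%.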
Enrichment Breakdown
| Enrichment Type | Count | Description |
|---|---|---|
| Street addresses | 33 | Added detailed street addresses from addr:street and addr:housenumber tags |
| Contact information | 19 | Added phone numbers and/or email addresses |
| Websites | 16 | Added institutional website URLs |
| Alternative names | 13 | Added alternative, official, or multilingual names |
| Opening hours | 10 | Added opening hours information |
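The enrichment types in the table correspond to standard OSM tag keys. The sketch below shows one way such a mapping could look. It is not the code in the project scripts: the output field names (`street_address`, `alternative_names`, and so on) and the choice to also check the `contact:*` variants are assumptions made for illustration.

```python
def first_tag(tags: dict, *keys: str):
    """Return the first non-empty value among the given tag keys, else None."""
    for key in keys:
        value = tags.get(key)
        if value:
            return value
    return None


def extract_enrichment(tags: dict) -> dict:
    """Map raw OSM tags to enrichment fields (field names are hypothetical)."""
    out = {}

    # Street address: addr:street, optionally followed by addr:housenumber.
    street = tags.get("addr:street")
    if street:
        number = tags.get("addr:housenumber")
        out["street_address"] = f"{street} {number}" if number else street

    # Contact details and website; OSM uses both plain and contact:* keys.
    for field, keys in (
        ("phone", ("phone", "contact:phone")),
        ("email", ("email", "contact:email")),
        ("website", ("website", "contact:website")),
    ):
        value = first_tag(tags, *keys)
        if value:
            out[field] = value

    if tags.get("opening_hours"):
        out["opening_hours"] = tags["opening_hours"]

    # Alternative, official, and multilingual names, without duplicates
    # and without repeating the primary name.
    candidates = [tags[k] for k in ("alt_name", "official_name", "name:en",
                                    "name:es", "name:pt") if tags.get(k)]
    names = [n for n in dict.fromkeys(candidates) if n != tags.get("name")]
    if names:
        out["alternative_names"] = names

    return out
```

For example, tags carrying `addr:street=Avenida Feliciano Coelho` and `addr:housenumber=1502` produce the street address `Avenida Feliciano Coelho 1502`.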
Data Quality
Fetch Success Rate by Country
The OSM enrichment relied on the Overpass API. Errors encountered and the overall outcome:
- 504 Gateway Timeout errors: Server overload during peak processing
- 429 Rate Limiting errors: Managed through 3-second delays and retry logic
- Overall fetch success: 152/186 = 81.7%
Enrichment Quality
Enriched data includes:
- Precise coordinates: Building-level accuracy from OSM nodes/ways
- Structured addresses: Street names, house numbers, postal codes
- Verified contact info: Phone/email from OSM contributors
- Operating hours: Opening hours in OSM standard format
- Multilingual names: Alternative names in English, Spanish, Portuguese
All enrichments are tracked in provenance notes with timestamps and sources.
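A provenance note of this kind could be built as in the sketch below. The record layout (`source`, `timestamp`, `fields_added`) and the function name are hypothetical; the actual structure used in the dataset is not shown in this report.

```python
from datetime import datetime, timezone


def provenance_note(osm_type: str, osm_id: int, fields, when=None) -> dict:
    """Build a provenance entry for one enrichment (layout is an assumption)."""
    when = when or datetime.now(timezone.utc)
    return {
        # Public URL of the OSM element the data came from.
        "source": f"https://www.openstreetmap.org/{osm_type}/{osm_id}",
        # ISO 8601 timestamp in UTC, second precision.
        "timestamp": when.isoformat(timespec="seconds"),
        # Which fields this enrichment added, in a stable order.
        "fields_added": sorted(fields),
    }
```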
Technical Implementation
Scripts Created
- scripts/enrich_from_osm.py (569 lines)
  - Original implementation with retry logic
  - Rate limiting: 2 seconds between requests
  - Timeout handling with exponential backoff
- scripts/enrich_from_osm_batched.py (452 lines)
  - Batch processing (20 institutions per batch)
  - Incremental progress saving
  - Resilient to timeouts
- scripts/resume_osm_enrichment.py (365 lines)
  - Resume from institution 101
  - Extended rate limiting (3 seconds)
  - Completed remaining 204 institutions
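The batching and resume behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of the scripts: the function names and the checkpoint layout are hypothetical, and a JSON checkpoint is used here only to keep the example free of third-party dependencies (the real output is YAML).

```python
import json
from pathlib import Path

BATCH_SIZE = 20  # institutions per batch, as in the batched script


def batches(items, size=BATCH_SIZE, start=0):
    """Yield (offset, batch) pairs, beginning at the 0-based index `start`."""
    for offset in range(start, len(items), size):
        yield offset, items[offset:offset + size]


def run_batched(institutions, enrich, checkpoint: Path, size=BATCH_SIZE):
    """Enrich in batches and save progress after each completed batch."""
    state = {"next_index": 0, "records": []}
    if checkpoint.exists():
        # Resume from the last fully completed batch.
        state = json.loads(checkpoint.read_text(encoding="utf-8"))

    for offset, batch in batches(institutions, size, state["next_index"]):
        # Finish the whole batch before touching the saved state, so a
        # failure mid-batch never leaves a half-written checkpoint.
        enriched = [enrich(item) for item in batch]
        state["records"].extend(enriched)
        state["next_index"] = offset + len(batch)
        checkpoint.write_text(json.dumps(state, ensure_ascii=False), encoding="utf-8")

    return state["records"]
```

With 304 institutions, resuming at 0-based index 100 (institution 101) yields 11 batches covering the remaining 204 institutions. If a run fails partway through a batch, the next run restarts at the beginning of that batch.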
Overpass API Configuration
- Primary endpoint: https://overpass-api.de/api/interpreter
- Mirror failover: Kumi Systems, OpenStreetMap Russia
- Query timeout: 30 seconds
- Rate limiting: 2-3 seconds between requests
- Retry logic: Max 3 attempts with 10-second delays
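The configuration above can be illustrated with a small fetch helper. This is a sketch, not the project's implementation: the function names, the injectable `post` and `sleep` parameters, and the decision to try every endpoint once per attempt are assumptions. Only the primary endpoint URL comes from this report; mirror URLs would need to be appended to `endpoints` by the caller.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

PRIMARY = "https://overpass-api.de/api/interpreter"
QUERY_TIMEOUT = 30     # seconds, passed to Overpass in the query header
MAX_ATTEMPTS = 3
RETRY_DELAY = 10       # seconds between attempts
RETRYABLE = {429, 504}  # rate limiting and gateway timeout


def build_query(osm_type: str, osm_id: int) -> str:
    """Overpass QL query fetching one element's tags and center point."""
    return f"[out:json][timeout:{QUERY_TIMEOUT}];{osm_type}({osm_id});out center tags;"


def http_post(endpoint: str, query: str) -> dict:
    """POST the query as form data and decode the JSON response."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(endpoint, data=data, timeout=QUERY_TIMEOUT + 5) as resp:
        return json.load(resp)


def fetch_element(osm_type, osm_id, endpoints=(PRIMARY,), post=http_post, sleep=time.sleep):
    """Return the OSM element dict, or None if missing or all attempts fail."""
    query = build_query(osm_type, osm_id)
    for attempt in range(1, MAX_ATTEMPTS + 1):
        for endpoint in endpoints:  # failover: try each endpoint in order
            try:
                elements = post(endpoint, query).get("elements", [])
                return elements[0] if elements else None
            except urllib.error.HTTPError as err:
                if err.code not in RETRYABLE:
                    raise  # e.g. 400 for a malformed query: retrying won't help
            except (urllib.error.URLError, TimeoutError):
                pass  # network failure or timeout: move to the next endpoint
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_DELAY)
    return None
```

The 2-3 second rate limit between requests is left to the caller, for instance a `time.sleep(3)` after each `fetch_element` call in the main loop.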
Example Enrichments
Museu Sacaca (Amapá, Brazil)
- Added: Street address (Avenida Feliciano Coelho 1502)
- Added: Postal code
- Added: Website
Teatro da Paz (Pará, Brazil)
- Added: Full street address
- Added: Phone (+55 91 98590-3523)
- Added: Website
- Added: 2 alternative names
Universidade Federal do Piauí
- Added: Coordinates (building-level precision)
- Added: Complete address
- Added: Phone and email
- Added: Website
- Added: Opening hours
Output Files
- Primary output: data/instances/latin_american_institutions_osm_enriched.yaml (456 KB)
- Processing log: osm_resume_log.txt
- This report: docs/osm_enrichment_report.md
Next Steps
- Generate exports: JSON-LD, CSV, GeoJSON for geographic visualization
- Update PROGRESS.md: Document Phase 4-5 findings
- Manual review: Verify enrichment quality for high-value institutions
- National library outreach: Send emails to request ISIL codes
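For the planned GeoJSON export, a minimal sketch is given below. The input record keys (`name`, `latitude`, `longitude`) are assumptions about the dataset schema, which this report does not show; the output follows the GeoJSON convention of longitude before latitude.

```python
import json


def to_feature(institution: dict):
    """Return a GeoJSON Feature, or None when coordinates are missing."""
    lat = institution.get("latitude")
    lon = institution.get("longitude")
    if lat is None or lon is None:
        return None
    properties = {k: v for k, v in institution.items()
                  if k not in ("latitude", "longitude")}
    return {
        "type": "Feature",
        # GeoJSON positions are [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def to_feature_collection(institutions) -> dict:
    """Wrap all institutions that have coordinates in a FeatureCollection."""
    features = [f for f in map(to_feature, institutions) if f is not None]
    return {"type": "FeatureCollection", "features": features}


# Placeholder records with made-up coordinates, for illustration only.
sample = [
    {"name": "Example Museum", "latitude": -1.0, "longitude": -48.0},
    {"name": "Example Archive"},  # no coordinates: skipped
]
geojson_text = json.dumps(to_feature_collection(sample), ensure_ascii=False, indent=2)
```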
Known Issues
- VIAF API unavailable: All 19 VIAF IDs return HTTP 404 (documented separately)
- Partial OSM coverage: Only 44.6% of institutions with OSM IDs were enriched
  - Reasons: Missing tags in OSM, no building-level data, fetch errors
- Coordinate precision: Not all OSM records improved coordinate precision
  - OSM city-level nodes don't improve existing city-level coordinates
Conclusion
The OSM enrichment successfully added valuable metadata to 83 institutions, improving data quality with verified street addresses, contact information, and precise geographic coordinates. The 44.6% enrichment rate reflects the reality that many heritage institutions lack detailed tagging in OpenStreetMap, highlighting an opportunity for future crowdsourced contributions.
Report generated: 2025-11-06
Author: Global GLAM Dataset Project