# OpenStreetMap Enrichment Report

**Date**: 2025-11-06
**Dataset**: Latin American GLAM Institutions
**Phase**: 5 - OpenStreetMap Enrichment

## Executive Summary

Of the 186 institutions with OpenStreetMap (OSM) IDs, **83** (44.6%) were enriched with OSM data: precise coordinates, street addresses, contact information, opening hours, and alternative names.

## Processing Statistics

- **Total institutions processed**: 304
- **Institutions with OSM IDs**: 186 (61.2% of 304)
- **OSM records successfully fetched**: 152 (81.7% of 186)
- **Institutions enriched**: 83 (44.6% of 186)
- **OSM fetch errors**: 34 (18.3% of 186)
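
The percentages above can be reproduced from the raw counts. The following is a minimal sketch; the variable names and the `pct` helper are illustrative and not taken from the project scripts.

```python
# Counts taken from the statistics listed above.
total_processed = 304
with_osm_ids = 186
fetched = 152
enriched = 83
fetch_errors = 34


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


rates = {
    "osm_id_coverage": pct(with_osm_ids, total_processed),  # share of all institutions
    "fetch_success": pct(fetched, with_osm_ids),             # share of OSM IDs
    "enrichment": pct(enriched, with_osm_ids),               # share of OSM IDs
    "fetch_errors": pct(fetch_errors, with_osm_ids),         # share of OSM IDs
}

# Fetched records plus fetch errors should account for every OSM ID.
assert fetched + fetch_errors == with_osm_ids

for name, value in rates.items():
    print(f"{name}: {value}%")
```

Every rate except OSM ID coverage uses the 186 institutions with OSM IDs as its denominator. A rate relative to the 152 fetched records could be computed the same way with `pct(enriched, fetched)`.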

## Enrichment Breakdown

| Enrichment Type | Count | Description |
|---|---|---|
| **Street addresses** | 33 | Added street addresses from `addr:street` and `addr:housenumber` tags |
| **Contact information** | 19 | Added phone numbers and/or email addresses |
| **Websites** | 16 | Added institutional website URLs |
| **Alternative names** | 13 | Added alternative, official, or multilingual names |
| **Opening hours** | 10 | Added opening hours information |

An institution can receive more than one enrichment type, so the counts sum to 91, more than the 83 institutions enriched.
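
To illustrate how OSM tags could map onto these enrichment types, here is a minimal sketch. The OSM tag keys (`addr:street`, `addr:housenumber`, `phone`, `contact:phone`, `opening_hours`, and so on) are standard OSM keys, but the function name, the output field names, and the choice of name keys are assumptions rather than the project's actual schema.

```python
from typing import Any, Dict, List


def extract_enrichment(tags: Dict[str, str]) -> Dict[str, Any]:
    """Map raw OSM tags to enrichment fields (output field names are hypothetical)."""
    result: Dict[str, Any] = {}

    # Street address: street name followed by house number when both exist.
    street = tags.get("addr:street")
    number = tags.get("addr:housenumber")
    if street:
        result["street_address"] = f"{street} {number}" if number else street

    # Contact details may use either the plain key or the "contact:" prefix.
    phone = tags.get("phone") or tags.get("contact:phone")
    email = tags.get("email") or tags.get("contact:email")
    if phone:
        result["phone"] = phone
    if email:
        result["email"] = email

    website = tags.get("website") or tags.get("contact:website")
    if website:
        result["website"] = website

    # Alternative, official, and language-specific names, skipping duplicates
    # and anything identical to the primary name.
    primary = tags.get("name")
    alt_names: List[str] = []
    for key in ("alt_name", "official_name", "name:en", "name:es", "name:pt"):
        value = tags.get(key)
        if value and value != primary and value not in alt_names:
            alt_names.append(value)
    if alt_names:
        result["alternative_names"] = alt_names

    if "opening_hours" in tags:
        result["opening_hours"] = tags["opening_hours"]

    return result


sample = {"addr:street": "Avenida Feliciano Coelho", "addr:housenumber": "1502"}
print(extract_enrichment(sample))
# {'street_address': 'Avenida Feliciano Coelho 1502'}
```

Fields are only set when the corresponding tag is present, which matches the observation that many OSM records contribute some enrichment types but not others.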

## Data Quality

### Fetch Success Rate by Country

The OSM enrichment relied on the Overpass API. Two kinds of errors occurred during fetching:

- **504 Gateway Timeout errors**: Server overload during peak processing
- **429 Too Many Requests (rate limiting) errors**: Managed through 3-second delays and retry logic

Overall fetch success was 152/186 = 81.7%.
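
The handling of these transient errors might look roughly like the sketch below. It assumes a hypothetical `fetch` callable that returns an HTTP status code and a parsed payload. The 3-second delay comes from this section and the limit of 3 attempts from the Overpass API Configuration section; the function name and structure are illustrative and not the scripts' actual code.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 504}  # rate limited, gateway timeout


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, Optional[dict]]],
    max_attempts: int = 3,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call `fetch` until it succeeds or the attempts run out.

    `fetch` is a hypothetical callable returning (http_status, parsed_json).
    Returns the payload on HTTP 200, or None when the record should be
    counted as a fetch error.
    """
    for attempt in range(1, max_attempts + 1):
        status, payload = fetch()
        if status == 200:
            return payload
        if status not in RETRYABLE_STATUSES:
            return None  # non-transient failure: retrying would not help
        if attempt < max_attempts:
            sleep(delay_seconds)  # pause before the next attempt
    return None
```

Passing `sleep` as a parameter keeps the function testable without real waiting; production code would leave the default `time.sleep` in place.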

### Enrichment Quality

Enriched data includes:

1. **Precise coordinates**: Building-level accuracy from OSM nodes/ways
2. **Structured addresses**: Street names, house numbers, postal codes
3. **Verified contact info**: Phone/email as recorded by OSM contributors
4. **Operating hours**: Opening hours in the OSM `opening_hours` tag format
5. **Multilingual names**: Alternative names in English, Spanish, Portuguese

All enrichments are tracked in provenance notes with timestamps and sources.
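
As an illustration of provenance tracking, the sketch below fills in missing fields and records what was added, when, and from which OSM element. The `provenance_notes` key, the note wording, and the policy of never overwriting existing values are all assumptions; the report does not specify the actual record schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def apply_enrichment(record: Dict[str, Any], new_fields: Dict[str, Any],
                     osm_id: str, now: datetime) -> List[str]:
    """Fill in missing fields on `record` and log the additions.

    Existing non-empty values are left untouched (an assumed merge policy).
    Returns the names of the fields that were added.
    """
    added: List[str] = []
    for key, value in new_fields.items():
        if not record.get(key):
            record[key] = value
            added.append(key)
    if added:
        # One note per enrichment pass: timestamp, source, and fields added.
        note = f"{now.isoformat()} OpenStreetMap ({osm_id}): added {', '.join(added)}"
        record.setdefault("provenance_notes", []).append(note)
    return added


record = {"name": "Example Museum", "website": "https://example.org"}
apply_enrichment(
    record,
    {"website": "https://other.example", "opening_hours": "Mo-Fr 09:00-17:00"},
    osm_id="node/1",  # placeholder identifier, not a real OSM element
    now=datetime(2025, 11, 6, tzinfo=timezone.utc),
)
print(record["provenance_notes"])
```

In this sketch the existing website is kept and only `opening_hours` is added, so the note lists a single field.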

## Technical Implementation

### Scripts Created

1. **`scripts/enrich_from_osm.py`** (569 lines)
   - Original implementation with retry logic
   - Rate limiting: 2 seconds between requests
   - Timeout handling with exponential backoff

2. **`scripts/enrich_from_osm_batched.py`** (452 lines)
   - Batch processing (20 institutions per batch)
   - Incremental progress saving
   - Resilient to timeouts

3. **`scripts/resume_osm_enrichment.py`** (365 lines)
   - Resumes from institution 101
   - Extended rate limiting (3 seconds between requests)
   - Completed the remaining 204 institutions (101-304)
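
The batching and resume behavior described for the second and third scripts can be sketched as follows. This is a minimal illustration, not the scripts' actual code: the `enrich` and `save` callables and the checkpoint layout are hypothetical, while the batch size of 20 and the resume position of 101 come from the list above.

```python
from typing import Callable, Iterator, List, Tuple

BATCH_SIZE = 20  # institutions per batch, as in the batched script


def batches(items: List[dict], start: int = 1,
            size: int = BATCH_SIZE) -> Iterator[Tuple[int, List[dict]]]:
    """Yield (1-based index of the first item, batch) from position `start`."""
    for offset in range(start - 1, len(items), size):
        yield offset + 1, items[offset:offset + size]


def run(items: List[dict], enrich: Callable[[dict], dict],
        save: Callable[[dict], None], start: int = 1,
        size: int = BATCH_SIZE) -> int:
    """Enrich items from `start` onward, saving progress after every batch.

    `enrich` and `save` are hypothetical stand-ins for the OSM lookup and
    for writing a checkpoint to disk. Returns the number of items processed.
    """
    processed = 0
    for first, batch in batches(items, start, size):
        for i, item in enumerate(batch):
            items[first - 1 + i] = enrich(item)
        processed += len(batch)
        # Persist after each batch so a timeout loses at most one batch of work.
        save({"next_index": first + len(batch), "items": items})
    return processed


institutions = [{"id": n} for n in range(1, 305)]
count = run(institutions, enrich=lambda inst: inst, save=lambda state: None, start=101)
print(count)  # 204
```

Storing `next_index` in each checkpoint is one way to let a later run pick up where the previous one stopped, which is the role the resume script plays here.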

### Overpass API Configuration

- **Primary endpoint**: `https://overpass-api.de/api/interpreter`
- **Mirror failover**: Kumi Systems, OpenStreetMap Russia
- **Query timeout**: 30 seconds
- **Rate limiting**: 2-3 seconds between requests
- **Retry logic**: Max 3 attempts with 10-second delays
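
The settings above could be captured in code along these lines. This is a sketch under stated assumptions: the primary endpoint is taken from this report, but the two mirror URLs are commonly published addresses for those mirrors and should be verified before use. The `OverpassConfig`, `build_query`, and `first_success` names are illustrative, and the `attempt` callable stands in for whatever performs the HTTP request.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class OverpassConfig:
    # Primary endpoint is from the report; the mirror URLs are assumptions.
    endpoints: Tuple[str, ...] = (
        "https://overpass-api.de/api/interpreter",
        "https://overpass.kumi.systems/api/interpreter",
        "https://overpass.openstreetmap.ru/api/interpreter",
    )
    query_timeout: int = 30      # seconds, passed to the server in the query
    request_delay: float = 2.0   # seconds between requests (2-3 in the report)
    max_attempts: int = 3        # attempts per request
    retry_delay: float = 10.0    # seconds between attempts


def build_query(osm_type: str, osm_id: int, timeout: int = 30) -> str:
    """Build an Overpass QL query for a single node, way, or relation."""
    if osm_type not in {"node", "way", "relation"}:
        raise ValueError(f"unsupported OSM type: {osm_type}")
    # "out center" returns tags plus a center point for ways and relations.
    return f"[out:json][timeout:{timeout}];{osm_type}({osm_id});out center;"


def first_success(endpoints: Iterable[str],
                  attempt: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Try each endpoint in order and return the first non-None result."""
    for url in endpoints:
        result = attempt(url)
        if result is not None:
            return result
    return None


config = OverpassConfig()
print(build_query("way", 42, config.query_timeout))
# [out:json][timeout:30];way(42);out center;
```

Keeping the endpoint list ordered puts the primary server first and falls back to mirrors only when an attempt returns nothing.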

## Example Enrichments

### Museu Sacaca (Amapá, Brazil)
- **Added**: Street address (Avenida Feliciano Coelho 1502)
- **Added**: Postal code
- **Added**: Website

### Teatro da Paz (Pará, Brazil)
- **Added**: Full street address
- **Added**: Phone (+55 91 98590-3523)
- **Added**: Website
- **Added**: 2 alternative names

### Universidade Federal do Piauí
- **Added**: Coordinates (building-level precision)
- **Added**: Complete address
- **Added**: Phone and email
- **Added**: Website
- **Added**: Opening hours

## Output Files

- **Primary output**: `data/instances/latin_american_institutions_osm_enriched.yaml` (456 KB)
- **Processing log**: `osm_resume_log.txt`
- **This report**: `docs/osm_enrichment_report.md`

## Next Steps

1. **Generate exports**: JSON-LD, CSV, GeoJSON for geographic visualization
2. **Update `PROGRESS.md`**: Document Phase 4-5 findings
3. **Manual review**: Verify enrichment quality for high-value institutions
4. **National library outreach**: Send emails to request ISIL codes

## Known Issues

1. **VIAF API unavailable**: All 19 VIAF IDs return HTTP 404 (documented separately)
2. **Partial OSM coverage**: Only 83 of the 186 institutions with OSM IDs (44.6%) were enriched
   - Reasons: Missing tags in OSM, no building-level data, fetch errors
3. **Coordinate precision**: Not all OSM records improved coordinate precision
   - OSM city-level nodes don't improve existing city-level coordinates

## Conclusion

The OSM enrichment added metadata to 83 institutions, improving data quality with street addresses, contact information, and precise geographic coordinates. The 44.6% enrichment rate reflects the fact that many heritage institutions lack detailed tagging in OpenStreetMap, which is an opportunity for future crowdsourced contributions.

---

**Report generated**: 2025-11-06
**Author**: Global GLAM Dataset Project