OpenStreetMap Enrichment Report

Date: 2025-11-06
Dataset: Latin American GLAM Institutions
Phase: 5 - OpenStreetMap Enrichment

Executive Summary

Enriched 83 of the 186 institutions that have OSM IDs (44.6%, or 27.3% of all 304 institutions processed) with OpenStreetMap data, adding precise coordinates, street addresses, contact information, websites, opening hours, and alternative names.

Processing Statistics

  • Total institutions processed: 304
  • Institutions with OSM IDs: 186 (61.2%)
  • OSM records successfully fetched: 152 (81.7% fetch success rate)
  • Institutions enriched: 83 (44.6% of institutions with OSM IDs)
  • OSM fetch errors: 34 (18.3%)
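
The percentages above use different denominators, so a short worked calculation may help when checking them. The sketch below recomputes each rate from the reported counts; the last two ratios are not stated in the report and are derived here only for context.

```python
# Counts taken from the statistics above.
total_processed = 304
with_osm_id = 186
fetched = 152
enriched = 83
fetch_errors = 34

def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Successful fetches and fetch errors together account for every OSM ID.
assert fetched + fetch_errors == with_osm_id

rates = {
    "with_osm_id / total_processed": pct(with_osm_id, total_processed),
    "fetched / with_osm_id": pct(fetched, with_osm_id),
    "fetch_errors / with_osm_id": pct(fetch_errors, with_osm_id),
    "enriched / with_osm_id": pct(enriched, with_osm_id),
    # Derived here, not stated in the report:
    "enriched / fetched": pct(enriched, fetched),
    "enriched / total_processed": pct(enriched, total_processed),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```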

Enrichment Breakdown

| Enrichment Type | Count | Description |
|---|---|---|
| Street addresses | 33 | Added detailed street addresses from addr:street and addr:housenumber tags |
| Contact information | 19 | Added phone numbers and/or email addresses |
| Websites | 16 | Added institutional website URLs |
| Alternative names | 13 | Added alternative, official, or multilingual names |
| Opening hours | 10 | Added opening hours information |
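
As an illustration of how these categories could be derived from an element's tags, here is a minimal sketch. Only addr:street and addr:housenumber are named in this report; the other tag keys follow common OSM tagging conventions and are assumptions about what the scripts read. The names `ENRICHMENT_TAGS` and `classify_enrichments` are hypothetical.

```python
from typing import Dict, List

# addr:street / addr:housenumber come from the report; every other key is an
# assumption based on common OSM tagging conventions.
ENRICHMENT_TAGS: Dict[str, List[str]] = {
    "street_address": ["addr:street", "addr:housenumber"],
    "contact": ["phone", "contact:phone", "email", "contact:email"],
    "website": ["website", "contact:website"],
    "alternative_names": ["alt_name", "official_name", "name:en", "name:es", "name:pt"],
    "opening_hours": ["opening_hours"],
}

def classify_enrichments(tags: Dict[str, str]) -> List[str]:
    """Return the enrichment types for which the element has at least one non-empty tag."""
    found = []
    for enrichment_type, keys in ENRICHMENT_TAGS.items():
        if any(tags.get(key, "").strip() for key in keys):
            found.append(enrichment_type)
    return found

# Hypothetical tag set for illustration (not a real OSM element).
example_tags = {
    "name": "Museu Exemplo",
    "addr:street": "Rua Exemplo",
    "addr:housenumber": "100",
    "website": "https://example.org",
}
print(classify_enrichments(example_tags))  # ['street_address', 'website']
```

Because one element can match several categories, the per-type counts can sum to more than the number of enriched institutions (here 91 across types versus 83 institutions).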

Data Quality

Fetch Success Rate

The OSM enrichment relied on the Overpass API, which returned two classes of errors during processing:

  • 504 Gateway Timeout errors: Server overload during peak processing
  • 429 Rate Limiting errors: Managed through 3-second delays and retry logic
  • Overall fetch success: 152/186 = 81.7%
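
The retry handling for these two status codes can be sketched as follows. This is not the project's code: `fetch_with_retry` and its `send` callable are hypothetical, and the defaults (3 attempts, 10-second delay) are taken from the Overpass configuration listed in this report.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 504}  # the two error classes reported above

def fetch_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    retry_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call send() until it returns HTTP 200 or the attempts run out.

    send is any zero-argument callable returning (status_code, body).
    Returns the body on success, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES:
            return None  # other errors are treated as permanent here
        if attempt < max_attempts:
            sleep(retry_delay)  # wait before the next attempt
    return None

# Demonstration with a fake sender, so no network access or real waiting occurs.
fake_responses = iter([(504, ""), (429, ""), (200, '{"elements": []}')])
body = fetch_with_retry(lambda: next(fake_responses), sleep=lambda seconds: None)
print(body)
```

Passing `send` and `sleep` as parameters keeps the retry policy testable without contacting an Overpass server.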

Enrichment Quality

Enriched data includes:

  1. Precise coordinates: Building-level accuracy from OSM nodes/ways
  2. Structured addresses: Street names, house numbers, postal codes
  3. Contact info: Phone/email as recorded by OSM contributors
  4. Operating hours: Opening hours in OSM standard format
  5. Multilingual names: Alternative names in English, Spanish, Portuguese

All enrichments are tracked in provenance notes with timestamps and sources.
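
A provenance note of this kind might look like the sketch below. The record layout and the names `add_provenance_note` and `provenance_notes` are hypothetical; the report states only that notes carry a timestamp and a source.

```python
from datetime import datetime, timezone
from typing import Dict

def add_provenance_note(record: Dict, field: str, osm_type: str, osm_id: int) -> None:
    """Append a note recording which field was enriched, from which OSM element, and when."""
    note = {
        "field": field,
        "source": f"OpenStreetMap {osm_type}/{osm_id}",
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    record.setdefault("provenance_notes", []).append(note)

# Hypothetical record and OSM ID, for illustration only.
institution = {"name": "Example Museum"}
add_provenance_note(institution, "street_address", "way", 123456)
print(institution["provenance_notes"])
```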

Technical Implementation

Scripts Created

  1. scripts/enrich_from_osm.py (569 lines)

    • Original implementation with retry logic
    • Rate limiting: 2 seconds between requests
    • Timeout handling with exponential backoff
  2. scripts/enrich_from_osm_batched.py (452 lines)

    • Batch processing (20 institutions per batch)
    • Incremental progress saving
    • Resilient to timeouts
  3. scripts/resume_osm_enrichment.py (365 lines)

    • Resumes processing at institution 101
    • Extended rate limiting (3 seconds)
    • Completed remaining 204 institutions
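
The batching and resume behaviour described for the second and third scripts can be sketched as below. The function names and the JSON checkpoint format are assumptions made for illustration; only the batch size of 20 and the resume point (institution 101) come from this report.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple

BATCH_SIZE = 20  # from the report

def batches(items: List, start: int = 0, size: int = BATCH_SIZE) -> Iterator[Tuple[int, List]]:
    """Yield (start_index, batch) pairs, beginning at a 0-based start index."""
    for i in range(start, len(items), size):
        yield i, items[i:i + size]

def load_checkpoint(path: Path) -> int:
    """Return the 0-based index to resume from, or 0 if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())["next_index"]
    return 0

def save_checkpoint(path: Path, next_index: int) -> None:
    """Record how far processing got so that a later run can resume."""
    path.write_text(json.dumps({"next_index": next_index}))

# Stand-in for the 304 institution records.
institutions = list(range(1, 305))

# "Institution 101" is index 100 when counting from zero.
remaining = [batch for _, batch in batches(institutions, start=100)]
print(len(remaining), sum(len(batch) for batch in remaining))  # 11 204
```

Saving the checkpoint after each batch means a timeout costs at most one batch of work.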

Overpass API Configuration

  • Primary endpoint: https://overpass-api.de/api/interpreter
  • Mirror failover: Kumi Systems, OpenStreetMap Russia
  • Query timeout: 30 seconds
  • Rate limiting: 2-3 seconds between requests
  • Retry logic: Max 3 attempts with 10-second delays
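
Put together, this configuration might translate into something like the following sketch, using only the Python standard library. The primary endpoint is the one listed above; the two mirror URLs are assumptions (the report names the operators, not the URLs), and `build_query` and `fetch_element` are hypothetical names.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

# Primary endpoint from the report; mirror URLs are assumed, not confirmed by it.
ENDPOINTS = [
    "https://overpass-api.de/api/interpreter",
    "https://overpass.kumi.systems/api/interpreter",
    "https://overpass.openstreetmap.ru/api/interpreter",
]
QUERY_TIMEOUT = 30   # seconds, passed to Overpass as the query timeout
REQUEST_DELAY = 3.0  # seconds between requests (the report gives 2-3 s)

def build_query(osm_type: str, osm_id: int) -> str:
    """Build an Overpass QL query for one element, with a center point for ways/relations."""
    if osm_type not in {"node", "way", "relation"}:
        raise ValueError(f"unsupported OSM type: {osm_type}")
    return f"[out:json][timeout:{QUERY_TIMEOUT}];{osm_type}({osm_id});out center;"

def fetch_element(osm_type: str, osm_id: int) -> Optional[dict]:
    """Try each endpoint in order; return the first element found, else None."""
    data = urllib.parse.urlencode({"data": build_query(osm_type, osm_id)}).encode()
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, data=data, timeout=QUERY_TIMEOUT + 5) as resp:
                elements = json.load(resp).get("elements", [])
                return elements[0] if elements else None
        except (OSError, json.JSONDecodeError):
            continue  # network/HTTP error or bad payload: try the next mirror
        finally:
            time.sleep(REQUEST_DELAY)  # pace requests regardless of outcome
    return None

print(build_query("way", 123456))  # hypothetical ID, no request is sent here
```

The client-side timeout is set slightly above the server-side query timeout so that the server has a chance to report its own timeout first.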

Example Enrichments

Museu Sacaca (Amapá, Brazil)

  • Added: Street address (Avenida Feliciano Coelho 1502)
  • Added: Postal code
  • Added: Website

Teatro da Paz (Pará, Brazil)

  • Added: Full street address
  • Added: Phone (+55 91 98590-3523)
  • Added: Website
  • Added: 2 alternative names

Universidade Federal do Piauí

  • Added: Coordinates (building-level precision)
  • Added: Complete address
  • Added: Phone and email
  • Added: Website
  • Added: Opening hours

Output Files

  • Primary output: data/instances/latin_american_institutions_osm_enriched.yaml (456 KB)
  • Processing log: osm_resume_log.txt
  • This report: docs/osm_enrichment_report.md

Next Steps

  1. Generate exports: JSON-LD, CSV, GeoJSON for geographic visualization
  2. Update PROGRESS.md: Document Phase 4-5 findings
  3. Manual review: Verify enrichment quality for high-value institutions
  4. National library outreach: Send emails to request ISIL codes

Known Issues

  1. VIAF API unavailable: All 19 VIAF IDs return HTTP 404 (documented separately)
  2. Partial OSM coverage: Only 44.6% of institutions with OSM IDs were enriched
    • Reasons: Missing tags in OSM, no building-level data, fetch errors
  3. Coordinate precision: Not all OSM records improved coordinate precision
    • OSM city-level nodes don't improve existing city-level coordinates
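
The coordinate rule in item 3 amounts to a precision comparison. A minimal sketch, assuming each coordinate pair carries a precision label (the labels and their ranking below are hypothetical, not taken from the dataset schema):

```python
# Hypothetical precision labels, ordered from coarsest to finest.
PRECISION_RANK = {"country": 0, "region": 1, "city": 2, "street": 3, "building": 4}

def should_replace_coordinates(existing: str, candidate: str) -> bool:
    """Return True only if the OSM candidate is strictly finer than the stored value."""
    return PRECISION_RANK[candidate] > PRECISION_RANK[existing]

print(should_replace_coordinates("city", "city"))      # False: no improvement
print(should_replace_coordinates("city", "building"))  # True: finer precision
```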

Conclusion

The OSM enrichment added metadata to 83 institutions, improving data quality with street addresses, contact information, and, where OSM had building-level data, more precise geographic coordinates. The 44.6% enrichment rate reflects both fetch errors (34 of 186 requests) and the fact that many heritage institutions lack detailed tagging in OpenStreetMap, which highlights an opportunity for future crowdsourced contributions.


Report generated: 2025-11-06
Author: Global GLAM Dataset Project