6.6 KiB
6.6 KiB
Brazilian GLAM Geocoding Enrichment Report - v3.0
Generated: 2025-11-06 09:34:04
Summary
This report documents the geocoding enrichment process for Brazilian heritage institutions using the Nominatim API (OpenStreetMap).
Input/Output Files
- Input:
brazilian_institutions_curated_v2.yaml - Output:
brazilian_institutions_geocoded_v3.yaml - Cache:
geocoding_cache.yaml
Overall Statistics
| Metric | Before (v2) | After (v3) | Change |
|---|---|---|---|
| Total records | 97 | 97 | - |
| Records with cities | 8 (8.2%) | 58 (59.8%) | +50 |
| Records with coordinates | 0 (0.0%) | 50 (51.5%) | +50 |
| OpenStreetMap identifiers | 0 | 50 | +50 |
Geocoding Performance
| Category | Count | Percentage |
|---|---|---|
| Already had cities | 8 | 8.2% |
| Successfully geocoded | 50 | 51.5% |
| Failed geocoding | 39 | 40.2% |
| Total with cities (v3) | 58 | 59.8% |
Target Achievement
- Target: 60% city coverage (58 records minimum)
- Achieved: 58 records (59.8%)
- Status: ✓ TARGET MET
Geographic Distribution
Cities Found (41 unique cities)
Top 15 cities by institution count:
- Belém: 4 institutions
- Brasília: 3 institutions
- Recife: 3 institutions
- Rio de Janeiro: 3 institutions
- Rio Branco: 2 institutions
- União dos Palmares: 2 institutions
- Macapá: 2 institutions
- Manaus: 2 institutions
- Campo Grande: 2 institutions
- Teresina: 2 institutions
- Natal: 2 institutions
- Palmas: 2 institutions
- Maceió: 1 institution
- Santo Amaro: 1 institution
- Salvador: 1 institution
States with Geocoded Institutions (26 states)
- ACRE: 2 institutions in 1 city
- ALAGOAS: 3 institutions in 2 cities
- AMAPÁ: 2 institutions in 1 city
- AMAZONAS: 2 institutions in 1 city
- BAHIA: 2 institutions in 2 cities
- CEARÁ: 2 institutions in 2 cities
- DISTRITO FEDERAL: 3 institutions in 1 city
- GOIÁS: 1 institutions in 1 city
- MARANHÃO: 3 institutions in 3 cities
- MATO GROSSO: 1 institutions in 1 city
- MATO GROSSO DO SUL: 2 institutions in 1 city
- MINAS GERAIS: 2 institutions in 2 cities
- PARANÁ: 3 institutions in 3 cities
- PARAÍBA: 3 institutions in 3 cities
- PARÁ: 4 institutions in 1 city
- PERNAMBUCO: 3 institutions in 1 city
- PIAUÍ: 2 institutions in 1 city
- RIO DE JANEIRO: 2 institutions in 1 city
- RIO GRANDE DO NORTE: 3 institutions in 2 cities
- RIO GRANDE DO SUL: 2 institutions in 2 cities
- RONDÔNIA: 2 institutions in 2 cities
- RORAIMA: 1 institutions in 1 city
- SANTA CATARINA: 1 institutions in 1 city
- SERGIPE: 3 institutions in 3 cities
- SÃO PAULO: 2 institutions in 2 cities
- TOCANTINS: 2 institutions in 1 city
Failed Geocoding Attempts (39 institutions)
These institutions have state information but could not be geocoded:
- Fundação de Cultura Elias Mansour (OFFICIAL_INSTITUTION) - ACRE
- UFAC Repository (EDUCATION_PROVIDER) - ACRE
- Instituto Histórico e Geográfico de Alagoas (RESEARCH_CENTER) - ALAGOAS
- SECULT (OFFICIAL_INSTITUTION) - AMAPÁ
- Museu de Arqueologia e Etnologia (MUSEUM) - AMAPÁ
- CEPAP-UNIFAP (EDUCATION_PROVIDER) - AMAPÁ
- Centro Cultural Povos da Amazônia (MIXED) - AMAZONAS
- FPC/IPAC (OFFICIAL_INSTITUTION) - BAHIA
- UFBA Repository (EDUCATION_PROVIDER) - BAHIA
- UFC Repository (EDUCATION_PROVIDER) - CEARÁ
- Mapa Cultural (OFFICIAL_INSTITUTION) - CEARÁ
- UFES Digital Libraries (EDUCATION_PROVIDER) - ESPÍRITO SANTO
- State Archives (ARCHIVE) - ESPÍRITO SANTO
- UNESCO Goiás Velho (MUSEUM) - GOIÁS
- UFG Repositories (EDUCATION_PROVIDER) - GOIÁS
- Casa das Minas/Casa de Nagô (MIXED) - MARANHÃO
- MUSEAR/UFMT (MUSEUM) - MATO GROSSO
- Guarani-Kaiowá Projects (MIXED) - MATO GROSSO DO SUL
- UFMS Repositories (EDUCATION_PROVIDER) - MATO GROSSO DO SUL
- UFMG Tainacan Lab (EDUCATION_PROVIDER) - MINAS GERAIS
- MM Gerdau (MIXED) - MINAS GERAIS
- DEAP Archives (ARCHIVE) - PARANÁ
- UFPB/UEPB (EDUCATION_PROVIDER) - PARAÍBA
- MEPE/IAHGP (MIXED) - PERNAMBUCO
- FUMDHAM (MIXED) - PIAUÍ
- FCRB (OFFICIAL_INSTITUTION) - RIO DE JANEIRO
- MAR/MAM (MUSEUM) - RIO DE JANEIRO
- Museu Tronco, Ramos e Raízes (MUSEUM) - RIO GRANDE DO NORTE
- UFRGS LUME (EDUCATION_PROVIDER) - RIO GRANDE DO SUL
- Railway Museum (MUSEUM) - RONDÔNIA
- Instituto Insikiran (MIXED) - RORAIMA
- UFSC Digital Art (EDUCATION_PROVIDER) - SANTA CATARINA
- Tainacan implementations (MIXED) - SANTA CATARINA
- APESP (MIXED) - SÃO PAULO
- USP/UNICAMP/UNESP (EDUCATION_PROVIDER) - SÃO PAULO
- Jalapão Heritage (RESEARCH_CENTER) - TOCANTINS
- Secult (OFFICIAL_INSTITUTION) - TOCANTINS
- Brasiliana Museus (MUSEUM) - TOCANTINS
- Hemeroteca Digital (MIXED) - TOCANTINS
API Cache Statistics
| Metric | Value |
|---|---|
| Total cache entries | 89 |
| Successful lookups | 50 (56.2%) |
| Failed lookups | 39 (43.8%) |
Data Quality Enhancements
The geocoding process added:
- City names - Extracted from OpenStreetMap address data
- Geographic coordinates - Latitude/longitude for mapping
- OpenStreetMap identifiers - OSM type/ID for cross-referencing
- Provenance updates - Extraction timestamps and confidence adjustments
Confidence Score Adjustments
- Successfully geocoded records received a +0.05 confidence boost (capped at 0.85)
- Extraction method updated to include "+ Nominatim geocoding"
Next Steps
Recommended Improvements
- Manual verification of failed geocoding attempts (39 institutions)
- Website enrichment - Extract URLs to improve coverage from current 9.3%
- Wikidata integration - Cross-reference institutions with Wikidata Q-IDs
- Address enrichment - Add street addresses where available
- Collection metadata - Extract collection information from institutional websites
Priority Actions
- Review failed geocoding cases to identify patterns
- Attempt alternative geocoding strategies (city+state only, abbreviations, etc.)
- Cross-reference with IBRAM registry for official museum locations
- Implement web scraping for institutional websites
Technical Notes
- API: OpenStreetMap Nominatim
- Rate limiting: 1.1 seconds per request
- Total processing time: ~1.6 minutes
- Cache format: YAML (persistent across runs)
- User-Agent: GLAM-Data-Extraction/0.2.0
Report Version: 3.0
Data Version: v3 (geocoded)
Schema Compliance: LinkML v0.2.0
Generated by: generate_geocoding_report.py