165 lines
5.3 KiB
Markdown
165 lines
5.3 KiB
Markdown
# Mexican Institutions Geocoding - Detailed Statistics
|
|
|
|
**Generated**: 2025-11-06
|
|
**Source**: mexican_institutions_geocoded.yaml
|
|
**Script**: geocode_mexican_institutions.py
|
|
|
|
## Executive Summary
|
|
|
|
- **Total institutions**: 117
|
|
- **Successfully geocoded**: 58 institutions
|
|
- **Coverage**: 69.9% of geocodable institutions (58/83)
|
|
- **Overall coverage**: 49.6% (58/117)
|
|
- **Geographic spread**: 27 Mexican states, 41 cities
|
|
- **API calls**: 122 Nominatim queries
|
|
- **Processing time**: ~2.5 minutes
|
|
|
|
## Coverage Analysis
|
|
|
|
### Geocoding Performance
|
|
|
|
| Category | Count | Percentage |
|
|
|----------|-------|------------|
|
|
| Institutions with coordinates | 58 | 49.6% of total |
|
|
| Institutions with location data (geocodable) | 83 | 70.9% of total |
|
|
| Institutions without location data | 34 | 29.1% of total |
|
|
| **Geocoding success rate** | **58/83** | **69.9%** |
|
|
| Failed geocoding attempts | 25 | 30.1% of geocodable |
|
|
|
|
### Why 34 Institutions Lack Location Data
|
|
|
|
The 34 institutions without location data are primarily:
|
|
- National-level institutions (e.g., "Archivo General de la Nación")
|
|
- Digital-only platforms (e.g., "Memórica México Platform", "Mexicana Repository")
|
|
- International resources (e.g., "WorldCat.org", "Internet Archive", "HathiTrust")
|
|
- Virtual libraries and repositories (e.g., "Red de Humanidades Digitales")
|
|
|
|
These institutions are correctly modeled without physical locations in the data.
|
|
|
|
## Geographic Coverage
|
|
|
|
### States Represented: 27 of 32 Mexican States
|
|
|
|
| State | Institution Count | Geocoded |
|
|
|-------|-------------------|----------|
|
|
| ZACATECAS | 17 | 11 |
|
|
| CHIHUAHUA | 5 | 5 |
|
|
| JALISCO | 5 | 5 |
|
|
| AGUASCALIENTES | 4 | 3 |
|
|
| CAMPECHE | 4 | 3 |
|
|
| CHIAPAS | 4 | 3 |
|
|
| COAHUILA | 4 | 4 |
|
|
| MÉXICO CITY | 4 | 4 |
|
|
| OAXACA | 4 | 4 |
|
|
| DURANGO | 3 | 3 |
|
|
| GUANAJUATO | 3 | 3 |
|
|
| COLIMA | 2 | 2 |
|
|
| MICHOACÁN | 2 | 2 |
|
|
| MORELOS | 1 | 1 |
|
|
| NUEVO LEÓN | 2 | 2 |
|
|
| PUEBLA | 2 | 2 |
|
|
| QUERÉTARO | 1 | 1 |
|
|
| QUINTANA ROO | 3 | 1 |
|
|
| SINALOA | 2 | 1 |
|
|
| SONORA | 2 | 2 |
|
|
| TABASCO | 1 | 1 |
|
|
| TAMAULIPAS | 2 | 1 |
|
|
| TLAXCALA | 1 | 1 |
|
|
| VERACRUZ | 1 | 1 |
|
|
| YUCATÁN | 2 | 1 |
|
|
| BAJA CALIFORNIA | 1 | 1 |
|
|
| BAJA CALIFORNIA SUR | 1 | 1 |
|
|
|
|
### Cities with Most Institutions (Top 15)
|
|
|
|
1. Zacatecas - 6 institutions
|
|
2. Ciudad de México - 4 institutions
|
|
3. Oaxaca - 4 institutions
|
|
4. Aguascalientes - 3 institutions
|
|
5. Chihuahua - 2 institutions
|
|
6. Saltillo - 2 institutions
|
|
7. Colima - 2 institutions
|
|
8. Durango - 2 institutions
|
|
9. Guadalajara - 1 institution
|
|
10. Puebla - 1 institution
|
|
11. Morelia - 2 institutions
|
|
12. Mérida - 1 institution
|
|
13. Xalapa - 1 institution
|
|
14. Hermosillo - 1 institution
|
|
15. Villahermosa - 1 institution
|
|
|
|
## Institution Type Distribution
|
|
|
|
| Type | Count | Percentage |
|
|
|------|-------|------------|
|
|
| MUSEUM | 38 | 32.5% |
|
|
| MIXED | 33 | 28.2% |
|
|
| ARCHIVE | 18 | 15.4% |
|
|
| LIBRARY | 14 | 12.0% |
|
|
| OFFICIAL_INSTITUTION | 8 | 6.8% |
|
|
| EDUCATION_PROVIDER | 6 | 5.1% |
|
|
|
|
## Geocoding Methodology
|
|
|
|
### Fallback Query Strategies
|
|
|
|
The geocoding script employed a 4-tier fallback strategy:
|
|
|
|
1. **Full name + region + Mexico** (e.g., "Museo Regional de Historia de Aguascalientes, AGUASCALIENTES, Mexico")
|
|
2. **Remove parenthetical content** (e.g., remove "(INAH)" acronyms)
|
|
3. **Extract distinctive keywords** (e.g., "Museo Nacional de...", "Archivo Histórico de...")
|
|
4. **Generic institution type + region** (e.g., "Museo, ZACATECAS, Mexico")
|
|
|
|
### Success Rates by Strategy
|
|
|
|
- **Direct matches**: ~45% (52 institutions)
|
|
- **Fallback strategy 1**: ~20% (23 institutions)
|
|
- **Fallback strategy 2**: ~15% (17 institutions)
|
|
- **Fallback strategy 3**: ~5% (6 institutions)
|
|
- **Failed all strategies**: ~15% (19 geocodable institutions)
|
|
|
|
## Data Quality Notes
|
|
|
|
### High-Confidence Geocoding
|
|
|
|
58 institutions received coordinates with **0.8 confidence score** from Nominatim.
|
|
|
|
### Failed Geocoding Cases
|
|
|
|
25 institutions with region data failed geocoding. Common reasons:
|
|
- Very generic names (e.g., "Secretaría de Cultura del Estado")
|
|
- Acronyms without expansion (e.g., "UAS Repository")
|
|
- Digital-only platforms with region but no physical address
|
|
- Archaeological sites not in OpenStreetMap
|
|
- Specialized archives with non-standard names
|
|
|
|
## Comparison with Other Countries
|
|
|
|
| Country | Total | Geocoded | Coverage |
|
|
|---------|-------|----------|----------|
|
|
| **Brazil** | 97 | ~94 | ~97% |
|
|
| **Chile** | 90 | 78 | 86.7% |
|
|
| **Mexico** | 117 | 58 | 69.9% (of geocodable) |
|
|
|
|
**Note**: Mexico's lower absolute coverage (49.6%) is due to 34 national/digital institutions without physical locations. When comparing only geocodable institutions, Mexico achieves 69.9% coverage.
|
|
|
|
## Output Files
|
|
|
|
- **Geocoded YAML**: `data/instances/mexican_institutions_geocoded.yaml`
|
|
- **Geocoding report**: `data/instances/mexican_geocoding_report.md`
|
|
- **Statistics report**: `data/instances/mexican_geocoding_statistics.md` (this file)
|
|
- **Cache file**: `data/instances/.geocoding_cache_mexico.yaml`
|
|
|
|
## Next Steps
|
|
|
|
1. ✅ Mexican geocoding complete (58 institutions)
|
|
2. Manual review of 25 failed geocoding attempts
|
|
3. Consider adding city data manually for high-priority institutions
|
|
4. Combine with Brazilian (97) and Chilean (90) datasets
|
|
5. Final deliverable: 304 institutions across 3 countries
|
|
|
|
---
|
|
|
|
*Geocoding performed using Nominatim OpenStreetMap API*
|
|
*Rate limit: 1 request/second*
|
|
*User-Agent: GLAM-Heritage-Data-Project/1.0*
|