Mexican Dataset Reconciliation Report
Generated: 2025-11-13T09:55:41.451246
Executive Summary
This report documents the reconciliation between the standalone Mexican dataset and the global unified dataset created during the November 11, 2025 unification process.
Dataset Overview
| Dataset | Institutions | Wikidata Coverage |
|---|---|---|
| Standalone (mexican_institutions_geocoded.yaml) | 117 | 10 (8.5%) |
| Global - Mexican Subset (extracted from global file) | 108 | 55 (50.9%) |
| Difference | 9 institutions | +45 Wikidata IDs |
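The percentages in the table can be reproduced from the raw counts. The sketch below is only an illustration of that arithmetic; the function name `wikidata_coverage` is hypothetical and not part of the project's tooling.

```python
def wikidata_coverage(with_wikidata: int, total: int) -> float:
    """Return Wikidata coverage as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * with_wikidata / total, 1)


# Counts taken from the Dataset Overview table.
standalone = wikidata_coverage(10, 117)
global_subset = wikidata_coverage(55, 108)

print(f"Standalone: {standalone}%")            # Standalone: 8.5%
print(f"Global subset: {global_subset}%")      # Global subset: 50.9%
print(f"Institution difference: {117 - 108}")  # Institution difference: 9
print(f"Wikidata ID gain: {55 - 10}")          # Wikidata ID gain: 45
```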
Key Findings
1. Missing Institutions (9 from Standalone)
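The comparison flagged 9 institutions that appear in the standalone file but NOT in the global Mexican subset. The 8 listed here are genuinely absent; the 9th is covered in the note after the list: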
- CLACSO Virtual Libraries (Type: MIXED)
- HathiTrust Digital Library (Type: LIBRARY)
- Internet Archive (Type: ARCHIVE)
- Latin American Network Information Center (LANIC) (Type: MIXED)
- Library of Congress Hispanic Reading Room (Type: LIBRARY)
- Nettie Lee Benson Collection (UT Austin) (Type: MIXED)
- WorldCat Registry (Type: MIXED)
- WorldCat.org (Type: MIXED)
Note: The 9th institution, Fonoteca Nacional, initially appeared in this list but does exist in the global file. Its record lacked country metadata, so the Mexican subset filter did not pick it up. This has been corrected.
Analysis: All 8 institutions listed above are non-Mexican, international platforms or collections (HathiTrust, Internet Archive, CLACSO, etc.). The November 11 unification correctly filtered them out because they are not Mexican heritage custodians.
Recommendation: ✅ No action needed - filtering was appropriate.
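The Fonoteca Nacional case shows how a record with no country metadata can look missing when the comparison runs against a country-filtered subset. The following is a minimal sketch of that failure mode and a check for it. It assumes records are dictionaries with hypothetical `name` and `country` keys; the report does not show the actual schema, and the sample records are illustrative only.

```python
# Illustrative records only; the real global file has a richer schema.
global_records = [
    {"name": "Fototeca Nacional", "country": "MX"},
    {"name": "Fonoteca Nacional"},  # country metadata missing
    {"name": "Internet Archive", "country": "US"},
]
standalone_names = {"Fototeca Nacional", "Fonoteca Nacional"}


def mexican_subset(records):
    """Keep only records explicitly tagged with country code MX."""
    return [r for r in records if r.get("country") == "MX"]


def without_country(records):
    """Return records whose country field is absent or empty."""
    return [r for r in records if not r.get("country")]


subset_names = {r["name"] for r in mexican_subset(global_records)}
apparently_missing = standalone_names - subset_names

# Names that only look missing because their global record has no country.
unlabelled = {r["name"] for r in without_country(global_records)}
false_missing = apparently_missing & unlabelled

print(sorted(false_missing))  # ['Fonoteca Nacional']
```

Running a check like `without_country` before the set difference separates truly absent institutions from records that the filter simply cannot see.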
2. Wikidata Identifier Corrections
During reconciliation, the following Wikidata corrections were made:
| Institution | Issue | Resolution |
|---|---|---|
| Fototeca Nacional | Had wrong Wikidata ID (Q5411481 = Fonoteca) | ✅ Corrected to Q66432183 |
| Instituto Nacional de Antropología e Historia | Missing Wikidata ID Q901361 | ✅ Added Q901361 |
| Fonoteca Nacional | Duplicate entries, one missing Wikidata | ✅ Merged duplicates, added Q5411481 |
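A Wikidata ID attached to two differently named institutions, as happened with Q5411481, can be caught mechanically. The sketch below is one possible check, not the procedure used for this report. The `name` and `wikidata` keys are assumed field names, and the "before" records are made up to mirror the table rather than copied from the dataset.

```python
from collections import defaultdict


def qid_conflicts(records):
    """Map each Wikidata ID to the institution names sharing it, if more than one."""
    names_by_qid = defaultdict(set)
    for record in records:
        qid = record.get("wikidata")
        if qid:
            names_by_qid[qid].add(record["name"])
    return {
        qid: sorted(names)
        for qid, names in names_by_qid.items()
        if len(names) > 1
    }


# Hypothetical pre-correction state: Fototeca carries Fonoteca's ID.
before = [
    {"name": "Fototeca Nacional", "wikidata": "Q5411481"},
    {"name": "Fonoteca Nacional", "wikidata": "Q5411481"},
    {"name": "Instituto Nacional de Antropología e Historia"},
]

# State after the corrections listed in the table.
after = [
    {"name": "Fototeca Nacional", "wikidata": "Q66432183"},
    {"name": "Fonoteca Nacional", "wikidata": "Q5411481"},
    {"name": "Instituto Nacional de Antropología e Historia", "wikidata": "Q901361"},
]

print(qid_conflicts(before))  # {'Q5411481': ['Fonoteca Nacional', 'Fototeca Nacional']}
print(qid_conflicts(after))   # {}
```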
3. Wikidata Enrichment Analysis
The global dataset has far higher Wikidata coverage than the standalone file:
- Standalone: 10 Wikidata IDs (8.5%)
- Global: 55 Wikidata IDs (50.9%)
- Net gain: +45 Wikidata identifiers
Source of enrichment:
- 23 institutions have enrichment history records
- Enrichment occurred during November 11-13 unification process
- Methods: Wikidata SPARQL queries, fuzzy matching, manual verification
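The report names fuzzy matching as one enrichment method but does not say which library or threshold was used. As an illustration only, the sketch below scores name similarity with Python's standard `difflib`. The 0.85 threshold and the helper names are assumptions. It also shows why the manual verification step matters: Fototeca Nacional and Fonoteca Nacional differ by a single letter, so both clear the threshold.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def candidate_matches(name, labels, threshold=0.85):
    """Return (label, score) pairs at or above the threshold, best first."""
    scored = [(label, name_similarity(name, label)) for label in labels]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate labels, e.g. as returned by a Wikidata query.
labels = ["Fonoteca Nacional", "Fototeca Nacional", "Internet Archive"]
matches = candidate_matches("Fototeca Nacional", labels)

for label, score in matches:
    print(f"{label}: {score:.2f}")
```

Because near-identical names can refer to different institutions, candidates produced this way are best treated as suggestions for a human reviewer rather than as confirmed matches.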
Recommendations
✅ Completed Actions
- Corrected Fototeca Nacional Wikidata ID: Q5411481 → Q66432183
- Added INAH Wikidata ID: Q901361
- Cleaned up Fonoteca Nacional duplicates
- Verified international platform filtering
🎯 Next Steps
- Update baseline report (reports/mexico/baseline_analysis.md) to reference global dataset
- Document the 53 institutions without Wikidata (50.9% coverage leaves room for improvement)
- Create enrichment plan for remaining 53 institutions
- Archive standalone dataset with clear documentation that global is now authoritative
Files Updated
- ✅ data/instances/all/globalglam-20251113-mexico-deduplicated.yaml - Corrected Wikidata IDs
- 📝 reports/mexico/reconciliation_report.md - This report
Appendix: Data Quality Metrics
Institution Type Distribution (Mexican Subset)
MUSEUM 38 (35.2%)
MIXED 27 (25.0%)
ARCHIVE 17 (15.7%)
LIBRARY 12 (11.1%)
OFFICIAL_INSTITUTION 8 ( 7.4%)
EDUCATION_PROVIDER 6 ( 5.6%)
Geographic Coverage
Top 10 cities by institution count:
Unknown 25 (23.1%)
Mexico City 24 (22.2%)
Ciudad de México 4 ( 3.7%)
Aguascalientes 3 ( 2.8%)
Saltillo 3 ( 2.8%)
Oaxaca 3 ( 2.8%)
Campeche 2 ( 1.9%)
Chihuahua 2 ( 1.9%)
Colima 2 ( 1.9%)
Durango 2 ( 1.9%)
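The list above counts "Mexico City" and "Ciudad de México" separately, although both names refer to the same city. A minimal sketch of merging such variants before counting follows. The alias table is hypothetical and covers only this one pair; the counts are the ones reported above.

```python
from collections import Counter

# Hypothetical alias table: variant spelling -> canonical name.
CITY_ALIASES = {"Ciudad de México": "Mexico City"}


def normalize_counts(counts, aliases):
    """Merge counts for city-name variants under their canonical name."""
    merged = Counter()
    for city, n in counts.items():
        merged[aliases.get(city, city)] += n
    return merged


# Counts taken from the list above (top three entries only).
reported = {"Unknown": 25, "Mexico City": 24, "Ciudad de México": 4}
merged = normalize_counts(reported, CITY_ALIASES)

total_institutions = 108  # size of the Mexican subset
share = round(100 * merged["Mexico City"] / total_institutions, 1)

print(merged["Mexico City"], f"({share}%)")  # 28 (25.9%)
```

With the two spellings merged, Mexico City accounts for 28 institutions, ahead of the 25 whose city is unknown.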