# Mexican Wikidata Enrichment Campaign - Baseline Analysis

**Campaign Start Date:** November 12, 2025

**Dataset:** `data/instances/mexico/mexican_institutions_geocoded.yaml`

**Methodology:** Following proven Brazilian campaign framework (Nov 6-11, 2025)

---

## Current State

### Coverage Statistics

- **Total Mexican institutions:** 117
- **Current Wikidata coverage:** 0/117 (0.0%)
- **Institutions without Wikidata:** 117 (100%)

### Comparison to Brazilian Campaign

- **Brazil starting point:** 19.0% (24/126 institutions)
- **Mexico starting point:** 0.0% (0/117 institutions)
- **Mexico advantage:** Clean slate, no prior partial enrichment to reconcile

---

## Institution Type Distribution

| Type | Count | % of Total | Target for Enrichment |
|------|-------|------------|----------------------|
| MUSEUM | 38 | 32.5% | ✅ High priority |
| MIXED | 33 | 28.2% | ⚠️ Aggregations - selective enrichment |
| ARCHIVE | 18 | 15.4% | ✅ High priority |
| LIBRARY | 14 | 12.0% | ✅ High priority |
| OFFICIAL_INSTITUTION | 8 | 6.8% | ✅ Medium priority |
| EDUCATION_PROVIDER | 6 | 5.1% | ✅ Medium priority |
| **Total** | **117** | **100%** | |

**Non-MIXED institutions:** 84 (71.8% of dataset)

**MIXED institutions:** 33 (28.2% - aggregations, not individual institutions)

---

## Geographic Distribution

**Institutions with geocoded cities:** 58/117 (49.6%)

### Top 15 Cities

1. **Ciudad de México** - 4 institutions (national institutions)
2. **Aguascalientes** - 3 institutions
3. **Saltillo** (Coahuila) - 3 institutions
4. **Oaxaca** - 3 institutions
5. **Campeche** - 2 institutions
6. **Chihuahua** - 2 institutions
7. **Colima** - 2 institutions
8. **Durango** - 2 institutions
9. **Guadalajara** (Jalisco) - 2 institutions
10. **Morelia** (Michoacán) - 2 institutions
11. **Puebla** - 2 institutions
12. **Zacatecas** - 2 institutions
13. **Mexicali** (Baja California) - 1 institution
14. **La Paz** (Baja California Sur) - 1 institution
15. **Tuxtla Gutiérrez** (Chiapas) - 1 institution

**Note:** 59 institutions (50.4%) lack precise city data - will need manual geocoding during enrichment.

---

## Priority Candidates for Batch 1

### National Institutions (Highest Priority)

These institutions are nationally significant and most likely to have Wikidata entries:

1. **Museo Nacional de Antropología** (MUSEUM) - Mexico's flagship anthropology museum
2. **Museo Nacional de Arte (MUNAL)** (MUSEUM) - National art museum
3. **Biblioteca Nacional de México** (LIBRARY) - National library
4. **Cineteca Nacional** (ARCHIVE) - National film archive, Ciudad de México
5. **Fototeca Nacional** (ARCHIVE) - National photo archive
6. **Instituto Nacional de Antropología e Historia (INAH)** (OFFICIAL_INSTITUTION) - Federal heritage agency

### Regional INAH Museums (Second Priority)

7. **Museo Regional de Antropología e Historia (INAH)** - Regional anthropology museum
8. **Museo Regional de Chiapas (INAH)** - Parque Madero, Chiapas
9. **Museo Regional de Historia de Aguascalientes (INAH)** - Aguascalientes
10. **Museo Regional de Sonora (INAH)** - Sonora state museum

---

## Campaign Goals

### Coverage Targets

Following the proven Brazilian methodology:

- **Minimum target:** 65% coverage (76/117 institutions)
- **Stretch target:** 70% coverage (82/117 institutions)
- **Focus:** Non-MIXED institutions (84 total)
  - 65% of 84 = 55 institutions
  - 70% of 84 = 59 institutions

### Quality Standards

- **Match threshold:** ≥0.85 confidence score
- **Identifier policy:** 100% real Wikidata Q-numbers (zero synthetic identifiers)
- **Verification:** Manual review of all matches before committing

### Batch Strategy

- **Batch size:** 5-6 institutions per batch
- **Estimated batches:** 8-10 batches
- **Priority order:**
  1. National institutions (Museo Nacional, Biblioteca Nacional, etc.)
  2. Regional INAH museums (Museo Regional de X)
  3. State archives and libraries
  4. Municipal museums
  5. Specialized collections

---

## Expected Challenges

### 1. Spanish Language Queries

- Similar to Brazilian Portuguese campaign
- SPARQL queries will need Spanish labels: `SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en" }`
- Many institutions may have only Spanish Wikipedia articles

### 2. Complex Institutional Structure

- **INAH system:** Multiple "Museo Regional" institutions with same name pattern
- **Federal vs. State:** Mexico has both federal museums (INAH) and state museums
- **Municipal archives:** City-level archives with similar naming (Archivo Municipal de X)

### 3. Geocoding Gaps

- 50.4% of institutions lack precise city data
- Will need to infer from institution names during enrichment
- Example: "Museo Regional de Chiapas" → likely in Tuxtla Gutiérrez (state capital)

### 4. MIXED Institutions

- 28.2% are aggregations (digital platforms, catalogs, portals)
- Not appropriate for Wikidata enrichment (no single physical institution)
- Will skip these to maintain data quality

---

## Campaign Timeline (Projected)

**Estimated duration:** 5-7 days

**Based on:** Brazilian campaign completed in 6 days (9 batches)

| Phase | Batches | Target Institutions | Estimated Days |
|-------|---------|---------------------|----------------|
| Phase 1: National | Batch 1-2 | 10-12 institutions | 1-2 days |
| Phase 2: Regional | Batch 3-5 | 15-18 institutions | 2-3 days |
| Phase 3: State/Municipal | Batch 6-8 | 15-18 institutions | 2 days |
| Phase 4: Stretch Goal | Batch 9-10 (optional) | 10-12 institutions | 1-2 days |

**Stop criteria:** When match quality falls below 0.85 threshold or diminishing returns.
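
The coverage targets and the stop criterion above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, rounding to the nearest whole institution is inferred from the figures in this document, and reading "match quality" as the mean confidence of the latest batch is an assumption, since the text does not define it. The "diminishing returns" condition is not modeled.

```python
# Illustrative sketch; names are hypothetical and not part of the campaign tooling.

TOTAL_INSTITUTIONS = 117      # full dataset, from the baseline
NON_MIXED_INSTITUTIONS = 84   # dataset minus the 33 MIXED aggregations
MATCH_THRESHOLD = 0.85        # match threshold from Quality Standards


def coverage_target(total: int, percent: int) -> int:
    """Number of institutions for `percent` coverage, rounded half up."""
    return (total * percent + 50) // 100


def should_stop(batch_confidences: list[float],
                threshold: float = MATCH_THRESHOLD) -> bool:
    """Return True when the latest batch falls below the threshold.

    Assumption: 'match quality' is the mean confidence of the batch.
    Assumption: a batch with no matches at all also ends the campaign.
    """
    if not batch_confidences:
        return True
    mean = sum(batch_confidences) / len(batch_confidences)
    return mean < threshold


if __name__ == "__main__":
    for total in (TOTAL_INSTITUTIONS, NON_MIXED_INSTITUTIONS):
        print(total, coverage_target(total, 65), coverage_target(total, 70))
```

With the figures from this baseline, `coverage_target` reproduces the stated targets: 76 and 82 for the full dataset, 55 and 59 for the non-MIXED subset.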

---

## Success Metrics

### Quantitative

- [ ] Minimum 65% Wikidata coverage achieved
- [ ] Zero synthetic Q-numbers generated
- [ ] Average confidence score ≥0.90
- [ ] Zero false positives (manual verification)

### Qualitative

- [ ] All national institutions enriched
- [ ] Major regional museums enriched
- [ ] State archives/libraries covered
- [ ] Documentation complete for replication

### Deliverables

- [ ] Batch reports for each enrichment round (8-10 reports)
- [ ] Updated Mexican institution YAML with Wikidata identifiers
- [ ] Campaign summary report (following `reports/brazil/brazil_campaign_summary.md` template)
- [ ] Updated PROGRESS.md with Mexican enrichment section
- [ ] Handoff document for next campaign (India or Argentina)

---

## Next Steps

1. **Execute Batch 1:** Enrich 5-6 national institutions
   - Museo Nacional de Antropología
   - Museo Nacional de Arte (MUNAL)
   - Biblioteca Nacional de México
   - Cineteca Nacional
   - Fototeca Nacional
   - Instituto Nacional de Antropología e Historia (INAH)
2. **Document results:** Create `reports/mexico/batch01_report.md`
3. **Iterate:** Continue with Batch 2-8 following proven methodology
4. **Monitor quality:** Stop if confidence scores drop below 0.85

---

## References

- **Brazilian campaign:** `reports/brazil/brazil_campaign_summary.md` (67.5% coverage achieved)
- **Methodology:** `AGENTS.md` - Wikidata enrichment workflow
- **Schema:** `schemas/core.yaml` - Identifier class
- **Collision handling:** `docs/PERSISTENT_IDENTIFIERS.md` - Q-number policy

**Campaign Status:** ✅ Baseline complete, ready for Batch 1 execution
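
Once batch results exist, the quantitative success metrics listed above could be checked mechanically. The following is a minimal sketch under stated assumptions: the `Match` record and `check_metrics` function are hypothetical, the Q-numbers in the usage example are placeholders rather than real matches, and the minimum count of 76 is the 65% target stated in Campaign Goals. The regular expression only verifies that an identifier has the syntactic shape of a Q-number. It cannot distinguish a real Wikidata entity from a synthetic one, so manual verification remains necessary.

```python
import re
from dataclasses import dataclass

# Syntactic shape of a Wikidata item ID: "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


@dataclass
class Match:
    """Hypothetical record for one accepted enrichment match."""
    name: str
    qid: str
    confidence: float


def check_metrics(matches: list[Match], min_count: int = 76) -> dict[str, bool]:
    """Evaluate the quantitative targets against a list of accepted matches."""
    if matches:
        avg_conf = sum(m.confidence for m in matches) / len(matches)
    else:
        avg_conf = 0.0
    return {
        "minimum_coverage_met": len(matches) >= min_count,
        "average_confidence_at_least_0_90": avg_conf >= 0.90,
        "every_match_at_least_0_85": all(m.confidence >= 0.85 for m in matches),
        "all_qids_well_formed": all(QID_PATTERN.match(m.qid) for m in matches),
    }


if __name__ == "__main__":
    sample = [
        Match("Example Institution A", "Q100", 0.95),  # placeholder Q-number
        Match("Example Institution B", "Q200", 0.88),  # placeholder Q-number
    ]
    for metric, passed in check_metrics(sample).items():
        print(f"{metric}: {passed}")
```

The coverage check compares against a count rather than a ratio because 76/117 is slightly below 0.65; the 65% figure in this document is a rounded target.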