Mexican Wikidata Enrichment Campaign - Baseline Analysis
Campaign Start Date: November 12, 2025
Dataset: data/instances/mexico/mexican_institutions_geocoded.yaml
Methodology: Follows the proven Brazilian campaign framework (Nov 6-11, 2025)
Current State
Coverage Statistics
- Total Mexican institutions: 117
- Current Wikidata coverage: 0/117 (0.0%)
- Institutions without Wikidata: 117 (100%)
Comparison to Brazilian Campaign
- Brazil starting point: 19.0% (24/126 institutions)
- Mexico starting point: 0.0% (0/117 institutions)
- Mexico advantage: Clean slate, no prior partial enrichment to reconcile
Institution Type Distribution
| Type | Count | % of Total | Target for Enrichment |
|---|---|---|---|
| MUSEUM | 38 | 32.5% | ✅ High priority |
| MIXED | 33 | 28.2% | ⚠️ Aggregations - selective enrichment |
| ARCHIVE | 18 | 15.4% | ✅ High priority |
| LIBRARY | 14 | 12.0% | ✅ High priority |
| OFFICIAL_INSTITUTION | 8 | 6.8% | ✅ Medium priority |
| EDUCATION_PROVIDER | 6 | 5.1% | ✅ Medium priority |
| Total | 117 | 100% | |
Non-MIXED institutions: 84 (71.8% of dataset)
MIXED institutions: 33 (28.2% - aggregations, not individual institutions)
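The percentages in the table and the MIXED/non-MIXED split follow directly from the per-type counts. This is a minimal Python sketch that recomputes them. It assumes only the counts listed in the table above:

```python
# Per-type counts copied from the Institution Type Distribution table.
TYPE_COUNTS = {
    "MUSEUM": 38,
    "MIXED": 33,
    "ARCHIVE": 18,
    "LIBRARY": 14,
    "OFFICIAL_INSTITUTION": 8,
    "EDUCATION_PROVIDER": 6,
}


def type_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each type's share of the total as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {t: round(100 * n / total, 1) for t, n in counts.items()}


total = sum(TYPE_COUNTS.values())
non_mixed = total - TYPE_COUNTS["MIXED"]
print(total, non_mixed, round(100 * non_mixed / total, 1))  # 117 84 71.8
print(type_shares(TYPE_COUNTS))
```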
Geographic Distribution
Institutions with geocoded cities: 58/117 (49.6%)
Top 15 Cities
- Ciudad de México - 4 institutions (national institutions)
- Aguascalientes - 3 institutions
- Saltillo (Coahuila) - 3 institutions
- Oaxaca - 3 institutions
- Campeche - 2 institutions
- Chihuahua - 2 institutions
- Colima - 2 institutions
- Durango - 2 institutions
- Guadalajara (Jalisco) - 2 institutions
- Morelia (Michoacán) - 2 institutions
- Puebla - 2 institutions
- Zacatecas - 2 institutions
- Mexicali (Baja California) - 1 institution
- La Paz (Baja California Sur) - 1 institution
- Tuxtla Gutiérrez (Chiapas) - 1 institution
Note: 59 institutions (50.4%) lack precise city data and will need manual geocoding during enrichment.
Priority Candidates for Batch 1
National Institutions (Highest Priority)
These institutions are nationally significant and most likely to have Wikidata entries:
- Museo Nacional de Antropología (MUSEUM) - Mexico's flagship anthropology museum
- Museo Nacional de Arte (MUNAL) (MUSEUM) - National art museum
- Biblioteca Nacional de México (LIBRARY) - National library
- Cineteca Nacional (ARCHIVE) - National film archive, Ciudad de México
- Fototeca Nacional (ARCHIVE) - National photo archive
- Instituto Nacional de Antropología e Historia (INAH) (OFFICIAL_INSTITUTION) - Federal heritage agency
Regional INAH Museums (Second Priority)
- Museo Regional de Antropología e Historia (INAH) - Regional anthropology museum
- Museo Regional de Chiapas (INAH) - Parque Madero, Chiapas
- Museo Regional de Historia de Aguascalientes (INAH) - Aguascalientes
- Museo Regional de Sonora (INAH) - Sonora state museum
Campaign Goals
Coverage Targets
Following the proven Brazilian methodology:
- Minimum target: 65% coverage (76/117 institutions)
- Stretch target: 70% coverage (82/117 institutions)
- Focus: Non-MIXED institutions (84 total)
  - 65% of 84 = 55 institutions
  - 70% of 84 = 59 institutions
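Each target count above is the percentage applied to its base and rounded to the nearest whole institution. The sketch below reproduces those counts. One caveat: 76/117 is 64.96%, so a strict reading of "at least 65%" would need 77, which is the ceiling. The `strict` flag exists to show that difference.

```python
import math


def target_count(base: int, pct: float, strict: bool = False) -> int:
    """Institutions needed to reach `pct` percent of `base`.

    strict=False rounds to the nearest integer (as the targets above do);
    strict=True takes the ceiling so the resulting share is never below pct.
    Python's round() sends exact halves to even; none of these targets is a half.
    """
    exact = base * pct / 100
    return math.ceil(exact) if strict else round(exact)


for base in (117, 84):
    for pct in (65, 70):
        print(base, pct, target_count(base, pct), target_count(base, pct, strict=True))
```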
Quality Standards
- Match threshold: ≥0.85 confidence score
- Identifier policy: 100% real Wikidata Q-numbers (zero synthetic identifiers)
- Verification: Manual review of all matches before committing
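The first two quality standards can be checked mechanically before the manual review step. This is a minimal sketch that assumes each candidate match is a record with a Q-number and a confidence score; the `CandidateMatch` shape is hypothetical. The regex only confirms that an identifier is well-formed. Confirming that it resolves to the right, real Wikidata item is still part of manual verification.

```python
import re
from dataclasses import dataclass

# Well-formed Wikidata item IDs: "Q" followed by a positive integer, no leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")
MATCH_THRESHOLD = 0.85


@dataclass
class CandidateMatch:
    # Hypothetical record shape; the campaign's actual match format may differ.
    institution: str
    qid: str
    confidence: float


def passes_quality_gate(match: CandidateMatch, threshold: float = MATCH_THRESHOLD) -> bool:
    """True if the Q-number is well-formed and confidence meets the threshold."""
    return bool(QID_PATTERN.fullmatch(match.qid)) and match.confidence >= threshold


# Placeholder values for illustration only; these are not real matches.
candidates = [
    CandidateMatch("Institution A", "Q123", 0.92),
    CandidateMatch("Institution B", "Q456", 0.80),   # below threshold
    CandidateMatch("Institution C", "Q0999", 0.95),  # malformed ID
]
print([m.institution for m in candidates if passes_quality_gate(m)])
```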
Batch Strategy
- Batch size: 5-6 institutions per batch
- Estimated batches: 8-10 batches
- Priority order:
  1. National institutions (Museo Nacional, Biblioteca Nacional, etc.)
  2. Regional INAH museums (Museo Regional de X)
  3. State archives and libraries
  4. Municipal museums
  5. Specialized collections
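The batch strategy has three steps. First, drop MIXED records. Second, order the remaining institutions by priority tier. Third, cut the ordered list into batches of at most six.

This sketch assumes each institution has already been tagged with one of the tier labels below. Those labels are hypothetical and are not fields of the dataset.

```python
from collections.abc import Iterable, Iterator
from itertools import islice

# Priority tiers in the order listed under "Batch Strategy" (labels are hypothetical).
TIER_ORDER = ["national", "regional_inah", "state", "municipal", "specialized"]


def plan_batches(institutions: Iterable[dict], batch_size: int = 6) -> Iterator[list[str]]:
    """Yield batches of institution names, skipping MIXED records, highest priority first."""
    eligible = [i for i in institutions if i["type"] != "MIXED"]
    # list.sort() is stable, so input order is kept within a tier.
    # An unknown tier label raises ValueError from TIER_ORDER.index().
    eligible.sort(key=lambda i: TIER_ORDER.index(i["tier"]))
    names = iter(i["name"] for i in eligible)
    while batch := list(islice(names, batch_size)):
        yield batch


sample = [
    {"name": "Museo Regional de Sonora", "type": "MUSEUM", "tier": "regional_inah"},
    {"name": "Biblioteca Nacional de México", "type": "LIBRARY", "tier": "national"},
    {"name": "Example digital portal", "type": "MIXED", "tier": "specialized"},
]
print(list(plan_batches(sample, batch_size=1)))
```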
Expected Challenges
1. Spanish Language Queries
- Similar to the Portuguese-language queries in the Brazilian campaign
- SPARQL queries will need Spanish labels, e.g. `SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en" }`
- Many institutions may have only Spanish Wikipedia articles
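A label lookup against the Wikidata Query Service can combine an exact Spanish label match with the `es,en` label fallback. The sketch below is illustrative only. The query shape, the `build_label_query` and `run_query` helpers, and the optional country constraint (P17 = country, Q96 = Mexico) are assumptions, not the campaign's actual queries. Drop the country filter for items that lack a P17 statement. Exact-label matching misses items whose Spanish label differs slightly, so treat this as candidate generation rather than a complete matcher.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_label_query(name: str, country_qid: Optional[str] = "Q96", limit: int = 10) -> str:
    """Query for items whose Spanish label is exactly `name`.

    Labels and descriptions come back in Spanish, falling back to English.
    If country_qid is given, results are restricted to items whose country
    (P17) is that item (Q96 is Mexico); pass None to drop the filter.
    """
    country_filter = f"  ?item wdt:P17 wd:{country_qid} .\n" if country_qid else ""
    return (
        "SELECT ?item ?itemLabel ?itemDescription WHERE {\n"
        f"  ?item rdfs:label {sparql_literal(name)}@es .\n"
        f"{country_filter}"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en" }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def run_query(query: str, user_agent: str) -> list[dict]:
    """Send the query to WDQS and return the JSON result bindings (network call)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json", "User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


print(build_label_query("Museo Nacional de Antropología"))
# rows = run_query(query, user_agent="<descriptive agent with contact info>")
```

Wikimedia asks API clients to send a descriptive User-Agent, which is why `run_query` requires one.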
2. Complex Institutional Structure
- INAH system: Multiple "Museo Regional" institutions sharing the same name pattern
- Federal vs. State: Mexico has both federal museums (INAH) and state museums
- Municipal archives: City-level archives with similar naming (Archivo Municipal de X)
3. Geocoding Gaps
- 50.4% of institutions lack precise city data
- Cities will need to be inferred from institution names during enrichment
- Example: "Museo Regional de Chiapas" → likely in Tuxtla Gutiérrez (state capital)
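The name-based inference in the Chiapas example can be made explicit with a small state-to-capital lookup. This sketch assumes a hand-maintained table, with only a few states shown. The result is a hint for manual verification, not a geocode, because a state museum is not always located in the state capital.

```python
import re
from typing import Optional

# Partial, hand-maintained lookup of state name -> state capital (extend as needed).
STATE_CAPITALS = {
    "Aguascalientes": "Aguascalientes",
    "Chiapas": "Tuxtla Gutiérrez",
    "Oaxaca": "Oaxaca de Juárez",
    "Sonora": "Hermosillo",
}


def infer_capital_hint(institution_name: str) -> Optional[str]:
    """Return the capital of the first known state named in the institution name, if any."""
    for state, capital in STATE_CAPITALS.items():
        # Whole-word match so a state name is not found inside a longer word.
        if re.search(rf"\b{re.escape(state)}\b", institution_name):
            return capital
    return None


print(infer_capital_hint("Museo Regional de Chiapas"))
```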
4. MIXED Institutions
- 28.2% are aggregations (digital platforms, catalogs, portals)
- Not appropriate for Wikidata enrichment (no single physical institution)
- Will skip these to maintain data quality
Campaign Timeline (Projected)
Estimated duration: 5-7 days
Based on: Brazilian campaign completed in 6 days (9 batches)
| Phase | Batches | Target Institutions | Estimated Days |
|---|---|---|---|
| Phase 1: National | Batch 1-2 | 10-12 institutions | 1-2 days |
| Phase 2: Regional | Batch 3-5 | 15-18 institutions | 2-3 days |
| Phase 3: State/Municipal | Batch 6-8 | 15-18 institutions | 2 days |
| Phase 4: Stretch Goal | Batch 9-10 (optional) | 10-12 institutions | 1-2 days |
Stop criteria: Stop when match quality falls below the 0.85 threshold or when returns diminish.
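The stop rule can be checked after each batch. This sketch rests on two assumptions that the plan does not fix:

- "Match quality" means the mean confidence of a batch's new matches.
- "Diminishing returns" means a batch yields fewer than a chosen minimum of new matches. The cut-off of 2 used here is hypothetical.

```python
from statistics import mean

MATCH_THRESHOLD = 0.85
MIN_NEW_MATCHES = 2  # hypothetical cut-off for "diminishing returns"


def should_stop(batch_confidences: list[float]) -> bool:
    """Decide whether to stop after a batch, given the confidences of its new matches."""
    if len(batch_confidences) < MIN_NEW_MATCHES:
        return True  # too few new matches in this batch
    return mean(batch_confidences) < MATCH_THRESHOLD


print(should_stop([0.95, 0.90, 0.88]))
```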
Success Metrics
Quantitative
- Minimum 65% Wikidata coverage achieved
- Zero synthetic Q-numbers generated
- Average confidence score ≥0.90
- Zero false positives (manual verification)
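The four quantitative metrics can be evaluated together once the campaign ends. In this sketch, the synthetic-ID and false-positive counts are assumed to come from the manual review, and the function name is hypothetical. The coverage check is strict (≥65%).

```python
from statistics import mean

COVERAGE_TARGET_PCT = 65.0
MIN_AVG_CONFIDENCE = 0.90


def check_success_metrics(
    total_institutions: int,
    match_confidences: list[float],
    synthetic_ids: int,
    false_positives: int,
) -> dict[str, bool]:
    """Evaluate the quantitative success metrics for a finished campaign.

    match_confidences holds one confidence score per enriched institution;
    synthetic_ids and false_positives are counts taken from manual review.
    """
    coverage_pct = 100 * len(match_confidences) / total_institutions
    # An empty match list cannot meet the average-confidence metric.
    avg_ok = bool(match_confidences) and mean(match_confidences) >= MIN_AVG_CONFIDENCE
    return {
        "coverage": coverage_pct >= COVERAGE_TARGET_PCT,
        "no_synthetic_ids": synthetic_ids == 0,
        "avg_confidence": avg_ok,
        "no_false_positives": false_positives == 0,
    }


print(check_success_metrics(117, [0.95] * 77, synthetic_ids=0, false_positives=0))
```

Under this strict check, 76 enriched institutions (64.96%) would fall just short of the 65% target.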
Qualitative
- All national institutions enriched
- Major regional museums enriched
- State archives/libraries covered
- Documentation complete for replication
Deliverables
- Batch reports for each enrichment round (8-10 reports)
- Updated Mexican institution YAML with Wikidata identifiers
- Campaign summary report (following the `reports/brazil/brazil_campaign_summary.md` template)
- Updated `PROGRESS.md` with Mexican enrichment section
- Handoff document for next campaign (India or Argentina)
Next Steps
- Execute Batch 1: Enrich 5-6 national institutions
  - Museo Nacional de Antropología
  - Museo Nacional de Arte (MUNAL)
  - Biblioteca Nacional de México
  - Cineteca Nacional
  - Fototeca Nacional
  - Instituto Nacional de Antropología e Historia (INAH)
- Document results: Create `reports/mexico/batch01_report.md`
- Iterate: Continue with Batch 2-8 following proven methodology
- Monitor quality: Stop if confidence scores drop below 0.85
References
- Brazilian campaign: `reports/brazil/brazil_campaign_summary.md` (67.5% coverage achieved)
- Methodology: `AGENTS.md` - Wikidata enrichment workflow
- Schema: `schemas/core.yaml` - Identifier class
- Collision handling: `docs/PERSISTENT_IDENTIFIERS.md` - Q-number policy
Campaign Status: ✅ Baseline complete, ready for Batch 1 execution