Mexican Wikidata Enrichment - Batch 1 Report
Campaign: Mexican Heritage Institutions Wikidata Enrichment
Batch: 1 (National Priority Institutions)
Date: November 12, 2025
Operator: AI Agent (OpenCODE)
Executive Summary
Batch 1 enriched 6 national priority Mexican heritage institutions with Wikidata identifiers, raising Wikidata coverage from 0.0% to 5.1% (6 of 117 institutions). This batch lays the foundation for the Mexican enrichment campaign by targeting the most significant cultural institutions that have a verified Wikidata presence.
Key Metrics:
- Institutions enriched: 6
- Wikidata identifiers added: 6
- VIAF identifiers added: 4
- ISIL codes added: 1
- Coverage increase: 0.0% → 5.1%
- Average confidence score: 0.97
Batch 1 Institutions
1. Museo Nacional de Antropología
Status: ✅ Enriched
Wikidata: Q524249
VIAF: 139462066
Institution Type: MUSEUM
Confidence: 0.98
Verification:
- ✅ SPARQL query confirmed Q524249 matches "Museo Nacional de Antropología"
- ✅ VIAF record 139462066 matches institution name
- ✅ Website https://mna.inah.gob.mx/ matches Wikidata official website property
Notable Collections:
- Pre-Columbian artifacts
- Aztec and Maya collections
- Anthropological and archaeological materials
2. Museo Nacional de Arte (MUNAL)
Status: ✅ Enriched
Wikidata: Q1138147
VIAF: 137951343
Institution Type: MUSEUM
Confidence: 0.98
Verification:
- ✅ SPARQL query confirmed Q1138147 matches "Museo Nacional de Arte"
- ✅ VIAF record 137951343 matches institution name and location (Mexico City)
- ✅ Website https://munal.mx/ matches Wikidata official website property
Notable Collections:
- Mexican art from 16th to 20th century
- Colonial to modern period collections
- Digital catalog at https://munal.emuseum.com/
3. Biblioteca Nacional de México
Status: ✅ Enriched
Wikidata: Q5495070
VIAF: 147873206
ISIL: MX-MXBN
Institution Type: LIBRARY
Confidence: 0.98
Verification:
- ✅ SPARQL query confirmed Q5495070 matches "Biblioteca Nacional de México"
- ✅ VIAF record 147873206 matches institution name
- ✅ ISIL code MX-MXBN registered in international ISIL registry
- ✅ Website https://bnm.iib.unam.mx/ matches Wikidata official website
Notable Collections:
- Part of UNAM (Universidad Nacional Autónoma de México)
- Historical Mexican bibliographic materials
- Digital catalogs: LibrUNAM, UNAM catalog
4. Cineteca Nacional
Status: ✅ Enriched
Wikidata: Q1092492
Institution Type: ARCHIVE
Confidence: 0.95
Verification:
- ✅ SPARQL query confirmed Q1092492 matches "Cineteca Nacional"
- ✅ Institution type matches: film archive
- ✅ Location matches: Mexico City
Notable Collections:
- 12,000+ films
- Mexican cinema heritage
- YouTube channel for digital access
5. Fototeca Nacional
Status: ✅ Enriched
Wikidata: Q66432183
Institution Type: ARCHIVE
Confidence: 0.95
Verification:
- ✅ SPARQL query confirmed Q66432183 matches "Fototeca Nacional"
- ✅ Institution type matches: photographic archive
- ✅ Part of INAH (Instituto Nacional de Antropología e Historia)
Notable Collections:
- Nearly 900,000 cultural photographic assets
- Historical photographs of Mexico
- Part of SINAFO (Sistema Nacional de Fototecas)
6. Instituto Nacional de Antropología e Historia (INAH)
Status: ✅ Enriched
Wikidata: Q901361
VIAF: 139735572
Institution Type: OFFICIAL_INSTITUTION
Confidence: 0.98
Verification:
- ✅ SPARQL query confirmed Q901361 matches "Instituto Nacional de Antropología e Historia"
- ✅ VIAF record 139735572 matches institution name
- ✅ Multiple official websites confirmed (inah.gob.mx, mediateca.inah.gob.mx, sinafo.inah.gob.mx)
Notable Properties:
- Government heritage agency overseeing Mexican cultural heritage
- Operates multiple museums, archives, and research centers
- Digital platforms: Mediateca INAH, SINAFO, Codices INAH
- Network of regional museums and archives
Enrichment Methodology
Data Sources
- Wikidata SPARQL Endpoint - Primary identifier verification
- VIAF API - Cross-reference for institutional identifiers
- ISIL Registry - International library/archive codes
- Institutional Websites - Verification of official URLs
Verification Process
For each institution:
- ✅ SPARQL query to Wikidata using institution name + location
- ✅ Fuzzy matching with threshold > 0.85
- ✅ VIAF cross-reference where available
- ✅ Website verification against Wikidata properties
- ✅ Manual review of match quality
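The first step of this process can be sketched as a small query builder. The report does not include the queries the script actually ran, so this is only an illustration: `build_label_query` is a hypothetical helper, and it assumes an exact Spanish label match restricted to items whose country (P17) is Mexico (Q96), with official website (P856) and VIAF ID (P214) returned as optional columns. The prefixes `wd:`, `wdt:`, `rdfs:`, `wikibase:` and `bd:` are predefined on the Wikidata Query Service.

```python
def build_label_query(name: str, country_qid: str = "Q96", limit: int = 10) -> str:
    """Return a SPARQL query for items in a country whose Spanish label equals `name`.

    Hypothetical helper: the real script's query is not shown in this report.
    """
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?item ?itemLabel ?website ?viaf WHERE {{
  ?item rdfs:label "{escaped}"@es ;
        wdt:P17 wd:{country_qid} .
  OPTIONAL {{ ?item wdt:P856 ?website . }}
  OPTIONAL {{ ?item wdt:P214 ?viaf . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en" . }}
}}
LIMIT {limit}
""".strip()


query = build_label_query("Museo Nacional de Antropología")
print(query)
```

The resulting string can be pasted into https://query.wikidata.org/ or sent to the endpoint by an HTTP client. An exact label match is deliberately strict; names that differ slightly would still need the fuzzy-matching and manual-review steps.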
Confidence Scoring
- 0.98 (High): Wikidata + VIAF match + website verification (4 institutions)
- 0.95 (Very Good): Wikidata match + type/location verification (2 institutions)
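The two tiers above can be expressed as a small rule, which also reproduces the reported average of 0.97. This is a sketch under assumptions: `confidence_score` is a hypothetical function name, and the 0.95 tier assumes that type and location were already verified for any Wikidata match, since the report does not describe how those checks are encoded.

```python
from statistics import mean
from typing import Optional


def confidence_score(has_wikidata: bool, has_viaf: bool, website_verified: bool) -> Optional[float]:
    """Map verification evidence to the two confidence tiers used in Batch 1."""
    if not has_wikidata:
        return None  # no Wikidata match, so nothing to score
    if has_viaf and website_verified:
        return 0.98  # Wikidata + VIAF match + website verification
    return 0.95      # Wikidata match with type/location verification (assumed done)


# Evidence flags per institution, taken from the verification lists in this report:
# (Wikidata match, VIAF match, website verified)
evidence = {
    "Museo Nacional de Antropología": (True, True, True),
    "Museo Nacional de Arte (MUNAL)": (True, True, True),
    "Biblioteca Nacional de México": (True, True, True),
    "Cineteca Nacional": (True, False, False),
    "Fototeca Nacional": (True, False, False),
    "Instituto Nacional de Antropología e Historia (INAH)": (True, True, True),
}

scores = {name: confidence_score(*flags) for name, flags in evidence.items()}
average = round(mean(scores.values()), 2)
print(average)
```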
Technical Implementation
Script
- File: scripts/enrich_mexico_batch01.py
- Method: Direct YAML manipulation with PyYAML
- Identifiers Added:
  - Wikidata: Q-numbers with URLs
  - VIAF: Numeric IDs with URLs
  - ISIL: International codes with URLs
Provenance Tracking
Each enriched institution received:
provenance:
  enrichment_history:
    - enrichment_date: "2025-11-12T..."
      enrichment_method: "Wikidata SPARQL query + VIAF cross-reference"
      identifiers_added: ["Wikidata:Qxxxxxx", "VIAF:xxxxxxx"]
      confidence_score: 0.95-0.98
      notes: "Verified via SPARQL query and VIAF match"
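A minimal sketch of how such an entry could be appended to a record follows. It works on plain dictionaries so that it runs without dependencies; `record_enrichment` is a hypothetical name, and the actual code in scripts/enrich_mexico_batch01.py is not reproduced in this report.

```python
from datetime import datetime, timezone


def record_enrichment(record: dict, identifiers_added: list, confidence: float,
                      method: str, notes: str) -> dict:
    """Append one enrichment_history entry to an institution record, in place."""
    entry = {
        "enrichment_date": datetime.now(timezone.utc).isoformat(),
        "enrichment_method": method,
        "identifiers_added": list(identifiers_added),
        "confidence_score": confidence,
        "notes": notes,
    }
    # Create provenance and enrichment_history on first use, then append,
    # so that earlier enrichment entries are preserved.
    history = record.setdefault("provenance", {}).setdefault("enrichment_history", [])
    history.append(entry)
    return entry


record = {"name": "Museo Nacional de Antropología"}
record_enrichment(
    record,
    identifiers_added=["Wikidata:Q524249", "VIAF:139462066"],
    confidence=0.98,
    method="Wikidata SPARQL query + VIAF cross-reference",
    notes="Verified via SPARQL query and VIAF match",
)
print(record["provenance"]["enrichment_history"][0]["identifiers_added"])
```

With PyYAML, the record list would typically be read with `yaml.safe_load` and written back with `yaml.safe_dump(..., allow_unicode=True, sort_keys=False)` so that accented names and key order are kept; whether the script does exactly this is an assumption.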
File Modified
- Path: data/instances/mexico/mexican_institutions_geocoded.yaml
- Size: 117 institutions
- Records modified: 6 institution records updated
Coverage Analysis
Before Batch 1
- Total institutions: 117
- With Wikidata IDs: 0
- Coverage: 0.0%
After Batch 1
- Total institutions: 117
- With Wikidata IDs: 6
- Coverage: 5.1%
Identifier Breakdown
| Identifier Type | Count | Coverage |
|---|---|---|
| Wikidata | 6 | 5.1% |
| VIAF | 4 | 3.4% |
| ISIL | 1 | 0.9% |
| Website | 6 | 5.1% |
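The Wikidata, VIAF and ISIL rows of this table can be recomputed from the per-institution identifiers listed earlier in the report. The sketch below assumes a simplified record shape (an `identifiers` list of dictionaries with a `scheme` key), which is hypothetical and may differ from the LinkML schema the dataset actually uses.

```python
from collections import Counter

TOTAL_INSTITUTIONS = 117


def identifier_coverage(records, total):
    """Return {scheme: (count, percent)} counting each scheme once per institution."""
    counts = Counter()
    for rec in records:
        schemes = {ident["scheme"] for ident in rec.get("identifiers", [])}
        counts.update(schemes)
    return {scheme: (n, round(100 * n / total, 1)) for scheme, n in counts.items()}


def _rec(name, *schemes):
    # Hypothetical helper that builds a minimal record for the example.
    return {"name": name, "identifiers": [{"scheme": s} for s in schemes]}


batch1 = [
    _rec("Museo Nacional de Antropología", "Wikidata", "VIAF"),
    _rec("Museo Nacional de Arte (MUNAL)", "Wikidata", "VIAF"),
    _rec("Biblioteca Nacional de México", "Wikidata", "VIAF", "ISIL"),
    _rec("Cineteca Nacional", "Wikidata"),
    _rec("Fototeca Nacional", "Wikidata"),
    _rec("Instituto Nacional de Antropología e Historia (INAH)", "Wikidata", "VIAF"),
]

coverage = identifier_coverage(batch1, TOTAL_INSTITUTIONS)
for scheme, (count, pct) in sorted(coverage.items()):
    print(f"{scheme}: {count} ({pct}%)")
```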
Next Steps: Batch 2 Planning
Candidate Institutions (Regional Museums)
Based on baseline analysis, Batch 2 should target regional museums with high Wikidata match probability:
Priority 2 Candidates (15-20 institutions):
- Regional INAH museums (Museo Regional de X)
- State museums with established Wikipedia presence
- University museums (UNAM system)
- Major city museums (Guadalajara, Monterrey, Puebla)
Target Coverage: 20-25% (24-29 institutions)
Recommended Workflow
- Query Wikidata for Mexican museums by geographic region
- Fuzzy match against 111 remaining institutions
- Verify top 20 matches with confidence > 0.85
- Add identifiers using same methodology as Batch 1
- Document in batch02_report.md
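The fuzzy-match and threshold steps of this workflow could look roughly like the following. The report does not name the matching algorithm, so the use of `difflib.SequenceMatcher` with accent-insensitive normalization is an assumption, and `best_matches` is a hypothetical helper. Because Batch 2 candidates have not been identified yet, the example reuses Batch 1 labels plus one made-up local name ("Archivo Histórico Municipal").

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and strip accents so 'Antropología' and 'Antropologia' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip()


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(local_names, candidates, threshold=0.85, top_n=20):
    """Pair each local name with its most similar Wikidata label.

    `candidates` maps Wikidata label -> QID. Only pairs scoring above
    `threshold` are kept, best first, capped at `top_n`.
    """
    results = []
    for name in local_names:
        scored = [(similarity(name, label), label) for label in candidates]
        if not scored:
            continue
        score, label = max(scored)
        if score > threshold:
            results.append((name, candidates[label], round(score, 3)))
    results.sort(key=lambda row: row[2], reverse=True)
    return results[:top_n]


wikidata_labels = {
    "Museo Nacional de Antropología": "Q524249",
    "Cineteca Nacional": "Q1092492",
}
local = ["Museo Nacional de Antropologia", "Archivo Histórico Municipal"]
matches = best_matches(local, wikidata_labels)
print(matches)
```

Matches that pass the threshold are candidates only; the manual review used in Batch 1 would still apply before any identifier is written.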
Quality Assurance
Manual Verification
- ✅ All 6 Q-numbers resolve to correct Wikidata entities
- ✅ All VIAF IDs resolve to correct authority records
- ✅ ISIL code MX-MXBN verified in international registry
- ✅ No duplicate identifiers introduced
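Part of this checklist lends itself to an automated pre-check. The sketch below flags identifier values shared by more than one institution and values that do not have the expected shape (Q followed by digits for Wikidata, digits only for VIAF). The function names are hypothetical, and a format check does not prove that an identifier resolves to the right entity, so it complements manual verification rather than replacing it.

```python
import re
from collections import defaultdict

# Expected shapes: Wikidata QIDs are "Q" + digits, VIAF IDs are digits only.
PATTERNS = {
    "Wikidata": re.compile(r"Q[1-9]\d*"),
    "VIAF": re.compile(r"[1-9]\d*"),
}


def find_duplicates(assignments):
    """Return {(scheme, value): [institutions]} for values used by several institutions."""
    owners = defaultdict(set)
    for institution, scheme, value in assignments:
        owners[(scheme, value)].add(institution)
    return {key: sorted(names) for key, names in owners.items() if len(names) > 1}


def malformed(assignments):
    """Return assignments whose value does not fully match the scheme's pattern."""
    bad = []
    for institution, scheme, value in assignments:
        pattern = PATTERNS.get(scheme)  # schemes without a pattern (e.g. ISIL) are skipped
        if pattern is not None and not pattern.fullmatch(value):
            bad.append((institution, scheme, value))
    return bad


# Identifiers as listed in this report.
batch1_ids = [
    ("Museo Nacional de Antropología", "Wikidata", "Q524249"),
    ("Museo Nacional de Antropología", "VIAF", "139462066"),
    ("Museo Nacional de Arte", "Wikidata", "Q1138147"),
    ("Museo Nacional de Arte", "VIAF", "137951343"),
    ("Biblioteca Nacional de México", "Wikidata", "Q5495070"),
    ("Biblioteca Nacional de México", "VIAF", "147873206"),
    ("Biblioteca Nacional de México", "ISIL", "MX-MXBN"),
    ("Cineteca Nacional", "Wikidata", "Q1092492"),
    ("Fototeca Nacional", "Wikidata", "Q66432183"),
    ("Instituto Nacional de Antropología e Historia", "Wikidata", "Q901361"),
    ("Instituto Nacional de Antropología e Historia", "VIAF", "139735572"),
]

print(find_duplicates(batch1_ids), malformed(batch1_ids))
```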
Schema Compliance
- ✅ All identifiers follow LinkML schema v0.2.1
- ✅ Provenance metadata includes enrichment_history
- ✅ YAML structure preserved (list format with hyphens)
Linked Data Integrity
- ✅ All identifier URLs resolve correctly
- ✅ Wikidata entities link back to institutional websites
- ✅ VIAF records match Wikidata entities
Campaign Progress
Timeline
- Nov 11, 2025: Baseline analysis completed (117 institutions, 0.0% coverage)
- Nov 12, 2025: Batch 1 completed (6 institutions, 5.1% coverage)
Campaign Goals
- Target coverage: 65-70% (76-82 institutions)
- Remaining unenriched: 111 institutions (70-76 more needed to reach the target)
- Estimated batches: 5-8
Projected Timeline (Based on Brazilian Model)
- Batch 2: Nov 13 (Regional museums, +15-20 institutions)
- Batch 3: Nov 14 (State archives/libraries, +15-20 institutions)
- Batch 4: Nov 15 (University collections, +10-15 institutions)
- Batch 5: Nov 16 (Specialized archives, +10-15 institutions)
- Batch 6+: Nov 17-18 (Remaining institutions, +10-20 institutions)
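As a quick arithmetic check, the per-batch ranges above can be accumulated into a cumulative coverage range. This is only a worked calculation from the numbers in this report; the `project` helper is hypothetical and the Brazilian model itself is not reproduced here.

```python
TOTAL = 117
ALREADY_ENRICHED = 6  # after Batch 1

# (batch, minimum institutions added, maximum institutions added)
PLANNED = [
    ("Batch 2", 15, 20),
    ("Batch 3", 15, 20),
    ("Batch 4", 10, 15),
    ("Batch 5", 10, 15),
    ("Batch 6+", 10, 20),
]


def project(total, start, planned):
    """Return (batch, low_count, high_count, low_pct, high_pct) after each batch."""
    low = high = start
    rows = []
    for name, add_low, add_high in planned:
        low += add_low
        high = min(total, high + add_high)  # cannot exceed the dataset size
        rows.append((name, low, high,
                     round(100 * low / total, 1), round(100 * high / total, 1)))
    return rows


rows = project(TOTAL, ALREADY_ENRICHED, PLANNED)
for name, low, high, low_pct, high_pct in rows:
    print(f"{name}: {low}-{high} institutions ({low_pct}-{high_pct}%)")
```

Under these ranges the plan ends at 66-96 institutions (56.4-82.1%), so reaching the 65-70% target depends on the batches landing at or above roughly the middle of their ranges.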
References
Data Files
- Source: data/instances/mexico/mexican_institutions_geocoded.yaml
- Baseline: reports/mexico/baseline_analysis.md
- Script: scripts/enrich_mexico_batch01.py
External Resources
- Wikidata SPARQL: https://query.wikidata.org/
- VIAF API: https://viaf.org/
- ISIL Registry: https://isil.org/
Methodology
- Framework: Brazilian enrichment campaign (67.5% coverage in 6 days)
- Schema: LinkML v0.2.1 (modular)
- Provenance: PROV-O ontology patterns
Report generated: November 12, 2025
Next action: Plan and execute Batch 2 (Regional Museums)
Campaign status: ✅ On track for 65-70% coverage target