# Mexican Wikidata Enrichment - Batch 1 Report

**Campaign:** Mexican Heritage Institutions Wikidata Enrichment
**Batch:** 1 (National Priority Institutions)
**Date:** November 12, 2025
**Operator:** AI Agent (OpenCODE)

---

## Executive Summary

Batch 1 enriched 6 national priority Mexican heritage institutions with Wikidata identifiers, raising coverage from 0.0% to **5.1%** (6 of 117 institutions). These six form the foundation of the Mexican enrichment campaign: the most significant cultural institutions with a verified Wikidata presence.

**Key Metrics:**
- **Institutions enriched:** 6
- **Wikidata identifiers added:** 6
- **VIAF identifiers added:** 4
- **ISIL codes added:** 1
- **Coverage increase:** 0.0% → 5.1%
- **Average confidence score:** 0.97

---

## Batch 1 Institutions

### 1. Museo Nacional de Antropología
**Status:** ✅ Enriched
**Wikidata:** Q524249 ([view](https://www.wikidata.org/wiki/Q524249))
**VIAF:** 139462066 ([view](https://viaf.org/viaf/139462066))
**Institution Type:** MUSEUM
**Confidence:** 0.98

**Verification:**
- ✅ SPARQL query confirmed Q524249 matches "Museo Nacional de Antropología"
- ✅ VIAF record 139462066 matches institution name
- ✅ Website https://mna.inah.gob.mx/ matches Wikidata official website property

**Notable Collections:**
- Pre-Columbian artifacts
- Aztec and Maya collections
- Anthropological and archaeological materials

---

### 2. Museo Nacional de Arte (MUNAL)
**Status:** ✅ Enriched
**Wikidata:** Q1138147 ([view](https://www.wikidata.org/wiki/Q1138147))
**VIAF:** 137951343 ([view](https://viaf.org/viaf/137951343))
**Institution Type:** MUSEUM
**Confidence:** 0.98

**Verification:**
- ✅ SPARQL query confirmed Q1138147 matches "Museo Nacional de Arte"
- ✅ VIAF record 137951343 matches institution name and location (Mexico City)
- ✅ Website https://munal.mx/ matches Wikidata official website property

**Notable Collections:**
- Mexican art from 16th to 20th century
- Colonial to modern period collections
- Digital catalog at https://munal.emuseum.com/

---

### 3. Biblioteca Nacional de México
**Status:** ✅ Enriched
**Wikidata:** Q5495070 ([view](https://www.wikidata.org/wiki/Q5495070))
**VIAF:** 147873206 ([view](https://viaf.org/viaf/147873206))
**ISIL:** MX-MXBN ([view](https://isil.org/MX-MXBN))
**Institution Type:** LIBRARY
**Confidence:** 0.98

**Verification:**
- ✅ SPARQL query confirmed Q5495070 matches "Biblioteca Nacional de México"
- ✅ VIAF record 147873206 matches institution name
- ✅ ISIL code MX-MXBN registered in international ISIL registry
- ✅ Website https://bnm.iib.unam.mx/ matches Wikidata official website

**Notable Collections:**
- Part of UNAM (Universidad Nacional Autónoma de México)
- Historical Mexican bibliographic materials
- Digital catalogs: LibrUNAM, UNAM catalog

---

### 4. Cineteca Nacional
**Status:** ✅ Enriched
**Wikidata:** Q1092492 ([view](https://www.wikidata.org/wiki/Q1092492))
**Institution Type:** ARCHIVE
**Confidence:** 0.95

**Verification:**
- ✅ SPARQL query confirmed Q1092492 matches "Cineteca Nacional"
- ✅ Institution type matches: film archive
- ✅ Location matches: Mexico City

**Notable Collections:**
- 12,000+ films
- Mexican cinema heritage
- YouTube channel for digital access

---

### 5. Fototeca Nacional
**Status:** ✅ Enriched
**Wikidata:** Q66432183 ([view](https://www.wikidata.org/wiki/Q66432183))
**Institution Type:** ARCHIVE
**Confidence:** 0.95

**Verification:**
- ✅ SPARQL query confirmed Q66432183 matches "Fototeca Nacional"
- ✅ Institution type matches: photographic archive
- ✅ Part of INAH (Instituto Nacional de Antropología e Historia)

**Notable Collections:**
- Nearly 900,000 cultural photographic assets
- Historical photographs of Mexico
- Part of SINAFO (Sistema Nacional de Fototecas)

---

### 6. Instituto Nacional de Antropología e Historia (INAH)
**Status:** ✅ Enriched
**Wikidata:** Q901361 ([view](https://www.wikidata.org/wiki/Q901361))
**VIAF:** 139735572 ([view](https://viaf.org/viaf/139735572))
**Institution Type:** OFFICIAL_INSTITUTION
**Confidence:** 0.98

**Verification:**
- ✅ SPARQL query confirmed Q901361 matches "Instituto Nacional de Antropología e Historia"
- ✅ VIAF record 139735572 matches institution name
- ✅ Multiple official websites confirmed (inah.gob.mx, mediateca.inah.gob.mx, sinafo.inah.gob.mx)

**Notable Properties:**
- Government heritage agency overseeing Mexican cultural heritage
- Operates multiple museums, archives, and research centers
- Digital platforms: Mediateca INAH, SINAFO, Codices INAH
- Network of regional museums and archives

---

## Enrichment Methodology

### Data Sources
1. **Wikidata SPARQL Endpoint** - Primary identifier verification
2. **VIAF API** - Cross-reference for institutional identifiers
3. **ISIL Registry** - International library/archive codes
4. **Institutional Websites** - Verification of official URLs

### Verification Process
For each institution:
1. ✅ SPARQL query to Wikidata using institution name + location
2. ✅ Fuzzy matching with threshold > 0.85
3. ✅ VIAF cross-reference where available
4. ✅ Website verification against Wikidata properties
5. ✅ Manual review of match quality

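The fuzzy-matching step (step 2) might look like the following minimal sketch. The report does not say which similarity measure `scripts/enrich_mexico_batch01.py` uses, so `difflib.SequenceMatcher` from the Python standard library is an assumption here, as are the helper names `normalize`, `name_similarity`, and `is_match`.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so that
    'Antropología' and 'antropologia' compare as equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(local_name: str, wikidata_label: str, threshold: float = 0.85) -> bool:
    """Accept a candidate only when similarity is strictly above the threshold."""
    return name_similarity(local_name, wikidata_label) > threshold
```

Under this measure, "Museo Nacional de Antropología" and an unaccented, lowercased label compare as identical, while "Cineteca Nacional" and "Fototeca Nacional" score below the 0.85 cut-off and are kept apart.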
### Confidence Scoring
- **0.98 (High):** Wikidata + VIAF match + website verification (4 institutions)
- **0.95 (Very Good):** Wikidata match + type/location verification (2 institutions)

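The two tiers can be written as a small rule, which also makes it easy to re-check the reported 0.97 average. This is a sketch under assumptions: the function name, its boolean parameters, and the `None` return for evidence combinations the report does not describe are all illustrative, not taken from the enrichment script.

```python
from statistics import mean
from typing import Optional


def confidence_score(wikidata: bool, viaf: bool, website: bool,
                     type_and_location: bool) -> Optional[float]:
    """Map the evidence found for one institution to a Batch 1 tier."""
    if not wikidata:
        return None  # no Wikidata match: nothing to enrich
    if viaf and website:
        return 0.98  # Wikidata + VIAF match + website verification
    if type_and_location:
        return 0.95  # Wikidata match + type/location verification
    return None  # combination not covered by the two documented tiers


# Batch 1: four institutions at 0.98 and two at 0.95.
batch1_scores = [0.98] * 4 + [0.95] * 2
average = round(mean(batch1_scores), 2)  # (4 * 0.98 + 2 * 0.95) / 6
print(average)
```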
---

## Technical Implementation

### Script
- **File:** `scripts/enrich_mexico_batch01.py`
- **Method:** Direct YAML manipulation with PyYAML
- **Identifiers Added:**
  - Wikidata: Q-numbers with URLs
  - VIAF: Numeric IDs with URLs
  - ISIL: International codes with URLs

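As an illustration of pairing each identifier with its URL, the sketch below uses the URL patterns that appear in the institution entries of this report. The record field names (`identifier_scheme`, `identifier_value`, `identifier_url`) and the loose format checks are assumptions, not the project's schema or any registry's official syntax.

```python
import re

# URL patterns as they appear in the Batch 1 institution entries.
URL_TEMPLATES = {
    "Wikidata": "https://www.wikidata.org/wiki/{value}",
    "VIAF": "https://viaf.org/viaf/{value}",
    "ISIL": "https://isil.org/{value}",
}

# Loose plausibility checks; these patterns are assumptions, not registry rules.
VALUE_PATTERNS = {
    "Wikidata": re.compile(r"Q[1-9]\d*"),
    "VIAF": re.compile(r"\d+"),
    "ISIL": re.compile(r"[A-Z]{2}-[A-Za-z0-9:/\-]+"),
}


def make_identifier(scheme: str, value: str) -> dict:
    """Return an identifier record together with its URL."""
    if not VALUE_PATTERNS[scheme].fullmatch(value):
        raise ValueError(f"{value!r} is not a plausible {scheme} identifier")
    return {
        "identifier_scheme": scheme,   # hypothetical field name
        "identifier_value": value,     # hypothetical field name
        "identifier_url": URL_TEMPLATES[scheme].format(value=value),
    }
```

For example, `make_identifier("VIAF", "139462066")` yields a record whose URL is `https://viaf.org/viaf/139462066`, while a Wikidata value without the leading `Q` is rejected before it reaches the data file.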
### Provenance Tracking
Each enriched institution received:
```yaml
provenance:
  enrichment_history:
    - enrichment_date: "2025-11-12T..."
      enrichment_method: "Wikidata SPARQL query + VIAF cross-reference"
      identifiers_added: ["Wikidata:Qxxxxxx", "VIAF:xxxxxxx"]
      confidence_score: 0.95-0.98
      notes: "Verified via SPARQL query and VIAF match"
```

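A provenance entry of this shape could be appended to an in-memory record roughly as follows. This is a minimal sketch, not the script's actual code: the function name `add_enrichment` and the `name` key on the record are hypothetical, and the timestamp format is assumed to be ISO 8601 in UTC.

```python
from datetime import datetime, timezone


def add_enrichment(record: dict, identifiers: list, confidence: float,
                   method: str, notes: str) -> dict:
    """Append one enrichment_history entry to an institution record in place."""
    history = record.setdefault("provenance", {}).setdefault("enrichment_history", [])
    history.append({
        "enrichment_date": datetime.now(timezone.utc).isoformat(),
        "enrichment_method": method,
        "identifiers_added": list(identifiers),
        "confidence_score": confidence,
        "notes": notes,
    })
    return record


record = {"name": "Museo Nacional de Antropología"}  # 'name' is a hypothetical key
add_enrichment(
    record,
    identifiers=["Wikidata:Q524249", "VIAF:139462066"],
    confidence=0.98,
    method="Wikidata SPARQL query + VIAF cross-reference",
    notes="Verified via SPARQL query and VIAF match",
)
```

Because `setdefault` reuses an existing `enrichment_history` list, earlier entries are kept and later batches only append. With PyYAML, the updated records could then be written back with `yaml.safe_dump(records, allow_unicode=True, sort_keys=False)` so that accented names and key order survive the round trip.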
### File Modified
- **Path:** `data/instances/mexico/mexican_institutions_geocoded.yaml`
- **Size:** 117 institutions
- **Records modified:** 6 institution records updated

---

## Coverage Analysis

### Before Batch 1
- **Total institutions:** 117
- **With Wikidata IDs:** 0
- **Coverage:** 0.0%

### After Batch 1
- **Total institutions:** 117
- **With Wikidata IDs:** 6
- **Coverage:** 5.1%

### Identifier Breakdown
| Identifier Type | Count | Coverage |
|-----------------|-------|----------|
| Wikidata | 6 | 5.1% |
| VIAF | 4 | 3.4% |
| ISIL | 1 | 0.9% |
| Website | 6 | 5.1% |

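The coverage column is each count divided by the 117 institutions in the file, rounded to one decimal place. The short check below recomputes it from the counts in the table; the variable and function names are illustrative only.

```python
TOTAL_INSTITUTIONS = 117

# Counts taken from the Identifier Breakdown table.
identifier_counts = {"Wikidata": 6, "VIAF": 4, "ISIL": 1, "Website": 6}


def coverage_percent(count: int, total: int = TOTAL_INSTITUTIONS) -> float:
    """Share of institutions carrying an identifier, as a percentage to one decimal."""
    return round(100 * count / total, 1)


coverage = {name: coverage_percent(n) for name, n in identifier_counts.items()}
for name, pct in coverage.items():
    print(f"{name:<9} {identifier_counts[name]:>3} {pct:>5.1f}%")
```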
---

## Next Steps: Batch 2 Planning

### Candidate Institutions (Regional Museums)
Based on baseline analysis, Batch 2 should target regional museums with high Wikidata match probability:

**Priority 2 Candidates (15-20 institutions):**
1. Regional INAH museums (Museo Regional de X)
2. State museums with established Wikipedia presence
3. University museums (UNAM system)
4. Major city museums (Guadalajara, Monterrey, Puebla)

**Target Coverage:** 20-25% (24-29 institutions)

### Recommended Workflow
1. **Query Wikidata** for Mexican museums by geographic region
2. **Fuzzy match** against 111 remaining institutions
3. **Verify** top 20 matches with confidence > 0.85
4. **Add identifiers** using same methodology as Batch 1
5. **Document** in `batch02_report.md`

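Step 1 of this workflow could start from a query like the one sketched below. It is an assumption about how the regional query might be phrased, not the campaign's actual query: it uses the Wikidata properties P31 (instance of), P279 (subclass of), P17 (country), and P131 (located in the administrative territorial entity), with Q33506 for museum and Q96 for Mexico. The function names are hypothetical, and `region_qid` is whatever QID the operator looks up for the state or city of interest. The sketch only builds the query and request URL; sending it is left out, and the public endpoint expects a descriptive `User-Agent` header on real requests.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def museums_in_region_query(region_qid: str, limit: int = 200) -> str:
    """SPARQL for museums in Mexico located (transitively) in a given region."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:Q33506 ;   # instance of (a subclass of) museum
        wdt:P17 wd:Q96 ;                # country: Mexico
        wdt:P131* wd:{region_qid} .     # located in the region, transitively
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
LIMIT {limit}
""".strip()


def request_url(query: str) -> str:
    """URL that asks the public endpoint for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"
```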
---

## Quality Assurance

### Manual Verification
- ✅ All 6 Q-numbers resolve to correct Wikidata entities
- ✅ All VIAF IDs resolve to correct authority records
- ✅ ISIL code MX-MXBN verified in international registry
- ✅ No duplicate identifiers introduced

### Schema Compliance
- ✅ All identifiers follow LinkML schema v0.2.1
- ✅ Provenance metadata includes enrichment_history
- ✅ YAML structure preserved (list format with hyphens)

### Linked Data Integrity
- ✅ All identifier URLs resolve correctly
- ✅ Wikidata entities link back to institutional websites
- ✅ VIAF records match Wikidata entities

---

## Campaign Progress

### Timeline
- **Nov 11, 2025:** Baseline analysis completed (117 institutions, 0.0% coverage)
- **Nov 12, 2025:** Batch 1 completed (6 institutions, 5.1% coverage)

### Campaign Goals
- **Target coverage:** 65-70% (76-82 institutions)
- **Remaining:** 111 institutions
- **Estimated batches:** 5-8 batches

### Projected Timeline (Based on Brazilian Model)
- **Batch 2:** Nov 13 (Regional museums, +15-20 institutions)
- **Batch 3:** Nov 14 (State archives/libraries, +15-20 institutions)
- **Batch 4:** Nov 15 (University collections, +10-15 institutions)
- **Batch 5:** Nov 16 (Specialized archives, +10-15 institutions)
- **Batch 6+:** Nov 17-18 (Remaining institutions, +10-20 institutions)

---

## References

### Data Files
- **Source:** `data/instances/mexico/mexican_institutions_geocoded.yaml`
- **Baseline:** `reports/mexico/baseline_analysis.md`
- **Script:** `scripts/enrich_mexico_batch01.py`

### External Resources
- **Wikidata SPARQL:** https://query.wikidata.org/
- **VIAF API:** https://viaf.org/
- **ISIL Registry:** https://isil.org/

### Methodology
- **Framework:** Brazilian enrichment campaign (67.5% coverage in 6 days)
- **Schema:** LinkML v0.2.1 (modular)
- **Provenance:** PROV-O ontology patterns

---

**Report generated:** November 12, 2025
**Next action:** Plan and execute Batch 2 (Regional Museums)
**Campaign status:** ✅ On track for 65-70% coverage target