glam/reports/mexico/batch01_report.md
2025-11-19 23:25:22 +01:00

Mexican Wikidata Enrichment - Batch 1 Report

Campaign: Mexican Heritage Institutions Wikidata Enrichment
Batch: 1 (National Priority Institutions)
Date: November 12, 2025
Operator: AI Agent (OpenCODE)


Executive Summary

Enriched 6 national-priority Mexican heritage institutions with Wikidata identifiers, raising Wikidata coverage from 0.0% to 5.1% (6 of 117 institutions). This batch lays the foundation for the Mexican enrichment campaign. It focuses on the country's most significant cultural institutions, each of which has a verified Wikidata entry.

Key Metrics:

  • Institutions enriched: 6
  • Wikidata identifiers added: 6
  • VIAF identifiers added: 4
  • ISIL codes added: 1
  • Coverage increase: 0.0% → 5.1%
  • Average confidence score: 0.97
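Both derived figures can be checked by hand. Coverage is the share of the 117 institutions that now carry a Wikidata ID. The average confidence weights the two scoring tiers by how many institutions fall in each: four at 0.98 and two at 0.95, as listed under Confidence Scoring.

```latex
\text{coverage} = \frac{6}{117} \approx 0.0513 \approx 5.1\%
\qquad
\bar{c} = \frac{4 \times 0.98 + 2 \times 0.95}{6} = \frac{5.82}{6} = 0.97
```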

Batch 1 Institutions

1. Museo Nacional de Antropología

Status: Enriched
Wikidata: Q524249
VIAF: 139462066
Institution Type: MUSEUM
Confidence: 0.98

Verification:

  • SPARQL query confirmed Q524249 matches "Museo Nacional de Antropología"
  • VIAF record 139462066 matches institution name
  • Website https://mna.inah.gob.mx/ matches Wikidata official website property

Notable Collections:

  • Pre-Columbian artifacts
  • Aztec and Maya collections
  • Anthropological and archaeological materials

2. Museo Nacional de Arte (MUNAL)

Status: Enriched
Wikidata: Q1138147
VIAF: 137951343
Institution Type: MUSEUM
Confidence: 0.98

Verification:

  • SPARQL query confirmed Q1138147 matches "Museo Nacional de Arte"
  • VIAF record 137951343 matches institution name and location (Mexico City)
  • Website https://munal.mx/ matches Wikidata official website property

Notable Collections:


3. Biblioteca Nacional de México

Status: Enriched
Wikidata: Q5495070
VIAF: 147873206
ISIL: MX-MXBN
Institution Type: LIBRARY
Confidence: 0.98

Verification:

  • SPARQL query confirmed Q5495070 matches "Biblioteca Nacional de México"
  • VIAF record 147873206 matches institution name
  • ISIL code MX-MXBN registered in international ISIL registry
  • Website https://bnm.iib.unam.mx/ matches Wikidata official website property

Notable Collections:

  • Part of UNAM (Universidad Nacional Autónoma de México)
  • Historical Mexican bibliographic materials
  • Digital catalogs: LibrUNAM, UNAM catalog

4. Cineteca Nacional

Status: Enriched
Wikidata: Q1092492
Institution Type: ARCHIVE
Confidence: 0.95

Verification:

  • SPARQL query confirmed Q1092492 matches "Cineteca Nacional"
  • Institution type matches: film archive
  • Location matches: Mexico City

Notable Collections:

  • 12,000+ films
  • Mexican cinema heritage
  • YouTube channel for digital access

5. Fototeca Nacional

Status: Enriched
Wikidata: Q66432183
Institution Type: ARCHIVE
Confidence: 0.95

Verification:

  • SPARQL query confirmed Q66432183 matches "Fototeca Nacional"
  • Institution type matches: photographic archive
  • Part of INAH (Instituto Nacional de Antropología e Historia)

Notable Collections:

  • Nearly 900,000 cultural photographic assets
  • Historical photographs of Mexico
  • Part of SINAFO (Sistema Nacional de Fototecas)

6. Instituto Nacional de Antropología e Historia (INAH)

Status: Enriched
Wikidata: Q901361
VIAF: 139735572
Institution Type: OFFICIAL_INSTITUTION
Confidence: 0.98

Verification:

  • SPARQL query confirmed Q901361 matches "Instituto Nacional de Antropología e Historia"
  • VIAF record 139735572 matches institution name
  • Multiple official websites confirmed (inah.gob.mx, mediateca.inah.gob.mx, sinafo.inah.gob.mx)

Notable Properties:

  • Government agency overseeing Mexican cultural heritage
  • Operates multiple museums, archives, and research centers
  • Digital platforms: Mediateca INAH, SINAFO, Codices INAH
  • Network of regional museums and archives

Enrichment Methodology

Data Sources

  1. Wikidata SPARQL Endpoint - Primary identifier verification
  2. VIAF API - Cross-reference for institutional identifiers
  3. ISIL Registry - International library/archive codes
  4. Institutional Websites - Verification of official URLs
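The first data source can be queried with a lookup like the one below. This is a minimal sketch and assumes an exact Spanish-label match restricted to items whose country (P17) is Mexico (Q96). The report does not show the batch script's actual query. The property IDs P214 (VIAF ID), P791 (ISIL) and P856 (official website) are real Wikidata properties. The helper names `build_lookup_query` and `build_request_url` are hypothetical.

```python
# Hypothetical helpers: build a Wikidata SPARQL lookup for one institution.
# Assumes an exact @es label match plus country = Mexico (wd:Q96).
import json
import urllib.parse

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lookup_query(label: str, lang: str = "es") -> str:
    """Return SPARQL that finds Mexican items with this exact label."""
    # json.dumps yields a double-quoted, escaped literal that SPARQL accepts.
    literal = json.dumps(label, ensure_ascii=False)
    return f"""
SELECT ?item ?viaf ?isil ?website WHERE {{
  ?item rdfs:label {literal}@{lang} ;
        wdt:P17 wd:Q96 .
  OPTIONAL {{ ?item wdt:P214 ?viaf . }}
  OPTIONAL {{ ?item wdt:P791 ?isil . }}
  OPTIONAL {{ ?item wdt:P856 ?website . }}
}}
LIMIT 10
""".strip()


def build_request_url(query: str) -> str:
    """GET URL for the public endpoint, requesting JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"
```

The OPTIONAL clauses return the VIAF, ISIL and website values in the same round trip. This lets one query feed the cross-checks in the verification steps. Any HTTP client can fetch the URL. Wikimedia asks clients to send a descriptive User-Agent header.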

Verification Process

For each institution:

  1. SPARQL query to Wikidata using institution name + location
  2. Fuzzy name matching, accepting candidates with a similarity score above 0.85
  3. VIAF cross-reference where available
  4. Website verification against Wikidata properties
  5. Manual review of match quality
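Step 2 can be sketched as follows. The report does not name the similarity measure, so this sketch assumes Python's `difflib.SequenceMatcher` ratio computed on accent-stripped, lowercased names. Only the "> 0.85" threshold comes from the report. The function names are hypothetical.

```python
# Sketch of step 2 (fuzzy matching). SequenceMatcher is an assumed measure,
# not necessarily the one used by the batch script.
import unicodedata
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # from the verification process: threshold > 0.85


def normalize(name: str) -> str:
    """Lowercase, strip accents (México -> mexico) and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(local_name: str, wikidata_label: str) -> bool:
    # Strictly greater than the threshold, as stated in the report.
    return similarity(local_name, wikidata_label) > MATCH_THRESHOLD
```

Normalizing before comparing keeps accent or spacing differences from lowering the score. A label such as "Biblioteca Nacional de Mexico" therefore still matches "Biblioteca Nacional de México" exactly.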

Confidence Scoring

  • 0.98 (High): Wikidata + VIAF match + website verification (4 institutions)
  • 0.95 (Very Good): Wikidata match + type/location verification (2 institutions)
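The two tiers can be expressed as a small decision function. This is a hypothetical encoding: the `Evidence` fields are illustrative and not taken from the batch script. The sketch also assumes that evidence weaker than either tier produces no score and goes to manual review.

```python
# Hypothetical encoding of the two confidence tiers.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    wikidata_match: bool
    viaf_match: bool = False
    website_match: bool = False
    type_and_location_match: bool = False


def confidence(ev: Evidence) -> Optional[float]:
    if not ev.wikidata_match:
        return None  # no Wikidata match: not enriched in this batch
    if ev.viaf_match and ev.website_match:
        return 0.98  # Wikidata + VIAF match + website verification
    if ev.type_and_location_match:
        return 0.95  # Wikidata match + type/location verification
    return None  # weaker evidence: left for manual review (assumption)
```

For example, the Cineteca Nacional record (no VIAF, type and location confirmed) would score 0.95. The Museo Nacional de Antropología record would score 0.98.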

Technical Implementation

Script

  • File: scripts/enrich_mexico_batch01.py
  • Method: Direct YAML manipulation with PyYAML
  • Identifiers Added:
    • Wikidata: Q-numbers with URLs
    • VIAF: Numeric IDs with URLs
    • ISIL: International codes with URLs
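A minimal sketch of adding an identifier to one record is shown below. It works on the plain dicts that `yaml.safe_load` returns. The `identifiers` list and its `identifier_scheme`, `identifier_value` and `identifier_url` keys are hypothetical stand-ins for the real LinkML schema fields. The Wikidata and VIAF URL patterns are the public entity URLs. This report does not state which ISIL resolver URL the script uses, so the sketch leaves ISIL entries without one.

```python
# Sketch: attach identifiers to one parsed YAML record (plain dict).
# Field names are hypothetical; adapt them to the actual LinkML schema.
URL_TEMPLATES = {
    "Wikidata": "https://www.wikidata.org/wiki/{}",
    "VIAF": "https://viaf.org/viaf/{}",
    # ISIL resolver URL not given in this report, so none is generated here.
}


def add_identifier(record: dict, scheme: str, value: str) -> bool:
    """Append an identifier; return False if the same one is already present."""
    identifiers = record.setdefault("identifiers", [])
    for existing in identifiers:
        if (existing.get("identifier_scheme") == scheme
                and existing.get("identifier_value") == value):
            return False  # avoid introducing a duplicate
    entry = {"identifier_scheme": scheme, "identifier_value": value}
    template = URL_TEMPLATES.get(scheme)
    if template is not None:
        entry["identifier_url"] = template.format(value)
    identifiers.append(entry)
    return True
```

After updating the records, writing them back with `yaml.safe_dump(data, allow_unicode=True, sort_keys=False)` keeps accented names readable. It also preserves the original key order.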

Provenance Tracking

Each enriched institution received:

provenance:
  enrichment_history:
    - enrichment_date: "2025-11-12T..."
      enrichment_method: "Wikidata SPARQL query + VIAF cross-reference"
      identifiers_added: ["Wikidata:Qxxxxxx", "VIAF:xxxxxxx"]
      confidence_score: 0.98  # 0.95 for Wikidata + type/location matches
      notes: "Verified via SPARQL query and VIAF match"
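The provenance template above can be produced programmatically. The sketch below appends one entry to a record's `enrichment_history`. The exact timestamp format is elided in the template, so ISO 8601 in UTC is an assumption here. The helper name `record_enrichment` is hypothetical.

```python
# Sketch: append a provenance entry shaped like the template above.
from datetime import datetime, timezone


def record_enrichment(record: dict, identifiers_added: list, score: float,
                      method: str = "Wikidata SPARQL query + VIAF cross-reference",
                      notes: str = "Verified via SPARQL query and VIAF match") -> dict:
    entry = {
        "enrichment_date": datetime.now(timezone.utc).isoformat(),  # assumed format
        "enrichment_method": method,
        "identifiers_added": list(identifiers_added),
        "confidence_score": score,
        "notes": notes,
    }
    provenance = record.setdefault("provenance", {})
    provenance.setdefault("enrichment_history", []).append(entry)
    return entry
```

Appending to the list, rather than overwriting it, keeps earlier entries intact. Later batches can then add their own history lines to the same record.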

File Modified

  • Path: data/instances/mexico/mexican_institutions_geocoded.yaml
  • Size: 117 institutions
  • Records modified: 6 institution records updated

Coverage Analysis

Before Batch 1

  • Total institutions: 117
  • With Wikidata IDs: 0
  • Coverage: 0.0%

After Batch 1

  • Total institutions: 117
  • With Wikidata IDs: 6
  • Coverage: 5.1%

Identifier Breakdown

| Identifier Type | Count | Coverage (of 117) |
|-----------------|-------|-------------------|
| Wikidata        | 6     | 5.1%              |
| VIAF            | 4     | 3.4%              |
| ISIL            | 1     | 0.9%              |
| Website         | 6     | 5.1%              |
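The identifier rows of this breakdown can be recomputed from the data file. The sketch assumes each parsed record has a hypothetical `identifiers` list of dicts with an `identifier_scheme` key. It counts each scheme at most once per institution.

```python
# Sketch: recompute identifier counts and coverage from parsed records.
from collections import Counter


def identifier_breakdown(records: list) -> dict:
    """Map scheme -> (institution count, coverage % rounded to 0.1)."""
    total = len(records)
    if total == 0:
        return {}
    counts = Counter()
    for rec in records:
        # A set, so an institution with two IDs of one scheme counts once.
        schemes = {i["identifier_scheme"] for i in rec.get("identifiers", [])}
        counts.update(schemes)
    return {scheme: (n, round(100 * n / total, 1)) for scheme, n in counts.items()}
```

Applied to the Batch 1 state (6 Wikidata, 4 VIAF and 1 ISIL across 117 records), this gives (6, 5.1), (4, 3.4) and (1, 0.9), matching the table.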

Next Steps: Batch 2 Planning

Candidate Institutions (Regional Museums)

Based on the baseline analysis, Batch 2 should target regional museums with high Wikidata match probability:

Priority 2 Candidates (15-20 institutions):

  1. Regional INAH museums (Museo Regional de X)
  2. State museums with established Wikipedia presence
  3. University museums (UNAM system)
  4. Major city museums (Guadalajara, Monterrey, Puebla)

Target Coverage: 20-25% (24-29 institutions)

Planned workflow:

  1. Query Wikidata for Mexican museums by geographic region
  2. Fuzzy match against 111 remaining institutions
  3. Verify top 20 matches with confidence > 0.85
  4. Add identifiers using same methodology as Batch 1
  5. Document in batch02_report.md
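Steps 1 to 3 of this plan could look roughly like the sketch below. Both functions are hypothetical helpers and are not part of scripts/enrich_mexico_batch01.py. The query uses the real Wikidata identifiers Q33506 (museum), Q96 (Mexico), P31/P279 (instance/subclass of), P17 (country) and P131 (located in administrative entity). The region QID is left to the caller. The selection step keeps the best candidate per institution above 0.85 and returns the top 20 for review.

```python
# Hypothetical Batch 2 helpers: regional museum query and candidate selection.
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, float]  # (local institution name, Wikidata QID, similarity)


def build_regional_museum_query(region_qid: str) -> str:
    """SPARQL for museums in Mexico located (transitively) within one region."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:Q33506 ;
        wdt:P17 wd:Q96 ;
        wdt:P131* wd:{region_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
""".strip()


def select_for_review(candidates: List[Candidate], threshold: float = 0.85,
                      limit: int = 20) -> List[Candidate]:
    """Best candidate per institution above `threshold`, highest scores first."""
    best: Dict[str, Candidate] = {}
    for name, qid, score in candidates:
        if score <= threshold:
            continue  # step 3 only verifies matches with confidence > 0.85
        if name not in best or score > best[name][2]:
            best[name] = (name, qid, score)
    # Sort by score descending; ties broken by name for deterministic output.
    ranked = sorted(best.values(), key=lambda c: (-c[2], c[0]))
    return ranked[:limit]
```

Keeping only one candidate per local institution prevents a single museum with several near-identical Wikidata hits from taking up several of the 20 review slots.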

Quality Assurance

Manual Verification

  • All 6 Q-numbers resolve to correct Wikidata entities
  • All VIAF IDs resolve to correct authority records
  • ISIL code MX-MXBN verified in international registry
  • No duplicate identifiers introduced
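The "no duplicate identifiers" check can be automated across the whole file. The sketch assumes each record has a `name` key and a hypothetical `identifiers` list with `identifier_scheme` and `identifier_value` keys. It reports any identifier attached to more than one institution.

```python
# Sketch: find identifiers shared by more than one institution record.
from collections import defaultdict


def find_duplicate_identifiers(records: list) -> dict:
    """Map (scheme, value) -> sorted institution names, for shared IDs only."""
    owners = defaultdict(set)
    for rec in records:
        for ident in rec.get("identifiers", []):
            key = (ident["identifier_scheme"], ident["identifier_value"])
            owners[key].add(rec["name"])
    return {key: sorted(names) for key, names in owners.items() if len(names) > 1}
```

An empty result means no Q-number, VIAF ID or ISIL code is attached to more than one institution.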

Schema Compliance

  • All identifiers follow LinkML schema v0.2.1
  • Provenance metadata includes enrichment_history
  • YAML structure preserved (list format with hyphens)

Linked Data Integrity

  • All identifier URLs resolve correctly
  • Wikidata entities link back to institutional websites
  • VIAF records match Wikidata entities

Campaign Progress

Timeline

  • Nov 11, 2025: Baseline analysis completed (117 institutions, 0.0% coverage)
  • Nov 12, 2025: Batch 1 completed (6 institutions, 5.1% coverage)

Campaign Goals

  • Target coverage: 65-70% (76-82 institutions)
  • Remaining: 111 institutions
  • Estimated batches: 5-8

Projected Timeline (Based on Brazilian Model)

  • Batch 2: Nov 13 (Regional museums, +15-20 institutions)
  • Batch 3: Nov 14 (State archives/libraries, +15-20 institutions)
  • Batch 4: Nov 15 (University collections, +10-15 institutions)
  • Batch 5: Nov 16 (Specialized archives, +10-15 institutions)
  • Batch 6+: Nov 17-18 (Remaining institutions, +10-20 institutions)
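Summing the per-batch ranges gives a rough check of this projection against the 76-82 institution target (65-70% of 117). This is plain arithmetic on the ranges listed above, starting from the 6 institutions enriched in Batch 1. It is not a forecast model.

```latex
\begin{aligned}
\text{target}   &: 0.65 \times 117 = 76.05 \approx 76, \quad 0.70 \times 117 = 81.9 \approx 82\\
\text{low}      &: 6 + (15 + 15 + 10 + 10 + 10) = 66 \;(\approx 56.4\%)\\
\text{high}     &: 6 + (20 + 20 + 15 + 15 + 20) = 96 \;(\approx 82.1\%)\\
\text{midpoint} &: 6 + (17.5 + 17.5 + 12.5 + 12.5 + 15) = 81 \;(\approx 69.2\%)
\end{aligned}
```

The target sits inside the projected range. It is reached if batches land near the middle of their ranges, but not if every batch only hits its lower bound.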

References

Data Files

  • Source: data/instances/mexico/mexican_institutions_geocoded.yaml
  • Baseline: reports/mexico/baseline_analysis.md
  • Script: scripts/enrich_mexico_batch01.py

Methodology

  • Framework: Brazilian enrichment campaign (67.5% coverage in 6 days)
  • Schema: LinkML v0.2.1 (modular)
  • Provenance: PROV-O ontology patterns

Report generated: November 12, 2025
Next action: Plan and execute Batch 2 (Regional Museums)
Campaign status: On track for 65-70% coverage target