glam/data/instances/chile/SESSION_SUMMARY_BATCH9.md
2025-11-19 23:25:22 +01:00

Chilean GLAM Wikidata Enrichment - Batch 9 Session Summary

What We Did in This Session

1. Executed Archive Enrichment Query (Batch 9)

  • Ran scripts/query_wikidata_chilean_archives.py
  • Found 11 Chilean archives in Wikidata
  • Fuzzy matching found 0 automatic matches
  • Manual verification confirmed no valid matches
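A minimal sketch of the kind of fuzzy name matching described above, using only the standard library. The helper names, the accent-stripping step, and the 0.85 threshold are illustrative assumptions, not the logic of scripts/query_wikidata_chilean_archives.py.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def best_match(local_name, candidates, threshold=0.85):
    """Return (qid, label, score) for the best candidate at or above threshold, else None.

    `candidates` maps Wikidata Q-IDs to labels, e.g. the rows returned by a SPARQL query.
    The threshold is a hypothetical value and would need tuning against real data.
    """
    best = None
    target = normalize(local_name)
    for qid, label in candidates.items():
        score = SequenceMatcher(None, target, normalize(label)).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (qid, label, score)
    return best
```

A generic name such as "Archivo Histórico" scores well below the threshold against "Archivo Nacional de Chile", which is consistent with the zero automatic matches reported for this batch.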

2. Discovered Data Quality Issue

  • "USACH's Archivo Patrimonial" is actually Archivo Nacional de Chile
  • Verified via OpenStreetMap way/187712689, which is tagged wikidata=Q6970429
  • This is a duplicate entry - same institution listed twice with different names
  • Original "Archivo Nacional" already has Q6970429 in our dataset

3. Analyzed Why No Matches

10 archives still lack a Wikidata ID, but none of them could be matched:

  • 1 is a duplicate (data quality issue)
  • 6 have generic names without locations
  • 3 are specialized/regional archives not in Wikidata

Conclusion: Archive enrichment is exhausted for now; Wikidata has no further matches for the remaining archives.

Current Status

Coverage: 60.0% (54/90 institutions) - unchanged from Batch 8

Coverage by Type:

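| Type | Have Wikidata | Total | Coverage | Status |
|------|---------------|-------|----------|--------|
| EDUCATION_PROVIDER | 12 | 12 | 100.0% | Complete |
| MUSEUM | 38 | 51 | 74.5% | 📈 Good |
| LIBRARY | 2 | 9 | 22.2% | 📈 Improved (B8) |
| ARCHIVE | 2 | 12 | 16.7% | Attempted (B9) |
| MIXED | 0 | 3 | 0.0% | 🎯 Next target |
| RESEARCH_CENTER | 0 | 2 | 0.0% | 🎯 Next target |
| OFFICIAL_INSTITUTION | 0 | 1 | 0.0% | 🎯 High priority |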
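The coverage figures can be re-derived from the per-type counts. The sketch below hard-codes the counts from this summary; the dictionary layout and function name are assumptions for illustration, not the structure of the YAML dataset.

```python
# (have_wikidata, total) per institution type, copied from the coverage figures above.
COUNTS = {
    "EDUCATION_PROVIDER": (12, 12),
    "MUSEUM": (38, 51),
    "LIBRARY": (2, 9),
    "ARCHIVE": (2, 12),
    "MIXED": (0, 3),
    "RESEARCH_CENTER": (0, 2),
    "OFFICIAL_INSTITUTION": (0, 1),
}


def coverage_pct(have: int, total: int) -> float:
    """Percentage of institutions with a Wikidata ID, rounded to one decimal."""
    return round(100 * have / total, 1)


# Overall totals are the column sums of the per-type counts.
total_have = sum(have for have, _ in COUNTS.values())
total_all = sum(total for _, total in COUNTS.values())
overall = coverage_pct(total_have, total_all)

for kind, (have, total) in COUNTS.items():
    print(f"{kind:<22}{have:>3}/{total:<3} {coverage_pct(have, total):5.1f}%")
print(f"{'OVERALL':<22}{total_have:>3}/{total_all:<3} {overall:5.1f}%")
```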

What Needs to Happen Next

Priority 1: Official Institution (Batch 10A)

Target: Servicio Nacional del Patrimonio Cultural

  • Chile's National Heritage Service (major government agency)
  • Operates Archivo Nacional, Museo Histórico Nacional, etc.
  • Official website: https://www.patrimoniocultural.gob.cl/
  • NOT currently in Wikidata (verified by SPARQL query)
  • Action: Manual web research or consider creating Wikidata entry

Priority 2: Research Centers (Batch 10B)

Targets:

  • Fundación Buen Pastor
  • Fundación Iglesias Patrimoniales (Church Heritage Foundation)

Both are foundations - may be in Wikidata as organizations

Priority 3: Mixed Institutions (Batch 10C)

Targets:

  • Centro de Interpretación Histórica
  • Instituto Alemán Puerto Montt (German Institute - likely has Wikidata)
  • Centro Cultural Sofia Hott (Osorno cultural center)

Priority 4: Remaining Museums (Batch 11)

  • 13 museums still lack a Wikidata ID (matching a few more could bring museum coverage above 80%)
  • More likely to be in Wikidata than archives

Key Files

Active Dataset:

  • data/instances/chile/chilean_institutions_batch8_enriched.yaml (54 with Wikidata)

Batch 9 Outputs:

  • data/instances/chile/wikidata_matches_batch9_archives.json (empty - no matches)
  • data/instances/chile/BATCH9_ARCHIVES_ANALYSIS.md
  • data/instances/chile/BATCH9_COMPLETE_SUMMARY.md

Scripts Available:

  • scripts/query_wikidata_chilean_archives.py
  • scripts/query_wikidata_chilean_libraries.py
  • scripts/query_wikidata_chilean_museums.py
  • scripts/enrich_chilean_batch7.py (museums)
  • scripts/enrich_chilean_batch8.py (libraries)

Option A: Query Official Institutions (10 min)

  1. Create scripts/query_wikidata_chilean_official.py
  2. Search for government cultural agencies
  3. Manual verification for "Servicio Nacional del Patrimonio Cultural"

Option B: Query Research Centers (15 min)

  1. Create scripts/query_wikidata_chilean_foundations.py
  2. Search for Chilean foundations (wd:Q157031)
  3. Match "Fundación Buen Pastor" and "Fundación Iglesias Patrimoniales"
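One way the foundation search in Option B might be expressed is sketched below. Only wd:Q157031 comes from this summary; the use of P31 (instance of), P279 (subclass of), P17 (country), Q298 (Chile), and the label service is an assumption about how such a query could be written, and the function names are hypothetical. The code only builds the query and request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_foundation_query(country_qid="Q298", class_qid="Q157031", limit=200):
    """Build a SPARQL query for items that are instances of a class (or its subclasses)
    located in a country. Defaults assume Chile (Q298) and foundation (Q157031)."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P17 wd:{country_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = {"query": query, "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"


query = build_foundation_query()
url = build_request_url(query)
```

The returned labels could then be compared against "Fundación Buen Pastor" and "Fundación Iglesias Patrimoniales" with the same fuzzy-matching and manual-verification steps used in earlier batches.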

Option C: Query Mixed/Cultural Centers (15 min)

  1. Create scripts/query_wikidata_chilean_mixed.py
  2. Search for cultural centers and interpretation centers
  3. Search for German schools/institutes (Instituto Alemán)

Option D: Return to Museums (30 min)

  1. Refine scripts/query_wikidata_chilean_museums.py
  2. Expand search to include smaller regional museums
  3. Could add 5-8 more museums

Progress Tracking

Batch History:

  • Batch 0-6: Foundation work (manual + CSV imports) → 57.8%
  • Batch 7: SPARQL museums (+32 Q-numbers) → 57.8%
  • Batch 8: SPARQL libraries (+2 Q-numbers) → 60.0%
  • Batch 9: SPARQL archives (+0 Q-numbers) → 60.0% (no matches found)

Next Milestone: 65% coverage (59 institutions)

  • Need: 5 more Q-numbers
  • Likely sources: Mixed institutions, research centers, official institution
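The milestone arithmetic can be checked with a short calculation. The variable names are illustrative; the inputs are the totals reported in this summary.

```python
import math

total_institutions = 90   # institutions in the Chilean dataset
current_with_qid = 54     # institutions that already have a Wikidata ID
target_pct = 65           # next milestone, in percent

# Smallest institution count that reaches the target percentage.
needed_total = math.ceil(target_pct * total_institutions / 100)
still_needed = needed_total - current_with_qid

# Candidates in the three zero-coverage types: MIXED, RESEARCH_CENTER, OFFICIAL_INSTITUTION.
zero_coverage_candidates = 3 + 2 + 1

print(needed_total, still_needed, zero_coverage_candidates)
```

Since the three zero-coverage types contain six institutions in total, reaching the milestone from those sources alone would require matching five of the six.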

Data Quality Issues to Address

  1. Duplicate entry: "USACH's Archivo Patrimonial" → Remove or merge with "Archivo Nacional"
  2. Missing location data for many archives (6 institutions)
  3. Generic institution names without distinguishing information
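The duplicate described in item 1 surfaced when an identifier resolved for an unenriched record turned out to belong to a record already in the dataset. A minimal sketch of that check follows; the record layout (dicts with `name` and `wikidata` keys) and the function name are assumptions, not the schema of the YAML files.

```python
def find_existing_holders(records, qid):
    """Return the names of records that already carry the given Wikidata ID."""
    return [r["name"] for r in records if r.get("wikidata") == qid]


# Illustrative records only; the real dataset has more fields.
records = [
    {"name": "Archivo Nacional", "wikidata": "Q6970429"},
    {"name": "USACH's Archivo Patrimonial", "wikidata": None},
]

# Q-ID resolved for "USACH's Archivo Patrimonial" via its OpenStreetMap way.
proposed_qid = "Q6970429"
holders = find_existing_holders(records, proposed_qid)
if holders:
    print(f"{proposed_qid} is already assigned to: {', '.join(holders)}")
```

Running a check like this before writing a new Q-number would flag potential duplicates for manual review instead of silently assigning the same ID twice.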

Session End

Date: 2025-11-09
Duration: ~45 minutes
Next Session Goal: Target official institution and research centers (Batch 10)