Chilean GLAM Wikidata Enrichment - Batch 9 Session Summary
What We Did in This Session
1. Executed Archive Enrichment Query (Batch 9)
- Ran scripts/query_wikidata_chilean_archives.py
- Found 11 Chilean archives in Wikidata
- Fuzzy matching found 0 automatic matches
- Manual verification confirmed no valid matches
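The fuzzy-matching step can be illustrated with a minimal sketch. It is not the code in scripts/query_wikidata_chilean_archives.py; the function names, the accent-stripping normalization, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def best_match(name, candidates, threshold=0.85):
    """Return (label, qid, score) for the best candidate at or above
    the threshold, or None when nothing scores high enough.

    candidates is an iterable of (label, qid) pairs, for example the
    rows returned by a Wikidata query. The threshold is an assumed value.
    """
    best = None
    for label, qid in candidates:
        score = SequenceMatcher(None, normalize(name), normalize(label)).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (label, qid, score)
    return best
```

A generic name such as "Archivo Histórico" scores well below the threshold against longer, more specific labels, which is consistent with a run that produces zero automatic matches and falls back to manual verification.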
2. Discovered Data Quality Issue
- "USACH's Archivo Patrimonial" is actually Archivo Nacional de Chile
- Verified via OpenStreetMap ID way/187712689 (has wikidata: Q6970429)
- This is a duplicate entry - same institution listed twice with different names
- Original "Archivo Nacional" already has Q6970429 in our dataset
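Duplicates of this kind can be surfaced automatically once Q-numbers are attached to records. The sketch below groups records by Wikidata ID and reports any ID claimed by more than one name. The record keys `name` and `wikidata` are hypothetical and may not match the field names in the YAML dataset.

```python
from collections import defaultdict


def find_shared_qids(institutions):
    """Map each Wikidata Q-ID to the names that claim it,
    keeping only IDs that appear on more than one record."""
    names_by_qid = defaultdict(list)
    for record in institutions:
        qid = record.get("wikidata")  # hypothetical key; records without an ID are skipped
        if qid:
            names_by_qid[qid].append(record["name"])
    return {qid: names for qid, names in names_by_qid.items() if len(names) > 1}


records = [
    {"name": "Archivo Nacional", "wikidata": "Q6970429"},
    {"name": "USACH's Archivo Patrimonial", "wikidata": "Q6970429"},
    {"name": "Fundación Buen Pastor"},
]
print(find_shared_qids(records))
```

Here the check only fires after the second record has been assigned Q6970429 (for example via its OpenStreetMap tag), so it complements manual verification rather than replacing it.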
3. Analyzed Why No Matches
10 archives need enrichment, but:
- 1 is a duplicate (data quality issue)
- 6 have generic names without locations
- 3 are specialized/regional archives not in Wikidata
Conclusion: Archive enrichment exhausted - no more matches available
Current Status
Coverage: 60.0% (54/90 institutions) - unchanged from Batch 8
Coverage by Type:
| Type | Have Wikidata | Total | Coverage | Status |
|---|---|---|---|---|
| EDUCATION_PROVIDER | 12 | 12 | 100.0% | ✅ Complete |
| MUSEUM | 38 | 51 | 74.5% | 📈 Good |
| LIBRARY | 2 | 9 | 22.2% | 📈 Improved (B8) |
| ARCHIVE | 2 | 12 | 16.7% | ⭐ Attempted (B9) |
| MIXED | 0 | 3 | 0.0% | 🎯 Next target |
| RESEARCH_CENTER | 0 | 2 | 0.0% | 🎯 Next target |
| OFFICIAL_INSTITUTION | 0 | 1 | 0.0% | 🎯 High priority |
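The percentages in the table can be recomputed from the raw counts as a consistency check. The dictionary below copies the counts from the table; the variable and function names are illustrative only.

```python
# (have_wikidata, total) per institution type, copied from the table above
counts = {
    "EDUCATION_PROVIDER": (12, 12),
    "MUSEUM": (38, 51),
    "LIBRARY": (2, 9),
    "ARCHIVE": (2, 12),
    "MIXED": (0, 3),
    "RESEARCH_CENTER": (0, 2),
    "OFFICIAL_INSTITUTION": (0, 1),
}


def coverage(have, total):
    """Coverage as a percentage rounded to one decimal place."""
    return round(100 * have / total, 1)


total_have = sum(have for have, _ in counts.values())
total_all = sum(total for _, total in counts.values())
overall = coverage(total_have, total_all)

for kind, (have, total) in counts.items():
    print(f"{kind}: {have}/{total} = {coverage(have, total)}%")
print(f"Overall: {total_have}/{total_all} = {overall}%")
```

The per-type counts sum to 54 of 90, which agrees with the 60.0% overall figure.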
What Needs to Happen Next
Priority 1: Official Institution (Batch 10A)
Target: Servicio Nacional del Patrimonio Cultural
- Chile's National Heritage Service (major government agency)
- Operates Archivo Nacional, Museo Histórico Nacional, etc.
- Official website: https://www.patrimoniocultural.gob.cl/
- NOT currently in Wikidata (verified by SPARQL query)
- Action: Manual web research or consider creating Wikidata entry
Priority 2: Research Centers (Batch 10B)
Targets:
- Fundación Buen Pastor
- Fundación Iglesias Patrimoniales (Church Heritage Foundation)
Both are foundations - may be in Wikidata as organizations
Priority 3: Mixed Institutions (Batch 10C)
Targets:
- Centro de Interpretación Histórica
- Instituto Alemán Puerto Montt (German Institute - likely has Wikidata)
- Centro Cultural Sofia Hott (Osorno cultural center)
Priority 4: Remaining Museums (Batch 11)
- 13 museums still need Wikidata (matching 3 or more would lift museum coverage above 80%)
- More likely to be in Wikidata than archives
Key Files
Active Dataset:
- data/instances/chile/chilean_institutions_batch8_enriched.yaml (54 with Wikidata)
Batch 9 Outputs:
- data/instances/chile/wikidata_matches_batch9_archives.json (empty - no matches)
- data/instances/chile/BATCH9_ARCHIVES_ANALYSIS.md
- data/instances/chile/BATCH9_COMPLETE_SUMMARY.md
Scripts Available:
- scripts/query_wikidata_chilean_archives.py ✅
- scripts/query_wikidata_chilean_libraries.py ✅
- scripts/query_wikidata_chilean_museums.py ✅
- scripts/enrich_chilean_batch7.py (museums) ✅
- scripts/enrich_chilean_batch8.py (libraries) ✅
Recommended Next Actions
Option A: Query Official Institutions (10 min)
- Create scripts/query_wikidata_chilean_official.py
- Search for government cultural agencies
- Manual verification for "Servicio Nacional del Patrimonio Cultural"
Option B: Query Research Centers (15 min)
- Create scripts/query_wikidata_chilean_foundations.py
- Search for Chilean foundations (wd:Q157031)
- Match "Fundación Buen Pastor" and "Fundación Iglesias Patrimoniales"
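A minimal sketch of what the foundations query for Option B might look like, assuming the public Wikidata Query Service endpoint and the standard properties P31 (instance of), P279 (subclass of), and P17 (country, with Q298 for Chile). The query shape, the function names, and the User-Agent string are assumptions; the script itself does not exist yet.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of foundation (Q157031) or of one of its
# subclasses, located in Chile (Q298), with Spanish or English labels.
FOUNDATIONS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q157031 ;
        wdt:P17 wd:Q298 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en". }
}
"""


def parse_bindings(payload):
    """Turn a SPARQL JSON result into a list of (qid, label) pairs."""
    rows = []
    for binding in payload["results"]["bindings"]:
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        label = binding.get("itemLabel", {}).get("value", "")
        rows.append((qid, label))
    return rows


def run_query(query=FOUNDATIONS_QUERY):
    """Send the query to the endpoint and return parsed rows (needs network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "chilean-glam-enrichment-sketch/0.1"}  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))
```

The rows returned by `run_query()` could then be compared against "Fundación Buen Pastor" and "Fundación Iglesias Patrimoniales" by name. Keeping the parsing separate from the network call allows it to be tested offline against a saved response.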
Option C: Query Mixed/Cultural Centers (15 min)
- Create scripts/query_wikidata_chilean_mixed.py
- Search for cultural centers and interpretation centers
- Search for German schools/institutes (Instituto Alemán)
Option D: Return to Museums (30 min)
- Refine scripts/query_wikidata_chilean_museums.py
- Expand search to include smaller regional museums
- Could add 5-8 more museums
Progress Tracking
Batch History:
- Batch 0-6: Foundation work (manual + CSV imports) → 57.8%
- Batch 7: SPARQL museums (+32 Q-numbers) → 57.8%
- Batch 8: SPARQL libraries (+2 Q-numbers) → 60.0%
- Batch 9: SPARQL archives (+0 Q-numbers) → 60.0% (no matches found)
Next Milestone: 65% coverage (59 institutions)
- Need: 5 more Q-numbers
- Likely sources: Mixed institutions, research centers, official institution
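The milestone arithmetic can be checked with a short sketch. The function name is illustrative; integer ceiling division is used so that the result does not depend on floating-point rounding.

```python
def q_numbers_needed(target_percent, total, current):
    """Return (institutions required for the target, additional Q-numbers needed)."""
    # Ceiling of target_percent * total / 100 using integer arithmetic only.
    required = -(-target_percent * total // 100)
    return required, max(0, required - current)


required, needed = q_numbers_needed(target_percent=65, total=90, current=54)
print(f"65% of 90 requires {required} institutions; {needed} more Q-numbers needed")
```

With 54 of 90 institutions already matched, 65% of 90 is 58.5, so 59 institutions are required and 5 more Q-numbers are needed.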
Data Quality Issues to Address
- Duplicate entry: "USACH's Archivo Patrimonial" → Remove or merge with "Archivo Nacional"
- Missing location data for many archives (6 institutions)
- Generic institution names without distinguishing information
Session End
Date: 2025-11-09
Time: ~45 minutes
Next Session Goal: Target official institution and research centers (Batch 10)