# Chilean GLAM Wikidata Enrichment - Batch 9 Session Summary

## What We Did in This Session

### 1. Executed Archive Enrichment Query (Batch 9)
- Ran `scripts/query_wikidata_chilean_archives.py`
- Found 11 Chilean archives in Wikidata
- Fuzzy matching found **0 automatic matches**
- Manual verification confirmed no valid matches

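The matching logic inside `scripts/query_wikidata_chilean_archives.py` is not reproduced in this summary. As a rough illustration of the fuzzy-matching step, here is a minimal sketch that assumes accent-insensitive comparison with the standard-library `difflib` and a similarity threshold of 0.85. The function names and the threshold are hypothetical, not taken from the script.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def best_match(name, candidates, threshold=0.85):
    """Return (label, qid, score) for the closest candidate, or None.

    `candidates` maps a Wikidata label to its Q-number. The 0.85 threshold
    is an assumed value for illustration only.
    """
    target = normalize(name)
    best = None
    for label, qid in candidates.items():
        score = difflib.SequenceMatcher(None, target, normalize(label)).ratio()
        if best is None or score > best[2]:
            best = (label, qid, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Q6970429 is the only Q-number taken from this summary.
wikidata_archives = {"Archivo Nacional de Chile": "Q6970429"}
print(best_match("ARCHIVO NACIONAL de Chile", wikidata_archives))
print(best_match("Archivo Histórico", wikidata_archives))
```

A generic name such as "Archivo Histórico" shares only a short prefix with any specific Wikidata label, which is consistent with the batch producing no automatic matches.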
### 2. Discovered Data Quality Issue
- **"USACH's Archivo Patrimonial"** is actually **Archivo Nacional de Chile**
- Verified via OpenStreetMap ID way/187712689 (has `wikidata: Q6970429`)
- This is a duplicate entry - same institution listed twice with different names
- Original "Archivo Nacional" already has Q6970429 in our dataset

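The OpenStreetMap check above was done by hand. A minimal sketch of how it could be scripted follows, assuming the OSM API 0.6 JSON output (an `elements` list whose entries carry a `tags` object). The helper names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

OSM_API = "https://api.openstreetmap.org/api/0.6"


def wikidata_tag(payload: dict):
    """Return the `wikidata` tag of the first element in an OSM JSON payload, or None."""
    elements = payload.get("elements", [])
    if not elements:
        return None
    return elements[0].get("tags", {}).get("wikidata")


def fetch_element(osm_type: str, osm_id: int) -> dict:
    """Download one OSM element as JSON (needs network access)."""
    url = f"{OSM_API}/{osm_type}/{osm_id}.json"
    request = urllib.request.Request(
        url, headers={"User-Agent": "glam-enrichment-sketch/0.1"}  # hypothetical agent string
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Offline example shaped like the assumed API response for way/187712689.
sample = {"elements": [{"type": "way", "id": 187712689, "tags": {"wikidata": "Q6970429"}}]}
print(wikidata_tag(sample))
```

Calling `wikidata_tag(fetch_element("way", 187712689))` would run the same check against the live API.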
### 3. Analyzed Why No Matches
**10 archives need enrichment, but:**
- 1 is a duplicate (data quality issue)
- 6 have generic names without locations
- 3 are specialized/regional archives not in Wikidata

**Conclusion**: Archive enrichment exhausted - no more matches available

## Current Status

**Coverage**: **60.0%** (54/90 institutions) - unchanged from Batch 8

**Coverage by Type**:

| Type | Have Wikidata | Total | Coverage | Status |
|------|---------------|-------|----------|--------|
| EDUCATION_PROVIDER | 12 | 12 | 100.0% | ✅ Complete |
| MUSEUM | 38 | 51 | 74.5% | 📈 Good |
| LIBRARY | 2 | 9 | 22.2% | 📈 Improved (B8) |
| ARCHIVE | 2 | 12 | 16.7% | ⭐ Attempted (B9) |
| MIXED | 0 | 3 | 0.0% | 🎯 Next target |
| RESEARCH_CENTER | 0 | 2 | 0.0% | 🎯 Next target |
| OFFICIAL_INSTITUTION | 0 | 1 | 0.0% | 🎯 High priority |

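The figures in the table can be re-derived from the raw counts. The sketch below hard-codes the counts from the table as an assumption; in practice they would be computed from the dataset YAML, whose schema is not shown here.

```python
# (have_wikidata, total) per institution type, copied from the table above.
counts = {
    "EDUCATION_PROVIDER": (12, 12),
    "MUSEUM": (38, 51),
    "LIBRARY": (2, 9),
    "ARCHIVE": (2, 12),
    "MIXED": (0, 3),
    "RESEARCH_CENTER": (0, 2),
    "OFFICIAL_INSTITUTION": (0, 1),
}


def coverage(have: int, total: int) -> float:
    """Coverage as a percentage rounded to one decimal place."""
    return round(100 * have / total, 1)


per_type = {name: coverage(have, total) for name, (have, total) in counts.items()}
total_have = sum(have for have, _ in counts.values())
total_all = sum(total for _, total in counts.values())

for name, pct in per_type.items():
    print(f"{name}: {pct}%")
print(f"Overall: {total_have}/{total_all} = {coverage(total_have, total_all)}%")
```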
## What Needs to Happen Next

### Priority 1: Official Institution (Batch 10A)
**Target**: Servicio Nacional del Patrimonio Cultural
- Chile's National Heritage Service (major government agency)
- Operates Archivo Nacional, Museo Histórico Nacional, etc.
- Official website: https://www.patrimoniocultural.gob.cl/
- **NOT currently in Wikidata** (verified by SPARQL query)
- **Action**: Manual web research or consider creating Wikidata entry

### Priority 2: Research Centers (Batch 10B)
**Targets**:
- Fundación Buen Pastor
- Fundación Iglesias Patrimoniales (Church Heritage Foundation)

Both are foundations, so they may already exist in Wikidata as organizations.

### Priority 3: Mixed Institutions (Batch 10C)
**Targets**:
- Centro de Interpretación Histórica
- Instituto Alemán Puerto Montt (German Institute - likely has Wikidata)
- Centro Cultural Sofia Hott (Osorno cultural center)

### Priority 4: Remaining Museums (Batch 11)
- 13 museums still need Wikidata (matching 3 or more would lift museum coverage from 74.5% to 80%+)
- More likely to be in Wikidata than archives

## Key Files

**Active Dataset**:
- `data/instances/chile/chilean_institutions_batch8_enriched.yaml` (54 with Wikidata)

**Batch 9 Outputs**:
- `data/instances/chile/wikidata_matches_batch9_archives.json` (empty - no matches)
- `data/instances/chile/BATCH9_ARCHIVES_ANALYSIS.md`
- `data/instances/chile/BATCH9_COMPLETE_SUMMARY.md`

**Scripts Available**:
- `scripts/query_wikidata_chilean_archives.py` ✅
- `scripts/query_wikidata_chilean_libraries.py` ✅
- `scripts/query_wikidata_chilean_museums.py` ✅
- `scripts/enrich_chilean_batch7.py` (museums) ✅
- `scripts/enrich_chilean_batch8.py` (libraries) ✅

## Recommended Next Actions

### Option A: Query Official Institutions (10 min)
1. Create `scripts/query_wikidata_chilean_official.py`
2. Search for government cultural agencies
3. Manual verification for "Servicio Nacional del Patrimonio Cultural"

### Option B: Query Research Centers (15 min)
1. Create `scripts/query_wikidata_chilean_foundations.py`
2. Search for Chilean foundations (`wd:Q157031`)
3. Match "Fundación Buen Pastor" and "Fundación Iglesias Patrimoniales"

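As a starting point for the foundations script, the query in step 2 might look like the sketch below. It assumes the usual Wikidata modelling: `P31`/`P279` for instance-of and subclass-of, `P17` for country, and `Q298` for Chile. The function names, the `LIMIT`, and the label languages are hypothetical choices, and the existing query scripts may be structured differently.

```python
import urllib.parse

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_foundation_query(limit: int = 500) -> str:
    """SPARQL for items that are (subclasses of) foundation and located in Chile."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:Q157031 ;
        wdt:P17 wd:Q298 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """URL for a GET request against the public Wikidata Query Service."""
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


query = build_foundation_query()
print(query)
print(build_request_url(query)[:60] + "...")
```

The returned labels can then be compared against "Fundación Buen Pastor" and "Fundación Iglesias Patrimoniales" as in step 3.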
### Option C: Query Mixed/Cultural Centers (15 min)
1. Create `scripts/query_wikidata_chilean_mixed.py`
2. Search for cultural centers and interpretation centers
3. Search for German schools/institutes (Instituto Alemán)

### Option D: Return to Museums (30 min)
1. Refine `scripts/query_wikidata_chilean_museums.py`
2. Expand search to include smaller regional museums
3. Could add 5-8 more museums

## Progress Tracking

**Batch History**:
- Batch 0-6: Foundation work (manual + CSV imports) → 57.8%
- **Batch 7**: SPARQL museums (+32 Q-numbers) → 57.8%
- **Batch 8**: SPARQL libraries (+2 Q-numbers) → 60.0%
- **Batch 9**: SPARQL archives (+0 Q-numbers) → 60.0% (no matches found)

**Next Milestone**: 65% coverage (59 institutions)
- Need: 5 more Q-numbers
- Likely sources: Mixed institutions, research centers, official institution

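The milestone numbers follow from the current totals (54 of 90 institutions). A small sketch of the arithmetic, with a hypothetical helper name:

```python
import math


def q_numbers_needed(target_pct: int, total: int, current: int):
    """Return (institutions required for target_pct, additional Q-numbers needed)."""
    target_count = math.ceil(target_pct * total / 100)
    return target_count, max(0, target_count - current)


target_count, still_needed = q_numbers_needed(65, total=90, current=54)
print(target_count, still_needed)
```

Since 65% of 90 is 58.5, the target rounds up to 59 institutions, which is 5 more than the current 54.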
## Data Quality Issues to Address
1. Duplicate entry: "USACH's Archivo Patrimonial" → Remove or merge with "Archivo Nacional"
2. Missing location data for many archives (6 institutions)
3. Generic institution names without distinguishing information

## Session End
**Date**: 2025-11-09
**Time**: ~45 minutes
**Next Session Goal**: Target official institution and research centers (Batch 10)