Brazil Wikidata Enrichment - Batch 16 Plan
Date: November 11, 2025
Current Coverage: 63.2% (79/125)
Target Coverage: 67-71% (84-89/125)
Goal: Enrich 5-10 institutions from remaining 46 without Wikidata
Current Status
Coverage Statistics
- Total Brazilian institutions: 125
- With Wikidata: 79 (63.2%)
- Without Wikidata: 46 (36.8%)
- Remaining to reach 70% target: 9 institutions
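The coverage figures above follow from simple arithmetic. A minimal sketch that reproduces them; the function names are illustrative and not taken from the project's scripts:

```python
import math

TOTAL = 125          # total Brazilian institutions (from this plan)
WITH_WIKIDATA = 79   # institutions already linked to Wikidata

def coverage_pct(linked: int, total: int = TOTAL) -> float:
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100 * linked / total, 1)

def needed_for_target(target_pct: float, linked: int = WITH_WIKIDATA,
                      total: int = TOTAL) -> int:
    """Return how many more institutions must be linked to reach target_pct."""
    required = math.ceil(total * target_pct / 100)  # 70% of 125 rounds up to 88
    return max(0, required - linked)

print(coverage_pct(WITH_WIKIDATA))  # 63.2
print(needed_for_target(70))        # 9
```

Rounding the required count up matters here: 70% of 125 is 87.5, so 88 linked institutions are needed, not 87.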
Institution Breakdown (46 without Wikidata)
After filtering out technical systems:
- Real institutions: 35
- Technical systems/platforms: 11 (APIs, DSpace, Tainacan, etc.)
Batch 16 Strategy
Priority Targeting
Focus on institutions with highest Wikidata likelihood:
- ✅ State/regional museums (likely documented)
- ✅ Official cultural foundations (government institutions)
- ✅ University collections (academic institutions)
- ✅ Major state archives (public institutions)
- ⚠️ Regional heritage projects (lower priority - may lack Wikidata)
Search Improvements
- Portuguese-language queries: Use native Portuguese names
- State-level searches: "Museu [State]", "Arquivo [State]"
- Alternative name variants: Test abbreviations (MHAM, FEM, etc.)
- Geographic context: Include city/state in searches
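One way to apply these search improvements is to generate the query variants programmatically and send each to the Wikidata `wbsearchentities` endpoint with Portuguese as the search language. The sketch below only builds the query strings and request URLs (no network call); `query_variants` and `search_url` are hypothetical helper names, and the variant scheme is an assumption about how the searches might be organised:

```python
from typing import List, Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def query_variants(name: str, abbreviation: Optional[str] = None,
                   place: Optional[str] = None) -> List[str]:
    """Build search strings: full name and abbreviation, each with and without place."""
    bases = [name] + ([abbreviation] if abbreviation else [])
    variants = []
    for base in bases:
        variants.append(base)
        if place:
            variants.append(f"{base} {place}")  # add geographic context
    return variants

def search_url(query: str, language: str = "pt", limit: int = 5) -> str:
    """Return a wbsearchentities request URL for one query string."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,   # search Portuguese labels and aliases
        "uselang": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"

for q in query_variants("Fundação de Cultura Elias Mansour", "FEM", "Acre"):
    print(search_url(q))
```

Running the same variants a second time with `language="en"` would cover the English names listed for each candidate.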
Top 15 Candidates for Batch 16
Tier 1: High Likelihood (Archives, Major Museums, Official Institutions)
1. DEAP Archives ⭐⭐⭐
- Type: ARCHIVE
- Location: Paraná state
- Description: 100,000+ immigrant records online
- Search Strategy:
  - "Departamento Estadual de Arquivo Público Paraná"
  - "DEAP Paraná"
  - "Arquivo Público do Paraná"
- Likelihood: VERY HIGH (state archive with large collection)
2. APESP ⭐⭐⭐
- Type: MIXED (likely ARCHIVE)
- Location: São Paulo
- Description: 25M documents, 1M+ digitized pages
- Search Strategy:
  - "Arquivo Público do Estado de São Paulo"
  - "APESP São Paulo"
- Likelihood: VERY HIGH (major state archive, largest in Brazil)
3. Museu dos Povos Acreanos ⭐⭐⭐
- Type: MUSEUM
- Location: Rio Branco, Acre
- Description: Opened 2023, $2.8M World Bank funding
- Search Strategy:
  - "Museu dos Povos Acreanos"
  - "Museum of Acrean Peoples"
  - "Rio Branco museum"
- Likelihood: HIGH (new museum, recent World Bank project)
4. Museu Histórico de Alcântara ⭐⭐
- Type: MUSEUM
- Location: Alcântara, Maranhão
- Description: 10,000 pieces
- Search Strategy:
  - "Museu Histórico de Alcântara"
  - "Alcântara historical museum"
  - "Museu Casa Histórica Alcântara"
- Likelihood: MEDIUM-HIGH (colonial city, UNESCO site area)
5. Sistema Brasileiro de Museus (SBM) ⭐⭐⭐
- Type: OFFICIAL_INSTITUTION
- Location: National (Brasília)
- Description: Created 2004, updated 2013 (Decree 8.124), IBRAM coordination
- Search Strategy:
  - "Sistema Brasileiro de Museus"
  - "Brazilian Museum System"
  - "SBM Brasil IBRAM"
- Likelihood: VERY HIGH (federal system, official government program)
6. Fundação de Cultura Elias Mansour ⭐⭐
- Type: OFFICIAL_INSTITUTION
- Location: Acre state
- Description: State cultural foundation (https://www.femcultura.ac.gov.br/)
- Search Strategy:
  - "Fundação de Cultura Elias Mansour"
  - "FEM Acre"
  - "Elias Mansour Cultural Foundation"
- Likelihood: MEDIUM-HIGH (state foundation with active website)
7. FCRB ⭐⭐⭐
- Type: MIXED (likely LIBRARY/OFFICIAL_INSTITUTION)
- Location: Rio de Janeiro
- Description: RUBI repository (DSpace), UNESCO recognition
- Search Strategy:
  - "Fundação Casa de Rui Barbosa"
  - "FCRB Rio de Janeiro"
  - "Casa de Rui Barbosa"
- Likelihood: VERY HIGH (major federal foundation, UNESCO recognized)
8. FUMDHAM ⭐⭐
- Type: MIXED (likely OFFICIAL_INSTITUTION/RESEARCH_CENTER)
- Location: São Raimundo Nonato, Piauí
- Description: Rock art preservation (Serra da Capivara)
- Search Strategy:
  - "Fundação Museu do Homem Americano"
  - "FUMDHAM"
  - "Serra da Capivara foundation"
- Likelihood: MEDIUM-HIGH (UNESCO World Heritage site management)
9. Museu Memória Rondoniense ⭐
- Type: MUSEUM
- Location: Porto Velho, Rondônia
- Description: @museudamemoriarondoniense, 10,000+ records
- Search Strategy:
  - "Museu da Memória Rondoniense"
  - "Porto Velho memory museum"
- Likelihood: MEDIUM (state capital museum, active social media)
10. MuseusBr ⭐⭐⭐
- Type: OFFICIAL_INSTITUTION
- Location: National platform
- Description: National platform with thousands of museum records, IBRAM governance
- Search Strategy:
  - "MuseusBr platform"
  - "MuseusBr IBRAM"
  - "Brazilian museums platform"
- Likelihood: HIGH (official IBRAM platform, government system)
Tier 2: Medium Likelihood (University Collections, Regional Museums)
11. MUSEAR/UFMT ⭐
- Type: EDUCATION_PROVIDER
- Location: Mato Grosso (UFMT - Federal University)
- Description: @musearufmt, 3,000+ pieces, 29+ tribes
- Search Strategy:
  - "MUSEAR UFMT"
  - "Museu Arqueologia Etnologia UFMT"
  - "Universidade Federal Mato Grosso museum"
- Likelihood: MEDIUM (university museum, anthropology focus)
12. Instituto Insikiran ⭐
- Type: OFFICIAL_INSTITUTION
- Location: Roraima
- Description: Indigenous higher education, 300+ graduates
- Search Strategy:
  - "Instituto Insikiran"
  - "UFRR Insikiran"
  - "Indigenous education Roraima"
- Likelihood: MEDIUM (UFRR institute, indigenous focus)
13. Natural History Museum Campina Grande ⭐
- Type: MUSEUM
- Location: Campina Grande, Paraíba
- Description: Natural history focus
- Search Strategy:
  - "Museu História Natural Campina Grande"
  - "Natural history museum Paraíba"
  - "MUHNA Campina Grande"
- Likelihood: MEDIUM (regional natural history museum)
14. SECULT Amapá ⭐
- Type: OFFICIAL_INSTITUTION
- Location: Amapá state
- Description: State Culture Secretariat (gab.secult@secult.ap.gov.br)
- Search Strategy:
  - "Secretaria de Cultura Amapá"
  - "SECULT Amapá"
  - "Amapá cultural department"
- Likelihood: MEDIUM (state government department)
15. Casa das Minas / Casa de Nagô ⭐
- Type: MIXED (likely HOLY_SITES)
- Location: Maranhão
- Description: Afro-Brazilian heritage (Tambor de Mina)
- Search Strategy:
  - "Casa das Minas São Luís"
  - "Casa Nagô Maranhão"
  - "Tambor de Mina"
- Likelihood: MEDIUM (important Afro-Brazilian religious sites)
Excluded from Batch 16
Technical Systems (Not Heritage Institutions)
- APIs, DSpace, AtoM, LOCKSS Cariniana (technical platforms)
- Tainacan implementations (content management system)
- Metadata, Hemeroteca Digital (technical services)
- Mapa Cultural (mapping platform)
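The split between real institutions and technical systems could be approximated with a keyword filter on entry names. This is a rough sketch only: the keyword list is taken from the exclusions above, but matching on names is an assumption about how the filtering was done, and a crude filter like this still needs manual review (for example, "atom" could match a legitimate institution name).

```python
import re
from typing import List, Tuple

# Keywords drawn from the exclusion list; whole-word matching avoids false
# hits such as "api" inside "Capivara".
TECHNICAL_KEYWORDS = (
    "api", "apis", "dspace", "atom", "lockss", "tainacan",
    "metadata", "hemeroteca digital", "mapa cultural",
)
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in TECHNICAL_KEYWORDS) + r")\b"
)

def is_technical_system(name: str) -> bool:
    """Return True if the entry name contains a technical-platform keyword."""
    return bool(_PATTERN.search(name.casefold()))

def split_candidates(names: List[str]) -> Tuple[List[str], List[str]]:
    """Split entry names into (real institutions, technical systems)."""
    real = [n for n in names if not is_technical_system(n)]
    technical = [n for n in names if is_technical_system(n)]
    return real, technical

real, technical = split_candidates(
    ["FUMDHAM", "Tainacan implementations", "Mapa Cultural", "DEAP Archives"]
)
print(real)
print(technical)
```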
Duplicate Entries
- Fundação de Cultura Elias Mansour vs. FEM (same institution, two entries)
Low-Priority Regional Projects
- Jalapão Heritage (regional project, unlikely to have a Wikidata item)
- Ouro Preto System (municipal system)
- Guarani-Kaiowá Projects (anthropological documentation, not an institution)
Batch 16 Execution Plan
Phase 1: Search Top 10 Candidates (Tier 1)
Target: 5-7 successful matches
- DEAP Archives (Paraná state archive)
- APESP (São Paulo state archive)
- Sistema Brasileiro de Museus (national museum system)
- FCRB - Casa de Rui Barbosa (federal foundation)
- MuseusBr (IBRAM national platform)
- Museu dos Povos Acreanos (Acre museum, 2023)
- FUMDHAM (Serra da Capivara, UNESCO site)
- Fundação Elias Mansour (Acre cultural foundation)
- Museu Histórico Alcântara (Maranhão colonial museum)
- Museu Memória Rondoniense (Rondônia state museum)
Phase 2: Search Tier 2 Candidates (if needed)
Target: 2-3 additional matches
- MUSEAR/UFMT (university anthropology museum)
- Instituto Insikiran (indigenous education institute)
- Natural History Museum Campina Grande
- SECULT Amapá (state culture department)
- Casa das Minas/Nagô (Afro-Brazilian heritage sites)
Success Criteria
Minimum Success
- 5+ institutions enriched
- Coverage: 63.2% → 67.2% (84/125)
Target Success
- 7-8 institutions enriched
- Coverage: 63.2% → 68.8-69.6% (86-87/125)
- Within 1-2 institutions of 70% goal
Maximum Success
- 10+ institutions enriched
- Coverage: 63.2% → 71.2%+ (89+/125) ✨ 70% GOAL ACHIEVED
Search Quality Standards
Match Criteria
- Minimum similarity: 0.85 (high confidence)
- Manual verification: Flag scores 0.85-0.90 for review
- Identifier requirements: Prioritize institutions with multiple external IDs
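The thresholds above can be expressed as a small classification step. The plan does not say which similarity metric is used, so this sketch substitutes Python's `difflib.SequenceMatcher` ratio as a stand-in; the function names and the three outcome labels are likewise illustrative:

```python
from difflib import SequenceMatcher

MIN_SIMILARITY = 0.85   # below this, the candidate match is rejected
REVIEW_UPPER = 0.90     # scores from 0.85 to 0.90 are flagged for manual review

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names (assumed metric)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def classify(score: float) -> str:
    """Map a similarity score to 'reject', 'review', or 'accept'."""
    if score < MIN_SIMILARITY:
        return "reject"
    if score <= REVIEW_UPPER:
        return "review"
    return "accept"

score = similarity("Arquivo Público do Paraná", "Arquivo Público do Estado do Paraná")
print(round(score, 2), classify(score))
```

Keeping the review band as a separate outcome, rather than a boolean, makes it straightforward to collect borderline matches into a list for manual verification.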
Documentation Standards
- Complete descriptions (100+ words minimum)
- Alternative names (Portuguese + English)
- GeoNames location IDs
- Full provenance metadata
- Enrichment history with confidence scores
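A pre-merge check along these lines could report which documentation standards a record does not yet meet. The field names (`description`, `alternative_names`, `geonames_id`, `provenance`, `enrichment_history`, `confidence_score`) are assumptions for illustration and may not match the project's LinkML schema:

```python
from typing import List

def documentation_gaps(record: dict, min_words: int = 100) -> List[str]:
    """Return the names of documentation requirements the record fails."""
    gaps = []
    # Complete description: at least min_words words.
    if len((record.get("description") or "").split()) < min_words:
        gaps.append("description")
    # Alternative names in both Portuguese and English.
    names = record.get("alternative_names") or {}
    for lang in ("pt", "en"):
        if not names.get(lang):
            gaps.append(f"alternative_names.{lang}")
    # GeoNames ID, provenance metadata, and enrichment history must be present.
    for field in ("geonames_id", "provenance", "enrichment_history"):
        if not record.get(field):
            gaps.append(field)
    # Every enrichment history entry should carry a confidence score.
    history = record.get("enrichment_history") or []
    if any("confidence_score" not in entry for entry in history):
        gaps.append("enrichment_history.confidence_score")
    return gaps

print(documentation_gaps({"description": "State archive."}))
```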
Expected Outcomes
Optimistic Scenario (70% Goal Reached)
If 10+ institutions enriched → Coverage: 71.2% ✨
- Mission accomplished: 70% target exceeded
- Move to quality verification phase
- Prepare for other countries
Realistic Scenario (Approaching 70%)
If 7-8 institutions enriched → Coverage: 68.8-69.6%
- One more batch (Batch 17) needed to secure 70%
- Focus remaining efforts on university collections
Pessimistic Scenario (Incremental Progress)
If 5-6 institutions enriched → Coverage: 67-68%
- Two more batches (Batches 17-18) to reach 70%
- May need to create Wikidata items for notable institutions
Risk Assessment
Low-Risk Targets (Tier 1: 1-10)
- APESP, DEAP, FCRB, SBM: Major state/federal institutions → VERY HIGH Wikidata likelihood
- Expected success rate: 70-80% (7-8 of 10 found)
Medium-Risk Targets (Tier 2: 11-15)
- MUSEAR, Insikiran, Campina Grande, SECULT: Regional/academic institutions
- Expected success rate: 40-60% (2-3 of 5 found)
Overall Expected Success
- Combined success rate: 60-73% (9-11 of 15 institutions found, if both tiers are searched)
- Likely coverage after Batch 16: 68-71%
Next Steps
- ✅ Execute Batch 16 searches for Tier 1 candidates (top 10)
- ✅ Extract full metadata for successful matches
- ✅ Create batch16_enriched.yaml with LinkML-compliant records
- ✅ Merge into main dataset (create backup first)
- ✅ Generate Batch 16 report with coverage statistics
- 🎯 Assess if 70% reached or if Batch 17 needed
Files to Create
data/instances/brazil/
└── batch16_enriched.yaml (5-10 institutions)
scripts/
└── merge_batch16.py (merge script)
reports/brazil/
├── batch16_plan.md (this file)
└── batch16_report.md (to be created after execution)
data/instances/all/
├── globalglam-20251111.yaml.bak.batch16 (backup)
└── globalglam-20251111.yaml (updated)
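A possible shape for the core of `merge_batch16.py`, shown as a sketch rather than the actual script: it copies the main dataset to a backup before any changes, then merges enriched records by identifier. YAML loading and writing are left out, and the `id` key and the record layout are assumptions.

```python
import shutil
from pathlib import Path
from typing import Dict, List

def backup_file(path: Path, suffix: str = ".bak.batch16") -> Path:
    """Copy path to path + suffix (preserving metadata) and return the backup path."""
    backup = path.with_name(path.name + suffix)
    shutil.copy2(path, backup)
    return backup

def merge_records(main: List[Dict], enriched: List[Dict], key: str = "id") -> List[Dict]:
    """Overlay enriched fields onto matching main records; append unmatched ones."""
    updates = {rec[key]: rec for rec in enriched}
    merged, seen = [], set()
    for rec in main:
        rec_id = rec[key]
        # Enriched values win on conflict; untouched records pass through as-is.
        merged.append({**rec, **updates.get(rec_id, {})})
        seen.add(rec_id)
    # Enriched records with no counterpart in main are added at the end.
    merged.extend(rec for rec_id, rec in updates.items() if rec_id not in seen)
    return merged
```

Taking the backup first means a failed or incorrect merge can be undone by restoring the `.bak.batch16` copy.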
Timeline
- Planning: ✅ Complete (this document)
- Execution: Ready to begin
- Estimated duration: 30-45 minutes (searches + extraction)
- Report generation: 10 minutes
Total estimated time: 1 hour for complete Batch 16 cycle
Ready to execute: Yes ✅
Next action: Begin Wikidata searches for Tier 1 candidates
Goal: Reach 70% coverage (88/125 institutions)