glam/reports/brazil/batch16_plan.md
2025-11-19 23:25:22 +01:00

Brazil Wikidata Enrichment - Batch 16 Plan

Date: November 11, 2025
Current Coverage: 63.2% (79/125)
Target Coverage: 67.2-71.2% (84-89/125)
Goal: Enrich 5-10 of the remaining 46 institutions without Wikidata


Current Status

Coverage Statistics

  • Total Brazilian institutions: 125
  • With Wikidata: 79 (63.2%)
  • Without Wikidata: 46 (36.8%)
  • Remaining to reach 70% target: 9 institutions
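
The coverage figures above can be re-derived with a few lines of Python. This is a sketch only: `coverage_pct` and `needed_for_target` are illustrative names, not existing project functions. Exact fractions are used for the 70% threshold, which is 87.5 institutions and has to be rounded up to 88.

```python
import math
from fractions import Fraction


def coverage_pct(with_wikidata: int, total: int) -> float:
    """Share of institutions with a Wikidata ID, as a percentage (1 decimal)."""
    return round(100 * with_wikidata / total, 1)


def needed_for_target(with_wikidata: int, total: int, target_pct: int) -> int:
    """Additional matches needed to reach target_pct, rounding the threshold up."""
    threshold = math.ceil(Fraction(target_pct, 100) * total)  # 70% of 125 -> 88
    return max(0, threshold - with_wikidata)


print(coverage_pct(79, 125))           # 63.2
print(needed_for_target(79, 125, 70))  # 9
```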

Institution Breakdown (46 without Wikidata)

After filtering out technical systems:

  • Real institutions: 35
  • Technical systems/platforms: 11 (APIs, DSpace, Tainacan, etc.)

Batch 16 Strategy

Priority Targeting

Focus on institutions with highest Wikidata likelihood:

  1. State/regional museums (likely documented)
  2. Official cultural foundations (government institutions)
  3. University collections (academic institutions)
  4. Major state archives (public institutions)
  5. ⚠️ Regional heritage projects (lower priority; may lack Wikidata items)

Search Improvements

  1. Portuguese-language queries: Use native Portuguese names
  2. State-level searches: "Museu [State]", "Arquivo [State]"
  3. Alternative name variants: Test abbreviations (MHAM, FEM, etc.)
  4. Geographic context: Include city/state in searches
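
Below is a minimal sketch of how these query variants might be generated and turned into Wikidata search requests. It assumes the standard MediaWiki full-text search module (`action=query&list=search`) on www.wikidata.org. The helper names and variant rules are hypothetical, and the sketch only builds URLs; it does not call the API.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def query_variants(name_pt, abbreviation=None, city=None, state=None):
    """Ordered, de-duplicated search strings for one institution."""
    variants = [name_pt]                      # native Portuguese name first
    if abbreviation:
        variants.append(abbreviation)         # abbreviation on its own
        if state:
            variants.append(f"{abbreviation} {state}")
    for place in (city, state):               # geographic context
        # whole-word check so "Acre" is not mistaken for part of "Acreanos"
        if place and f" {place} " not in f" {name_pt} ":
            variants.append(f"{name_pt} {place}")
    return list(dict.fromkeys(variants))      # drop repeats, keep order


def search_url(term, limit=10):
    """URL for a MediaWiki full-text search over Wikidata items (namespace 0)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 0,
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


for term in query_variants("Arquivo Público do Estado de São Paulo", "APESP", state="São Paulo"):
    print(search_url(term))
```

For exact label or alias lookups, `action=wbsearchentities` is the usual alternative. It matches labels and aliases rather than free text, so the city and state variants help less there.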

Top 15 Candidates for Batch 16

Tier 1: High Likelihood (Archives, Major Museums, Official Institutions)

1. DEAP Archives

  • Type: ARCHIVE
  • Location: Paraná state
  • Description: 100,000+ immigrant records online
  • Search Strategy:
    • "Departamento Estadual de Arquivo Público Paraná"
    • "DEAP Paraná"
    • "Arquivo Público do Paraná"
  • Likelihood: VERY HIGH (state archive with large collection)

2. APESP

  • Type: MIXED (likely ARCHIVE)
  • Location: São Paulo
  • Description: 25M documents, 1M+ digitized pages
  • Search Strategy:
    • "Arquivo Público do Estado de São Paulo"
    • "APESP São Paulo"
  • Likelihood: VERY HIGH (major state archive, one of the largest in Brazil)

3. Museu dos Povos Acreanos

  • Type: MUSEUM
  • Location: Rio Branco, Acre
  • Description: Opened 2023, $2.8M World Bank funding
  • Search Strategy:
    • "Museu dos Povos Acreanos"
    • "Museum of Acrean Peoples"
    • "Rio Branco museum"
  • Likelihood: HIGH (new museum, recent World Bank project)

4. Museu Histórico de Alcântara

  • Type: MUSEUM
  • Location: Alcântara, Maranhão
  • Description: 10,000 pieces
  • Search Strategy:
    • "Museu Histórico de Alcântara"
    • "Alcântara historical museum"
    • "Museu Casa Histórica Alcântara"
  • Likelihood: MEDIUM-HIGH (colonial city, UNESCO site area)

5. Sistema Brasileiro de Museus (SBM)

  • Type: OFFICIAL_INSTITUTION
  • Location: National (Brasília)
  • Description: Created 2004, updated 2013 (Decree 8.124), IBRAM coordination
  • Search Strategy:
    • "Sistema Brasileiro de Museus"
    • "Brazilian Museum System"
    • "SBM Brasil IBRAM"
  • Likelihood: VERY HIGH (federal system, official government program)

6. Fundação de Cultura Elias Mansour

  • Type: OFFICIAL_INSTITUTION
  • Location: Acre state
  • Description: State cultural foundation (https://www.femcultura.ac.gov.br/)
  • Search Strategy:
    • "Fundação de Cultura Elias Mansour"
    • "FEM Acre"
    • "Elias Mansour Cultural Foundation"
  • Likelihood: MEDIUM-HIGH (state foundation with active website)

7. FCRB

  • Type: MIXED (likely LIBRARY/OFFICIAL_INSTITUTION)
  • Location: Rio de Janeiro
  • Description: RUBI repository (DSpace), UNESCO recognition
  • Search Strategy:
    • "Fundação Casa de Rui Barbosa"
    • "FCRB Rio de Janeiro"
    • "Casa de Rui Barbosa"
  • Likelihood: VERY HIGH (major federal foundation, UNESCO recognized)

8. FUMDHAM

  • Type: MIXED (likely OFFICIAL_INSTITUTION/RESEARCH_CENTER)
  • Location: São Raimundo Nonato, Piauí
  • Description: Rock art preservation (Serra da Capivara)
  • Search Strategy:
    • "Fundação Museu do Homem Americano"
    • "FUMDHAM"
    • "Serra da Capivara foundation"
  • Likelihood: MEDIUM-HIGH (UNESCO World Heritage site management)

9. Museu Memória Rondoniense

  • Type: MUSEUM
  • Location: Porto Velho, Rondônia
  • Description: @museudamemoriarondoniense, 10,000+ records
  • Search Strategy:
    • "Museu da Memória Rondoniense"
    • "Porto Velho memory museum"
  • Likelihood: MEDIUM (state capital museum, active social media)

10. MuseusBr

  • Type: OFFICIAL_INSTITUTION
  • Location: National platform
  • Description: National platform with thousands of museum records, IBRAM governance
  • Search Strategy:
    • "MuseusBr platform"
    • "MuseusBr IBRAM"
    • "Brazilian museums platform"
  • Likelihood: HIGH (official IBRAM platform, government system)

Tier 2: Medium Likelihood (University Collections, Regional Museums)

11. MUSEAR/UFMT

  • Type: EDUCATION_PROVIDER
  • Location: Mato Grosso (UFMT - Federal University)
  • Description: @musearufmt, 3,000+ pieces, 29+ Indigenous peoples represented
  • Search Strategy:
    • "MUSEAR UFMT"
    • "Museu Arqueologia Etnologia UFMT"
    • "Universidade Federal Mato Grosso museum"
  • Likelihood: MEDIUM (university museum, anthropology focus)

12. Instituto Insikiran

  • Type: OFFICIAL_INSTITUTION
  • Location: Roraima
  • Description: Indigenous higher education, 300+ graduates
  • Search Strategy:
    • "Instituto Insikiran"
    • "UFRR Insikiran"
    • "Indigenous education Roraima"
  • Likelihood: MEDIUM (UFRR institute, Indigenous focus)

13. Natural History Museum Campina Grande

  • Type: MUSEUM
  • Location: Campina Grande, Paraíba
  • Description: Natural history focus
  • Search Strategy:
    • "Museu História Natural Campina Grande"
    • "Natural history museum Paraíba"
    • "MUHNA Campina Grande"
  • Likelihood: MEDIUM (regional natural history museum)

14. SECULT Amapá

  • Type: OFFICIAL_INSTITUTION
  • Location: Amapá state
  • Description: State Culture Secretariat (gab.secult@secult.ap.gov.br)
  • Search Strategy:
    • "Secretaria de Cultura Amapá"
    • "SECULT Amapá"
    • "Amapá cultural department"
  • Likelihood: MEDIUM (state government department)

15. Casa das Minas / Casa de Nagô

  • Type: MIXED (likely HOLY_SITES)
  • Location: Maranhão
  • Description: Afro-Brazilian heritage (Tambor de Mina)
  • Search Strategy:
    • "Casa das Minas São Luís"
    • "Casa Nagô Maranhão"
    • "Tambor de Mina"
  • Likelihood: MEDIUM (important Afro-Brazilian religious sites)

Excluded from Batch 16

Technical Systems (Not Heritage Institutions)

  • APIs, DSpace, AtoM, LOCKSS Cariniana (technical platforms)
  • Tainacan implementations (content management system)
  • Metadata, Hemeroteca Digital (technical services)
  • Mapa Cultural (mapping platform)
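
The technical-system filter mentioned under Institution Breakdown could be approximated by keyword matching on record names. This is a sketch that assumes the keyword list mirrors the exclusions above; it is not exhaustive. It matches on names only, not descriptions, because real institutions such as FCRB mention DSpace in their descriptions.

```python
import re

# Patterns derived from the exclusion list above (assumed, not exhaustive).
TECHNICAL_PATTERNS = [
    r"apis?", r"dspace", r"atom", r"lockss", r"cariniana", r"tainacan",
    r"metadata", r"hemeroteca digital", r"mapa cultural",
]
# \b on both sides keeps "atom" from matching inside words like "Anatomia".
TECHNICAL_RE = re.compile(r"\b(?:" + "|".join(TECHNICAL_PATTERNS) + r")\b", re.IGNORECASE)


def is_technical_system(name):
    """True if a record name mentions a known platform or service."""
    return TECHNICAL_RE.search(name) is not None


def split_records(names):
    """Split record names into (institutions, technical_systems)."""
    institutions, technical = [], []
    for name in names:
        (technical if is_technical_system(name) else institutions).append(name)
    return institutions, technical
```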

Duplicate Entries

  • Fundação de Cultura Elias Mansour vs. FEM (same institution, two entries)
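
Duplicates like this are hard to catch automatically because "FEM" is not simply the initials of the full name. One pragmatic approach is to resolve aliases and then compare accent-folded names. This sketch assumes a hand-maintained alias table (hypothetical; FEM is the only alias this plan documents).

```python
import unicodedata


def normalize(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


# Hypothetical alias table: abbreviations that cannot be derived from the name.
ALIASES = {
    "fem": "fundacao de cultura elias mansour",
}


def canonical(name: str) -> str:
    key = normalize(name)
    return ALIASES.get(key, key)


def find_duplicates(names):
    """Group record names that resolve to the same canonical key."""
    groups = {}
    for name in names:
        groups.setdefault(canonical(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


print(find_duplicates(["Fundação de Cultura Elias Mansour", "FEM", "APESP"]))
# [['Fundação de Cultura Elias Mansour', 'FEM']]
```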

Low-Priority Regional Projects

  • Jalapão Heritage (regional project, unlikely to have a Wikidata item)
  • Ouro Preto System (municipal system)
  • Guarani-Kaiowá Projects (anthropological documentation, not institution)

Batch 16 Execution Plan

Phase 1: Search Top 10 Candidates (Tier 1)

Target: 5-7 successful matches

  1. DEAP Archives (Paraná state archive)
  2. APESP (São Paulo state archive)
  3. Sistema Brasileiro de Museus (national museum system)
  4. FCRB - Casa de Rui Barbosa (federal foundation)
  5. MuseusBr (IBRAM national platform)
  6. Museu dos Povos Acreanos (Acre museum, 2023)
  7. FUMDHAM (Serra da Capivara, UNESCO site)
  8. Fundação Elias Mansour (Acre cultural foundation)
  9. Museu Histórico Alcântara (Maranhão colonial museum)
  10. Museu Memória Rondoniense (Rondônia state museum)

Phase 2: Search Tier 2 Candidates (if needed)

Target: 2-3 additional matches

  1. MUSEAR/UFMT (university anthropology museum)
  2. Instituto Insikiran (indigenous education institute)
  3. Natural History Museum Campina Grande
  4. SECULT Amapá (state culture department)
  5. Casa das Minas/Nagô (Afro-Brazilian heritage sites)
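
The two-phase order can be expressed as a small driver. This is a sketch: `search` stands in for whatever lookup is actually used. Interpreting "if needed" as "fewer than 9 matches so far" (the count required for 70%) is an assumption.

```python
from typing import Callable, Optional


def run_batch(
    tier1: list[str],
    tier2: list[str],
    search: Callable[[str], Optional[str]],
    goal: int = 9,
) -> dict[str, str]:
    """Search Tier 1 in order, then Tier 2 only if fewer than `goal` matches.

    `search` returns a Wikidata QID string for a confident match, else None.
    The default goal of 9 is the number of new matches needed for 70%.
    """
    matches: dict[str, str] = {}

    def run_phase(names: list[str]) -> None:
        for name in names:
            qid = search(name)
            if qid is not None:
                matches[name] = qid

    run_phase(tier1)            # Phase 1
    if len(matches) < goal:     # Phase 2, "if needed"
        run_phase(tier2)
    return matches
```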

Success Criteria

Minimum Success

  • 5+ institutions enriched
  • Coverage: 63.2% → 67.2% (84/125)

Target Success

  • 7-8 institutions enriched
  • Coverage: 63.2% → 68.8-69.6% (86-87/125)
  • Within 1-2 institutions of 70% goal

Maximum Success

  • 10+ institutions enriched
  • Coverage: 63.2% → 71.2%+ (89+/125): 70% GOAL ACHIEVED
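
A small helper can map a finished batch onto these bands. This is a sketch only. The plan's bands leave gaps at 6 and 9 enriched institutions, so the helper assigns those counts to the lower band. It also reports separately whether 70% was reached, since 9 institutions already give 88/125 = 70.4%.

```python
def batch_outcome(enriched: int, baseline: int = 79, total: int = 125):
    """Return (band, coverage_pct, reached_70) for a Batch 16 result."""
    count = baseline + enriched
    coverage = round(100 * count / total, 1)
    reached_70 = count * 100 >= 70 * total   # integer test, no float rounding
    if enriched >= 10:
        band = "maximum"
    elif enriched >= 7:
        band = "target"
    elif enriched >= 5:
        band = "minimum"
    else:
        band = "below minimum"
    return band, coverage, reached_70


for n in (5, 7, 8, 9, 10):
    print(n, batch_outcome(n))
```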

Search Quality Standards

Match Criteria

  • Minimum similarity: 0.85 (high confidence)
  • Manual verification: Flag scores 0.85-0.90 for review
  • Identifier requirements: Prioritize institutions with multiple external IDs
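
The plan does not name a similarity measure. This sketch substitutes Python's `difflib.SequenceMatcher.ratio()` applied to accent-folded, case-folded names. It treats 0.85 ≤ score < 0.90 as "flag for review", which is one reading of the 0.85-0.90 band.

```python
import unicodedata
from difflib import SequenceMatcher

MIN_SIMILARITY = 0.85   # below this: reject
REVIEW_CEILING = 0.90   # 0.85 <= score < 0.90: keep, but flag for manual review


def _fold(text: str) -> str:
    """Strip accents and case so 'Histórico' and 'historico' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, _fold(a), _fold(b)).ratio()


def classify_match(source_name: str, wikidata_label: str) -> str:
    """Return 'accept', 'review' or 'reject' for a candidate Wikidata label."""
    score = similarity(source_name, wikidata_label)
    if score < MIN_SIMILARITY:
        return "reject"
    if score < REVIEW_CEILING:
        return "review"
    return "accept"
```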

Documentation Standards

  • Complete descriptions (100+ words minimum)
  • Alternative names (Portuguese + English)
  • GeoNames location IDs
  • Full provenance metadata
  • Enrichment history with confidence scores
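
These standards can be checked mechanically before a record goes into batch16_enriched.yaml. The field names below are hypothetical placeholders for the project's actual LinkML schema.

```python
REQUIRED_LANGS = {"pt", "en"}


def documentation_gaps(record: dict) -> list[str]:
    """List the documentation standards a record does not yet meet.

    Field names are hypothetical stand-ins for the real schema.
    """
    gaps = []
    if len(record.get("description", "").split()) < 100:
        gaps.append("description under 100 words")
    langs = {alt.get("language") for alt in record.get("alternative_names", [])}
    missing = REQUIRED_LANGS - langs
    if missing:
        gaps.append("alternative names missing: " + ", ".join(sorted(missing)))
    if not record.get("geonames_id"):
        gaps.append("no GeoNames ID")
    if not record.get("provenance"):
        gaps.append("no provenance metadata")
    history = record.get("enrichment_history", [])
    if not history or any("confidence" not in step for step in history):
        gaps.append("enrichment history missing or lacks confidence scores")
    return gaps
```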

Expected Outcomes

Optimistic Scenario (70% Goal Reached)

If 10+ institutions enriched → Coverage: 71.2% or higher

  • Mission accomplished: 70% target exceeded
  • Move to quality verification phase
  • Prepare for other countries

Realistic Scenario (Approaching 70%)

If 7-8 institutions enriched → Coverage: 68.8-69.6%

  • One more batch (Batch 17) needed to secure 70%
  • Focus remaining efforts on university collections

Pessimistic Scenario (Incremental Progress)

If 5-6 institutions enriched → Coverage: 67-68%

  • Two more batches (Batches 17-18) needed to reach 70%
  • May need to create Wikidata items for notable institutions

Risk Assessment

Low-Risk Targets (Tier 1: 1-10)

  • APESP, DEAP, FCRB, SBM: Major state/federal institutions → VERY HIGH Wikidata likelihood
  • Expected success rate: 70-80% (7-8 of 10 found)

Medium-Risk Targets (Tier 2: 11-15)

  • MUSEAR, Insikiran, Campina Grande, SECULT: Regional/academic institutions
  • Expected success rate: 40-60% (2-3 of 5 found)

Overall Expected Success

  • Combined success rate: 60-73% (9-11 of 15 institutions found)
  • Likely coverage after Batch 16: 68.8-72.0% (Tier 1 alone: 68.8-69.6%; with Tier 2: 70.4-72.0%)
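
The combined estimate follows from summing the per-tier ranges. In this sketch the rates are written as whole percentages so the arithmetic stays exact. The function name is illustrative.

```python
def expected_matches(tiers):
    """Sum the (low, high) expected match counts across tiers.

    Each tier is (candidates, low_pct, high_pct); results are rounded to
    whole institutions.
    """
    low = sum(round(n * lo / 100) for n, lo, _ in tiers)
    high = sum(round(n * hi / 100) for n, _, hi in tiers)
    return low, high


TIERS = [(10, 70, 80), (5, 40, 60)]  # Tier 1 and Tier 2 rates from the plan
BASELINE, TOTAL = 79, 125

low, high = expected_matches(TIERS)
candidates = sum(n for n, _, _ in TIERS)
print(f"found: {low}-{high} of {candidates}")  # found: 9-11 of 15
print(f"coverage: {100 * (BASELINE + low) / TOTAL:.1f}%-{100 * (BASELINE + high) / TOTAL:.1f}%")
# coverage: 70.4%-72.0%
```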

Next Steps

  1. Execute Batch 16 searches for Tier 1 candidates (top 10)
  2. Extract full metadata for successful matches
  3. Create batch16_enriched.yaml with LinkML-compliant records
  4. Merge into main dataset (create backup first)
  5. Generate Batch 16 report with coverage statistics
  6. 🎯 Assess whether 70% has been reached or whether Batch 17 is needed
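
Step 4 (back up, then merge) could look roughly like this inside merge_batch16.py. This is a sketch only: the `id` merge key is an assumption. YAML loading and saving is left out so the merge logic stands alone.

```python
import shutil
from pathlib import Path


def backup(dataset: Path, suffix: str = ".bak.batch16") -> Path:
    """Copy the dataset to <name><suffix> before merging; refuse to overwrite."""
    target = dataset.with_name(dataset.name + suffix)
    if target.exists():
        raise FileExistsError(f"backup already exists: {target}")
    shutil.copy2(dataset, target)
    return target


def merge_records(existing, enriched, key="id"):
    """Overlay enriched records onto existing ones that share `key`.

    Existing order is preserved; enriched records with no counterpart are
    appended at the end.
    """
    pending = {rec[key]: rec for rec in enriched}
    merged = []
    for rec in existing:
        update = pending.pop(rec[key], None)
        merged.append({**rec, **update} if update is not None else rec)
    merged.extend(pending.values())
    return merged
```

Overlaying enriched fields onto the existing record, rather than replacing it wholesale, keeps any fields the batch file does not repeat. With PyYAML, the surrounding script would do three things, though this wiring is assumed rather than taken from the project:

- load globalglam-20251111.yaml with `yaml.safe_load`;
- call `backup(...)` first;
- write the merged list back with `yaml.safe_dump`.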

Files to Create

data/instances/brazil/
  └── batch16_enriched.yaml (5-10 institutions)

scripts/
  └── merge_batch16.py (merge script)

reports/brazil/
  ├── batch16_plan.md (this file)
  └── batch16_report.md (to be created after execution)

data/instances/all/
  ├── globalglam-20251111.yaml.bak.batch16 (backup)
  └── globalglam-20251111.yaml (updated)

Timeline

  • Planning: Complete (this document)
  • Execution: Ready to begin
  • Estimated duration: 30-45 minutes (searches + extraction)
  • Report generation: 10 minutes

Total estimated time: 1 hour for complete Batch 16 cycle


Ready to execute: Yes
Next action: Begin Wikidata searches for Tier 1 candidates
Goal: Reach 70% coverage (88/125 institutions)