glam/data/instances/brazil/BATCH9_CANDIDATES_REPORT.md
2025-11-19 23:25:22 +01:00

Brazil Batch 9 Enrichment Candidates

Generated: 2025-11-11
Purpose: Identify high-priority Brazilian institutions for Wikidata enrichment
Target: Add 10-15 Wikidata identifiers to raise Brazil coverage from 14.6% to at least 19.3%

Summary Statistics

  • Total Brazilian institutions: 212
  • With Wikidata: 31 (14.6%)
  • Without Wikidata: 181 (85.4%)
  • Candidates analyzed: 181
  • Top candidates selected: 15

Institution Type Distribution (Without Wikidata)

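| Type | Count | % of institutions without Wikidata (181) |
|---|---|---|
| MIXED | 61 | 33.7% |
| EDUCATION_PROVIDER | 43 | 23.8% |
| MUSEUM | 42 | 23.2% |
| OFFICIAL_INSTITUTION | 16 | 8.8% |
| ARCHIVE | 9 | 5.0% |
| RESEARCH_CENTER | 4 | 2.2% |
| GALLERY | 3 | 1.7% |
| LIBRARY | 3 | 1.7% |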

Scoring Methodology

Institutions are scored based on:

  1. Institution Type (0-10 points)

    • MUSEUM, LIBRARY, ARCHIVE: 10 points (core heritage institutions)
    • GALLERY: 8 points
    • RESEARCH_CENTER: 7 points
    • OFFICIAL_INSTITUTION: 6 points
    • EDUCATION_PROVIDER: 4 points
    • MIXED: 3 points
  2. Name Specificity (0-5 points)

    • Explicit institutional names: +3 points
    • National/state/federal/municipal institutions: +2 points
    • Generic educational names: 0 bonus
  3. Digital Platforms (0-3 points)

    • Each platform: +1 point (max 3)
  4. Website Available (0-2 points)

    • Has website identifier: +2 points
  5. Geographic Location (0-3 points)

    • Major city (São Paulo, Rio, etc.): +3 points
    • Has city information: +1 point
  6. Description Richness (0-2 points)

    • Detailed (>100 chars): +2 points
    • Moderate (>50 chars): +1 point

Maximum possible score: 25 points
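The rubric above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script that produced this report: the record field names (`type`, `name`, `platforms`, `website`, `city`, `description`), the keyword lists used to judge name specificity, and the set of major cities are all hypothetical. The geographic criteria are treated as mutually exclusive so that the category stays within its 0-3 range.

```python
from typing import Any

TYPE_POINTS = {
    "MUSEUM": 10, "LIBRARY": 10, "ARCHIVE": 10,
    "GALLERY": 8, "RESEARCH_CENTER": 7,
    "OFFICIAL_INSTITUTION": 6, "EDUCATION_PROVIDER": 4, "MIXED": 3,
}
# Assumed keyword lists: the report does not say how names are classified.
EXPLICIT_NAME_WORDS = ("museu", "museum", "biblioteca", "arquivo", "library", "archive")
GOVERNMENT_LEVEL_WORDS = ("nacional", "estadual", "federal", "municipal", "national", "state")
# Assumed set: the report names only São Paulo and Rio explicitly.
MAJOR_CITIES = {"São Paulo", "Rio de Janeiro", "Brasília"}


def score_candidate(record: dict[str, Any]) -> int:
    """Score one institution record on the 25-point rubric."""
    name = (record.get("name") or "").lower()
    city = record.get("city") or ""
    description = record.get("description") or ""

    # 1. Institution type (0-10)
    score = TYPE_POINTS.get(record.get("type", ""), 0)
    # 2. Name specificity (0-5)
    if any(word in name for word in EXPLICIT_NAME_WORDS):
        score += 3
    if any(word in name for word in GOVERNMENT_LEVEL_WORDS):
        score += 2
    # 3. Digital platforms (0-3), one point each, capped at three
    score += min(len(record.get("platforms") or []), 3)
    # 4. Website (0-2)
    if record.get("website"):
        score += 2
    # 5. Geographic location (0-3)
    if city in MAJOR_CITIES:
        score += 3
    elif city and city != "Unknown":
        score += 1
    # 6. Description richness (0-2)
    if len(description) > 100:
        score += 2
    elif len(description) > 50:
        score += 1
    return score
```

Under these assumptions, a MUSEUM named "Museu Paulista" in São Paulo with no website, no platforms, and a description longer than 100 characters scores 10 + 3 + 3 + 2 = 18, which agrees with the 18.0 reported for candidate 1. Other reported scores may depend on classification rules this sketch does not capture.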

Top 15 Candidates for Batch 9

1. Museu Paulista

  • Score: 18.0/25
  • Type: MUSEUM
  • Location: São Paulo, SP
  • Website: Not available
  • Platforms: 0
  • Description: University of São Paulo museum, subject of research on collection policy (1990-2015) and acquisition strategies. Publishes Anais do Museu Paulista jou...

Wikidata Search Strategy:

  • Search term: Museu Paulista São Paulo Brazil
  • Filter: instance of → museum
  • Verify: Location matches São Paulo, Brazil

2. Museu Casa de Rui Barbosa

  • Score: 18.0/25
  • Type: MUSEUM
  • Location: Rio de Janeiro (state not recorded)
  • Website: Not available
  • Platforms: 0
  • Description: Developed systematic cataloging methodologies for museum environments, conducting research finding 47% of surveyed institutions perform room documenta...

Wikidata Search Strategy:

  • Search term: Museu Casa de Rui Barbosa Rio de Janeiro Brazil
  • Filter: instance of → museum
  • Verify: Location matches Rio de Janeiro, Brazil

3. UnB BCE

  • Score: 18.0/25
  • Type: LIBRARY
  • Location: Brasília, DISTRITO FEDERAL
  • Website: https://bce.unb.br/
  • Platforms: 0
  • Description: 24/7 operations, @bceunb (38K)

Wikidata Search Strategy:

  • Search term: UnB BCE Brasília Brazil
  • Filter: instance of → library
  • Verify: Location matches Brasília, Brazil

4. Museu de Arte de São Paulo (MASP)

  • Score: 17.0/25
  • Type: MUSEUM
  • Location: São Paulo, SP
  • Website: Not available
  • Platforms: 0
  • Description: Major art museum collaborating with Google Arts & Culture to reach global audiences.

Wikidata Search Strategy:

  • Search term: Museu de Arte de São Paulo (MASP) São Paulo Brazil
  • Filter: instance of → museum
  • Verify: Location matches São Paulo, Brazil

5. MAX

  • Score: 17.0/25
  • Type: MUSEUM
  • Location: Rio de Janeiro, SERGIPE
  • Website: Not available
  • Platforms: 0
  • Description: Archaeological museum, UFS-administered
  • Contact: Phone: 21 3395 8905

Wikidata Search Strategy:

  • Search term: MAX Rio de Janeiro Brazil
  • Filter: instance of → museum
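  • Verify: Location matches the record. Note that the listed city (Rio de Janeiro) conflicts with the listed state (SERGIPE) and the UFS affiliation, so confirm the correct city before accepting a match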

6. UFAL Natural History Museum

  • Score: 16.0/25
  • Type: MUSEUM
  • Location: Maceió, ALAGOAS
  • Website: Not available
  • Platforms: 0
  • Description: Not available

Wikidata Search Strategy:

  • Search term: UFAL Natural History Museum Maceió Brazil
  • Filter: instance of → museum
  • Verify: Location matches Maceió, Brazil

7. Museu Sacaca

  • Score: 16.0/25
  • Type: MUSEUM
  • Location: Macapá

Wikidata Search Strategy:

  • Search term: Museu Sacaca Macapá Brazil
  • Filter: instance of → museum
  • Verify: Location matches Macapá, Brazil

8. Arquivo Público DF

  • Score: 16.0/25
  • Type: ARCHIVE
  • Location: Brasília, DISTRITO FEDERAL
  • Website: Not available
  • Platforms: 0
  • Description: @arpdf (16K), 1.445M photos

Wikidata Search Strategy:

  • Search term: Arquivo Público DF Brasília Brazil
  • Filter: instance of → archive
  • Verify: Location matches Brasília, Brazil

9. UFPA

  • Score: 16.0/25
  • Type: MUSEUM
  • Location: Belém, PARÁ
  • Website: Not available
  • Platforms: 0
  • Description: 50,000+ students, MUFPA museum

Wikidata Search Strategy:

  • Search term: UFPA Belém Brazil
  • Filter: instance of → museum
  • Verify: Location matches Belém, Brazil

10. Arquivo Blumenau

  • Score: 16.0/25
  • Type: ARCHIVE
  • Location: Blumenau, SANTA CATARINA
  • Website: http://arquivodeblumenau.com.br/
  • Platforms: 0
  • Description: 500,000+ photos, German/Italian

Wikidata Search Strategy:

  • Search term: Arquivo Blumenau Blumenau Brazil
  • Filter: instance of → archive
  • Verify: Location matches Blumenau, Brazil

11. Museu Palacinho

  • Score: 16.0/25
  • Type: MUSEUM
  • Location: Palmas, TOCANTINS
  • Website: https://museupalacinho.com/
  • Platforms: 0
  • Description: Contact: Phone: +55 63 99232-8613

Wikidata Search Strategy:

  • Search term: Museu Palacinho Palmas Brazil
  • Filter: instance of → museum
  • Verify: Location matches Palmas, Brazil

12. Museu Nacional

  • Score: 15.0/25
  • Type: MUSEUM
  • Location: Unknown, RJ
  • Website: Not available
  • Platforms: 0
  • Description: @museunacionalufrj (126K), reopening 2026

Wikidata Search Strategy:

  • Search term: Museu Nacional Brazil
  • Filter: instance of → museum
  • Verify: Location is in RJ, Brazil (city not recorded)

13. Museu Palacinho

  • Score: 15.0/25
  • Type: MUSEUM
  • Location: Unknown

Wikidata Search Strategy:

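  • Search term: Museu Palacinho Brazil
  • Filter: instance of → museum
  • Verify: Location is not recorded. This entry shares its name with candidate 11 (Palmas, TOCANTINS), so check whether the two records are duplicates before enriching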

14. Biblioteca Digital Brasileira de Teses e Dissertações (BDTD)

  • Score: 15.0/25
  • Type: LIBRARY
  • Location: Unknown
  • Website: Not available
  • Platforms: 0
  • Description: Aggregates graduate research from universities nationwide, providing access to Brazilian theses and dissertations.

Wikidata Search Strategy:

  • Search term: Biblioteca Digital Brasileira de Teses e Dissertações (BDTD) Brazil
  • Filter: instance of → library
  • Verify: Country is Brazil (no city or state recorded)

15. Museu da Borracha

  • Score: 14.0/25
  • Type: MUSEUM
  • Location: Unknown, AC
  • Website: Not available
  • Platforms: 0
  • Description: 5,300+ pieces, 31,756+ newspapers, 4,700-volume library

Wikidata Search Strategy:

  • Search term: Museu da Borracha Brazil
  • Filter: instance of → museum
  • Verify: Location is in AC, Brazil (city not recorded)
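The search strategies above all follow one pattern: institution name, then city, then country. The sketch below shows one way to build that term and a matching request URL for Wikidata's full-text search through the standard MediaWiki `action=query&list=search` endpoint. The helper names are hypothetical, and skipping the placeholder city "Unknown" is a suggestion of this sketch rather than documented behavior of the report generator. The code only builds the URL; it sends no request.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_term(name: str, city: Optional[str], country: str = "Brazil") -> str:
    """Join name, city, and country, leaving out a missing or placeholder city."""
    parts = [name]
    if city and city.strip().lower() != "unknown":
        parts.append(city.strip())
    parts.append(country)
    return " ".join(parts)


def build_search_url(term: str, limit: int = 5) -> str:
    """Build a full-text search URL against the Wikidata MediaWiki API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


if __name__ == "__main__":
    term = build_search_term("Museu Sacaca", "Macapá")
    print(term)
    print(build_search_url(term))
```

Each hit still needs the Filter and Verify steps applied by hand: confirm the item's instance-of value and its location before recording the Q-number.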

Additional High-Priority Candidates (16-30)

These institutions scored well but didn't make the top 15. Consider for Batch 10.

| Rank | Score | Name | Type | City |
|---|---|---|---|---|
| 16 | 14.0 | Museu dos Povos Acreanos | MUSEUM | Rio Branco |
| 17 | 14.0 | Museu Histórico | MUSEUM | Alcântara |
| 18 | 14.0 | MARCO | MUSEUM | Campo Grande |
| 19 | 14.0 | Dom Bosco Museum | MUSEUM | Campo Grande |
| 20 | 14.0 | Ouro Preto System | MUSEUM | Ouro Preto |
| 21 | 14.0 | Natural History Museum | MUSEUM | Campina Grande |
| 22 | 14.0 | Memorial do RS | MUSEUM | Pelotas |
| 23 | 14.0 | Museu Memória | MUSEUM | Porto Velho |
| 24 | 14.0 | Museu do Homem Sergipano | MUSEUM | Aracaju |
| 25 | 13.0 | Museu dos Povos Acreanos | MUSEUM | Unknown |
| 26 | 13.0 | UFAL Natural History Museum | MUSEUM | Unknown |
| 27 | 13.0 | Museu Sacaca | MUSEUM | Unknown |
| 28 | 13.0 | Museu de Arqueologia e Etnologia | MUSEUM | Unknown |
| 29 | 13.0 | Arquivo Público (APEB) | ARCHIVE | Unknown |
| 30 | 13.0 | Arquivo Público DF | ARCHIVE | Unknown |

Recommendations

Batch 9 Strategy (Target: 10-15 enrichments)

  1. Manual Wikidata Search (Most Reliable)

    • Search each top candidate on Wikidata
    • Verify location and institution type match
    • Record Q-numbers in enrichment script
  2. Automated Fuzzy Matching (Faster, Lower Precision)

    • Use the scripts/enrich_brazil_batch9.py template
    • Adapt fuzzy matching from previous batches
    • Manually verify all matches before committing
  3. Hybrid Approach (Recommended)

    • Manual search for top 10 candidates (highest confidence)
    • Fuzzy matching for candidates 11-15 (with verification)
    • This balances speed and accuracy
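The fuzzy-matching step in options 2 and 3 could look roughly like the sketch below. It is an assumption-laden illustration, not the matcher used in previous batches, which this report does not show. It uses the standard library's `difflib.SequenceMatcher`; the accent-stripping normalization and the 0.85 threshold are hypothetical choices, and the Q-ids passed in are whatever a prior search returned.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def match_score(candidate: str, label: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(candidate), normalize(label)).ratio()


def best_match(
    candidate: str, labels: dict[str, str], threshold: float = 0.85
) -> Optional[tuple[str, float]]:
    """Return (qid, score) for the closest label, or None if below the threshold.

    `labels` maps a Wikidata Q-id to its label.
    """
    scored = [(qid, match_score(candidate, label)) for qid, label in labels.items()]
    if not scored:
        return None
    qid, score = max(scored, key=lambda pair: pair[1])
    return (qid, score) if score >= threshold else None
```

A score above the threshold only nominates a match. As the list above says, every fuzzy match should still be verified by hand against location and institution type before it is committed.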

Expected Outcome

  • Current coverage: 31/212 (14.6%)
  • After Batch 9 (+10 institutions): 41/212 (19.3%)
  • After Batch 9 (+15 institutions): 46/212 (21.7%)
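The coverage figures above are simple ratios over the 212 Brazilian institutions. A short check, with a hypothetical helper name:

```python
def coverage_pct(with_wikidata: int, total: int) -> float:
    """Percentage of institutions with a Wikidata identifier, to one decimal."""
    return round(100 * with_wikidata / total, 1)


TOTAL = 212    # total Brazilian institutions
CURRENT = 31   # institutions that already have a Wikidata identifier

for added in (0, 10, 15):
    enriched = CURRENT + added
    print(f"+{added}: {enriched}/{TOTAL} ({coverage_pct(enriched, TOTAL)}%)")
# +0: 31/212 (14.6%)
# +10: 41/212 (19.3%)
# +15: 46/212 (21.7%)
```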

Next Batches

  • Batch 10: Focus on remaining MUSEUM institutions (42 without Wikidata before Batch 9)
  • Batch 11: Focus on ARCHIVE + LIBRARY (12 without Wikidata before Batch 9)
  • Batch 12: Cherry-pick high-scoring EDUCATION_PROVIDER institutions

Projected 30% coverage: reaching 64/212 (30.2%) requires 33 more identifiers, so Batches 9-11 combined (~35-40 total enrichments) should cross that threshold
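The number of enrichments needed for a given coverage target follows from the totals in Summary Statistics. A minimal sketch, with a hypothetical function name:

```python
import math


def enrichments_needed(current: int, total: int, target_pct: float) -> int:
    """Smallest number of new identifiers that lifts coverage to target_pct or above."""
    required = math.ceil(total * target_pct / 100)
    return max(required - current, 0)


# 30% of 212 is 63.6, so 64 institutions are required; 31 are already enriched.
print(enrichments_needed(31, 212, 30))  # 33
```

The ~35-40 enrichments projected for Batches 9-11 therefore leave a small margin for candidates that turn out to have no Wikidata item.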

Files Generated

  • Candidate records: data/instances/brazil/batch9_candidates_analysis.yaml
  • This report: data/instances/brazil/BATCH9_CANDIDATES_REPORT.md

Manual Enrichment Template

For each candidate, follow this workflow:

# In scripts/enrich_brazil_batch9.py

BATCH_9_ENRICHMENTS = {
    "Museu Name Example": {
        "wikidata_id": "Q12345678",
        "match_score": 1.0,  # Manual verification
        "match_method": "Manual Wikidata search",
        "verification_notes": "Verified: location, type, and name match"
    },
    # ... add 10-15 entries
}
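Before committing, entries in a mapping shaped like `BATCH_9_ENRICHMENTS` could be validated and applied along these lines. This is a sketch under assumptions: the record layout (a `name` key plus an `identifiers` list of `scheme`/`value` pairs) is hypothetical and may differ from the project's actual schema, and the function names are illustrative.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9]\d*")


def validate_entry(name: str, entry: dict) -> list[str]:
    """Return a list of problems found in one enrichment entry (empty if valid)."""
    problems = []
    if not QID_PATTERN.fullmatch(str(entry.get("wikidata_id", ""))):
        problems.append(f"{name}: wikidata_id is not a Q-number")
    score = entry.get("match_score")
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append(f"{name}: match_score must be between 0 and 1")
    return problems


def apply_enrichments(records: list[dict], enrichments: dict[str, dict]) -> int:
    """Attach Wikidata identifiers to matching records and return how many were added."""
    applied = 0
    for record in records:
        entry = enrichments.get(record.get("name"))
        if entry is None or validate_entry(record["name"], entry):
            continue  # no entry for this record, or the entry failed validation
        identifiers = record.setdefault("identifiers", [])
        if any(item.get("scheme") == "Wikidata" for item in identifiers):
            continue  # already enriched; do not add a second identifier
        identifiers.append({"scheme": "Wikidata", "value": entry["wikidata_id"]})
        applied += 1
    return applied
```

Matching on the exact name is deliberately strict here. Records that share a name, as some candidates in this report do, would need an additional key such as city to disambiguate.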

Status: Ready for manual Wikidata search
Next Action: Create scripts/enrich_brazil_batch9.py with top candidates