glam/data/instances/brazil/BATCH9_CANDIDATES_REPORT.md
2025-11-19 23:25:22 +01:00

# Brazil Batch 9 Enrichment Candidates
**Generated**: 2025-11-11
**Purpose**: Identify high-priority Brazilian institutions for Wikidata enrichment
**Target**: Add 10-15 Wikidata identifiers to raise Brazil coverage from 14.6% to 19.3% or higher
## Summary Statistics
- **Total Brazilian institutions**: 212
- **With Wikidata**: 31 (14.6%)
- **Without Wikidata**: 181 (85.4%)
- **Candidates analyzed**: 181
- **Top candidates selected**: 15
## Institution Type Distribution (Without Wikidata)
| Type | Count | % of 181 |
|------|-------|------------|
| MIXED | 61 | 33.7% |
| EDUCATION_PROVIDER | 43 | 23.8% |
| MUSEUM | 42 | 23.2% |
| OFFICIAL_INSTITUTION | 16 | 8.8% |
| ARCHIVE | 9 | 5.0% |
| RESEARCH_CENTER | 4 | 2.2% |
| GALLERY | 3 | 1.7% |
| LIBRARY | 3 | 1.7% |
## Scoring Methodology
Institutions are scored on six criteria:
1. **Institution Type** (0-10 points)
   - MUSEUM, LIBRARY, ARCHIVE: 10 points (core heritage institutions)
   - GALLERY: 8 points
   - RESEARCH_CENTER: 7 points
   - OFFICIAL_INSTITUTION: 6 points
   - EDUCATION_PROVIDER: 4 points
   - MIXED: 3 points
2. **Name Specificity** (0-5 points)
   - Explicit institutional names: +3 points
   - National/state/federal/municipal institutions: +2 points
   - Generic educational names: 0 bonus
3. **Digital Platforms** (0-3 points)
   - Each platform: +1 point (max 3)
4. **Website Available** (0-2 points)
   - Has website identifier: +2 points
5. **Geographic Location** (0-3 points)
   - Major city (São Paulo, Rio, etc.): +3 points
   - Has city information: +1 point
6. **Description Richness** (0-2 points)
   - Detailed (>100 chars): +2 points
   - Moderate (>50 chars): +1 point

**Maximum possible score**: 25 points
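
The rubric above can be sketched as a small scoring function. This is an illustration under stated assumptions, not the script that produced this report: the `Institution` fields, the `explicit_name` and `government_level` flags, and the contents of `MAJOR_CITIES` are hypothetical, since the report does not say how name specificity is detected or which cities count as major beyond São Paulo and Rio.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Points per institution type, copied from the rubric.
TYPE_POINTS = {
    "MUSEUM": 10, "LIBRARY": 10, "ARCHIVE": 10,
    "GALLERY": 8, "RESEARCH_CENTER": 7, "OFFICIAL_INSTITUTION": 6,
    "EDUCATION_PROVIDER": 4, "MIXED": 3,
}

# Assumption: only the two cities the rubric names; the real list is longer ("etc.").
MAJOR_CITIES = {"São Paulo", "Rio de Janeiro"}


@dataclass
class Institution:
    """Hypothetical record shape; field names are not taken from the dataset."""
    name: str
    type: str
    city: Optional[str] = None
    website: Optional[str] = None
    platforms: List[str] = field(default_factory=list)
    description: str = ""
    explicit_name: bool = False      # hypothetical flag: explicit institutional name
    government_level: bool = False   # hypothetical flag: national/state/federal/municipal


def score(inst: Institution) -> float:
    """Return the 0-25 priority score described in the rubric."""
    points = TYPE_POINTS.get(inst.type, 0)
    points += 3 if inst.explicit_name else 0
    points += 2 if inst.government_level else 0
    points += min(len(inst.platforms), 3)          # +1 per platform, capped at 3
    points += 2 if inst.website else 0
    if inst.city in MAJOR_CITIES:
        points += 3
    elif inst.city:
        points += 1
    if len(inst.description) > 100:
        points += 2
    elif len(inst.description) > 50:
        points += 1
    return float(points)
```

Under these assumptions, a museum in São Paulo with an explicit name and a description longer than 100 characters scores 10 + 3 + 3 + 2 = 18, which agrees with the score reported for Museu Paulista.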
## Top 15 Candidates for Batch 9
### 1. Museu Paulista
- **Score**: 18.0/25
- **Type**: MUSEUM
- **Location**: São Paulo, SP
- **Website**: Not available
- **Platforms**: 0
- **Description**: University of São Paulo museum, subject of research on collection policy (1990-2015) and acquisition strategies. Publishes Anais do Museu Paulista jou...
**Wikidata Search Strategy**:
- Search term: `Museu Paulista São Paulo Brazil`
- Filter: `instance of` → museum
- Verify: Location matches São Paulo, Brazil
---
### 2. Museu Casa de Rui Barbosa
- **Score**: 18.0/25
- **Type**: MUSEUM
- **Location**: Rio de Janeiro (state not recorded)
- **Website**: Not available
- **Platforms**: 0
- **Description**: Developed systematic cataloging methodologies for museum environments, conducting research finding 47% of surveyed institutions perform room documenta...
**Wikidata Search Strategy**:
- Search term: `Museu Casa de Rui Barbosa Rio de Janeiro Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Rio de Janeiro, Brazil
---
### 3. UnB BCE
- **Score**: 18.0/25
- **Type**: LIBRARY
- **Location**: Brasília, DISTRITO FEDERAL
- **Website**: https://bce.unb.br/
- **Platforms**: 0
- **Description**: 24/7 operations, @bceunb (38K)
**Wikidata Search Strategy**:
- Search term: `UnB BCE Brasília Brazil`
- Filter: `instance of` → library
- Verify: Location matches Brasília, Brazil
---
### 4. Museu de Arte de São Paulo (MASP)
- **Score**: 17.0/25
- **Type**: MUSEUM
- **Location**: São Paulo, SP
- **Website**: Not available
- **Platforms**: 0
- **Description**: Major art museum collaborating with Google Arts & Culture to reach global audiences.
**Wikidata Search Strategy**:
- Search term: `Museu de Arte de São Paulo (MASP) São Paulo Brazil`
- Filter: `instance of` → museum
- Verify: Location matches São Paulo, Brazil
---
### 5. MAX
- **Score**: 17.0/25
- **Type**: MUSEUM
- **Location**: Rio de Janeiro, SERGIPE (city and state conflict in the source record)
- **Website**: Not available
- **Platforms**: 0
- **Description**: Archaeological museum, UFS-administered. Contact phone: 21 3395 8905
**Wikidata Search Strategy**:
- Search term: `MAX Rio de Janeiro Brazil`
- Filter: `instance of` → museum
- Verify: Resolve the location first. The record pairs the city Rio de Janeiro with the state Sergipe, so confirm which is correct before accepting a match
---
### 6. UFAL Natural History Museum
- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Maceió, ALAGOAS
- **Website**: Not available
- **Platforms**: 0
- **Description**: Not available
**Wikidata Search Strategy**:
- Search term: `UFAL Natural History Museum Maceió Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Maceió, Brazil
---
### 7. Museu Sacaca
- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Macapá, AMAPÁ
- **Website**: http://www.museusacaca.ap.gov.br
- **Platforms**: 0
- **Description**: 21,000m², indigenous culture focus
**Wikidata Search Strategy**:
- Search term: `Museu Sacaca Macapá Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Macapá, Brazil
---
### 8. Arquivo Público DF
- **Score**: 16.0/25
- **Type**: ARCHIVE
- **Location**: Brasília, DISTRITO FEDERAL
- **Website**: Not available
- **Platforms**: 0
- **Description**: @arpdf (16K), 1.445M photos
**Wikidata Search Strategy**:
- Search term: `Arquivo Público DF Brasília Brazil`
- Filter: `instance of` → archive
- Verify: Location matches Brasília, Brazil
---
### 9. UFPA
- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Belém, PARÁ
- **Website**: Not available
- **Platforms**: 0
- **Description**: 50,000+ students, MUFPA museum
**Wikidata Search Strategy**:
- Search term: `UFPA Belém Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Belém, Brazil, and the item is the museum (MUFPA) rather than the university
---
### 10. Arquivo Blumenau
- **Score**: 16.0/25
- **Type**: ARCHIVE
- **Location**: Blumenau, SANTA CATARINA
- **Website**: http://arquivodeblumenau.com.br/
- **Platforms**: 0
- **Description**: 500,000+ photos, German/Italian
**Wikidata Search Strategy**:
- Search term: `Arquivo Blumenau Blumenau Brazil`
- Filter: `instance of` → archive
- Verify: Location matches Blumenau, Brazil
---
### 11. Museu Palacinho
- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Palmas, TOCANTINS
- **Website**: https://museupalacinho.com/
- **Platforms**: 0
- **Description**: Contact: Phone: +55 63 99232-8613
**Wikidata Search Strategy**:
- Search term: `Museu Palacinho Palmas Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Palmas, Brazil
---
### 12. Museu Nacional
- **Score**: 15.0/25
- **Type**: MUSEUM
- **Location**: Unknown, RJ
- **Website**: Not available
- **Platforms**: 0
- **Description**: @museunacionalufrj (126K), reopening 2026
**Wikidata Search Strategy**:
- Search term: `Museu Nacional Brazil`
- Filter: `instance of` → museum
- Verify: City is not recorded; confirm the item is located in the state RJ, Brazil
---
### 13. Museu Palacinho
- **Score**: 15.0/25
- **Type**: MUSEUM
- **Location**: Unknown, TO
- **Website**: https://museupalacinho.com/
- **Platforms**: 0
- **Description**: https://museupalacinho.com/
**Wikidata Search Strategy**:
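- Search term: `Museu Palacinho Brazil`
- Filter: `instance of` → museum
- Verify: City is not recorded; confirm the item is located in the state TO, Brazil. This record shares its name, website, and state with candidate 11, so both should resolve to the same Q-number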
---
### 14. Biblioteca Digital Brasileira de Teses e Dissertações (BDTD)
- **Score**: 15.0/25
- **Type**: LIBRARY
- **Location**: Unknown
- **Website**: Not available
- **Platforms**: 0
- **Description**: Aggregates graduate research from universities nationwide, providing access to Brazilian theses and dissertations.
**Wikidata Search Strategy**:
- Search term: `Biblioteca Digital Brasileira de Teses e Dissertações (BDTD) Brazil`
- Filter: `instance of` → library
- Verify: No location is recorded; confirm the item describes the Brazilian theses and dissertations aggregator
---
### 15. Museu da Borracha
- **Score**: 14.0/25
- **Type**: MUSEUM
- **Location**: Unknown, AC
- **Website**: Not available
- **Platforms**: 0
- **Description**: 5,300+ pieces, 31,756+ newspapers, 4,700-volume library
**Wikidata Search Strategy**:
- Search term: `Museu da Borracha Brazil`
- Filter: `instance of` → museum
- Verify: City is not recorded; confirm the item is located in the state AC, Brazil
---
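
The search strategy repeated for each candidate can be sketched as a URL builder for the MediaWiki full-text search API on Wikidata (`action=query`, `list=search`). This is a minimal sketch, not the project's tooling: the function name, its parameters, and the decision to drop the `Unknown` placeholder are assumptions, and the optional `haswbstatement:P17=Q155` filter (country = Brazil) only matches items that carry that statement directly. The `instance of` check is left to manual verification of the results.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name: str, city: Optional[str] = None,
                     restrict_to_brazil: bool = False, limit: int = 5) -> str:
    """Build a Wikidata full-text search URL for one candidate (hypothetical helper)."""
    terms = [name]
    # Skip placeholder cities so they do not pollute the query.
    if city and city != "Unknown":
        terms.append(city)
    if restrict_to_brazil:
        # CirrusSearch keyword: items with a direct "country: Brazil" statement.
        terms.append("haswbstatement:P17=Q155")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": " ".join(terms),
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"
```

Fetching the URL and inspecting the returned items is deliberately left out; each hit still needs the type and location checks listed under the candidate.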
## Additional High-Priority Candidates (16-30)
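These institutions scored well but did not make the top 15. Consider them for Batch 10. The rows at ranks 25, 26, 27, and 30 (city Unknown) repeat institutions listed elsewhere in this report (rank 16 and top-15 candidates 6, 7, and 8), so deduplicate before enrichment.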
| Rank | Score | Name | Type | City |
|------|-------|------|------|------|
| 16 | 14.0 | Museu dos Povos Acreanos | MUSEUM | Rio Branco |
| 17 | 14.0 | Museu Histórico | MUSEUM | Alcântara |
| 18 | 14.0 | MARCO | MUSEUM | Campo Grande |
| 19 | 14.0 | Dom Bosco Museum | MUSEUM | Campo Grande |
| 20 | 14.0 | Ouro Preto System | MUSEUM | Ouro Preto |
| 21 | 14.0 | Natural History Museum | MUSEUM | Campina Grande |
| 22 | 14.0 | Memorial do RS | MUSEUM | Pelotas |
| 23 | 14.0 | Museu Memória | MUSEUM | Porto Velho |
| 24 | 14.0 | Museu do Homem Sergipano | MUSEUM | Aracaju |
| 25 | 13.0 | Museu dos Povos Acreanos | MUSEUM | Unknown |
| 26 | 13.0 | UFAL Natural History Museum | MUSEUM | Unknown |
| 27 | 13.0 | Museu Sacaca | MUSEUM | Unknown |
| 28 | 13.0 | Museu de Arqueologia e Etnologia | MUSEUM | Unknown |
| 29 | 13.0 | Arquivo Público (APEB) | ARCHIVE | Unknown |
| 30 | 13.0 | Arquivo Público DF | ARCHIVE | Unknown |
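
Some names in this table occur more than once (ranks 16 and 25, for example, are both Museu dos Povos Acreanos). A minimal sketch for flagging such repeats by normalized name follows; the `(rank, name, city)` tuple layout and both function names are assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def find_duplicates(rows):
    """Group (rank, name, city) rows by normalized name; keep groups with 2+ rows."""
    groups = defaultdict(list)
    for rank, name, city in rows:
        groups[normalize(name)].append((rank, city))
    return {key: members for key, members in groups.items() if len(members) > 1}


rows = [
    (16, "Museu dos Povos Acreanos", "Rio Branco"),
    (24, "Museu do Homem Sergipano", "Aracaju"),
    (25, "Museu dos Povos Acreanos", "Unknown"),
]
duplicates = find_duplicates(rows)
```

A name match alone does not prove two records describe the same institution, so flagged groups still need a manual check of state and website.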
## Recommendations
### Batch 9 Strategy (Target: 10-15 enrichments)
1. **Manual Wikidata Search** (Most Reliable)
   - Search each top candidate on Wikidata
   - Verify location and institution type match
   - Record Q-numbers in enrichment script
2. **Automated Fuzzy Matching** (Faster, Lower Precision)
   - Use the `scripts/enrich_brazil_batch9.py` template (see Manual Enrichment Template)
   - Adapt fuzzy matching from previous batches
   - Manually verify all matches before committing
3. **Hybrid Approach** (Recommended)
   - Manual search for top 10 candidates (highest confidence)
   - Fuzzy matching for candidates 11-15 (with verification)
   - This balances speed and accuracy
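
The fuzzy-matching step can be sketched with the standard library's `difflib.SequenceMatcher`. The matcher used in previous batches is not shown in this report, so this is a stand-in technique; the 0.85 threshold, the function names, and the placeholder Q-numbers in the usage lines are all assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and strip accents so 'São' and 'Sao' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def match_score(name: str, label: str) -> float:
    """Similarity ratio in [0, 1] between an institution name and a Wikidata label."""
    return SequenceMatcher(None, _normalize(name), _normalize(label)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return (score, qid, label) for the best candidate, or None below the threshold.

    candidates: iterable of (qid, label) pairs, e.g. parsed from search results.
    """
    scored = [(match_score(name, label), qid, label) for qid, label in candidates]
    if not scored:
        return None
    best = max(scored)
    return best if best[0] >= threshold else None


# Placeholder Q-numbers for illustration only; they are not real matches.
hit = best_match("Museu Paulista", [("Q0000001", "Museu Paulista"),
                                    ("Q0000002", "Museu Nacional")])
miss = best_match("MAX", [("Q0000003", "Museu de Arte de São Paulo")])
```

Short acronyms such as MAX or UFPA rarely clear a string-similarity threshold, which is one reason the hybrid approach keeps a manual pass.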
### Expected Outcome
- **Current coverage**: 31/212 (14.6%)
- **After Batch 9** (+10 institutions): 41/212 (19.3%)
- **After Batch 9** (+15 institutions): 46/212 (21.7%)
### Next Batches
- **Batch 10**: Focus on remaining MUSEUM institutions (42 without Wikidata before Batch 9)
- **Batch 11**: Focus on ARCHIVE + LIBRARY (12 total without Wikidata before Batch 9)
- **Batch 12**: Cherry-pick high-scoring EDUCATION_PROVIDER institutions
**Projected 30% coverage**: Batches 9-11 combined (~35-40 total enrichments). Reaching 30% of 212 means 64 institutions with Wikidata, which is 33 more than the current 31.
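
The coverage figures in this section follow from the summary counts (31 of 212 institutions with Wikidata). A short check, with helper names chosen for illustration:

```python
import math

TOTAL = 212          # total Brazilian institutions
WITH_WIKIDATA = 31   # institutions that already have a Wikidata identifier


def coverage(added: int = 0) -> float:
    """Coverage percentage after adding `added` identifiers, rounded to 0.1."""
    return round(100 * (WITH_WIKIDATA + added) / TOTAL, 1)


def needed_for(target_pct: float) -> int:
    """New identifiers required to reach at least `target_pct` coverage."""
    return max(0, math.ceil(TOTAL * target_pct / 100) - WITH_WIKIDATA)


print(coverage(0), coverage(10), coverage(15))  # 14.6 19.3 21.7
print(needed_for(30))                           # 33
```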
## Files Generated
- **Candidate records**: `data/instances/brazil/batch9_candidates_analysis.yaml`
- **This report**: `data/instances/brazil/BATCH9_CANDIDATES_REPORT.md`
## Manual Enrichment Template
For each candidate, follow this workflow:
```python
# In scripts/enrich_brazil_batch9.py
# Keys are institution names as they appear in the dataset.
BATCH_9_ENRICHMENTS = {
    "Museu Name Example": {
        "wikidata_id": "Q12345678",  # placeholder Q-number
        "match_score": 1.0,  # Manual verification
        "match_method": "Manual Wikidata search",
        "verification_notes": "Verified: location, type, and name match",
    },
    # ... add 10-15 entries
}
```
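
One way to consume such a mapping is sketched below. The record layout (a `name` key and an `identifiers` list of `scheme`/`value`/`url` dictionaries) is an assumption made for illustration and may not match the dataset's actual schema; `apply_enrichments` is a hypothetical helper.

```python
import re

# Wikidata item identifiers are "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def apply_enrichments(records, enrichments):
    """Attach Wikidata identifiers to matching records; return how many were updated."""
    updated = 0
    for record in records:
        entry = enrichments.get(record.get("name"))
        if entry is None:
            continue
        qid = entry["wikidata_id"]
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"Invalid Wikidata ID for {record['name']!r}: {qid!r}")
        identifiers = record.setdefault("identifiers", [])
        # Skip records that already carry a Wikidata identifier.
        if any(item.get("scheme") == "Wikidata" for item in identifiers):
            continue
        identifiers.append({
            "scheme": "Wikidata",
            "value": qid,
            "url": f"https://www.wikidata.org/wiki/{qid}",
        })
        updated += 1
    return updated
```

Running the function twice on the same records leaves the second pass with nothing to update, so the script can be re-run safely after adding entries.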
---
**Status**: Ready for manual Wikidata search
**Next Action**: Create `scripts/enrich_brazil_batch9.py` with top candidates