# Brazil Batch 9 Enrichment Candidates

**Generated**: 2025-11-11
**Purpose**: Identify high-priority Brazilian institutions for Wikidata enrichment
**Target**: Add 10-15 Wikidata identifiers to raise Brazil coverage from 14.6% to 19.3% or higher

## Summary Statistics

- **Total Brazilian institutions**: 212
- **With Wikidata**: 31 (14.6%)
- **Without Wikidata**: 181 (85.4%)
- **Candidates analyzed**: 181
- **Top candidates selected**: 15

## Institution Type Distribution (Without Wikidata)

| Type | Count | % of Total (181) |
|------|-------|------------------|
| MIXED | 61 | 33.7% |
| EDUCATION_PROVIDER | 43 | 23.8% |
| MUSEUM | 42 | 23.2% |
| OFFICIAL_INSTITUTION | 16 | 8.8% |
| ARCHIVE | 9 | 5.0% |
| RESEARCH_CENTER | 4 | 2.2% |
| GALLERY | 3 | 1.7% |
| LIBRARY | 3 | 1.7% |

## Scoring Methodology

Institutions are scored based on:

1. **Institution Type** (0-10 points)
   - MUSEUM, LIBRARY, ARCHIVE: 10 points (core heritage institutions)
   - GALLERY: 8 points
   - RESEARCH_CENTER: 7 points
   - OFFICIAL_INSTITUTION: 6 points
   - EDUCATION_PROVIDER: 4 points
   - MIXED: 3 points

2. **Name Specificity** (0-5 points)
   - Explicit institutional names: +3 points
   - National/state/federal/municipal institutions: +2 points
   - Generic educational names: 0 bonus

3. **Digital Platforms** (0-3 points)
   - Each platform: +1 point (max 3)

4. **Website Available** (0-2 points)
   - Has website identifier: +2 points

5. **Geographic Location** (0-3 points)
   - Major city (São Paulo, Rio, etc.): +3 points
   - Any other recorded city: +1 point

6. **Description Richness** (0-2 points)
   - Detailed (>100 chars): +2 points
   - Moderate (>50 chars): +1 point

**Maximum possible score**: 25 points

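The rubric above can be read as a simple additive function. The sketch below is one possible reading, not the script that produced this report: the `Institution` fields, the two name-specificity flags, and the contents of `MAJOR_CITIES` are assumptions made for illustration (the report names only São Paulo and Rio as major cities).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Points per institution type, taken from the rubric above.
TYPE_POINTS = {
    "MUSEUM": 10, "LIBRARY": 10, "ARCHIVE": 10,
    "GALLERY": 8, "RESEARCH_CENTER": 7, "OFFICIAL_INSTITUTION": 6,
    "EDUCATION_PROVIDER": 4, "MIXED": 3,
}

# Assumption: the full "major city" list is not given in the report.
MAJOR_CITIES = {"São Paulo", "Rio de Janeiro"}


@dataclass
class Institution:
    """Hypothetical record shape; field names are illustrative."""
    name: str
    type: str
    city: Optional[str] = None
    website: Optional[str] = None
    platforms: List[str] = field(default_factory=list)
    description: str = ""
    has_explicit_name: bool = False    # assumed flag for the +3 name bonus
    is_government_level: bool = False  # assumed flag for the +2 name bonus


def score(inst: Institution) -> int:
    """Sum the six rubric components (maximum 25)."""
    total = TYPE_POINTS.get(inst.type, 0)
    total += 3 if inst.has_explicit_name else 0
    total += 2 if inst.is_government_level else 0
    total += min(len(inst.platforms), 3)   # +1 per platform, capped at 3
    total += 2 if inst.website else 0
    if inst.city in MAJOR_CITIES:
        total += 3
    elif inst.city:
        total += 1
    length = len(inst.description)
    total += 2 if length > 100 else 1 if length > 50 else 0
    return total
```

Under these assumptions, a museum in São Paulo with an explicit name, a description longer than 100 characters, no website, and no platforms scores 10 + 3 + 3 + 2 = 18.
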
## Top 15 Candidates for Batch 9


### 1. Museu Paulista

- **Score**: 18.0/25
- **Type**: MUSEUM
- **Location**: São Paulo, SP
- **Website**: Not available
- **Platforms**: 0
- **Description**: University of São Paulo museum, subject of research on collection policy (1990-2015) and acquisition strategies. Publishes Anais do Museu Paulista jou...

**Wikidata Search Strategy**:
- Search term: `Museu Paulista São Paulo Brazil`
- Filter: `instance of` → museum
- Verify: Location matches São Paulo, Brazil

---

### 2. Museu Casa de Rui Barbosa

- **Score**: 18.0/25
- **Type**: MUSEUM
- **Location**: Rio de Janeiro
- **Website**: Not available
- **Platforms**: 0
- **Description**: Developed systematic cataloging methodologies for museum environments, conducting research finding 47% of surveyed institutions perform room documenta...

**Wikidata Search Strategy**:
- Search term: `Museu Casa de Rui Barbosa Rio de Janeiro Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Rio de Janeiro, Brazil

---

### 3. UnB BCE

- **Score**: 18.0/25
- **Type**: LIBRARY
- **Location**: Brasília, DISTRITO FEDERAL
- **Website**: https://bce.unb.br/
- **Platforms**: 0
- **Description**: 24/7 operations, @bceunb (38K)

**Wikidata Search Strategy**:
- Search term: `UnB BCE Brasília Brazil`
- Filter: `instance of` → library
- Verify: Location matches Brasília, Brazil

---

### 4. Museu de Arte de São Paulo (MASP)

- **Score**: 17.0/25
- **Type**: MUSEUM
- **Location**: São Paulo, SP
- **Website**: Not available
- **Platforms**: 0
- **Description**: Major art museum collaborating with Google Arts & Culture to reach global audiences.

**Wikidata Search Strategy**:
- Search term: `Museu de Arte de São Paulo (MASP) São Paulo Brazil`
- Filter: `instance of` → museum
- Verify: Location matches São Paulo, Brazil

---

### 5. MAX

- **Score**: 17.0/25
- **Type**: MUSEUM
- **Location**: Rio de Janeiro, SERGIPE
- **Website**: Not available
- **Platforms**: 0
- **Description**: Archaeological museum, UFS-administered
- **Contact**: Phone: 21 3395 8905

**Wikidata Search Strategy**:
- Search term: `MAX Rio de Janeiro Brazil`
- Filter: `instance of` → museum
- Verify: Location. The record pairs the city Rio de Janeiro with the state Sergipe, so confirm the actual city before accepting a match

---

### 6. UFAL Natural History Museum

- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Maceió, ALAGOAS
- **Website**: Not available
- **Platforms**: 0
- **Description**: Not available

**Wikidata Search Strategy**:
- Search term: `UFAL Natural History Museum Maceió Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Maceió, Brazil

---

### 7. Museu Sacaca

- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Macapá, AMAPÁ
- **Website**: http://www.museusacaca.ap.gov.br
- **Platforms**: 0
- **Description**: 21,000m², indigenous culture focus

**Wikidata Search Strategy**:
- Search term: `Museu Sacaca Macapá Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Macapá, Brazil

---

### 8. Arquivo Público DF

- **Score**: 16.0/25
- **Type**: ARCHIVE
- **Location**: Brasília, DISTRITO FEDERAL
- **Website**: Not available
- **Platforms**: 0
- **Description**: @arpdf (16K), 1.445M photos

**Wikidata Search Strategy**:
- Search term: `Arquivo Público DF Brasília Brazil`
- Filter: `instance of` → archive
- Verify: Location matches Brasília, Brazil

---

### 9. UFPA

- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Belém, PARÁ
- **Website**: Not available
- **Platforms**: 0
- **Description**: 50,000+ students, MUFPA museum

**Wikidata Search Strategy**:
- Search term: `UFPA Belém Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Belém, Brazil

---

### 10. Arquivo Blumenau

- **Score**: 16.0/25
- **Type**: ARCHIVE
- **Location**: Blumenau, SANTA CATARINA
- **Website**: http://arquivodeblumenau.com.br/
- **Platforms**: 0
- **Description**: 500,000+ photos, German/Italian

**Wikidata Search Strategy**:
- Search term: `Arquivo Blumenau Blumenau Brazil`
- Filter: `instance of` → archive
- Verify: Location matches Blumenau, Brazil

---

### 11. Museu Palacinho

- **Score**: 16.0/25
- **Type**: MUSEUM
- **Location**: Palmas, TOCANTINS
- **Website**: https://museupalacinho.com/
- **Platforms**: 0
- **Description**: Contact: Phone: +55 63 99232-8613

**Wikidata Search Strategy**:
- Search term: `Museu Palacinho Palmas Brazil`
- Filter: `instance of` → museum
- Verify: Location matches Palmas, Brazil

---

### 12. Museu Nacional

- **Score**: 15.0/25
- **Type**: MUSEUM
- **Location**: Unknown, RJ
- **Website**: Not available
- **Platforms**: 0
- **Description**: @museunacionalufrj (126K), reopening 2026

**Wikidata Search Strategy**:
- Search term: `Museu Nacional Brazil`
- Filter: `instance of` → museum
- Verify: Location is in Rio de Janeiro state (RJ), Brazil; the city is not recorded

---

### 13. Museu Palacinho

- **Score**: 15.0/25
- **Type**: MUSEUM
- **Location**: Unknown, TO
- **Website**: https://museupalacinho.com/
- **Platforms**: 0
- **Description**: https://museupalacinho.com/

**Wikidata Search Strategy**:
- Search term: `Museu Palacinho Brazil`
- Filter: `instance of` → museum
- Verify: Location is in Tocantins (TO), Brazil; the city is not recorded. This record shares its name and website with candidate 11 and appears to describe the same institution, so both should resolve to one Wikidata item

---

### 14. Biblioteca Digital Brasileira de Teses e Dissertações (BDTD)

- **Score**: 15.0/25
- **Type**: LIBRARY
- **Location**: Unknown
- **Website**: Not available
- **Platforms**: 0
- **Description**: Aggregates graduate research from universities nationwide, providing access to Brazilian theses and dissertations.

**Wikidata Search Strategy**:
- Search term: `Biblioteca Digital Brasileira de Teses e Dissertações (BDTD) Brazil`
- Filter: `instance of` → library
- Verify: Country is Brazil; no city or state is recorded, so match on name and type

---

### 15. Museu da Borracha

- **Score**: 14.0/25
- **Type**: MUSEUM
- **Location**: Unknown, AC
- **Website**: Not available
- **Platforms**: 0
- **Description**: 5,300+ pieces, 31,756+ newspapers, 4,700-volume library

**Wikidata Search Strategy**:
- Search term: `Museu da Borracha Brazil`
- Filter: `instance of` → museum
- Verify: Location is in Acre (AC), Brazil; the city is not recorded

---

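Each search strategy above combines the institution name, the city, and the country. A minimal sketch of building that term, plus a full-text search URL for the MediaWiki API on Wikidata, might look like the following. The helper names are hypothetical, and leaving an `Unknown` city out of the term is an assumption about how missing cities should be handled. The `instance of` filter and the location check are still applied by hand to the results.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_term(name: str, city: Optional[str], country: str = "Brazil") -> str:
    """Join name, city, and country, skipping a missing or 'Unknown' city."""
    parts = [name.strip()]
    if city and city.strip().lower() != "unknown":
        parts.append(city.strip())
    parts.append(country)
    return " ".join(parts)


def build_search_url(term: str, limit: int = 5) -> str:
    """Build a full-text search request (action=query, list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


if __name__ == "__main__":
    term = build_search_term("Museu Nacional", "Unknown")
    print(term)
    print(build_search_url(term))
```

The sketch only constructs the request; fetching and parsing the response is left out so that it runs without network access.
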
## Additional High-Priority Candidates (16-30)

These institutions scored well but didn't make the top 15. Consider them for Batch 10. Rows 25, 26, 27, and 30 have an Unknown city and share a name with a higher-ranked record, so check for duplicates before enriching them.

| Rank | Score | Name | Type | City |
|------|-------|------|------|------|
| 16 | 14.0 | Museu dos Povos Acreanos | MUSEUM | Rio Branco |
| 17 | 14.0 | Museu Histórico | MUSEUM | Alcântara |
| 18 | 14.0 | MARCO | MUSEUM | Campo Grande |
| 19 | 14.0 | Dom Bosco Museum | MUSEUM | Campo Grande |
| 20 | 14.0 | Ouro Preto System | MUSEUM | Ouro Preto |
| 21 | 14.0 | Natural History Museum | MUSEUM | Campina Grande |
| 22 | 14.0 | Memorial do RS | MUSEUM | Pelotas |
| 23 | 14.0 | Museu Memória | MUSEUM | Porto Velho |
| 24 | 14.0 | Museu do Homem Sergipano | MUSEUM | Aracaju |
| 25 | 13.0 | Museu dos Povos Acreanos | MUSEUM | Unknown |
| 26 | 13.0 | UFAL Natural History Museum | MUSEUM | Unknown |
| 27 | 13.0 | Museu Sacaca | MUSEUM | Unknown |
| 28 | 13.0 | Museu de Arqueologia e Etnologia | MUSEUM | Unknown |
| 29 | 13.0 | Arquivo Público (APEB) | ARCHIVE | Unknown |
| 30 | 13.0 | Arquivo Público DF | ARCHIVE | Unknown |

## Recommendations

### Batch 9 Strategy (Target: 10-15 enrichments)

1. **Manual Wikidata Search** (Most Reliable)
   - Search each top candidate on Wikidata
   - Verify location and institution type match
   - Record Q-numbers in enrichment script

2. **Automated Fuzzy Matching** (Faster, Lower Precision)
   - Use existing `scripts/enrich_brazil_batch9.py` template
   - Adapt fuzzy matching from previous batches
   - Manually verify all matches before committing

3. **Hybrid Approach** (Recommended)
   - Manual search for top 10 candidates (highest confidence)
   - Fuzzy matching for candidates 11-15 (with verification)
   - This balances speed and accuracy

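For the fuzzy-matching option, a minimal sketch using only the standard library is shown here. It is not the matching code from previous batches, which this report does not reproduce; the function names and the 0.85 threshold are assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def best_match(
    name: str, labels: Iterable[str], threshold: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return the most similar label and its ratio, or None below the threshold."""
    target = normalize(name)
    best: Optional[Tuple[str, float]] = None
    for label in labels:
        ratio = SequenceMatcher(None, target, normalize(label)).ratio()
        if best is None or ratio > best[1]:
            best = (label, ratio)
    if best is not None and best[1] >= threshold:
        return best
    return None


if __name__ == "__main__":
    # Hypothetical labels, as might come back from a Wikidata search.
    print(best_match("Museu Paulista", ["Museu Nacional", "Museu Paulista"]))
```

Short names such as `MAX` or `UFPA` carry little signal for string similarity, which is one reason every automated match still needs manual verification.
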
### Expected Outcome

- **Current coverage**: 31/212 (14.6%)
- **After Batch 9** (+10 institutions): 41/212 (19.3%)
- **After Batch 9** (+15 institutions): 46/212 (21.7%)

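The coverage figures above follow from simple ratios over the 212 institutions. The short sketch below reproduces them and computes how many enrichments a target percentage requires; the function names are illustrative.

```python
import math


def coverage(enriched: int, total: int) -> float:
    """Percentage of institutions with a Wikidata identifier, to one decimal."""
    return round(100 * enriched / total, 1)


def needed_for(target_pct: float, total: int, current: int) -> int:
    """Additional enrichments required to reach at least target_pct coverage."""
    required = math.ceil(target_pct / 100 * total)
    return max(0, required - current)


if __name__ == "__main__":
    total, current = 212, 31
    for added in (0, 10, 15):
        print(added, coverage(current + added, total))
    print(needed_for(30, total, current))
```

With 31 of 212 already enriched, reaching 30% requires 64 enriched institutions in total, or 33 more.
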
### Next Batches

- **Batch 10**: Focus on remaining MUSEUM institutions (42 without Wikidata before Batch 9)
- **Batch 11**: Focus on ARCHIVE + LIBRARY (12 total without Wikidata)
- **Batch 12**: Cherry-pick high-scoring EDUCATION_PROVIDER institutions

**Projected 30% coverage**: Batches 9-11 combined (~35-40 total enrichments)

## Files Generated

- **Candidate records**: `data/instances/brazil/batch9_candidates_analysis.yaml`
- **This report**: `data/instances/brazil/BATCH9_CANDIDATES_REPORT.md`

## Manual Enrichment Template

For each candidate, record the verified match using this template:

```python
# In scripts/enrich_brazil_batch9.py

# Maps each institution name (as it appears in the dataset) to its verified Wikidata match.
BATCH_9_ENRICHMENTS = {
    "Museu Name Example": {
        "wikidata_id": "Q12345678",  # Placeholder Q-number
        "match_score": 1.0,  # Manual verification
        "match_method": "Manual Wikidata search",
        "verification_notes": "Verified: location, type, and name match",
    },
    # ... add 10-15 entries
}
```

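Before committing entries that follow the template above, a quick sanity check can catch malformed Q-numbers or missing fields. This is a hedged sketch: `validate_enrichments` is a hypothetical helper, not part of the existing scripts, and the required keys are simply those shown in the template.

```python
import re
from typing import Dict, List

# Keys taken from the template entry above.
REQUIRED_KEYS = {"wikidata_id", "match_score", "match_method", "verification_notes"}

# Wikidata item identifiers are "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def validate_enrichments(enrichments: Dict[str, dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all entries pass."""
    problems: List[str] = []
    for name, entry in enrichments.items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
            continue
        if not QID_PATTERN.fullmatch(str(entry["wikidata_id"])):
            problems.append(f"{name}: invalid wikidata_id {entry['wikidata_id']!r}")
        if not 0.0 <= entry["match_score"] <= 1.0:
            problems.append(f"{name}: match_score {entry['match_score']} outside 0-1")
    return problems


if __name__ == "__main__":
    sample = {
        "Museu Name Example": {
            "wikidata_id": "Q12345678",  # placeholder, as in the template
            "match_score": 1.0,
            "match_method": "Manual Wikidata search",
            "verification_notes": "Verified: location, type, and name match",
        }
    }
    print(validate_enrichments(sample))
```

The check only validates the shape of each entry. It does not confirm that a Q-number points at the right institution, which remains a manual step.
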
---

**Status**: Ready for manual Wikidata search
**Next Action**: Create `scripts/enrich_brazil_batch9.py` with top candidates