Brazil Batch 16 Enrichment Report
Date: November 11, 2025
Campaign: Manual Wikidata search for Brazilian heritage institutions
Batch: 16 of ongoing Brazilian enrichment effort
Executive Summary
Batch 16 successfully enriched 6 Brazilian heritage institutions with Wikidata identifiers, improving coverage from 63.2% to 67.5% (minimum goal: 65%, stretch goal: 70%).
Key Achievement: ✅ Minimum 65% coverage goal ACHIEVED
Batch 16 Results
Institutions Enriched
| Institution | Type | Wikidata | Status |
|---|---|---|---|
| Museu Histórico de Alcântara | MUSEUM | Q61000855 | UPDATED |
| Departamento Estadual de Arquivo Público do Paraná | ARCHIVE | Q56693461 | UPDATED |
| Fundação Museu do Homem Americano | MUSEUM | Q10286369 | UPDATED |
| Arquivo Público do Estado de São Paulo | ARCHIVE | Q9630401 | UPDATED |
| Sistema Brasileiro de Museus (SBM) | OFFICIAL_INSTITUTION | Q61000205 | UPDATED* |
| Museu Casa de Rui Barbosa | MUSEUM | Q56693872 | NEW |
*Sistema Brasileiro de Museus required duplicate resolution (see Technical Notes)
Statistics
Before Batch 16 (November 11, 2025):
- Total Brazilian institutions: 125
- With Wikidata identifiers: 79 (63.2%)
- Without Wikidata: 46 (36.8%)
After Batch 16 (November 11, 2025):
- Total Brazilian institutions: 126 (corrected after duplicate fix)
- With Wikidata identifiers: 85 (67.5%)
- Without Wikidata: 41 (32.5%)
Progress:
- ✅ +1 new institution discovered
- ✅ +6 institutions enriched with Wikidata
- ✅ +4.3 percentage points coverage improvement
- ✅ 67.5% > 65% minimum goal ACHIEVED
Coverage Progress
Overall Trajectory
| Batch | Brazilian Institutions | With Wikidata | Coverage |
|---|---|---|---|
| Pre-15 | 125 | 75 | 60.0% |
| After 15 | 125 | 79 | 63.2% |
| After 16 | 126 | 85 | 67.5% |
Cumulative Progress: +10 enriched institutions across Batches 15 and 16 (75 → 85)
Goal Status
- ✅ Minimum Goal (65%): ACHIEVED at 67.5%
- 🎯 Stretch Goal (70%): Need 4 more institutions (89/126 ≈ 70.6%); 3 more would give 88/126 ≈ 69.8%, just short of 70%
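The coverage figures and the gap to the stretch goal can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the project's scripts; the function names `coverage_pct` and `needed_for_goal` are hypothetical, and the goal is assumed to be a strict threshold (coverage must be at least the stated percentage before rounding).

```python
def coverage_pct(with_wikidata: int, total: int) -> float:
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100 * with_wikidata / total, 1)


def needed_for_goal(with_wikidata: int, total: int, goal_pct: int) -> int:
    """Additional enriched institutions needed so coverage >= goal_pct."""
    # Integer ceiling of goal_pct * total / 100, avoiding float rounding.
    target = -(-goal_pct * total // 100)
    return max(0, target - with_wikidata)


print(coverage_pct(79, 125))         # before Batch 16: 63.2
print(coverage_pct(85, 126))         # after Batch 16: 67.5
print(needed_for_goal(85, 126, 65))  # minimum goal already met: 0
print(needed_for_goal(85, 126, 70))  # stretch goal: 4
```

Under the strict-threshold assumption, 88 of 126 is about 69.8%, so 89 enriched institutions are needed. If the goal is instead evaluated after rounding to a whole percent, 88 would suffice.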
Detailed Enrichment Notes
1. Museu Histórico de Alcântara (Q61000855)
- Type: MUSEUM
- Location: Alcântara, Maranhão
- Enrichment: Added Wikidata Q61000855
- Match Quality: High confidence (exact name match)
2. Departamento Estadual de Arquivo Público do Paraná (Q56693461)
- Type: ARCHIVE
- Location: Curitiba, Paraná
- Enrichment: Added Wikidata Q56693461
- Match Quality: High confidence (exact institutional match)
3. Fundação Museu do Homem Americano (Q10286369)
- Type: MUSEUM
- Location: São Raimundo Nonato, Piauí
- Enrichment: Added Wikidata Q10286369
- Match Quality: High confidence (official foundation name)
- Note: Associated with Serra da Capivara National Park archaeological site
4. Arquivo Público do Estado de São Paulo (Q9630401)
- Type: ARCHIVE
- Location: São Paulo, SP
- Enrichment: Added Wikidata Q9630401
- Match Quality: High confidence (major state archive)
5. Sistema Brasileiro de Museus (SBM) (Q61000205)
- Type: OFFICIAL_INSTITUTION
- Location: Brasília, DF
- Enrichment: Added Wikidata Q61000205
- Match Quality: High confidence (national museum coordination system)
- Special Case: Required duplicate resolution (see Technical Notes)
6. Museu Casa de Rui Barbosa (Q56693872) ⭐ NEW
- Type: MUSEUM
- Location: Rio de Janeiro, RJ
- Enrichment: Discovered during Batch 16 Wikidata search
- Match Quality: High confidence (federal museum and cultural foundation)
- Description: Dedicated to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist, and diplomat
- Additional Identifiers:
  - VIAF: 149960006
  - LCNAF: n80037078
- Website: http://www.casaderuibarbosa.gov.br
Technical Notes
Sistema Brasileiro de Museus Duplicate Resolution
Issue: During merge, Sistema Brasileiro de Museus appeared twice due to name format variation:
- Original record: "Sistema Brasileiro de Museus (SBM)"
- Batch16 record: "Sistema Brasileiro de Museus" (without abbreviation)
Root Cause: The merge script matches records by OLD_ID, but the name difference between the two records prevented them from being recognized as duplicates.
Resolution:
- Manual duplicate fix applied via scripts/fix_sbm_duplicate_stream.py
- Kept enriched record (with Wikidata Q61000205)
- Restored name format with "(SBM)" abbreviation for consistency
- Added provenance note documenting the merge
- Total institutions adjusted: 127 → 126
Files:
- Input: globalglam-20251111-batch16.yaml (13,389 institutions)
- Output: globalglam-20251111-batch16-fixed.yaml (13,388 institutions)
- Duplicate removed: 1 institution
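A name-variation duplicate of this kind could be caught before merging by comparing normalized names. The sketch below is an illustration only and does not reproduce scripts/fix_sbm_duplicate_stream.py; the helpers `normalize_name` and `find_name_duplicates` are hypothetical, and the normalization rule (drop a trailing parenthetical, fold case and whitespace) is an assumption that would need review against the full dataset.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Drop a trailing parenthetical such as "(SBM)" and fold case/whitespace."""
    without_abbrev = re.sub(r"\s*\([^)]*\)\s*$", "", name)
    return " ".join(without_abbrev.split()).casefold()


def find_name_duplicates(records: list[dict]) -> list[list[dict]]:
    """Group records whose normalized names collide; return groups of 2+."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_name(record["name"])].append(record)
    return [group for group in groups.values() if len(group) > 1]


records = [
    {"name": "Sistema Brasileiro de Museus (SBM)"},
    {"name": "Sistema Brasileiro de Museus"},
    {"name": "Museu Casa de Rui Barbosa"},
]
duplicates = find_name_duplicates(records)
for group in duplicates:
    print([r["name"] for r in group])
```

Flagged groups would still need manual review, since two distinct institutions can share a name once abbreviations are stripped.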
Methodology
Search Strategy
Phase 1: Targeted Wikidata Search
- Searched Wikidata using institutional names from Brazilian conversation extractions
- Focused on Tier 4 (inferred) institutions lacking identifiers
- Prioritized well-documented institutions with Portuguese Wikipedia articles
Phase 2: Manual Verification
- Cross-referenced institutional descriptions, locations, and founding dates
- Verified VIAF IDs and official websites where available
- Ensured 100% match confidence before assigning identifiers
Phase 3: Serendipitous Discovery
- Discovered Museu Casa de Rui Barbosa during search for related institutions
- Added as new institution to dataset (high-quality record with multiple identifiers)
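Although the Batch 16 search was manual, the Phase 1 lookup could be partly scripted against the public Wikidata `wbsearchentities` API. The following is a hedged sketch: `build_search_url` and `candidate_ids` are hypothetical helpers, the default language `pt` and limit of 5 are assumptions, and `sample_response` is a hand-written stand-in for an API reply, not captured output. No network request is made here.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name: str, language: str = "pt", limit: int = 5) -> str:
    """Build a wbsearchentities query URL for an institution name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def candidate_ids(response: dict) -> list[tuple[str, str]]:
    """Extract (QID, label) pairs from a wbsearchentities JSON response."""
    return [(hit["id"], hit.get("label", "")) for hit in response.get("search", [])]


url = build_search_url("Museu Histórico de Alcântara")

# Hand-written sample in the shape the API returns (illustrative only).
sample_response = {
    "search": [{"id": "Q61000855", "label": "Museu Histórico de Alcântara"}]
}
candidates = candidate_ids(sample_response)
```

Candidates returned this way are only a starting point; the Phase 2 manual verification would still decide whether a QID is assigned.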
Match Quality Criteria
All Batch 16 enrichments meet strict quality standards:
- ✅ Exact name matches or officially documented name variations
- ✅ Geographic verification (city/state confirmed)
- ✅ Institution type alignment (museum/archive/official institution)
- ✅ Cross-referenced with VIAF, official websites, or Wikipedia
- ✅ Match score: 1.0 (perfect match)
Data Quality Improvements
Enrichment Type: WIKIDATA_IDENTIFIER
- Method: MANUAL_SEARCH_BATCH16
- Verification: All identifiers manually verified via Wikidata
- Data Tier: TIER_3_CROWD_SOURCED (Wikidata-sourced)
- Confidence: 1.0 (perfect matches only)
Provenance Tracking
All enriched records include:
enrichment_date: 2025-11-11T22:30:00+00:00
enrichment_type: WIKIDATA_IDENTIFIER
enrichment_method: MANUAL_SEARCH_BATCH16
match_score: 1.0
verified: true
enrichment_source: https://www.wikidata.org
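The provenance fields listed above could be assembled by a small helper so that every enriched record receives the same keys. This is a minimal sketch under assumptions: `make_enrichment_entry` is a hypothetical function, and the project's merge script may build these entries differently.

```python
from datetime import datetime, timezone


def make_enrichment_entry(when: datetime, batch: int) -> dict:
    """Build one enrichment_history entry with the fields used in this report."""
    return {
        "enrichment_date": when.isoformat(),
        "enrichment_type": "WIKIDATA_IDENTIFIER",
        "enrichment_method": f"MANUAL_SEARCH_BATCH{batch}",
        "match_score": 1.0,
        "verified": True,
        "enrichment_source": "https://www.wikidata.org",
    }


entry = make_enrichment_entry(
    datetime(2025, 11, 11, 22, 30, tzinfo=timezone.utc), batch=16
)
print(entry["enrichment_date"])    # 2025-11-11T22:30:00+00:00
print(entry["enrichment_method"])  # MANUAL_SEARCH_BATCH16
```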
Files Modified
Input Files
- Main dataset: data/instances/all/globalglam-20251111.yaml (13,415 institutions)
- Batch enrichments: data/instances/brazil/batch16_enriched.yaml (6 institutions)
Output Files
- Merged dataset: data/instances/all/globalglam-20251111-batch16-fixed.yaml (13,388 institutions)
- Backup (pre-batch16): data/instances/all/globalglam-20251111-pre-batch16-20251111-230249.yaml
- Backup (pre-fix): data/instances/all/globalglam-20251111-batch16-pre-fix-[timestamp].yaml
Scripts Created
- scripts/merge_batch16.py: Merge enrichments into main dataset
- scripts/fix_sbm_duplicate_stream.py: Remove SBM duplicate
Next Steps
Option 1: Pursue 70% Stretch Goal
To reach 70% coverage, 4 more institutions need Wikidata identifiers (89/126 ≈ 70.6%); 3 more would give 88/126 ≈ 69.8%, just short of the goal.
Action Plan:
- Analyze remaining 41 institutions without Wikidata
- Prioritize Tier 4 institutions with detailed descriptions
- Search Wikidata and Portuguese Wikipedia
- Create Batch 17 if viable candidates found
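The first step of the action plan, listing the Brazilian institutions that still lack a Wikidata identifier, can be sketched against the record structure shown in the appendix. The helper names below are hypothetical, and the second and third sample records are invented placeholders for illustration, not entries from the dataset.

```python
def has_wikidata(record: dict) -> bool:
    """True if the record carries at least one Wikidata identifier."""
    identifiers = record.get("identifiers") or []
    return any(i.get("identifier_scheme") == "Wikidata" for i in identifiers)


def is_brazilian(record: dict) -> bool:
    """True if any listed location has country code BR."""
    locations = record.get("locations") or []
    return any(loc.get("country") == "BR" for loc in locations)


def remaining_candidates(records: list[dict]) -> list[dict]:
    """Brazilian records that still lack a Wikidata identifier."""
    return [r for r in records if is_brazilian(r) and not has_wikidata(r)]


records = [
    {
        "name": "Museu Casa de Rui Barbosa",
        "locations": [{"country": "BR"}],
        "identifiers": [
            {"identifier_scheme": "Wikidata", "identifier_value": "Q56693872"}
        ],
    },
    # Placeholder records, not from the dataset:
    {"name": "Hypothetical Archive", "locations": [{"country": "BR"}]},
    {"name": "Hypothetical Library", "locations": [{"country": "PT"}]},
]
todo = remaining_candidates(records)
print([r["name"] for r in todo])  # ['Hypothetical Archive']
```

Run against the full dataset, a filter like this should return the 41 remaining institutions, which can then be prioritized by tier and description quality.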
Option 2: Conclude Brazilian Enrichment Campaign
With 67.5% coverage achieved, the campaign could be concluded at this point:
- ✅ Exceeded minimum 65% goal by 2.5 percentage points
- ✅ 85 of 126 institutions now have Wikidata linkage
- ✅ Major institutions (museums, archives, official bodies) prioritized
Recommendation: Analyze the remaining candidates before deciding. If they yield only low-quality or ambiguous matches, conclude the campaign at 67.5%.
Appendix: Batch 16 Enriched Records
Museu Histórico de Alcântara
- id: https://w3id.org/heritage/custodian/br/ma-museu-historico-de-alcantara
  name: Museu Histórico de Alcântara
  institution_type: MUSEUM
  locations:
  - city: Alcântara
    region: MARANHÃO
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q61000855
    identifier_url: https://www.wikidata.org/wiki/Q61000855
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
Departamento Estadual de Arquivo Público do Paraná
- id: https://w3id.org/heritage/custodian/br/pr-arquivo-publico-parana
  name: Departamento Estadual de Arquivo Público do Paraná
  institution_type: ARCHIVE
  locations:
  - city: Curitiba
    region: PARANÁ
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q56693461
    identifier_url: https://www.wikidata.org/wiki/Q56693461
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
Fundação Museu do Homem Americano
- id: https://w3id.org/heritage/custodian/br/pi-fundacao-museu-homem-americano
  name: Fundação Museu do Homem Americano
  institution_type: MUSEUM
  locations:
  - city: São Raimundo Nonato
    region: PIAUÍ
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q10286369
    identifier_url: https://www.wikidata.org/wiki/Q10286369
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
Arquivo Público do Estado de São Paulo
- id: https://w3id.org/heritage/custodian/br/sp-arquivo-publico-sao-paulo
  name: Arquivo Público do Estado de São Paulo
  institution_type: ARCHIVE
  locations:
  - city: São Paulo
    region: SÃO PAULO
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q9630401
    identifier_url: https://www.wikidata.org/wiki/Q9630401
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
Sistema Brasileiro de Museus (SBM)
- id: https://w3id.org/heritage/custodian/br/sistema-brasileiro-de-museus-sbm
  name: Sistema Brasileiro de Museus (SBM)
  alternative_names:
  - SBM
  institution_type: OFFICIAL_INSTITUTION
  locations:
  - city: Brasília
    region: DISTRITO FEDERAL
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q61000205
    identifier_url: https://www.wikidata.org/wiki/Q61000205
  provenance:
    notes: 'Duplicate fixed 2025-11-11: Merged with original record, keeping enriched metadata with Wikidata identifier.'
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
Museu Casa de Rui Barbosa (NEW)
- id: https://w3id.org/heritage/custodian/br/rj-museu-casa-rui-barbosa
  name: Museu Casa de Rui Barbosa
  institution_type: MUSEUM
  description: Federal museum and cultural foundation in Rio de Janeiro dedicated
    to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist,
    and diplomat. The museum houses his personal library, archives, and collections
    in his former residence.
  locations:
  - country: BR
    region: RIO DE JANEIRO
    city: Rio de Janeiro
    latitude: -22.9519
    longitude: -43.1763
  digital_platforms:
  - platform_name: Casa de Rui Barbosa Official Website
    platform_type: DISCOVERY_PORTAL
    platform_url: http://www.casaderuibarbosa.gov.br
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q56693872
    identifier_url: https://www.wikidata.org/wiki/Q56693872
  - identifier_scheme: VIAF
    identifier_value: '149960006'
    identifier_url: https://viaf.org/viaf/149960006
  - identifier_scheme: LCNAF
    identifier_value: n80037078
    identifier_url: https://id.loc.gov/authorities/names/n80037078
  provenance:
    data_source: WIKIDATA_DISCOVERY
    data_tier: TIER_3_CROWD_SOURCED
    extraction_date: '2025-11-11T22:30:00+00:00'
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
      enrichment_notes: 'Batch 16: New institution discovered via Wikidata search'
Conclusion
Batch 16 successfully achieved the minimum 65% coverage goal for Brazilian heritage institutions, reaching 67.5% with 85 of 126 institutions now linked to Wikidata.
Key Achievements:
- ✅ 6 institutions enriched with high-quality Wikidata identifiers
- ✅ 1 new institution discovered and added (Museu Casa de Rui Barbosa)
- ✅ +4.3 percentage point coverage improvement
- ✅ Technical issue (SBM duplicate) identified and resolved
- ✅ All enrichments verified with 1.0 match confidence
Decision Point: With 67.5% coverage achieved, project leadership should decide whether to pursue the 70% stretch goal (requires 4 more institutions, since 88/126 ≈ 69.8% falls just short) or conclude the Brazilian enrichment campaign.
Report Generated: November 11, 2025
Report Author: GLAM Data Extraction Project
Dataset Version: globalglam-20251111-batch16-fixed.yaml