# Brazil Batch 16 Enrichment Report

**Date**: November 11, 2025  
**Campaign**: Manual Wikidata search for Brazilian heritage institutions  
**Batch**: 16 of ongoing Brazilian enrichment effort

---

## Executive Summary

Batch 16 successfully enriched 6 Brazilian heritage institutions with Wikidata identifiers, improving coverage from **63.2%** to **67.5%** (minimum goal: 65%, stretch goal: 70%).

**Key Achievement**: ✅ **Minimum 65% coverage goal ACHIEVED**

---

## Batch 16 Results

### Institutions Enriched

| Institution | Type | Wikidata | Status |
|-------------|------|----------|--------|
| **Museu Histórico de Alcântara** | MUSEUM | Q61000855 | UPDATED |
| **Departamento Estadual de Arquivo Público do Paraná** | ARCHIVE | Q56693461 | UPDATED |
| **Fundação Museu do Homem Americano** | MUSEUM | Q10286369 | UPDATED |
| **Arquivo Público do Estado de São Paulo** | ARCHIVE | Q9630401 | UPDATED |
| **Sistema Brasileiro de Museus (SBM)** | OFFICIAL_INSTITUTION | Q61000205 | UPDATED* |
| **Museu Casa de Rui Barbosa** | MUSEUM | Q56693872 | NEW |

*Sistema Brasileiro de Museus required duplicate resolution (see Technical Notes)

### Statistics

**Before Batch 16** (November 11, 2025):

- Total Brazilian institutions: **125**
- With Wikidata identifiers: **79** (63.2%)
- Without Wikidata: **46** (36.8%)

**After Batch 16** (November 11, 2025):

- Total Brazilian institutions: **126** (corrected after duplicate fix)
- With Wikidata identifiers: **85** (67.5%)
- Without Wikidata: **41** (32.5%)

**Progress**:

- ✅ +1 new institution discovered
- ✅ +6 institutions enriched with Wikidata
- ✅ +4.3 percentage points coverage improvement
- ✅ 67.5% > 65% minimum goal **ACHIEVED**

---

## Coverage Progress

### Overall Trajectory

| Batch | Brazilian Institutions | With Wikidata | Coverage |
|-------|----------------------|---------------|----------|
| Pre-15 | 125 | 75 | 60.0% |
| After 15 | 125 | 79 | 63.2% |
| After 16 | 126 | 85 | **67.5%** |

**Cumulative Progress**: +10 enriched institutions
since the start of Batch 15 (75 → 85)

### Goal Status

- ✅ **Minimum Goal (65%)**: **ACHIEVED** at 67.5%
- 🎯 **Stretch Goal (70%)**: Need **3 more institutions** (88/126 total, i.e. 69.8%, which reaches 70% only when rounded; a strict 70% requires 89/126)

---

## Detailed Enrichment Notes

### 1. Museu Histórico de Alcântara (Q61000855)

- **Type**: MUSEUM
- **Location**: Alcântara, Maranhão
- **Enrichment**: Added Wikidata Q61000855
- **Match Quality**: High confidence (exact name match)

### 2. Departamento Estadual de Arquivo Público do Paraná (Q56693461)

- **Type**: ARCHIVE
- **Location**: Curitiba, Paraná
- **Enrichment**: Added Wikidata Q56693461
- **Match Quality**: High confidence (exact institutional match)

### 3. Fundação Museu do Homem Americano (Q10286369)

- **Type**: MUSEUM
- **Location**: São Raimundo Nonato, Piauí
- **Enrichment**: Added Wikidata Q10286369
- **Match Quality**: High confidence (official foundation name)
- **Note**: Associated with Serra da Capivara National Park archaeological site

### 4. Arquivo Público do Estado de São Paulo (Q9630401)

- **Type**: ARCHIVE
- **Location**: São Paulo, SP
- **Enrichment**: Added Wikidata Q9630401
- **Match Quality**: High confidence (major state archive)

### 5. Sistema Brasileiro de Museus (SBM) (Q61000205)

- **Type**: OFFICIAL_INSTITUTION
- **Location**: Brasília, DF
- **Enrichment**: Added Wikidata Q61000205
- **Match Quality**: High confidence (national museum coordination system)
- **Special Case**: Required duplicate resolution (see Technical Notes)

### 6. Museu Casa de Rui Barbosa (Q56693872) ⭐ NEW

- **Type**: MUSEUM
- **Location**: Rio de Janeiro, RJ
- **Enrichment**: Discovered during Batch 16 Wikidata search
- **Match Quality**: High confidence (federal museum and cultural foundation)
- **Description**: Dedicated to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist, and diplomat
- **Additional Identifiers**:
  - VIAF: 149960006
  - LCNAF ID available
- **Website**: http://www.casaderuibarbosa.gov.br

---

## Technical Notes

### Sistema Brasileiro de Museus Duplicate Resolution

**Issue**: During the merge, Sistema Brasileiro de Museus appeared twice because of a name format variation:

1. Original record: "Sistema Brasileiro de Museus (SBM)"
2. Batch 16 record: "Sistema Brasileiro de Museus" (without abbreviation)

**Root Cause**: The merge script uses OLD_ID matching, but the name difference prevented the two records from being recognized as duplicates.

**Resolution**:

- Manual duplicate fix applied via `scripts/fix_sbm_duplicate_stream.py`
- Kept enriched record (with Wikidata Q61000205)
- Restored name format with "(SBM)" abbreviation for consistency
- Added provenance note documenting the merge
- Total institutions adjusted: 127 → 126

**Files**:

- Input: `globalglam-20251111-batch16.yaml` (13,389 institutions)
- Output: `globalglam-20251111-batch16-fixed.yaml` (13,388 institutions)
- Duplicate removed: 1 institution

---

## Methodology

### Search Strategy

**Phase 1: Targeted Wikidata Search**

- Searched Wikidata using institutional names from Brazilian conversation extractions
- Focused on Tier 4 (inferred) institutions lacking identifiers
- Prioritized well-documented institutions with Portuguese Wikipedia articles

**Phase 2: Manual Verification**

- Cross-referenced institutional descriptions, locations, and founding dates
- Verified VIAF IDs and official websites where available
- Ensured 100% match confidence before assigning identifiers

**Phase 3: Serendipitous Discovery**

- Discovered Museu Casa de Rui Barbosa during search for
related institutions
- Added as new institution to dataset (high-quality record with multiple identifiers)

### Match Quality Criteria

All Batch 16 enrichments meet strict quality standards:

- ✅ **Exact name matches** or officially documented name variations
- ✅ **Geographic verification** (city/state confirmed)
- ✅ **Institution type alignment** (museum/archive/official institution)
- ✅ **Cross-referenced** with VIAF, official websites, or Wikipedia
- ✅ **Match score**: 1.0 (perfect match)

---

## Data Quality Improvements

### Enrichment Type: WIKIDATA_IDENTIFIER

- **Method**: MANUAL_SEARCH_BATCH16
- **Verification**: All identifiers manually verified via Wikidata
- **Data Tier**: TIER_3_CROWD_SOURCED (Wikidata-sourced)
- **Confidence**: 1.0 (perfect matches only)

### Provenance Tracking

All enriched records include:

- `enrichment_date`: 2025-11-11T22:30:00+00:00
- `enrichment_type`: WIKIDATA_IDENTIFIER
- `enrichment_method`: MANUAL_SEARCH_BATCH16
- `match_score`: 1.0
- `verified`: true
- `enrichment_source`: https://www.wikidata.org

---

## Files Modified

### Input Files

- **Main dataset**: `data/instances/all/globalglam-20251111.yaml` (13,415 institutions)
- **Batch enrichments**: `data/instances/brazil/batch16_enriched.yaml` (6 institutions)

### Output Files

- **Merged dataset**: `data/instances/all/globalglam-20251111-batch16-fixed.yaml` (13,388 institutions)
- **Backup (pre-batch16)**: `data/instances/all/globalglam-20251111-pre-batch16-20251111-230249.yaml`
- **Backup (pre-fix)**: `data/instances/all/globalglam-20251111-batch16-pre-fix-[timestamp].yaml`

### Scripts Created

- `scripts/merge_batch16.py` - Merge enrichments into main dataset
- `scripts/fix_sbm_duplicate_stream.py` - Remove SBM duplicate

---

## Next Steps

### Option 1: Pursue 70% Stretch Goal

Reaching 70% coverage requires **3 more institutions** with Wikidata (88/126 total, which is 69.8% and reaches 70% only when rounded; a strict 70% needs 89/126).

**Action Plan**:

1. Analyze remaining 41 institutions without Wikidata
2. Prioritize Tier 4 institutions with detailed descriptions
3. Search Wikidata and Portuguese Wikipedia
4. Create Batch 17 if viable candidates are found

### Option 2: Conclude Brazilian Enrichment Campaign

With 67.5% coverage achieved, the campaign could be concluded:

- ✅ Exceeded minimum 65% goal by 2.5 percentage points
- ✅ 85 of 126 institutions now have Wikidata linkage
- ✅ Major institutions (museums, archives, official bodies) prioritized

**Recommendation**: Analyze the remaining candidates before deciding. If they yield only low-quality or ambiguous matches, conclude the campaign at 67.5%.

---

## Appendix: Batch 16 Enriched Records

### Museu Histórico de Alcântara

```yaml
- id: https://w3id.org/heritage/custodian/br/ma-museu-historico-de-alcantara
  name: Museu Histórico de Alcântara
  institution_type: MUSEUM
  locations:
  - city: Alcântara
    region: MARANHÃO
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q61000855
    identifier_url: https://www.wikidata.org/wiki/Q61000855
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Departamento Estadual de Arquivo Público do Paraná

```yaml
- id: https://w3id.org/heritage/custodian/br/pr-arquivo-publico-parana
  name: Departamento Estadual de Arquivo Público do Paraná
  institution_type: ARCHIVE
  locations:
  - city: Curitiba
    region: PARANÁ
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q56693461
    identifier_url: https://www.wikidata.org/wiki/Q56693461
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Fundação Museu do Homem Americano

```yaml
- id: https://w3id.org/heritage/custodian/br/pi-fundacao-museu-homem-americano
  name: Fundação Museu do Homem Americano
  institution_type: MUSEUM
  locations:
  - city: São Raimundo Nonato
    region: PIAUÍ
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q10286369
    identifier_url: https://www.wikidata.org/wiki/Q10286369
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Arquivo Público do Estado de São Paulo

```yaml
- id: https://w3id.org/heritage/custodian/br/sp-arquivo-publico-sao-paulo
  name: Arquivo Público do Estado de São Paulo
  institution_type: ARCHIVE
  locations:
  - city: São Paulo
    region: SÃO PAULO
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q9630401
    identifier_url: https://www.wikidata.org/wiki/Q9630401
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Sistema Brasileiro de Museus (SBM)

```yaml
- id: https://w3id.org/heritage/custodian/br/sistema-brasileiro-de-museus-sbm
  name: Sistema Brasileiro de Museus (SBM)
  alternative_names:
  - SBM
  institution_type: OFFICIAL_INSTITUTION
  locations:
  - city: Brasília
    region: DISTRITO FEDERAL
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q61000205
    identifier_url: https://www.wikidata.org/wiki/Q61000205
  provenance:
    notes: 'Duplicate fixed 2025-11-11: Merged with original record, keeping enriched metadata with Wikidata identifier.'
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Museu Casa de Rui Barbosa (NEW)

```yaml
- id: https://w3id.org/heritage/custodian/br/rj-museu-casa-rui-barbosa
  name: Museu Casa de Rui Barbosa
  institution_type: MUSEUM
  description: Federal museum and cultural foundation in Rio de Janeiro dedicated
    to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman,
    jurist, and diplomat. The museum houses his personal library, archives, and
    collections in his former residence.
  locations:
  - country: BR
    region: RIO DE JANEIRO
    city: Rio de Janeiro
    latitude: -22.9519
    longitude: -43.1763
  digital_platforms:
  - platform_name: Casa de Rui Barbosa Official Website
    platform_type: DISCOVERY_PORTAL
    platform_url: http://www.casaderuibarbosa.gov.br
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q56693872
    identifier_url: https://www.wikidata.org/wiki/Q56693872
  - identifier_scheme: VIAF
    identifier_value: '149960006'
    identifier_url: https://viaf.org/viaf/149960006
  - identifier_scheme: LCNAF
    identifier_value: n80037078
    identifier_url: https://id.loc.gov/authorities/names/n80037078
  provenance:
    data_source: WIKIDATA_DISCOVERY
    data_tier: TIER_3_CROWD_SOURCED
    extraction_date: '2025-11-11T22:30:00+00:00'
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
      enrichment_notes: 'Batch 16: New institution discovered via Wikidata search'
```

---

## Conclusion

Batch 16 successfully achieved the **minimum 65% coverage goal** for Brazilian heritage institutions, reaching **67.5%** with 85 of 126 institutions now linked to Wikidata.

**Key Achievements**:

- ✅ 6 institutions enriched with high-quality Wikidata identifiers
- ✅ 1 new institution discovered and added (Museu Casa de Rui Barbosa)
- ✅ +4.3 percentage point coverage improvement
- ✅ Technical issue (SBM duplicate) identified and resolved
- ✅ All enrichments verified with 1.0 match confidence

**Decision Point**: With 67.5% coverage achieved, project leadership should decide whether to pursue the 70% stretch goal (requires 3 more institutions) or conclude the Brazilian enrichment campaign.

---

**Report Generated**: November 11, 2025  
**Report Author**: GLAM Data Extraction Project  
**Dataset Version**: globalglam-20251111-batch16-fixed.yaml
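
---

## Appendix: Coverage Arithmetic Sketch

The coverage percentages and goal thresholds quoted in this report can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the project's scripts: the function names `coverage_pct` and `needed_for_target` are hypothetical, and the calculation assumes the total of 126 institutions stays fixed while further identifiers are added.

```python
def coverage_pct(with_id: int, total: int) -> float:
    """Share of institutions that carry a Wikidata identifier, in percent."""
    return 100.0 * with_id / total


def needed_for_target(with_id: int, total: int, target_pct: int) -> int:
    """Additional enriched institutions needed for coverage >= target_pct.

    Assumption (hypothetical helper): the total does not change, i.e. no
    newly discovered institutions are added along the way.
    """
    # Integer ceiling of target_pct percent of total, avoiding float rounding.
    required = (target_pct * total + 99) // 100
    return max(0, required - with_id)


# Figures taken from the Statistics section of this report.
print(f"{coverage_pct(79, 125):.1f}% -> {coverage_pct(85, 126):.1f}%")  # 63.2% -> 67.5%
print(needed_for_target(85, 126, 65))   # 0 (minimum goal already met)
print(needed_for_target(85, 126, 70))   # 4 (89 of 126 is the first count at or above 70%)
print(f"{coverage_pct(88, 126):.1f}%")  # 69.8%
```

The distinction between a strict threshold and a rounded one matters here: 88 of 126 is 69.8%, which reaches 70% only after rounding to a whole percent, while 89 of 126 (70.6%) clears the threshold outright.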