# Brazil Batch 16 Enrichment Report

**Date**: November 11, 2025

**Campaign**: Manual Wikidata search for Brazilian heritage institutions

**Batch**: 16 of the ongoing Brazilian enrichment effort

---

## Executive Summary

Batch 16 linked 6 Brazilian heritage institutions to Wikidata identifiers (5 existing records updated, 1 new record added), raising coverage from **63.2%** to **67.5%** (minimum goal: 65%, stretch goal: 70%).

**Key Achievement**: ✅ **Minimum 65% coverage goal ACHIEVED**

---

## Batch 16 Results

### Institutions Enriched

| Institution | Type | Wikidata | Status |
|-------------|------|----------|--------|
| **Museu Histórico de Alcântara** | MUSEUM | Q61000855 | UPDATED |
| **Departamento Estadual de Arquivo Público do Paraná** | ARCHIVE | Q56693461 | UPDATED |
| **Fundação Museu do Homem Americano** | MUSEUM | Q10286369 | UPDATED |
| **Arquivo Público do Estado de São Paulo** | ARCHIVE | Q9630401 | UPDATED |
| **Sistema Brasileiro de Museus (SBM)** | OFFICIAL_INSTITUTION | Q61000205 | UPDATED\* |
| **Museu Casa de Rui Barbosa** | MUSEUM | Q56693872 | NEW |

UPDATED: Wikidata identifier added to an existing record. NEW: institution added to the dataset in this batch.

\* Sistema Brasileiro de Museus required duplicate resolution (see Technical Notes).

### Statistics

**Before Batch 16** (November 11, 2025):

- Total Brazilian institutions: **125**
- With Wikidata identifiers: **79** (63.2%)
- Without Wikidata: **46** (36.8%)

**After Batch 16** (November 11, 2025):

- Total Brazilian institutions: **126** (after the SBM duplicate fix)
- With Wikidata identifiers: **85** (67.5%)
- Without Wikidata: **41** (32.5%)

**Progress**:

- ✅ +1 new institution discovered and added (Museu Casa de Rui Barbosa)
- ✅ +6 institutions with Wikidata (5 existing records updated, 1 new record)
- ✅ +4.3 percentage points coverage improvement
- ✅ 67.5% > 65% minimum goal **ACHIEVED**

---

## Coverage Progress

### Overall Trajectory

| Batch | Brazilian Institutions | With Wikidata | Coverage |
|-------|------------------------|---------------|----------|
| Pre-15 | 125 | 75 | 60.0% |
| After 15 | 125 | 79 | 63.2% |
| After 16 | 126 | 85 | **67.5%** |

**Cumulative Progress**: +10 institutions with Wikidata since the start of Batch 15 (75 → 85)

### Goal Status

- ✅ **Minimum Goal (65%)**: **ACHIEVED** at 67.5%
- 🎯 **Stretch Goal (70%)**: Need **4 more institutions** (89/126 = 70.6%, assuming the total stays at 126; 88/126 is only 69.8%)

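The coverage arithmetic above can be reproduced with two small helpers. This is a sketch with hypothetical function names; it computes the stretch-goal gap with integer ceiling division so that a count landing just under the threshold (such as 88/126 = 69.8%) is not mistaken for reaching 70%.

```python
# Hypothetical helpers for checking the coverage figures in this report.


def coverage_pct(with_wikidata: int, total: int) -> float:
    """Share of institutions that carry a Wikidata identifier, in percent."""
    return 100.0 * with_wikidata / total


def institutions_needed(with_wikidata: int, total: int, target_percent: int) -> int:
    """Additional Wikidata links needed to reach `target_percent`,
    assuming the institution total stays fixed.

    Uses integer ceiling division, ceil(target_percent * total / 100),
    to avoid floating-point rounding at the threshold.
    """
    required = -(-target_percent * total // 100)
    return max(0, required - with_wikidata)


print(f"{coverage_pct(79, 125):.1f}%")   # → 63.2%
print(f"{coverage_pct(85, 126):.1f}%")   # → 67.5%
print(institutions_needed(85, 126, 70))  # → 4
```

If new institutions are added to the Brazilian set, `total` changes and the gap should be recomputed rather than carried over.
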
---

## Detailed Enrichment Notes

### 1. Museu Histórico de Alcântara (Q61000855)

- **Type**: MUSEUM
- **Location**: Alcântara, Maranhão
- **Enrichment**: Added Wikidata Q61000855
- **Match Quality**: High confidence (exact name match)

### 2. Departamento Estadual de Arquivo Público do Paraná (Q56693461)

- **Type**: ARCHIVE
- **Location**: Curitiba, Paraná
- **Enrichment**: Added Wikidata Q56693461
- **Match Quality**: High confidence (exact institutional match)

### 3. Fundação Museu do Homem Americano (Q10286369)

- **Type**: MUSEUM
- **Location**: São Raimundo Nonato, Piauí
- **Enrichment**: Added Wikidata Q10286369
- **Match Quality**: High confidence (official foundation name)
- **Note**: Associated with the Serra da Capivara National Park archaeological site

### 4. Arquivo Público do Estado de São Paulo (Q9630401)

- **Type**: ARCHIVE
- **Location**: São Paulo, SP
- **Enrichment**: Added Wikidata Q9630401
- **Match Quality**: High confidence (major state archive)

### 5. Sistema Brasileiro de Museus (SBM) (Q61000205)

- **Type**: OFFICIAL_INSTITUTION
- **Location**: Brasília, DF
- **Enrichment**: Added Wikidata Q61000205
- **Match Quality**: High confidence (national museum coordination system)
- **Special Case**: Required duplicate resolution (see Technical Notes)

### 6. Museu Casa de Rui Barbosa (Q56693872) ⭐ NEW

- **Type**: MUSEUM
- **Location**: Rio de Janeiro, RJ
- **Enrichment**: Added as a new record after being discovered during the Batch 16 Wikidata search
- **Match Quality**: High confidence (federal museum and cultural foundation)
- **Description**: Dedicated to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist, and diplomat
- **Additional Identifiers**:
  - VIAF: 149960006
  - LCNAF: n80037078
- **Website**: http://www.casaderuibarbosa.gov.br

---

## Technical Notes

### Sistema Brasileiro de Museus Duplicate Resolution

**Issue**: During the merge, Sistema Brasileiro de Museus appeared twice because of a name-format variation:

1. Original record: "Sistema Brasileiro de Museus (SBM)"
2. Batch 16 record: "Sistema Brasileiro de Museus" (without the abbreviation)

**Root Cause**: The merge script matches records on OLD_ID, and the name variation meant the two records were not recognized as the same institution.

**Resolution**:

- Manual duplicate fix applied via `scripts/fix_sbm_duplicate_stream.py`
- Kept the enriched record (with Wikidata Q61000205)
- Restored the "(SBM)" abbreviation in the name for consistency
- Added a provenance note documenting the merge
- Brazilian institution count adjusted: 127 → 126

**Files**:

- Input: `globalglam-20251111-batch16.yaml` (13,389 institutions)
- Output: `globalglam-20251111-batch16-fixed.yaml` (13,388 institutions)
- Duplicate removed: 1 institution

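A name-based check run after the merge could catch this kind of duplicate. The sketch below is a hypothetical illustration, not the logic of `scripts/fix_sbm_duplicate_stream.py`: it assumes two records are duplicates when their names match after stripping a trailing parenthetical abbreviation and normalizing case and whitespace, keeps the record that has a Wikidata identifier, and keeps the longest name variant (so "(SBM)" survives). Field names follow the appendix records.

```python
import re
from collections import defaultdict

# Matches a trailing parenthetical such as " (SBM)".
_TRAILING_ABBREV = re.compile(r"\s*\([^)]*\)\s*$")


def normalize_name(name: str) -> str:
    """Hypothetical normalization: drop a trailing '(ABBR)', collapse whitespace, casefold."""
    stripped = _TRAILING_ABBREV.sub("", name)
    return " ".join(stripped.split()).casefold()


def has_wikidata(record: dict) -> bool:
    return any(i.get("identifier_scheme") == "Wikidata" for i in record.get("identifiers", []))


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one record per normalized name, preferring one with a Wikidata ID.
    The kept record takes the longest name variant (e.g. the one with '(SBM)')."""
    groups: dict[str, list[dict]] = defaultdict(list)
    order: list[str] = []
    for rec in records:
        key = normalize_name(rec["name"])
        if key not in groups:
            order.append(key)  # preserve first-seen order
        groups[key].append(rec)

    result = []
    for key in order:
        group = groups[key]
        keep = dict(next((r for r in group if has_wikidata(r)), group[0]))
        keep["name"] = max((r["name"] for r in group), key=len)
        result.append(keep)
    return result


records = [
    {"name": "Sistema Brasileiro de Museus (SBM)", "identifiers": []},
    {"name": "Sistema Brasileiro de Museus",
     "identifiers": [{"identifier_scheme": "Wikidata", "identifier_value": "Q61000205"}]},
]
merged = deduplicate(records)
print(len(merged), merged[0]["name"])  # → 1 Sistema Brasileiro de Museus (SBM)
```

Stripping every trailing parenthetical is deliberately aggressive; flagged pairs are best reviewed manually rather than merged automatically, since distinct institutions can share a base name.
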
---

## Methodology

### Search Strategy

**Phase 1: Targeted Wikidata Search**

- Searched Wikidata using institutional names from Brazilian conversation extractions
- Focused on Tier 4 (inferred) institutions lacking identifiers
- Prioritized well-documented institutions with Portuguese Wikipedia articles

**Phase 2: Manual Verification**

- Cross-referenced institutional descriptions, locations, and founding dates
- Verified VIAF IDs and official websites where available
- Ensured 100% match confidence before assigning identifiers

**Phase 3: Serendipitous Discovery**

- Discovered Museu Casa de Rui Barbosa during a search for related institutions
- Added as a new institution to the dataset (high-quality record with multiple identifiers)

### Match Quality Criteria

All Batch 16 enrichments meet strict quality standards:

- ✅ **Exact name matches** or officially documented name variations
- ✅ **Geographic verification** (city/state confirmed)
- ✅ **Institution type alignment** (museum/archive/official institution)
- ✅ **Cross-referenced** with VIAF, official websites, or Wikipedia
- ✅ **Match score**: 1.0 (perfect match)

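The name, location, and type criteria above can be expressed as a single predicate. This is a minimal sketch, assuming a Wikidata candidate has already been reduced to a dict with hypothetical keys `label`, `city`, and `institution_type`; accent-insensitive comparison via Unicode decomposition is an illustrative choice the report does not specify, and the manual cross-referencing step (VIAF, websites, Wikipedia) still applies.

```python
import unicodedata


def fold(text: str) -> str:
    """Casefold, collapse whitespace, and strip diacritics, e.g. 'Alcântara' -> 'alcantara'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split()).casefold()


def is_strict_match(record: dict, candidate: dict, name_variants: tuple[str, ...] = ()) -> bool:
    """Accept a candidate only if name, city, and institution type all agree.

    `candidate` is a hypothetical pre-extracted summary of a Wikidata item.
    `name_variants` holds officially documented alternative names for the record.
    """
    accepted_names = {fold(record["name"])} | {fold(v) for v in name_variants}
    record_city = record["locations"][0]["city"]
    return (
        fold(candidate["label"]) in accepted_names
        and fold(candidate["city"]) == fold(record_city)
        and candidate["institution_type"] == record["institution_type"]
    )
```

Only candidates passing all three checks would go forward to manual verification and, if confirmed, receive a `match_score` of 1.0.
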
---

## Data Quality Improvements

### Enrichment Type: WIKIDATA_IDENTIFIER

- **Method**: MANUAL_SEARCH_BATCH16
- **Verification**: All identifiers manually verified via Wikidata
- **Data Tier**: TIER_3_CROWD_SOURCED (Wikidata-sourced)
- **Confidence**: 1.0 (perfect matches only)

### Provenance Tracking

All enriched records include:

- `enrichment_date`: 2025-11-11T22:30:00+00:00
- `enrichment_type`: WIKIDATA_IDENTIFIER
- `enrichment_method`: MANUAL_SEARCH_BATCH16
- `match_score`: 1.0
- `verified`: true
- `enrichment_source`: https://www.wikidata.org

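The identifier and provenance fields listed above can be attached with plain dictionaries that mirror the appendix records. This is a sketch with hypothetical helper names; the QID check assumes Wikidata item IDs have the form `Q` followed by digits, and the record is copied rather than modified in place.

```python
import re
from datetime import datetime, timezone

QID_PATTERN = re.compile(r"Q[1-9]\d*")


def wikidata_identifier(qid: str) -> dict:
    """Build an identifier entry; reject strings that are not Wikidata item IDs."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return {
        "identifier_scheme": "Wikidata",
        "identifier_value": qid,
        "identifier_url": f"https://www.wikidata.org/wiki/{qid}",
    }


def enrichment_entry(when: datetime, method: str = "MANUAL_SEARCH_BATCH16") -> dict:
    """Provenance entry with the fields listed under Provenance Tracking."""
    return {
        "enrichment_date": when.isoformat(),
        "enrichment_type": "WIKIDATA_IDENTIFIER",
        "enrichment_method": method,
        "match_score": 1.0,
        "verified": True,
        "enrichment_source": "https://www.wikidata.org",
    }


def enrich(record: dict, qid: str, when: datetime) -> dict:
    """Return a copy of `record` with the Wikidata identifier and provenance appended."""
    out = dict(record)
    out["identifiers"] = list(record.get("identifiers", [])) + [wikidata_identifier(qid)]
    provenance = dict(record.get("provenance", {}))
    provenance["enrichment_history"] = (
        list(provenance.get("enrichment_history", [])) + [enrichment_entry(when)]
    )
    out["provenance"] = provenance
    return out


stamp = datetime(2025, 11, 11, 22, 30, tzinfo=timezone.utc)
enriched = enrich({"name": "Museu Histórico de Alcântara"}, "Q61000855", stamp)
print(enriched["provenance"]["enrichment_history"][0]["enrichment_date"])
# → 2025-11-11T22:30:00+00:00
```
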
---

## Files Modified

### Input Files

- **Main dataset**: `data/instances/all/globalglam-20251111.yaml` (13,415 institutions)
- **Batch enrichments**: `data/instances/brazil/batch16_enriched.yaml` (6 institutions)

### Output Files

- **Merged dataset**: `data/instances/all/globalglam-20251111-batch16-fixed.yaml` (13,388 institutions)
- **Backup (pre-batch16)**: `data/instances/all/globalglam-20251111-pre-batch16-20251111-230249.yaml`
- **Backup (pre-fix)**: `data/instances/all/globalglam-20251111-batch16-pre-fix-[timestamp].yaml`

### Scripts Created

- `scripts/merge_batch16.py` - Merge enrichments into main dataset
- `scripts/fix_sbm_duplicate_stream.py` - Remove SBM duplicate

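A merge of batch enrichments into the main dataset can be sketched as update-or-append keyed on a record identifier. This is a hypothetical illustration using the `id` field from the appendix records, not the contents of `scripts/merge_batch16.py`; as the SBM case shows, matching on an exact key alone does not catch the same institution recorded under different keys, so a name-based check is still useful afterwards.

```python
def merge_enrichments(main: list[dict], batch: list[dict]) -> tuple[list[dict], int, int]:
    """Update records whose `id` appears in both lists; append the rest.

    Returns (merged records, number updated, number added).
    Input lists are not modified.
    """
    merged = [dict(r) for r in main]
    index = {r["id"]: pos for pos, r in enumerate(merged)}
    updated = added = 0
    for rec in batch:
        pos = index.get(rec["id"])
        if pos is None:
            index[rec["id"]] = len(merged)
            merged.append(dict(rec))
            added += 1
        else:
            # Batch fields overwrite main-dataset fields of the same name.
            merged[pos] = {**merged[pos], **rec}
            updated += 1
    return merged, updated, added
```

In Batch 16 terms, five batch records would take the update path and Museu Casa de Rui Barbosa the append path.
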
---

## Next Steps

### Option 1: Pursue 70% Stretch Goal

Reaching 70% coverage requires **4 more institutions** with Wikidata (89/126 = 70.6%, assuming the total stays at 126).

**Action Plan**:

1. Analyze the remaining 41 institutions without Wikidata
2. Prioritize Tier 4 institutions with detailed descriptions
3. Search Wikidata and Portuguese Wikipedia
4. Create Batch 17 if viable candidates are found

### Option 2: Conclude Brazilian Enrichment Campaign

With 67.5% coverage achieved, the campaign could be concluded:

- ✅ Exceeded the minimum 65% goal by 2.5 percentage points
- ✅ 85 of 126 institutions now have Wikidata linkage
- ✅ Major institutions (museums, archives, official bodies) prioritized

**Recommendation**: Analyze the remaining candidates before deciding. If they yield only low-quality or ambiguous matches, conclude the campaign at 67.5%.

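Steps 1 and 2 of the action plan amount to a filter and a sort over the dataset. A minimal sketch, with loudly stated assumptions: records carry `provenance.data_tier`, the Tier 4 value is spelled `TIER_4_INFERRED` (only `TIER_3_CROWD_SOURCED` appears in this report, so that label is a guess), and description length stands in for "detailed descriptions".

```python
def has_wikidata(record: dict) -> bool:
    return any(i.get("identifier_scheme") == "Wikidata" for i in record.get("identifiers", []))


def is_brazilian(record: dict) -> bool:
    return any(loc.get("country") == "BR" for loc in record.get("locations", []))


def stretch_goal_candidates(records: list[dict], tier: str = "TIER_4_INFERRED") -> list[dict]:
    """Brazilian records without Wikidata: records in `tier` first (hypothetical
    tier label), then longest description first as a rough proxy for detail."""
    pool = [r for r in records if is_brazilian(r) and not has_wikidata(r)]
    return sorted(
        pool,
        key=lambda r: (
            r.get("provenance", {}).get("data_tier") != tier,  # False sorts first
            -len(r.get("description", "")),
        ),
    )
```

The ranked list is only a search queue; each candidate would still go through the manual verification described under Methodology.
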
---

## Appendix: Batch 16 Enriched Records

### Museu Histórico de Alcântara

```yaml
- id: https://w3id.org/heritage/custodian/br/ma-museu-historico-de-alcantara
  name: Museu Histórico de Alcântara
  institution_type: MUSEUM
  locations:
  - city: Alcântara
    region: MARANHÃO
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q61000855
    identifier_url: https://www.wikidata.org/wiki/Q61000855
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Departamento Estadual de Arquivo Público do Paraná

```yaml
- id: https://w3id.org/heritage/custodian/br/pr-arquivo-publico-parana
  name: Departamento Estadual de Arquivo Público do Paraná
  institution_type: ARCHIVE
  locations:
  - city: Curitiba
    region: PARANÁ
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q56693461
    identifier_url: https://www.wikidata.org/wiki/Q56693461
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Fundação Museu do Homem Americano

```yaml
- id: https://w3id.org/heritage/custodian/br/pi-fundacao-museu-homem-americano
  name: Fundação Museu do Homem Americano
  institution_type: MUSEUM
  locations:
  - city: São Raimundo Nonato
    region: PIAUÍ
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q10286369
    identifier_url: https://www.wikidata.org/wiki/Q10286369
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Arquivo Público do Estado de São Paulo

```yaml
- id: https://w3id.org/heritage/custodian/br/sp-arquivo-publico-sao-paulo
  name: Arquivo Público do Estado de São Paulo
  institution_type: ARCHIVE
  locations:
  - city: São Paulo
    region: SÃO PAULO
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q9630401
    identifier_url: https://www.wikidata.org/wiki/Q9630401
  provenance:
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Sistema Brasileiro de Museus (SBM)

```yaml
- id: https://w3id.org/heritage/custodian/br/sistema-brasileiro-de-museus-sbm
  name: Sistema Brasileiro de Museus (SBM)
  alternative_names:
  - SBM
  institution_type: OFFICIAL_INSTITUTION
  locations:
  - city: Brasília
    region: DISTRITO FEDERAL
    country: BR
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q61000205
    identifier_url: https://www.wikidata.org/wiki/Q61000205
  provenance:
    notes: 'Duplicate fixed 2025-11-11: Merged with original record, keeping enriched metadata with Wikidata identifier.'
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
```

### Museu Casa de Rui Barbosa (NEW)

```yaml
- id: https://w3id.org/heritage/custodian/br/rj-museu-casa-rui-barbosa
  name: Museu Casa de Rui Barbosa
  institution_type: MUSEUM
  description: Federal museum and cultural foundation in Rio de Janeiro dedicated
    to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist,
    and diplomat. The museum houses his personal library, archives, and collections
    in his former residence.
  locations:
  - country: BR
    region: RIO DE JANEIRO
    city: Rio de Janeiro
    latitude: -22.9519
    longitude: -43.1763
  digital_platforms:
  - platform_name: Casa de Rui Barbosa Official Website
    platform_type: DISCOVERY_PORTAL
    platform_url: http://www.casaderuibarbosa.gov.br
  identifiers:
  - identifier_scheme: Wikidata
    identifier_value: Q56693872
    identifier_url: https://www.wikidata.org/wiki/Q56693872
  - identifier_scheme: VIAF
    identifier_value: '149960006'
    identifier_url: https://viaf.org/viaf/149960006
  - identifier_scheme: LCNAF
    identifier_value: n80037078
    identifier_url: https://id.loc.gov/authorities/names/n80037078
  provenance:
    data_source: WIKIDATA_DISCOVERY
    data_tier: TIER_3_CROWD_SOURCED
    extraction_date: '2025-11-11T22:30:00+00:00'
    enrichment_history:
    - enrichment_date: '2025-11-11T22:30:00+00:00'
      enrichment_type: WIKIDATA_IDENTIFIER
      enrichment_method: MANUAL_SEARCH_BATCH16
      match_score: 1.0
      verified: true
      enrichment_notes: 'Batch 16: New institution discovered via Wikidata search'
```

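Records like these can be checked for internal consistency before publication. A minimal sketch, assuming the records are loaded as dictionaries (for example with a YAML parser); it checks that each Wikidata, VIAF, and LCNAF `identifier_url` equals the URL pattern used in the records above followed by the `identifier_value`. The value is passed through `str()` in case an unquoted numeric ID such as a VIAF number is parsed as an integer.

```python
# URL patterns as they appear in the appendix records.
URL_PREFIXES = {
    "Wikidata": "https://www.wikidata.org/wiki/",
    "VIAF": "https://viaf.org/viaf/",
    "LCNAF": "https://id.loc.gov/authorities/names/",
}


def identifier_problems(record: dict) -> list[str]:
    """Return a message for each identifier whose URL does not match its value."""
    problems = []
    for ident in record.get("identifiers", []):
        scheme = ident.get("identifier_scheme")
        prefix = URL_PREFIXES.get(scheme)
        if prefix is None:
            continue  # scheme not covered by this check
        expected = prefix + str(ident.get("identifier_value", ""))
        actual = ident.get("identifier_url")
        if actual != expected:
            problems.append(f"{record.get('id')}: {scheme} URL {actual!r} != {expected!r}")
    return problems
```

An empty list means every covered identifier is self-consistent; it does not confirm that the identifier points at the right institution, which remains a manual check.
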
---

## Conclusion

Batch 16 achieved the **minimum 65% coverage goal** for Brazilian heritage institutions, reaching **67.5%** with 85 of 126 institutions now linked to Wikidata.

**Key Achievements**:

- ✅ 6 institutions enriched with high-quality Wikidata identifiers
- ✅ 1 new institution discovered and added (Museu Casa de Rui Barbosa)
- ✅ +4.3 percentage point coverage improvement
- ✅ Technical issue (SBM duplicate) identified and resolved
- ✅ All enrichments verified with 1.0 match confidence

**Decision Point**: With 67.5% coverage achieved, project leadership should decide whether to pursue the 70% stretch goal (requires 4 more institutions, assuming the total stays at 126) or conclude the Brazilian enrichment campaign.

---

**Report Generated**: November 11, 2025

**Report Author**: GLAM Data Extraction Project

**Dataset Version**: globalglam-20251111-batch16-fixed.yaml