glam/reports/brazil/batch16_report.md
2025-11-19 23:25:22 +01:00

Brazil Batch 16 Enrichment Report

Date: November 11, 2025
Campaign: Manual Wikidata search for Brazilian heritage institutions
Batch: 16 of ongoing Brazilian enrichment effort


Executive Summary

Batch 16 added Wikidata identifiers to 6 Brazilian heritage institutions (5 existing records and 1 newly discovered institution), raising coverage from 63.2% to 67.5% (minimum goal: 65%, stretch goal: 70%).

Key Achievement: Minimum 65% coverage goal ACHIEVED


Batch 16 Results

Institutions Enriched

| Institution | Type | Wikidata | Status |
| --- | --- | --- | --- |
| Museu Histórico de Alcântara | MUSEUM | Q61000855 | UPDATED |
| Departamento Estadual de Arquivo Público do Paraná | ARCHIVE | Q56693461 | UPDATED |
| Fundação Museu do Homem Americano | MUSEUM | Q10286369 | UPDATED |
| Arquivo Público do Estado de São Paulo | ARCHIVE | Q9630401 | UPDATED |
| Sistema Brasileiro de Museus (SBM) | OFFICIAL_INSTITUTION | Q61000205 | UPDATED* |
| Museu Casa de Rui Barbosa | MUSEUM | Q56693872 | NEW |

*Sistema Brasileiro de Museus required duplicate resolution (see Technical Notes)

Statistics

Before Batch 16 (November 11, 2025):

  • Total Brazilian institutions: 125
  • With Wikidata identifiers: 79 (63.2%)
  • Without Wikidata: 46 (36.8%)

After Batch 16 (November 11, 2025):

  • Total Brazilian institutions: 126 (corrected after duplicate fix)
  • With Wikidata identifiers: 85 (67.5%)
  • Without Wikidata: 41 (32.5%)

Progress:

  • +1 new institution discovered
  • +6 institutions enriched with Wikidata
  • +4.3 percentage points coverage improvement
  • 67.5% > 65% minimum goal ACHIEVED
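
The coverage figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch for checking the numbers, not part of the project's scripts; the function name `coverage_pct` is hypothetical.

```python
def coverage_pct(with_wikidata: int, total: int) -> float:
    """Return Wikidata coverage as a percentage, rounded to one decimal place."""
    return round(100 * with_wikidata / total, 1)


before = coverage_pct(79, 125)   # state before Batch 16
after = coverage_pct(85, 126)    # state after Batch 16 and the duplicate fix
gain = round(after - before, 1)  # improvement in percentage points

print(before, after, gain)  # prints: 63.2 67.5 4.3
```

Note that the denominator changes between the two measurements (125 to 126) because the newly discovered institution is counted in both the total and the enriched set.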

Coverage Progress

Overall Trajectory

| Batch | Brazilian Institutions | With Wikidata | Coverage |
| --- | --- | --- | --- |
| Pre-15 | 125 | 75 | 60.0% |
| After 15 | 125 | 79 | 63.2% |
| After 16 | 126 | 85 | 67.5% |

Cumulative Progress: +10 enriched institutions across Batches 15 and 16 (75 → 85)

Goal Status

  • Minimum Goal (65%): ACHIEVED at 67.5%
  • 🎯 Stretch Goal (70%): 3 more institutions (88/126) would reach 69.8%, which rounds to 70%; strictly reaching 70% requires 4 more (89/126 = 70.6%)
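
The number of institutions still needed for a coverage target can be derived rather than estimated. The sketch below is illustrative only (the function name `institutions_needed` is hypothetical) and uses integer arithmetic to avoid floating-point rounding at the threshold. It shows that 88/126 is about 69.8%, so a strict 70% threshold is first met at 89/126.

```python
def institutions_needed(with_wikidata: int, total: int, target_pct: int) -> int:
    """Smallest number of additional enriched institutions so that
    coverage is at least target_pct percent of total."""
    # Integer ceiling of (target_pct / 100) * total.
    required = (target_pct * total + 99) // 100
    return max(0, required - with_wikidata)


print(institutions_needed(85, 126, 65))  # prints: 0 (minimum goal already met)
print(institutions_needed(85, 126, 70))  # prints: 4 (89 of 126 required)
```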

Detailed Enrichment Notes

1. Museu Histórico de Alcântara (Q61000855)

  • Type: MUSEUM
  • Location: Alcântara, Maranhão
  • Enrichment: Added Wikidata Q61000855
  • Match Quality: High confidence (exact name match)

2. Departamento Estadual de Arquivo Público do Paraná (Q56693461)

  • Type: ARCHIVE
  • Location: Curitiba, Paraná
  • Enrichment: Added Wikidata Q56693461
  • Match Quality: High confidence (exact institutional match)

3. Fundação Museu do Homem Americano (Q10286369)

  • Type: MUSEUM
  • Location: São Raimundo Nonato, Piauí
  • Enrichment: Added Wikidata Q10286369
  • Match Quality: High confidence (official foundation name)
  • Note: Associated with Serra da Capivara National Park archaeological site

4. Arquivo Público do Estado de São Paulo (Q9630401)

  • Type: ARCHIVE
  • Location: São Paulo, SP
  • Enrichment: Added Wikidata Q9630401
  • Match Quality: High confidence (major state archive)

5. Sistema Brasileiro de Museus (SBM) (Q61000205)

  • Type: OFFICIAL_INSTITUTION
  • Location: Brasília, DF
  • Enrichment: Added Wikidata Q61000205
  • Match Quality: High confidence (national museum coordination system)
  • Special Case: Required duplicate resolution (see Technical Notes)

6. Museu Casa de Rui Barbosa (Q56693872) NEW

  • Type: MUSEUM
  • Location: Rio de Janeiro, RJ
  • Enrichment: Discovered during Batch 16 Wikidata search
  • Match Quality: High confidence (federal museum and cultural foundation)
  • Description: Dedicated to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist, and diplomat
  • Additional Identifiers:
    • VIAF: 149960006
    • LCNAF: n80037078
  • Website: http://www.casaderuibarbosa.gov.br

Technical Notes

Sistema Brasileiro de Museus Duplicate Resolution

Issue: During merge, Sistema Brasileiro de Museus appeared twice due to name format variation:

  1. Original record: "Sistema Brasileiro de Museus (SBM)"
  2. Batch16 record: "Sistema Brasileiro de Museus" (without abbreviation)

Root Cause: The merge script matches records by OLD_ID. Because the two records differed in name format, the match failed and the Batch 16 record was not recognized as a duplicate of the original.
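
To illustrate how an identifier-keyed merge can let such a duplicate through, here is a minimal sketch. It is not the code of scripts/merge_batch16.py. It assumes, purely for illustration, that record identifiers are derived from names, so a name variant produces a different key; the short ids below are hypothetical.

```python
def merge_by_id(main, batch):
    """Merge batch records into main, matching on the 'id' field only."""
    merged = {rec["id"]: dict(rec) for rec in main}
    for rec in batch:
        if rec["id"] in merged:
            merged[rec["id"]].update(rec)  # known id: enrich the existing record
        else:
            merged[rec["id"]] = dict(rec)  # unknown id: treated as a new record
    return list(merged.values())


# Hypothetical ids: the name variant yields a different key.
main = [{"id": "br/sistema-brasileiro-de-museus-sbm",
         "name": "Sistema Brasileiro de Museus (SBM)"}]
batch = [{"id": "br/sistema-brasileiro-de-museus",
          "name": "Sistema Brasileiro de Museus",
          "wikidata": "Q61000205"}]

result = merge_by_id(main, batch)
print(len(result))  # prints: 2 (the same institution now appears twice)
```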

Resolution:

  • Manual duplicate fix applied via scripts/fix_sbm_duplicate_stream.py
  • Kept enriched record (with Wikidata Q61000205)
  • Restored name format with "(SBM)" abbreviation for consistency
  • Added provenance note documenting the merge
  • Total Brazilian institutions adjusted: 127 → 126

Files:

  • Input: globalglam-20251111-batch16.yaml (13,389 institutions)
  • Output: globalglam-20251111-batch16-fixed.yaml (13,388 institutions)
  • Duplicate removed: 1 institution
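
A name-normalizing check would catch this class of duplicate before it reaches the merged file. The sketch below is not the contents of scripts/fix_sbm_duplicate_stream.py; the function names and the provenance note text are assumptions. It mirrors the resolution described above: keep the record that carries a Wikidata identifier, restore the name with the abbreviation, and record a provenance note.

```python
import re


def normalize_name(name: str) -> str:
    """Drop a trailing parenthetical abbreviation, collapse whitespace, ignore case."""
    without_abbrev = re.sub(r"\s*\([^)]*\)\s*$", "", name)
    return " ".join(without_abbrev.split()).casefold()


def has_wikidata(record: dict) -> bool:
    return any(i.get("identifier_scheme") == "Wikidata"
               for i in record.get("identifiers", []))


def dedupe(records):
    """Collapse records whose normalized names are equal."""
    kept = {}
    for rec in records:
        key = normalize_name(rec["name"])
        current = kept.get(key)
        if current is None:
            kept[key] = rec
            continue
        # Prefer the record that already has a Wikidata identifier.
        if has_wikidata(rec) and not has_wikidata(current):
            winner, loser = rec, current
        else:
            winner, loser = current, rec
        merged = dict(winner)
        # Keep the longer name form, i.e. the one with the abbreviation.
        merged["name"] = max(winner["name"], loser["name"], key=len)
        merged["provenance"] = {**winner.get("provenance", {}),
                                "notes": "Duplicate merged by normalized name."}
        kept[key] = merged
    return list(kept.values())


records = [
    {"name": "Sistema Brasileiro de Museus (SBM)", "identifiers": []},
    {"name": "Sistema Brasileiro de Museus",
     "identifiers": [{"identifier_scheme": "Wikidata",
                      "identifier_value": "Q61000205"}]},
]
fixed = dedupe(records)
print(len(fixed), fixed[0]["name"])  # prints: 1 Sistema Brasileiro de Museus (SBM)
```

Normalizing only a trailing parenthetical is deliberately conservative: it would not merge institutions whose names differ in any other way.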

Methodology

Search Strategy

Phase 1: Targeted Wikidata Search

  • Searched Wikidata using institutional names from Brazilian conversation extractions
  • Focused on Tier 4 (inferred) institutions lacking identifiers
  • Prioritized well-documented institutions with Portuguese Wikipedia articles
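
Although the Batch 16 search was manual, the same lookup can be scripted against the public Wikidata search API (`wbsearchentities`). The following is a sketch under assumptions: the helper names, the result limit, and the User-Agent string are hypothetical, and every hit would still need the manual verification described in Phase 2.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def build_search_url(name: str, language: str = "pt", limit: int = 5) -> str:
    """Build a wbsearchentities URL for an institution name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def search_wikidata(name: str, language: str = "pt", limit: int = 5):
    """Return (QID, label, description) tuples for candidate items.

    Performs a network request; results are candidates only, not matches.
    """
    request = urllib.request.Request(
        build_search_url(name, language, limit),
        headers={"User-Agent": "glam-enrichment-sketch/0.1"},  # hypothetical agent string
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
            for hit in payload.get("search", [])]
```

Calling `search_wikidata("Museu Casa de Rui Barbosa")` would return a short candidate list to review by hand.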

Phase 2: Manual Verification

  • Cross-referenced institutional descriptions, locations, and founding dates
  • Verified VIAF IDs and official websites where available
  • Ensured 100% match confidence before assigning identifiers

Phase 3: Serendipitous Discovery

  • Discovered Museu Casa de Rui Barbosa during search for related institutions
  • Added as new institution to dataset (high-quality record with multiple identifiers)

Match Quality Criteria

All Batch 16 enrichments meet strict quality standards:

  • Exact name matches or officially documented name variations
  • Geographic verification (city/state confirmed)
  • Institution type alignment (museum/archive/official institution)
  • Cross-referenced with VIAF, official websites, or Wikipedia
  • Match score: 1.0 (perfect match)
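
These criteria can be expressed as an explicit checklist so that each accepted match documents which checks it passed. The sketch below is hypothetical: the `Candidate` structure and `failed_criteria` function are not part of the project, and the comparison rules (case-insensitive name and city equality) are assumptions. A candidate would be accepted only when the returned list is empty.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A dataset record paired with one Wikidata item (hypothetical structure)."""
    record_name: str
    record_city: str
    record_type: str
    wikidata_names: list          # item label plus documented aliases
    wikidata_city: str
    wikidata_type: str
    cross_references: list = field(default_factory=list)  # e.g. ["VIAF", "website"]


def _norm(text: str) -> str:
    return " ".join(text.split()).casefold()


def failed_criteria(c: Candidate) -> list:
    """Return the names of the criteria the candidate does not meet."""
    failures = []
    if _norm(c.record_name) not in {_norm(n) for n in c.wikidata_names}:
        failures.append("name")
    if _norm(c.record_city) != _norm(c.wikidata_city):
        failures.append("location")
    if c.record_type != c.wikidata_type:
        failures.append("type")
    if not c.cross_references:
        failures.append("cross-reference")
    return failures


candidate = Candidate(
    record_name="Museu Casa de Rui Barbosa",
    record_city="Rio de Janeiro",
    record_type="MUSEUM",
    wikidata_names=["Museu Casa de Rui Barbosa"],
    wikidata_city="Rio de Janeiro",
    wikidata_type="MUSEUM",
    cross_references=["VIAF", "website"],
)
print(failed_criteria(candidate))  # prints: []
```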

Data Quality Improvements

Enrichment Type: WIKIDATA_IDENTIFIER

  • Method: MANUAL_SEARCH_BATCH16
  • Verification: All identifiers manually verified via Wikidata
  • Data Tier: TIER_3_CROWD_SOURCED (Wikidata-sourced)
  • Confidence: 1.0 (perfect matches only)

Provenance Tracking

All enriched records include:

  • enrichment_date: 2025-11-11T22:30:00+00:00
  • enrichment_type: WIKIDATA_IDENTIFIER
  • enrichment_method: MANUAL_SEARCH_BATCH16
  • match_score: 1.0
  • verified: true
  • enrichment_source: https://www.wikidata.org
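
A small helper can generate these fields consistently for every record in a batch. This is a sketch, not the project's code; the function names are hypothetical, and the placement of the entry under `provenance.enrichment_history` follows the appendix records.

```python
from datetime import datetime, timezone


def make_enrichment_entry(when: datetime, batch: int) -> dict:
    """Build one enrichment_history entry with the fields listed above."""
    return {
        "enrichment_date": when.isoformat(),
        "enrichment_type": "WIKIDATA_IDENTIFIER",
        "enrichment_method": f"MANUAL_SEARCH_BATCH{batch}",
        "match_score": 1.0,
        "verified": True,
        "enrichment_source": "https://www.wikidata.org",
    }


def add_enrichment(record: dict, entry: dict) -> None:
    """Append the entry to the record's provenance, creating keys as needed."""
    history = record.setdefault("provenance", {}).setdefault("enrichment_history", [])
    history.append(entry)


record = {"name": "Museu Histórico de Alcântara"}
entry = make_enrichment_entry(datetime(2025, 11, 11, 22, 30, tzinfo=timezone.utc), 16)
add_enrichment(record, entry)
print(entry["enrichment_date"])  # prints: 2025-11-11T22:30:00+00:00
```

Using a timezone-aware `datetime` is what produces the explicit `+00:00` offset seen in the records.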

Files Modified

Input Files

  • Main dataset: data/instances/all/globalglam-20251111.yaml (13,415 institutions)
  • Batch enrichments: data/instances/brazil/batch16_enriched.yaml (6 institutions)

Output Files

  • Merged dataset: data/instances/all/globalglam-20251111-batch16-fixed.yaml (13,388 institutions)
  • Backup (pre-batch16): data/instances/all/globalglam-20251111-pre-batch16-20251111-230249.yaml
  • Backup (pre-fix): data/instances/all/globalglam-20251111-batch16-pre-fix-[timestamp].yaml

Scripts Created

  • scripts/merge_batch16.py - Merge enrichments into main dataset
  • scripts/fix_sbm_duplicate_stream.py - Remove SBM duplicate

Next Steps

Option 1: Pursue 70% Stretch Goal

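To reach 70% coverage, 3 more institutions with Wikidata (88/126 = 69.8%) suffice only if the figure is rounded to the nearest whole percent; strictly reaching 70% requires 4 more (89/126 = 70.6%).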

Action Plan:

  1. Analyze remaining 41 institutions without Wikidata
  2. Prioritize Tier 4 institutions with detailed descriptions
  3. Search Wikidata and Portuguese Wikipedia
  4. Create Batch 17 if viable candidates found
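
Steps 1 and 2 of the action plan can be sketched as a filter and a sort. The code below is illustrative only: the tier label `TIER_4_INFERRED` is an assumed value (the report shows only `TIER_3_CROWD_SOURCED`), and description length is used as a rough, hypothetical proxy for "detailed descriptions".

```python
def has_wikidata(record: dict) -> bool:
    return any(i.get("identifier_scheme") == "Wikidata"
               for i in record.get("identifiers", []))


def batch17_candidates(records, tier="TIER_4_INFERRED"):
    """Records without Wikidata in the given tier, most detailed description first."""
    missing = [r for r in records if not has_wikidata(r)]
    in_tier = [r for r in missing
               if r.get("provenance", {}).get("data_tier") == tier]
    return sorted(in_tier, key=lambda r: len(r.get("description", "")), reverse=True)


# Hypothetical sample records for demonstration.
sample = [
    {"name": "A", "description": "short",
     "provenance": {"data_tier": "TIER_4_INFERRED"}},
    {"name": "B", "description": "a much longer description",
     "provenance": {"data_tier": "TIER_4_INFERRED"}},
    {"name": "C", "provenance": {"data_tier": "TIER_4_INFERRED"},
     "identifiers": [{"identifier_scheme": "Wikidata", "identifier_value": "Q1"}]},
]
print([r["name"] for r in batch17_candidates(sample)])  # prints: ['B', 'A']
```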

Option 2: Conclude Brazilian Enrichment Campaign

With 67.5% coverage achieved, the campaign could be concluded:

  • Exceeded minimum 65% goal by 2.5 percentage points
  • 85 of 126 institutions now have Wikidata linkage
  • Major institutions (museums, archives, official bodies) prioritized

Recommendation: Analyze the remaining candidates before deciding. If they yield only low-quality or ambiguous matches, conclude the campaign at 67.5%.


Appendix: Batch 16 Enriched Records

Museu Histórico de Alcântara

- id: https://w3id.org/heritage/custodian/br/ma-museu-historico-de-alcantara
  name: Museu Histórico de Alcântara
  institution_type: MUSEUM
  locations:
    - city: Alcântara
      region: MARANHÃO
      country: BR
  identifiers:
    - identifier_scheme: Wikidata
      identifier_value: Q61000855
      identifier_url: https://www.wikidata.org/wiki/Q61000855
  provenance:
    enrichment_history:
      - enrichment_date: '2025-11-11T22:30:00+00:00'
        enrichment_type: WIKIDATA_IDENTIFIER
        enrichment_method: MANUAL_SEARCH_BATCH16
        match_score: 1.0
        verified: true

Departamento Estadual de Arquivo Público do Paraná

- id: https://w3id.org/heritage/custodian/br/pr-arquivo-publico-parana
  name: Departamento Estadual de Arquivo Público do Paraná
  institution_type: ARCHIVE
  locations:
    - city: Curitiba
      region: PARANÁ
      country: BR
  identifiers:
    - identifier_scheme: Wikidata
      identifier_value: Q56693461
      identifier_url: https://www.wikidata.org/wiki/Q56693461
  provenance:
    enrichment_history:
      - enrichment_date: '2025-11-11T22:30:00+00:00'
        enrichment_type: WIKIDATA_IDENTIFIER
        enrichment_method: MANUAL_SEARCH_BATCH16
        match_score: 1.0
        verified: true

Fundação Museu do Homem Americano

- id: https://w3id.org/heritage/custodian/br/pi-fundacao-museu-homem-americano
  name: Fundação Museu do Homem Americano
  institution_type: MUSEUM
  locations:
    - city: São Raimundo Nonato
      region: PIAUÍ
      country: BR
  identifiers:
    - identifier_scheme: Wikidata
      identifier_value: Q10286369
      identifier_url: https://www.wikidata.org/wiki/Q10286369
  provenance:
    enrichment_history:
      - enrichment_date: '2025-11-11T22:30:00+00:00'
        enrichment_type: WIKIDATA_IDENTIFIER
        enrichment_method: MANUAL_SEARCH_BATCH16
        match_score: 1.0
        verified: true

Arquivo Público do Estado de São Paulo

- id: https://w3id.org/heritage/custodian/br/sp-arquivo-publico-sao-paulo
  name: Arquivo Público do Estado de São Paulo
  institution_type: ARCHIVE
  locations:
    - city: São Paulo
      region: SÃO PAULO
      country: BR
  identifiers:
    - identifier_scheme: Wikidata
      identifier_value: Q9630401
      identifier_url: https://www.wikidata.org/wiki/Q9630401
  provenance:
    enrichment_history:
      - enrichment_date: '2025-11-11T22:30:00+00:00'
        enrichment_type: WIKIDATA_IDENTIFIER
        enrichment_method: MANUAL_SEARCH_BATCH16
        match_score: 1.0
        verified: true

Sistema Brasileiro de Museus (SBM)

- id: https://w3id.org/heritage/custodian/br/sistema-brasileiro-de-museus-sbm
  name: Sistema Brasileiro de Museus (SBM)
  alternative_names:
    - SBM
  institution_type: OFFICIAL_INSTITUTION
  locations:
    - city: Brasília
      region: DISTRITO FEDERAL
      country: BR
  identifiers:
    - identifier_scheme: Wikidata
      identifier_value: Q61000205
      identifier_url: https://www.wikidata.org/wiki/Q61000205
  provenance:
    notes: 'Duplicate fixed 2025-11-11: Merged with original record, keeping enriched metadata with Wikidata identifier.'
    enrichment_history:
      - enrichment_date: '2025-11-11T22:30:00+00:00'
        enrichment_type: WIKIDATA_IDENTIFIER
        enrichment_method: MANUAL_SEARCH_BATCH16
        match_score: 1.0
        verified: true

Museu Casa de Rui Barbosa (NEW)

- id: https://w3id.org/heritage/custodian/br/rj-museu-casa-rui-barbosa
  name: Museu Casa de Rui Barbosa
  institution_type: MUSEUM
  description: Federal museum and cultural foundation in Rio de Janeiro dedicated
    to preserving the legacy of Rui Barbosa (1849-1923), Brazilian statesman, jurist,
    and diplomat. The museum houses his personal library, archives, and collections
    in his former residence.
  locations:
    - country: BR
      region: RIO DE JANEIRO
      city: Rio de Janeiro
      latitude: -22.9519
      longitude: -43.1763
  digital_platforms:
    - platform_name: Casa de Rui Barbosa Official Website
      platform_type: DISCOVERY_PORTAL
      platform_url: http://www.casaderuibarbosa.gov.br
  identifiers:
    - identifier_scheme: Wikidata
      identifier_value: Q56693872
      identifier_url: https://www.wikidata.org/wiki/Q56693872
    - identifier_scheme: VIAF
      identifier_value: '149960006'
      identifier_url: https://viaf.org/viaf/149960006
    - identifier_scheme: LCNAF
      identifier_value: n80037078
      identifier_url: https://id.loc.gov/authorities/names/n80037078
  provenance:
    data_source: WIKIDATA_DISCOVERY
    data_tier: TIER_3_CROWD_SOURCED
    extraction_date: '2025-11-11T22:30:00+00:00'
    enrichment_history:
      - enrichment_date: '2025-11-11T22:30:00+00:00'
        enrichment_type: WIKIDATA_IDENTIFIER
        enrichment_method: MANUAL_SEARCH_BATCH16
        match_score: 1.0
        verified: true
        enrichment_notes: 'Batch 16: New institution discovered via Wikidata search'
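
Records like those above lend themselves to a simple consistency check: each Wikidata identifier should be a well-formed Q-ID, and its URL should be derived from the same value. The sketch below operates on already-parsed records (plain dictionaries) and is not part of the project's tooling; the function name is hypothetical.

```python
import re

QID = re.compile(r"^Q[1-9][0-9]*$")
WIKIDATA_PREFIX = "https://www.wikidata.org/wiki/"


def wikidata_problems(record: dict) -> list:
    """Return human-readable problems found in a record's Wikidata identifiers."""
    problems = []
    for ident in record.get("identifiers", []):
        if ident.get("identifier_scheme") != "Wikidata":
            continue
        value = ident.get("identifier_value", "")
        if not QID.match(value):
            problems.append(f"malformed Q-ID: {value!r}")
        if ident.get("identifier_url") != WIKIDATA_PREFIX + value:
            problems.append(f"URL does not match value {value!r}")
    return problems


record = {
    "name": "Museu Casa de Rui Barbosa",
    "identifiers": [
        {"identifier_scheme": "Wikidata",
         "identifier_value": "Q56693872",
         "identifier_url": "https://www.wikidata.org/wiki/Q56693872"},
        {"identifier_scheme": "VIAF",
         "identifier_value": "149960006",
         "identifier_url": "https://viaf.org/viaf/149960006"},
    ],
}
print(wikidata_problems(record))  # prints: []
```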

Conclusion

Batch 16 successfully achieved the minimum 65% coverage goal for Brazilian heritage institutions, reaching 67.5% with 85 of 126 institutions now linked to Wikidata.

Key Achievements:

  • 6 institutions enriched with high-quality Wikidata identifiers
  • 1 new institution discovered and added (Museu Casa de Rui Barbosa)
  • +4.3 percentage point coverage improvement
  • Technical issue (SBM duplicate) identified and resolved
  • All enrichments verified with 1.0 match confidence

Decision Point: With 67.5% coverage achieved, project leadership should decide whether to pursue the 70% stretch goal (3 more institutions would reach 69.8%; 4 more are needed to strictly reach 70%) or conclude the Brazilian enrichment campaign.


Report Generated: November 11, 2025
Report Author: GLAM Data Extraction Project
Dataset Version: globalglam-20251111-batch16-fixed.yaml