Brazil Batch 14 Wikidata Enrichment - Final Report
Date: 2025-11-11
Batch Number: 14
Status: ✅ COMPLETE
Summary
Enriched 3 Brazilian heritage institutions with Wikidata Q-numbers, raising coverage from 59.5% to 62.0% and reaching the 60-65% coverage target.
Results
Coverage Improvement
- Previous: 72/121 institutions (59.5%)
- Current: 75/121 institutions (62.0%)
- Gain: +3 institutions (+2.5 percentage points)
- 🎯 TARGET ACHIEVED: Reached 60-65% coverage goal!
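Coverage here is the share of the 121 institutions that carry a Wikidata Q-number, and the gain is a difference of two percentages, i.e. percentage points. A minimal Python sketch of that arithmetic (the function names are illustrative, not part of the project code):

```python
def coverage_pct(with_qid: int, total: int) -> float:
    """Share of institutions with a Wikidata Q-number, in percent (1 decimal)."""
    return round(100 * with_qid / total, 1)


def gain_pp(before: int, after: int, total: int) -> float:
    """Coverage change in percentage points, computed from the raw counts."""
    return round(100 * (after - before) / total, 1)


# Batch 13 -> Batch 14 figures from this report
print(coverage_pct(72, 121), coverage_pct(75, 121), gain_pp(72, 75, 121))
```

Computing the gain from raw counts rather than subtracting two already-rounded percentages avoids compounding rounding error.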
Enrichment Success Rate
- Searches performed: 7
- Successful matches: 3 (42.9%)
- Merged into dataset: 3 (100% of matches)
- Failed searches: 2 (28.6%)
- Bonus institutions found: 4 (57.1%)
Successfully Enriched Institutions
1. UFMG Tainacan Lab
- Institution ID: https://w3id.org/heritage/custodian/br/mg-ufmg-tainacan-lab
- Wikidata Q-number: Q132140
- Label: Federal University of Minas Gerais
- Description: public, federal university in Belo Horizonte, state of Minas Gerais, Brazil
- Location: Minas Gerais, Brazil
- Type: EDUCATION_PROVIDER
- Confidence: 0.90
- Match Notes: UFMG Tainacan Lab is part of the Federal University of Minas Gerais. The Wikidata entry is for the parent university. Tainacan is a digital platform developed by UFMG for heritage collection management.
2. MM Gerdau
- Institution ID: https://w3id.org/heritage/custodian/br/mg-mm-gerdau
- Wikidata Q-number: Q10333730
- Label: MM Gerdau - Mines and Metal Museum
- Description: museum in Belo Horizonte, Brazil
- Location: Minas Gerais, Brazil
- Type: MIXED
- Confidence: 0.95
- Match Notes: Perfect match - MM Gerdau is the abbreviated name for Museu das Minas e do Metal, a major museum in Belo Horizonte dedicated to mining and metallurgy heritage.
3. Pedra do Ingá
- Institution ID: https://w3id.org/heritage/custodian/br/pb-pedra-do-ing
- Wikidata Q-number: Q3076249
- Label: Ingá Stone
- Description: archaeological site in Ingá, Brazil
- Location: Ingá, Paraíba, Brazil
- Type: MIXED
- Confidence: 0.95
- Match Notes: Perfect match - Pedra do Ingá (Ingá Stone) is a major archaeological site in Paraíba state featuring ancient rock carvings of uncertain origin. Listed as heritage custodian due to its cultural significance.
Additional Verified Matches (Not in Main Dataset)
These 4 institutions were found during Wikidata searches but are not present in the main GlobalGLAM dataset. They are candidates for addition in future batches, in the priority order below:
1. Museu Histórico Nacional (PRIORITY: HIGH)
- Wikidata Q-number: Q510993
- Label: National Historical Museum
- Description: history museum in Rio de Janeiro, Brazil
- Location: Rio de Janeiro, RJ
- Status: Not in main dataset
- Recommendation: MAJOR national museum - should be added to dataset immediately
2. Museu Imperial (PRIORITY: HIGH)
- Wikidata Q-number: Q1887049
- Label: Imperial Museum of Brazil
- Description: building in Petrópolis, Brazil
- Location: Petrópolis, RJ
- Status: Not in main dataset
- Recommendation: Important imperial heritage museum - should be added to dataset
3. Fundação Cultural Palmares (PRIORITY: MEDIUM)
- Wikidata Q-number: Q10286282
- Label: Fundação Cultural Palmares
- Description: Brazil (minimal description)
- Location: Brasília, DF
- Status: Not in main dataset
- Recommendation: Federal cultural foundation focusing on Afro-Brazilian heritage - should be added
4. Museu do Estado de Pernambuco (PRIORITY: MEDIUM)
- Wikidata Q-number: Q6940628
- Label: Museu do Estado de Pernambuco
- Description: museum in Recife, Brazil
- Location: Recife, PE
- Status: Not in main dataset
- Recommendation: State museum - should be added to dataset
Failed Searches (No Wikidata Entries)
These institutions were searched but no Wikidata entries were found:
1. Natural History Museum (Campina Grande)
- Institution ID: https://w3id.org/heritage/custodian/br/pb-natural-history-museum
- Reason: Regional museum likely not in Wikidata
- Recommendation: Try searching with Portuguese name "Museu de História Natural" or consider creating Wikidata item
2. DEAP Archives (Paraná)
- Institution ID: https://w3id.org/heritage/custodian/br/pr-deap-archives
- Reason: State archive may not have a Wikidata entry
- Recommendation: Try full name "Departamento Estadual de Arquivo Público do Paraná"
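Both recommendations amount to re-running the search under alternative names. The sketch below assumes the public Wikidata `wbsearchentities` API action (the report does not say which endpoint the batch used) and only builds request URLs for each name variant and language; `search_urls` is a hypothetical helper, and no network request is made:

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_urls(name_variants, languages=("pt", "en"), limit=5):
    """Build one wbsearchentities URL per (name variant, language) pair."""
    urls = []
    for name in name_variants:
        for lang in languages:
            params = {
                "action": "wbsearchentities",
                "search": name,
                "language": lang,  # language used to match labels and aliases
                "format": "json",
                "limit": limit,
            }
            urls.append(f"{WIKIDATA_API}?{urlencode(params)}")
    return urls


for url in search_urls(["Departamento Estadual de Arquivo Público do Paraná", "DEAP"]):
    print(url)
```

Trying the full Portuguese name first, with `pt` as the search language, follows the report's own recommendation; abbreviations such as "DEAP" are more likely to produce ambiguous hits.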
Files Modified
Main Dataset
- File: data/instances/all/globalglam-20251111.yaml
- Backup: data/instances/all/globalglam-20251111.yaml.bak.batch14
- Changes: Added 3 Wikidata identifiers + enrichment provenance
Enrichment Files
- Created: data/instances/brazil/batch14_enriched.yaml (enrichment data)
- Created: merge_batch14.py (merge script)
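The report does not include `merge_batch14.py`; the following is only a minimal sketch of what a backup-then-merge step could look like. The record layout (`id`, `identifiers`, `identifier_scheme`, `identifier_value`) and all function names are assumptions for illustration, and the sketch works on in-memory records rather than parsing YAML:

```python
import shutil
from pathlib import Path


def backup_path(path: Path, batch: int) -> Path:
    """<file>.bak.batch<N>, matching the backup name listed above."""
    return path.with_name(f"{path.name}.bak.batch{batch}")


def backup_dataset(path: Path, batch: int) -> Path:
    """Copy the dataset (with metadata) before modifying it."""
    backup = backup_path(path, batch)
    shutil.copy2(path, backup)
    return backup


def merge_wikidata(dataset: list[dict], enriched: dict[str, str]) -> tuple[list[str], list[str]]:
    """Attach Q-numbers from `enriched` (institution ID -> Q-number) to matching records.

    Returns (merged_ids, unmatched_ids). Records that already have a
    Wikidata identifier are left untouched and appear in neither list.
    """
    by_id = {rec["id"]: rec for rec in dataset}
    merged, unmatched = [], []
    for inst_id, qid in enriched.items():
        rec = by_id.get(inst_id)
        if rec is None:
            unmatched.append(inst_id)  # e.g. a bonus institution not in the dataset
            continue
        ids = rec.setdefault("identifiers", [])
        if any(i.get("identifier_scheme") == "Wikidata" for i in ids):
            continue  # already enriched in an earlier batch
        ids.append({"identifier_scheme": "Wikidata", "identifier_value": qid})
        merged.append(inst_id)
    return merged, unmatched
```

Returning the unmatched IDs separately makes cases like this batch's bonus institutions visible instead of silently dropping them.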
Provenance Metadata
Each enriched institution received the following provenance entry:
```yaml
enrichment_history:
  - enrichment_date: "2025-11-11T[timestamp]Z"
    enrichment_method: "Wikidata authenticated entity search (Batch 14)"
    enrichment_source: "batch14_enriched.yaml"
    fields_enriched: ['identifiers.Wikidata']
    wikidata_label: "[Wikidata label]"
    wikidata_description: "[Wikidata description]"
    confidence_score: 0.95  # 0.90 for UFMG Tainacan Lab (parent-organization match)
```
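A sketch of how such a provenance entry could be assembled and appended to a record. It assumes `enrichment_history` is a list stored directly on the record dict; the helper names are hypothetical:

```python
from datetime import datetime, timezone
from typing import Optional


def provenance_entry(label: str, description: str, confidence: float,
                     batch: int = 14, now: Optional[datetime] = None) -> dict:
    """Build one enrichment_history entry with the fields shown in the template."""
    now = now or datetime.now(timezone.utc)  # assumes `now` is in UTC (the "Z" suffix)
    return {
        "enrichment_date": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "enrichment_method": f"Wikidata authenticated entity search (Batch {batch})",
        "enrichment_source": f"batch{batch}_enriched.yaml",
        "fields_enriched": ["identifiers.Wikidata"],
        "wikidata_label": label,
        "wikidata_description": description,
        "confidence_score": confidence,
    }


def add_provenance(record: dict, entry: dict) -> None:
    """Append the entry, creating enrichment_history on first use (assumed location)."""
    record.setdefault("enrichment_history", []).append(entry)


rec = {"id": "https://w3id.org/heritage/custodian/br/mg-mm-gerdau"}
add_provenance(rec, provenance_entry(
    "MM Gerdau - Mines and Metal Museum", "museum in Belo Horizonte, Brazil", 0.95,
    now=datetime(2025, 11, 11, 12, 0, tzinfo=timezone.utc)))
```

Appending rather than overwriting keeps earlier batches' entries, so the history stays auditable.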
Milestone Achievement: 62.0% Coverage 🎯
With Batch 14, we have successfully reached the 60-65% coverage target for Brazilian heritage institutions:
- Starting point (Batch 1): 57 institutions (47.1%)
- After Batch 13: 72 institutions (59.5%)
- After Batch 14: 75 institutions (62.0%)
- Total gain: +18 institutions (+14.9 percentage points)
Progress across 14 batches:
- Batches 1-8: Foundation building
- Batches 9-10: Accelerated enrichment
- Batches 11-12: Targeted searches
- Batch 13: ID resolution and correction
- Batch 14: TARGET ACHIEVED ✅
Next Steps
Immediate Actions
- ✅ COMPLETE: Achieve 60-65% coverage target
- ⏳ IN PROGRESS: Document 4 bonus institutions for dataset addition
- ⏳ TODO: Create new institution records for bonus matches
Future Priorities
Phase 1: Add Bonus Institutions (Target: 79/125 = 63.2%)
Add the 4 verified institutions not currently in the dataset (this also raises the total from 121 to 125 institutions):
- Museu Histórico Nacional (Q510993) - PRIORITY: HIGH
- Museu Imperial (Q1887049) - PRIORITY: HIGH
- Fundação Cultural Palmares (Q10286282) - PRIORITY: MEDIUM
- Museu do Estado de Pernambuco (Q6940628) - PRIORITY: MEDIUM
Phase 2: Continue Enrichment (Target: 70%+)
- Target remaining 46 institutions without Wikidata
- Focus on major state/regional institutions
- Search for failed institutions with alternative names
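The number of additional matches needed for a coverage target is the ceiling of `target × total`, minus the records already matched. Adding new records also raises the denominator, which matters once the bonus institutions are added. A small sketch using integer arithmetic to avoid float rounding at the boundary (the helper name is hypothetical):

```python
def institutions_needed(with_qid: int, total: int, target_pct: int) -> int:
    """Additional Q-numbers needed so that with_qid / total >= target_pct / 100."""
    required = -(-target_pct * total // 100)  # integer ceiling of target_pct * total / 100
    return max(0, required - with_qid)


print(institutions_needed(75, 121, 70))  # current dataset: 75 of 121 matched
print(institutions_needed(79, 125, 70))  # after adding the 4 bonus records with Q-numbers
```

With the current 121 records, 70% requires 85 matches (10 more than today); with 125 records and 79 matches, it requires 88 (9 more).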
Phase 3: Data Quality Improvements
- Manually verify Q61000205 (Sistema Brasileiro de Museus)
- Create Wikidata items for notable regional institutions
- Enhance descriptions and metadata for enriched records
Batch Statistics
| Metric | Value |
|---|---|
| Target institutions | 7 |
| Wikidata searches performed | 7 |
| Successful Wikidata matches | 3 |
| Merged into main dataset | 3 |
| Bonus matches found | 4 |
| Failed searches | 2 |
| Success rate | 42.9% (3/7 searches) |
| Merge rate | 100% (3/3 matches) |
| Coverage improvement | +2.5 percentage points |
| Final coverage | 62.0% |
Technical Notes
Match Quality
- High confidence (0.95): MM Gerdau, Pedra do Ingá
- Medium confidence (0.90): UFMG Tainacan Lab (parent organization match)
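The two confidence levels correspond to two kinds of match. One way to make that rule explicit is sketched below; the `MatchType` names, the fixed per-type scores, and the banding thresholds (including the extra "low" band) are assumptions inferred from this batch's scores, not a documented project policy:

```python
from enum import Enum


class MatchType(Enum):
    EXACT = "exact"                     # Wikidata item describes the institution itself
    PARENT_ORGANIZATION = "parent_org"  # item describes the parent body, e.g. a university


# Scores as used in Batch 14, assumed here to be fixed per match type.
CONFIDENCE = {
    MatchType.EXACT: 0.95,
    MatchType.PARENT_ORGANIZATION: 0.90,
}


def confidence_band(score: float) -> str:
    """Map a score to the report's labels; 'low' is a hypothetical extra band."""
    if score >= 0.95:
        return "high"
    if score >= 0.90:
        return "medium"
    return "low"


for name, match in [("MM Gerdau", MatchType.EXACT),
                    ("UFMG Tainacan Lab", MatchType.PARENT_ORGANIZATION)]:
    score = CONFIDENCE[match]
    print(name, score, confidence_band(score))
```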
Search Strategy
Batch 14 focused on:
- Education providers (UFMG)
- Museums with distinctive names (MM Gerdau)
- Archaeological sites (Pedra do Ingá)
- Verifying bonus institutions from Batch 13 report
Lessons Learned
- Bonus institutions reveal gaps: 4 major institutions found but missing from dataset
- Parent organization matches: UFMG Tainacan Lab matches to parent university (acceptable)
- Archaeological sites as custodians: Pedra do Ingá shows that heritage sites can be recorded as custodians
- Regional museums challenging: Many smaller regional institutions lack Wikidata entries
Recommendations for Next Batch
Batch 15: Add Bonus Institutions
Create new LinkML records for the 4 bonus institutions:
- Extract metadata from Wikidata
- Geocode locations
- Add appropriate institution types
- Set data_tier: TIER_3_CROWD_SOURCED
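A minimal sketch of building one such record in Python. The identifier pattern follows the existing `https://w3id.org/heritage/custodian/br/<state>-<name>` IDs; the field names, the `slugify` and `new_record` helpers, and the example type `MUSEUM` are assumptions, and geocoding is left out:

```python
import re
import unicodedata

BASE = "https://w3id.org/heritage/custodian/br/"


def slugify(text: str) -> str:
    """Lowercase ASCII slug: strip accents, collapse other characters to hyphens."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def new_record(state: str, name: str, qid: str, institution_type: str) -> dict:
    """Skeleton record for a bonus institution (hypothetical field layout)."""
    return {
        "id": BASE + slugify(f"{state} {name}"),
        "name": name,
        "institution_type": institution_type,
        "identifiers": [{"identifier_scheme": "Wikidata", "identifier_value": qid}],
        "provenance": {"data_tier": "TIER_3_CROWD_SOURCED"},
    }


rec = new_record("RJ", "Museu Histórico Nacional", "Q510993", "MUSEUM")
print(rec["id"])
```

NFKD decomposition followed by dropping non-ASCII bytes keeps the base letter of accented characters, so "Histórico" becomes "historico" rather than losing the letter.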
Batch 16: Continue Enrichment
Search for remaining institutions with focus on:
- State archives (likely to have Wikidata entries)
- University museums and collections
- Major urban cultural centers
- Historical societies with national significance
Conclusion
Batch 14 raised Wikidata coverage to 62.0%, meeting the 60-65% target. Key accomplishments:
- ✅ 3 institutions enriched (100% merge success)
- ✅ 62.0% coverage achieved (target: 60-65%)
- ✅ 4 bonus institutions identified for dataset expansion
- ✅ All technical issues resolved
- ✅ High-quality matches with detailed provenance
Next Phase: Add the 4 bonus institutions (79/125 = 63.2% coverage once the dataset grows to 125 records) and continue enrichment toward the 70%+ goal.
Generated by: AI extraction agent (OpenCODE session)
Report version: 1.0
Last updated: 2025-11-11