# Brazil Batch 13 Wikidata Enrichment - Final Report

**Date:** 2025-11-11

**Batch Number:** 13

**Status:** ✅ COMPLETE

---

## Summary

Successfully enriched 3 Brazilian heritage institutions with Wikidata Q-numbers, improving coverage from 57.0% to 59.5%.

---

## Results

### Coverage Improvement

- **Previous:** 69/121 institutions (57.0%)
- **Current:** 72/121 institutions (59.5%)
- **Gain:** +3 institutions (+2.5 percentage points)
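
The coverage figures are ratios over the 121 Brazilian institutions in the dataset. A minimal Python sketch (Python is assumed only because the batch tooling, `merge_batch13_corrected.py`, is a Python script) reproduces them and makes explicit that the +2.5 gain is an absolute change in percentage points; the relative growth of the enriched count itself is larger.

```python
# Coverage figures from this report: Brazilian institutions with a Wikidata Q-number.
TOTAL = 121
BEFORE, AFTER = 69, 72

def coverage_pct(enriched: int, total: int) -> float:
    """Share of institutions that carry a Wikidata Q-number, in percent."""
    return 100.0 * enriched / total

before_pct = coverage_pct(BEFORE, TOTAL)
after_pct = coverage_pct(AFTER, TOTAL)

# Absolute change, in percentage points (the "+2.5" in the report).
gain_pp = after_pct - before_pct
# Relative change, for contrast: how much the enriched count itself grew.
relative_gain_pct = 100.0 * (AFTER - BEFORE) / BEFORE

print(f"{before_pct:.1f}% -> {after_pct:.1f}% (+{gain_pp:.1f} pp, +{relative_gain_pct:.1f}% relative)")
# prints: 57.0% -> 59.5% (+2.5 pp, +4.3% relative)
```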

### Enrichment Success Rate

- **Searches performed:** 12
- **Successful matches:** 9 (75%)
- **Merged into dataset:** 3
- **Failed searches:** 3 (25%)
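
All rates in this section use the 12 searches as their denominator, and a match is not the same as a merge. A short sketch of the arithmetic, using only the counts listed above (variable names are illustrative):

```python
# Batch 13 counts as reported in this section.
searches = 12
matches = 9
merged = 3
failed = 3

def rate(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Every search either returned a match or failed, so the counts must add up.
assert matches + failed == searches, "match and failure counts do not cover all searches"

success_rate = rate(matches, searches)  # 75.0
failure_rate = rate(failed, searches)   # 25.0
merge_rate = rate(merged, searches)     # 25.0: merged entries are a subset of matches
```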

---

## Successfully Enriched Institutions

### 1. UNIR (Universidade Federal de Rondônia)

- **Institution ID:** `3008281717687280329`
- **Wikidata Q-number:** [Q7894377](https://www.wikidata.org/wiki/Q7894377)
- **Label:** Federal University of Rondônia
- **Description:** Brazilian public university
- **Location:** Vilhena, Rondônia, Brazil
- **Type:** UNIVERSITY
- **Confidence:** 0.95

### 2. Secult Tocantins

- **Institution ID:** `709508309148680086`
- **Wikidata Q-number:** [Q108397863](https://www.wikidata.org/wiki/Q108397863)
- **Label:** Secretary of Culture of the State of Tocantins
- **Description:** State secretariat responsible for cultural related affairs in the state of Tocantins, Brazil
- **Location:** Tocantins, Brazil
- **Type:** OFFICIAL_INSTITUTION
- **Confidence:** 0.95

### 3. Instituto Histórico e Geográfico de Alagoas

- **Institution ID:** `2519599505258789521`
- **Wikidata Q-number:** [Q10302531](https://www.wikidata.org/wiki/Q10302531)
- **Label:** Instituto Histórico e Geográfico de Alagoas
- **Description:** Research institute and museum in Maceió, Brazil
- **Location:** Alagoas, Brazil
- **Type:** COLLECTING_SOCIETY
- **Confidence:** 0.95

---

## Additional Verified Matches (Not in Main Dataset)

These institutions were found during Wikidata searches but are **not present** in the main GlobalGLAM dataset. They are candidates for addition in future batches:

### 1. Museu do Estado de Pernambuco

- **Wikidata Q-number:** [Q6940628](https://www.wikidata.org/wiki/Q6940628)
- **Label:** Museu do Estado de Pernambuco
- **Description:** Museum in Recife, Brazil
- **Status:** Not in main dataset; candidate for addition

### 2. Museu Histórico Nacional

- **Wikidata Q-number:** [Q510993](https://www.wikidata.org/wiki/Q510993)
- **Label:** National Historical Museum
- **Description:** History museum in Rio de Janeiro, Brazil
- **Status:** Not in main dataset; major national museum, should be added

### 3. Fundação Cultural Palmares

- **Wikidata Q-number:** [Q10286282](https://www.wikidata.org/wiki/Q10286282)
- **Label:** Fundação Cultural Palmares
- **Description:** Brazil (minimal description)
- **Status:** Not in main dataset; federal cultural foundation

### 4. Museu Imperial

- **Wikidata Q-number:** [Q1887049](https://www.wikidata.org/wiki/Q1887049)
- **Label:** Imperial Museum of Brazil
- **Description:** Building in Petrópolis, Brazil
- **Status:** Not in main dataset; imperial palace museum

---

## Failed Searches (No Wikidata Entries)

These institutions were searched, but no Wikidata entries were found:

### 1. Fundação de Cultura Elias Mansour (Acre)

- **Institution ID:** `https://w3id.org/heritage/custodian/br/ac-funda-o-de-cultura-elias-mansour-fem`
- **Reason:** Regional/state foundation likely not in Wikidata
- **Recommendation:** Consider creating a Wikidata item

### 2. Museu dos Povos Acreanos

- **Institution ID:** `https://w3id.org/heritage/custodian/br/ac-museu-dos-povos-acreanos`
- **Reason:** Recently opened (2023), may not be in Wikidata yet
- **Recommendation:** Monitor for future Wikidata addition

### 3. Museu Histórico de Alcântara (Maranhão)

- **Institution ID:** `https://w3id.org/heritage/custodian/br/mt-museu-hist-rico`
- **Reason:** Regional museum likely not in Wikidata
- **Recommendation:** Consider creating a Wikidata item

---

## Suspicious Match (Requires Manual Review)

### Sistema Brasileiro de Museus (SBM)

- **Institution ID:** `https://w3id.org/heritage/custodian/br/sistema-brasileiro-de-museus-sbm`
- **Wikidata Q-number:** [Q61000205](https://www.wikidata.org/wiki/Q61000205)
- **Status:** Q-number returned but has no label/description
- **Issue:** Likely a deleted or stub item in Wikidata
- **Action Required:** Manual verification; a new Wikidata item may need to be created
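
The manual check for Q61000205 comes down to telling a missing item apart from one that exists but is empty. A minimal Python sketch, assuming the JSON shape of the Wikidata API `wbgetentities` action (an entity object with `labels`/`descriptions` maps keyed by language code, and a `missing` key for items that do not exist). Fetching the JSON is left out, and the sample objects are invented for illustration, not real API output for Q61000205.

```python
from typing import Any

def classify_entity(entity: dict[str, Any], lang: str = "en") -> str:
    """Classify one entity object taken from a wbgetentities-style response.

    "missing": the API marked the item as missing (for example, deleted).
    "stub":    the item exists but has neither a label nor a description in `lang`.
    "ok":      the item has at least a label or a description in `lang`.
    """
    if "missing" in entity:
        return "missing"
    has_label = lang in entity.get("labels", {})
    has_description = lang in entity.get("descriptions", {})
    if not (has_label or has_description):
        return "stub"
    return "ok"

# Hypothetical entity objects, made up for illustration only.
samples = {
    "labelled": {
        "type": "item",
        "labels": {"en": {"language": "en", "value": "Example museum"}},
        "descriptions": {},
    },
    "bare": {"type": "item", "labels": {}, "descriptions": {}},
    "deleted": {"missing": ""},
}

for name, entity in samples.items():
    print(name, classify_entity(entity))
```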

---

## Technical Issues Resolved

### ID Mismatch Problem

The initial enrichment file (`batch13_enriched.yaml`) contained an incorrect institution ID:

- **Issue:** A Wikidata Q-number was used in place of the institution's ID from the main dataset (which may be URL-format or numeric)
- **Example:** `Q108397863` instead of `709508309148680086`
- **Resolution:** Corrected IDs by searching the main dataset for exact name matches

### Corrected IDs

| Institution | Original ID | Corrected ID | Status |
|-------------|-------------|--------------|--------|
| Secult Tocantins | Q108397863 | 709508309148680086 | ✅ Fixed |
| UNIR | 3008281717687280329 | 3008281717687280329 | ✅ Correct |
| Instituto Histórico Alagoas | 2519599505258789521 | 2519599505258789521 | ✅ Correct |

---

## Files Modified

### Main Dataset

- **File:** `data/instances/all/globalglam-20251111.yaml`
- **Backup:** `data/instances/all/globalglam-20251111.yaml.bak.batch13`
- **Changes:** Added 3 Wikidata identifiers and enrichment provenance

### Enrichment Files

- **Corrected:** `data/instances/brazil/batch13_enriched.yaml` (fixed Secult Tocantins ID)
- **Created:** `merge_batch13_corrected.py` (merge script with corrected IDs)
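
The backup-then-merge pattern described above can be sketched as follows. This is not the contents of `merge_batch13_corrected.py`: the function names and the dict-based record layout (suggested by the `identifiers.Wikidata` field name in the provenance entry) are assumptions, and YAML loading and saving are left out so the sketch needs only the standard library. The second demo record and its `Q_EXISTING`/`Q_NEW` values are placeholders.

```python
import shutil
import tempfile
from pathlib import Path

def backup_file(path: Path, tag: str = "batch13") -> Path:
    """Copy `path` to `<name>.bak.<tag>` next to it, before modifying it."""
    backup = path.with_name(f"{path.name}.bak.{tag}")
    shutil.copy2(path, backup)
    return backup

def merge_wikidata_ids(records: list[dict], qids_by_id: dict[str, str]) -> int:
    """Set identifiers["Wikidata"] on records whose ID has a verified Q-number.

    Assumes (hypothetically) that identifiers are stored as a dict keyed by
    scheme. Existing Q-numbers are never overwritten. Returns how many
    records were changed.
    """
    changed = 0
    for record in records:
        qid = qids_by_id.get(str(record["id"]))
        if qid is None:
            continue
        identifiers = record.setdefault("identifiers", {})
        if "Wikidata" in identifiers:
            continue
        identifiers["Wikidata"] = qid
        changed += 1
    return changed

# Demo: back up a throwaway file named like the main dataset.
with tempfile.TemporaryDirectory() as tmp:
    dataset_path = Path(tmp) / "globalglam-20251111.yaml"
    dataset_path.write_text("placeholder: true\n")
    backup_path = backup_file(dataset_path)
    backup_name = backup_path.name
    backup_matches = backup_path.read_text() == dataset_path.read_text()

# Demo: merge into in-memory records (the second record is hypothetical).
records = [
    {"id": "709508309148680086"},
    {"id": "hypothetical-id", "identifiers": {"Wikidata": "Q_EXISTING"}},
]
changed = merge_wikidata_ids(records, {
    "709508309148680086": "Q108397863",
    "hypothetical-id": "Q_NEW",
})
print(backup_name, backup_matches, changed)
```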

---

## Provenance Metadata

Each enriched institution received the following provenance entry:

```yaml
enrichment_history:
  - enrichment_date: "2025-11-11T[timestamp]Z"
    enrichment_method: "Wikidata authenticated entity search (Batch 13)"
    enrichment_source: "batch13_enriched.yaml"
    fields_enriched: ['identifiers.Wikidata']
    wikidata_label: "[Wikidata label]"
    wikidata_description: "[Wikidata description]"
```
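
The provenance template can be filled in programmatically. A minimal sketch, assuming records are plain Python dicts (for example, as loaded from the YAML dataset); `make_provenance` and `add_provenance` are hypothetical helper names, and the 12:00 UTC time in the usage lines is arbitrary, since the report does not record the actual timestamp.

```python
from datetime import datetime, timezone
from typing import Optional

def make_provenance(label: str, description: str,
                    when: Optional[datetime] = None,
                    source: str = "batch13_enriched.yaml") -> dict:
    """Build one enrichment_history entry with the fields of the template."""
    when = when or datetime.now(timezone.utc)
    return {
        # UTC timestamp in the same "...Z" form as the template.
        "enrichment_date": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "enrichment_method": "Wikidata authenticated entity search (Batch 13)",
        "enrichment_source": source,
        "fields_enriched": ["identifiers.Wikidata"],
        "wikidata_label": label,
        "wikidata_description": description,
    }

def add_provenance(record: dict, entry: dict) -> None:
    """Append `entry` to the record's enrichment_history, creating the list if absent."""
    record.setdefault("enrichment_history", []).append(entry)

record = {"id": "709508309148680086"}
add_provenance(record, make_provenance(
    "Secretary of Culture of the State of Tocantins",
    "State secretariat responsible for cultural related affairs in the state of Tocantins, Brazil",
    when=datetime(2025, 11, 11, 12, 0, tzinfo=timezone.utc),  # arbitrary example time
))
print(record["enrichment_history"][0]["enrichment_date"])  # 2025-11-11T12:00:00Z
```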

---

## Next Steps

### Immediate Actions

1. ✅ **COMPLETE:** Merge 3 verified Q-numbers into main dataset
2. ✅ **COMPLETE:** Create final report (this document)
3. ⏳ **TODO:** Manually verify Q61000205 (Sistema Brasileiro de Museus)

### Future Batches (Batch 14+)

1. **Add 4 bonus institutions** found during searches (Museu Histórico Nacional, Museu Imperial, etc.)
2. **Create Wikidata items** for the 3 failed searches (if the institutions are notable)
3. **Continue enrichment** toward the 60-65% coverage target (requires 1 to 7 more institutions)
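
The number of additional institutions needed follows from rounding each target up to a whole institution: 60% of 121 is 72.6, so 73 are needed, and 65% of 121 is 78.65, so 79 are needed. A small sketch of that calculation (the function name is hypothetical):

```python
def institutions_needed(current: int, total: int, target_pct: int) -> int:
    """Fewest additional enriched institutions needed to reach target_pct coverage."""
    # Integer ceiling of total * target_pct / 100, avoiding float rounding.
    required = (total * target_pct + 99) // 100
    return max(0, required - current)

low = institutions_needed(72, 121, 60)   # 1  (73/121 is about 60.3%)
high = institutions_needed(72, 121, 65)  # 7  (79/121 is about 65.3%)
```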

### Recommendations

- **Prioritize major museums:** Museu Histórico Nacional (Q510993) should be in the dataset
- **Validate regional institutions:** Check whether the failed searches correspond to actual heritage institutions
- **Investigate SBM Q-number:** Q61000205 needs manual Wikidata verification

---

## Batch Statistics

| Metric | Value |
|--------|-------|
| Target institutions | 12 |
| Wikidata searches performed | 12 |
| Successful Wikidata matches | 9 |
| Merged into main dataset | 3 |
| Already had Q-numbers | 2 |
| Bonus matches found | 4 |
| Failed searches | 3 |
| Suspicious matches | 1 |
| Success rate | 75% (9/12) |
| Merge rate | 25% (3/12) |
| Coverage improvement | +2.5 percentage points |

---

## Lessons Learned

1. **ID Verification Critical:** Always verify institution IDs by searching the main dataset before creating enrichment files
2. **Numeric IDs Valid:** The main dataset uses both URL-format and numeric IDs; both are valid
3. **Bonus Matches Value:** Finding institutions not in the target list (4 bonus matches) helps identify missing entries
4. **Regional Institutions Gap:** Small regional museums often lack Wikidata entries, which is an opportunity for contribution
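
Lessons 1 and 2 can be turned into a pre-merge check. A minimal Python sketch that accepts both ID formats and flags anything else, including a Q-number in the ID field; the regular expressions are inferred from the IDs quoted in this report and are assumptions, not a documented schema. Function names are hypothetical.

```python
import re

# Patterns inferred from the IDs that appear in this report (an assumption,
# not a documented schema).
URL_ID = re.compile(r"https://w3id\.org/heritage/custodian/[a-z]{2}/[a-z0-9-]+")
NUMERIC_ID = re.compile(r"\d+")
WIKIDATA_QID = re.compile(r"Q\d+")

def classify_institution_id(value: str) -> str:
    """Return "url", "numeric", "wikidata_qid" or "unknown" for an ID string."""
    if WIKIDATA_QID.fullmatch(value):
        return "wikidata_qid"  # a Q-number belongs under identifiers, not in the id field
    if NUMERIC_ID.fullmatch(value):
        return "numeric"
    if URL_ID.fullmatch(value):
        return "url"
    return "unknown"

def invalid_ids(ids: list[str]) -> list[str]:
    """IDs that are neither URL-format nor numeric and should be checked by hand."""
    return [i for i in ids if classify_institution_id(i) not in ("url", "numeric")]

batch_ids = [
    "709508309148680086",
    "Q108397863",
    "https://w3id.org/heritage/custodian/br/ac-museu-dos-povos-acreanos",
]
print(invalid_ids(batch_ids))  # ['Q108397863']
```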

---

## Conclusion

Batch 13 successfully enriched 3 Brazilian institutions with Wikidata Q-numbers, achieving:

- ✅ 59.5% Wikidata coverage (up from 57.0%)
- ✅ 75% Wikidata search success rate
- ✅ 4 additional candidate institutions identified
- ✅ All technical ID issues resolved

**Status:** Ready for Batch 14 to continue toward the 60-65% coverage target.

---

**Generated by:** AI extraction agent (OpenCODE session)

**Report version:** 1.0

**Last updated:** 2025-11-11