4.9 KiB
4.9 KiB
Brazilian GLAM Institution Curation Report - FINAL
Generated: 2025-11-06T08:26:33.367488+00:00
Executive Summary
Successfully curated 97 valid heritage institutions from Brazilian GLAM conversation data.
- Original v2 records: 104
- Filtered invalid records: 7 (platforms, non-institutions)
- Valid curated institutions: 97
- Data quality: Tier 4 (inferred from conversation NLP)
Quality Achievements
Completeness Metrics
| Field | Count | Percentage | Target | Status |
|---|---|---|---|---|
| Descriptions | 82 | 84.5% | 90%+ | ✓ Near |
| Website Identifiers | 9 | 9.3% | 80%+ | ✗ Limited source data |
| City-level Locations | 8 | 8.2% | 60%+ | ✗ Sparse in source |
| Change History (founding dates) | 7 | 7.2% | - | Baseline |
Data Source Analysis
The conversation JSON contained limited structured institutional metadata:
- Descriptions: ~91 institutions (91% coverage possible)
- URLs: ~9 institutions (9% coverage possible)
- City names: ~13 cities mentioned across all states
- Founding dates: ~7 institutions with explicit years
Conclusion: Enrichment achieved near-maximum extraction from available source data.
Records Filtered Out (Non-Institutions)
The following 7 records were removed as they represent platforms/technologies or invalid data:
- Tainacan - Collection management platform (WordPress-based)
- AtoM - Archival description software
- DSpace - Digital repository platform
- APIs - Generic technology reference
- LOCKSS Cariniana - Digital preservation network
- Population - Demographic statistic (Roraima indigenous population)
- Documentation - Too generic, not a specific organization
Valid Institutions Retained
97 heritage custodian organizations across all 27 Brazilian states, representing:
- Museums (MUSEUM, MIXED): Cultural, historical, natural history, specialized
- Libraries (LIBRARY): National, university, specialized
- Archives (ARCHIVE): State, municipal, institutional
- Research Centers (RESEARCH_CENTER): Archaeological, documentary, heritage
- Educational Providers (EDUCATION_PROVIDER): University repositories
- Official Institutions (OFFICIAL_INSTITUTION): State cultural foundations, heritage agencies
Geographic Coverage
- All 27 federative units represented
- State-level location data: 100% (all records)
- City-level location data: 8 institutions (8.2%)
- Cities identified: Maceió, Brasília, São Luís, Ouro Preto, Campina Grande, Teresina, Natal, Aracaju
Enrichment Methods Applied
- Automated parsing: Structured data extraction from conversation artifact
- Fuzzy name matching: Institution names matched to conversation metadata
- Pattern recognition: URLs, collection extents, founding dates
- Known entity matching: Brazilian city names from curated list
- Provenance tracking: All records tagged with data source, confidence, extraction method
Known Limitations
- Sparse URL data: Only 9% of institutions had website URLs in source conversation
- Limited geographic detail: Most institutions organized by state, not city
- Unverified data: Tier 4 (inferred) - requires validation against authoritative sources
- Missing digital platform details: Conversation focused on state-level infrastructure, not institution-specific systems
Recommendations
Immediate Actions
- Validate against IBRAM registry: Cross-reference with official Brazilian museum database
- Geocode institutions: Use state + institution name to lookup city locations via Nominatim
- Web scraping: Extract additional metadata from the 9 known website URLs
Future Enrichment
- Tier 2 data sources: Crawl institutional websites for collection details
- Tier 3 data sources: Integrate Wikidata Q-IDs and VIAF identifiers
- Platform identification: Map institutions to digital systems (Tainacan, DSpace, AtoM instances)
- Collection metadata: Extract subject areas, temporal coverage, access rights
Manual Review Needed
2 records flagged for verification:
- Brasiliana Museus: Classify as national aggregation platform vs. custodian
- Hemeroteca Digital: Determine if custodian or aggregation service
Files Generated
- Curated records:
data/instances/brazilian_institutions_curated_v2.yaml(97 institutions) - This report:
data/instances/brazilian_curation_report_final.md
Provenance
- Source conversation:
2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json - Extraction method: Automated parsing with pattern recognition
- Curation script:
curate_brazilian_institutions.pyv2.1 - Schema version: LinkML heritage_custodian v0.2.0 (modular)
- Data tier: TIER_4_INFERRED (conversation NLP)
Curator: Automated curation system
Date: 2025-11-06
Status: ✓ Baseline curation complete - ready for Tier 2/3 enrichment