# Brazilian GLAM Data Curation Status

## Summary

Successfully created **manually curated LinkML-compliant records** for 12 major Brazilian GLAM institutions based on the comprehensive infrastructure report in conversation file:
- `2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json`

## Output Files

### ✅ `data/instances/brazilian_institutions_curated.yaml`
**12 comprehensive records** with full LinkML compliance:

#### National Libraries (2)
1. **Biblioteca Nacional do Brasil** - Brazil's National Library with 9M+ items
   - 3 digital platforms (BNDigital, Hemeroteca Digital, Brasiliana Fotográfica)
   - 1.5M digitized works, 500K+ monthly visits
   - Founded 1810 by Prince Regent João (later King João VI)
   - Wikidata ID: Q1526131

2. **Biblioteca Brasiliana Guita e José Mindlin (USP)** - University Brazilian studies library
   - 70,000-volume collection, 4,000+ digitized items
   - Focus on 16th-21st century Brazilian works
   - Wikidata ID: Q10373176

#### National Museums (4)
3. **MASP** - Museu de Arte de São Paulo Assis Chateaubriand
   - 8,000+ artworks on Google Arts & Culture
   - Wikidata ID: Q861028

4. **Pinacoteca de São Paulo** - Oldest São Paulo art museum
   - 10,000+ Brazilian artworks online (colonial-contemporary)
   - Digital preservation policies since 2017
   - Wikidata ID: Q1129738

5. **Museu Histórico Nacional** - National Historical Museum
   - Uses Tainacan platform (IBRAM open-source system)
   - Wikidata ID: Q10326887

6. **Museu Paulista (USP)** - University museum and research center
   - Publisher of Anais do Museu Paulista since 1922
   - Wikidata ID: Q1130511

#### National Archives (1)
7. **Arquivo Nacional** - National Archive of Brazil
   - 560 TB digitized on 1.1 PB infrastructure
   - AN Digital program since 2003
   - Archivematica implementation (CONARQ standards)
   - Wikidata ID: Q10283879

#### Government Heritage Institutions (1)
8. **IBRAM** - Instituto Brasileiro de Museus
   - Coordinates 30 federal museums, 20 libraries (300K items)
   - Developed Tainacan platform (20+ museums)
   - Manages MuseusBr platform
   - Founded 2009, launched library network 2025
   - Wikidata ID: Q10302917

#### State Archives (1)
9. **APESP** - Arquivo Público do Estado de São Paulo
   - 25M+ textual documents, 3M iconographic items
   - 400K+ digitized images (DOPS, immigration records)
   - Wikidata ID: Q10405845

#### Research Centers (3)
10. **LARHUD (IBICT)** - Digital Humanities Laboratory
    - Portuguese-language DH tool development
    - Wiki LARHUD encyclopedia
    - TADiRAH taxonomy adaptation

11. **UNICAMP Digital Humanities Center** - 20 researchers
    - COVID-19 digital archiving project
    - "Digital Memory" theoretical frameworks

12. **UFRJ Digital Humanities Lab** - Est. 2023
    - Pirenópolis Declaration implementation
    - Inter-institutional collaboration with UFBA

## Data Quality Features

### Complete LinkML Compliance
All records include:
- ✅ Full provenance metadata (source, tier, confidence, extraction method)
- ✅ Multiple identifiers (Website, Wikidata)
- ✅ Digital platforms with metadata standards
- ✅ Collection metadata (extent, subjects, temporal coverage)
- ✅ Change history (founding dates, organizational changes)
- ✅ Rich descriptions with quantitative metrics
- ✅ Alternative names in multiple languages
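
The checklist above can be spot-checked per record with a few lines of Python. This is a minimal sketch, not the project's validator: the field names in `REQUIRED_FIELDS` and the sample record are assumptions chosen to mirror the checklist, and the actual LinkML schema may name these slots differently.

```python
# Hypothetical slot names mirroring the checklist; the real schema may differ.
REQUIRED_FIELDS = (
    "provenance",
    "identifiers",
    "digital_platforms",
    "collections",
    "change_history",
    "description",
    "alternative_names",
)


def missing_fields(record: dict) -> list[str]:
    """Return checklist fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record for illustration only; not taken from the curated file.
example = {
    "name": "Example Institution",
    "provenance": {"data_tier": "TIER_4_INFERRED", "confidence_score": 0.9},
    "identifiers": [{"scheme": "Website", "value": "https://example.org"}],
    "description": "Placeholder description.",
}

print(missing_fields(example))
```

A record passes the spot check when `missing_fields` returns an empty list. Full schema validation would still need the LinkML tooling itself.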

### Metadata Standards Documented
- Dublin Core (widespread)
- MARC21 (libraries)
- EAD (archives)
- PREMIS (digital preservation)
- OAI-PMH (interoperability)
- INBCM (Brazilian museum standard)

### Confidence Scores
- Range: 0.84-0.96
- Average: 0.91
- Methodology: Based on detail level in source artifact
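
For reference, range and average figures like these can be recomputed from the per-record scores with a short helper. The sketch below is an assumption about how such a summary might be produced; `sample_scores` holds illustrative values only, not the twelve actual scores from the curated file.

```python
from statistics import mean


def summarize_confidence(scores: list[float]) -> dict[str, float]:
    """Return the minimum, maximum, and rounded mean of confidence scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": round(mean(scores), 2),
    }


# Illustrative values only; replace with the scores read from the YAML file.
sample_scores = [0.84, 0.90, 0.96]
summary = summarize_confidence(sample_scores)
print(summary)
```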

## Comparison with v2 Extraction

### Previous v2 File
- **104 basic records** organized by state
- Minimal metadata (name, type, region)
- Some URLs and brief descriptions
- Generic confidence scores (0.7-0.8)

### New Curated File
- **12 comprehensive records** (major national institutions)
- Full LinkML compliance with all optional fields
- Rich descriptions with quantitative metrics
- Digital platform documentation
- Collection metadata
- Historical change events
- Higher confidence scores (0.84-0.96)

## Source Conversation Analysis

### Conversation Type
The conversation is **NOT** a state-by-state institutional directory (unlike the Chilean and Mexican conversations).

Instead, it contains:
- Comprehensive research report on Brazilian GLAM infrastructure
- Information about R$18B in cultural funding
- Platform analysis (BNDigital, Tainacan, Brasiliana Fotográfica)
- Standards adoption analysis
- Government programs and training ecosystems
- Academic research output

### Institutional Mention Frequency
Top institutions and platforms by mention count:
- Tainacan: 235 mentions
- IBRAM: 217 mentions
- Pinacoteca: 60 mentions
- MASP: 16 mentions
- Biblioteca Nacional: 14 mentions
- Arquivo Nacional: 11 mentions
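
Counts like these can be reproduced by scanning the conversation text for each name. The sketch below assumes a case-insensitive whole-word match, which may not be how the figures above were obtained; `count_mentions` and the sample text are hypothetical.

```python
import re


def count_mentions(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive whole-word matches per term, most frequent first."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    # Sort by count, descending, so the ranking is visible at a glance.
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


# Made-up sample text; the real input would be the conversation JSON content.
sample = (
    "IBRAM developed Tainacan. Tainacan is used by museums "
    "coordinated by IBRAM and by Tainacan partners."
)
result = count_mentions(sample, ["IBRAM", "Tainacan", "MASP"])
print(result)
```

Whole-word matching avoids counting a short name inside a longer word, but it will still count a term that appears as part of a URL or file path.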

## Next Steps

### Option 1: Web Scraping Enhancement
Use identified URLs to fetch additional institutional data:
- IBRAM: https://www.gov.br/museus
- Biblioteca Nacional: https://www.bn.gov.br
- BNDigital: https://bndigital.bn.br
- Pinacoteca: https://acervo.pinacoteca.org.br
- MASP: https://masp.org.br
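
A first pass over these URLs could simply fetch each page and record its title, as a check that the site is reachable before any deeper extraction. This is a rough sketch using only the Python standard library; the `User-Agent` string, the timeout, and the function names are assumptions, and each site's terms and `robots.txt` should be reviewed before any real scraping.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the page title from an HTML string, or '' if none is found."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its title (performs a network request)."""
    # Hypothetical User-Agent; identify the real project and a contact address.
    request = Request(url, headers={"User-Agent": "glam-curation-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_title(response.read().decode(charset, errors="replace"))
```

Calling `fetch_title("https://masp.org.br")` would return that page's title. Parsing is kept separate in `extract_title` so it can be tested without network access.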

### Option 2: Merge with v2 Data
Cross-reference curated records with v2 state-level institutions:
- Enrich v2 records with platform/standards data
- Add digital collection URLs
- Improve descriptions
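
One way to cross-reference the two files is to match records on a normalized institution name and copy selected fields from the curated record into the v2 record. The sketch below assumes both files load into lists of dictionaries with a `name` key; the field names `digital_platforms` and `description` and the sample records are hypothetical.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Strip accents, case, and extra whitespace so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def enrich_v2(v2_records, curated_records, fields=("digital_platforms", "description")):
    """Return copies of v2 records, enriched from curated records with the same name."""
    curated_by_name = {normalize_name(r["name"]): r for r in curated_records}
    enriched = []
    for record in v2_records:
        merged = dict(record)  # copy so the v2 input is left untouched
        match = curated_by_name.get(normalize_name(record["name"]))
        if match:
            for field in fields:
                if field in match:
                    merged[field] = match[field]
        enriched.append(merged)
    return enriched


# Illustrative records only; real data would come from the two YAML files.
v2 = [
    {"name": "PINACOTECA DE SAO PAULO", "region": "SP"},
    {"name": "Example Regional Museum", "region": "RS"},
]
curated = [
    {
        "name": "Pinacoteca de São Paulo",
        "digital_platforms": ["https://acervo.pinacoteca.org.br"],
    }
]
result = enrich_v2(v2, curated)
print(result)
```

Exact matching on normalized names will miss abbreviations such as MASP or APESP. A shared identifier like the Wikidata ID would be a more reliable join key wherever both files carry it.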

### Option 3: Expand Coverage
Continue manual curation for:
- State museum systems (SEM-RS, COSEM Paraná)
- University libraries (FGV, UNIRIO, UFPE)
- Regional archives (Hemeroteca Digital Catarinense)
- Professional networks (REM-BR, educator networks)

## Files

### Input
- `/Users/kempersc/apps/glam/2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json` (12,107 lines)

### Output
- `/Users/kempersc/apps/glam/data/instances/brazilian_institutions_curated.yaml` (12 comprehensive records)

### Previous Work
- `/Users/kempersc/apps/glam/data/instances/brazilian_institutions_v2.yaml` (104 basic records)

---

**Date**: 2025-11-06  
**Method**: Manual comprehensive extraction following AGENTS.md guidelines  
**Compliance**: LinkML schema v0.2.0 (modular)  
**Quality**: TIER_4_INFERRED with high confidence scores (0.84-0.96)