glam/CURATION_STATUS.md
2025-11-19 23:25:22 +01:00


# Brazilian GLAM Data Curation Status
## Summary
This pass produced **manually curated, LinkML-compliant records** for 12 major Brazilian GLAM institutions, based on the comprehensive infrastructure report in the conversation file:
- `2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json`
## Output Files
### ✅ `data/instances/brazilian_institutions_curated.yaml`
**12 comprehensive records** with full LinkML compliance:
#### National Libraries (2)
1. **Biblioteca Nacional do Brasil** - Brazil's National Library with 9M+ items
- 3 digital platforms (BNDigital, Hemeroteca Digital, Brasiliana Fotográfica)
- 1.5M digitized works, 500K+ monthly visits
- Founded 1810 by King João VI
- Wikidata ID: Q1526131
2. **Biblioteca Brasiliana Guita e José Mindlin (USP)** - University Brazilian studies library
- 70,000-volume collection, 4,000+ digitized items
- Focus on 16th-21st century Brazilian works
- Wikidata ID: Q10373176
#### National Museums (4)
3. **MASP** - Museu de Arte de São Paulo Assis Chateaubriand
- 8,000+ artworks on Google Arts & Culture
- Wikidata ID: Q861028
4. **Pinacoteca de São Paulo** - Oldest art museum in São Paulo
- 10,000+ Brazilian artworks online (colonial-contemporary)
- Digital preservation policies since 2017
- Wikidata ID: Q1129738
5. **Museu Histórico Nacional** - National Historical Museum
- Uses Tainacan platform (IBRAM open-source system)
- Wikidata ID: Q10326887
6. **Museu Paulista (USP)** - University museum and research center
- Publisher of Anais do Museu Paulista since 1922
- Wikidata ID: Q1130511
#### National Archives (1)
7. **Arquivo Nacional** - National Archive of Brazil
- 560 TB digitized on 1.1 PB infrastructure
- AN Digital program since 2003
- Archivematica implementation (CONARQ standards)
- Wikidata ID: Q10283879
#### Government Heritage Institutions (1)
8. **IBRAM** - Instituto Brasileiro de Museus
- Coordinates 30 federal museums, 20 libraries (300K items)
- Developed Tainacan platform (20+ museums)
- Manages MuseusBr platform
- Founded 2009, launched library network 2025
- Wikidata ID: Q10302917
#### State Archives (1)
9. **APESP** - Arquivo Público do Estado de São Paulo
- 25M+ textual documents, 3M iconographic items
- 400K+ digitized images (DOPS, immigration records)
- Wikidata ID: Q10405845
#### Research Centers (3)
10. **LARHUD (IBICT)** - Digital Humanities Laboratory
- Portuguese-language DH tool development
- Wiki LARHUD encyclopedia
- TADiRAH taxonomy adaptation
11. **UNICAMP Digital Humanities Center** - 20 researchers
- COVID-19 digital archiving project
- "Digital Memory" theoretical frameworks
12. **UFRJ Digital Humanities Lab** - Est. 2023
- Pirenópolis Declaration implementation
- Inter-institutional collaboration with UFBA
## Data Quality Features
### Complete LinkML Compliance
All records include:
- ✅ Full provenance metadata (source, tier, confidence, extraction method)
- ✅ Multiple identifiers (Website, Wikidata)
- ✅ Digital platforms with metadata standards
- ✅ Collection metadata (extent, subjects, temporal coverage)
- ✅ Change history (founding dates, organizational changes)
- ✅ Rich descriptions with quantitative metrics
- ✅ Alternative names in multiple languages
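As a rough illustration of the checklist above, the sketch below models a record with a few of the listed fields and runs basic sanity checks on it. The class names, field names, and the `problems` helper are hypothetical and are not taken from the project's LinkML schema; the confidence value is a placeholder. Only the MASP website and Wikidata ID come from this document.

```python
import re
from dataclasses import dataclass, field

# Wikidata item IDs are "Q" followed by a positive integer.
WIKIDATA_ID = re.compile(r"^Q[1-9]\d*$")


@dataclass
class Provenance:
    # Hypothetical field names mirroring "source, tier, confidence, extraction method".
    source: str
    tier: str
    confidence: float
    extraction_method: str


@dataclass
class InstitutionRecord:
    # Hypothetical shape; the real schema may name or nest these differently.
    name: str
    provenance: Provenance
    identifiers: dict[str, str] = field(default_factory=dict)
    alternative_names: list[str] = field(default_factory=list)


def problems(record: InstitutionRecord) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    found = []
    if not 0.0 <= record.provenance.confidence <= 1.0:
        found.append("confidence outside [0, 1]")
    qid = record.identifiers.get("Wikidata")
    if qid is not None and not WIKIDATA_ID.match(qid):
        found.append(f"malformed Wikidata ID: {qid!r}")
    if "Website" not in record.identifiers:
        found.append("missing Website identifier")
    return found


masp = InstitutionRecord(
    name="MASP",
    provenance=Provenance(
        source="conversation export",
        tier="TIER_4_INFERRED",
        confidence=0.9,  # placeholder, not the score stored in the curated file
        extraction_method="manual curation",
    ),
    identifiers={"Website": "https://masp.org.br", "Wikidata": "Q861028"},
)
print(problems(masp))  # prints []
```

Checks like these complement schema validation rather than replace it: they catch values that are well-typed but implausible, such as a Wikidata ID missing its `Q` prefix.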
### Metadata Standards Documented
- Dublin Core (widespread)
- MARC21 (libraries)
- EAD (archives)
- PREMIS (digital preservation)
- OAI-PMH (interoperability)
- INBCM (Brazilian museum standard)
### Confidence Scores
- Range: 0.84-0.96
- Average: 0.91
- Methodology: Each score reflects how much detail the source artifact provides for that institution
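The range and average reported above can be recomputed from the per-record scores with a few lines of Python. This is a minimal sketch: `summarize_confidence` is a hypothetical helper, and the example scores are made up for illustration, not the 12 values stored in the curated file.

```python
from statistics import mean


def summarize_confidence(scores):
    """Return the minimum, maximum, and mean (rounded to 2 places) of the scores."""
    if not scores:
        raise ValueError("no confidence scores provided")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": round(mean(scores), 2),
    }


# Made-up scores for illustration only.
example_scores = [0.84, 0.90, 0.96]
summary = summarize_confidence(example_scores)
print(summary)
```

Recomputing the summary whenever records are added keeps the figures in this status file from drifting away from the data.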
## Comparison with v2 Extraction
### Previous v2 File
- **104 basic records** organized by state
- Minimal metadata (name, type, region)
- Some URLs and brief descriptions
- Generic confidence scores (0.7-0.8)
### New Curated File
- **12 comprehensive records** (major national institutions)
- Full LinkML compliance with all optional fields
- Rich descriptions with quantitative metrics
- Digital platform documentation
- Collection metadata
- Historical change events
- Higher confidence scores (0.84-0.96)
## Source Conversation Analysis
### Conversation Type
The conversation is **NOT** a state-by-state institutional directory (as the Chilean and Mexican conversations are).
Instead, it contains:
- Comprehensive research report on Brazilian GLAM infrastructure
- Information about R$18B in cultural funding
- Platform analysis (BNDigital, Tainacan, Brasiliana Fotográfica)
- Standards adoption analysis
- Government programs and training ecosystems
- Academic research output
### Institutional Mention Frequency
Top institutions and platforms by mention count, in descending order:
- Tainacan (platform): 235 mentions
- IBRAM: 217 mentions
- Pinacoteca: 60 mentions
- MASP: 16 mentions
- Biblioteca Nacional: 14 mentions
- Arquivo Nacional: 11 mentions
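Mention counts like these can be reproduced with a simple term count over the conversation text. The sketch below assumes case-insensitive, whole-word matching; the matching rule behind the figures above is not stated in this document, so results from this approach may differ. `count_mentions` and the sample text are hypothetical.

```python
import re
from collections import Counter


def count_mentions(text, terms):
    """Count case-insensitive, whole-word occurrences of each term in text."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


# Made-up sample text; in practice this would be the text of the conversation export.
sample = (
    "IBRAM developed Tainacan. Tainacan is used by museums that IBRAM "
    "coordinates; tainacan is open source."
)
counts = count_mentions(sample, ["IBRAM", "Tainacan", "MASP"])
for term, n in counts.most_common():
    print(term, n)
# prints:
# Tainacan 3
# IBRAM 2
# MASP 0
```

Short names such as "MASP" match reliably this way, but institutions referred to by several names (for example "Biblioteca Nacional" and "BN") would need one pattern per alias.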
## Next Steps
### Option 1: Web Scraping Enhancement
Use identified URLs to fetch additional institutional data:
- IBRAM: https://www.gov.br/museus
- Biblioteca Nacional: https://www.bn.gov.br
- BNDigital: https://bndigital.bn.br
- Pinacoteca: https://acervo.pinacoteca.org.br
- MASP: https://masp.org.br
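One possible starting point for this option is a small fetch helper built on the standard library. This is a sketch under stated assumptions, not an existing part of the project: `build_request`, `fetch`, and the `USER_AGENT` string are hypothetical, and any real crawl should respect each site's robots.txt and terms of use.

```python
import time
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical identifier; replace with a contact address for the project.
USER_AGENT = "glam-curation-sketch/0.1"

# Seed URLs listed in this document.
SEED_URLS = {
    "IBRAM": "https://www.gov.br/museus",
    "Biblioteca Nacional": "https://www.bn.gov.br",
    "BNDigital": "https://bndigital.bn.br",
    "Pinacoteca": "https://acervo.pinacoteca.org.br",
    "MASP": "https://masp.org.br",
}


def build_request(url):
    """Validate the URL and attach an identifying User-Agent header."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=10.0):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example usage (performs network requests, so it is left commented out):
# for name, url in SEED_URLS.items():
#     html = fetch(url)
#     print(name, len(html))
#     time.sleep(1.0)  # pause between requests to avoid hammering the servers
```

Keeping request construction separate from the network call makes the URL validation testable offline.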
### Option 2: Merge with v2 Data
Cross-reference curated records with v2 state-level institutions:
- Enrich v2 records with platform/standards data
- Add digital collection URLs
- Improve descriptions
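The cross-referencing step could work along the following lines: match records by a normalized name, then copy selected fields from the curated record onto the v2 record. This is a minimal sketch; the dictionary keys (`name`, `digital_platforms`, `description`) and both helper functions are assumptions for illustration, and the actual YAML files may be structured differently.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and collapse whitespace so name variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def enrich(v2_records, curated_records, fields=("digital_platforms", "description")):
    """Return copies of v2 records, with curated fields applied where names match."""
    curated_by_name = {normalize(r["name"]): r for r in curated_records}
    enriched = []
    for record in v2_records:
        merged = dict(record)  # shallow copy, so the input list is left unchanged
        match = curated_by_name.get(normalize(record["name"]))
        if match is not None:
            for key in fields:
                if key in match:
                    merged[key] = match[key]
        enriched.append(merged)
    return enriched


# Made-up records for illustration.
v2 = [
    {"name": "pinacoteca de sao paulo", "region": "SP"},
    {"name": "Museu Exemplo", "region": "RJ"},
]
curated = [
    {"name": "Pinacoteca de São Paulo", "description": "Art museum in São Paulo"},
]
result = enrich(v2, curated)
print(result[0]["description"])  # prints Art museum in São Paulo
```

Exact matching on normalized names will miss abbreviations such as "MASP" versus the full museum name, so a real merge would likely also need an alias table or identifier-based matching (for example on Wikidata IDs).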
### Option 3: Expand Coverage
Continue manual curation for:
- State museum systems (SEM-RS, COSEM Paraná)
- University libraries (FGV, UNIRIO, UFPE)
- Regional archives (Hemeroteca Digital Catarinense)
- Professional networks (REM-BR, educator networks)
## Files
### Input
- `/Users/kempersc/apps/glam/2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json` (12,107 lines)
### Output
- `/Users/kempersc/apps/glam/data/instances/brazilian_institutions_curated.yaml` (12 comprehensive records)
### Previous Work
- `/Users/kempersc/apps/glam/data/instances/brazilian_institutions_v2.yaml` (104 basic records)
---
**Date**: 2025-11-06
**Method**: Manual comprehensive extraction following AGENTS.md guidelines
**Compliance**: LinkML schema v0.2.0 (modular)
**Quality**: TIER_4_INFERRED with high confidence scores (0.84-0.96)