Brazilian GLAM Data Curation Status
Summary
Created 12 manually curated, LinkML-compliant records for major Brazilian GLAM institutions, based on the infrastructure report in the conversation file:
2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json
Output Files
✅ data/instances/brazilian_institutions_curated.yaml
12 comprehensive records with full LinkML compliance:
National Libraries (2)
- Biblioteca Nacional do Brasil - Brazil's National Library with 9M+ items
  - 3 digital platforms (BNDigital, Hemeroteca Digital, Brasiliana Fotográfica)
  - 1.5M digitized works, 500K+ monthly visits
  - Founded 1810 by João VI (then Prince Regent)
  - Wikidata ID: Q1526131
- Biblioteca Brasiliana Guita e José Mindlin (USP) - University library for Brazilian studies
  - 70,000-volume collection, 4,000+ digitized items
  - Focus on 16th-21st century Brazilian works
  - Wikidata ID: Q10373176
National Museums (4)
- MASP - Museu de Arte de São Paulo Assis Chateaubriand
  - 8,000+ artworks on Google Arts & Culture
  - Wikidata ID: Q861028
- Pinacoteca de São Paulo - Oldest São Paulo art museum
  - 10,000+ Brazilian artworks online (colonial-contemporary)
  - Digital preservation policies since 2017
  - Wikidata ID: Q1129738
- Museu Histórico Nacional - National Historical Museum
  - Uses Tainacan platform (IBRAM open-source system)
  - Wikidata ID: Q10326887
- Museu Paulista (USP) - University museum and research center
  - Publisher of Anais do Museu Paulista since 1922
  - Wikidata ID: Q1130511
National Archives (1)
- Arquivo Nacional - National Archive of Brazil
  - 560 TB digitized on 1.1 PB infrastructure
  - AN Digital program since 2003
  - Archivematica implementation (CONARQ standards)
  - Wikidata ID: Q10283879
Government Heritage Institutions (1)
- IBRAM - Instituto Brasileiro de Museus
  - Coordinates 30 federal museums, 20 libraries (300K items)
  - Developed Tainacan platform (20+ museums)
  - Manages MuseusBr platform
  - Founded 2009, launched library network 2025
  - Wikidata ID: Q10302917
State Archives (1)
- APESP - Arquivo Público do Estado de São Paulo
  - 25M+ textual documents, 3M iconographic items
  - 400K+ digitized images (DOPS, immigration records)
  - Wikidata ID: Q10405845
Research Centers (3)
- LARHUD (IBICT) - Digital Humanities Laboratory
  - Portuguese-language DH tool development
  - Wiki LARHUD encyclopedia
  - TADiRAH taxonomy adaptation
- UNICAMP Digital Humanities Center - 20 researchers
  - COVID-19 digital archiving project
  - "Digital Memory" theoretical frameworks
- UFRJ Digital Humanities Lab - Est. 2023
  - Pirenópolis Declaration implementation
  - Inter-institutional collaboration with UFBA
Data Quality Features
Complete LinkML Compliance
All records include:
- ✅ Full provenance metadata (source, tier, confidence, extraction method)
- ✅ Multiple identifiers (Website, Wikidata)
- ✅ Digital platforms with metadata standards
- ✅ Collection metadata (extent, subjects, temporal coverage)
- ✅ Change history (founding dates, organizational changes)
- ✅ Rich descriptions with quantitative metrics
- ✅ Alternative names in multiple languages
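The checklist above lends itself to an automated completeness check. The following is a minimal sketch only: the field names (`provenance`, `identifiers`, `digital_platforms`, `collections`, `change_history`, `description`, `alternative_names`) and the `scheme`/`value` identifier layout are assumptions for illustration, not confirmed slot names from the project's LinkML schema. The sample record is deliberately incomplete and uses only values stated in this report.

```python
import re

# Hypothetical field names mirroring the checklist above;
# the actual LinkML slot names in the schema may differ.
REQUIRED_FIELDS = [
    "provenance",
    "identifiers",
    "digital_platforms",
    "collections",
    "change_history",
    "description",
    "alternative_names",
]

# Wikidata item IDs are "Q" followed by a positive integer.
WIKIDATA_QID = re.compile(r"^Q[1-9]\d*$")


def missing_fields(record):
    """Return the checklist fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def invalid_wikidata_ids(record):
    """Return Wikidata identifier values that are not shaped like a QID."""
    return [
        ident["value"]
        for ident in record.get("identifiers", [])
        if ident.get("scheme") == "Wikidata"
        and not WIKIDATA_QID.match(ident["value"])
    ]


# Deliberately incomplete sample; the record structure is assumed.
sample = {
    "name": "Biblioteca Nacional do Brasil",
    "identifiers": [
        {"scheme": "Website", "value": "https://www.bn.gov.br"},
        {"scheme": "Wikidata", "value": "Q1526131"},
    ],
    "digital_platforms": ["BNDigital", "Hemeroteca Digital", "Brasiliana Fotográfica"],
    "description": "Brazil's National Library with 9M+ items",
}

print(missing_fields(sample))
print(invalid_wikidata_ids(sample))
```

A check like this only confirms that fields are present and that QIDs are well-formed; it does not replace validation against the LinkML schema itself.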
Metadata Standards Documented
- Dublin Core (widespread)
- MARC21 (libraries)
- EAD (archives)
- PREMIS (digital preservation)
- OAI-PMH (interoperability)
- INBCM (Brazilian museum standard)
Confidence Scores
- Range: 0.84-0.96
- Average: 0.91
- Methodology: Based on detail level in source artifact
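The range and average above could be recomputed from the records with a few lines of code. This is a sketch under assumptions: the `provenance.confidence_score` path is a hypothetical field layout, and the three scores below are placeholders chosen for illustration, not the actual per-record values from the curated file.

```python
from statistics import mean


def summarize_confidence(records):
    """Return min, max, and mean (rounded to 2 places) of confidence scores."""
    # Assumed layout: each record has provenance.confidence_score.
    scores = [r["provenance"]["confidence_score"] for r in records]
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": round(mean(scores), 2),
    }


# Placeholder scores for illustration only (not the real per-record values).
records = [
    {"provenance": {"confidence_score": 0.84}},
    {"provenance": {"confidence_score": 0.90}},
    {"provenance": {"confidence_score": 0.96}},
]

summary = summarize_confidence(records)
print(summary)
```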
Comparison with v2 Extraction
Previous v2 File
- 104 basic records organized by state
- Minimal metadata (name, type, region)
- Some URLs and brief descriptions
- Generic confidence scores (0.7-0.8)
New Curated File
- 12 comprehensive records (major national institutions)
- Full LinkML compliance with all optional fields
- Rich descriptions with quantitative metrics
- Digital platform documentation
- Collection metadata
- Historical change events
- Higher confidence scores (0.84-0.96)
Source Conversation Analysis
Conversation Type
Unlike the Chilean and Mexican conversations, this conversation is NOT a state-by-state institutional directory.
Instead, it contains:
- Comprehensive research report on Brazilian GLAM infrastructure
- Information about R$18B in cultural funding
- Platform analysis (BNDigital, Tainacan, Brasiliana Fotográfica)
- Standards adoption analysis
- Government programs and training ecosystems
- Academic research output
Institutional Mention Frequency
Top institutions and platforms by mention count, highest first (Tainacan is a platform, not an institution):
- Tainacan: 235 mentions
- IBRAM: 217 mentions
- Pinacoteca: 60 mentions
- MASP: 16 mentions
- Biblioteca Nacional: 14 mentions
- Arquivo Nacional: 11 mentions
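Mention counts like these could be reproduced with a simple term search over the conversation text. The sketch below is an assumption about method, not a record of how the counts above were produced: it uses case-insensitive whole-word matching, and the sample text is invented for illustration rather than taken from the source conversation.

```python
import re
from collections import Counter


def count_mentions(text, terms):
    """Count case-insensitive whole-word matches per term, highest first."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts.most_common()


# Invented sample text; the real input would be the conversation JSON's text.
sample_text = (
    "IBRAM developed Tainacan. Tainacan is used by museums coordinated "
    "by IBRAM, and Tainacan is open source. MASP is in São Paulo."
)

ranking = count_mentions(sample_text, ["IBRAM", "Tainacan", "MASP"])
print(ranking)
```

Whole-word matching avoids counting a term inside a longer word, but it will still miss spelled-out names (for example "Instituto Brasileiro de Museus" for IBRAM), so counts depend on which variants are searched.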
Next Steps
Option 1: Web Scraping Enhancement
Use identified URLs to fetch additional institutional data:
- IBRAM: https://www.gov.br/museus
- Biblioteca Nacional: https://www.bn.gov.br
- BNDigital: https://bndigital.bn.br
- Pinacoteca: https://acervo.pinacoteca.org.br
- MASP: https://masp.org.br
Option 2: Merge with v2 Data
Cross-reference curated records with v2 state-level institutions:
- Enrich v2 records with platform/standards data
- Add digital collection URLs
- Improve descriptions
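One way to approach this cross-referencing is to match records by normalized name and copy selected fields from the curated record into the v2 record. The following is a rough sketch, not the project's merge procedure: the `name`, `region`, `digital_platforms`, and `description` keys are assumed, and matching on accent-stripped names is a simplification that would miss institutions listed under different names in the two files.

```python
import unicodedata


def normalize(name):
    """Strip accents and case so names match across files."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def enrich(v2_records, curated_records, fields=("digital_platforms", "description")):
    """Copy selected non-empty fields from curated records into matching v2 records."""
    curated_by_name = {normalize(r["name"]): r for r in curated_records}
    enriched = []
    for record in v2_records:
        merged = dict(record)  # leave the original v2 record untouched
        match = curated_by_name.get(normalize(record["name"]))
        if match:
            for field in fields:
                if match.get(field):
                    merged[field] = match[field]
        enriched.append(merged)
    return enriched


# Illustrative records; keys and the second entry are hypothetical.
v2 = [
    {"name": "Pinacoteca de Sao Paulo", "region": "SP"},
    {"name": "Example Regional Museum", "region": "RJ"},
]
curated = [
    {
        "name": "Pinacoteca de São Paulo",
        "digital_platforms": ["https://acervo.pinacoteca.org.br"],
    }
]

result = enrich(v2, curated)
print(result)
```

Where both files carry a Wikidata ID, matching on that identifier would be more reliable than matching on names.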
Option 3: Expand Coverage
Continue manual curation for:
- State museum systems (SEM-RS, COSEM Paraná)
- University libraries (FGV, UNIRIO, UFPE)
- Regional archives (Hemeroteca Digital Catarinense)
- Professional networks (REM-BR, educator networks)
Files
Input
/Users/kempersc/apps/glam/2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json (12,107 lines)
Output
/Users/kempersc/apps/glam/data/instances/brazilian_institutions_curated.yaml (12 comprehensive records)
Previous Work
/Users/kempersc/apps/glam/data/instances/brazilian_institutions_v2.yaml (104 basic records)
Date: 2025-11-06
Method: Manual comprehensive extraction following AGENTS.md guidelines
Compliance: LinkML schema v0.2.0 (modular)
Quality: TIER_4_INFERRED with high confidence scores (0.84-0.96)