# Brazilian GLAM Data Curation Status

## Summary

Successfully created **manually curated LinkML-compliant records** for 12 major Brazilian GLAM institutions based on the comprehensive infrastructure report in conversation file:

- `2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json`

## Output Files

### ✅ `data/instances/brazilian_institutions_curated.yaml`

**12 comprehensive records** with full LinkML compliance:

#### National Libraries (2)

1. **Biblioteca Nacional do Brasil**
   - Brazil's National Library with 9M+ items
   - 3 digital platforms (BNDigital, Hemeroteca Digital, Brasiliana Fotográfica)
   - 1.5M digitized works, 500K+ monthly visits
   - Founded 1810 by King João VI
   - Wikidata ID: Q1526131

2. **Biblioteca Brasiliana Guita e José Mindlin (USP)**
   - University Brazilian studies library
   - 70,000-volume collection, 4,000+ digitized items
   - Focus on 16th-21st century Brazilian works
   - Wikidata ID: Q10373176

#### National Museums (4)

3. **MASP** - Museu de Arte de São Paulo Assis Chateaubriand
   - 8,000+ artworks on Google Arts & Culture
   - Wikidata ID: Q861028

4. **Pinacoteca de São Paulo**
   - Oldest São Paulo art museum
   - 10,000+ Brazilian artworks online (colonial-contemporary)
   - Digital preservation policies since 2017
   - Wikidata ID: Q1129738

5. **Museu Histórico Nacional**
   - National Historical Museum
   - Uses Tainacan platform (IBRAM open-source system)
   - Wikidata ID: Q10326887

6. **Museu Paulista (USP)**
   - University museum and research center
   - Publisher of Anais do Museu Paulista since 1922
   - Wikidata ID: Q1130511

#### National Archives (1)

7. **Arquivo Nacional**
   - National Archive of Brazil
   - 560 TB digitized on 1.1 PB infrastructure
   - AN Digital program since 2003
   - Archivematica implementation (CONARQ standards)
   - Wikidata ID: Q10283879

#### Government Heritage Institutions (1)

8. **IBRAM** - Instituto Brasileiro de Museus
   - Coordinates 30 federal museums, 20 libraries (300K items)
   - Developed Tainacan platform (20+ museums)
   - Manages MuseusBr platform
   - Founded 2009, launched library network 2025
   - Wikidata ID: Q10302917

#### State Archives (1)

9. **APESP** - Arquivo Público do Estado de São Paulo
   - 25M+ textual documents, 3M iconographic items
   - 400K+ digitized images (DOPS, immigration records)
   - Wikidata ID: Q10405845

#### Research Centers (3)

10. **LARHUD (IBICT)**
    - Digital Humanities Laboratory
    - Portuguese-language DH tool development
    - Wiki LARHUD encyclopedia
    - TADiRAH taxonomy adaptation

11. **UNICAMP Digital Humanities Center**
    - 20 researchers
    - COVID-19 digital archiving project
    - "Digital Memory" theoretical frameworks

12. **UFRJ Digital Humanities Lab**
    - Est. 2023
    - Pirenópolis Declaration implementation
    - Inter-institutional collaboration with UFBA

## Data Quality Features

### Complete LinkML Compliance

All records include:

- ✅ Full provenance metadata (source, tier, confidence, extraction method)
- ✅ Multiple identifiers (Website, Wikidata)
- ✅ Digital platforms with metadata standards
- ✅ Collection metadata (extent, subjects, temporal coverage)
- ✅ Change history (founding dates, organizational changes)
- ✅ Rich descriptions with quantitative metrics
- ✅ Alternative names in multiple languages

### Metadata Standards Documented

- Dublin Core (widespread)
- MARC21 (libraries)
- EAD (archives)
- PREMIS (digital preservation)
- OAI-PMH (interoperability)
- INBCM (Brazilian museum standard)

### Confidence Scores

- Range: 0.84 - 0.96
- Average: 0.91
- Methodology: Based on detail level in source artifact

## Comparison with v2 Extraction

### Previous v2 File

- **104 basic records** organized by state
- Minimal metadata (name, type, region)
- Some URLs and brief descriptions
- Generic confidence scores (0.7-0.8)

### New Curated File

- **12 comprehensive records** (major national institutions)
- Full LinkML compliance with all optional fields
- Rich descriptions with quantitative metrics
- Digital platform documentation
- Collection metadata
- Historical change events
- Higher confidence scores (0.84-0.96)

## Source Conversation Analysis

### Conversation Type

The conversation is **NOT** a state-by-state institutional directory (like Chilean/Mexican conversations). Instead, it contains:

- Comprehensive research report on Brazilian GLAM infrastructure
- Information about R$18B in cultural funding
- Platform analysis (BNDigital, Tainacan, Brasiliana Fotográfica)
- Standards adoption analysis
- Government programs and training ecosystems
- Academic research output

### Institutional Mention Frequency

Top institutions by mention count:

- Tainacan: 235 mentions
- IBRAM: 217 mentions
- Pinacoteca: 60 mentions
- MASP: 16 mentions
- Biblioteca Nacional: 14 mentions
- Arquivo Nacional: 11 mentions

## Next Steps

### Option 1: Web Scraping Enhancement

Use identified URLs to fetch additional institutional data:

- IBRAM: https://www.gov.br/museus
- Biblioteca Nacional: https://www.bn.gov.br
- BNDigital: https://bndigital.bn.br
- Pinacoteca: https://acervo.pinacoteca.org.br
- MASP: https://masp.org.br

### Option 2: Merge with v2 Data

Cross-reference curated records with v2 state-level institutions:

- Enrich v2 records with platform/standards data
- Add digital collection URLs
- Improve descriptions

### Option 3: Expand Coverage

Continue manual curation for:

- State museum systems (SEM-RS, COSEM Paraná)
- University libraries (FGV, UNIRIO, UFPE)
- Regional archives (Hemeroteca Digital Catarinense)
- Professional networks (REM-BR, educator networks)

## Files

### Input

- `/Users/kempersc/apps/glam/2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json` (12,107 lines)

### Output

- `/Users/kempersc/apps/glam/data/instances/brazilian_institutions_curated.yaml` (12 comprehensive records)

### Previous Work

- `/Users/kempersc/apps/glam/data/instances/brazilian_institutions_v2.yaml` (104 basic records)

---

**Date**: 2025-11-06
**Method**: Manual comprehensive extraction following AGENTS.md guidelines
**Compliance**: LinkML schema v0.2.0 (modular)
**Quality**: TIER_4_INFERRED with high confidence scores (0.84-0.96)
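
The provenance checklist and the confidence range and average reported under Data Quality Features could be re-checked with a small script along these lines. This is a minimal sketch, not the project's validator: the key names (`provenance`, `data_source`, `data_tier`, `confidence_score`, `extraction_method`) are assumptions about how the schema names these fields, and the sample records are placeholders rather than rows from the curated file.

```python
from statistics import mean

# Hypothetical key names; the actual LinkML schema may name these differently.
REQUIRED_PROVENANCE_KEYS = (
    "data_source",
    "data_tier",
    "confidence_score",
    "extraction_method",
)


def missing_provenance(record):
    """Return the provenance keys that are absent or empty in one record."""
    provenance = record.get("provenance") or {}
    return [
        key for key in REQUIRED_PROVENANCE_KEYS
        if provenance.get(key) in (None, "")
    ]


def confidence_summary(records):
    """Return (lowest, highest, mean rounded to 2 places) confidence score."""
    scores = [r["provenance"]["confidence_score"] for r in records]
    return min(scores), max(scores), round(mean(scores), 2)


# Placeholder records for illustration only (not taken from the curated file).
complete_a = {
    "name": "Example institution A",
    "provenance": {
        "data_source": "CONVERSATION",
        "data_tier": "TIER_4_INFERRED",
        "confidence_score": 0.84,
        "extraction_method": "manual curation",
    },
}
complete_b = {
    "name": "Example institution B",
    "provenance": {
        "data_source": "CONVERSATION",
        "data_tier": "TIER_4_INFERRED",
        "confidence_score": 0.96,
        "extraction_method": "manual curation",
    },
}
incomplete = {
    "name": "Example institution C",
    "provenance": {"data_source": "CONVERSATION", "confidence_score": 0.9},
}

print(missing_provenance(incomplete))
print(confidence_summary([complete_a, complete_b]))
```

To run this against the real output, the records could be loaded with PyYAML's `yaml.safe_load`, assuming the file's top level is a list of record mappings.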
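
For Option 2 (merging with v2 data), one possible starting point is to match records by an accent-insensitive, case-insensitive name and copy over only the fields the v2 record lacks. The sketch below assumes each record is a mapping with a `name` key; the field names `digital_platforms`, `identifiers`, `platform_url`, and `region` are hypothetical, and matching by name alone will miss institutions listed under different names in the two files.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents, and collapse whitespace for name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def enrich_v2(v2_records, curated_records,
              fields=("digital_platforms", "identifiers")):
    """Copy selected fields from curated records into matching v2 records.

    Matching is by normalized name only. Values already present in a v2
    record are never overwritten. Returns the number of v2 records matched.
    """
    curated_by_name = {normalize_name(r["name"]): r for r in curated_records}
    matched = 0
    for record in v2_records:
        curated = curated_by_name.get(normalize_name(record["name"]))
        if curated is None:
            continue
        matched += 1
        for field in fields:
            if field in curated and field not in record:
                record[field] = curated[field]
    return matched


# Illustrative input; the record shapes are assumptions, not the real files.
v2 = [
    {"name": "Pinacoteca de Sao Paulo", "region": "SP"},
    {"name": "Example unmatched museum", "region": "RS"},
]
curated = [
    {
        "name": "Pinacoteca de São Paulo",
        "digital_platforms": [
            {"platform_url": "https://acervo.pinacoteca.org.br"}
        ],
    },
]

matched = enrich_v2(v2, curated)
print(matched, v2[0].get("digital_platforms"))
```

Leaving existing v2 values untouched keeps the merge non-destructive, so it can be re-run as more curated records are added.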