glam/CURATION_STATUS.md
2025-11-19 23:25:22 +01:00

Brazilian GLAM Data Curation Status

Summary

Twelve major Brazilian GLAM institutions now have manually curated, LinkML-compliant records. The records are based on the comprehensive infrastructure report in the conversation file:

  • 2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json

Output Files

data/instances/brazilian_institutions_curated.yaml

12 comprehensive records with full LinkML compliance:

National Libraries (2)

  1. Biblioteca Nacional do Brasil - Brazil's National Library with 9M+ items

    • 3 digital platforms (BNDigital, Hemeroteca Digital, Brasiliana Fotográfica)
    • 1.5M digitized works, 500K+ monthly visits
    • Founded 1810 by Prince Regent João (later King João VI)
    • Wikidata ID: Q1526131
  2. Biblioteca Brasiliana Guita e José Mindlin (USP) - University Brazilian studies library

    • 70,000-volume collection, 4,000+ digitized items
    • Focus on 16th-21st century Brazilian works
    • Wikidata ID: Q10373176

National Museums (4)

  1. MASP - Museu de Arte de São Paulo Assis Chateaubriand

    • 8,000+ artworks on Google Arts & Culture
    • Wikidata ID: Q861028
  2. Pinacoteca de São Paulo - Oldest art museum in São Paulo

    • 10,000+ Brazilian artworks online (colonial-contemporary)
    • Digital preservation policies since 2017
    • Wikidata ID: Q1129738
  3. Museu Histórico Nacional - National Historical Museum

    • Uses Tainacan platform (IBRAM open-source system)
    • Wikidata ID: Q10326887
  4. Museu Paulista (USP) - University museum and research center

    • Publisher of Anais do Museu Paulista since 1922
    • Wikidata ID: Q1130511

National Archives (1)

  1. Arquivo Nacional - National Archive of Brazil
    • 560 TB digitized on 1.1 PB infrastructure
    • AN Digital program since 2003
    • Archivematica implementation (CONARQ standards)
    • Wikidata ID: Q10283879

Government Heritage Institutions (1)

  1. IBRAM - Instituto Brasileiro de Museus
    • Coordinates 30 federal museums, 20 libraries (300K items)
    • Developed Tainacan platform (20+ museums)
    • Manages MuseusBr platform
    • Founded 2009, launched library network 2025
    • Wikidata ID: Q10302917

State Archives (1)

  1. APESP - Arquivo Público do Estado de São Paulo
    • 25M+ textual documents, 3M iconographic items
    • 400K+ digitized images (DOPS, immigration records)
    • Wikidata ID: Q10405845

Research Centers (3)

  1. LARHUD (IBICT) - Digital Humanities Laboratory

    • Portuguese-language DH tool development
    • Wiki LARHUD encyclopedia
    • TADiRAH taxonomy adaptation
  2. UNICAMP Digital Humanities Center - 20 researchers

    • COVID-19 digital archiving project
    • "Digital Memory" theoretical frameworks
  3. UFRJ Digital Humanities Lab - Est. 2023

    • Pirenópolis Declaration implementation
    • Inter-institutional collaboration with UFBA

Data Quality Features

Complete LinkML Compliance

All records include:

  • Full provenance metadata (source, tier, confidence, extraction method)
  • Multiple identifiers (Website, Wikidata)
  • Digital platforms with metadata standards
  • Collection metadata (extent, subjects, temporal coverage)
  • Change history (founding dates, organizational changes)
  • Rich descriptions with quantitative metrics
  • Alternative names in multiple languages
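The checklist above could be enforced with a small completeness check. The following is a minimal sketch, assuming each record has been loaded as a Python dict. The key names (`provenance`, `identifiers`, and so on) are hypothetical stand-ins for the seven feature groups and may not match the actual LinkML slot names.

```python
# Hypothetical keys, one per feature group listed above.
# The real LinkML slot names in the schema may differ.
REQUIRED_GROUPS = (
    "provenance",
    "identifiers",
    "digital_platforms",
    "collections",
    "change_history",
    "description",
    "alternative_names",
)


def missing_groups(record: dict) -> list[str]:
    """Return the feature groups that are absent or empty in a record."""
    return [key for key in REQUIRED_GROUPS if not record.get(key)]


# Deliberately incomplete example record (structure is illustrative only).
example = {
    "name": "Arquivo Nacional",
    "provenance": {"data_tier": "TIER_4_INFERRED"},
    "identifiers": [{"scheme": "Wikidata", "value": "Q10283879"}],
}

print(missing_groups(example))
```

A record passes when `missing_groups` returns an empty list. Schema-level validation with the LinkML tooling would still be needed for type and cardinality checks.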

Metadata Standards Documented

  • Dublin Core (widespread)
  • MARC21 (libraries)
  • EAD (archives)
  • PREMIS (digital preservation)
  • OAI-PMH (interoperability)
  • INBCM (Brazilian museum standard)

Confidence Scores

  • Range: 0.84 - 0.96
  • Average: 0.91
  • Methodology: Scored by the level of detail available in the source artifact
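The range and average can be recomputed from the per-record scores. A minimal sketch follows, assuming the scores have been collected into a list of floats. The function name and the three sample values are placeholders for illustration, not the actual twelve scores.

```python
from statistics import mean


def summarize_confidence(scores: list[float]) -> dict[str, float]:
    """Return min, max, and mean (rounded to 2 decimals) of confidence scores."""
    if not scores:
        raise ValueError("no confidence scores supplied")
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence score out of range: {score}")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": round(mean(scores), 2),
    }


# Placeholder values only; substitute the scores read from the curated YAML.
placeholder_scores = [0.84, 0.90, 0.96]
print(summarize_confidence(placeholder_scores))
```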

Comparison with v2 Extraction

Previous v2 File

  • 104 basic records organized by state
  • Minimal metadata (name, type, region)
  • Some URLs and brief descriptions
  • Generic confidence scores (0.7-0.8)

New Curated File

  • 12 comprehensive records (major national institutions)
  • Full LinkML compliance with all optional fields
  • Rich descriptions with quantitative metrics
  • Digital platform documentation
  • Collection metadata
  • Historical change events
  • Higher confidence scores (0.84-0.96)

Source Conversation Analysis

Conversation Type

Unlike the Chilean and Mexican conversations, this conversation is NOT a state-by-state institutional directory.

Instead, it contains:

  • Comprehensive research report on Brazilian GLAM infrastructure
  • Information about R$18B in cultural funding
  • Platform analysis (BNDigital, Tainacan, Brasiliana Fotográfica)
  • Standards adoption analysis
  • Government programs and training ecosystems
  • Academic research output

Institutional Mention Frequency

Top institutions and platforms by mention count (Tainacan is a platform, not an institution):

  • Tainacan: 235 mentions
  • IBRAM: 217 mentions
  • Pinacoteca: 60 mentions
  • MASP: 16 mentions
  • Biblioteca Nacional: 14 mentions
  • Arquivo Nacional: 11 mentions
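Counts like these can be reproduced with a simple term count over the conversation text. The sketch below is one possible approach, assuming case-insensitive whole-word matching; how the figures above were actually produced is not recorded here, so other matching rules would give different totals. The sample text is invented for illustration.

```python
import re
from collections import Counter


def count_mentions(text: str, names: list[str]) -> Counter:
    """Count case-insensitive whole-word occurrences of each name in text."""
    counts = Counter()
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        counts[name] = len(pattern.findall(text))
    return counts


# Invented sample text; in practice this would be the conversation JSON content.
sample = (
    "IBRAM developed Tainacan. Tainacan is used by IBRAM museums "
    "and by the Pinacoteca."
)
names = ["IBRAM", "Tainacan", "Pinacoteca", "MASP"]

# most_common() lists the names in descending order of mentions.
for name, count in count_mentions(sample, names).most_common():
    print(f"{name}: {count} mentions")
```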

Next Steps

Option 1: Web Scraping Enhancement

Use the identified URLs to fetch additional institutional data.
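As a starting point, a fetch step might only pull the page title from each institution's website. This is a minimal standard-library sketch, not the project's scraper: `extract_title`, `fetch_title`, and the User-Agent string are hypothetical names, and a real run should also respect robots.txt and rate limits.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the stripped <title> text, or an empty string if none is found."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its title (performs a network request)."""
    request = Request(url, headers={"User-Agent": "glam-curation-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)


# Offline check of the parsing step with an inline document.
print(extract_title("<html><head><title> Arquivo Nacional </title></head></html>"))
```

Keeping parsing separate from fetching allows the extraction logic to be tested without network access.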

Option 2: Merge with v2 Data

Cross-reference curated records with v2 state-level institutions:

  • Enrich v2 records with platform/standards data
  • Add digital collection URLs
  • Improve descriptions
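The cross-referencing step could look like the sketch below. It assumes both files have been loaded as lists of dicts with a `name` key, and that matching on accent-insensitive, case-insensitive names is acceptable; the field names `digital_platforms` and `description` are hypothetical. Matching on a shared identifier would be more robust wherever the v2 records carry one.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def enrich_v2(v2_records, curated_records,
              fields=("digital_platforms", "description")):
    """Copy selected fields from curated records into matching v2 records."""
    curated_by_name = {normalize_name(r["name"]): r for r in curated_records}
    enriched = []
    for record in v2_records:
        merged = dict(record)  # leave the input records unmodified
        match = curated_by_name.get(normalize_name(record["name"]))
        if match:
            for field in fields:
                if field in match:
                    merged[field] = match[field]
        enriched.append(merged)
    return enriched


# Illustrative records only; real data would come from the two YAML files.
v2 = [{"name": "Pinacoteca de Sao Paulo", "region": "SP"}]
curated = [{"name": "Pinacoteca de São Paulo",
            "description": "Oldest São Paulo art museum"}]
print(enrich_v2(v2, curated))
```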

Option 3: Expand Coverage

Continue manual curation for:

  • State museum systems (SEM-RS, COSEM Paraná)
  • University libraries (FGV, UNIRIO, UFPE)
  • Regional archives (Hemeroteca Digital Catarinense)
  • Professional networks (REM-BR, educator networks)

Files

Input

  • /Users/kempersc/apps/glam/2025-09-22T14-40-15-0102c00a-4c0a-4488-bdca-5dd9fb94c9c5-Brazilian_GLAM_collection_inventories.json (12,107 lines)

Output

  • /Users/kempersc/apps/glam/data/instances/brazilian_institutions_curated.yaml (12 comprehensive records)

Previous Work

  • /Users/kempersc/apps/glam/data/instances/brazilian_institutions_v2.yaml (104 basic records)

Date: 2025-11-06
Method: Manual comprehensive extraction following AGENTS.md guidelines
Compliance: LinkML schema v0.2.0 (modular)
Quality: TIER_4_INFERRED with high confidence scores (0.84-0.96)