GLAM Instance Data - Authoritative Files

Last Updated: 2025-11-06
Status: Consolidated and Archived

Authoritative Dataset

Latin American GLAM Institutions (Brazil, Chile, Mexico)

File: latin_american_institutions_AUTHORITATIVE.yaml

  • Total Institutions: 304
    • Brazil: 97
    • Chile: 90
    • Mexico: 117
  • Data Tier: TIER_4_INFERRED (conversation NLP extraction)
  • Enrichments Applied:
    • Wikidata IDs: 56 institutions (18.4%)
    • VIAF IDs: 19 institutions (6.3%); the VIAF API was unavailable, so only existing IDs were preserved
    • OpenStreetMap data: 83 institutions (27.3%)
    • Geocoding: 187 institutions (61.5%)
    • ISIL Gap Documentation: All 304 institutions
  • File Size: 470 KB
  • Schema Version: LinkML v0.2.0 (modular)
  • Last Enrichment: 2025-11-06 (OpenStreetMap enrichment)
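
Each percentage above is the enrichment count divided by the 304 total institutions. As a quick check, the sketch below recomputes them from the counts listed in this README. It uses `Decimal` with half-up rounding as an illustrative choice, because Python's built-in `round()` rounds the exact value 6.25 to 6.2 rather than the 6.3 shown above.

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts copied from the enrichment summary in this README.
TOTAL_INSTITUTIONS = 304
ENRICHMENT_COUNTS = {
    "wikidata": 56,
    "viaf": 19,
    "osm": 83,
    "geocoded": 187,
}


def coverage_percent(count: int, total: int = TOTAL_INSTITUTIONS) -> Decimal:
    """Return count as a percentage of total, rounded half-up to one decimal."""
    pct = Decimal(100 * count) / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


coverage = {name: coverage_percent(n) for name, n in ENRICHMENT_COUNTS.items()}
for name, pct in coverage.items():
    print(f"{name}: {pct}%")
```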

Enrichment Details:

| Enrichment Type   | Count | Examples                        |
|-------------------|-------|---------------------------------|
| Street addresses  | 33    | "Avenida Feliciano Coelho 1502" |
| Contact info      | 19    | Phone numbers, email addresses  |
| Websites          | 16    | Institutional URLs from OSM     |
| Alternative names | 13    | Multilingual, official names    |
| Opening hours     | 10    | OSM opening_hours format        |

Use This File For:

  • Production data pipelines
  • Export generation (JSON-LD, CSV, GeoJSON)
  • Geographic visualization
  • Cross-linking with other datasets
  • Schema validation
  • Research and analysis

Archived Files

All superseded files have been archived to maintain data provenance and enable rollback if needed.

Archive Location

archive/2025-11-06_pre-consolidation/

Archive Structure

archive/2025-11-06_pre-consolidation/
├── intermediate_versions/           # Enrichment pipeline stages
│   ├── latin_american_institutions.yaml                    # Original combined (313 KB)
│   ├── latin_american_institutions_documented.yaml         # + ISIL gap notes (444 KB)
│   ├── latin_american_institutions_enriched.yaml           # + Wikidata (329 KB)
│   ├── latin_american_institutions_viaf_enriched.yaml      # + VIAF IDs (446 KB)
│   └── latin_american_institutions_osm_enriched.yaml       # + OSM data (470 KB) ← SOURCE OF AUTHORITATIVE
├── individual_countries/            # Pre-combination country files
│   ├── brazilian_institutions.yaml  # 97 institutions (84 KB)
│   ├── chilean_institutions.yaml    # 90 institutions (107 KB)
│   └── mexican_institutions.yaml    # 117 institutions (122 KB)
├── backup_files/                    # Temporary backup files
│   ├── mexican_institutions.yaml.bak
│   └── mexican_institutions.yaml.bak2
├── latin_american_combination_report.md  # Country combination report
└── latin_american_validation_report.md   # Validation report

Enrichment Pipeline History

The authoritative file represents the final stage of a 5-phase enrichment pipeline:

  1. Phase 1: Wikidata Enrichment (2025-11-06)

    • Script: scripts/enrich_from_wikidata.py
    • Result: 56 Wikidata IDs added
    • Output: latin_american_institutions_enriched.yaml
  2. Phase 2: ISIL Gap Documentation (2025-11-06)

    • Script: scripts/add_isil_gap_notes.py
    • Result: All 304 institutions documented
    • Output: latin_american_institutions_documented.yaml
  3. Phase 3: National Library Outreach (2025-11-06)

    • Script: scripts/draft_national_library_emails.py
    • Result: 3 bilingual emails drafted
    • Documentation: docs/national_library_outreach_emails.md
  4. Phase 4: VIAF Enrichment (2025-11-06, BLOCKED)

    • Script: scripts/enrich_from_viaf.py
    • Status: VIAF XML/JSON API returns HTTP 404
    • Result: 19 existing VIAF IDs preserved
    • Output: latin_american_institutions_viaf_enriched.yaml
  5. Phase 5: OpenStreetMap Enrichment (2025-11-06)

    • Scripts:
      • scripts/enrich_from_osm_batched.py
      • scripts/resume_osm_enrichment.py
    • Result: 83 institutions enriched with OSM data
    • Output: latin_american_institutions_osm_enriched.yaml (source of the AUTHORITATIVE file)

See PROGRESS.md for detailed enrichment statistics and docs/osm_enrichment_report.md for Phase 5 analysis.

Export Files

All exports are generated from the authoritative file.

Location: exports/

Generated Files:

  1. latin_american_institutions_osm_enriched.jsonld (576 KB) - Linked Data format
  2. latin_american_institutions_osm_enriched.csv (113 KB) - Spreadsheet format
  3. latin_american_institutions_osm_enriched.geojson (124 KB) - Geographic format (187 institutions)
  4. latin_american_osm_enriched_statistics.json (0.9 KB) - Summary statistics

Export Script: scripts/export_latin_american_datasets.py
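
For orientation, here is a minimal sketch of how geocoded records could be turned into a GeoJSON FeatureCollection. It is not the code in scripts/export_latin_american_datasets.py. The `name`, `latitude`, and `longitude` field names are assumptions made for illustration (only `locations` and `country` appear elsewhere in this README), and the sample records are hypothetical.

```python
import json


def to_geojson(institutions):
    """Build a GeoJSON FeatureCollection from records that carry coordinates.

    Assumed record shape (hypothetical): each institution has a 'name' and a
    'locations' list whose entries may hold 'latitude', 'longitude', 'country'.
    """
    features = []
    for inst in institutions:
        for loc in inst.get("locations") or []:
            lat, lon = loc.get("latitude"), loc.get("longitude")
            if lat is None or lon is None:
                continue  # not geocoded, skip this location
            features.append({
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {
                    "name": inst.get("name"),
                    "country": loc.get("country"),
                },
            })
            break  # emit at most one feature per institution
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample records, not taken from the dataset.
sample = [
    {"name": "Example Museum",
     "locations": [{"country": "BR", "latitude": -23.5, "longitude": -46.6}]},
    {"name": "Example Archive", "locations": [{"country": "CL"}]},
]
print(json.dumps(to_geojson(sample), indent=2))
```

Records without coordinates are skipped, which is consistent with the GeoJSON export covering only the 187 geocoded institutions.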

Other Directories

brazil/, chile/, mexico/

Individual country extraction workspaces. Superseded by the consolidated authoritative file.

cache/

Geocoding and API response caches. Used for performance optimization.

reports/

Validation reports, quality checks, and analysis documents.

test_outputs/

Development and testing outputs. Not for production use.

backups/

Timestamped backup archives from previous processing stages:

  • 2025-11-06_pre-geocoding.tar.gz
  • 2025-11-06_chilean-geocoded-v2.tar.gz
  • 2025-11-06_mexican-geocoded-final.tar.gz
  • etc.

Data Quality Notes

Known Limitations

  1. VIAF Enrichment Incomplete

    • VIAF XML/JSON API unavailable (HTTP 404)
    • Only 19 VIAF IDs from original extractions
    • See PROGRESS.md Phase 4 for details
  2. OSM Enrichment Partial

    • 186 institutions have OSM IDs (61.2%)
    • Only 83 successfully enriched (44.6% enrichment rate)
    • 34 fetch errors (504 gateway timeouts)
    • Missing OSM tags for many heritage institutions
  3. ISIL Codes Missing

    • No public ISIL registries for BR/MX/CL
    • National library outreach in progress
    • Deadline: 2025-11-13
  4. Geocoding Coverage

    • 61.5% geocoded (187/304 institutions)
    • 117 institutions lack coordinates
    • Opportunities: Google Places API, manual verification
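
The 504 gateway timeouts noted under item 2 are the kind of transient failure that a retry with exponential backoff can sometimes recover from. The sketch below is one possible approach, not a description of what scripts/enrich_from_osm_batched.py does. `GatewayTimeout`, `fetch_with_retry`, and the `fetch` callable are hypothetical names introduced for illustration.

```python
import time
from typing import Any, Callable


class GatewayTimeout(Exception):
    """Hypothetical exception a fetch function raises on an HTTP 504."""


def fetch_with_retry(
    fetch: Callable[[str], Any],
    osm_id: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(osm_id), retrying on GatewayTimeout with doubling delays.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(osm_id)
        except GatewayTimeout:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.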

Confidence Scores

All extractions include provenance metadata with confidence scores:

  • 0.9-1.0: Explicit mentions with authoritative sources
  • 0.7-0.9: Clear mentions with context
  • 0.5-0.7: Inferred from context
  • 0.3-0.5: Low confidence, needs verification
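
The bands above share their boundary values (0.9, 0.7, 0.5), and scores below 0.3 are not described. The sketch below shows one way to map a score to a band, assuming each shared boundary belongs to the higher band. The band labels and the `unclassified` fallback are hypothetical names for illustration, not values from the schema.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score in [0, 1] to a band label.

    Assumption: a boundary value such as 0.9 falls in the higher band.
    Scores below 0.3 are not covered by this README, so they are
    reported as 'unclassified'.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    if score >= 0.9:
        return "explicit"      # explicit mentions with authoritative sources
    if score >= 0.7:
        return "clear"         # clear mentions with context
    if score >= 0.5:
        return "inferred"      # inferred from context
    if score >= 0.3:
        return "low"           # low confidence, needs verification
    return "unclassified"


for value in (0.95, 0.9, 0.75, 0.5, 0.3, 0.1):
    print(value, confidence_band(value))
```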

Data Tiers

  • TIER_1_AUTHORITATIVE: CSV registries (not applicable to Latin America)
  • TIER_2_VERIFIED: Institutional websites (not yet applied)
  • TIER_3_CROWD_SOURCED: Wikidata, OpenStreetMap (56 + 83 institutions)
  • TIER_4_INFERRED: NLP extraction from conversations (all 304 institutions)

Usage Guidelines

Reading the Authoritative File

import yaml

with open('latin_american_institutions_AUTHORITATIVE.yaml', 'r', encoding='utf-8') as f:
    institutions = yaml.safe_load(f)

print(f"Total institutions: {len(institutions)}")

Validating Against Schema

linkml-validate -s schemas/heritage_custodian.yaml \
  data/instances/latin_american_institutions_AUTHORITATIVE.yaml

Generating Exports

python scripts/export_latin_american_datasets.py

Filtering by Country

import yaml

with open('latin_american_institutions_AUTHORITATIVE.yaml', 'r', encoding='utf-8') as f:
    institutions = yaml.safe_load(f)

brazilian_institutions = [
    inst for inst in institutions 
    if inst.get('locations') and 
    any(loc.get('country') == 'BR' for loc in inst['locations'])
]
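
The same `locations` / `country` lookup can be generalized to count institutions per country, which gives a quick sanity check against the totals stated at the top of this README (Brazil 97, Chile 90, Mexico 117). This is a sketch that assumes the record shape used in the filter above. The sample records are hypothetical.

```python
from collections import Counter


def count_by_country(institutions):
    """Count institutions per country code, once per institution per country."""
    counts = Counter()
    for inst in institutions:
        # A set avoids double-counting an institution with several
        # locations in the same country.
        countries = {
            loc.get("country")
            for loc in inst.get("locations") or []
            if loc.get("country")
        }
        counts.update(countries)
    return counts


# Hypothetical sample records, not taken from the dataset.
sample = [
    {"name": "Example Library", "locations": [{"country": "BR"}, {"country": "BR"}]},
    {"name": "Example Museum", "locations": [{"country": "MX"}]},
    {"name": "Example Archive", "locations": []},
]
print(count_by_country(sample))
```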

Rollback Instructions

If you need to revert to a previous version:

  1. Identify the desired version in archive/2025-11-06_pre-consolidation/intermediate_versions/
  2. Copy to instances directory:
    cp archive/2025-11-06_pre-consolidation/intermediate_versions/latin_american_institutions_enriched.yaml \
       latin_american_institutions_AUTHORITATIVE.yaml
    
  3. Regenerate exports if needed
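
Because the `cp` in step 2 overwrites the current authoritative file, it can be safer to keep a copy of it first. The sketch below wraps both steps in a helper. The `rollback` function and the `.backup_<timestamp>` naming are assumptions for illustration, not an existing project script.

```python
import shutil
from datetime import datetime
from pathlib import Path


def rollback(archived: Path, authoritative: Path) -> Path:
    """Back up the current authoritative file, then replace it with an archived version.

    Returns the backup path. If no authoritative file exists yet,
    no backup is written and the returned path does not exist.
    """
    if not archived.is_file():
        raise FileNotFoundError(archived)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # e.g. name.yaml -> name.backup_20251106_124619.yaml (hypothetical naming)
    backup = authoritative.with_name(
        f"{authoritative.stem}.backup_{stamp}{authoritative.suffix}"
    )
    if authoritative.exists():
        shutil.copy2(authoritative, backup)
    shutil.copy2(archived, authoritative)
    return backup
```

A call would pass the chosen file under archive/2025-11-06_pre-consolidation/intermediate_versions/ as `archived` and latin_american_institutions_AUTHORITATIVE.yaml as `authoritative`.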

Next Steps

Immediate (By 2025-11-13)

  1. National Library Outreach: Submit 3 email drafts for ISIL codes
  2. Data Quality Review: Verify fuzzy Wikidata matches (37 matches below 95% confidence)
  3. Geographic Visualization: Create interactive map from GeoJSON

Future Enhancements

  1. Web Scraping: Crawl institutional websites (126 URLs available)
  2. Google Places API: Enrich 117 non-geocoded institutions
  3. OSM Contribution: Add missing heritage institutions to OpenStreetMap
  4. Schema Validation: Run linkml-validate on all 304 records
  5. Relationship Extraction: Map institutional partnerships and networks

Contact

Project: GLAM Data Extraction
Schema: LinkML v0.2.0 (modular)
Documentation: /docs/plan/global_glam/
Issues: See PROGRESS.md for known issues and blockers


Archive Date: 2025-11-06
Archival Reason: Consolidation to single authoritative file
Archived Files: 12 YAML files, 2 MD reports
Archive Size: ~2.5 MB total