GLAM Instance Data - Authoritative Files
Last Updated: 2025-11-06
Status: Consolidated and Archived
Authoritative Dataset
Latin American GLAM Institutions (Brazil, Chile, Mexico)
File: latin_american_institutions_AUTHORITATIVE.yaml
- Total Institutions: 304
- Brazil: 97
- Chile: 90
- Mexico: 117
- Data Tier: TIER_4_INFERRED (conversation NLP extraction)
- Enrichments Applied:
- ✅ Wikidata IDs: 56 institutions (18.4%)
- ✅ VIAF IDs: 19 institutions (6.3%) - API unavailable, IDs preserved
- ✅ OpenStreetMap data: 83 institutions (27.3%)
- ✅ Geocoding: 187 institutions (61.5%)
- ✅ ISIL Gap Documentation: All 304 institutions
- File Size: 470 KB
- Schema Version: LinkML v0.2.0 (modular)
- Last Enrichment: 2025-11-06 (OpenStreetMap enrichment)
Enrichment Details:
| Enrichment Type | Count | Examples |
|---|---|---|
| Street addresses | 33 | "Avenida Feliciano Coelho 1502" |
| Contact info | 19 | Phone numbers, email addresses |
| Websites | 16 | Institutional URLs from OSM |
| Alternative names | 13 | Multilingual, official names |
| Opening hours | 10 | OSM opening_hours format |
Use This File For:
- Production data pipelines
- Export generation (JSON-LD, CSV, GeoJSON)
- Geographic visualization
- Cross-linking with other datasets
- Schema validation
- Research and analysis
Archived Files
All superseded files have been archived to maintain data provenance and enable rollback if needed.
Archive Location
archive/2025-11-06_pre-consolidation/
Archive Structure
archive/2025-11-06_pre-consolidation/
├── intermediate_versions/ # Enrichment pipeline stages
│ ├── latin_american_institutions.yaml # Original combined (313 KB)
│ ├── latin_american_institutions_documented.yaml # + ISIL gap notes (444 KB)
│ ├── latin_american_institutions_enriched.yaml # + Wikidata (329 KB)
│ ├── latin_american_institutions_viaf_enriched.yaml # + VIAF IDs (446 KB)
│ └── latin_american_institutions_osm_enriched.yaml # + OSM data (470 KB) ← SOURCE OF AUTHORITATIVE
├── individual_countries/ # Pre-combination country files
│ ├── brazilian_institutions.yaml # 97 institutions (84 KB)
│ ├── chilean_institutions.yaml # 90 institutions (107 KB)
│ └── mexican_institutions.yaml # 117 institutions (122 KB)
├── backup_files/ # Temporary backup files
│ ├── mexican_institutions.yaml.bak
│ └── mexican_institutions.yaml.bak2
├── latin_american_combination_report.md # Country combination report
└── latin_american_validation_report.md # Validation report
Enrichment Pipeline History
The authoritative file represents the final stage of a 5-phase enrichment pipeline:
- Phase 1: Wikidata Enrichment (2025-11-06)
  - Script: scripts/enrich_from_wikidata.py
  - Result: 56 Wikidata IDs added
  - Output: latin_american_institutions_enriched.yaml
- Phase 2: ISIL Gap Documentation (2025-11-06)
  - Script: scripts/add_isil_gap_notes.py
  - Result: All 304 institutions documented
  - Output: latin_american_institutions_documented.yaml
- Phase 3: National Library Outreach (2025-11-06)
  - Script: scripts/draft_national_library_emails.py
  - Result: 3 bilingual emails drafted
  - Documentation: docs/national_library_outreach_emails.md
- Phase 4: VIAF Enrichment (2025-11-06) ❌ BLOCKED
  - Script: scripts/enrich_from_viaf.py
  - Status: VIAF XML/JSON API returns HTTP 404
  - Result: 19 existing VIAF IDs preserved
  - Output: latin_american_institutions_viaf_enriched.yaml
- Phase 5: OpenStreetMap Enrichment (2025-11-06) ✅
  - Scripts:
    - scripts/enrich_from_osm_batched.py
    - scripts/resume_osm_enrichment.py
  - Result: 83 institutions enriched with OSM data
  - Output: latin_american_institutions_osm_enriched.yaml → AUTHORITATIVE
See PROGRESS.md for detailed enrichment statistics and docs/osm_enrichment_report.md for Phase 5 analysis.
Export Files
All exports are generated from the authoritative file.
Location: exports/
Generated Files:
- latin_american_institutions_osm_enriched.jsonld (576 KB) - Linked Data format
- latin_american_institutions_osm_enriched.csv (113 KB) - Spreadsheet format
- latin_american_institutions_osm_enriched.geojson (124 KB) - Geographic format (187 institutions)
- latin_american_osm_enriched_statistics.json (0.9 KB) - Summary statistics
Export Script: scripts/export_latin_american_datasets.py
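The export logic itself lives in the script above and is not reproduced here. As an illustration only, the sketch below shows how a GeoJSON export limited to geocoded institutions could be built, which is why that file covers 187 records rather than 304. The `locations` and `country` keys appear in the filtering example in this README; the `name`, `latitude`, and `longitude` keys and the `to_geojson` helper are assumptions made for this sketch and may not match the real schema.

```python
import json


def to_geojson(institutions):
    """Build a GeoJSON FeatureCollection from institutions that have coordinates.

    Assumes (hypothetically) that each location dict may carry
    'latitude' and 'longitude' keys; records without them are skipped.
    """
    features = []
    for inst in institutions:
        for loc in inst.get("locations") or []:
            lat, lon = loc.get("latitude"), loc.get("longitude")
            if lat is None or lon is None:
                continue
            features.append({
                "type": "Feature",
                # GeoJSON positions are ordered [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {
                    "name": inst.get("name"),
                    "country": loc.get("country"),
                },
            })
            break  # emit at most one feature per institution
    return {"type": "FeatureCollection", "features": features}


# Illustrative records only, not taken from the dataset.
sample = [
    {"name": "Example Museum",
     "locations": [{"country": "BR", "latitude": -1.45, "longitude": -48.5}]},
    {"name": "Example Archive", "locations": [{"country": "CL"}]},
]

collection = to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Only the first record produces a feature, because the second has no coordinates.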
Other Directories
brazil/, chile/, mexico/
Individual country extraction workspaces. Superseded by consolidated file.
cache/
Geocoding and API response caches. Used for performance optimization.
reports/
Validation reports, quality checks, and analysis documents.
test_outputs/
Development and testing outputs. Not for production use.
backups/
Timestamped backup archives from previous processing stages:
- 2025-11-06_pre-geocoding.tar.gz
- 2025-11-06_chilean-geocoded-v2.tar.gz
- 2025-11-06_mexican-geocoded-final.tar.gz
- etc.
Data Quality Notes
Known Limitations
- VIAF Enrichment Incomplete
  - VIAF XML/JSON API unavailable (HTTP 404)
  - Only 19 VIAF IDs from original extractions
  - See PROGRESS.md Phase 4 for details
- OSM Enrichment Partial
  - 186 institutions have OSM IDs (61.2%)
  - Only 83 successfully enriched (44.6% enrichment rate)
  - 34 fetch errors (504 gateway timeouts)
  - Missing OSM tags for many heritage institutions
- ISIL Codes Missing
  - No public ISIL registries for BR/MX/CL
  - National library outreach in progress
  - Deadline: 2025-11-13
- Geocoding Coverage
  - 61.5% geocoded (187/304 institutions)
  - 117 institutions lack coordinates
  - Opportunities: Google Places API, manual verification
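The percentages quoted in these limitations use two different denominators: coverage figures are relative to all 304 institutions, while the OSM enrichment rate is relative to the 186 institutions that have an OSM ID. A small sketch of the arithmetic, using only the counts stated above (the `coverage` helper is introduced here for illustration):

```python
def coverage(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


TOTAL = 304          # institutions in the authoritative file
WITH_OSM_ID = 186    # institutions that have an OSM ID
OSM_ENRICHED = 83    # institutions successfully enriched from OSM
GEOCODED = 187       # institutions with coordinates

osm_id_coverage = coverage(WITH_OSM_ID, TOTAL)           # share of all institutions
osm_enrichment_rate = coverage(OSM_ENRICHED, WITH_OSM_ID)  # share of those with OSM IDs
geocoding_coverage = coverage(GEOCODED, TOTAL)
not_geocoded = TOTAL - GEOCODED

print(osm_id_coverage, osm_enrichment_rate, geocoding_coverage, not_geocoded)
# prints: 61.2 44.6 61.5 117
```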
Confidence Scores
All extractions include provenance metadata with confidence scores:
- 0.9-1.0: Explicit mentions with authoritative sources
- 0.7-0.9: Clear mentions with context
- 0.5-0.7: Inferred from context
- 0.3-0.5: Low confidence, needs verification
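The bands above share their boundary values (0.9 appears in two ranges, for example), and scores below 0.3 are not described. The sketch below is one possible way to bucket a score. It assumes that a boundary value belongs to the higher band and that anything below 0.3 is left unclassified. The short band labels and the `confidence_band` helper are shorthand introduced here, not names from the schema.

```python
def confidence_band(score):
    """Map a confidence score in [0, 1] to a band label.

    Assumption: lower bounds are inclusive, so 0.9 counts as 'explicit'.
    Scores below 0.3 are not covered by the README and are returned
    as 'unclassified'.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    if score >= 0.9:
        return "explicit"    # explicit mention with authoritative source
    if score >= 0.7:
        return "clear"       # clear mention with context
    if score >= 0.5:
        return "inferred"    # inferred from context
    if score >= 0.3:
        return "low"         # low confidence, needs verification
    return "unclassified"


for value in (0.95, 0.9, 0.75, 0.5, 0.3, 0.1):
    print(value, confidence_band(value))
```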
Data Tiers
- TIER_1_AUTHORITATIVE: CSV registries (not applicable to Latin America)
- TIER_2_VERIFIED: Institutional websites (not yet applied)
- TIER_3_CROWD_SOURCED: Wikidata, OpenStreetMap (56 + 83 institutions)
- TIER_4_INFERRED: NLP extraction from conversations (all 304 institutions)
Usage Guidelines
Reading the Authoritative File
import yaml

with open('latin_american_institutions_AUTHORITATIVE.yaml', 'r', encoding='utf-8') as f:
    institutions = yaml.safe_load(f)

print(f"Total institutions: {len(institutions)}")
Validating Against Schema
linkml-validate -s schemas/heritage_custodian.yaml \
data/instances/latin_american_institutions_AUTHORITATIVE.yaml
Generating Exports
python scripts/export_latin_american_datasets.py
Filtering by Country
import yaml

with open('latin_american_institutions_AUTHORITATIVE.yaml', 'r', encoding='utf-8') as f:
    institutions = yaml.safe_load(f)

# Keep institutions that have at least one location in Brazil.
brazilian_institutions = [
    inst for inst in institutions
    if inst.get('locations') and
    any(loc.get('country') == 'BR' for loc in inst['locations'])
]
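The same `locations` / `country` structure can be used to tally all countries in one pass, which gives a quick check against the per-country totals listed for the authoritative file. This is a sketch under the assumption that country codes follow the two-letter form used in the filter above. The sample records are illustrative and `count_by_country` is a helper introduced here.

```python
from collections import Counter


def count_by_country(institutions):
    """Count institutions per country code.

    An institution with locations in several countries is counted once
    for each distinct country; records without locations are ignored.
    """
    counts = Counter()
    for inst in institutions:
        countries = {
            loc.get("country")
            for loc in inst.get("locations") or []
            if loc.get("country")
        }
        counts.update(countries)
    return counts


# Illustrative records only; in practice pass the list loaded with yaml.safe_load.
sample = [
    {"name": "A", "locations": [{"country": "BR"}]},
    {"name": "B", "locations": [{"country": "BR"}, {"country": "BR"}]},
    {"name": "C", "locations": [{"country": "MX"}]},
    {"name": "D"},
]

counts = count_by_country(sample)
print(dict(counts))
```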
Rollback Instructions
If you need to revert to a previous version:
1. Identify the desired version in archive/2025-11-06_pre-consolidation/intermediate_versions/
2. Copy it to the instances directory:
   cp archive/2025-11-06_pre-consolidation/intermediate_versions/latin_american_institutions_enriched.yaml \
      latin_american_institutions_AUTHORITATIVE.yaml
3. Regenerate exports if needed
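The copy step overwrites the current authoritative file, so it may be worth saving a timestamped copy first. The following is a minimal sketch rather than an existing project script: the `rollback` helper and the `<stem>.backup_<YYYYMMDD_HHMMSS>.yaml` naming pattern are assumptions chosen for illustration, and the demo runs against throwaway files in a temporary directory.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def rollback(archived, authoritative, now=None):
    """Replace `authoritative` with `archived`, backing up the current file first.

    Returns the path of the backup, or None if there was nothing to back up.
    """
    archived, authoritative = Path(archived), Path(authoritative)
    if not archived.is_file():
        raise FileNotFoundError(archived)
    backup = None
    if authoritative.exists():
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        # Hypothetical naming pattern: <stem>.backup_<timestamp><suffix>
        backup = authoritative.with_name(
            f"{authoritative.stem}.backup_{stamp}{authoritative.suffix}"
        )
        shutil.copy2(authoritative, backup)
    shutil.copy2(archived, authoritative)
    return backup


# Demo on temporary files so nothing in the repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "latin_american_institutions_enriched.yaml"
    current = Path(tmp) / "latin_american_institutions_AUTHORITATIVE.yaml"
    old.write_text("archived\n", encoding="utf-8")
    current.write_text("current\n", encoding="utf-8")

    backup = rollback(old, current, now=datetime(2025, 11, 6, 12, 0, 0))
    backup_name = backup.name
    backup_text = backup.read_text(encoding="utf-8")
    restored_text = current.read_text(encoding="utf-8")

print(backup_name)
```

After a rollback, exports should still be regenerated as described in the last step.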
Next Steps
Immediate (By 2025-11-13)
- National Library Outreach: Submit 3 email drafts for ISIL codes
- Data Quality Review: Verify fuzzy Wikidata matches (37 matches below 95% confidence)
- Geographic Visualization: Create interactive map from GeoJSON
Future Enhancements
- Web Scraping: Crawl institutional websites (126 URLs available)
- Google Places API: Enrich 117 non-geocoded institutions
- OSM Contribution: Add missing heritage institutions to OpenStreetMap
- Schema Validation: Run linkml-validate on all 304 records
- Relationship Extraction: Map institutional partnerships and networks
Contact
Project: GLAM Data Extraction
Schema: LinkML v0.2.0 (modular)
Documentation: /docs/plan/global_glam/
Issues: See PROGRESS.md for known issues and blockers
Archive Date: 2025-11-06
Archival Reason: Consolidation to single authoritative file
Archived Files: 12 YAML files, 2 MD reports
Archive Size: ~2.5 MB total