glam/curate_chilean_institutions.md
2025-11-19 23:25:22 +01:00

# Task: Curate Chilean GLAM Institutions from Conversation
## Objective
Manually enrich the 90 existing Chilean GLAM institution records by extracting comprehensive information from the source conversation JSON file.
## Source Files
- **Conversation JSON**: `/Users/kempersc/apps/glam/data/raw/chilean_glam_conversation.json`
- **Current YAML**: `/Users/kempersc/apps/glam/data/instances/chilean_institutions.yaml` (90 minimally-populated records)
- **Target Output**: `/Users/kempersc/apps/glam/data/instances/chilean_institutions_curated.yaml`
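
Because the target output should describe the same 90 institutions as the current YAML, a consistency check between the two files can catch dropped, added, or duplicated records. The following is a minimal sketch, assuming both files have already been parsed into lists of dictionaries and that each record carries a `name` key; the key name and the helper `check_record_sets` are illustrative assumptions, not part of the schema.

```python
from collections import Counter

EXPECTED_COUNT = 90  # record count stated in this task


def check_record_sets(original, curated, key="name"):
    """Return a list of human-readable problems; an empty list means consistent.

    `key` is an assumed identifying field; adjust it to the real schema.
    """
    problems = []
    if len(curated) != EXPECTED_COUNT:
        problems.append(
            f"expected {EXPECTED_COUNT} curated records, found {len(curated)}"
        )

    orig_keys = Counter(r.get(key) for r in original)
    cur_keys = Counter(r.get(key) for r in curated)

    # Institutions present in one file but not in the other.
    for k in sorted(set(orig_keys) - set(cur_keys), key=str):
        problems.append(f"missing from curated file: {k}")
    for k in sorted(set(cur_keys) - set(orig_keys), key=str):
        problems.append(f"not in original file: {k}")

    # Institutions that appear more than once after curation.
    for k, n in cur_keys.items():
        if n > 1:
            problems.append(f"duplicated in curated file: {k} ({n} times)")
    return problems
```

Running this once after curation and treating a non-empty result as a blocker keeps the enrichment step from silently changing the record set.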
## Required Enrichments
For each of the 90 institutions, extract and add:
1. **Rich Descriptions** - Contextual information about the institution from the conversation
2. **Complete Location Data** - Cities, addresses, coordinates where mentioned
3. **Identifiers** - ISIL codes, Wikidata IDs, URLs, platform IDs
4. **Digital Platforms** - SURDOC, SINAR, institutional websites, catalogs
5. **Collection Metadata** - Types, subjects, temporal coverage, extent
6. **Change History** - Founding dates, mergers, organizational events
7. **Provenance Tracking** - Enhanced confidence scores based on explicit vs. inferred data
## Schema Compliance
All records MUST conform to LinkML schema v0.2.0:
- `schemas/core.yaml` - HeritageCustodian, Location, Identifier, DigitalPlatform
- `schemas/enums.yaml` - InstitutionTypeEnum, ChangeTypeEnum, DataSource, DataTier
- `schemas/provenance.yaml` - Provenance, ChangeEvent, GHCIDHistoryEntry
- `schemas/collections.yaml` - Collection, Accession, DigitalObject
## Key Conversation Content
The conversation contains information about:
- **695+ library services** nationwide
- **500,000+ digitized archival records**
- **72,000+ catalogued museum objects**
- National platforms: SURDOC, SINAR, Memoria Chilena
- Major institutions across all Chilean regions
- Regional networks and specialized collections
## Expected Deliverables
1. Fully curated YAML file with 90 enriched records
2. Report on data completeness and quality
3. List of the 5 most complete records
4. List of institutions with minimal data (needing further research)
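
The completeness report, the top 5 list, and the minimal-data list can all be derived from one per-record score. The sketch below is one possible approach, not a prescribed method: the field names in `TRACKED_FIELDS`, the `name` key, and the `minimal_threshold` cutoff are assumptions chosen for illustration and should be aligned with the actual LinkML schema.

```python
# Assumed top-level fields, loosely mirroring the enrichment categories
# listed in this task; the real schema slot names may differ.
TRACKED_FIELDS = (
    "description",
    "locations",
    "identifiers",
    "digital_platforms",
    "collections",
    "change_history",
    "provenance",
)


def completeness(record):
    """Fraction of tracked fields that are present and non-empty."""
    filled = sum(1 for field in TRACKED_FIELDS if record.get(field))
    return filled / len(TRACKED_FIELDS)


def build_report(records, top_n=5, minimal_threshold=0.3):
    """Summarize completeness across records.

    `minimal_threshold` is a hypothetical cutoff for "minimal data".
    """
    scored = sorted(
        ((completeness(r), r.get("name", "")) for r in records),
        key=lambda pair: (-pair[0], pair[1]),  # best first, ties by name
    )
    mean = sum(score for score, _ in scored) / len(scored) if scored else 0.0
    return {
        "mean_completeness": mean,
        "top": [name for _, name in scored[:top_n]],
        "minimal": [name for score, name in scored if score <= minimal_threshold],
    }
```

Counting only presence of a field is a coarse measure; weighting fields or inspecting nested values would give a finer ranking.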
## Instructions for NLP Agent
Read the entire conversation JSON file and extract ALL available information for EACH of the 90 institutions currently in the YAML file. Create comprehensive, LinkML-compliant records with:
- Detailed descriptions synthesized from conversation context
- All mentioned locations, identifiers, platforms
- Inferred collection information where appropriate
- Founding dates and organizational history
- Proper confidence scores (0.9-1.0 for explicit mentions, 0.5-0.8 for inferred data)
Use your full comprehension abilities to create the most complete, accurate records possible.
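
The confidence bands above can be applied consistently with a small helper. This is a sketch under stated assumptions: the `strength` parameter (where a value sits within its band) and the function names are hypothetical, and only the two ranges come from this task.

```python
EXPLICIT_RANGE = (0.9, 1.0)  # data stated explicitly in the conversation
INFERRED_RANGE = (0.5, 0.8)  # data inferred from context


def confidence_score(explicit, strength=1.0):
    """Map a claim to a score inside its band.

    `strength` in [0, 1] is an assumed knob: 0 gives the bottom of the
    band, 1 gives the top.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    low, high = EXPLICIT_RANGE if explicit else INFERRED_RANGE
    return round(low + (high - low) * strength, 2)


def in_allowed_band(score):
    """True if a score falls inside one of the two bands from this task."""
    return any(
        low <= score <= high for low, high in (EXPLICIT_RANGE, INFERRED_RANGE)
    )
```

For example, `confidence_score(False, 0.5)` lands in the middle of the inferred band, while `in_allowed_band(0.85)` is false because that value falls between the two ranges.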