# Task: Curate Chilean GLAM Institutions from Conversation

## Objective
Manually enrich the 90 existing Chilean GLAM institution records by extracting comprehensive information from the source conversation JSON file.

## Source Files
- **Conversation JSON**: `/Users/kempersc/apps/glam/data/raw/chilean_glam_conversation.json`
- **Current YAML**: `/Users/kempersc/apps/glam/data/instances/chilean_institutions.yaml` (90 minimally-populated records)
- **Target Output**: `/Users/kempersc/apps/glam/data/instances/chilean_institutions_curated.yaml`
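The internal layout of the conversation JSON is not described here. One way to gather evidence for a given institution without depending on that layout is to walk every string in the parsed file and collect the text around each mention of the institution's name. The sketch below assumes only that the file is valid JSON. `iter_strings`, `find_mentions`, `load_conversation`, and the `window` parameter are hypothetical helpers introduced for illustration.

```python
import json
import re
from typing import Any, Iterator


def load_conversation(path: str) -> Any:
    """Parse the conversation file; makes no assumption about its internal shape."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans and null carry no text and are skipped.


def find_mentions(conversation: Any, name: str, window: int = 200) -> list[str]:
    """Return snippets of up to `window` characters on each side of every
    case-insensitive mention of `name`."""
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    snippets = []
    for text in iter_strings(conversation):
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            snippets.append(text[start:end])
    return snippets
```

For example, `find_mentions(load_conversation("/Users/kempersc/apps/glam/data/raw/chilean_glam_conversation.json"), name)` returns the raw context for one record's `name`, which the curator can then read and transcribe into the enriched record.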

## Required Enrichments

For each of the 90 institutions, extract and add:

1. **Rich Descriptions** - Contextual information about the institution, drawn from the conversation
2. **Complete Location Data** - Cities, addresses, coordinates where mentioned
3. **Identifiers** - ISIL codes, Wikidata IDs, URLs, platform IDs
4. **Digital Platforms** - SURDOC, SINAR, institutional websites, catalogs
5. **Collection Metadata** - Types, subjects, temporal coverage, extent
6. **Change History** - Founding dates, mergers, organizational events
7. **Provenance Tracking** - Confidence scores that distinguish explicitly stated data from inferred data
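Some identifiers follow fixed textual patterns and can be pre-extracted from a snippet before a curator confirms them. The sketch below pulls Wikidata Q-IDs, URLs and candidate ISIL codes out of free text. The ISIL pattern is an assumption: it expects Chilean codes to start with `CL-` and enforces the 16-character ISIL limit from ISO 15511. The URL pattern deliberately stops at parentheses and brackets, so URLs containing them will be truncated. `extract_identifiers` is a hypothetical helper, and every match still needs manual review.

```python
import re

# Wikidata item IDs: "Q" followed by a positive integer.
WIKIDATA_QID = re.compile(r"\bQ[1-9][0-9]*\b")
# Stops at whitespace, quotes, angle brackets, parentheses and square brackets.
URL = re.compile(r"https?://[^\s<>\"'()\[\]]+")
# Assumption: Chilean ISILs carry a "CL-" prefix; ISIL allows letters, digits, "-", "/" and ":".
ISIL_CL = re.compile(r"\bCL-[A-Za-z0-9]+(?:[-/:][A-Za-z0-9]+)*")


def extract_identifiers(text: str) -> dict[str, list[str]]:
    """Return sorted, de-duplicated candidate identifiers found in `text`."""
    # Trailing sentence punctuation is usually not part of the URL.
    urls = [u.rstrip(".,;:") for u in URL.findall(text)]
    # ISO 15511 caps an ISIL at 16 characters, so longer matches are discarded.
    isils = [code for code in ISIL_CL.findall(text) if len(code) <= 16]
    return {
        "wikidata": sorted(set(WIKIDATA_QID.findall(text))),
        "url": sorted(set(urls)),
        "isil": sorted(set(isils)),
    }
```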

## Schema Compliance

All records MUST conform to LinkML schema v0.2.0:
- `schemas/core.yaml` - HeritageCustodian, Location, Identifier, DigitalPlatform
- `schemas/enums.yaml` - InstitutionTypeEnum, ChangeTypeEnum, DataSource, DataTier
- `schemas/provenance.yaml` - Provenance, ChangeEvent, GHCIDHistoryEntry
- `schemas/collections.yaml` - Collection, Accession, DigitalObject
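Full conformance should be checked with LinkML's own validator against the schema files listed above. A cheap pre-check can still catch obvious gaps while curating. The sketch below assumes a record is a plain dict and that the slots `id`, `name`, `institution_type` and `provenance.confidence_score` exist. These slot names are hypothetical placeholders, and the authoritative definitions are in `schemas/core.yaml` and `schemas/provenance.yaml`.

```python
# Hypothetical slot names; replace them with the required slots from the schema.
REQUIRED_SLOTS = ("id", "name", "institution_type", "provenance")


def precheck(record: dict) -> list[str]:
    """Return human-readable problems for one record (an empty list means it passed).

    This is not a substitute for LinkML validation; it only flags missing slots
    and confidence scores outside [0, 1].
    """
    problems = [f"missing slot: {slot}" for slot in REQUIRED_SLOTS if not record.get(slot)]
    provenance = record.get("provenance") or {}
    score = provenance.get("confidence_score")
    if score is None:
        problems.append("provenance.confidence_score is missing")
    elif not 0.0 <= score <= 1.0:
        problems.append(f"confidence_score out of range: {score}")
    return problems
```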

## Key Conversation Content

The conversation contains information about:
- **695+ library services** nationwide
- **500,000+ digitized archival records**
- **72,000+ catalogued museum objects**
- National platforms: SURDOC, SINAR, Memoria Chilena
- Major institutions across all Chilean regions
- Regional networks and specialized collections

## Expected Deliverables

1. Fully curated YAML file with 90 enriched records
2. Report on data completeness and quality
3. List of top 5 most complete records
4. List of institutions with minimal data (need further research)
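Deliverables 2 to 4 require a consistent measure of how complete each record is. The sketch below uses one simple option: the fraction of enrichment slots that are non-empty. It ranks records by that fraction, breaking ties by name, and flags records that fill fewer than two of the six slots. The slot names in `SCORED_SLOTS`, the `2 / 6` threshold, and the `summarize` helper are all hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical slot names standing in for the enrichment categories listed above;
# substitute the real slot names from schemas/core.yaml.
SCORED_SLOTS = ("description", "locations", "identifiers",
                "digital_platforms", "collections", "change_history")


def completeness(record: dict) -> float:
    """Fraction of SCORED_SLOTS that hold a non-empty value."""
    filled = sum(1 for slot in SCORED_SLOTS if record.get(slot))
    return filled / len(SCORED_SLOTS)


def summarize(records: list[dict], top_n: int = 5, minimal_below: float = 2 / 6):
    """Return (top_n (name, score) pairs, names of minimal records, mean completeness)."""
    # Sort by descending completeness, then alphabetically by name for ties.
    ranked = sorted(records, key=lambda r: (-completeness(r), r.get("name", "")))
    top = [(r.get("name", "?"), completeness(r)) for r in ranked[:top_n]]
    minimal = [r.get("name", "?") for r in ranked if completeness(r) < minimal_below]
    average = mean(completeness(r) for r in records) if records else 0.0
    return top, minimal, average
```

A simple unweighted fraction keeps the report easy to explain. A weighted score, for example one that counts identifiers more heavily, would be an equally valid choice.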

## Instructions for NLP Agent

Read the entire conversation JSON file and extract ALL available information for EACH of the 90 institutions currently in the YAML file. Create comprehensive, LinkML-compliant records with:

- Detailed descriptions synthesized from conversation context
- All mentioned locations, identifiers, platforms
- Inferred collection information where appropriate
- Founding dates and organizational history
- Confidence scores of 0.9-1.0 for explicit mentions and 0.5-0.8 for inferred data
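The confidence bands above are fixed, but this brief does not say where a score should fall within a band. The sketch below is one hypothetical rule. A score starts at the bottom of its band (0.9 if the fact is explicit, 0.5 if it is inferred) and rises by one step for each additional supporting mention in the conversation, up to the top of the band. `confidence_score`, the step sizes, and the idea of counting mentions are assumptions made for illustration.

```python
def confidence_score(explicit: bool, supporting_mentions: int = 1) -> float:
    """Map evidence to a score in 0.9-1.0 (explicit) or 0.5-0.8 (inferred).

    Hypothetical rule: each supporting mention after the first raises the score
    by one step, capped at the top of the band.
    """
    if supporting_mentions < 1:
        raise ValueError("a score needs at least one supporting mention")
    low, high, step = (0.9, 1.0, 0.05) if explicit else (0.5, 0.8, 0.1)
    # round() hides floating-point noise such as 0.30000000000000004.
    return round(min(high, low + step * (supporting_mentions - 1)), 2)
```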

Create the most complete and accurate records that the conversation supports.