# Task: Curate Chilean GLAM Institutions from Conversation

## Objective

Manually enrich the 90 existing Chilean GLAM institution records by extracting comprehensive information from the source conversation JSON file.

## Source Files

- **Conversation JSON**: `/Users/kempersc/apps/glam/data/raw/chilean_glam_conversation.json`
- **Current YAML**: `/Users/kempersc/apps/glam/data/instances/chilean_institutions.yaml` (90 minimally-populated records)
- **Target Output**: `/Users/kempersc/apps/glam/data/instances/chilean_institutions_curated.yaml`

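
The three paths above could be wired up roughly as follows. This is a minimal sketch assuming Python and only the standard library; the constant names and both helper functions are illustrative and not part of the project. Reading and writing the YAML files would additionally need a YAML library, which is left out here.

```python
import json
from pathlib import Path

# Paths copied from the list above; the constant names are hypothetical.
DATA_DIR = Path("/Users/kempersc/apps/glam/data")
CONVERSATION_JSON = DATA_DIR / "raw" / "chilean_glam_conversation.json"
CURRENT_YAML = DATA_DIR / "instances" / "chilean_institutions.yaml"
TARGET_YAML = DATA_DIR / "instances" / "chilean_institutions_curated.yaml"


def load_conversation(path: Path):
    """Parse the conversation export; its internal structure is not assumed."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def check_output_is_separate(current: Path, target: Path) -> None:
    """Refuse to run if the curated output would overwrite the input records."""
    if current.resolve() == target.resolve():
        raise ValueError("target output must not overwrite the current YAML")
```

Keeping the curated file separate from the current YAML means the 90 original records stay available for comparison after the run.
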
## Required Enrichments

For each of the 90 institutions, extract and add:

1. **Rich Descriptions** - Contextual information about the institution from the conversation
2. **Complete Location Data** - Cities, addresses, and coordinates where mentioned
3. **Identifiers** - ISIL codes, Wikidata IDs, URLs, platform IDs
4. **Digital Platforms** - SURDOC, SINAR, institutional websites, catalogs
5. **Collection Metadata** - Types, subjects, temporal coverage, extent
6. **Change History** - Founding dates, mergers, organizational events
7. **Provenance Tracking** - Confidence scores that distinguish explicit from inferred data

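
The seven categories can be treated as a per-record checklist. The sketch below, in Python, assumes each record is a plain dictionary and uses hypothetical key names (`description`, `locations`, and so on); the real slot names are defined by the LinkML schema and may differ.

```python
# Hypothetical mapping from enrichment category to the record key that holds it.
# The real slot names come from the LinkML schema and may differ.
ENRICHMENT_KEYS = {
    "Rich Descriptions": "description",
    "Complete Location Data": "locations",
    "Identifiers": "identifiers",
    "Digital Platforms": "digital_platforms",
    "Collection Metadata": "collections",
    "Change History": "change_history",
    "Provenance Tracking": "provenance",
}


def missing_enrichments(record: dict) -> list[str]:
    """Return the enrichment categories that are absent or empty in a record."""
    return [
        category
        for category, key in ENRICHMENT_KEYS.items()
        if not record.get(key)  # None, "", [] and {} all count as missing
    ]


# Made-up record for illustration only, not taken from the conversation.
example = {"name": "Example Museum", "description": "A placeholder.", "locations": []}
print(missing_enrichments(example))
```

An empty list from `missing_enrichments` would indicate that every category has at least some content; it says nothing about the quality of that content.
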
## Schema Compliance

All records MUST conform to LinkML schema v0.2.0:

- `schemas/core.yaml` - HeritageCustodian, Location, Identifier, DigitalPlatform
- `schemas/enums.yaml` - InstitutionTypeEnum, ChangeTypeEnum, DataSource, DataTier
- `schemas/provenance.yaml` - Provenance, ChangeEvent, GHCIDHistoryEntry
- `schemas/collections.yaml` - Collection, Accession, DigitalObject

## Key Conversation Content

The conversation contains information about:

- **695+ library services** nationwide
- **500,000+ digitized archival records**
- **72,000+ catalogued museum objects**
- National platforms: SURDOC, SINAR, Memoria Chilena
- Major institutions across all Chilean regions
- Regional networks and specialized collections

## Expected Deliverables

1. Fully curated YAML file with 90 enriched records
2. Report on data completeness and quality
3. List of the five most complete records
4. List of institutions with minimal data that need further research

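
Deliverables 3 and 4 can both be derived from a per-record completeness score. The following is a sketch in Python, assuming the scores have already been computed as fractions between 0 and 1. The function name and the `minimal_threshold` cutoff of 0.3 are arbitrary illustrations, since the task does not define what counts as minimal data.

```python
def rank_records(scores: dict[str, float], top_n: int = 5,
                 minimal_threshold: float = 0.3):
    """Split records into the top-N most complete and those needing research.

    `scores` maps an institution name to a completeness fraction in [0, 1].
    `minimal_threshold` is a hypothetical cutoff, not defined by the task.
    """
    # Sort by score descending, then by name so ties are ordered deterministically.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    top = [name for name, _ in ordered[:top_n]]
    minimal = sorted(name for name, score in scores.items()
                     if score < minimal_threshold)
    return top, minimal


# Placeholder names and scores, for illustration only.
demo = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.9, "E": 0.1, "F": 0.5, "G": 0.6}
top, minimal = rank_records(demo)
print(top)      # ['A', 'D', 'C', 'G', 'F']
print(minimal)  # ['B', 'E']
```
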
## Instructions for NLP Agent

Read the entire conversation JSON file and extract ALL available information for EACH of the 90 institutions currently in the YAML file. Create comprehensive, LinkML-compliant records with:

- Detailed descriptions synthesized from conversation context
- All mentioned locations, identifiers, and platforms
- Inferred collection information where appropriate
- Founding dates and organizational history
- Confidence scores in the stated bands (0.9-1.0 for explicit mentions, 0.5-0.8 for inferred data)

Use your full comprehension abilities to create the most complete, accurate records possible.
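
The confidence bands in the list above can be checked mechanically. This is a minimal sketch in Python; the labels `"explicit"` and `"inferred"` and the function name are assumptions made for illustration, while the numeric bounds are the ones stated in the instructions.

```python
# Bands taken from the instructions: explicit mentions 0.9-1.0, inferred 0.5-0.8.
# The dictionary keys are hypothetical labels, not schema values.
CONFIDENCE_BANDS = {
    "explicit": (0.9, 1.0),
    "inferred": (0.5, 0.8),
}


def check_confidence(evidence: str, score: float) -> bool:
    """Return True if `score` lies inside the band for this kind of evidence."""
    low, high = CONFIDENCE_BANDS[evidence]
    return low <= score <= high


print(check_confidence("explicit", 0.95))  # True
print(check_confidence("inferred", 0.85))  # False: falls in the gap between bands
```

Scores strictly between 0.8 and 0.9 belong to neither band. The instructions do not say how to treat them, so this sketch simply rejects them.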