Task: Curate Chilean GLAM Institutions from Conversation
Objective
Manually enrich the 90 existing Chilean GLAM institution records by extracting comprehensive information from the source conversation JSON file.
Source Files
- Conversation JSON: /Users/kempersc/apps/glam/data/raw/chilean_glam_conversation.json
- Current YAML: /Users/kempersc/apps/glam/data/instances/chilean_institutions.yaml (90 minimally-populated records)
- Target Output: /Users/kempersc/apps/glam/data/instances/chilean_institutions_curated.yaml
Required Enrichments
For each of the 90 institutions, extract and add:
- Rich Descriptions - Contextual information about the institution from conversation
- Complete Location Data - Cities, addresses, coordinates where mentioned
- Identifiers - ISIL codes, Wikidata IDs, URLs, platform IDs
- Digital Platforms - SURDOC, SINAR, institutional websites, catalogs
- Collection Metadata - Types, subjects, temporal coverage, extent
- Change History - Founding dates, mergers, organizational events
- Provenance Tracking - Enhanced confidence scores based on explicit vs. inferred data
Schema Compliance
All records MUST conform to LinkML schema v0.2.0:
- schemas/core.yaml - HeritageCustodian, Location, Identifier, DigitalPlatform
- schemas/enums.yaml - InstitutionTypeEnum, ChangeTypeEnum, DataSource, DataTier
- schemas/provenance.yaml - Provenance, ChangeEvent, GHCIDHistoryEntry
- schemas/collections.yaml - Collection, Accession, DigitalObject
Key Conversation Content
The conversation contains information about:
- 695+ library services nationwide
- 500,000+ digitized archival records
- 72,000+ catalogued museum objects
- National platforms: SURDOC, SINAR, Memoria Chilena
- Major institutions across all Chilean regions
- Regional networks and specialized collections
Expected Deliverables
- Fully curated YAML file with 90 enriched records
- Report on data completeness and quality
- List of top 5 most complete records
- List of institutions with minimal data that need further research
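The completeness report and the two lists above could be derived along the following lines. This is only a sketch: the field names in ENRICHMENT_FIELDS, the "name" key, and the 0.3 threshold for "minimal data" are assumptions made for illustration, not slot names or rules taken from the LinkML schema.

```python
from typing import Any

# Hypothetical field names mirroring the seven required enrichments;
# the real slot names must be taken from the LinkML schema files.
ENRICHMENT_FIELDS = [
    "description",
    "locations",
    "identifiers",
    "digital_platforms",
    "collections",
    "change_history",
    "provenance",
]


def completeness(record: dict[str, Any]) -> float:
    """Return the fraction of enrichment fields that are populated."""
    filled = sum(1 for field in ENRICHMENT_FIELDS if record.get(field))
    return filled / len(ENRICHMENT_FIELDS)


def build_report(
    records: list[dict[str, Any]],
    top_n: int = 5,
    minimal_below: float = 0.3,  # assumed cut-off for "minimal data"
) -> dict[str, Any]:
    """Summarise completeness across all records."""
    scored = [(r.get("name", "<unnamed>"), completeness(r)) for r in records]
    # Highest completeness first; ties keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    mean = sum(score for _, score in scored) / len(scored) if scored else 0.0
    return {
        "mean_completeness": mean,
        "top": [name for name, _ in ranked[:top_n]],
        "minimal": [name for name, score in ranked if score < minimal_below],
    }
```

Counting populated fields is a deliberately crude measure; it says nothing about the quality of each value, so the report should still be reviewed by hand.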
Instructions for NLP Agent
Read the entire conversation JSON file and extract ALL available information for EACH of the 90 institutions currently in the YAML file. Create comprehensive, LinkML-compliant records with:
- Detailed descriptions synthesized from conversation context
- All mentioned locations, identifiers, platforms
- Inferred collection information where appropriate
- Founding dates and organizational history
- Proper confidence scores (0.9-1.0 for explicit mentions, 0.5-0.8 for inferred data)
Use your full comprehension abilities to create the most complete, accurate records possible.
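The confidence bands given above (0.9-1.0 for explicit mentions, 0.5-0.8 for inferred data) can be applied consistently with a small helper such as the one below. It is a minimal sketch: the function name and the `strength` parameter are hypothetical, and the linear mapping within each band is one possible convention rather than something the task prescribes.

```python
def confidence_score(explicit: bool, strength: float = 1.0) -> float:
    """Map an evidence strength in [0, 1] into the band for its evidence type.

    explicit=True  -> band 0.9-1.0 (value stated directly in the conversation)
    explicit=False -> band 0.5-0.8 (value inferred from context)
    `strength` is a hypothetical knob: 0 gives the bottom of the band,
    1 gives the top.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    low, high = (0.9, 1.0) if explicit else (0.5, 0.8)
    return round(low + strength * (high - low), 2)


# Example: a founding date quoted verbatim vs. a city inferred from a name.
explicit_score = confidence_score(explicit=True, strength=1.0)
inferred_score = confidence_score(explicit=False, strength=0.5)
```

Keeping the two bands disjoint means a reviewer can tell from the score alone whether a value was stated or inferred.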