glam/curate_chilean_institutions.md
2025-11-19 23:25:22 +01:00

Task: Curate Chilean GLAM Institutions from Conversation

Objective

Manually enrich the 90 existing Chilean GLAM institution records by extracting comprehensive information from the source conversation JSON file.

Source Files

  • Conversation JSON: /Users/kempersc/apps/glam/data/raw/chilean_glam_conversation.json
  • Current YAML: /Users/kempersc/apps/glam/data/instances/chilean_institutions.yaml (90 minimally-populated records)
  • Target Output: /Users/kempersc/apps/glam/data/instances/chilean_institutions_curated.yaml

Required Enrichments

For each of the 90 institutions, extract and add:

  1. Rich Descriptions - Contextual information about the institution drawn from the conversation
  2. Complete Location Data - Cities, addresses, coordinates where mentioned
  3. Identifiers - ISIL codes, Wikidata IDs, URLs, platform IDs
  4. Digital Platforms - SURDOC, SINAR, institutional websites, catalogs
  5. Collection Metadata - Types, subjects, temporal coverage, extent
  6. Change History - Founding dates, mergers, organizational events
  7. Provenance Tracking - Enhanced confidence scores based on explicit vs. inferred data

Schema Compliance

All records MUST conform to LinkML schema v0.2.0:

  • schemas/core.yaml - HeritageCustodian, Location, Identifier, DigitalPlatform
  • schemas/enums.yaml - InstitutionTypeEnum, ChangeTypeEnum, DataSource, DataTier
  • schemas/provenance.yaml - Provenance, ChangeEvent, GHCIDHistoryEntry
  • schemas/collections.yaml - Collection, Accession, DigitalObject

Key Conversation Content

The conversation contains information about:

  • 695+ library services nationwide
  • 500,000+ digitized archival records
  • 72,000+ catalogued museum objects
  • National platforms: SURDOC, SINAR, Memoria Chilena
  • Major institutions across all Chilean regions
  • Regional networks and specialized collections

Expected Deliverables

  1. Fully curated YAML file with 90 enriched records
  2. Report on data completeness and quality
  3. List of the five most complete records
  4. List of institutions with minimal data that need further research
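
The completeness report, the top-five list, and the minimal-data list can all come from one scoring pass over the curated records. The following is a minimal sketch, not a prescribed implementation: the field names in `ENRICHMENT_FIELDS`, the `name` key, and the `minimal_threshold` cutoff are all hypothetical and should be replaced with the actual slot names from the LinkML schema and a cutoff agreed for this task.

```python
from typing import Any

# Hypothetical field names, one per category in "Required Enrichments".
# The real slot names are defined by the LinkML schema, not by this sketch.
ENRICHMENT_FIELDS = (
    "description",
    "locations",
    "identifiers",
    "digital_platforms",
    "collections",
    "change_history",
    "provenance",
)


def completeness(record: dict[str, Any]) -> float:
    """Return the fraction of enrichment fields that are present and non-empty."""
    filled = sum(1 for field in ENRICHMENT_FIELDS if record.get(field))
    return filled / len(ENRICHMENT_FIELDS)


def completeness_report(
    records: list[dict[str, Any]],
    top_n: int = 5,
    minimal_threshold: float = 0.3,  # assumed cutoff for "minimal data"
) -> dict[str, Any]:
    """Rank records by completeness and flag those needing further research."""
    # Sort by descending score, then by name so ties are ordered deterministically.
    scored = sorted(
        ((completeness(r), r.get("name", "<unnamed>")) for r in records),
        key=lambda pair: (-pair[0], pair[1]),
    )
    mean = sum(score for score, _ in scored) / len(scored) if scored else 0.0
    return {
        "mean_completeness": mean,
        "top": [name for _, name in scored[:top_n]],
        "minimal": [name for score, name in scored if score < minimal_threshold],
    }
```

Calling `completeness_report(records)` on the list of 90 curated records (loaded as plain dictionaries) gives the mean completeness for deliverable 2, the `top` list for deliverable 3, and the `minimal` list for deliverable 4. Counting a field as filled whenever it is non-empty is a deliberately coarse measure; it says nothing about the quality of the value.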

Instructions for NLP Agent

Read the entire conversation JSON file and extract ALL available information for EACH of the 90 institutions currently in the YAML file. Create comprehensive, LinkML-compliant records with:

  • Detailed descriptions synthesized from conversation context
  • All mentioned locations, identifiers, platforms
  • Inferred collection information where appropriate
  • Founding dates and organizational history
  • Proper confidence scores (0.9-1.0 for explicit mentions, 0.5-0.8 for inferred data)
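
The confidence ranges above can be checked mechanically before a record is written out. This is a small sketch under stated assumptions: the function name `check_confidence` and the `explicit` flag are illustrative and not part of the schema. The brief does not assign scores between 0.8 and 0.9 to either category, so the sketch treats them as out of range for both.

```python
# Ranges taken from the instruction above; both bounds are inclusive.
EXPLICIT_RANGE = (0.9, 1.0)  # information stated directly in the conversation
INFERRED_RANGE = (0.5, 0.8)  # information inferred from context


def check_confidence(score: float, explicit: bool) -> bool:
    """Return True if the score lies in the range allowed for its data kind."""
    low, high = EXPLICIT_RANGE if explicit else INFERRED_RANGE
    return low <= score <= high
```

For example, `check_confidence(0.95, explicit=True)` returns `True`, while `check_confidence(0.85, explicit=False)` returns `False` because 0.85 is above the inferred range.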

Use your full comprehension abilities to create the most complete, accurate records possible.