glam/examples/README.md
2025-11-19 23:25:22 +01:00

1.5 KiB

Examples

This directory contains usage examples for the GLAM data extraction pipeline.

Available Examples

extract_identifiers.py

Demonstrates how to extract identifiers (ISIL, Wikidata, VIAF, KvK, URLs) from conversation JSON files.

Usage:

cd /Users/kempersc/Documents/claude/glam
PYTHONPATH=./src:$PYTHONPATH python examples/extract_identifiers.py

What it does:

  1. Loads a sample conversation JSON file
  2. Parses the conversation structure
  3. Extracts text from assistant messages
  4. Runs identifier extraction using regex patterns
  5. Displays results grouped by identifier type

Expected output:

=== Conversation: Test Dutch GLAM Institutions ===
Messages: 4
Total identifiers found: 4

Identifiers by scheme:
  ISIL: NL-ASDRM, NL-HANA
  URL: https://www.rijksmuseum.nl/en/rijksstudio, https://www.nationaalarchief.nl

Running Examples

All examples should be run from the project root with PYTHONPATH set:

# From project root
cd /Users/kempersc/Documents/claude/glam

# Set PYTHONPATH and run
PYTHONPATH=./src:$PYTHONPATH python examples/<example_name>.py

Future Examples

  • extract_from_csv.py - Parse Dutch ISIL registry and organizations CSV
  • extract_with_ner.py - Use subagent-based NER to extract institution names
  • geocode_locations.py - Geocode addresses to lat/lon coordinates
  • export_to_jsonld.py - Export extracted data to JSON-LD format
  • validate_schema.py - Validate data against LinkML schema
  • deduplicate.py - Find and merge duplicate institution records