# Examples This directory contains usage examples for the GLAM data extraction pipeline. ## Available Examples ### extract_identifiers.py Demonstrates how to extract identifiers (ISIL, Wikidata, VIAF, KvK, URLs) from conversation JSON files. **Usage**: ```bash cd /Users/kempersc/Documents/claude/glam PYTHONPATH=./src:$PYTHONPATH python examples/extract_identifiers.py ``` **What it does**: 1. Loads a sample conversation JSON file 2. Parses the conversation structure 3. Extracts text from assistant messages 4. Runs identifier extraction using regex patterns 5. Displays results grouped by identifier type **Expected output**: ``` === Conversation: Test Dutch GLAM Institutions === Messages: 4 Total identifiers found: 4 Identifiers by scheme: ISIL: NL-ASDRM, NL-HANA URL: https://www.rijksmuseum.nl/en/rijksstudio, https://www.nationaalarchief.nl ``` ## Running Examples All examples should be run from the project root with PYTHONPATH set: ```bash # From project root cd /Users/kempersc/Documents/claude/glam # Set PYTHONPATH and run PYTHONPATH=./src:$PYTHONPATH python examples/.py ``` ## Future Examples - **extract_from_csv.py** - Parse Dutch ISIL registry and organizations CSV - **extract_with_ner.py** - Use subagent-based NER to extract institution names - **geocode_locations.py** - Geocode addresses to lat/lon coordinates - **export_to_jsonld.py** - Export extracted data to JSON-LD format - **validate_schema.py** - Validate data against LinkML schema - **deduplicate.py** - Find and merge duplicate institution records