glam/QUICK_START_AUSTRALIA.md
2025-11-19 23:25:22 +01:00


Quick Start: Australian Heritage Institution Extraction

Time Required: 10-15 minutes
Expected Output: 200-500 Australian heritage institutions
Data Quality: TIER_1_AUTHORITATIVE (0.95 confidence)


Step 1: Get Trove API Key (5 minutes)

  1. Visit: https://trove.nla.gov.au/about/create-something/using-api
  2. Click "Sign up for an API key"
  3. Fill out form:
    • Name
    • Email
    • Intended use: "Heritage institution research"
  4. Check email for API key (arrives immediately)
  5. Save key somewhere secure
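One way to keep the key out of scripts is to store it in an environment variable and read it from there. The sketch below assumes a variable named TROVE_API_KEY. That name is hypothetical, since the extraction script itself only documents --api-key. The sketch also masks the key whenever it has to be printed.

```python
import os


def load_trove_api_key(env_var: str = "TROVE_API_KEY") -> str:
    """Return the Trove API key from an environment variable.

    TROVE_API_KEY is a hypothetical variable name used for this sketch;
    extract_trove_contributors.py itself receives the key via --api-key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running the extraction")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, for safe logging."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

In a shell, the equivalent is `export TROVE_API_KEY=...` followed by `--api-key "$TROVE_API_KEY"` in Step 2.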

Step 2: Run Extraction (2-5 minutes)

cd /Users/kempersc/apps/glam   # adjust to the location of your own glam checkout

python scripts/extract_trove_contributors.py --api-key YOUR_TROVE_API_KEY

What happens:

  • Fetches all Trove contributors (200-500 institutions)
  • Retrieves full details (respects 200 req/min rate limit)
  • Classifies by GLAMORCUBESFIXPHDNT type
  • Generates GHCID identifiers
  • Exports to YAML, JSON, CSV
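A budget of 200 req/min works out to one request every 60 / 200 = 0.3 seconds. The minimal sketch below shows one way to enforce that spacing. It illustrates the idea only and is not the code in extract_trove_contributors.py. The class name is hypothetical.

```python
import time
from typing import Callable, Optional


class MinIntervalRateLimiter:
    """Space calls at least 60 / max_per_minute seconds apart (sketch)."""

    def __init__(
        self,
        max_per_minute: int = 200,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / max_per_minute  # 0.3 s for 200 req/min
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was allowed

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now += remaining
        self._last = now
        return slept
```

Call `limiter.wait()` immediately before each HTTP request. Injecting `clock` and `sleep` makes the behaviour testable without real delays.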

Step 3: Check Results (1 minute)

# Count rows (the total includes one header line per CSV file)
wc -l data/instances/trove_contributors_*.csv

# View sample
head -n 50 data/instances/trove_contributors_*.yaml

# Count institutions per type (-h drops filename prefixes when several runs match)
grep -h "institution_type:" data/instances/trove_contributors_*.yaml | sort | uniq -c | sort -rn
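`wc -l` also counts the CSV header row, and it over-counts if any field contains a quoted line break. A sketch using Python's csv module avoids both problems. It assumes the CSV has an `institution_type` column, which is inferred from the YAML field name rather than confirmed.

```python
import csv
import glob
from collections import Counter


def summarize_csv(pattern: str = "data/instances/trove_contributors_*.csv") -> Counter:
    """Count institutions per type across all CSV files matching `pattern`.

    Assumes a header row with an 'institution_type' column. DictReader
    consumes the header itself and handles quoted multi-line fields.
    """
    counts: Counter = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                counts[row.get("institution_type") or "?"] += 1
    return counts


if __name__ == "__main__":
    counts = summarize_csv()
    print("total:", sum(counts.values()))
    for type_code, n in counts.most_common():
        print(f"{type_code}\t{n}")
```

The default pattern matches every past run. Point `pattern` at a single timestamped file to avoid counting the same institution twice.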

Expected Output

data/instances/
├── trove_contributors_20251118_143000.yaml  # Full records
├── trove_contributors_20251118_143000.json  # JSON format
└── trove_contributors_20251118_143000.csv   # Spreadsheet
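Each run writes a new timestamped set of files, so the wildcards in Step 3 match every past run. A small helper can select the newest file by parsing the YYYYMMDD_HHMMSS stamp. The filename pattern is inferred from the listing above, and the function name is hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Matches names like trove_contributors_20251118_143000.yaml
# (timestamp layout inferred from the example listing).
_NAME_RE = re.compile(r"^trove_contributors_(\d{8}_\d{6})\.(yaml|json|csv)$")


def latest_export(directory: str = "data/instances", ext: str = "yaml") -> Optional[Path]:
    """Return the most recent export with extension `ext`, or None if none exist."""
    best: Optional[tuple] = None
    for path in Path(directory).glob(f"trove_contributors_*.{ext}"):
        m = _NAME_RE.match(path.name)
        if not m:
            continue  # skip files that don't carry a parseable timestamp
        stamp = datetime.strptime(m.group(1), "%Y%m%d_%H%M%S")
        if best is None or stamp > best[0]:
            best = (stamp, path)
    return best[1] if best else None
```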

Sample Record:

- name: National Library of Australia
  institution_type: L  # Library
  ghcid_current: AU-ACT-CAN-L-NLA
  identifiers:
    - identifier_scheme: NUC
      identifier_value: ANL
    - identifier_scheme: ISIL
      identifier_value: AU-ANL
  homepage: https://www.nla.gov.au
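The sample ID AU-ACT-CAN-L-NLA appears to combine a country code, a region, a city code, the type letter and a name abbreviation. The sketch below rebuilds that ID from those parts using guessed rules: the city code is the first three letters of the city name, and the abbreviation is the name's initials with a few stopwords skipped. The project's real GHCID rules may differ, so treat this only as an illustration of the format. Both function names are hypothetical.

```python
import re

# Words skipped when forming the abbreviation: a guess that reproduces
# the sample record, not the project's documented GHCID rules.
_STOPWORDS = {"of", "the", "and", "for"}


def abbreviate(name: str) -> str:
    """Initials of significant words: 'National Library of Australia' -> 'NLA'."""
    words = re.findall(r"[A-Za-z]+", name)
    return "".join(w[0].upper() for w in words if w.lower() not in _STOPWORDS)


def make_ghcid(country: str, region: str, city: str, type_code: str, name: str) -> str:
    """Join the components seen in the sample ID with hyphens.

    Hypothetical rule: the city code is the first three letters of the city name.
    """
    parts = [country.upper(), region.upper(), city[:3].upper(), type_code.upper(), abbreviate(name)]
    return "-".join(parts)
```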

Troubleshooting

"API key required": Get a key as described in Step 1, then pass it with --api-key
Rate limit errors: Add --delay 0.5 to slow down requests
No results: Check internet connection and Trove status
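If a fixed --delay is not enough, retrying with exponential backoff is a common complement. This sketch assumes that rate-limit responses surface as HTTP 429 errors raised by urllib. It is not part of the extraction script, and the helper name is hypothetical.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on HTTP 429 with delays of 1 s, 2 s, 4 s, ...

    Any other HTTP error, or a 429 on the final attempt, is re-raised.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential growth per attempt
    raise RuntimeError("unreachable")
```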


Advanced Options

# Custom output directory
python scripts/extract_trove_contributors.py \
  --api-key YOUR_KEY \
  --output-dir data/australia

# Slower rate (safer)
python scripts/extract_trove_contributors.py \
  --api-key YOUR_KEY \
  --delay 0.5

# YAML only
python scripts/extract_trove_contributors.py \
  --api-key YOUR_KEY \
  --formats yaml
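When scripting several runs, it can help to build the argument list in code. The helper below uses only the flags shown above (--api-key, --output-dir, --delay, --formats). Combining them in a single call is assumed to work, but this guide shows each one separately. The function name is hypothetical.

```python
import shlex
from typing import List, Optional


def build_command(
    api_key: str,
    output_dir: Optional[str] = None,
    delay: Optional[float] = None,
    formats: Optional[str] = None,
) -> List[str]:
    """Assemble argv for extract_trove_contributors.py from the documented flags."""
    cmd = ["python", "scripts/extract_trove_contributors.py", "--api-key", api_key]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if delay is not None:
        cmd += ["--delay", str(delay)]
    if formats:
        cmd += ["--formats", formats]  # passed through verbatim, e.g. "yaml"
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_command("YOUR_KEY", output_dir="data/australia", delay=0.5)))
```

To launch the extraction, pass the list to `subprocess.run(cmd, check=True)`.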

Documentation

  • Full Guide: docs/AUSTRALIA_TROVE_EXTRACTION.md
  • Session Summary: SESSION_SUMMARY_20251118_AUSTRALIA_TROVE.md
  • Next Steps: NEXT_STEPS.md

Status: Ready to run
Your Action: Get API key, then run the script