glam/QUICK_START_AUSTRALIA.md
2025-11-19 23:25:22 +01:00


# Quick Start: Australian Heritage Institution Extraction
**Time Required**: 10-15 minutes

**Expected Output**: 200-500 Australian heritage institutions

**Data Quality**: TIER_1_AUTHORITATIVE (0.95 confidence)

---
## Step 1: Get Trove API Key (5 minutes)
1. Visit: https://trove.nla.gov.au/about/create-something/using-api
2. Click "Sign up for an API key"
3. Fill out form:
   - Name
   - Email
   - Intended use: "Heritage institution research"
4. Check email for API key (arrives immediately)
5. Save key somewhere secure
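
Rather than pasting the key into every command, you can keep it in an environment variable and check that it is set before starting a run. This is a minimal sketch: the variable name `TROVE_API_KEY` and the helper `require_trove_key` are hypothetical, and the extraction script is only documented to take the key through `--api-key`, so it is still passed explicitly.

```bash
# Hypothetical helper: stop early when the Trove key has not been exported.
# TROVE_API_KEY is a variable name chosen for this sketch; the extraction
# script is only documented to read the key from --api-key.
require_trove_key() {
  if [ -z "${TROVE_API_KEY:-}" ]; then
    echo "TROVE_API_KEY is not set; export your Trove API key first" >&2
    return 1
  fi
}

# Usage, after exporting TROVE_API_KEY (e.g. from a private, uncommitted profile):
#   require_trove_key && python scripts/extract_trove_contributors.py --api-key "$TROVE_API_KEY"
```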
---
## Step 2: Run Extraction (2-5 minutes)
```bash
cd path/to/glam  # your local clone of the repository
python scripts/extract_trove_contributors.py --api-key YOUR_TROVE_API_KEY
```
**What happens**:
- Fetches all Trove contributors (200-500 institutions)
- Retrieves full details (respects 200 req/min rate limit)
- Classifies by GLAMORCUBESFIXPHDNT type
- Generates GHCID identifiers
- Exports to YAML, JSON, CSV
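
The 200 requests/minute limit puts a floor under Step 2's runtime. As a rough sketch, assuming about one detail request per institution (listing calls and network latency come on top): 200 institutions need at least 60 s and 500 need at least 150 s, which is consistent with the 2-5 minute estimate. The helper name `min_runtime_seconds` is hypothetical.

```bash
# Hypothetical helper: lower bound on runtime, in whole seconds, for a number
# of requests under a requests-per-minute limit (network latency ignored).
min_runtime_seconds() {
  local requests=$1 per_minute=$2
  echo $(( requests * 60 / per_minute ))
}

min_runtime_seconds 200 200   # 60  (low end: 200 institutions)
min_runtime_seconds 500 200   # 150 (high end: 500 institutions, about 2.5 minutes)
# If --delay is a pause in seconds between requests (an assumption), --delay 0.5
# allows at most 120 requests per minute:
min_runtime_seconds 500 120   # 250
```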
---
## Step 3: Check Results (1 minute)
```bash
# Count institutions (each file's count includes its CSV header row)
wc -l data/instances/trove_contributors_*.csv
# View sample
head -n 50 data/instances/trove_contributors_*.yaml
# Check types
grep "institution_type:" data/instances/trove_contributors_*.yaml | sort | uniq -c
```
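
`wc -l` counts the CSV header row too, and prints one line per file plus a total when the glob matches several exports. A small sketch that counts only data rows in one file, assuming the export has a single header row and no line breaks inside quoted fields (a multi-line field would inflate the count). The helper name `count_institutions` is hypothetical.

```bash
# Hypothetical helper: number of data rows in one CSV export.
# Assumes a single header row and no line breaks inside quoted fields.
count_institutions() {
  # tail -n +2 prints from line 2 onward, i.e. everything after the header.
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Usage:
#   count_institutions data/instances/trove_contributors_20251118_143000.csv
```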
---
## Expected Output
```
data/instances/
├── trove_contributors_20251118_143000.yaml # Full records
├── trove_contributors_20251118_143000.json # JSON format
└── trove_contributors_20251118_143000.csv # Spreadsheet
```
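
Each run writes a new set of timestamped files. Judging from the example names, the timestamp is a zero-padded `YYYYMMDD_HHMMSS`, so plain lexical sorting puts the newest run last. A sketch under that assumption; the helper name `latest_export` is hypothetical, and the default directory is the one used in this guide.

```bash
# Hypothetical helper: path of the newest export with a given extension.
# The zero-padded YYYYMMDD_HHMMSS timestamp makes lexical order chronological.
latest_export() {
  local ext=$1 dir=${2:-data/instances}
  # stderr is discarded so a glob with no matches just yields no output.
  ls "$dir"/trove_contributors_*."$ext" 2>/dev/null | sort | tail -n 1
}

# Usage:
#   head -n 50 "$(latest_export yaml)"
#   latest_export csv data/australia   # after a run with --output-dir data/australia
```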
**Sample Record**:
```yaml
- name: National Library of Australia
  institution_type: L  # Library
  ghcid_current: AU-ACT-CAN-L-NLA
  identifiers:
    - identifier_scheme: NUC
      identifier_value: NLA
    - identifier_scheme: ISIL
      identifier_value: AU-NLA
  homepage: https://www.nla.gov.au
```
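
The sample `ghcid_current` value looks like five hyphen-separated parts: country (`AU`), state or territory (`ACT`), a city code (`CAN`, presumably Canberra), the one-letter institution type (`L`, matching `institution_type`), and an abbreviation (`NLA`). That reading is inferred from this single record, not from a GHCID specification, so the hypothetical `split_ghcid` helper below is only a way to eyeball records under that assumption; it also assumes no component contains a hyphen of its own.

```bash
# Hypothetical helper: split a GHCID into its assumed five components.
split_ghcid() {
  # read and printf share one group so the variables are still in scope.
  printf '%s\n' "$1" | {
    IFS=- read -r country region city itype abbrev
    printf 'country=%s region=%s city=%s type=%s abbrev=%s\n' \
      "$country" "$region" "$city" "$itype" "$abbrev"
  }
}

split_ghcid AU-ACT-CAN-L-NLA
# prints: country=AU region=ACT city=CAN type=L abbrev=NLA
```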
---
## Troubleshooting
- **"API key required"**: Get key from https://trove.nla.gov.au/
- **Rate limit errors**: Add `--delay 0.5` to slow down requests
- **No results**: Check internet connection and Trove status
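
If rate-limit errors persist, one option is to rerun with progressively longer delays. This is a minimal sketch: it assumes the script exits with a non-zero status when a run fails (this guide does not say so), each retry starts the extraction over, and the helper name `retry_with_delay` and the delay values are illustrative.

```bash
# Hypothetical helper: run a command, appending --delay with increasing values
# until it succeeds. Assumes the command signals failure with a non-zero exit.
retry_with_delay() {
  local delay
  for delay in 0.5 1 2; do
    if "$@" --delay "$delay"; then
      echo "succeeded with --delay $delay"
      return 0
    fi
    echo "run failed with --delay $delay" >&2
  done
  return 1
}

# Usage:
#   retry_with_delay python scripts/extract_trove_contributors.py --api-key YOUR_TROVE_API_KEY
```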
---
## Advanced Options
```bash
# Custom output directory
python scripts/extract_trove_contributors.py \
  --api-key YOUR_KEY \
  --output-dir data/australia
# Slower rate (safer)
python scripts/extract_trove_contributors.py \
  --api-key YOUR_KEY \
  --delay 0.5
# YAML only
python scripts/extract_trove_contributors.py \
  --api-key YOUR_KEY \
  --formats yaml
```
---
## Documentation
- **Full Guide**: `docs/AUSTRALIA_TROVE_EXTRACTION.md`
- **Session Summary**: `SESSION_SUMMARY_20251118_AUSTRALIA_TROVE.md`
- **Next Steps**: `NEXT_STEPS.md`
---
**Status**: ✅ Ready to run

**Your Action**: Get API key, then run the script