# German ISIL Data - Quick Start Guide

## Files Location

```
/Users/kempersc/apps/glam/data/isil/germany/
├── german_isil_complete_20251119_134939.json   # Full dataset (37 MB)
├── german_isil_complete_20251119_134939.jsonl  # Line-delimited (24 MB)
├── german_isil_stats_20251119_134941.json      # Statistics (7.6 KB)
└── HARVEST_REPORT.md                           # Harvest report
```

## Quick Access Examples

### 1. Load Full Dataset in Python

```python
import json

# Load complete dataset
with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

print(f"Total records: {data['metadata']['total_records']}")
print(f"First institution: {data['records'][0]['name']}")
```

### 2. Stream Processing (JSONL)

```python
import json

# Process one record at a time (memory-efficient)
with open('german_isil_complete_20251119_134939.jsonl', 'r') as f:
    for line in f:
        record = json.loads(line)
        if record['address'].get('city') == 'Berlin':
            print(f"{record['name']} - {record['isil']}")
```

### 3. Find Records by ISIL

```bash
# Using jq (JSON query tool)
jq '.records[] | select(.isil == "DE-1")' german_isil_complete_20251119_134939.json

# Using grep (fast text search)
grep -i "staatsbibliothek" german_isil_complete_20251119_134939.jsonl
```

### 4. Extract All Libraries in Munich

```bash
jq '.records[] | select(.address.city == "München")' german_isil_complete_20251119_134939.json
```

### 5. Count Institutions by Region

```bash
jq '.records | group_by(.interloan_region) | map({region: .[0].interloan_region, count: length})' german_isil_complete_20251119_134939.json
```

### 6. Export to CSV

```python
import json
import csv

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

with open('german_isil.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['ISIL', 'Name', 'City', 'Email', 'URL'])
    for record in data['records']:
        writer.writerow([
            record['isil'],
            record['name'],
            record['address'].get('city', ''),
            record['contact'].get('email', ''),
            record['urls'][0]['url'] if record['urls'] else ''
        ])
```

### 7. Filter by Institution Type

```python
import json

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Find all Max Planck Institute libraries
mpi_libraries = [
    r for r in data['records']
    if r['institution_type'] and r['institution_type'].startswith('MPI')
]
print(f"Found {len(mpi_libraries)} Max Planck Institute libraries")
```

### 8. Create Geographic Map

```python
import json
import folium

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Create map centered on Germany
m = folium.Map(location=[51.1657, 10.4515], zoom_start=6)

# Add markers for institutions with coordinates
for record in data['records']:
    lat = record['address'].get('latitude')
    lon = record['address'].get('longitude')
    if lat and lon:
        folium.Marker(
            location=[float(lat), float(lon)],
            popup=f"{record['name']}<br>{record['isil']}",  # popups render HTML, so use <br> for a line break
            tooltip=record['name']
        ).add_to(m)

m.save('german_isil_map.html')
```

### 9. Convert to LinkML YAML

```python
import json
import yaml

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Convert to GLAM project schema
glam_records = []
for record in data['records']:
    glam_record = {
        'id': f"https://w3id.org/heritage/custodian/de/{record['isil'].lower()}",
        'name': record['name'],
        'institution_type': 'LIBRARY',  # TODO: classify properly
        'locations': [{
            'city': record['address'].get('city'),
            'street_address': record['address'].get('street'),
            'postal_code': record['address'].get('postal_code'),
            'country': 'DE',
            'latitude': float(record['address']['latitude']) if record['address'].get('latitude') else None,
            'longitude': float(record['address']['longitude']) if record['address'].get('longitude') else None
        }],
        'identifiers': [{
            'identifier_scheme': 'ISIL',
            'identifier_value': record['isil'],
            'identifier_url': f"https://sigel.staatsbibliothek-berlin.de/isil/{record['isil']}"
        }],
        'provenance': {
            'data_source': 'CSV_REGISTRY',
            'data_tier': 'TIER_1_AUTHORITATIVE',
            'extraction_date': '2025-11-19T12:49:39Z'
        }
    }
    glam_records.append(glam_record)

# Save first 10 as example
with open('german_isil_linkml_example.yaml', 'w') as f:
    yaml.dump(glam_records[:10], f, allow_unicode=True, default_flow_style=False)
```

## Common Use Cases

### Find All Archives

```bash
jq '.records[] | select(.name | test("archiv"; "i"))' german_isil_complete_20251119_134939.json
```

### Find All Museums

```bash
jq '.records[] | select(.name | test("museum"; "i"))' german_isil_complete_20251119_134939.json
```

### Get Records with Email

```bash
jq '.records[] | select(.contact.email != null)' german_isil_complete_20251119_134939.json
```

### Extract URLs Only

```bash
jq -r '.records[].urls[]?.url' german_isil_complete_20251119_134939.json > german_isil_urls.txt
```

### Statistics by Federal State

```python
import json
from collections import Counter

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

states = Counter(r['address'].get('region') for r in data['records'])
for state, count in states.most_common(10):
    print(f"{state}: {count}")
```

## Integration with GLAM Project

See `HARVEST_REPORT.md` for detailed integration recommendations.

**Key Tasks**:

1. Map institution types to the GLAMORCUBESFIXPHDNT taxonomy
2. Generate GHCIDs for each institution
3. Enrich with Wikidata Q-numbers
4. Cross-reference with other German registries
5. Export to RDF/Turtle for Linked Data

## Data Quality Notes

- ✅ 87% have coordinates - excellent for mapping
- ✅ 79% have URLs - good for web scraping
- ⚠️ 38% have emails - may need enrichment
- ⚠️ 97% lack institution type codes - requires classification

## Need Help?

See the GLAM project documentation:

- `/Users/kempersc/apps/glam/AGENTS.md` - AI agent instructions
- `/Users/kempersc/apps/glam/docs/` - Schema and design docs
- `/Users/kempersc/apps/glam/scripts/` - Example parsers and converters
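## Sketch: Wikidata Lookup by ISIL

Key Task 3 above (enrich with Wikidata Q-numbers) can be bootstrapped through Wikidata's ISIL property, `P791`. The minimal sketch below only builds the SPARQL query string; actually running it against the public endpoint at `https://query.wikidata.org/sparql` (and the surrounding rate-limiting and caching) is left out.

```python
# Sketch: build a SPARQL query that finds the Wikidata item whose
# ISIL (property P791) matches a given identifier. Network call omitted;
# send the string to https://query.wikidata.org/sparql with format=json.

def isil_to_sparql(isil: str) -> str:
    """Return a SPARQL query matching the item with the given ISIL."""
    return (
        "SELECT ?item WHERE { "
        f'?item wdt:P791 "{isil}" . '
        "} LIMIT 1"
    )

print(isil_to_sparql("DE-1"))
```

Running the query for `DE-1` should resolve to the Q-number for the Staatsbibliothek zu Berlin; results can then be written back into each record's `identifiers` list.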
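## Sketch: Minimal Turtle Export

For Key Task 5 (export to RDF/Turtle), here is a minimal, standard-library-only sketch; a production pipeline would more likely use a library such as `rdflib`. The `schema.org` terms and the `w3id.org` base URI mirror the LinkML example above and are assumptions, not a settled vocabulary choice.

```python
# Sketch: render one ISIL record as a Turtle snippet using plain string
# formatting. Vocabulary (schema.org) and subject URI scheme are assumed.

def record_to_turtle(record: dict) -> str:
    """Return Turtle triples for a single ISIL record."""
    subject = f"<https://w3id.org/heritage/custodian/de/{record['isil'].lower()}>"
    lines = [
        f"{subject} a schema:Library ;",
        f'    schema:name "{record["name"]}" ;',
        f'    schema:identifier "{record["isil"]}" .',
    ]
    return "\n".join(lines)

sample = {"isil": "DE-1", "name": "Staatsbibliothek zu Berlin"}
print("@prefix schema: <https://schema.org/> .\n")
print(record_to_turtle(sample))
```

Note that real names may contain quotes or other characters that need Turtle escaping; that is another reason to prefer an RDF library over string templates for the full dataset.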