# German ISIL Data - Quick Start Guide
## Files Location
```
/Users/kempersc/apps/glam/data/isil/germany/
├── german_isil_complete_20251119_134939.json # Full dataset (37 MB)
├── german_isil_complete_20251119_134939.jsonl # Line-delimited (24 MB)
├── german_isil_stats_20251119_134941.json # Statistics (7.6 KB)
└── HARVEST_REPORT.md # Harvest report
```
## Quick Access Examples
### 1. Load Full Dataset in Python
```python
import json

# Load the complete dataset into memory
with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

print(f"Total records: {data['metadata']['total_records']}")
print(f"First institution: {data['records'][0]['name']}")
```
### 2. Stream Processing (JSONL)
```python
import json

# Process one record at a time (memory-efficient)
with open('german_isil_complete_20251119_134939.jsonl', 'r') as f:
    for line in f:
        record = json.loads(line)
        if record['address'].get('city') == 'Berlin':
            print(f"{record['name']} - {record['isil']}")
```
### 3. Find Records by ISIL
```bash
# Using jq (JSON query tool)
jq '.records[] | select(.isil == "DE-1")' german_isil_complete_20251119_134939.json
# Using grep (fast text search)
grep -i "staatsbibliothek" german_isil_complete_20251119_134939.jsonl
```
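For repeated lookups, building an in-memory ISIL index once is faster than rescanning the file each time. A minimal sketch, assuming only the `isil` field shown in the examples above:

```python
import json

def build_isil_index(jsonl_path):
    """Build a dict mapping ISIL -> full record for O(1) lookups."""
    index = {}
    with open(jsonl_path, 'r', encoding='utf-8') as f:
        for line in f:
            record = json.loads(line)
            index[record['isil']] = record
    return index

# Usage:
# index = build_isil_index('german_isil_complete_20251119_134939.jsonl')
# print(index['DE-1']['name'])
```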
### 4. Extract All Libraries in Munich
```bash
jq '.records[] | select(.address.city == "München")' german_isil_complete_20251119_134939.json
```
### 5. Count Institutions by Region
```bash
jq '.records | group_by(.interloan_region) | map({region: .[0].interloan_region, count: length})' german_isil_complete_20251119_134939.json
```
### 6. Export to CSV
```python
import json
import csv

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

with open('german_isil.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['ISIL', 'Name', 'City', 'Email', 'URL'])
    for record in data['records']:
        writer.writerow([
            record['isil'],
            record['name'],
            record['address'].get('city', ''),
            record['contact'].get('email', ''),
            record['urls'][0]['url'] if record['urls'] else ''
        ])
```
### 7. Filter by Institution Type
```python
import json

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Find all Max Planck Institute libraries
mpi_libraries = [
    r for r in data['records']
    if r['institution_type'] and r['institution_type'].startswith('MPI')
]
print(f"Found {len(mpi_libraries)} Max Planck Institute libraries")
```
### 8. Create Geographic Map
```python
import json
import folium

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Create a map centered on Germany
m = folium.Map(location=[51.1657, 10.4515], zoom_start=6)

# Add markers for institutions that have coordinates
for record in data['records']:
    lat = record['address'].get('latitude')
    lon = record['address'].get('longitude')
    if lat and lon:
        folium.Marker(
            location=[float(lat), float(lon)],
            popup=f"{record['name']}<br>{record['isil']}",
            tooltip=record['name']
        ).add_to(m)

m.save('german_isil_map.html')
```
### 9. Convert to LinkML YAML
```python
import json
import yaml

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Convert to the GLAM project schema
glam_records = []
for record in data['records']:
    glam_record = {
        'id': f"https://w3id.org/heritage/custodian/de/{record['isil'].lower()}",
        'name': record['name'],
        'institution_type': 'LIBRARY',  # TODO: classify properly
        'locations': [{
            'city': record['address'].get('city'),
            'street_address': record['address'].get('street'),
            'postal_code': record['address'].get('postal_code'),
            'country': 'DE',
            'latitude': float(record['address']['latitude']) if record['address'].get('latitude') else None,
            'longitude': float(record['address']['longitude']) if record['address'].get('longitude') else None
        }],
        'identifiers': [{
            'identifier_scheme': 'ISIL',
            'identifier_value': record['isil'],
            'identifier_url': f"https://sigel.staatsbibliothek-berlin.de/isil/{record['isil']}"
        }],
        'provenance': {
            'data_source': 'CSV_REGISTRY',
            'data_tier': 'TIER_1_AUTHORITATIVE',
            'extraction_date': '2025-11-19T12:49:39Z'
        }
    }
    glam_records.append(glam_record)

# Save the first 10 records as an example
with open('german_isil_linkml_example.yaml', 'w') as f:
    yaml.dump(glam_records[:10], f, allow_unicode=True, default_flow_style=False)
```
## Common Use Cases
### Find All Archives
```bash
jq '.records[] | select(.name | test("archiv"; "i"))' german_isil_complete_20251119_134939.json
```
### Find All Museums
```bash
jq '.records[] | select(.name | test("museum"; "i"))' german_isil_complete_20251119_134939.json
```
### Get Records with Email
```bash
jq '.records[] | select(.contact.email != null)' german_isil_complete_20251119_134939.json
```
### Extract URLs Only
```bash
jq -r '.records[].urls[]?.url' german_isil_complete_20251119_134939.json > german_isil_urls.txt
```
### Statistics by Federal State
```python
import json
from collections import Counter

with open('german_isil_complete_20251119_134939.json', 'r') as f:
    data = json.load(f)

# Count records per federal state (stored in address.region)
states = Counter(r['address'].get('region') for r in data['records'])
for state, count in states.most_common(10):
    print(f"{state}: {count}")
```
## Integration with GLAM Project
See `HARVEST_REPORT.md` for detailed integration recommendations.
**Key Tasks**:
1. Map institution types to the GLAM project taxonomy
2. Generate GHCIDs for each institution
3. Enrich with Wikidata Q-numbers
4. Cross-reference with other German registries
5. Export to RDF/Turtle for Linked Data
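For task 3, Wikidata can be queried by ISIL in batches. A minimal sketch that only builds the SPARQL query string; it assumes Wikidata's ISIL property is P791 (verify before use), and the resulting query would be sent to the public SPARQL endpoint:

```python
def wikidata_isil_query(isils):
    """Build a SPARQL query resolving ISILs to Wikidata Q-numbers.

    Assumes Wikidata property P791 (ISIL); confirm before running
    against https://query.wikidata.org/sparql with format=json.
    """
    values = ' '.join(f'"{i}"' for i in isils)
    return (
        "SELECT ?item ?isil WHERE {\n"
        f"  VALUES ?isil {{ {values} }}\n"
        "  ?item wdt:P791 ?isil .\n"
        "}"
    )

# Example:
# print(wikidata_isil_query(['DE-1', 'DE-12']))
```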
## Data Quality Notes
- ✅ 87% have coordinates - excellent for mapping
- ✅ 79% have URLs - good for web scraping
- ⚠️ 38% have emails - may need enrichment
- ⚠️ 97% lack institution type codes - requires classification
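The coverage figures above can be recomputed from the dataset. A minimal sketch, using the field names shown throughout this guide:

```python
import json

def coverage_stats(records):
    """Return the percentage of records with coordinates, URLs, and emails."""
    n = len(records)

    def pct(predicate):
        return round(100 * sum(1 for r in records if predicate(r)) / n, 1)

    return {
        'coordinates': pct(lambda r: r['address'].get('latitude')
                           and r['address'].get('longitude')),
        'urls': pct(lambda r: bool(r.get('urls'))),
        'emails': pct(lambda r: bool(r['contact'].get('email'))),
    }

# Usage:
# with open('german_isil_complete_20251119_134939.json') as f:
#     print(coverage_stats(json.load(f)['records']))
```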
## Need Help?
See the GLAM project documentation:
- `/Users/kempersc/apps/glam/AGENTS.md` - AI agent instructions
- `/Users/kempersc/apps/glam/docs/` - Schema and design docs
- `/Users/kempersc/apps/glam/scripts/` - Example parsers and converters