# Denmark GLAM Dataset - Quick Reference Card
**Last Updated**: 2025-11-19

**Status**: ✅ Complete (2,348 institutions)

---
## Files at a Glance
| File | Institutions | Size | Purpose |
|------|--------------|------|---------|
| `denmark_complete.json` | **2,348** | 3.06 MB | ⭐ **MASTER FILE** - Use this |
| `denmark_libraries_v2.json` | 555 | 964 KB | Main libraries only |
| `denmark_archives.json` | 594 | 918 KB | Archives only |
| `denmark_library_branches.json` | 1,199 | 1.2 MB | Library branches only |

**Location**: `/Users/kempersc/apps/glam/data/instances/`

---
## Quick Statistics
```
Total Institutions: 2,348
├── Libraries (main): 555
│   ├── Public libraries: 108
│   └── Research libraries (FFU): 447
├── Archives: 594
└── Library branches: 1,199
    ├── Public branches: 594
    └── FFU branches: 605

GHCID Coverage:     998/2,348 (42.5%)
ISIL Coverage:      555/2,348 (23.6%)
Hierarchical Links: 1,176/1,199 (98.1%)
```
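
The figures above can be recomputed from the loaded data. The following is a minimal sketch, not the project's own analysis script: `summarize` is a hypothetical helper, the sample records are made up, and it assumes only the fields used in the queries in this card (`institution_type`, `parent_organization`, `ghcid_current`). It counts only branches that carry a `parent_organization` link, which may be fewer than the total number of branch records.

```python
from collections import Counter


def summarize(records):
    """Recompute headline counts from a list of institution dicts (hypothetical helper)."""
    total = len(records)
    by_type = Counter(r.get('institution_type') for r in records)
    # Records that point at a parent, i.e. branches with a hierarchical link
    linked_branches = sum(1 for r in records if r.get('parent_organization'))
    with_ghcid = sum(1 for r in records if r.get('ghcid_current'))
    return {
        'total': total,
        'by_type': dict(by_type),
        'linked_branches': linked_branches,
        'ghcid_coverage_pct': round(100 * with_ghcid / total, 1) if total else 0.0,
    }


# Made-up sample records for illustration only; pass `danish_glam` in practice
sample = [
    {'institution_type': 'LIBRARY', 'ghcid_current': 'X-1'},
    {'institution_type': 'LIBRARY', 'parent_organization': 'lib-1'},
    {'institution_type': 'ARCHIVE', 'ghcid_current': 'X-2'},
    {'institution_type': 'ARCHIVE'},
]
print(summarize(sample))
```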
---
## Common Queries
### Load complete dataset
```python
import json
# Explicit UTF-8 so Danish characters (æ, ø, å) decode correctly on any platform
with open('data/instances/denmark_complete.json', 'r', encoding='utf-8') as f:
    danish_glam = json.load(f)
```
### Filter by institution type
```python
archives = [i for i in danish_glam if i['institution_type'] == 'ARCHIVE']
libraries = [i for i in danish_glam if i['institution_type'] == 'LIBRARY']
main_libs = [i for i in libraries if not i.get('parent_organization')]
branches = [i for i in libraries if i.get('parent_organization')]
```
### Find institutions by city
```python
copenhagen = [
    i for i in danish_glam
    if any(loc.get('city') == 'København K' for loc in i.get('locations', []))
]
```
### Get institutions with GHCID
```python
with_ghcid = [i for i in danish_glam if i.get('ghcid_current')]
```
### Get hierarchical structure (library + branches)
```python
def get_library_with_branches(library_id):
    """Get a library and all its branches."""
    # Raises StopIteration if no record has this id
    library = next(i for i in danish_glam if i['id'] == library_id)
    branches = [
        i for i in danish_glam
        if i.get('parent_organization') == library_id
    ]
    return {'library': library, 'branches': branches}

# Example
kb_system = get_library_with_branches(
    'https://w3id.org/heritage/custodian/dk/library/k%C3%B8benhavn-k/k%C3%B8benhavns-biblioteker'
)
print(f"{kb_system['library']['name']}: {len(kb_system['branches'])} branches")
```
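
The function above rescans the whole dataset on every call. When many libraries need to be looked up, a one-pass index may be more convenient. This is a sketch under the same assumption as the query above, namely that `parent_organization` holds the parent's `id` string; `index_branches` and the sample records are hypothetical.

```python
from collections import defaultdict


def index_branches(records):
    """Map each parent id to the list of records that point at it (hypothetical helper)."""
    by_parent = defaultdict(list)
    for record in records:
        parent_id = record.get('parent_organization')
        if parent_id:
            by_parent[parent_id].append(record)
    return by_parent


# Made-up sample records for illustration only; pass `danish_glam` in practice
sample = [
    {'id': 'lib-1', 'name': 'Main Library'},
    {'id': 'br-1', 'name': 'Branch A', 'parent_organization': 'lib-1'},
    {'id': 'br-2', 'name': 'Branch B', 'parent_organization': 'lib-1'},
]
branches_by_parent = index_branches(sample)
print([b['name'] for b in branches_by_parent['lib-1']])
```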
---
## Top Cities
| City | Count | City | Count |
|------|-------|------|-------|
| Aalborg | 35 | København K | 30 |
| Esbjerg | 30 | Hjørring | 28 |
| Vejle | 28 | Herning | 26 |
| Aarhus | 22 | Ringkøbing-Skjern | 22 |
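
A ranking like the one above can be derived from the `locations` field used in the city query. The sketch below is one possible way to count, not necessarily how the table was produced: `top_cities` is a hypothetical helper, the sample is made up, and each institution is counted once per distinct city it lists.

```python
from collections import Counter


def top_cities(records, n=8):
    """Return the n most common cities, counting each institution once per city."""
    counts = Counter()
    for record in records:
        # A set avoids double-counting an institution with two locations in one city
        cities = {
            loc.get('city')
            for loc in record.get('locations', [])
            if loc.get('city')
        }
        counts.update(cities)
    return counts.most_common(n)


# Made-up sample records for illustration only; pass `danish_glam` in practice
sample = [
    {'locations': [{'city': 'Aalborg'}]},
    {'locations': [{'city': 'Aalborg'}, {'city': 'Aalborg'}]},
    {'locations': [{'city': 'Vejle'}]},
    {},  # record without locations is skipped
]
print(top_cities(sample))
```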
---
## Next Steps
**For RDF Export**:
```bash
# Generate Turtle RDF
linkml-convert -s schemas/heritage_custodian.yaml -t rdf \
data/instances/denmark_complete.json > data/rdf/denmark.ttl
```
**For Wikidata Enrichment**:
```bash
# Query Wikidata for Danish institutions
python3 scripts/enrich_denmark_wikidata.py
```
**For Analysis**:
```bash
# Generate statistics report
python3 scripts/analyze_denmark_dataset.py
```
---
## Key Design Decisions
- **Library branches use `parent_organization`** - Reduces redundancy
- **Archives get GHCID (no ISIL)** - GHCID is the primary identifier
- **Nordic characters normalized** - æ→ae, ø→oe, å→aa in GHCID
- **98.1% hierarchical linkage** - Near-perfect parent-child matching
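
The character normalization listed above can be illustrated with a small sketch. This is not the project's GHCID generator: `normalize_nordic` is a hypothetical helper that applies only the three stated mappings, and the handling of uppercase letters (Æ→Ae, Ø→Oe, Å→Aa) is an assumption, since the source gives lowercase mappings only.

```python
import unicodedata

# Lowercase mappings come from the design decision above;
# the uppercase variants are an assumption for illustration.
NORDIC_MAP = {
    'æ': 'ae', 'ø': 'oe', 'å': 'aa',
    'Æ': 'Ae', 'Ø': 'Oe', 'Å': 'Aa',
}
_TABLE = str.maketrans(NORDIC_MAP)


def normalize_nordic(text):
    """Replace æ, ø, å with ASCII digraphs (hypothetical helper)."""
    # NFC first, so 'a' + combining ring is treated the same as precomposed 'å'
    return unicodedata.normalize('NFC', text).translate(_TABLE)


print(normalize_nordic('København'))
print(normalize_nordic('Hjørring'))
print(normalize_nordic('Ringkøbing-Skjern'))
```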
---
## Session Documents
- `SESSION_SUMMARY_20251119_DENMARK_COMPLETE.md` - Full session report
- `SESSION_SUMMARY_20251119_DENMARK_ARCHIVES_COMPLETE.md` - Archive processing
- `SESSION_SUMMARY_20251119_DENMARK_ISIL_COMPLETE.md` - Library processing
---
**Questions?** See `SESSION_SUMMARY_20251119_DENMARK_COMPLETE.md` for detailed documentation.