# German ISIL Database Harvest Report

**Date**: November 19, 2025
**Harvester**: OpenCode + MCP Wikidata Tools
**Status**: ✅ **COMPLETE**

## Overview

The complete German ISIL (International Standard Identifier for Libraries and Related Organizations) database was harvested successfully. The registry is maintained by the Staatsbibliothek zu Berlin, and the records were retrieved through the SRU endpoint hosted by the Deutsche Nationalbibliothek.

## Data Source

- **Provider**: Staatsbibliothek zu Berlin (German ISIL Agency)
- **Website**: https://sigel.staatsbibliothek-berlin.de/
- **API Protocol**: SRU 1.1 (Search/Retrieve via URL)
- **API Endpoint**: https://services.dnb.de/sru/bib
- **Data Format**: PicaPlus-XML (parsed to JSON)
- **License**: CC0 1.0 Universal (Public Domain)

## Coverage

The German ISIL database is the **authoritative registry** for heritage institutions in Germany with ISIL codes. It covers:

- **Libraries** (Bibliotheken)
  - Public libraries
  - Academic libraries
  - Research libraries
  - Special libraries

- **Archives** (Archive)
  - State archives
  - City archives
  - Corporate archives
  - Personal archives

- **Museums** (Museen)
  - Art museums
  - History museums
  - Science museums
  - Technical museums

- **Related Organizations**
  - Documentation centers
  - Research institutes with libraries
  - Heritage societies

## Harvest Statistics

### Total Records
- **16,979 institutions** with German ISIL codes (DE-*)

### Data Completeness
- **87.0%** have street addresses (14,765 records)
- **79.4%** have URLs (13,483 records)
- **79.1%** have phone numbers (13,429 records)
- **87.0%** have geographic coordinates (14,771 records)
- **37.8%** have email addresses (6,420 records)

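The completeness percentages above can be recomputed from the harvested records. The following is a minimal sketch, not the harvester's actual statistics code: the function name `completeness` and the choice of which fields count as "populated" are assumptions, based only on the record schema documented in this report.

```python
from typing import Callable, Dict, Iterable


def completeness(records: Iterable[dict]) -> Dict[str, float]:
    """Return the percentage of records in which each field is populated.

    Hypothetical helper: the field paths follow the record schema in this
    report; empty strings, empty lists, and null values count as missing.
    """
    checks: Dict[str, Callable[[dict], bool]] = {
        "street": lambda r: bool((r.get("address") or {}).get("street")),
        "coordinates": lambda r: bool((r.get("address") or {}).get("latitude"))
        and bool((r.get("address") or {}).get("longitude")),
        "url": lambda r: bool(r.get("urls")),
        "phone": lambda r: bool((r.get("contact") or {}).get("phone")),
        "email": lambda r: bool((r.get("contact") or {}).get("email")),
    }
    counts = dict.fromkeys(checks, 0)
    total = 0
    for record in records:
        total += 1
        for name, check in checks.items():
            counts[name] += check(record)  # True counts as 1
    if total == 0:
        return {name: 0.0 for name in checks}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```

Passing the parsed contents of the `records` array (or the records of the JSONL file) to this function should yield figures comparable to the list above, provided the same definition of "populated" was used in the original statistics run.
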
### Geographic Distribution (Top 10 Regions)
1. **NRW** (North Rhine-Westphalia): 1,503 institutions
2. **BAW** (Baden-Württemberg): 1,295 institutions
3. **BAY** (Bavaria): 1,204 institutions
4. **HES** (Hesse): 659 institutions
5. **BER** (Berlin): 614 institutions
6. **NIE** (Lower Saxony): 598 institutions
7. **HAM** (Hamburg): 450 institutions
8. **SAX** (Saxony): 397 institutions
9. **SAA** (Saxony-Anhalt): 308 institutions
10. **THU** (Thuringia): 249 institutions

**Note**: 9,654 records (56.9%) have no interloan region code assigned.

### Institution Types

The database uses institutional codes (e.g., SBBPK, TUM, UBM) rather than standardized type classifications. The most common codes include:

- **University libraries** (UB-*): 50+ institutions
- **State libraries**: 10+ institutions
- **Max Planck Institute libraries** (MPI-*): 20+ institutions
- **Federal agency libraries** (B-*): 15+ institutions
- **University of Applied Sciences libraries** (FH-*): 80+ institutions

**Note**: 16,408 records (96.6%) have no institution type code in the database.

## Output Files

### 1. Complete Dataset (JSON)
**File**: `german_isil_complete_20251119_134939.json`
**Size**: 37 MB
**Format**: Structured JSON with metadata header

**Structure**:
```json
{
  "metadata": {
    "source": "German ISIL Database (Staatsbibliothek zu Berlin)",
    "harvest_date": "2025-11-19T12:49:39Z",
    "total_records": 16979,
    "license": "CC0 1.0 Universal"
  },
  "records": [...]
}
```

### 2. Complete Dataset (JSONL)
**File**: `german_isil_complete_20251119_134939.jsonl`
**Size**: 24 MB
**Format**: JSON Lines (one record per line)

**Use case**: Stream processing, database imports, line-by-line analysis

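To illustrate the stream-processing use case, the sketch below reads JSON Lines one record at a time instead of loading the whole file into memory. The helper names `iter_records` and `count_by_region` are hypothetical, and the `(none)` bucket for records without an interloan region is an assumption made for this example.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines source."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


def count_by_region(lines: Iterable[str]) -> Counter:
    """Count records per interloan region; missing codes go to '(none)'."""
    return Counter(
        record.get("interloan_region") or "(none)"
        for record in iter_records(lines)
    )
```

Because an open text file is itself an iterable of lines, the JSONL file can be passed directly, for example `count_by_region(open("german_isil_complete_20251119_134939.jsonl", encoding="utf-8"))`, and `.most_common(10)` on the result gives a top-10 ranking by region.
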
### 3. Statistics Summary
**File**: `german_isil_stats_20251119_134941.json`
**Size**: 7.6 KB
**Format**: JSON

**Contains**:
- Total record counts
- Data completeness percentages
- Institution type distribution
- Geographic distribution by interloan region

## Record Schema

Each record contains:

```json
{
  "isil": "DE-1",                        // ISIL identifier
  "name": "Staatsbibliothek zu Berlin",  // Official name
  "alternative_names": [],               // Alternative forms
  "institution_type": "SBBPK",           // Institution code (optional)
  "address": {
    "street": "Unter den Linden 8",
    "city": "Berlin",
    "postal_code": "10117",
    "country": "DE",
    "region": "Berlin",
    "latitude": "52.51755",
    "longitude": "13.39162"
  },
  "contact": {
    "phone": "+49-30-2 66-433888",
    "fax": "+49-30-2 66-333701",
    "email": "info@sbb.spk-berlin.de"
  },
  "urls": [
    {
      "url": "http://staatsbibliothek-berlin.de",
      "type": "A",
      "label": null
    }
  ],
  "parent_org": null,                    // Parent institution (if branch)
  "interloan_region": "BER",             // Interloan region code
  "notes": "...",                        // Collection descriptions, etc.
  "raw_pica": {...}                      // Full PICA+ data structure
}
```

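Note that the schema stores `latitude` and `longitude` as strings, and that roughly 13% of records have no coordinates at all. A consumer will usually want typed values. The following is a minimal sketch of such a conversion; the `Location` class and `to_location` function are hypothetical names introduced for illustration and are not part of the harvester.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Typed subset of a harvested record (hypothetical helper class)."""
    isil: str
    name: str
    city: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]


def _to_float(value: object) -> Optional[float]:
    """Convert a coordinate string to float; return None if missing or invalid."""
    try:
        return float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def to_location(record: dict) -> Location:
    """Build a Location from a record that follows the schema above."""
    address = record.get("address") or {}
    return Location(
        isil=record["isil"],
        name=record["name"],
        city=address.get("city"),
        latitude=_to_float(address.get("latitude")),
        longitude=_to_float(address.get("longitude")),
    )
```

Records without an `address` object, or with empty coordinate strings, produce `None` values rather than raising, which keeps downstream mapping code simple.
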
## Data Quality

### Strengths
- ✅ **Authoritative source** - Official German ISIL registry
- ✅ **Complete coverage** - All German ISIL-registered institutions
- ✅ **Good geographic coverage** - 87% have coordinates
- ✅ **Useful contact data** - Phone numbers and URLs for about 79% of records
- ✅ **Well-structured addresses** - Standardized format
- ✅ **Public domain license** - No restrictions on reuse

### Limitations
- ⚠️ **Limited type classification** - 96.6% have no institution type code
- ⚠️ **No English translations** - Names and descriptions in German only
- ⚠️ **Incomplete interloan data** - 56.9% have no region assignment
- ⚠️ **Low email coverage** - Only 37.8% have email addresses
- ⚠️ **No historical data** - No founding dates or closure dates

## API Access Methods

The German ISIL database offers three API access methods:

### 1. SRU (Search/Retrieve via URL)
**Endpoint**: https://services.dnb.de/sru/bib
**Protocol**: SRU 1.1
**Formats**: PicaPlus-XML, RDF/XML
**Query Language**: CQL (Common Query Language)

**Example**:
```bash
curl "https://services.dnb.de/sru/bib?version=1.1&operation=searchRetrieve&query=isil%3DDE-1&recordSchema=PicaPlus-xml&maximumRecords=1"
```

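The same SRU request can be assembled programmatically, which avoids hand-encoding the CQL query. This is a sketch under stated assumptions: the function name `build_sru_url` is hypothetical, the parameter values mirror the curl example above, and `startRecord` is the standard SRU 1.1 paging parameter (1-based), added here because a full harvest needs it.

```python
from urllib.parse import urlencode

SRU_ENDPOINT = "https://services.dnb.de/sru/bib"


def build_sru_url(query: str, start_record: int = 1, maximum_records: int = 100) -> str:
    """Return a searchRetrieve URL for the given CQL query.

    urlencode percent-encodes the query, so "isil=DE-1" becomes "isil%3DDE-1".
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "recordSchema": "PicaPlus-xml",
        "startRecord": start_record,      # 1-based offset of the first record
        "maximumRecords": maximum_records,
    }
    return f"{SRU_ENDPOINT}?{urlencode(params)}"
```

Calling `build_sru_url("isil=DE-1", maximum_records=1)` produces a URL equivalent to the curl example, with an explicit `startRecord=1` added.
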
### 2. JSON-API
**Endpoint**: https://isil.staatsbibliothek-berlin.de/api/org.jsonld
**Format**: JSON-LD
**Query Language**: CQL

**Example**:
```bash
curl "https://isil.staatsbibliothek-berlin.de/api/org.jsonld?q=ort%3DBerlin&size=10"
```

### 3. Linked Data Service
**Endpoint**: `https://ld.zdb-services.de/resource/organisations/<ISIL>`
**Formats**: RDF/XML, Turtle, JSON-LD, HTML
**Protocol**: Content negotiation (303 redirects)

**Example**:
```bash
curl -H "Accept: application/rdf+xml" "https://ld.zdb-services.de/resource/organisations/DE-1"
```

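Content negotiation means the same resource URL returns a different serialization depending on the `Accept` header. The sketch below only builds the request object and does not send it. The helper name `linked_data_request` is hypothetical, and the media types are the standard IANA names for the listed formats; whether the service honours each of them exactly as written is an assumption.

```python
from urllib.parse import quote
from urllib.request import Request

LD_BASE = "https://ld.zdb-services.de/resource/organisations/"

# Standard media types for the formats listed above (assumed to be accepted).
MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
    "html": "text/html",
}


def linked_data_request(isil: str, fmt: str = "turtle") -> Request:
    """Build (but do not send) a request for one organisation record."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    url = LD_BASE + quote(isil, safe="")
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})
```

The returned object can be passed to `urllib.request.urlopen`, which follows the 303 redirect automatically.
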
## Integration with GLAM Project

### Recommended Next Steps

1. **Parse and Convert to LinkML**
   - Map PICA+ fields to `HeritageCustodian` schema
   - Classify institutions using GLAMORCUBESFIXPHDNT taxonomy
   - Assign data tier: **TIER_1_AUTHORITATIVE**

2. **Enrich with Wikidata**
   - Query Wikidata for matching Q-numbers
   - Add founding dates, collection information
   - Link to parent organizations

3. **Cross-reference with Other Sources**
   - Compare with International ISIL Registry
   - Link to Museum Digital (museum-digital.de)
   - Connect with Archive Portal Germany (archivportal-d.de)

4. **Generate GHCIDs**
   - Create persistent identifiers for each institution
   - Use format: `DE-[REGION]-[CITY]-[TYPE]-[ABBR]`
   - Link ISIL codes as `Identifier` records

## Harvester Implementation

**Script**: `scripts/scrapers/harvest_german_isil_sru.py`

**Features**:
- ✅ Batch processing (100 records per request)
- ✅ Rate limiting (1 second delay between requests)
- ✅ Automatic retry on failure (3 attempts)
- ✅ Progress tracking
- ✅ Error handling
- ✅ Multiple output formats (JSON, JSONL)
- ✅ Complete PICA+ field parsing

**Performance**:
- **Total time**: ~3 minutes
- **Records/second**: ~94
- **Requests**: 170 (batch size 100)
- **No errors or failed requests**

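The batching and retry behaviour listed above can be sketched as follows. This is not the code of `harvest_german_isil_sru.py`; the function names, the injected `fetch` callable, and the choice to retry only on `OSError` (the base class of `urllib.error.URLError`) are assumptions made for illustration.

```python
import time
from typing import Callable, Iterator


def batch_starts(total: int, batch_size: int = 100) -> Iterator[int]:
    """Yield the 1-based startRecord offset of each batch covering `total` records."""
    return iter(range(1, total + 1, batch_size))


def fetch_with_retry(
    fetch: Callable[[int], bytes],
    start: int,
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(start), retrying on network errors up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(start)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    raise RuntimeError(
        f"batch starting at {start} failed after {attempts} attempts"
    ) from last_error
```

With 16,979 records and a batch size of 100, `batch_starts` yields 170 offsets (1, 101, ..., 16901), matching the request count reported above. With a one-second pause between requests, 170 requests take at least about 170 seconds, which is consistent with the reported total of roughly 3 minutes.
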
## Example Records

### Example 1: Staatsbibliothek zu Berlin
```json
{
  "isil": "DE-1",
  "name": "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Unter den Linden",
  "alternative_names": ["Berlin SBB Haus Unter d.Linden"],
  "institution_type": "SBBPK",
  "address": {
    "street": "Unter den Linden 8",
    "city": "Berlin",
    "postal_code": "10117",
    "country": "DE",
    "region": "Berlin",
    "latitude": "52.51755",
    "longitude": "13.39162"
  }
}
```

### Example 2: Stadtarchiv Augsburg
```json
{
  "isil": "DE-Aug9",
  "name": "Stadtarchiv Augsburg",
  "alternative_names": ["Augsburg Stadtarchiv"],
  "address": {
    "street": "Zur Kammgarnspinnerei 11",
    "city": "Augsburg",
    "postal_code": "86153",
    "country": "DE",
    "region": "Bayern",
    "latitude": "48.36337",
    "longitude": "10.91350"
  },
  "interloan_region": "BAY"
}
```

## Comparison with Other National ISIL Registries

| Country | Registry | Records | API | Coverage |
|---------|----------|---------|-----|----------|
| **Germany** | Staatsbibliothek zu Berlin | **16,979** | ✅ SRU, JSON, Linked Data | **Comprehensive** |
| Netherlands | KB | ~1,400 | ✅ CSV | Libraries, archives |
| Austria | OBVSG | ~3,000 | ✅ Search | Libraries only |
| Switzerland | Swiss National Library | ~1,500 | ✅ Search | Libraries, archives |
| France | ABES | ~5,000 | ✅ API | Academic libraries |
| UK | British Library | ~4,000 | ⚠️ (cyber attack) | Libraries |

**Of the registries compared here, Germany's is the largest and most comprehensive.** Record counts for the other registries are approximate.

## References

### Documentation
- ISIL Registry Homepage: https://sigel.staatsbibliothek-berlin.de/
- SRU API Documentation: https://sigel.staatsbibliothek-berlin.de/schnittstellen/api/sru
- JSON API Documentation: https://sigel.staatsbibliothek-berlin.de/schnittstellen/api/json-api
- Linked Data Service: https://sigel.staatsbibliothek-berlin.de/schnittstellen/api/linked-data-service
- PICA+ Format Specification: https://sigel.staatsbibliothek-berlin.de/vergabe/adressenformat

### Standards
- ISO 15511:2019 - International Standard Identifier for Libraries and Related Organizations (ISIL)
- SRU 1.1 - Search/Retrieve via URL (Library of Congress)
- PICA+ - Library cataloging format (OCLC)
- CC0 1.0 Universal - Public Domain Dedication

### Contact
- **German ISIL Agency**: Staatsbibliothek zu Berlin
- **International ISIL contact**: isil@slks.dk
- **Technical Contact**: Carsten Klee (carsten.klee@sbb.spk-berlin.de)
- **Phone**: +49 30 266 434402

## License

The harvested data is licensed under **CC0 1.0 Universal (Public Domain Dedication)**.

You are free to:
- ✅ Copy, modify, distribute the data
- ✅ Use for commercial purposes
- ✅ Use without attribution (though attribution is appreciated)

**Attribution** (optional but recommended):
```
Data source: German ISIL Database, Staatsbibliothek zu Berlin
Retrieved: November 19, 2025
License: CC0 1.0 Universal
```

---

**Generated by**: OpenCode + MCP Wikidata Tools
**Report Date**: November 19, 2025
**Report Version**: 1.0