# German ISIL Database Harvest Report

**Date**: November 19, 2025  
**Harvester**: OpenCode + MCP Wikidata Tools  
**Status**: ✅ **COMPLETE**

## Overview

The complete German ISIL (International Standard Identifier for Libraries and Related Organizations) database was harvested from the Staatsbibliothek zu Berlin and the Deutsche Nationalbibliothek.

## Data Source

- **Provider**: Staatsbibliothek zu Berlin (German ISIL Agency)
- **Website**: https://sigel.staatsbibliothek-berlin.de/
- **API Protocol**: SRU 1.1 (Search/Retrieve via URL)
- **API Endpoint**: https://services.dnb.de/sru/bib
- **Data Format**: PicaPlus-XML (parsed to JSON)
- **License**: CC0 1.0 Universal (Public Domain)

## Coverage

The German ISIL database is the **authoritative registry** for heritage institutions in Germany with ISIL codes. It covers:

- **Libraries** (Bibliotheken)
  - Public libraries
  - Academic libraries
  - Research libraries
  - Special libraries
- **Archives** (Archive)
  - State archives
  - City archives
  - Corporate archives
  - Personal archives
- **Museums** (Museen)
  - Art museums
  - History museums
  - Science museums
  - Technical museums
- **Related Organizations**
  - Documentation centers
  - Research institutes with libraries
  - Heritage societies

## Harvest Statistics

### Total Records

- **16,979 institutions** with German ISIL codes (DE-*)

### Data Completeness

- **87.0%** have street addresses (14,765 records)
- **79.4%** have URLs (13,483 records)
- **79.1%** have phone numbers (13,429 records)
- **87.0%** have geographic coordinates (14,771 records)
- **37.8%** have email addresses (6,420 records)

### Geographic Distribution (Top 10 Regions)

1. **NRW** (North Rhine-Westphalia): 1,503 institutions
2. **BAW** (Baden-Württemberg): 1,295 institutions
3. **BAY** (Bavaria): 1,204 institutions
4. **HES** (Hesse): 659 institutions
5. **BER** (Berlin): 614 institutions
6. **NIE** (Lower Saxony): 598 institutions
7. **HAM** (Hamburg): 450 institutions
8. **SAX** (Saxony): 397 institutions
9. **SAA** (Saxony-Anhalt): 308 institutions
10. **THU** (Thuringia): 249 institutions

**Note**: 9,654 records (56.9%) have no interloan region code assigned.

### Institution Types

The database uses institutional codes (e.g., SBBPK, TUM, UBM) rather than standardized type classifications. The most common codes include:

- **University libraries** (UB-*): 50+ institutions
- **State libraries**: 10+ institutions
- **Max Planck Institute libraries** (MPI-*): 20+ institutions
- **Federal agency libraries** (B-*): 15+ institutions
- **University of Applied Sciences libraries** (FH-*): 80+ institutions

**Note**: 16,408 records (96.6%) have no institution type code in the database.

## Output Files

### 1. Complete Dataset (JSON)

**File**: `german_isil_complete_20251119_134939.json`  
**Size**: 37 MB  
**Format**: Structured JSON with metadata header

**Structure**:

```json
{
  "metadata": {
    "source": "German ISIL Database (Staatsbibliothek zu Berlin)",
    "harvest_date": "2025-11-19T12:49:39Z",
    "total_records": 16979,
    "license": "CC0 1.0 Universal"
  },
  "records": [...]
}
```

### 2. Complete Dataset (JSONL)

**File**: `german_isil_complete_20251119_134939.jsonl`  
**Size**: 24 MB  
**Format**: JSON Lines (one record per line)  
**Use case**: Stream processing, database imports, line-by-line analysis

### 3. Statistics Summary

**File**: `german_isil_stats_20251119_134941.json`  
**Size**: 7.6 KB  
**Format**: JSON

**Contains**:

- Total record counts
- Data completeness percentages
- Institution type distribution
- Geographic distribution by interloan region

## Record Schema

Each record contains:

```json
{
  "isil": "DE-1",                          // ISIL identifier
  "name": "Staatsbibliothek zu Berlin",    // Official name
  "alternative_names": [],                 // Alternative forms
  "institution_type": "SBBPK",             // Institution code (optional)
  "address": {
    "street": "Unter den Linden 8",
    "city": "Berlin",
    "postal_code": "10117",
    "country": "DE",
    "region": "Berlin",
    "latitude": "52.51755",
    "longitude": "13.39162"
  },
  "contact": {
    "phone": "+49-30-2 66-433888",
    "fax": "+49-30-2 66-333701",
    "email": "info@sbb.spk-berlin.de"
  },
  "urls": [
    {
      "url": "http://staatsbibliothek-berlin.de",
      "type": "A",
      "label": null
    }
  ],
  "parent_org": null,                      // Parent institution (if branch)
  "interloan_region": "BER",               // Interloan region code
  "notes": "...",                          // Collection descriptions, etc.
  "raw_pica": {...}                        // Full PICA+ data structure
}
```

## Data Quality

### Strengths

- ✅ **Authoritative source** - Official German ISIL registry
- ✅ **Complete coverage** - All German ISIL-registered institutions
- ✅ **High geographic coverage** - 87% have coordinates
- ✅ **Good contact data** - Phone numbers and URLs for about 79% of records
- ✅ **Well-structured addresses** - Standardized format
- ✅ **Public domain license** - No restrictions on reuse

### Limitations

- ⚠️ **Limited type classification** - 96.6% have no institution type code
- ⚠️ **No English translations** - Names and descriptions in German only
- ⚠️ **Incomplete interloan data** - 56.9% have no region assignment
- ⚠️ **Low email coverage** - Only 37.8% have email addresses
- ⚠️ **No historical data** - No founding dates or closure dates

## API Access Methods

The German ISIL database offers three API access methods:

### 1. SRU (Search/Retrieve via URL)

**Endpoint**: https://services.dnb.de/sru/bib  
**Protocol**: SRU 1.1  
**Formats**: PicaPlus-XML, RDF/XML  
**Query Language**: CQL (Common Query Language)

**Example**:

```bash
curl "https://services.dnb.de/sru/bib?version=1.1&operation=searchRetrieve&query=isil%3DDE-1&recordSchema=PicaPlus-xml&maximumRecords=1"
```

### 2. JSON-API

**Endpoint**: https://isil.staatsbibliothek-berlin.de/api/org.jsonld  
**Format**: JSON-LD  
**Query Language**: CQL

**Example**:

```bash
curl "https://isil.staatsbibliothek-berlin.de/api/org.jsonld?q=ort%3DBerlin&size=10"
```

### 3. Linked Data Service

**Endpoint**: https://ld.zdb-services.de/resource/organisations/  
**Formats**: RDF/XML, Turtle, JSON-LD, HTML  
**Protocol**: Content negotiation (303 redirects)

**Example**:

```bash
curl -H "Accept: application/rdf+xml" "https://ld.zdb-services.de/resource/organisations/DE-1"
```

## Integration with GLAM Project

### Recommended Next Steps

1. **Parse and Convert to LinkML**
   - Map PICA+ fields to `HeritageCustodian` schema
   - Classify institutions using GLAMORCUBESFIXPHDNT taxonomy
   - Assign data tier: **TIER_1_AUTHORITATIVE**
2. **Enrich with Wikidata**
   - Query Wikidata for matching Q-numbers
   - Add founding dates, collection information
   - Link to parent organizations
3. **Cross-reference with Other Sources**
   - Compare with International ISIL Registry
   - Link to Museum Digital (museum-digital.de)
   - Connect with Archive Portal Germany (archivportal-d.de)
4. **Generate GHCIDs**
   - Create persistent identifiers for each institution
   - Use format: `DE-[REGION]-[CITY]-[TYPE]-[ABBR]`
   - Link ISIL codes as `Identifier` records

## Harvester Implementation

**Script**: `scripts/scrapers/harvest_german_isil_sru.py`

**Features**:

- ✅ Batch processing (100 records per request)
- ✅ Rate limiting (1 second delay between requests)
- ✅ Automatic retry on failure (3 attempts)
- ✅ Progress tracking
- ✅ Error handling
- ✅ Multiple output formats (JSON, JSONL)
- ✅ Complete PICA+ field parsing

**Performance**:

- **Total time**: ~3 minutes
- **Records/second**: ~94
- **Requests**: 170 (batch size 100)
- **No errors or failed requests**

## Example Records

### Example 1: Staatsbibliothek zu Berlin

```json
{
  "isil": "DE-1",
  "name": "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Unter den Linden",
  "alternative_names": ["Berlin SBB Haus Unter d.Linden"],
  "institution_type": "SBBPK",
  "address": {
    "street": "Unter den Linden 8",
    "city": "Berlin",
    "postal_code": "10117",
    "country": "DE",
    "region": "Berlin",
    "latitude": "52.51755",
    "longitude": "13.39162"
  }
}
```

### Example 2: Stadtarchiv Augsburg

```json
{
  "isil": "DE-Aug9",
  "name": "Stadtarchiv Augsburg",
  "alternative_names": ["Augsburg Stadtarchiv"],
  "address": {
    "street": "Zur Kammgarnspinnerei 11",
    "city": "Augsburg",
    "postal_code": "86153",
    "country": "DE",
    "region": "Bayern",
    "latitude": "48.36337",
    "longitude": "10.91350"
  },
  "interloan_region": "BAY"
}
```

## Comparison with Other National ISIL Registries

| Country | Registry | Records | API | Coverage |
|---------|----------|---------|-----|----------|
| **Germany** | Staatsbibliothek zu Berlin | **16,979** | ✅ SRU, JSON, Linked Data | **Comprehensive** |
| Netherlands | KB | ~1,400 | ✅ CSV | Libraries, archives |
| Austria | OBVSG | ~3,000 | ✅ Search | Libraries only |
| Switzerland | Swiss National Library | ~1,500 | ✅ Search | Libraries, archives |
| France | ABES | ~5,000 | ✅ API | Academic libraries |
| UK | British Library | ~4,000 | ⚠️ (cyber attack) | Libraries |

**Of the registries compared here, Germany's is the largest and the broadest in coverage.**

## References

### Documentation

- ISIL Registry Homepage: https://sigel.staatsbibliothek-berlin.de/
- SRU API Documentation: https://sigel.staatsbibliothek-berlin.de/schnittstellen/api/sru
- JSON API Documentation: https://sigel.staatsbibliothek-berlin.de/schnittstellen/api/json-api
- Linked Data Service: https://sigel.staatsbibliothek-berlin.de/schnittstellen/api/linked-data-service
- PICA+ Format Specification: https://sigel.staatsbibliothek-berlin.de/vergabe/adressenformat

### Standards

- ISO 15511:2019 - International Standard Identifier for Libraries and Related Organizations (ISIL)
- SRU 1.1 - Search/Retrieve via URL (Library of Congress)
- PICA+ - Library cataloging format (OCLC)
- CC0 1.0 Universal - Public Domain Dedication

### Contact

- **German ISIL Agency**: Staatsbibliothek zu Berlin
- **International ISIL contact**: isil@slks.dk
- **Technical Contact**: Carsten Klee (carsten.klee@sbb.spk-berlin.de)
- **Phone**: +49 30 266 434402

## License

The harvested data is licensed under **CC0 1.0 Universal (Public Domain Dedication)**. You are free to:

- ✅ Copy, modify, distribute the data
- ✅ Use for commercial purposes
- ✅ Use without attribution (though attribution is appreciated)

**Attribution** (optional but recommended):

```
Data source: German ISIL Database, Staatsbibliothek zu Berlin
Retrieved: November 19, 2025
License: CC0 1.0 Universal
```

---

**Generated by**: OpenCode + MCP Wikidata Tools  
**Report Date**: November 19, 2025  
**Report Version**: 1.0
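
## Appendix: Rechecking Completeness from the JSONL File

The completeness percentages in this report can be rechecked against the JSONL output with a few lines of Python. The sketch below is not the harvester's own statistics code. It assumes the field names shown in the record schema and treats a field as present when it is non-empty; the report does not state exactly how the harvester counted, so small differences are possible. The two sample records are made up for illustration.

```python
import json
from collections import Counter

# Presence checks per field. These rules are assumptions based on the
# record schema in this report, not the harvester's actual definitions.
FIELD_CHECKS = {
    "street_address": lambda r: bool((r.get("address") or {}).get("street")),
    "url": lambda r: bool(r.get("urls")),
    "phone": lambda r: bool((r.get("contact") or {}).get("phone")),
    "coordinates": lambda r: bool(
        (r.get("address") or {}).get("latitude")
        and (r.get("address") or {}).get("longitude")
    ),
    "email": lambda r: bool((r.get("contact") or {}).get("email")),
}


def completeness(lines):
    """Return (total, {field: percent}) for an iterable of JSONL lines."""
    total = 0
    present = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        for field, check in FIELD_CHECKS.items():
            if check(record):
                present[field] += 1
    percentages = {
        field: round(100.0 * present[field] / total, 1) if total else 0.0
        for field in FIELD_CHECKS
    }
    return total, percentages


# Hypothetical sample records (not taken from the harvested data).
SAMPLE_LINES = [
    json.dumps({
        "isil": "DE-X1",
        "address": {"street": "Example St 1", "latitude": "52.5", "longitude": "13.4"},
        "contact": {"phone": "+49-30-000"},
        "urls": [{"url": "http://example.org", "type": "A", "label": None}],
    }),
    json.dumps({"isil": "DE-X2", "address": {"city": "Berlin"}, "contact": {}, "urls": []}),
]

if __name__ == "__main__":
    total, stats = completeness(SAMPLE_LINES)
    print(total, stats)
```

To run it on the real harvest, pass an open file handle instead of the sample list, for example `completeness(open("german_isil_complete_20251119_134939.jsonl", encoding="utf-8"))`. Reading line by line keeps memory use low, which is the main reason to prefer the JSONL file over the 37 MB JSON file for this kind of check.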