German ISIL Database Harvest Report

Date: November 19, 2025
Harvester: OpenCode + MCP Wikidata Tools
Status: COMPLETE

Overview

The harvest retrieved the complete German ISIL (International Standard Identifier for Libraries and Related Organizations) database from the Staatsbibliothek zu Berlin and the Deutsche Nationalbibliothek.

Data Source

Coverage

The German ISIL database is the authoritative registry for heritage institutions in Germany with ISIL codes. It covers:

  • Libraries (Bibliotheken)

    • Public libraries
    • Academic libraries
    • Research libraries
    • Special libraries
  • Archives (Archive)

    • State archives
    • City archives
    • Corporate archives
    • Personal archives
  • Museums (Museen)

    • Art museums
    • History museums
    • Science museums
    • Technical museums
  • Related Organizations

    • Documentation centers
    • Research institutes with libraries
    • Heritage societies

Harvest Statistics

Total Records

  • 16,979 institutions with German ISIL codes (DE-*)

Data Completeness

  • 87.0% have street addresses (14,765 records)
  • 79.4% have URLs (13,483 records)
  • 79.1% have phone numbers (13,429 records)
  • 87.0% have geographic coordinates (14,771 records)
  • 37.8% have email addresses (6,420 records)
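
The percentages above can be recomputed from the harvested records. The following is a minimal sketch, assuming the field names shown in the record schema of this report and that missing values appear as absent keys, nulls, or empty strings; the function name `completeness` is hypothetical and is not taken from the harvester script.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]


def _has_coordinates(record: Record) -> bool:
    address = record.get("address") or {}
    return bool(address.get("latitude")) and bool(address.get("longitude"))


# One presence check per completeness metric (field names are assumptions).
CHECKS: Dict[str, Callable[[Record], bool]] = {
    "street": lambda r: bool((r.get("address") or {}).get("street")),
    "url": lambda r: bool(r.get("urls")),
    "phone": lambda r: bool((r.get("contact") or {}).get("phone")),
    "coordinates": _has_coordinates,
    "email": lambda r: bool((r.get("contact") or {}).get("email")),
}


def completeness(records: Iterable[Record]) -> Dict[str, float]:
    """Return, per metric, the percentage of records with a value present."""
    counts = dict.fromkeys(CHECKS, 0)
    total = 0
    for record in records:
        total += 1
        for name, present in CHECKS.items():
            counts[name] += int(present(record))
    if total == 0:
        return {name: 0.0 for name in CHECKS}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```

With the counts reported here, 14,765 of 16,979 records rounds to 87.0%, which matches the street address figure.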

Geographic Distribution (Top 10 Regions)

  1. NRW (North Rhine-Westphalia): 1,503 institutions
  2. BAW (Baden-Württemberg): 1,295 institutions
  3. BAY (Bavaria): 1,204 institutions
  4. HES (Hesse): 659 institutions
  5. BER (Berlin): 614 institutions
  6. NIE (Lower Saxony): 598 institutions
  7. HAM (Hamburg): 450 institutions
  8. SAX (Saxony): 397 institutions
  9. SAA (Saxony-Anhalt): 308 institutions
  10. THU (Thuringia): 249 institutions

Note: 9,654 records (56.9%) have no interloan region code assigned.
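
A ranking like the one above can be produced with a simple tally. This sketch assumes each record carries an `interloan_region` field that is null or absent when no region is assigned; the `UNASSIGNED` label and the function name are illustrative only.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def region_distribution(records: Iterable[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Count records per interloan region, most frequent first."""
    counts = Counter(
        record.get("interloan_region") or "UNASSIGNED" for record in records
    )
    return counts.most_common()
```

Records without a region are kept in the tally rather than dropped, so the share of unassigned records stays visible next to the regional counts.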

Institution Types

The database uses institutional codes (e.g., SBBPK, TUM, UBM) rather than standardized type classifications. The most common codes include:

  • University libraries (UB-*): 50+ institutions
  • State libraries: 10+ institutions
  • Max Planck Institute libraries (MPI-*): 20+ institutions
  • Federal agency libraries (B-*): 15+ institutions
  • University of Applied Sciences libraries (FH-*): 80+ institutions

Note: 16,408 records (96.6%) have no institution type code in the database.

Output Files

1. Complete Dataset (JSON)

File: german_isil_complete_20251119_134939.json
Size: 37 MB
Format: Structured JSON with metadata header

Structure:

{
  "metadata": {
    "source": "German ISIL Database (Staatsbibliothek zu Berlin)",
    "harvest_date": "2025-11-19T12:49:39Z",
    "total_records": 16979,
    "license": "CC0 1.0 Universal"
  },
  "records": [...]
}

2. Complete Dataset (JSONL)

File: german_isil_complete_20251119_134939.jsonl
Size: 24 MB
Format: JSON Lines (one record per line)

Use case: Stream processing, database imports, line-by-line analysis
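
For stream processing, the JSONL file can be read one record at a time without loading all 24 MB into memory. A minimal sketch, assuming UTF-8 encoding; `iter_records` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

A caller would iterate with `for record in iter_records("german_isil_complete_20251119_134939.jsonl"): ...`.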

3. Statistics Summary

File: german_isil_stats_20251119_134941.json
Size: 7.6 KB
Format: JSON

Contains:

  • Total record counts
  • Data completeness percentages
  • Institution type distribution
  • Geographic distribution by interloan region

Record Schema

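Each record contains the following fields. The // comments are annotations for this report and are not part of the stored JSON: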

{
  "isil": "DE-1",                          // ISIL identifier
  "name": "Staatsbibliothek zu Berlin",   // Official name
  "alternative_names": [],                 // Alternative forms
  "institution_type": "SBBPK",            // Institution code (optional)
  "address": {
    "street": "Unter den Linden 8",
    "city": "Berlin",
    "postal_code": "10117",
    "country": "DE",
    "region": "Berlin",
    "latitude": "52.51755",
    "longitude": "13.39162"
  },
  "contact": {
    "phone": "+49-30-2 66-433888",
    "fax": "+49-30-2 66-333701",
    "email": "info@sbb.spk-berlin.de"
  },
  "urls": [
    {
      "url": "http://staatsbibliothek-berlin.de",
      "type": "A",
      "label": null
    }
  ],
  "parent_org": null,                      // Parent institution (if branch)
  "interloan_region": "BER",              // Interloan region code
  "notes": "...",                          // Collection descriptions, etc.
  "raw_pica": {...}                        // Full PICA+ data structure
}
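
Note that `latitude` and `longitude` are stored as strings in the schema above, so they need converting before any mapping or distance calculation. The sketch below shows one way to do that defensively; the function name and the decision to return `None` for unusable values are assumptions, not part of the harvester.

```python
from typing import Any, Dict, Optional, Tuple


def parse_coordinates(record: Dict[str, Any]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) as floats, or None if missing or invalid."""
    address = record.get("address") or {}
    try:
        lat = float(address["latitude"])
        lon = float(address["longitude"])
    except (KeyError, TypeError, ValueError):
        return None
    # Reject values outside the valid WGS 84 ranges (this also rejects NaN).
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```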

Data Quality

Strengths

  • Authoritative source - Official German ISIL registry
  • Complete coverage - All German ISIL-registered institutions
  • High geographic coverage - 87.0% have coordinates
  • Rich contact data - URLs and phone numbers for about 79% of records
  • Well-structured addresses - Standardized format
  • Public domain license - No restrictions on reuse

Limitations

⚠️ Limited type classification - 96.6% have no institution type code
⚠️ No English translations - Names and descriptions in German only
⚠️ Incomplete interloan data - 56.9% have no region assignment
⚠️ Low email coverage - Only 37.8% have email addresses
⚠️ No historical data - No founding or closure dates

API Access Methods

The German ISIL database offers three API access methods:

1. SRU (Search/Retrieve via URL)

Endpoint: https://services.dnb.de/sru/bib
Protocol: SRU 1.1
Formats: PicaPlus-XML, RDF/XML
Query Language: CQL (Common Query Language)

Example:

curl "https://services.dnb.de/sru/bib?version=1.1&operation=searchRetrieve&query=isil%3DDE-1&recordSchema=PicaPlus-xml&maximumRecords=1"
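
The same request URL can be assembled in Python with the standard library, which takes care of percent-encoding the CQL query. This is a sketch only: the helper name is hypothetical, and `startRecord` is the standard SRU 1.1 paging parameter rather than something confirmed from the harvester script.

```python
from typing import Optional
from urllib.parse import urlencode

SRU_ENDPOINT = "https://services.dnb.de/sru/bib"


def build_sru_url(
    query: str,
    maximum_records: int = 1,
    start_record: Optional[int] = None,
) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,  # urlencode escapes "=" as %3D
        "recordSchema": "PicaPlus-xml",
        "maximumRecords": maximum_records,
    }
    if start_record is not None:
        params["startRecord"] = start_record  # 1-based offset for paging
    return f"{SRU_ENDPOINT}?{urlencode(params)}"
```

Calling `build_sru_url("isil=DE-1")` yields the URL used in the curl example.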

2. JSON-API

Endpoint: https://isil.staatsbibliothek-berlin.de/api/org.jsonld
Format: JSON-LD
Query Language: CQL

Example:

curl "https://isil.staatsbibliothek-berlin.de/api/org.jsonld?q=ort%3DBerlin&size=10"

3. Linked Data Service

Endpoint: https://ld.zdb-services.de/resource/organisations/
Formats: RDF/XML, Turtle, JSON-LD, HTML
Protocol: Content negotiation (303 redirects)

Example:

curl -H "Accept: application/rdf+xml" "https://ld.zdb-services.de/resource/organisations/DE-1"
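
Content negotiation works by sending the desired media type in the `Accept` header. The sketch below only constructs the request object; the helper name is hypothetical, and which media types the service honors beyond the formats listed above is an assumption.

```python
from urllib.parse import quote
from urllib.request import Request

LD_BASE = "https://ld.zdb-services.de/resource/organisations/"


def build_ld_request(isil: str, media_type: str = "application/rdf+xml") -> Request:
    """Create a request for one organisation in the given RDF serialization."""
    # Percent-encode the ISIL so unusual characters cannot break the path.
    url = LD_BASE + quote(isil, safe="")
    return Request(url, headers={"Accept": media_type})
```

Passing the result to `urllib.request.urlopen` would send it; `urlopen` follows the 303 redirect to the negotiated representation automatically.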

Integration with GLAM Project

  1. Parse and Convert to LinkML

    • Map PICA+ fields to HeritageCustodian schema
    • Classify institutions using GLAMORCUBESFIXPHDNT taxonomy
    • Assign data tier: TIER_1_AUTHORITATIVE
  2. Enrich with Wikidata

    • Query Wikidata for matching Q-numbers
    • Add founding dates, collection information
    • Link to parent organizations
  3. Cross-reference with Other Sources

    • Compare with International ISIL Registry
    • Link to Museum Digital (museum-digital.de)
    • Connect with Archive Portal Germany (archivportal-d.de)
  4. Generate GHCIDs

    • Create persistent identifiers for each institution
    • Use format: DE-[REGION]-[CITY]-[TYPE]-[ABBR]
    • Link ISIL codes as Identifier records

Harvester Implementation

Script: scripts/scrapers/harvest_german_isil_sru.py

Features:

  • Batch processing (100 records per request)
  • Rate limiting (1 second delay between requests)
  • Automatic retry on failure (3 attempts)
  • Progress tracking
  • Error handling
  • Multiple output formats (JSON, JSONL)
  • Complete PICA+ field parsing

Performance:

  • Total time: ~3 minutes
  • Records/second: ~94
  • Requests: 170 (batch size 100)
  • No errors or failed requests
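
The batching and retry behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the code of `harvest_german_isil_sru.py`: the function names are hypothetical, and the 1-based offsets follow the SRU convention of numbering records from 1.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def batch_starts(total: int, batch_size: int = 100) -> Iterator[int]:
    """Yield the 1-based start offset of each batch needed to cover `total` records."""
    return iter(range(1, total + 1, batch_size))


def with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on any exception up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    assert last_error is not None
    raise last_error
```

For 16,979 records and a batch size of 100, `batch_starts` yields 170 offsets, which matches the request count reported above. With a 1 second pause between requests, those 170 requests account for most of the roughly 3 minute run time.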

Example Records

Example 1: Staatsbibliothek zu Berlin

{
  "isil": "DE-1",
  "name": "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Unter den Linden",
  "alternative_names": ["Berlin SBB Haus Unter d.Linden"],
  "institution_type": "SBBPK",
  "address": {
    "street": "Unter den Linden 8",
    "city": "Berlin",
    "postal_code": "10117",
    "country": "DE",
    "region": "Berlin",
    "latitude": "52.51755",
    "longitude": "13.39162"
  }
}

Example 2: Stadtarchiv Augsburg

{
  "isil": "DE-Aug9",
  "name": "Stadtarchiv Augsburg",
  "alternative_names": ["Augsburg Stadtarchiv"],
  "address": {
    "street": "Zur Kammgarnspinnerei 11",
    "city": "Augsburg",
    "postal_code": "86153",
    "country": "DE",
    "region": "Bayern",
    "latitude": "48.36337",
    "longitude": "10.91350"
  },
  "interloan_region": "BAY"
}

Comparison with Other National ISIL Registries

| Country | Registry | Records | API | Coverage |
|---|---|---|---|---|
| Germany | Staatsbibliothek zu Berlin | 16,979 | SRU, JSON, Linked Data | Comprehensive |
| Netherlands | KB | ~1,400 | CSV | Libraries, archives |
| Austria | OBVSG | ~3,000 | Search | Libraries only |
| Switzerland | Swiss National Library | ~1,500 | Search | Libraries, archives |
| France | ABES | ~5,000 | API | Academic libraries |
| UK | British Library | ~4,000 | ⚠️ (cyber attack) | Libraries |

Of the registries compared here, Germany's is the largest and the most comprehensive, both in record count and in the range of institution types covered.

References

Documentation

Standards

  • ISO 15511:2019 - International Standard Identifier for Libraries and Related Organizations (ISIL)
  • SRU 1.1 - Search/Retrieve via URL (Library of Congress)
  • PICA+ - Library cataloging format (OCLC)
  • CC0 1.0 Universal - Public Domain Dedication

Contact

License

The harvested data is licensed under CC0 1.0 Universal (Public Domain Dedication).

You are free to:

  • Copy, modify, distribute the data
  • Use for commercial purposes
  • Use without attribution (though attribution is appreciated)

Attribution (optional but recommended):

Data source: German ISIL Database, Staatsbibliothek zu Berlin
Retrieved: November 19, 2025
License: CC0 1.0 Universal

Generated by: OpenCode + MCP Wikidata Tools
Report Date: November 19, 2025
Report Version: 1.0