Library ISIL CSV to YAML Conversion Report

Date: 2025-11-17
Input: /data/isil/nl/kb/20250401 Bnetwerk overzicht ISIL-codes Bibliotheken Nederland.csv
Output: /data/isil/nl/kb/20250401_Bnetwerk_ISIL_Bibliotheken_Nederland.yaml
Script: /scripts/convert_library_isil_csv_to_yaml.py
Data Date: Stand 1 april 2025 (As of April 1, 2025)


Conversion Summary

Records Processed

  • Total records: 153 Dutch library ISIL codes
  • Field preservation: 100% (765 fields preserved exactly)
  • Value mismatches: 0 (perfect fidelity)

CSV Structure (Original)

Cleaner structure than the national archive ISIL CSV:

  • UTF-8 encoding with BOM (standard)
  • Semicolon delimiter
  • 3 metadata header rows (title, date, blank)
  • Column headers (row 4)
  • Data rows (rows 5+)

Fields:

  1. ISIL-code (NL-XXXXXXXXXX format)
  2. Naam bibliotheek (library name)
  3. Vestigingsplaats (city/location)
  4. Opmerking (remarks/classification)
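
The layout above (BOM, semicolon delimiter, three metadata rows, headers on row 4) can be read with the standard library alone. The following is a minimal sketch, not the actual conversion script: the function name `read_library_isil_csv`, the sample title line, and the choice to count `csv_row_number` from the first row of the file are assumptions made for illustration.

```python
import csv
import io


def read_library_isil_csv(raw_bytes):
    """Parse bytes in the layout described above into a list of dicts."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present
    text = raw_bytes.decode('utf-8-sig')
    lines = text.splitlines()
    # Rows 1-3 are metadata (title, date, blank); row 4 holds the column headers
    body = '\n'.join(lines[3:])
    reader = csv.DictReader(io.StringIO(body), delimiter=';')
    records = []
    for offset, row in enumerate(reader):
        # Assumption: row numbers refer to the original file, so data starts at 5
        records.append({'csv_row_number': 5 + offset, **row})
    return records


# Small in-memory sample with the same layout (illustrative values only)
sample = (
    '\ufeffOverzicht ISIL-codes\n'
    'Stand 1 april 2025\n'
    '\n'
    'ISIL-code;Naam bibliotheek;Vestigingsplaats;Opmerking\n'
    'NL-0100030000;KB, nationale bibliotheek;Den Haag;KB, Nationale Bibliotheek\n'
    'NL-0800070000;OBA;Amsterdam;\n'
).encode('utf-8')

rows = read_library_isil_csv(sample)
print(rows[0]['ISIL-code'], rows[1]['Vestigingsplaats'])
```

Splitting on lines before parsing keeps the sketch short, but it assumes no quoted field contains a line break.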

YAML Structure (Output)

Each record contains:

CSV Fields (preserved exactly):

  • csv_row_number: Original row number
  • csv_isil_code: ISIL identifier (13 characters: NL- prefix plus 10 digits)
  • csv_naam_bibliotheek: Library name
  • csv_vestigingsplaats: City/location
  • csv_opmerking: Remarks (19 records have remarks)

LinkML Mapped Fields:

  • name: Library name (mapped from csv_naam_bibliotheek)
  • institution_type: LIBRARY (all records)
  • locations: List with city and country (NL)
  • identifiers: ISIL identifier with scheme, value, URL
  • library_type: Classification based on remarks
  • description: Library type description
  • provenance: Data source metadata (TIER_1_AUTHORITATIVE)
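
As a rough illustration of how one CSV row might become one output record, the sketch below builds a dict with both groups of fields. The identifier keys follow the Identifier Structure section of this report. The function name `map_row_to_record`, the `city` and `country` keys inside `locations`, and the `data_tier` key inside `provenance` are assumptions, and the `description` field is left out because its wording is not documented here.

```python
def map_row_to_record(row_number, isil_code, name, city, remark, library_type):
    """Build one record: CSV fields kept verbatim, plus the mapped fields."""
    return {
        # CSV fields, preserved exactly as read
        'csv_row_number': row_number,
        'csv_isil_code': isil_code,
        'csv_naam_bibliotheek': name,
        'csv_vestigingsplaats': city,
        'csv_opmerking': remark,
        # Mapped fields
        'name': name,
        'institution_type': 'LIBRARY',
        # Assumption: key names 'city' and 'country' are illustrative
        'locations': [{'city': city, 'country': 'NL'}],
        'identifiers': [{
            'identifier_scheme': 'ISIL',
            'identifier_value': isil_code,
            'identifier_url': f'https://isil.org/{isil_code}',
        }],
        'library_type': library_type,
        # Assumption: key name 'data_tier' is illustrative
        'provenance': {'data_tier': 'TIER_1_AUTHORITATIVE'},
    }


record = map_row_to_record(
    6, 'NL-0800070000', 'OBA', 'Amsterdam', '', 'public_library'
)
print(record['identifiers'][0]['identifier_url'])
```

Keeping the `csv_` fields next to the mapped fields means every record can be checked against its source row later without reopening the CSV.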

Data Quality Findings

Geographic Distribution

  • Unique cities: 134 across Netherlands
  • Top cities:
    1. Deventer: 5 libraries (Rijnbrink systems)
    2. Den Haag: 4 libraries (KB + national orgs)
    3. Groningen: 3 libraries (Biblionet + POI)
    4. Assen: 3 libraries (Biblionet Drenthe)

Library Type Classification

Automated classification based on opmerking field:

| Type | Count | % | Description |
|---|---|---|---|
| public_library | 134 | 87.6% | Regular public libraries (no special classification) |
| library_automation_system | 11 | 7.2% | POI (Provinciale Ondersteuningsinstelling, provincial support institution) systems |
| national_library_organization | 5 | 3.3% | National library organizations (Muziekweb, SDI, etc.) |
| provincial_library_organization | 2 | 1.3% | Provincial Musidesk systems |
| national_library | 1 | 0.7% | KB (Koninklijke Bibliotheek) |
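
The counts and percentages in this table can be recomputed from the `library_type` values alone. The sketch below assumes a flat list of type labels; the helper name `type_shares` is hypothetical and the input list is rebuilt from the counts reported above rather than read from the YAML file.

```python
from collections import Counter


def type_shares(library_types):
    """Return (type, count, percent) tuples, largest group first."""
    counts = Counter(library_types)
    total = sum(counts.values())
    return [
        (name, n, round(100 * n / total, 1))
        for name, n in counts.most_common()
    ]


# Rebuild the label list from the counts reported in this section
types = (
    ['public_library'] * 134
    + ['library_automation_system'] * 11
    + ['national_library_organization'] * 5
    + ['provincial_library_organization'] * 2
    + ['national_library'] * 1
)

shares = type_shares(types)
for name, n, pct in shares:
    print(f'{name}: {n} ({pct}%)')
```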

ISIL Code Patterns

Uniform Structure:

  • All codes: exactly 13 characters (NL-XXXXXXXXXX)
  • Format: NL- + 10 digits
  • All codes start with NL-
  • All characters after prefix are numeric
  • No duplicates (153 unique codes)

Examples by Type:

  • National Library: NL-0100030000 (KB)
  • Public Library: NL-0800070000 (OBA Amsterdam)
  • Automation System: NL-0700130000 (Zeeuwse Bibliotheken POI)
  • National Org: NL-0735650000 (Muziekweb)
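
The uniform structure described in this section is easy to verify mechanically. A minimal sketch, assuming the codes are available as a list of strings; the helper name `check_isil_codes` is hypothetical and is not taken from the conversion script.

```python
import re

# NL- prefix followed by exactly 10 ASCII digits (13 characters in total)
ISIL_PATTERN = re.compile(r'NL-[0-9]{10}')


def check_isil_codes(codes):
    """Return (malformed codes, duplicated codes) for a list of ISIL codes."""
    malformed = [c for c in codes if not ISIL_PATTERN.fullmatch(c)]
    seen = set()
    duplicates = set()
    for code in codes:
        if code in seen:
            duplicates.add(code)
        seen.add(code)
    return malformed, sorted(duplicates)


# The four example codes listed above
examples = ['NL-0100030000', 'NL-0800070000', 'NL-0700130000', 'NL-0735650000']
malformed, duplicates = check_isil_codes(examples)
print(malformed, duplicates)
```

Using `fullmatch` rather than `match` rejects codes with trailing characters, which a prefix-only check would let through.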

Remarks Field Analysis

19 libraries (12.4%) have remarks documenting:

National Organizations (5 libraries):

  • Muziekweb, Coöperatie SDI, online bibliotheek, Passend Lezen, Dedicon
  • Classification: "landelijke bibliotheekorganisatie"

Provincial Organizations (2 libraries):

  • Rijnbrink Musidesk Gelderland, Rijnbrink Musidesk Overijssel
  • Classification: "provinciale bibliotheekorganisatie"

POI Systems (11 libraries):

  • Library automation/consortium systems
  • Examples: Zeeuwse Bibliotheken, FERS Friesland, Rijnbrink, Cubiss, Probiblio, BiSC, Biblionet
  • Classification: "POI" (Provinciale Ondersteuningsinstelling, provincial support institution)

National Library (1 library):

  • KB (Koninklijke Bibliotheek)
  • Classification: "KB, Nationale Bibliotheek"

LinkML Schema Compliance

Required Fields

All 153 records contain:

  • name (library name)
  • institution_type (LIBRARY)
  • locations (city + country)
  • identifiers (ISIL code details)
  • library_type (automated classification)
  • provenance (data source metadata)

Identifier Structure

Each ISIL identifier includes:

identifiers:
  - identifier_scheme: ISIL
    identifier_value: NL-0800070000
    identifier_url: https://isil.org/NL-0800070000

Provenance Metadata

All records are marked as TIER_1_AUTHORITATIVE in their provenance metadata.


Library Network Structure

National Level (6 organizations)

  1. KB, nationale bibliotheek (Den Haag) - National library
  2. Muziekweb (Rotterdam) - Music library service
  3. Coöperatie SDI (Utrecht) - Digital library cooperation
  4. online bibliotheek (Den Haag) - Online library platform
  5. Passend Lezen (Den Haag) - Accessible reading service
  6. Dedicon (Grave) - Audio book production

Provincial Level (2 organizations)

  1. Rijnbrink Musidesk Gelderland (Deventer)
  2. Rijnbrink Musidesk Overijssel (Deventer)

Library Automation Systems (11 POI systems)

Regional consortia providing shared library management systems:

  • Zeeland: Zeeuwse Bibliotheken (Middelburg)
  • Friesland: FERS Friesland (Leeuwarden)
  • Gelderland: Rijnbrink Gelderland (Deventer)
  • Groningen: Biblionet Groningen + POI variant
  • Limburg: Cubiss Limburg (Heerlen)
  • Noord-Brabant: Cubiss Noord-Brabant (Tilburg)
  • Noord-Holland: Probiblio (Hoofddorp)
  • Overijssel: Rijnbrink Overijssel (Deventer)
  • Utrecht: BiSC Utrecht (Houten)
  • Drenthe: Biblionet Drenthe + POI variant
  • Flevoland: Bibliotheeknetwerk Flevoland (Lelystad)

Public Libraries (134 organizations)

Major city libraries include:

  • OBA (Amsterdam)
  • Bibliotheek Rotterdam
  • Bibliotheek Den Haag
  • Rozet (Arnhem)
  • Bibliotheek Schiedam
  • And 129 other municipal/regional libraries

Validation Results

Field Preservation Test

Total records:       153
Total fields:        765
Fields preserved:    765
Value mismatches:    0
Preservation rate:   100.0%

VALIDATION PASSED
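
The total of 765 fields is 153 records times the 5 `csv_` fields per record. A field preservation test of this kind could look like the sketch below. It assumes the parsed CSV rows have already been keyed by the `csv_` field names; the function name `check_field_preservation` and the return shape are hypothetical, not the script's actual interface.

```python
def check_field_preservation(expected_rows, records):
    """Compare each expected csv_ value with the value stored in the record."""
    if len(expected_rows) != len(records):
        raise ValueError('row count and record count differ')
    total = 0
    preserved = 0
    mismatches = []
    for expected, record in zip(expected_rows, records):
        for key, value in expected.items():
            total += 1
            if record.get(key) == value:
                preserved += 1
            else:
                # Keep enough detail to locate the offending field
                mismatches.append((expected.get('csv_row_number'), key))
    rate = 100.0 * preserved / total if total else 100.0
    return total, preserved, mismatches, rate


# Two illustrative rows; the second record has a deliberately altered city
expected_rows = [
    {'csv_row_number': 5, 'csv_isil_code': 'NL-0100030000',
     'csv_naam_bibliotheek': 'KB, nationale bibliotheek',
     'csv_vestigingsplaats': 'Den Haag',
     'csv_opmerking': 'KB, Nationale Bibliotheek'},
    {'csv_row_number': 6, 'csv_isil_code': 'NL-0800070000',
     'csv_naam_bibliotheek': 'OBA',
     'csv_vestigingsplaats': 'Amsterdam', 'csv_opmerking': ''},
]
records = [dict(expected_rows[0]), dict(expected_rows[1])]
records[1]['csv_vestigingsplaats'] = 'amsterdam'

total, preserved, mismatches, rate = check_field_preservation(expected_rows, records)
print(total, preserved, mismatches, rate)
```

The comparison is exact, so a change in case or whitespace counts as a mismatch, which matches the "preserved exactly" wording of this report.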

LinkML Schema Compliance

  • All required fields present
  • All CSV fields preserved
  • Institution type set to LIBRARY
  • Library type classification applied
  • No data loss during conversion
  • YAML structure valid


Key Differences from National Archive ISIL Dataset

| Aspect | National Archive ISIL | Library ISIL |
|---|---|---|
| Records | 371 | 153 |
| ISIL Format | Variable (7-17 chars) | Uniform (13 chars) |
| ISIL Pattern | NL-{City}{Abbrev} | NL-XXXXXXXXXX |
| Institution Types | Mixed (archives, museums) | Libraries only |
| Encoding | Latin-1 (problematic) | UTF-8 (clean) |
| CSV Structure | Malformed (quotes/semicolons) | Clean (standard) |
| Remarks | 18 records (4.9%) | 19 records (12.4%) |
| Organization Types | N/A | Classified (5 types) |

Use Cases

This YAML file can be used for:

  1. Library Network Mapping: Understand Dutch public library infrastructure
  2. Automation System Analysis: Track which libraries use which POI systems
  3. Service Coverage: Map library services by region/province
  4. Data Integration: Merge with NDE dataset or national archive ISIL codes
  5. LinkML Validation: Test schema compliance with library registry data
  6. Collection Management: Link library collections to authoritative ISIL codes

Insights and Patterns

Decentralization

Dutch public libraries operate through 11 regional automation consortia (POI systems), showing strong provincial/regional organization rather than a single national system.

Key Players

  • Rijnbrink dominates eastern Netherlands (Gelderland, Overijssel)
  • Cubiss serves southern provinces (Limburg, Noord-Brabant)
  • Biblionet serves northern provinces (Groningen, Drenthe)

National Services

KB (national library) coordinates national-level services:

  • Digital collections (online bibliotheek)
  • Accessible reading (Passend Lezen, Dedicon)
  • Specialized collections (Muziekweb)
  • Shared infrastructure (Coöperatie SDI)

ISIL Code Assignment Pattern

Numeric codes suggest sequential assignment rather than semantic encoding:

  • KB: NL-0100030000 (low number = early assignment)
  • Recent libraries likely have higher numbers
  • Different from archive ISIL codes which encode city/institution

Next Steps

Data Enrichment

  • Geocode city names to latitude/longitude
  • Add library website URLs
  • Cross-link with NDE organization dataset
  • Query Wikidata for Q-numbers
  • Link libraries to their POI systems (parent-child relationships)

Analysis

  • Map library coverage by municipality
  • Analyze POI system membership
  • Identify gaps in library service coverage
  • Compare with population density data

Integration

  • Merge with national archive ISIL dataset (371 records)
  • Create unified Dutch heritage custodian registry
  • Generate GHCID identifiers
  • Link to museum/archive records where libraries share buildings

Files Created

Data

  • /data/isil/nl/kb/20250401_Bnetwerk_ISIL_Bibliotheken_Nederland.yaml (3,577 lines, 153 records)

Scripts

  • /scripts/convert_library_isil_csv_to_yaml.py (conversion + validation + classification)

Documentation

  • /docs/LIBRARY_ISIL_CSV_TO_YAML_CONVERSION_REPORT.md (this file)

Technical Notes

Automated Library Classification

The script classifies libraries into 5 types based on keyword matching in the opmerking field:

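# Function name is illustrative; the report shows only the branch logic.
def classify_library_type(remark):
    """Map the opmerking (remarks) text to one of the 5 library types."""
    # Lowercase once so every keyword check is case-insensitive; the KB
    # remark is written "KB, Nationale Bibliotheek" with capitals.
    text = (remark or '').lower()
    if 'landelijke bibliotheekorganisatie' in text:
        return 'national_library_organization'
    elif 'provinciale bibliotheekorganisatie' in text:
        return 'provincial_library_organization'
    elif 'poi' in text:
        return 'library_automation_system'
    elif 'nationale bibliotheek' in text:
        return 'national_library'
    else:
        return 'public_library'  # default: empty remark or no keyword match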

This classification helps distinguish organizational hierarchy and service types.

Performance

  • Parsing: ~0.05 seconds
  • Mapping: ~0.1 seconds
  • Classification: ~0.05 seconds
  • Validation: ~0.05 seconds
  • YAML write: ~0.3 seconds
  • Total time: < 0.6 seconds

YAML Generation

PyYAML was used with the same settings as the national archive conversion, for consistency.


Status: Conversion complete
Quality: 100% field preservation
Classification: Automated library type classification applied
Ready for: Data enrichment, integration, and network analysis