glam/data/isil/nl/kb/linkml/README.md
2025-11-19 23:25:22 +01:00

# Dutch Library Network (KB Bnetwerk) ISIL Registry - LinkML Documentation
This directory contains LinkML schema documentation for the Dutch Library Network (KB Bnetwerk) ISIL registry conversion from CSV to YAML format.
## Overview
- **Source**: [KB Library Network ISIL Codes](https://www.kb.nl/organisatie/bibliotheken-in-nederland/isil-codes)
- **Records**: 153 Dutch library institutions
- **Data Date**: April 1, 2025 (Stand 1 april 2025)
- **Geographic Coverage**: 134 unique cities across the Netherlands
- **Data Quality**: TIER_1_AUTHORITATIVE (confidence score: 1.0)
## Files in This Directory
### `schema.yaml`
LinkML schema definition documenting the structure of library ISIL registry records after conversion to HeritageCustodian format.
**Key classes**:
- `LibraryISILRecord` - Main record structure with CSV fields and LinkML mappings
- `Location` - Geographic location (city, country)
- `Identifier` - ISIL code structure (scheme, value, URL)
- `Provenance` - Data source and quality metadata
**Enumerations**:
- `InstitutionTypeEnum` - Always LIBRARY for this dataset
- `LibraryTypeEnum` - 5 specialized library types (national_library, public_library, etc.)
- `DataSourceEnum` - ISIL_REGISTRY
- `DataTierEnum` - TIER_1_AUTHORITATIVE
### `mapping.yaml`
Complete field-by-field mapping documentation showing how each CSV column transforms into LinkML YAML structure.
**Covers**:
- CSV structure (clean UTF-8, standard format)
- Field mappings with examples (4 CSV columns → LinkML attributes)
- Automated library type classification rules (5 types)
- Transformation rules (URL generation, description formatting)
- Data quality metrics (100% field preservation, 765 fields)
- Comparison with National Archive ISIL dataset
## Dataset Characteristics
### ISIL Code Format
- **Pattern**: `NL-XXXXXXXXXX` (10 digits after "NL-")
- **Length**: Uniform 13 characters
- **Encoding**: Numeric and sequential (not semantic, unlike the National Archive codes)
- **Examples**:
  - `NL-0100030000` - KB, nationale bibliotheek (Den Haag)
  - `NL-0800070000` - OBA (Amsterdam)
  - `NL-0700130000` - Zeeuwse Bibliotheken (Middelburg)
### Library Network Structure
| Library Type | Count | % | Description |
|--------------|-------|---|-------------|
| **Public Library** | 134 | 87.6% | Local community libraries |
| **Library Automation System (POI)** | 11 | 7.2% | Regional library consortia |
| **National Library Organization** | 5 | 3.3% | National service providers |
| **Provincial Library Organization** | 2 | 1.3% | Provincial coordination |
| **National Library** | 1 | 0.7% | KB (Koninklijke Bibliotheek) |
### Library Type Classification Rules
**Automated classification** using keyword matching:
#### 1. National Library (1)
- **Keyword**: `"KB, nationale bibliotheek"` in csv_naam_bibliotheek
- **Example**: KB, nationale bibliotheek (Den Haag)
#### 2. National Library Organization (5)
- **Keywords**: `"Bibliotheekservice"`, `"Bibliotheek Totaal"` in csv_naam_bibliotheek
- **Examples**:
  - Bibliotheek Totaal (Zoetermeer)
  - Bibliotheekservice Fryslân (Leeuwarden)
  - Bibliotheekservice Overijssel (Deventer)
#### 3. Provincial Library Organization (2)
- **Keyword**: `"Provinciale"` in csv_naam_bibliotheek
- **Examples**:
  - Provinciale Bibliotheekcentrale Zuid-Holland (Voorburg)
  - Provinciale Bibliotheek Centrale Noord-Brabant (Eindhoven)
#### 4. Library Automation System (11)
- **Keyword**: `"POI"` in csv_opmerking
- **Meaning**: Publieksinformatievoorziening (shared library infrastructure)
- **Examples**:
  - Zeeuwse Bibliotheken (Middelburg)
  - FERS Friesland (Leeuwarden)
  - Rijnbrink Gelderland (Deventer)
  - Biblionet Groningen (Groningen)
  - Bibliotheken Noord-Limburg (Venray)
#### 5. Public Library (134, default)
- **Rule**: All libraries not matching specialized categories
- **Examples**:
  - OBA (Amsterdam)
  - Bibliotheek Rotterdam (Rotterdam)
  - Bibliotheek Utrecht (Utrecht)
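
The five rules above can be sketched as a single function. This is a minimal illustration, not the code in the conversion script: the order in which the rules are checked and the enum value `provincial_library_organization` are assumptions, and the remark value `"POI"` in the sample is illustrative.

```python
def classify_library(name: str, remark: str = "") -> str:
    """Return a library type for one CSV row using keyword matching.

    The precedence of the checks is an assumption; the rules themselves
    come from the classification list above.
    """
    if "KB, nationale bibliotheek" in name:
        return "national_library"
    if "Bibliotheekservice" in name or "Bibliotheek Totaal" in name:
        return "national_library_organization"
    if "Provinciale" in name:
        # Assumed enum value; only the other four names appear in this README.
        return "provincial_library_organization"
    if "POI" in remark:
        return "library_automation_system"
    # Default: everything that matches no specialized category.
    return "public_library"


examples = [
    ("KB, nationale bibliotheek", ""),
    ("Bibliotheekservice Fryslân", ""),
    ("Provinciale Bibliotheekcentrale Zuid-Holland", ""),
    ("Zeeuwse Bibliotheken", "POI"),  # remark text is illustrative
    ("OBA", ""),
]
for name, remark in examples:
    print(f"{name}: {classify_library(name, remark)}")
```

Checking the name-based rules before the remark-based rule means a row is never classified as a POI system if its name already identifies it as a national or provincial organization.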
### Data Completeness
| Field | Coverage | Notes |
|-------|----------|-------|
| Row number | 100% (153/153) | Generated during conversion |
| ISIL code | 100% (153/153) | All unique, no duplicates |
| Library name | 100% (153/153) | All library names present |
| City | 100% (153/153) | 134 unique cities |
| Remarks | 12.4% (19/153) | Library type indicators (POI, etc.) |
| Library type | 100% (153/153) | All classified automatically |
### Top Cities by Library Count
1. **Culemborg** - 2 libraries
   - Bibliotheek Culemborg
   - Bibliotheek Rivierenland
2. **Den Haag** - 2 libraries
   - KB, nationale bibliotheek
   - Bibliotheek Den Haag

**Note**: With 153 records spread over 134 unique cities, most cities have exactly one library.
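
A per-city count like the one above can be derived with `collections.Counter`. The sketch below is illustrative: the helper name `cities_with_multiple_libraries` is hypothetical and the three sample records are a tiny subset built from names mentioned in this README, not the full dataset.

```python
from collections import Counter


def cities_with_multiple_libraries(records):
    """Map each city that appears more than once to its record count."""
    counts = Counter(r["csv_vestigingsplaats"] for r in records)
    return {city: n for city, n in counts.items() if n > 1}


# Illustrative subset of records (not the full 153-row dataset).
sample = [
    {"csv_naam_bibliotheek": "KB, nationale bibliotheek", "csv_vestigingsplaats": "Den Haag"},
    {"csv_naam_bibliotheek": "Bibliotheek Den Haag", "csv_vestigingsplaats": "Den Haag"},
    {"csv_naam_bibliotheek": "OBA", "csv_vestigingsplaats": "Amsterdam"},
]
print(cities_with_multiple_libraries(sample))
```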
## CSV Parsing (Clean Structure)
Unlike the National Archive dataset, this CSV is well-formed:
### Clean CSV Structure
- **Encoding**: UTF-8 (standard)
- **Format**: Standard CSV with comma delimiters
- **Headers**: Clean column names without issues
- **Cells**: No malformed cells or nested delimiters
### Example Raw CSV Row
```csv
NL-0800070000,OBA,Amsterdam,
```
### After Parsing
```yaml
csv_row_number: 2
csv_isil_code: NL-0800070000
csv_naam_bibliotheek: OBA
csv_vestigingsplaats: Amsterdam
csv_opmerking: ""
```
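
The parsing step can be sketched with the standard `csv` module. This is a hedged illustration rather than the actual conversion script: it assumes rows are read positionally in the column order shown above, and the `parse_rows` helper and its `first_row_number` parameter are hypothetical.

```python
import csv
import io

# Column order assumed from the raw row example above.
FIELDS = [
    "csv_isil_code",
    "csv_naam_bibliotheek",
    "csv_vestigingsplaats",
    "csv_opmerking",
]


def parse_rows(text, first_row_number=1):
    """Parse CSV text into dicts keyed by the csv_* field names."""
    parsed = []
    for offset, row in enumerate(csv.reader(io.StringIO(text))):
        # Pad short rows so a missing remark becomes an empty string.
        row = (row + [""] * len(FIELDS))[: len(FIELDS)]
        record = {"csv_row_number": first_row_number + offset}
        record.update(zip(FIELDS, (cell.strip() for cell in row)))
        parsed.append(record)
    return parsed


sample = "NL-0800070000,OBA,Amsterdam,\n"
print(parse_rows(sample, first_row_number=2))
```

Using `csv.reader` instead of `str.split(",")` matters for names that contain a comma, such as `KB, nationale bibliotheek`, which only parse correctly when the cell is quoted in the file.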
## Conversion Process
### Input
```
/data/isil/nl/kb/20250401 Bnetwerk overzicht ISIL-codes Bibliotheken Nederland.csv
```
### Conversion Script
```
/scripts/convert_library_isil_csv_to_yaml.py
```
### Output
```
/data/isil/nl/kb/20250401_Bnetwerk_ISIL_Bibliotheken_Nederland.yaml
```
### Validation
- ✅ 153 records converted
- ✅ 765 fields preserved (100% preservation)
- ✅ 0 validation errors
- ✅ All ISIL codes match pattern `^NL-[0-9]{10}$`
- ✅ No duplicate ISIL codes
- ✅ All libraries classified into 5 types
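
Two of the checks listed above, the pattern match and the duplicate test, can be reproduced in a few lines. This is a minimal sketch; the `validate_records` helper is hypothetical and is not taken from the conversion script.

```python
import re

# Pattern quoted in the validation list above.
ISIL_PATTERN = re.compile(r"^NL-[0-9]{10}$")


def validate_records(records):
    """Return a list of error messages; an empty list means all checks passed."""
    errors = []
    seen = set()
    for record in records:
        code = record.get("csv_isil_code", "")
        if not ISIL_PATTERN.fullmatch(code):
            errors.append(f"invalid ISIL code: {code!r}")
        if code in seen:
            errors.append(f"duplicate ISIL code: {code!r}")
        seen.add(code)
    return errors


sample = [
    {"csv_isil_code": "NL-0800070000"},
    {"csv_isil_code": "NL-0100030000"},
]
print(validate_records(sample))
```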
## LinkML Schema Compliance
All converted records conform to the HeritageCustodian schema:
```yaml
- csv_row_number: 2
  csv_isil_code: NL-0800070000
  csv_naam_bibliotheek: OBA
  csv_vestigingsplaats: Amsterdam
  csv_opmerking: ""
  name: OBA
  institution_type: LIBRARY
  library_type: public_library
  locations:
    - city: Amsterdam
      country: NL
  identifiers:
    - identifier_scheme: ISIL
      identifier_value: NL-0800070000
      identifier_url: https://isil.org/NL-0800070000
  description: "Library classification: public_library"
  provenance:
    data_source: ISIL_REGISTRY
    data_tier: TIER_1_AUTHORITATIVE
    extraction_date: "2025-11-17T12:42:48.430354+00:00"
    extraction_method: "CSV to YAML conversion (KB Bnetwerk library ISIL codes)"
    source_url: https://www.kb.nl/organisatie/bibliotheken-in-nederland/isil-codes
    source_date: "Stand 1 april 2025"
    confidence_score: 1.0
```
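
A record of this shape can be assembled from one parsed CSV row roughly as follows. This is a sketch under assumptions: the `build_record` helper is hypothetical, the library type is passed in by the caller, and the URL and description formats are inferred from the example record above.

```python
from datetime import datetime, timezone

SOURCE_URL = "https://www.kb.nl/organisatie/bibliotheken-in-nederland/isil-codes"


def build_record(row, library_type):
    """Combine the original csv_* fields with the derived attributes."""
    code = row["csv_isil_code"]
    return {
        **row,  # keep every original CSV field unchanged
        "name": row["csv_naam_bibliotheek"],
        "institution_type": "LIBRARY",
        "library_type": library_type,
        "locations": [{"city": row["csv_vestigingsplaats"], "country": "NL"}],
        "identifiers": [
            {
                "identifier_scheme": "ISIL",
                "identifier_value": code,
                # URL format inferred from the example record.
                "identifier_url": f"https://isil.org/{code}",
            }
        ],
        "description": f"Library classification: {library_type}",
        "provenance": {
            "data_source": "ISIL_REGISTRY",
            "data_tier": "TIER_1_AUTHORITATIVE",
            "extraction_date": datetime.now(timezone.utc).isoformat(),
            "extraction_method": "CSV to YAML conversion (KB Bnetwerk library ISIL codes)",
            "source_url": SOURCE_URL,
            "source_date": "Stand 1 april 2025",
            "confidence_score": 1.0,
        },
    }


row = {
    "csv_row_number": 2,
    "csv_isil_code": "NL-0800070000",
    "csv_naam_bibliotheek": "OBA",
    "csv_vestigingsplaats": "Amsterdam",
    "csv_opmerking": "",
}
record = build_record(row, "public_library")
print(record["identifiers"][0]["identifier_url"])
```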
## Statistics
| Metric | Value |
|--------|-------|
| Total records | 153 |
| Total fields preserved | 765 (100%) |
| Unique cities | 134 |
| Unique ISIL codes | 153 (no duplicates) |
| Records with remarks | 19 (12.4%) |
| ISIL code length | 13 characters (uniform) |
| Library type classification | 100% coverage |
| Public libraries | 134 (87.6%) |
| POI systems | 11 (7.2%) |
| National services | 5 (3.3%) |
| Provincial organizations | 2 (1.3%) |
| National library | 1 (0.7%) |
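
The type percentages in the table follow directly from the counts. The sketch below rebuilds placeholder records from those counts to show the arithmetic; the `type_distribution` helper is hypothetical and `provincial_library_organization` is an assumed enum value.

```python
from collections import Counter


def type_distribution(records):
    """Return {library_type: (count, percentage rounded to one decimal)}."""
    counts = Counter(r["library_type"] for r in records)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 1)) for t, n in counts.most_common()}


# Counts copied from the statistics table; the records are placeholders.
counts_from_table = {
    "public_library": 134,
    "library_automation_system": 11,
    "national_library_organization": 5,
    "provincial_library_organization": 2,  # assumed enum value
    "national_library": 1,
}
records = [{"library_type": t} for t, n in counts_from_table.items() for _ in range(n)]

for library_type, (count, pct) in type_distribution(records).items():
    print(f"{library_type}: {count} ({pct}%)")
```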
## Comparison with National Archive Dataset
### ISIL Code Formats
| Aspect | National Archive | Library Network |
|--------|------------------|-----------------|
| **Format** | `NL-{CityAbbrev}{InstitutionAbbrev}` | `NL-XXXXXXXXXX` |
| **Example** | `NL-AsdRM` (Rijksmuseum) | `NL-0800070000` (OBA) |
| **Length** | 7-17 characters (variable) | 13 characters (uniform) |
| **Encoding** | Semantic (city + institution) | Numeric (sequential) |
| **Records** | 371 | 153 |
| **Overlap** | 0 codes | 0 codes |
### Combined Dutch ISIL Registry
- **National Archive**: 371 institutions (museums, archives, societies)
- **Library Network**: 153 libraries
- **Total**: 524 Dutch ISIL codes
- **Geographic overlap**: ~50 cities appear in both datasets
- **Code overlap**: 0 (completely complementary)
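
The overlap and combined total can be checked with set operations. This is a minimal sketch: the `compare_code_sets` helper is hypothetical and the sample lists contain only codes quoted in this README, not the full datasets.

```python
def compare_code_sets(archive_codes, library_codes):
    """Report shared codes and the size of the combined registry."""
    archive, library = set(archive_codes), set(library_codes)
    return {
        "overlap": sorted(archive & library),
        "combined_total": len(archive | library),
    }


# Illustrative samples only.
archive_sample = ["NL-AsdRM"]
library_sample = ["NL-0800070000", "NL-0100030000"]
print(compare_code_sets(archive_sample, library_sample))
```

With no shared codes, the combined total is simply the sum of both datasets, which is how 371 + 153 gives 524.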
## Related Documentation
- **Conversion Report**: `/docs/LIBRARY_ISIL_CSV_TO_YAML_CONVERSION_REPORT.md`
- **Source CSV**: `/data/isil/nl/kb/20250401 Bnetwerk overzicht ISIL-codes Bibliotheken Nederland.csv`
- **Output YAML**: `/data/isil/nl/kb/20250401_Bnetwerk_ISIL_Bibliotheken_Nederland.yaml`
- **Conversion Script**: `/scripts/convert_library_isil_csv_to_yaml.py`
- **Main Schema**: `/schemas/heritage_custodian.yaml`
- **National Archive Comparison**: `/data/isil/nl/nan/linkml/README.md`
## Usage Examples
### Load YAML Data
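```python
import yaml

# Read the converted registry; UTF-8 is stated explicitly because names contain non-ASCII characters.
with open('data/isil/nl/kb/20250401_Bnetwerk_ISIL_Bibliotheken_Nederland.yaml', 'r', encoding='utf-8') as f:
    records = yaml.safe_load(f)

print(f"Loaded {len(records)} libraries")
```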
### Query by Library Type
```python
public_libraries = [r for r in records if r['library_type'] == 'public_library']
print(f"Public libraries: {len(public_libraries)}")
poi_systems = [r for r in records if r['library_type'] == 'library_automation_system']
print(f"POI systems: {len(poi_systems)}")
```
### Query by City
```python
# Filter on the original CSV city field, then print name and type.
amsterdam_libraries = [r for r in records if r['csv_vestigingsplaats'] == 'Amsterdam']
for lib in amsterdam_libraries:
    print(f"- {lib['name']} ({lib['library_type']})")
```
### Extract National Infrastructure
```python
national_infrastructure = [
    r for r in records
    if r['library_type'] in [
        'national_library',
        'national_library_organization',
        'library_automation_system'
    ]
]
print(f"National library infrastructure: {len(national_infrastructure)} institutions")
```
### SPARQL Query (Future RDF Export)
```sparql
PREFIX hc: <https://w3id.org/heritage/custodian/>
PREFIX isil: <https://isil.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?library ?name ?library_type WHERE {
  ?library hc:name ?name ;
           hc:library_type ?library_type ;
           hc:identifier ?id .
  ?id dcterms:identifier ?isil_code ;
      dcterms:type "ISIL" .
  FILTER(?library_type = "library_automation_system")  # POI systems
}
```
## POI (Publieksinformatievoorziening) Systems
11 library automation systems provide shared infrastructure across regions:
| POI System | City | Region | ISIL Code |
|------------|------|--------|-----------|
| Zeeuwse Bibliotheken | Middelburg | Zeeland | NL-0700130000 |
| FERS Friesland | Leeuwarden | Friesland | NL-0702860000 |
| Rijnbrink Gelderland | Deventer | Gelderland | NL-0702870000 |
| Biblionet Groningen | Groningen | Groningen | NL-0702880000 |
| Bibliotheken Noord-Limburg | Venray | Noord-Limburg | NL-0703010000 |
| Biblioned Drenthe | Emmen | Drenthe | NL-0703040000 |
| Biblionetwerk Noord-Holland Noord | Alkmaar | NH Noord | NL-0703130000 |
| Bibliotheken Midden-Brabant | Tilburg | Midden-Brabant | NL-0704070000 |
| Biblioservice Fryslan | Drachten | Friesland | NL-0709640000 |
| Stichting Openbare Bibliotheek Utrecht | Utrecht | Utrecht | NL-0729460000 |
| Stichting Bibliotheken Leidse Regio | Leiden | Zuid-Holland | NL-0739260000 |
**Key insight**: POI systems serve as regional consortia providing shared catalog, automation, and discovery infrastructure for multiple public libraries.
## Future Work
1. **Geocoding**: Add latitude/longitude to Location objects using Nominatim API
2. **Wikidata Enrichment**: Link libraries to Wikidata entities (Q-numbers)
3. **Cross-linking**: Match with National Archive ISIL dataset (524 total Dutch ISIL codes)
4. **Holdings Data**: Integrate library holdings and collection information
5. **Network Analysis**: Map POI system memberships and library consortia relationships
6. **RDF Export**: Generate RDF/Turtle serialization for SPARQL querying
7. **Historical Tracking**: Document library mergers, closures, and name changes over time
## Contact
For questions about the library ISIL registry conversion or schema:
- **Data Source**: [KB Library Network ISIL](https://www.kb.nl/organisatie/bibliotheken-in-nederland/isil-codes)
- **Project**: GLAM Heritage Custodian Data Pipeline
- **Schema Version**: v0.2.1 (modular LinkML)
- **Data Date**: April 1, 2025 (Stand 1 april 2025)