Dutch National Archive ISIL Registry - LinkML Documentation
This directory contains LinkML schema documentation for the Dutch National Archive ISIL registry conversion from CSV to YAML format.
Overview
Source: Nationaal Archief ISIL Registry
Records: 371 Dutch heritage institutions
Date Range: 2008-10-10 to 2025-09-18
Geographic Coverage: 201 unique cities across the Netherlands
Data Quality: TIER_1_AUTHORITATIVE (confidence score: 1.0)
Files in This Directory
schema.yaml
LinkML schema definition documenting the structure of ISIL registry records after conversion to HeritageCustodian format.
Key classes:
- ISILRegistryRecord - Main record structure with CSV fields and LinkML mappings
- Location - Geographic location (city, country)
- Identifier - ISIL code structure (scheme, value, URL, assigned_date)
- Provenance - Data source and quality metadata
Enumerations:
- InstitutionTypeEnum - Heritage institution types (GALLERY, LIBRARY, ARCHIVE, MUSEUM, etc.)
- DataSourceEnum - ISIL_REGISTRY
- DataTierEnum - TIER_1_AUTHORITATIVE
mapping.yaml
Complete field-by-field mapping documentation showing how each CSV column transforms into LinkML YAML structure.
Covers:
- CSV structure and parsing challenges (latin-1 encoding, malformed cells)
- Field mappings with examples (6 CSV columns → LinkML attributes)
- Transformation rules (date parsing, URL generation, description formatting)
- Data quality metrics (100% field preservation, 2,226 fields)
- Organizational change event detection (18 records with merger/closure notes)
Dataset Characteristics
ISIL Code Format
- Pattern: NL-{CityAbbrev}{InstitutionAbbrev}
- Length: Variable (7-17 characters)
- Encoding: Semantic (city + institution abbreviations)
- Examples:
  - NL-AsdRM - Rijksmuseum (Amsterdam)
  - NL-HaNa - Nationaal Archief (Den Haag)
  - NL-LlsBatavialand - Batavialand (Lelystad)
Data Completeness
| Field | Coverage | Notes |
|---|---|---|
| Row number | 100% (371/371) | Sequential 1-371 |
| City (Plaats) | 100% (371/371) | 201 unique cities |
| Institution (Instelling) | 100% (371/371) | All institution names present |
| ISIL code | 100% (371/371) | All unique, no duplicates |
| Assignment date (Toegekend op) | ~95% | Most have dates, some empty |
| Remarks (Opmerking) | 4.9% (18/371) | Organizational history notes |
Top Cities by Institution Count
- Den Haag - 38 institutions (10.2%)
- Amsterdam - 29 institutions (7.8%)
- Deventer - 11 institutions (3.0%)
- Groningen - 10 institutions (2.7%)
Organizational Change Events
18 records (4.9%) contain organizational history in the csv_opmerking field:
Event types detected:
- MERGER: "fusie tussen", "samenvoeging"
  - Example: RHCL-Rijckheyt merger (2020)
- NAME_CHANGE: "naamswijziging", "hernoemd"
- CLOSURE: "in onbruik", "gesloten"
- RELOCATION: "verhuisd naar", "overgebracht naar"
Future processing: These remarks can be extracted as structured ChangeEvent objects in the HeritageCustodian schema.
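The keyword lists above can be turned into a simple classifier. The following is a minimal sketch, not the pipeline's actual implementation: the names EVENT_KEYWORDS and detect_event_types are hypothetical, and plain substring matching is assumed to be good enough for the 18 remarks in this dataset.

```python
# Keyword lists copied from the "Event types detected" list above.
EVENT_KEYWORDS = {
    "MERGER": ("fusie tussen", "samenvoeging"),
    "NAME_CHANGE": ("naamswijziging", "hernoemd"),
    "CLOSURE": ("in onbruik", "gesloten"),
    "RELOCATION": ("verhuisd naar", "overgebracht naar"),
}


def detect_event_types(remark):
    """Return every event type whose keywords occur in the remark (hypothetical helper)."""
    text = (remark or "").lower()
    return [
        event_type
        for event_type, keywords in EVENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
```

A remark can match more than one type, so the function returns a list rather than a single label. Remarks that match nothing yield an empty list and would still need manual review.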
CSV Parsing Challenges
The original CSV file had several issues requiring custom parsing:
Encoding
- Issue: File uses latin-1 encoding (not UTF-8)
- Solution: encoding='latin-1' parameter in file reader
Malformed Structure
- Issue: All fields stored in single CSV cell with "," delimiter
- Solution: Split on "," pattern, strip quotes and semicolons
Header Row
- Issue: Contains sequence number as first field before actual headers
- Solution: Extract headers from indices 1-5, skip index 0
Example Raw CSV Row
"1","Amsterdam","Rijksmuseum","NL-AsdRM","2013-03-07","";"
After Parsing
csv_row_number: 1
csv_plaats: Amsterdam
csv_instelling: Rijksmuseum
csv_isil_code: NL-AsdRM
csv_toegekend_op: "2013-03-07"
csv_opmerking: ""
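The parsing steps described in this section might look roughly like the sketch below. It is an illustration under assumptions, not the code in the conversion script: the function names are hypothetical, the field names are taken from the parsed example above, and the first line of the file is assumed to be the header row.

```python
FIELD_NAMES = [
    "csv_row_number",
    "csv_plaats",
    "csv_instelling",
    "csv_isil_code",
    "csv_toegekend_op",
    "csv_opmerking",
]


def parse_raw_row(line):
    """Split one raw line on the '","' pattern and clean each piece."""
    # Strip whitespace first, then leftover quotes and semicolons.
    parts = [part.strip().strip('";') for part in line.split('","')]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    row = dict(zip(FIELD_NAMES, parts))
    row["csv_row_number"] = int(row["csv_row_number"])
    return row


def read_registry(path):
    """Read the latin-1 encoded file and parse every non-empty data line."""
    with open(path, encoding="latin-1") as handle:
        lines = [line for line in handle if line.strip()]
    return [parse_raw_row(line) for line in lines[1:]]  # assumes line 0 is the header
```

A remark that itself contains the sequence "," would be split into too many parts; the sketch raises ValueError in that case instead of guessing.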
Conversion Process
Input
/data/isil/nl/nan/ISIL-codes_2025-11-06.csv
Conversion Script
/scripts/convert_isil_csv_to_yaml.py
Output
/data/isil/nl/nan/ISIL-codes_2025-11-06.yaml
Validation
- ✅ 371 records converted
- ✅ 2,226 fields preserved (100% preservation)
- ✅ 0 validation errors
- ✅ All ISIL codes match pattern ^NL-[A-Za-z0-9]+
- ✅ All dates parse as ISO 8601 (YYYY-MM-DD)
- ✅ No duplicate ISIL codes
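The pattern, date, and duplicate checks listed above could be reproduced with a short script. This is a sketch only: validate_records is a hypothetical name, records are assumed to be dicts with the csv_* keys shown earlier, and dates are assumed to be stored as strings (empty when no date was assigned).

```python
import re
from collections import Counter
from datetime import datetime

# Pattern copied from the checklist above; like the original it has no end anchor.
ISIL_PATTERN = re.compile(r"^NL-[A-Za-z0-9]+")


def validate_records(records):
    """Return a list of human-readable validation errors (empty if all checks pass)."""
    errors = []
    counts = Counter(record["csv_isil_code"] for record in records)
    for record in records:
        code = record["csv_isil_code"]
        if not ISIL_PATTERN.match(code):
            errors.append(f"{code}: does not match ISIL pattern")
        if counts[code] > 1:
            errors.append(f"{code}: duplicate ISIL code")
        assigned = record.get("csv_toegekend_op")
        if assigned:  # empty dates are allowed and skipped
            try:
                datetime.strptime(assigned, "%Y-%m-%d")
            except ValueError:
                errors.append(f"{code}: invalid date {assigned!r}")
    return errors
```

Each record in a duplicated pair produces its own error line, so a duplicate is reported once per occurrence.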
LinkML Schema Compliance
All converted records conform to the HeritageCustodian schema:
- id: https://w3id.org/heritage/custodian/nl/{slug}
  name: {csv_instelling}
  institution_type: {ARCHIVE|MUSEUM|LIBRARY|...}  # Requires classification
  locations:
    - city: {csv_plaats}
      country: NL
  identifiers:
    - identifier_scheme: ISIL
      identifier_value: {csv_isil_code}
      identifier_url: https://isil.org/{csv_isil_code}
      assigned_date: {csv_toegekend_op}
  description: "Opmerking: {csv_opmerking}"  # If present
  provenance:
    data_source: ISIL_REGISTRY
    data_tier: TIER_1_AUTHORITATIVE
    extraction_date: {timestamp}
    extraction_method: "CSV to YAML conversion (National Archive ISIL codes)"
    source_url: https://www.nationaalarchief.nl/isil
    confidence_score: 1.0
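To make the template concrete, here is a minimal sketch of a function that fills it from one parsed CSV row. It is not taken from the conversion script. The slugify helper is purely hypothetical, since this document does not say how {slug} is derived, and institution_type is left as None because it still requires classification.

```python
import re
from datetime import datetime, timezone


def slugify(name):
    """Hypothetical slug rule: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def to_heritage_custodian(row, extraction_date=None):
    """Map one parsed CSV row (csv_* keys) onto the template above."""
    identifier = {
        "identifier_scheme": "ISIL",
        "identifier_value": row["csv_isil_code"],
        "identifier_url": f"https://isil.org/{row['csv_isil_code']}",
    }
    if row.get("csv_toegekend_op"):  # assignment date is optional
        identifier["assigned_date"] = row["csv_toegekend_op"]

    record = {
        "id": f"https://w3id.org/heritage/custodian/nl/{slugify(row['csv_instelling'])}",
        "name": row["csv_instelling"],
        "institution_type": None,  # requires classification
        "locations": [{"city": row["csv_plaats"], "country": "NL"}],
        "identifiers": [identifier],
        "provenance": {
            "data_source": "ISIL_REGISTRY",
            "data_tier": "TIER_1_AUTHORITATIVE",
            "extraction_date": extraction_date
            or datetime.now(timezone.utc).isoformat(),
            "extraction_method": "CSV to YAML conversion (National Archive ISIL codes)",
            "source_url": "https://www.nationaalarchief.nl/isil",
            "confidence_score": 1.0,
        },
    }
    if row.get("csv_opmerking"):  # description only when a remark is present
        record["description"] = f"Opmerking: {row['csv_opmerking']}"
    return record
```

Passing extraction_date explicitly keeps the output reproducible; when it is omitted, the current UTC time is used.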
Statistics
| Metric | Value |
|---|---|
| Total records | 371 |
| Total fields preserved | 2,226 (100%) |
| Unique cities | 201 |
| Unique ISIL codes | 371 (no duplicates) |
| Records with remarks | 18 (4.9%) |
| ISIL code length (min) | 7 characters |
| ISIL code length (max) | 17 characters |
| ISIL code length (mean) | 10.3 characters |
| Earliest assignment date | 2008-10-10 |
| Latest assignment date | 2025-09-18 |
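The figures in this table can be recomputed from the loaded records. The sketch below assumes each record is a dict with the csv_* keys shown earlier and that dates are ISO 8601 strings, which sort chronologically as plain text; summarize is a hypothetical helper name, and the list must not be empty.

```python
from collections import Counter
from statistics import mean


def summarize(records):
    """Recompute the summary statistics for a non-empty list of records."""
    codes = [record["csv_isil_code"] for record in records]
    lengths = [len(code) for code in codes]
    cities = Counter(record["csv_plaats"] for record in records)
    # Skip empty dates; ISO 8601 strings sort in chronological order.
    dates = sorted(
        record["csv_toegekend_op"]
        for record in records
        if record.get("csv_toegekend_op")
    )
    return {
        "total_records": len(records),
        "unique_cities": len(cities),
        "unique_isil_codes": len(set(codes)),
        "records_with_remarks": sum(1 for r in records if r.get("csv_opmerking")),
        "code_length_min": min(lengths),
        "code_length_max": max(lengths),
        "code_length_mean": round(mean(lengths), 1),
        "earliest_date": dates[0] if dates else None,
        "latest_date": dates[-1] if dates else None,
        "top_cities": cities.most_common(4),
    }
```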
Related Documentation
- Conversion Report: /docs/ISIL_CSV_TO_YAML_CONVERSION_REPORT.md
- Source CSV: /data/isil/nl/nan/ISIL-codes_2025-11-06.csv
- Output YAML: /data/isil/nl/nan/ISIL-codes_2025-11-06.yaml
- Conversion Script: /scripts/convert_isil_csv_to_yaml.py
- Main Schema: /schemas/heritage_custodian.yaml
Usage Examples
Load YAML Data
import yaml

# Read the converted YAML explicitly as UTF-8 (only the source CSV is latin-1)
with open('data/isil/nl/nan/ISIL-codes_2025-11-06.yaml', 'r', encoding='utf-8') as f:
    records = yaml.safe_load(f)

print(f"Loaded {len(records)} institutions")
Query by City
amsterdam_records = [r for r in records if r['csv_plaats'] == 'Amsterdam']
print(f"Amsterdam has {len(amsterdam_records)} institutions")
Extract Change Events
change_events = [
    r for r in records
    if r.get('csv_opmerking') and any(
        keyword in r['csv_opmerking'].lower()
        for keyword in ['fusie', 'naamswijziging', 'in onbruik']
    )
]
print(f"Found {len(change_events)} institutions with organizational changes")
SPARQL Query (Future RDF Export)
PREFIX hc: <https://w3id.org/heritage/custodian/>
PREFIX isil: <https://isil.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?institution ?name ?isil_code WHERE {
  ?institution hc:name ?name ;
               hc:identifier ?id .
  ?id dcterms:identifier ?isil_code ;
      dcterms:type "ISIL" .
  FILTER(STRSTARTS(?isil_code, "NL-Asd"))  # Amsterdam institutions
}
Future Work
- Institution Type Classification: Assign institution_type (ARCHIVE, MUSEUM, LIBRARY) using NLP or manual review
- Change Event Extraction: Parse organizational history from csv_opmerking into structured ChangeEvent objects
- Geocoding: Add latitude/longitude to Location objects using Nominatim API
- Wikidata Enrichment: Link institutions to Wikidata entities (Q-numbers)
- Cross-linking: Match with KB library ISIL dataset (524 total Dutch ISIL codes)
- RDF Export: Generate RDF/Turtle serialization for SPARQL querying
Contact
For questions about the ISIL registry conversion or schema:
- Data Source: Nationaal Archief ISIL
- Project: GLAM Heritage Custodian Data Pipeline
- Schema Version: v0.2.1 (modular LinkML)