Dutch National Archive ISIL Registry - LinkML Documentation

This directory contains LinkML schema documentation for converting the Dutch National Archive (Nationaal Archief) ISIL registry from CSV to YAML.

Overview

Source: Nationaal Archief ISIL Registry
Records: 371 Dutch heritage institutions
Date Range: 2008-10-10 to 2025-09-18
Geographic Coverage: 201 unique cities across the Netherlands
Data Quality: TIER_1_AUTHORITATIVE (confidence score: 1.0)

Files in This Directory

schema.yaml

LinkML schema definition documenting the structure of ISIL registry records after conversion to HeritageCustodian format.

Key classes:

  • ISILRegistryRecord - Main record structure with CSV fields and LinkML mappings
  • Location - Geographic location (city, country)
  • Identifier - ISIL code structure (scheme, value, URL, assigned_date)
  • Provenance - Data source and quality metadata

Enumerations:

  • InstitutionTypeEnum - Heritage institution types (GALLERY, LIBRARY, ARCHIVE, MUSEUM, etc.)
  • DataSourceEnum - ISIL_REGISTRY
  • DataTierEnum - TIER_1_AUTHORITATIVE

mapping.yaml

Complete field-by-field mapping documentation showing how each CSV column transforms into LinkML YAML structure.

Covers:

  • CSV structure and parsing challenges (latin-1 encoding, malformed cells)
  • Field mappings with examples (6 CSV columns → LinkML attributes)
  • Transformation rules (date parsing, URL generation, description formatting)
  • Data quality metrics (100% field preservation, 2,226 fields)
  • Organizational change event detection (18 records with merger/closure notes)

Dataset Characteristics

ISIL Code Format

  • Pattern: NL-{CityAbbrev}{InstitutionAbbrev}
  • Length: Variable (7-17 characters)
  • Encoding: Semantic (city + institution abbreviations)
  • Examples:
    • NL-AsdRM - Rijksmuseum (Amsterdam)
    • NL-HaNa - Nationaal Archief (Den Haag)
    • NL-LlsBatavialand - Batavialand (Lelystad)
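
The format rules above can be checked mechanically. The following is a minimal sketch, assuming the code body after `NL-` is purely alphanumeric and that the 7-17 character range applies to the whole code; the helper name `is_valid_nl_isil` is hypothetical and not part of the conversion script.

```python
import re

# Assumed shape: "NL-" prefix followed by one or more letters or digits.
ISIL_NL_PATTERN = re.compile(r"NL-[A-Za-z0-9]+")


def is_valid_nl_isil(code: str) -> bool:
    """Return True if the code has the NL- prefix, an alphanumeric body,
    and a total length within the 7-17 character range documented above."""
    return bool(ISIL_NL_PATTERN.fullmatch(code)) and 7 <= len(code) <= 17


# The three example codes listed above.
for code in ["NL-AsdRM", "NL-HaNa", "NL-LlsBatavialand"]:
    print(code, len(code), is_valid_nl_isil(code))
```

The boundary between the city abbreviation and the institution abbreviation is not marked in the code itself, so this sketch does not try to split the two parts.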

Data Completeness

| Field | Coverage | Notes |
|---|---|---|
| Row number | 100% (371/371) | Sequential 1-371 |
| City (Plaats) | 100% (371/371) | 201 unique cities |
| Institution (Instelling) | 100% (371/371) | All institution names present |
| ISIL code | 100% (371/371) | All unique, no duplicates |
| Assignment date (Toegekend op) | ~95% | Most have dates, some empty |
| Remarks (Opmerking) | 4.9% (18/371) | Organizational history notes |

Top Cities by Institution Count

  1. Den Haag - 38 institutions (10.2%)
  2. Amsterdam - 29 institutions (7.8%)
  3. Deventer - 11 institutions (3.0%)
  4. Groningen - 10 institutions (2.7%)
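
A ranking like the one above can be recomputed from the parsed records. This is a minimal sketch, assuming each record is a dict with a `csv_plaats` key; the function name `top_cities` and the sample data are hypothetical.

```python
from collections import Counter


def top_cities(records: list, n: int = 4) -> list:
    """Return (city, count, percentage of all records) for the n most common cities."""
    counts = Counter(r["csv_plaats"] for r in records if r.get("csv_plaats"))
    total = len(records)
    return [
        (city, count, round(100 * count / total, 1))
        for city, count in counts.most_common(n)
    ]


# Tiny illustrative sample, not the real dataset.
sample = [
    {"csv_plaats": "Den Haag"},
    {"csv_plaats": "Den Haag"},
    {"csv_plaats": "Den Haag"},
    {"csv_plaats": "Amsterdam"},
]
print(top_cities(sample))
```

The percentages are shares of all records, so 38 institutions out of 371 works out to 10.2%.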

Organizational Change Events

18 records (4.9%) contain organizational history in the csv_opmerking field:

Event types detected:

  • MERGER: "fusie tussen", "samenvoeging"
    • Example: RHCL-Rijckheyt merger (2020)
  • NAME_CHANGE: "naamswijziging", "hernoemd"
  • CLOSURE: "in onbruik", "gesloten"
  • RELOCATION: "verhuisd naar", "overgebracht naar"

Future processing: These remarks can be extracted as structured ChangeEvent objects in the HeritageCustodian schema.
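
The keyword lists above suggest a simple classifier. The following is a minimal sketch, assuming case-insensitive substring matching is sufficient; the name `detect_event_types` and the sample remarks are hypothetical, and the actual ChangeEvent structure is not shown here.

```python
# Keywords copied from the event types listed above.
EVENT_KEYWORDS = {
    "MERGER": ["fusie tussen", "samenvoeging"],
    "NAME_CHANGE": ["naamswijziging", "hernoemd"],
    "CLOSURE": ["in onbruik", "gesloten"],
    "RELOCATION": ["verhuisd naar", "overgebracht naar"],
}


def detect_event_types(remark) -> list:
    """Return every event type whose keywords occur in the remark text."""
    text = (remark or "").lower()
    return [
        event_type
        for event_type, keywords in EVENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


# Made-up remarks for illustration only.
print(detect_event_types("Fusie tussen archief A en archief B"))
print(detect_event_types("Hernoemd en verhuisd naar Utrecht"))
```

A single remark can match more than one event type, so the function returns a list rather than a single label.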

CSV Parsing Challenges

The original CSV file had several issues requiring custom parsing:

Encoding

  • Issue: File uses latin-1 encoding (not UTF-8)
  • Solution: encoding='latin-1' parameter in file reader

Malformed Structure

  • Issue: All fields stored in single CSV cell with "," delimiter
  • Solution: Split on "," pattern, strip quotes and semicolons

Header Row

  • Issue: Contains sequence number as first field before actual headers
  • Solution: Extract headers from indices 1-5, skip index 0

Example Raw CSV Row

"1","Amsterdam","Rijksmuseum","NL-AsdRM","2013-03-07","";"

After Parsing

csv_row_number: 1
csv_plaats: Amsterdam
csv_instelling: Rijksmuseum
csv_isil_code: NL-AsdRM
csv_toegekend_op: "2013-03-07"
csv_opmerking: ""
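
The split-and-strip approach described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the conversion script: the function name `parse_raw_row` is hypothetical, and the sketch assumes no field value itself contains the `","` sequence or begins or ends with a quote or semicolon. In practice the lines would be read from a file opened with `encoding='latin-1'`.

```python
FIELD_NAMES = [
    "csv_row_number", "csv_plaats", "csv_instelling",
    "csv_isil_code", "csv_toegekend_op", "csv_opmerking",
]


def parse_raw_row(line: str) -> dict:
    """Split one malformed single-cell row into the six named fields."""
    # Split on the quote-comma-quote sequence that separates the fields.
    parts = line.strip().split('","')
    # Remove the leftover quotes and trailing semicolons from each piece.
    values = [part.strip('";') for part in parts]
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    record = dict(zip(FIELD_NAMES, values))
    record["csv_row_number"] = int(record["csv_row_number"])
    return record


raw = '"1","Amsterdam","Rijksmuseum","NL-AsdRM","2013-03-07","";"'
print(parse_raw_row(raw))
```

Raising on an unexpected field count keeps malformed rows from being silently misaligned with the field names.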

Conversion Process

Input

/data/isil/nl/nan/ISIL-codes_2025-11-06.csv

Conversion Script

/scripts/convert_isil_csv_to_yaml.py

Output

/data/isil/nl/nan/ISIL-codes_2025-11-06.yaml

Validation

  • 371 records converted
  • 2,226 fields preserved (100% preservation)
  • 0 validation errors
  • All ISIL codes match pattern ^NL-[A-Za-z0-9]+
  • All non-empty dates parse as ISO 8601 (YYYY-MM-DD)
  • No duplicate ISIL codes
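
The pattern, date, and uniqueness checks listed above could be reproduced along these lines. This is a minimal sketch, assuming records are dicts with the `csv_` field names shown earlier; `validate_records` is a hypothetical name, not a function from the conversion script.

```python
import re
from datetime import date

# Pattern as stated in the validation list above.
ISIL_PATTERN = re.compile(r"^NL-[A-Za-z0-9]+")


def validate_records(records: list) -> list:
    """Return a list of human-readable validation errors (empty if all pass)."""
    errors = []
    seen_codes = set()
    for index, record in enumerate(records, start=1):
        code = record.get("csv_isil_code") or ""
        if not ISIL_PATTERN.match(code):
            errors.append(f"record {index}: ISIL code {code!r} does not match pattern")
        if code in seen_codes:
            errors.append(f"record {index}: duplicate ISIL code {code!r}")
        seen_codes.add(code)
        assigned = record.get("csv_toegekend_op")
        if assigned:  # empty assignment dates are allowed
            try:
                date.fromisoformat(str(assigned))
            except ValueError:
                errors.append(f"record {index}: date {assigned!r} is not YYYY-MM-DD")
    return errors


good = [
    {"csv_isil_code": "NL-AsdRM", "csv_toegekend_op": "2013-03-07"},
    {"csv_isil_code": "NL-HaNa", "csv_toegekend_op": ""},
]
print(validate_records(good))
```

Empty assignment dates are skipped rather than reported, since some records have no date.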

LinkML Schema Compliance

All converted records conform to the HeritageCustodian schema:

- id: https://w3id.org/heritage/custodian/nl/{slug}
  name: {csv_instelling}
  institution_type: {ARCHIVE|MUSEUM|LIBRARY|...}  # Requires classification
  locations:
    - city: {csv_plaats}
      country: NL
  identifiers:
    - identifier_scheme: ISIL
      identifier_value: {csv_isil_code}
      identifier_url: https://isil.org/{csv_isil_code}
      assigned_date: {csv_toegekend_op}
  description: "Opmerking: {csv_opmerking}"  # If present
  provenance:
    data_source: ISIL_REGISTRY
    data_tier: TIER_1_AUTHORITATIVE
    extraction_date: {timestamp}
    extraction_method: "CSV to YAML conversion (National Archive ISIL codes)"
    source_url: https://www.nationaalarchief.nl/isil
    confidence_score: 1.0
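
The template above can be filled from a parsed CSV record roughly as follows. This is a sketch under stated assumptions: the slug rule in `slugify` is hypothetical (the source does not say how `{slug}` is derived), `institution_type` is left as `None` because it requires classification, and the function name `to_heritage_custodian` is illustrative.

```python
import re
from datetime import datetime, timezone


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of other characters become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def to_heritage_custodian(rec: dict) -> dict:
    """Map one parsed CSV record onto the HeritageCustodian template shown above."""
    identifier = {
        "identifier_scheme": "ISIL",
        "identifier_value": rec["csv_isil_code"],
        "identifier_url": f"https://isil.org/{rec['csv_isil_code']}",
    }
    if rec.get("csv_toegekend_op"):  # assignment date may be empty
        identifier["assigned_date"] = rec["csv_toegekend_op"]
    custodian = {
        "id": f"https://w3id.org/heritage/custodian/nl/{slugify(rec['csv_instelling'])}",
        "name": rec["csv_instelling"],
        "institution_type": None,  # placeholder: requires classification
        "locations": [{"city": rec["csv_plaats"], "country": "NL"}],
        "identifiers": [identifier],
        "provenance": {
            "data_source": "ISIL_REGISTRY",
            "data_tier": "TIER_1_AUTHORITATIVE",
            "extraction_date": datetime.now(timezone.utc).isoformat(),
            "extraction_method": "CSV to YAML conversion (National Archive ISIL codes)",
            "source_url": "https://www.nationaalarchief.nl/isil",
            "confidence_score": 1.0,
        },
    }
    if rec.get("csv_opmerking"):  # description only when a remark is present
        custodian["description"] = f"Opmerking: {rec['csv_opmerking']}"
    return custodian


example = {
    "csv_plaats": "Amsterdam",
    "csv_instelling": "Rijksmuseum",
    "csv_isil_code": "NL-AsdRM",
    "csv_toegekend_op": "2013-03-07",
    "csv_opmerking": "",
}
print(to_heritage_custodian(example)["id"])
```

Names with accented characters would lose those characters under this slug rule, so a real implementation may need transliteration first.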

Statistics

| Metric | Value |
|---|---|
| Total records | 371 |
| Total fields preserved | 2,226 (100%) |
| Unique cities | 201 |
| Unique ISIL codes | 371 (no duplicates) |
| Records with remarks | 18 (4.9%) |
| ISIL code length (min) | 7 characters |
| ISIL code length (max) | 17 characters |
| ISIL code length (mean) | 10.3 characters |
| Earliest assignment date | 2008-10-10 |
| Latest assignment date | 2025-09-18 |

Related Files

  • Conversion Report: /docs/ISIL_CSV_TO_YAML_CONVERSION_REPORT.md
  • Source CSV: /data/isil/nl/nan/ISIL-codes_2025-11-06.csv
  • Output YAML: /data/isil/nl/nan/ISIL-codes_2025-11-06.yaml
  • Conversion Script: /scripts/convert_isil_csv_to_yaml.py
  • Main Schema: /schemas/heritage_custodian.yaml

Usage Examples

Load YAML Data

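import yaml

# The output file is assumed to be UTF-8; an explicit encoding avoids platform defaults
with open('data/isil/nl/nan/ISIL-codes_2025-11-06.yaml', 'r', encoding='utf-8') as f:
    records = yaml.safe_load(f)  # expected to be a list of record dicts

print(f"Loaded {len(records)} institutions")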

Query by City

# Assumes `records` was loaded as shown in "Load YAML Data"
amsterdam_records = [r for r in records if r.get('csv_plaats') == 'Amsterdam']
print(f"Amsterdam has {len(amsterdam_records)} institutions")

Extract Change Events

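# Keywords taken from the event types listed under "Organizational Change Events"
CHANGE_KEYWORDS = [
    'fusie', 'samenvoeging', 'naamswijziging', 'hernoemd',
    'in onbruik', 'gesloten', 'verhuisd naar', 'overgebracht naar',
]

# Assumes `records` was loaded as shown in "Load YAML Data"
change_events = [
    r for r in records
    if r.get('csv_opmerking') and any(
        keyword in r['csv_opmerking'].lower()
        for keyword in CHANGE_KEYWORDS
    )
]
print(f"Found {len(change_events)} institutions with organizational changes")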

SPARQL Query (Future RDF Export)

PREFIX hc: <https://w3id.org/heritage/custodian/>
PREFIX isil: <https://isil.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?institution ?name ?isil_code WHERE {
  ?institution hc:name ?name ;
               hc:identifier ?id .
  ?id dcterms:identifier ?isil_code ;
      dcterms:type "ISIL" .
  FILTER(STRSTARTS(?isil_code, "NL-Asd"))  # Amsterdam institutions
}

Future Work

  1. Institution Type Classification: Assign institution_type (ARCHIVE, MUSEUM, LIBRARY) using NLP or manual review
  2. Change Event Extraction: Parse organizational history from csv_opmerking into structured ChangeEvent objects
  3. Geocoding: Add latitude/longitude to Location objects using Nominatim API
  4. Wikidata Enrichment: Link institutions to Wikidata entities (Q-numbers)
  5. Cross-linking: Match with KB library ISIL dataset (524 total Dutch ISIL codes)
  6. RDF Export: Generate RDF/Turtle serialization for SPARQL querying

Contact

For questions about the ISIL registry conversion or schema:

  • Data Source: Nationaal Archief ISIL
  • Project: GLAM Heritage Custodian Data Pipeline
  • Schema Version: v0.2.1 (modular LinkML)