ISIL CSV to YAML Conversion Report
Date: 2025-11-17
Input: /data/isil/nl/nan/ISIL-codes_2025-11-06.csv
Output: /data/isil/nl/nan/ISIL-codes_2025-11-06.yaml
Script: /scripts/convert_isil_csv_to_yaml.py
Conversion Summary
Records Processed
- Total records: 371 Dutch ISIL codes
- Field preservation: 100% (2,226 fields preserved exactly)
- Value mismatches: 0 (perfect fidelity)
CSV Structure (Original)
The input CSV had a malformed structure:
- All fields contained in a single cell, separated by ","
- Extra trailing semicolons (;;;;)
- Latin-1 encoding (not UTF-8)
- Header includes sequence number as first field
Fields:
- Row number (sequence)
- Plaats (city)
- Instelling (institution name)
- ISIL code
- Toegekend op (assigned date)
- Opmerking (remarks)
YAML Structure (Output)
Each record contains:
CSV Fields (preserved exactly):
- csv_row_number: Original row number
- csv_plaats: City name
- csv_instelling: Institution name
- csv_isil_code: ISIL identifier code
- csv_toegekend_op: Assignment date (YYYY-MM-DD)
- csv_opmerking: Remarks/notes (18 records have remarks)
LinkML Mapped Fields:
- name: Institution name (mapped from csv_instelling)
- locations: List with city and country (NL)
- identifiers: ISIL identifier with scheme, value, URL, assigned date
- provenance: Data source metadata (TIER_1_AUTHORITATIVE)
- description: Created from opmerking when present (optional)
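The mapping described above can be sketched as follows. This is a minimal illustration, not the code in convert_isil_csv_to_yaml.py: the function name `map_to_linkml` and the nested key names `city`, `country`, `data_source`, `data_tier`, `source_url`, and `confidence_score` are assumptions, while the `csv_*` and `identifier_*` names and the values come from this report.

```python
def map_to_linkml(csv_record: dict) -> dict:
    """Add LinkML-style fields next to the untouched csv_* fields (hypothetical helper)."""
    isil = csv_record["csv_isil_code"]
    record = dict(csv_record)  # copy, so every csv_* field is preserved as-is
    record["name"] = csv_record["csv_instelling"]
    # Assumed key names: "city" and "country".
    record["locations"] = [{"city": csv_record["csv_plaats"], "country": "NL"}]
    record["identifiers"] = [
        {
            "identifier_scheme": "ISIL",
            "identifier_value": isil,
            "identifier_url": f"https://isil.org/{isil}",
            "assigned_date": csv_record["csv_toegekend_op"],
        }
    ]
    # Assumed key names; the values are the ones listed under Provenance Metadata.
    record["provenance"] = {
        "data_source": "ISIL_REGISTRY",
        "data_tier": "TIER_1_AUTHORITATIVE",
        "source_url": "https://www.nationaalarchief.nl/isil",
        "confidence_score": 1.0,
    }
    # description is optional: only created when opmerking is non-empty.
    if csv_record.get("csv_opmerking"):
        record["description"] = csv_record["csv_opmerking"]
    return record


# Illustrative input; the institution name is a placeholder, not registry data.
example = map_to_linkml(
    {
        "csv_row_number": "1",
        "csv_plaats": "Amsterdam",
        "csv_instelling": "Example Institution",
        "csv_isil_code": "NL-AsdRM",
        "csv_toegekend_op": "2013-03-07",
        "csv_opmerking": "",
    }
)
```

Copying the input dict first keeps the preserved fields and the mapped fields in one record, which is what makes the later field-by-field preservation check possible.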
Data Quality Findings
Geographic Distribution
- Unique cities: 201 across Netherlands
- Top cities:
- Den Haag: 34 institutions
- Amsterdam: 28 institutions
- Leiden: 8 institutions
- Rotterdam: 8 institutions
- Zwolle: 8 institutions
Temporal Coverage
- Date range: 2008-10-10 to 2025-09-18
- 18 records with remarks documenting:
- Organizational mergers (8 cases)
- Name changes (7 cases)
- Institutional history (3 cases)
ISIL Code Patterns
- Total codes: 371 (all unique, no duplicates)
- Standard format: NL-{CityCode}{InstitutionAbbreviation}
- Code lengths: 7 to 17 characters
- Shortest: NL-AhMA (Alkmaarsche Historiën)
- Longest: NL-LlsBatavialand (Batavialand museum/archief)
- Non-standard: 1 code with lowercase prefix (Nl-GdSAMH)
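A check along these lines could reproduce the findings above. It is a sketch under assumptions: the regular expression is inferred only from the codes shown in this report (uppercase `NL-` followed by letters or digits) and is not the ISIL standard's full grammar, and `check_codes` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed pattern: uppercase "NL-" prefix, then one or more letters or digits.
ISIL_NL_PATTERN = re.compile(r"^NL-[A-Za-z0-9]+$")


def check_codes(codes: list[str]) -> dict:
    """Report non-standard codes, duplicates, and the length range."""
    counts = Counter(codes)
    lengths = [len(code) for code in codes]
    return {
        "non_standard": [c for c in codes if not ISIL_NL_PATTERN.match(c)],
        "duplicates": [c for c, n in counts.items() if n > 1],
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


# The four codes quoted in this report.
summary = check_codes(["NL-AhMA", "NL-LlsBatavialand", "Nl-GdSAMH", "NL-AsdRM"])
print(summary)
```

On these four codes the lowercase-prefix code is the only one flagged, and the length range is 7 to 17.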
Remarks Field Analysis
18 institutions (4.9%) have remarks documenting:
Mergers (8 institutions), including:
- Historisch Centrum Limburg (2020: RHCL + Rijckheyt)
- Archief Gooi- en Vechtstreek (2024: SAGV + Gemeentearchief Gooise Meren)
- Noord-Veluws Archief (multiple archives consolidated)
- Stichting OverO (Stadskamer Zwolle + OB Kampen)
Name Changes (7 institutions), including:
- Historisch Centrum Overijssel (2021: added vestiging designation)
- Het Nieuwe Instituut (2024: abbreviation change)
- Tracé/SHCL (2024: rebranded from Sociaal Historisch Centrum)
- Nederlands Instituut voor Militaire Historie (2023: name correction)
Deprecated Codes (3 institutions):
- Marked "in onbruik" (no longer in use) due to merger/renaming
- References to successor organizations provided
LinkML Schema Compliance
Required Fields
✅ All 371 records contain:
- name (institution name)
- locations (city + country)
- identifiers (ISIL code details)
- provenance (data source metadata)
Identifier Structure
Each ISIL identifier includes:
identifiers:
  - identifier_scheme: ISIL
    identifier_value: NL-AsdRM
    identifier_url: https://isil.org/NL-AsdRM
    assigned_date: '2013-03-07'
Provenance Metadata
All records marked as:
- Data source: ISIL_REGISTRY
- Data tier: TIER_1_AUTHORITATIVE
- Source URL: https://www.nationaalarchief.nl/isil
- Confidence score: 1.0 (authoritative)
Validation Results
Field Preservation Test
Total records: 371
Total fields: 2,226
Fields preserved: 2,226
Value mismatches: 0
Preservation rate: 100.0%
✅ VALIDATION PASSED
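The totals are consistent with 371 records of 6 CSV fields each (371 × 6 = 2,226). A preservation test of this kind might look like the sketch below; `check_preservation` and the unprefixed source key names are assumptions for illustration, not the script's actual code.

```python
def check_preservation(source_rows: list[dict], yaml_records: list[dict]) -> dict:
    """Compare every source field with its csv_-prefixed copy in the output."""
    if len(source_rows) != len(yaml_records):
        raise ValueError("record counts differ")
    total = 0
    mismatches = []  # (record index, field name) pairs
    for index, (row, record) in enumerate(zip(source_rows, yaml_records)):
        for key, value in row.items():
            total += 1
            # A missing key counts as a mismatch as well.
            if key_missing_or_different(record, f"csv_{key}", value):
                mismatches.append((index, key))
    preserved = total - len(mismatches)
    rate = 100.0 * preserved / total if total else 0.0
    return {
        "total_fields": total,
        "fields_preserved": preserved,
        "value_mismatches": len(mismatches),
        "preservation_rate": rate,
        "mismatches": mismatches,
    }


def key_missing_or_different(record: dict, key: str, expected) -> bool:
    """True when the key is absent or its value is not exactly equal."""
    return key not in record or record[key] != expected


# Illustrative two-field record, not real registry data.
rows = [{"plaats": "Amsterdam", "isil_code": "NL-AsdRM"}]
report = check_preservation(
    rows, [{"csv_plaats": "Amsterdam", "csv_isil_code": "NL-AsdRM"}]
)
assert 371 * 6 == 2226  # matches the totals reported above
```

Exact equality is used on purpose: any trimming or re-encoding introduced during conversion would show up as a mismatch rather than pass silently.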
LinkML Schema Compliance
✅ All required fields present
✅ All CSV fields preserved
✅ No data loss during conversion
✅ YAML structure valid
Use Cases
This YAML file can be used for:
- Cross-referencing: Link Dutch heritage institutions to authoritative ISIL codes
- Geocoding: City names can be geocoded to coordinates
- Merger tracking: Remarks document organizational history
- Data integration: Merge with other datasets (NDE organizations, Wikidata)
- LinkML validation: Test schema compliance with ISIL registry data
Next Steps
Data Enrichment
- Geocode city names to latitude/longitude
- Add institution type classification (museum, archive, library)
- Cross-link with NDE organization dataset
- Query Wikidata for Q-numbers
- Extract merger/name change events into ChangeEvent objects
Schema Enhancement
- Add institution_type field based on institution name patterns
- Create change_history entries from opmerking field
- Link related organizations (predecessors/successors)
- Add website URLs where available
- Classify by heritage custodian type (GLAMORCUBESFIXPHDNT taxonomy)
Integration
- Merge with /data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml
- Identify institutions with both ISIL codes and NDE platform data
- Create unified heritage custodian records
- Generate GHCID identifiers for all institutions
Files Created
Data
/data/isil/nl/nan/ISIL-codes_2025-11-06.yaml (8,184 lines, 371 records)
Scripts
/scripts/convert_isil_csv_to_yaml.py (conversion + validation)
Documentation
/docs/ISIL_CSV_TO_YAML_CONVERSION_REPORT.md (this file)
Technical Notes
CSV Parsing Strategy
The malformed CSV required custom parsing:
- Read with latin-1 encoding (UTF-8 failed)
- Split each line on the "," delimiter
- Strip quotes and trailing semicolons
- Handle empty opmerking fields
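The steps above can be sketched roughly as follows. The exact raw line layout is an assumption (the sample line is made up, with a placeholder institution name), and `parse_line`, `read_rows`, and `FIELD_NAMES` are hypothetical names rather than the script's own.

```python
FIELD_NAMES = [
    "row_number", "plaats", "instelling",
    "isil_code", "toegekend_op", "opmerking",
]


def parse_line(raw_line: str) -> dict:
    """Split one raw line into the six named fields."""
    # Drop the newline, then the extra trailing semicolons.
    line = raw_line.strip().rstrip(";")
    # Split on the literal three-character delimiter: quote, comma, quote.
    parts = line.split('","')
    # Remove the quotes left on the outer fields.
    parts = [part.strip('"').strip() for part in parts]
    # Pad so a missing opmerking becomes an empty string.
    parts += [""] * (len(FIELD_NAMES) - len(parts))
    # Note: zip silently drops extra parts if a remark contained the delimiter.
    return dict(zip(FIELD_NAMES, parts))


def read_rows(path: str) -> list[dict]:
    """Read the file as Latin-1 and parse every non-empty line after the header."""
    with open(path, encoding="latin-1") as handle:
        lines = handle.read().splitlines()
    # Assumption: the first line is the header row.
    return [parse_line(line) for line in lines[1:] if line.strip()]


# Made-up sample line in the assumed layout.
SAMPLE_LINE = '"1","Amsterdam","Example Institution","NL-AsdRM","2013-03-07","";;;;\n'
sample_row = parse_line(SAMPLE_LINE)
```

Splitting on the full `","` sequence rather than on a bare comma keeps commas inside institution names or remarks intact.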
YAML Generation
Used PyYAML with settings:
- allow_unicode=True (preserve Dutch characters)
- default_flow_style=False (readable block style)
- sort_keys=False (preserve field order)
- width=120 (line wrapping)
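A minimal sketch of a dump call with these settings is shown below. It assumes PyYAML 5.1 or later (where `sort_keys` is available) and uses `yaml.safe_dump`; the report does not say which dump function the script calls, and the sample record is illustrative.

```python
import yaml

# Illustrative record; keys are deliberately not in alphabetical order.
records = [{"name": "Alkmaarsche Historiën", "csv_isil_code": "NL-AhMA"}]

yaml_text = yaml.safe_dump(
    records,
    allow_unicode=True,        # write "ë" as-is instead of an escape sequence
    default_flow_style=False,  # block style, one key per line
    sort_keys=False,           # keep insertion order of the dict keys
    width=120,                 # wrap long scalars at 120 columns
)
print(yaml_text)
```

Without `allow_unicode=True` PyYAML escapes non-ASCII characters, and without `sort_keys=False` it sorts keys alphabetically, which would move the csv_* fields away from the order in which they were written.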
Performance
- Parsing: ~0.1 seconds
- Mapping: ~0.2 seconds
- Validation: ~0.1 seconds
- YAML write: ~0.5 seconds
- Total time: < 1 second
Status: ✅ Conversion complete
Quality: 100% field preservation
Ready for: Data enrichment and integration