# ISIL CSV to YAML Conversion Report

**Date**: 2025-11-17
**Input**: `/data/isil/nl/nan/ISIL-codes_2025-11-06.csv`
**Output**: `/data/isil/nl/nan/ISIL-codes_2025-11-06.yaml`
**Script**: `/scripts/convert_isil_csv_to_yaml.py`

---

## Conversion Summary

### Records Processed

- **Total records**: 371 Dutch ISIL codes
- **Field preservation**: 100% (2,226 fields preserved exactly)
- **Value mismatches**: 0 (perfect fidelity)

### CSV Structure (Original)

The input CSV had a malformed structure:

- All fields contained in single cell separated by `","`
- Extra trailing semicolons (;;;;)
- Latin-1 encoding (not UTF-8)
- Header includes sequence number as first field

**Fields**:

1. Row number (sequence)
2. Plaats (city)
3. Instelling (institution name)
4. ISIL code
5. Toegekend op (assigned date)
6. Opmerking (remarks)

### YAML Structure (Output)

Each record contains:

**CSV Fields (preserved exactly)**:

- `csv_row_number`: Original row number
- `csv_plaats`: City name
- `csv_instelling`: Institution name
- `csv_isil_code`: ISIL identifier code
- `csv_toegekend_op`: Assignment date (YYYY-MM-DD)
- `csv_opmerking`: Remarks/notes (18 records have remarks)

**LinkML Mapped Fields**:

- `name`: Institution name (mapped from csv_instelling)
- `locations`: List with city and country (NL)
- `identifiers`: ISIL identifier with scheme, value, URL, assigned date
- `provenance`: Data source metadata (TIER_1_AUTHORITATIVE)
- `description`: Created from opmerking when present (optional)

---

## Data Quality Findings

### Geographic Distribution

- **Unique cities**: 201 across Netherlands
- **Top cities**:
  1. Den Haag: 34 institutions
  2. Amsterdam: 28 institutions
  3. Leiden: 8 institutions
  4. Rotterdam: 8 institutions
  5. Zwolle: 8 institutions

### Temporal Coverage

- **Date range**: 2008-10-10 to 2025-09-18
- **18 records with remarks** documenting:
  - Organizational mergers (8 cases)
  - Name changes (7 cases)
  - Institutional history (3 cases)

### ISIL Code Patterns

- **Total codes**: 371 (all unique, no duplicates)
- **Standard format**: NL-{CityCode}{InstitutionAbbreviation}
- **Code lengths**: 7 to 17 characters
  - **Shortest**: NL-AhMA (Alkmaarsche Historiën)
  - **Longest**: NL-LlsBatavialand (Batavialand museum/archief)
- **Non-standard**: 1 code with lowercase prefix (Nl-GdSAMH)

### Remarks Field Analysis

18 institutions (4.9%) have remarks documenting:

**Mergers** (8 institutions):

- Historisch Centrum Limburg (2020: RHCL + Rijckheyt)
- Archief Gooi- en Vechtstreek (2024: SAGV + Gemeentearchief Gooise Meren)
- Noord-Veluws Archief (multiple archives consolidated)
- Stichting OverO (Stadskamer Zwolle + OB Kampen)

**Name Changes** (7 institutions):

- Historisch Centrum Overijssel (2021: added vestiging designation)
- Het Nieuwe Instituut (2024: abbreviation change)
- Tracé/SHCL (2024: rebranded from Sociaal Historisch Centrum)
- Nederlands Instituut voor Militaire Historie (2023: name correction)

**Deprecated Codes** (3 institutions):

- Marked "in onbruik" (no longer in use) due to merger/renaming
- References to successor organizations provided

---

## LinkML Schema Compliance

### Required Fields

✅ All 371 records contain:

- `name` (institution name)
- `locations` (city + country)
- `identifiers` (ISIL code details)
- `provenance` (data source metadata)

### Identifier Structure

Each ISIL identifier includes:

```yaml
identifiers:
  - identifier_scheme: ISIL
    identifier_value: NL-AsdRM
    identifier_url: https://isil.org/NL-AsdRM
    assigned_date: '2013-03-07'
```

### Provenance Metadata

All records marked as:

- **Data source**: ISIL_REGISTRY
- **Data tier**: TIER_1_AUTHORITATIVE
- **Source URL**: https://www.nationaalarchief.nl/isil
- **Confidence score**: 1.0 (authoritative)

---

## Validation Results

### Field Preservation Test

```
Total records:        371
Total fields:         2,226
Fields preserved:     2,226
Value mismatches:     0
Preservation rate:    100.0%
```

✅ **VALIDATION PASSED**

### LinkML Schema Compliance

✅ All required fields present
✅ All CSV fields preserved
✅ No data loss during conversion
✅ YAML structure valid

---

## Use Cases

This YAML file can be used for:

1. **Cross-referencing**: Link Dutch heritage institutions to authoritative ISIL codes
2. **Geocoding**: City names can be geocoded to coordinates
3. **Merger tracking**: Remarks document organizational history
4. **Data integration**: Merge with other datasets (NDE organizations, Wikidata)
5. **LinkML validation**: Test schema compliance with ISIL registry data

---

## Next Steps

### Data Enrichment

- [ ] Geocode city names to latitude/longitude
- [ ] Add institution type classification (museum, archive, library)
- [ ] Cross-link with NDE organization dataset
- [ ] Query Wikidata for Q-numbers
- [ ] Extract merger/name change events into ChangeEvent objects

### Schema Enhancement

- [ ] Add `institution_type` field based on institution name patterns
- [ ] Create `change_history` entries from opmerking field
- [ ] Link related organizations (predecessors/successors)
- [ ] Add website URLs where available
- [ ] Classify by heritage custodian type (GLAMORCUBESFIXPHDNT taxonomy)

### Integration

- [ ] Merge with `/data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml`
- [ ] Identify institutions with both ISIL codes and NDE platform data
- [ ] Create unified heritage custodian records
- [ ] Generate GHCID identifiers for all institutions

---

## Files Created

### Data

- `/data/isil/nl/nan/ISIL-codes_2025-11-06.yaml` (8,184 lines, 371 records)

### Scripts

- `/scripts/convert_isil_csv_to_yaml.py` (conversion + validation)

### Documentation

- `/docs/ISIL_CSV_TO_YAML_CONVERSION_REPORT.md` (this file)

---

## Technical Notes

### CSV Parsing Strategy

The malformed CSV required custom parsing:

1. Read with `latin-1` encoding (UTF-8 failed)
2. Split each line on `","` delimiter
3. Strip quotes and trailing semicolons
4. Handle empty opmerking fields

### YAML Generation

Used PyYAML with settings:

- `allow_unicode=True` (preserve Dutch characters)
- `default_flow_style=False` (readable block style)
- `sort_keys=False` (preserve field order)
- `width=120` (line wrapping)

### Performance

- Parsing: ~0.1 seconds
- Mapping: ~0.2 seconds
- Validation: ~0.1 seconds
- YAML write: ~0.5 seconds
- **Total time**: < 1 second

---

**Status**: ✅ Conversion complete
**Quality**: 100% field preservation
**Ready for**: Data enrichment and integration
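
### Appendix: Parsing Sketch

The four steps listed under "CSV Parsing Strategy" can be illustrated with a minimal Python sketch. This is not the code in `/scripts/convert_isil_csv_to_yaml.py`. The function names `parse_isil_line` and `read_isil_csv`, the assumed line shape (every field quoted, joined by `","`, followed by stray semicolons), and the sample record are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Output field names as listed under "YAML Structure (Output)".
CSV_FIELDS: List[str] = [
    "csv_row_number",
    "csv_plaats",
    "csv_instelling",
    "csv_isil_code",
    "csv_toegekend_op",
    "csv_opmerking",
]


def parse_isil_line(raw_line: str) -> Dict[str, str]:
    """Parse one malformed CSV line into a dict keyed by CSV_FIELDS."""
    # Step 3 (part): drop the line ending and the extra trailing semicolons.
    line = raw_line.rstrip("\r\n").rstrip(";")
    # Step 2: split on the literal `","` delimiter.
    parts = line.split('","')
    # Step 3 (rest): strip the quotes left on the first and last fields.
    parts = [part.strip('"').strip() for part in parts]
    # Step 4: pad so that a missing opmerking becomes an empty string.
    parts += [""] * (len(CSV_FIELDS) - len(parts))
    return dict(zip(CSV_FIELDS, parts))


def read_isil_csv(path: str) -> List[Dict[str, str]]:
    """Read the whole file (hypothetical helper)."""
    # Step 1: the report notes that UTF-8 failed, so decode as latin-1.
    with open(path, encoding="latin-1") as handle:
        lines = handle.read().splitlines()
    # Assumption: the first line is the header; blank lines are skipped.
    return [parse_isil_line(line) for line in lines[1:] if line.strip()]


# Made-up sample line, not a real registry entry.
sample = '"1","Voorbeeldstad","Voorbeeld Archief","NL-XxxVA","2020-01-01","";;;;'
record = parse_isil_line(sample)
print(record["csv_isil_code"])  # NL-XxxVA
```

Stripping the semicolons before splitting keeps a semicolon inside a quoted remark intact, because `rstrip(";")` stops at the closing quote. All values are kept as strings, in line with the report's goal of preserving CSV fields exactly.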