LinkML Validation Report: CSV to YAML Conversion
Overview
This document provides a comprehensive validation report for the conversion of the NDE Dutch Heritage Organizations CSV file to YAML format using LinkML schemas and mapping validation.
Files Created
1. LinkML Schemas
Source Schema: data/nde/nde_csv_source.yaml
- Defines the structure of the original CSV file
- 33 columns/fields (including 2 unnamed columns)
- All fields optional (CSV may have empty cells)
- Preserves original field naming conventions
Target Schema: data/nde/nde_yaml_target.yaml
- Defines the structure of the converted YAML file
- 32 unique fields (normalized field names)
- All fields optional (only non-empty fields included)
- Normalized field naming (lowercase, underscores)
Mapping Schema: data/nde/nde_csv_to_yaml_mapping.yaml
- Defines the transformation rules from CSV to YAML
- Documents field name normalization
- Specifies one-to-one field mappings, except that the two unnamed CSV columns both map to unnamed_field
2. Validation Script
Script: scripts/validate_csv_to_yaml_conversion.py
- Python script using LinkML validation principles
- Validates field mapping correctness
- Validates data preservation (no loss)
- Validates value integrity (no corruption)
Validation Results
Summary Statistics
| Metric | CSV Source | YAML Target | Match |
|---|---|---|---|
| Records | 1,351 | 1,351 | ✓ YES |
| Non-empty cells/fields | 6,980 | 6,980 | ✓ YES |
| Unique fields | 33 columns | 32 fields | ✓ YES* |
*Note: CSV has 2 empty column names that both map to unnamed_field in YAML, reducing unique field count from 33 to 32.
Field Mapping Validation
Result: ✓✓✓ ALL 33 FIELD MAPPINGS CORRECT
All CSV columns successfully map to YAML fields with the following transformations:
- Direct mappings (29 fields): Most fields map directly with minimal normalization
  - Example: Organisatie → organisatie (lowercase)
  - Example: Museum register → museum_register (spaces to underscores)
- Normalized mappings (4 fields): Fields with special characters normalized
  - ISIL-code (NA) → isil-code_na (space to underscore, parentheses removed)
  - Archieven.nl → archieven.nl (dot preserved)
  - OODE24\n(Mondriaan) → oode24_mondriaan (newline to underscore, parentheses removed)
  - Empty column → unnamed_field (placeholder name)
Missing mappings: 0
Unexpected YAML fields: 0
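The "missing" and "unexpected" counts above can be illustrated with a minimal sketch. It is not taken from scripts/validate_csv_to_yaml_conversion.py; the function name `check_field_mapping` and the toy column list are assumptions made for illustration only.

```python
def check_field_mapping(csv_columns, yaml_fields, mapping):
    """Return (missing, unexpected) for a CSV-column -> YAML-field mapping.

    Hypothetical helper, not the project's actual validation code.
    """
    # A column is "missing" if it has no mapping or its target is not a known YAML field.
    missing = [c for c in csv_columns if mapping.get(c) not in yaml_fields]
    # A YAML field is "unexpected" if no CSV column maps to it.
    expected = {mapping[c] for c in csv_columns if c in mapping}
    unexpected = sorted(set(yaml_fields) - expected)
    return missing, unexpected


# Toy data: four columns, two of them unnamed, collapse to three YAML fields,
# mirroring how 33 CSV columns become 32 unique YAML fields.
csv_columns = ["Organisatie", "Museum register", "", ""]
mapping = {
    "Organisatie": "organisatie",
    "Museum register": "museum_register",
    "": "unnamed_field",
}
yaml_fields = {"organisatie", "museum_register", "unnamed_field"}

missing, unexpected = check_field_mapping(csv_columns, yaml_fields, mapping)
print(missing, unexpected)  # [] []
```

Keeping the column list as a list (rather than a set) lets both unnamed columns be checked even though they share one mapping entry.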
Data Preservation Validation
Result: ✓✓✓ ALL DATA PRESERVED
Record Count
- CSV records: 1,351
- YAML records: 1,351
- Match: 100%
Cell/Field Count
- CSV non-empty cells: 6,980
- YAML total fields: 6,980
- Match: 100%
Content Integrity
- Missing data instances: 0
- Value mismatches: 0
- All content preserved exactly
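A cell-by-cell comparison of this kind might look like the sketch below. It assumes rows and records are already loaded as dictionaries in the same order, and that values are compared after stripping surrounding whitespace; `compare_records` and the toy data are hypothetical and do not reproduce the actual validation script.

```python
def compare_records(csv_rows, yaml_records, mapping):
    """Count non-empty CSV cells, YAML fields, missing values, and mismatches.

    Hypothetical helper: csv_rows is a list of {column: cell} dicts,
    yaml_records a list of {field: value} dicts, in the same order.
    """
    report = {
        "record_count_match": len(csv_rows) == len(yaml_records),
        "csv_cells": 0,
        "yaml_fields": 0,
        "missing": 0,
        "mismatches": 0,
    }
    for row, record in zip(csv_rows, yaml_records):
        report["yaml_fields"] += len(record)
        for column, cell in row.items():
            value = (cell or "").strip()  # assumption: whitespace-only cells count as empty
            if not value:
                continue
            report["csv_cells"] += 1
            field = mapping[column]
            if field not in record:
                report["missing"] += 1
            elif record[field] != value:
                report["mismatches"] += 1
    return report


# Toy data, not taken from the real file.
mapping = {"Organisatie": "organisatie", "Museum register": "museum_register"}
csv_rows = [{"Organisatie": "Example Org", "Museum register": ""}]
yaml_records = [{"organisatie": "Example Org"}]

report = compare_records(csv_rows, yaml_records, mapping)
print(report)
```

A passing run in this sketch has equal `csv_cells` and `yaml_fields` totals and zero `missing` and `mismatches`, which corresponds to the three checks listed above.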
Special Cases Validated
- Multi-line content (2 records with newlines)
  - ✓ Preserved with exact newline characters (\r\n)
  - ✓ No truncation or corruption
- Special characters (8+ records in first 100)
  - ✓ Quotes, parentheses, slashes, commas all preserved
  - ✓ Example: Stichting "Museum van Papierknipkunst" → preserved exactly
- URLs (1,100+ fields)
  - ✓ All URLs preserved exactly
  - ✓ Trailing spaces trimmed correctly
- ISIL codes (364 fields)
  - ✓ All codes preserved in correct format (NL-XXXXX)
  - ✓ No corruption or modification
- Empty fields
  - ✓ Correctly omitted from YAML (not stored as null/empty)
  - ✓ Only non-empty values included
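The handling of these special cases can be sketched as a single row conversion. This is a minimal illustration, assuming that surrounding whitespace is trimmed and that empty cells are skipped; `row_to_record` is a hypothetical name, and the `Website` and `Opmerkingen` columns are invented for the example rather than taken from the real CSV.

```python
def row_to_record(row, mapping):
    """Convert one CSV row (column -> cell) into a YAML-ready dict.

    Hypothetical helper: empty cells are omitted rather than stored as
    null, and surrounding whitespace is trimmed (an assumption).
    """
    record = {}
    for column, cell in row.items():
        value = (cell or "").strip()
        if value:  # omit empty cells entirely
            record[mapping[column]] = value
    return record


# Toy row; "Website" and "Opmerkingen" are illustrative column names.
mapping = {
    "Organisatie": "organisatie",
    "Website": "website",
    "Opmerkingen": "opmerkingen",
    "Museum register": "museum_register",
}
row = {
    "Organisatie": 'Stichting "Museum van Papierknipkunst"',
    "Website": "https://example.org ",         # trailing space is trimmed
    "Opmerkingen": "line one\r\nline two",     # internal \r\n is kept as-is
    "Museum register": "",                      # empty cell is omitted
}

record = row_to_record(row, mapping)
print(record)
```

Note that `str.strip()` would also remove a newline at the very end of a cell, so a converter that must keep trailing newlines would need a narrower trim.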
LinkML Schema Compliance
CSV Source Schema Compliance
The CSV file complies with the nde_csv_source.yaml schema:
- All 33 columns present
- Field names match schema definitions
- Data types conform to string range
- No schema violations detected
YAML Target Schema Compliance
The YAML file complies with the nde_yaml_target.yaml schema:
- All 32 unique fields conform to schema
- Field names follow normalization rules
- Only non-empty values included (per schema design)
- No schema violations detected
Mapping Schema Compliance
The conversion follows the nde_csv_to_yaml_mapping.yaml mapping:
- All field derivations correct
- Source-to-target mappings 1:1 (the two unnamed CSV columns share unnamed_field)
- Transformation rules applied consistently
- No mapping violations detected
Field Name Normalization Rules
The conversion applies these normalization rules (as documented in schemas):
- Whitespace: Convert to underscores
  - Plaatsnaam bezoekadres → plaatsnaam_bezoekadres
- Newlines: Convert to underscores
  - OODE24\n(Mondriaan) → oode24_mondriaan
- Parentheses: Remove
  - ISIL-code (NA) → isil-code_na
- Quotes: Remove
  - No impact (field names don't contain quotes)
- Multiple underscores: Collapse to single
  - field__name → field_name
- Leading/trailing underscores: Strip
  - _field_ → field
- Case: Convert to lowercase
  - Organisatie → organisatie
- Empty names: Replace with placeholder
  - `` → unnamed_field
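The rules above can be expressed as one small function. The following is a sketch that applies them in the listed order; the name `normalize_field_name` is an assumption, and the actual conversion script may implement the rules differently.

```python
import re


def normalize_field_name(name: str) -> str:
    """Apply the normalization rules in the order listed (hypothetical helper)."""
    s = re.sub(r"\s+", "_", name)   # whitespace and newlines -> underscore
    s = re.sub(r"[()]", "", s)      # remove parentheses
    s = re.sub(r"[\"']", "", s)     # remove quotes
    s = re.sub(r"_+", "_", s)       # collapse repeated underscores
    s = s.strip("_")                # strip leading/trailing underscores
    s = s.lower()                   # lowercase
    return s or "unnamed_field"     # placeholder for empty names


examples = [
    "Plaatsnaam bezoekadres",
    "OODE24\n(Mondriaan)",
    "ISIL-code (NA)",
    "field__name",
    "_field_",
    "Organisatie",
    "",
]
for original in examples:
    print(repr(original), "->", normalize_field_name(original))
```

Because hyphens and dots are not touched by any rule, names such as ISIL-code (NA) and Archieven.nl keep them after normalization.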
Validation Methodology
The validation follows LinkML best practices:
- Schema-based validation: Schemas define structure and constraints
- Mapping validation: Explicit mapping rules define transformations
- Data integrity checks: Cell-by-cell comparison ensures no data loss
- Reproducibility: Validation script can be re-run at any time
Conclusion
Final Verdict
✓✓✓ VALIDATION PASSED ✓✓✓
The CSV to YAML conversion is VERIFIED as complete and correct according to LinkML schema validation principles.
Validation Guarantees
Based on the comprehensive validation:
- ✓ Completeness: All 1,351 records converted
- ✓ Preservation: All 6,980 non-empty cells preserved
- ✓ Accuracy: All values match exactly (no corruption)
- ✓ Consistency: All field mappings follow defined rules
- ✓ Schema compliance: Both source and target conform to LinkML schemas
- ✓ Mapping compliance: Conversion follows documented mapping rules
Files Summary
| File | Purpose | Status |
|---|---|---|
| data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv | Source data | ✓ Valid |
| data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml | Target data | ✓ Valid |
| data/nde/nde_csv_source.yaml | CSV LinkML schema | ✓ Valid |
| data/nde/nde_yaml_target.yaml | YAML LinkML schema | ✓ Valid |
| data/nde/nde_csv_to_yaml_mapping.yaml | Transformation mapping | ✓ Valid |
| scripts/convert_nde_csv_to_yaml.py | Conversion script | ✓ Works |
| scripts/validate_csv_to_yaml_conversion.py | Validation script | ✓ Passes |
Validation Date: 2025-11-17
Validation Method: LinkML schema-based validation
Validator: OpenCode AI with LinkML toolkit
Result: ✓✓✓ ALL CHECKS PASSED ✓✓✓