glam/docs/CSV_TO_YAML_QUICK_REFERENCE.md
2025-11-19 23:25:22 +01:00

Quick Reference: CSV to YAML Conversion & Validation

Files Created

data/nde/
├── nde_csv_source.yaml              # LinkML schema for CSV structure
├── nde_yaml_target.yaml             # LinkML schema for YAML structure
├── nde_csv_to_yaml_mapping.yaml     # Transformation mapping
├── voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv  # Source
├── voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml # Target
├── sample_yaml_for_validation.yaml  # Sample data for testing
└── README.md                        # Archive documentation

scripts/
├── convert_nde_csv_to_yaml.py       # Conversion script
└── validate_csv_to_yaml_conversion.py # Validation script

docs/
├── NDE_CSV_TO_YAML_LINKML_VALIDATION.md  # Full validation report
└── CSV_TO_YAML_QUICK_REFERENCE.md        # This file

Usage

Convert CSV to YAML

python scripts/convert_nde_csv_to_yaml.py

Output:

  • Creates YAML file in data/nde/ directory
  • Preserves all non-empty CSV data
  • Normalizes field names (lowercased; spaces replaced with underscores; parentheses removed)

Validate Conversion

python scripts/validate_csv_to_yaml_conversion.py

Output:

  • Validates field mappings
  • Checks data preservation
  • Reports any missing/corrupted data
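The checks above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/validate_csv_to_yaml_conversion.py: the function names `compare_records` and `rename`, the report keys, and the sample rows are all made up for the example, and the converted records are given as plain Python dicts rather than loaded from the YAML file.

```python
import csv
import io


def rename(column):
    # Hypothetical stand-in for the real field-name normalization.
    return column.lower().replace(" (", "_").replace(")", "")


def compare_records(csv_rows, yaml_records, rename_field):
    """Count checked, missing, and mismatched values between two record lists."""
    report = {
        "records_csv": len(csv_rows),
        "records_yaml": len(yaml_records),
        "fields_checked": 0,
        "missing": 0,
        "mismatches": 0,
    }
    for row, record in zip(csv_rows, yaml_records):
        for column, value in row.items():
            if value is None or value == "":
                continue  # empty CSV cells are not expected in the output
            report["fields_checked"] += 1
            field = rename_field(column)
            if field not in record:
                report["missing"] += 1
            elif record[field] != value:
                report["mismatches"] += 1
    return report


# Made-up sample data, for illustration only.
sample_csv = "Organisatie,ISIL-code (NA)\nArchief A,NL-001\nMuseum B,\n"
csv_rows = list(csv.DictReader(io.StringIO(sample_csv)))
yaml_records = [
    {"organisatie": "Archief A", "isil-code_na": "NL-001"},
    {"organisatie": "Museum B"},
]

report = compare_records(csv_rows, yaml_records, rename)
print(report)
```

A conversion would count as passing in this sketch when the two record counts match and both `missing` and `mismatches` are zero.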

Validation Results

✓✓✓ ALL CHECKS PASSED ✓✓✓

  • Records: 1,351 / 1,351 ✓
  • Fields: 6,980 / 6,980 ✓
  • Missing data: 0 ✓
  • Value mismatches: 0 ✓

Field Name Transformations

CSV Column            YAML Field
Organisatie           organisatie
ISIL-code (NA)        isil-code_na
Archieven.nl          archieven.nl
OODE24 (Mondriaan)    oode24_mondriaan
Empty column          unnamed_field
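One set of rules that reproduces the mappings listed above is: lowercase the header, drop parentheses, replace whitespace with underscores, and fall back to `unnamed_field` for an empty header. The helper below is a sketch under that assumption. The name `normalize_field_name` is hypothetical, and the actual rules in scripts/convert_nde_csv_to_yaml.py may differ in cases not shown here.

```python
import re


def normalize_field_name(column):
    """Map a CSV column header to a YAML field name (assumed rules)."""
    name = column.strip().lower()
    if not name:
        return "unnamed_field"  # empty header gets a placeholder name
    name = re.sub(r"[()]", "", name)  # drop parentheses, keep their content
    name = re.sub(r"\s+", "_", name)  # each whitespace run becomes one underscore
    return name


for header in ["Organisatie", "ISIL-code (NA)", "Archieven.nl", "OODE24 (Mondriaan)", ""]:
    print(repr(header), "->", normalize_field_name(header))
```

Note that hyphens and dots are left untouched, which matches `isil-code_na` and `archieven.nl` in the list.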

LinkML Schema Details

  • Source schema: data/nde/nde_csv_source.yaml - 33 columns, all optional
  • Target schema: data/nde/nde_yaml_target.yaml - 32 fields, all optional
  • Mapping: 1:1 field mappings with normalization rules

Key Features

  1. Lossless conversion: All non-empty values preserved exactly
  2. Field normalization: Consistent naming conventions
  3. Empty field handling: Only non-empty values included
  4. Special character support: Multi-line content, quotes, etc.
  5. LinkML validation: Schema-based verification
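Features 3 and 4 can be illustrated with a short standard-library sketch. It is not the project's conversion script: `rows_to_records` is a hypothetical name, the sample rows and the `Opmerkingen` column are invented, and field-name normalization is left out to keep the example small.

```python
import csv
import io


def rows_to_records(csv_text):
    """Parse CSV text into one dict per row, keeping only non-empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Skip empty cells so they never appear as keys in the output.
        record = {col: val for col, val in row.items() if val not in (None, "")}
        records.append(record)
    return records


# Invented sample: a quoted multi-line cell, doubled quotes, and an empty cell.
sample = (
    "Organisatie,Opmerkingen\n"
    '"Archief A","regel 1\nregel 2"\n'
    'Museum B,\n'
    'Bibliotheek C,"met ""aanhalingstekens"""\n'
)

records = rows_to_records(sample)
for record in records:
    print(record)
```

The `csv` module keeps the line break inside the quoted cell and unescapes the doubled quotes, so the values reach the records unchanged. The resulting dicts could then be handed to a YAML serializer such as PyYAML's `yaml.safe_dump`; whether the actual script uses that library is not stated here.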

Re-running Validation

To re-validate at any time, run the validation script from the repository root:

cd /Users/kempersc/apps/glam
python scripts/validate_csv_to_yaml_conversion.py

Expected output: ✓✓✓ VALIDATION PASSED ✓✓✓