glam/docs/CSV_TO_YAML_QUICK_REFERENCE.md
2025-11-19 23:25:22 +01:00

# Quick Reference: CSV to YAML Conversion & Validation
## Files Created
```
data/nde/
├── nde_csv_source.yaml # LinkML schema for CSV structure
├── nde_yaml_target.yaml # LinkML schema for YAML structure
├── nde_csv_to_yaml_mapping.yaml # Transformation mapping
├── voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv # Source
├── voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml # Target
├── sample_yaml_for_validation.yaml # Sample data for testing
└── README.md # Archive documentation
scripts/
├── convert_nde_csv_to_yaml.py # Conversion script
└── validate_csv_to_yaml_conversion.py # Validation script
docs/
├── NDE_CSV_TO_YAML_LINKML_VALIDATION.md # Full validation report
└── CSV_TO_YAML_QUICK_REFERENCE.md # This file
```
## Usage
### Convert CSV to YAML
```bash
python scripts/convert_nde_csv_to_yaml.py
```
Output:
- Writes the target YAML file to the `data/nde/` directory
- Preserves all non-empty CSV values
- Normalizes field names (lowercase; spaces become underscores; parentheses are dropped)
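The "non-empty values only" rule can be illustrated with a minimal sketch. This is not the code of `convert_nde_csv_to_yaml.py`; the function name `rows_to_records` and the sample values are hypothetical, and field-name normalization is left out to keep the focus on the empty-value rule.

```python
import csv
import io


def rows_to_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of dicts, keeping only non-empty cells."""
    # newline="" lets the csv module handle line breaks inside quoted cells.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    records = []
    for row in reader:
        record = {
            column: value
            for column, value in row.items()
            # Assumption: a cell counts as empty if it is missing or whitespace-only.
            if column is not None and value is not None and value.strip() != ""
        }
        records.append(record)
    return records


# Made-up sample data using column names from this document.
SAMPLE = '''Organisatie,ISIL-code (NA),Archieven.nl
"Archief A",NL-XXX,"line one
line two"
"Archief B",,
'''

records = rows_to_records(SAMPLE)
for record in records:
    print(record)
```

The resulting list of dicts could then be handed to a YAML serializer (for example PyYAML's `yaml.safe_dump`), which is assumed here rather than confirmed as what the script uses.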
### Validate Conversion
```bash
python scripts/validate_csv_to_yaml_conversion.py
```
Output:
- Validates field mappings
- Checks data preservation
- Reports any missing/corrupted data
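The checks listed above can be sketched roughly as follows. This is an illustration only, not the logic of `validate_csv_to_yaml_conversion.py`: the function `validate_conversion`, the report keys, and the sample rows are hypothetical, and the sketch assumes that CSV rows and YAML records line up by position and that values are compared as exact strings.

```python
def validate_conversion(csv_rows, yaml_records, mapping):
    """Compare source rows with converted records and collect discrepancies."""
    report = {
        "records_csv": len(csv_rows),
        "records_yaml": len(yaml_records),
        "fields_expected": 0,
        "fields_matched": 0,
        "missing": [],
        "mismatches": [],
    }
    # Assumption: row N of the CSV corresponds to record N of the YAML.
    for index, (row, record) in enumerate(zip(csv_rows, yaml_records)):
        for column, value in row.items():
            if value is None or value.strip() == "":
                continue  # empty cells are not expected in the YAML
            report["fields_expected"] += 1
            field = mapping[column]
            if field not in record:
                report["missing"].append((index, field))
            elif str(record[field]) != value:
                report["mismatches"].append((index, field))
            else:
                report["fields_matched"] += 1
    report["passed"] = (
        report["records_csv"] == report["records_yaml"]
        and not report["missing"]
        and not report["mismatches"]
    )
    return report


# Made-up sample data; the mapping follows the transformation table in this document.
mapping = {"Organisatie": "organisatie", "ISIL-code (NA)": "isil-code_na"}
csv_rows = [
    {"Organisatie": "Archief A", "ISIL-code (NA)": "NL-XXX"},
    {"Organisatie": "Archief B", "ISIL-code (NA)": ""},
]
good = [
    {"organisatie": "Archief A", "isil-code_na": "NL-XXX"},
    {"organisatie": "Archief B"},
]
bad = [{"organisatie": "Archief A"}, {"organisatie": "Archief b"}]

ok_report = validate_conversion(csv_rows, good, mapping)
bad_report = validate_conversion(csv_rows, bad, mapping)
print(ok_report["passed"], bad_report["missing"], bad_report["mismatches"])
```

Counting expected and matched fields separately gives figures of the same shape as the "Fields" line in the validation results, while the `missing` and `mismatches` lists pinpoint which record and field failed.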
## Validation Results
✓✓✓ **ALL CHECKS PASSED** ✓✓✓
- **Records**: 1,351 / 1,351 ✓
- **Fields**: 6,980 / 6,980 ✓
- **Missing data**: 0 ✓
- **Value mismatches**: 0 ✓
## Field Name Transformations
| CSV Column | YAML Field |
|------------|------------|
| `Organisatie` | `organisatie` |
| `ISIL-code (NA)` | `isil-code_na` |
| `Archieven.nl` | `archieven.nl` |
| `OODE24 (Mondriaan)` | `oode24_mondriaan` |
| *(empty column name)* | `unnamed_field` |
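The rows in this table are consistent with a small normalization rule, sketched below. The rule is inferred from the examples only; `normalize_field_name` is a hypothetical name, and the actual conversion script may handle other characters differently.

```python
import re


def normalize_field_name(column: str) -> str:
    """Normalize a CSV column header into a YAML field name (inferred rule)."""
    name = column.strip().lower()
    if not name:
        # Columns without a header get a placeholder name.
        return "unnamed_field"
    # Drop parentheses but keep their contents, hyphens, and dots.
    name = re.sub(r"[()]", "", name).strip()
    # Collapse each run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", name)


for header in ["Organisatie", "ISIL-code (NA)", "Archieven.nl", "OODE24 (Mondriaan)", ""]:
    print(repr(header), "->", normalize_field_name(header))
```

Note that hyphens and dots are left untouched in this sketch because the table shows `isil-code_na` and `archieven.nl` keeping them.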
## LinkML Schema Details
- **Source schema**: `data/nde/nde_csv_source.yaml` - 33 columns, all optional
- **Target schema**: `data/nde/nde_yaml_target.yaml` - 32 fields, all optional
- **Mapping**: 1:1 field mappings with normalization rules
## Key Features
1. **Lossless conversion**: All non-empty values preserved exactly
2. **Field normalization**: Consistent naming conventions
3. **Empty field handling**: Only non-empty values included
4. **Special character support**: Multi-line content, quotes, etc.
5. **LinkML validation**: Schema-based verification
## Re-running Validation
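To re-validate at any time, run the validation script from the repository root (the path below is from the original author's machine; substitute the location of your own checkout):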
```bash
cd /Users/kempersc/apps/glam
python scripts/validate_csv_to_yaml_conversion.py
```
Expected output: `✓✓✓ VALIDATION PASSED ✓✓✓`