NDE Dutch Heritage Organizations - LinkML Archive

This directory contains the complete LinkML-validated conversion of the NDE Dutch Heritage Organizations dataset from CSV to YAML format.

Files in This Archive

Source Data

  • voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv (164 KB)
    • Original CSV file from NDE (Netwerk Digitaal Erfgoed)
    • 1,351 Dutch heritage organizations
    • 33 columns with metadata about museums, archives, libraries, etc.
    • Source: NDE registry (as of 2025-08-01)

Converted Data

  • voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml (253 KB)
    • Converted YAML format
    • 1,351 records with normalized field names
    • Only non-empty fields included per record
    • 6,980 total fields across all records
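
The "only non-empty fields" rule can be sketched as below. This is an illustration under stated assumptions, not the project's conversion script: `row_to_record` is a hypothetical helper, the two-row inline sample is made up, and the headers are assumed to be normalized already.

```python
import csv
import io


def row_to_record(row):
    """Keep only cells that contain non-whitespace text (hypothetical helper)."""
    return {
        key: value.strip()
        for key, value in row.items()
        if isinstance(key, str) and isinstance(value, str) and value.strip()
    }


# Made-up sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "organisatie,plaatsnaam_bezoekadres,isil-code_na\n"
    "Voorbeeld Museum,Assen,\n"
    "Voorbeeld Archief,,NL-EXAMPLE\n"
)

records = [row_to_record(row) for row in csv.DictReader(sample_csv)]
total_fields = sum(len(record) for record in records)

print(records)
print(f"{len(records)} records, {total_fields} non-empty fields")
```

Summing `len(record)` over all records is one way to arrive at a "total fields" figure like the one quoted above.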

LinkML Schemas

  • nde_csv_source.yaml (5.0 KB)
    • LinkML schema defining the CSV source structure
    • 33 field definitions with descriptions
    • Documents original field names and patterns
    • All fields optional (CSV may have empty cells)
  • nde_yaml_target.yaml (5.2 KB)
    • LinkML schema defining the YAML target structure
    • 32 unique field definitions (normalized names)
    • Documents field naming conventions
    • All fields optional (only non-empty included)
  • nde_csv_to_yaml_mapping.yaml (4.7 KB)
    • LinkML transformation mapping
    • Documents all field name transformations
    • Defines conversion rules and rationale
    • Maps CSV → YAML field-by-field

Sample Data

  • sample_yaml_for_validation.yaml (2.0 KB)
    • First 5 records as validation sample
    • Used for testing LinkML validation tools
    • Demonstrates YAML structure

Validation Status

✓✓✓ VALIDATED & VERIFIED ✓✓✓

The conversion has been validated using LinkML methodology:

| Check | Status | Details |
|---|---|---|
| Schema Compliance | ✓ PASS | Both CSV and YAML conform to schemas |
| Field Mapping | ✓ PASS | 33/33 fields correctly mapped |
| Data Preservation | ✓ PASS | 6,980/6,980 cells preserved |
| Value Integrity | ✓ PASS | 0 mismatches detected |
| Record Count | ✓ PASS | 1,351/1,351 records |

Validation Date: 2025-11-17
Validation Method: LinkML schema-based validation
Validation Report: See /docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md
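
The data-preservation and value-integrity checks could look roughly like the sketch below. It is not the project's validation script: `compare_cells` is a hypothetical function, the sample rows are made up, rows and records are assumed to be in the same order, and values are assumed to be stored as strings in the YAML.

```python
def compare_cells(csv_rows, yaml_records, name_map):
    """Return (cells_checked, mismatches) for row-aligned CSV and YAML data.

    csv_rows:     list of dicts keyed by original CSV header
    yaml_records: list of dicts keyed by normalized field name
    name_map:     dict from CSV header to YAML field name
    """
    if len(csv_rows) != len(yaml_records):
        raise ValueError("record counts differ")
    checked, mismatches = 0, []
    for index, (row, record) in enumerate(zip(csv_rows, yaml_records)):
        for header, value in row.items():
            if not value or not value.strip():
                continue  # empty cells are not expected in the YAML
            checked += 1
            target = name_map[header]
            if record.get(target) != value.strip():
                mismatches.append((index, header, value, record.get(target)))
    return checked, mismatches


# Made-up sample data for illustration only.
name_map = {"Organisatie": "organisatie", "ISIL-code (NA)": "isil-code_na"}
csv_rows = [
    {"Organisatie": "Voorbeeld Museum", "ISIL-code (NA)": ""},
    {"Organisatie": "Voorbeeld Archief", "ISIL-code (NA)": "NL-EXAMPLE"},
]
yaml_records = [
    {"organisatie": "Voorbeeld Museum"},
    {"organisatie": "Voorbeeld Archief", "isil-code_na": "NL-EXAMPLE"},
]

checked, mismatches = compare_cells(csv_rows, yaml_records, name_map)
print(f"{checked} cells checked, {len(mismatches)} mismatches")
```

This sketch only checks CSV → YAML; it would not notice extra fields present in the YAML but absent from the CSV.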

Field Name Transformations

The conversion applies consistent normalization rules:

| CSV Field | YAML Field | Transformation |
|---|---|---|
| Organisatie | organisatie | Lowercase |
| Plaatsnaam bezoekadres | plaatsnaam_bezoekadres | Spaces to underscores, trim |
| ISIL-code (NA) | isil-code_na | Remove parentheses |
| Archieven.nl | archieven.nl | Preserve dots |
| OODE24\n(Mondriaan) | oode24_mondriaan | Remove newlines & parens |
| Empty column | unnamed_field | Placeholder name |

Full mapping documented in nde_csv_to_yaml_mapping.yaml.
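
As a rough illustration, the rules in the table can be reproduced with a few lines of Python. `normalize_field_name` is a hypothetical name and this is a sketch inferred from the examples above, not the code in the conversion script; the authoritative mapping remains nde_csv_to_yaml_mapping.yaml.

```python
import re


def normalize_field_name(name):
    """Normalize a CSV header the way the examples in the table suggest."""
    cleaned = name.replace("(", "").replace(")", "")  # remove parentheses
    cleaned = cleaned.strip().lower()                  # trim, lowercase
    cleaned = re.sub(r"\s+", "_", cleaned)             # spaces/newlines -> "_"
    return cleaned or "unnamed_field"                  # placeholder for empty headers


for header in ["Organisatie", "ISIL-code (NA)", "OODE24\n(Mondriaan)", ""]:
    print(repr(header), "->", normalize_field_name(header))
```

Dots and hyphens are left untouched, which matches `archieven.nl` and `isil-code_na` in the table.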

Usage

Validate the Conversion

cd /Users/kempersc/apps/glam  # repository root; adjust to your local checkout
python scripts/validate_csv_to_yaml_conversion.py

Expected output: ✓✓✓ VALIDATION PASSED ✓✓✓

Re-run Conversion

cd /Users/kempersc/apps/glam  # repository root; adjust to your local checkout
python scripts/convert_nde_csv_to_yaml.py

Use the Data

Python:

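import yaml

# Path is relative to the repository root. Read as UTF-8 explicitly so that
# non-ASCII characters in names load the same way on every platform.
with open('data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)  # safe_load avoids constructing arbitrary Python objects

print(f"Loaded {len(organizations)} organizations")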
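
Because empty fields are omitted from each record, direct indexing can raise `KeyError`. A minimal sketch of defensive access, assuming the file loads as a list of mappings (the two records below are made up):

```python
# Hypothetical records in the shape the loaded YAML is assumed to have.
organizations = [
    {"organisatie": "Voorbeeld Museum", "webadres_organisatie": "https://example.org"},
    {"organisatie": "Voorbeeld Archief"},
]

for org in organizations:
    name = org.get("organisatie", "(no name)")
    website = org.get("webadres_organisatie")  # None when the field is absent
    print(f"{name}: {website or 'no website recorded'}")

with_website = [org for org in organizations if org.get("webadres_organisatie")]
```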

LinkML Tools:

# Validate YAML against schema
linkml-validate -s data/nde/nde_yaml_target.yaml \
  -C NDEOrganizationYAML \
  data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml

Data Statistics

Organizations by Type

  • Museums: ~400
  • Archives: ~300
  • Libraries: ~150
  • Historical societies: ~200
  • Other types: ~301
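
Counts like these could be recomputed from the YAML with a sketch along the following lines. It assumes the type is stored in `type_organisatie`; the sample values ("museum", "archief") are hypothetical, since the actual vocabulary in the dataset is not documented here.

```python
from collections import Counter


def count_by_type(records, field="type_organisatie"):
    """Count records per organization type; records without the field count as 'unknown'."""
    return Counter(record.get(field, "unknown") for record in records)


# Made-up sample records for illustration only.
sample = [
    {"organisatie": "A", "type_organisatie": "museum"},
    {"organisatie": "B", "type_organisatie": "archief"},
    {"organisatie": "C", "type_organisatie": "museum"},
    {"organisatie": "D"},
]

type_counts = count_by_type(sample)
for org_type, count in type_counts.most_common():
    print(f"{org_type}: {count}")
```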

Geographic Coverage

  • All 12 Dutch provinces represented
  • 475+ unique cities/towns
  • Concentrated in Drenthe, Flevoland, and other provinces

Metadata Fields

  • Organization names and parent organizations
  • Addresses and locations
  • ISIL codes (364 institutions)
  • Website URLs (1,100+ institutions)
  • Platform participation (Collectie Nederland, Archieven.nl, etc.)
  • Digital systems used (Atlantis, MAIS Flexis, ZCBS, etc.)

LinkML Schema Details

Schema IDs

  • CSV Source: https://w3id.org/heritage/nde/csv-source
  • YAML Target: https://w3id.org/heritage/nde/yaml-target
  • Mapping: https://w3id.org/heritage/nde/csv-to-yaml-mapping

Main Classes

  • NDEOrganizationCSV: Represents a CSV row (source)
  • NDEOrganizationYAML: Represents a YAML record (target)

Field Categories

  1. Identity: organisatie, koepelorganisatie, type_organisatie
  2. Location: plaatsnaam_bezoekadres, straat_en_huisnummer_bezoekadres
  3. Contact: webadres_organisatie
  4. Identifiers: isil-code_na
  5. Platforms: collectie_nederland, archieven.nl, museum_register, etc.
  6. Systems: systeem, versnellen
  7. Comments: opmerkingen, opmerkingen_inez

Data Quality Notes

Completeness

  • Not all organizations have all fields (by design)
  • ISIL codes: 364/1,351 (27%)
  • Websites: ~1,100/1,351 (81%)
  • Addresses: Varies by record
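
The percentages above are plain ratios of records that carry a given field. A small sketch, using synthetic records that only reproduce the ISIL counts quoted in this README (`field_completeness` is a hypothetical helper):

```python
def field_completeness(records, field):
    """Return (count, share) of records in which `field` has a non-empty value."""
    present = sum(1 for record in records if record.get(field) not in (None, ""))
    share = present / len(records) if records else 0.0
    return present, share


# Synthetic data: 1,351 records, the first 364 of which carry an ISIL code.
records = [
    {"organisatie": f"org {i}", **({"isil-code_na": "NL-EXAMPLE"} if i < 364 else {})}
    for i in range(1351)
]

present, share = field_completeness(records, "isil-code_na")
print(f"{present}/{len(records)} ({share:.0%})")  # 364/1351 (27%)
```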

Known Issues

  • Some records have only basic information (name only)
  • Parent organizations not fully structured
  • Platform participation uses inconsistent values ("ja", "ja?", etc.)

Future Improvements

  • Normalize boolean values (ja/nee → true/false)
  • Structure parent-child relationships
  • Geocode addresses to coordinates
  • Enrich with Wikidata identifiers
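
For the boolean normalization, one possible approach is to keep the uncertainty marker separate from the value. The sketch below is an assumption about how this could be done, not an existing project function; it only handles the "ja"/"nee" spellings mentioned in this README and returns None for anything else.

```python
def normalize_ja_nee(value):
    """Map a raw platform value to (bool or None, uncertain flag)."""
    if value is None:
        return None, False
    text = value.strip().lower()
    uncertain = text.endswith("?")   # "ja?" is treated as an uncertain yes
    text = text.rstrip("?").strip()
    if text == "ja":
        return True, uncertain
    if text == "nee":
        return False, uncertain
    return None, uncertain           # unrecognized values stay unresolved


for raw in ["ja", "Ja?", "nee", "", None]:
    print(repr(raw), "->", normalize_ja_nee(raw))
```

Returning a separate flag avoids silently turning "ja?" into a confident true.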

Integration with GLAM Project

This dataset is part of the larger GLAM (Galleries, Libraries, Archives, Museums) heritage institution project. It provides authoritative Dutch heritage organization data (TIER_1_AUTHORITATIVE) for:

  • Cross-linking with ISIL registry
  • Validation of NLP-extracted institutions
  • Enrichment of Dutch heritage custodian records
  • Platform and system usage analysis
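
Cross-linking with the ISIL registry amounts to a join on the ISIL code. The sketch below assumes the organization records expose the code as `isil-code_na`; the registry column name `isil_code` and both helper functions are hypothetical, since the registry file's layout is not described here.

```python
def index_by_isil(registry_rows, key="isil_code"):
    """Build a lookup from ISIL code to registry row (column name is an assumption)."""
    return {row[key]: row for row in registry_rows if row.get(key)}


def link_to_registry(organizations, registry_index, field="isil-code_na"):
    """Split organizations into (linked pairs, unlinked records)."""
    linked, unlinked = [], []
    for org in organizations:
        code = org.get(field)
        if code and code in registry_index:
            linked.append((org, registry_index[code]))
        else:
            unlinked.append(org)
    return linked, unlinked


# Made-up sample data for illustration only.
registry = [{"isil_code": "NL-EXAMPLE", "name": "Voorbeeld Archief"}]
organizations = [
    {"organisatie": "Voorbeeld Archief", "isil-code_na": "NL-EXAMPLE"},
    {"organisatie": "Voorbeeld Museum"},
]

linked, unlinked = link_to_registry(organizations, index_by_isil(registry))
print(f"{len(linked)} linked, {len(unlinked)} unlinked")
```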

See main project documentation at /docs/ for integration details.

Documentation

  • /docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md - Full validation report
  • /docs/CSV_TO_YAML_QUICK_REFERENCE.md - Quick reference guide

Scripts

  • /scripts/convert_nde_csv_to_yaml.py - Conversion script
  • /scripts/validate_csv_to_yaml_conversion.py - Validation script

Other Data Sources

  • /data/ISIL-codes_2025-08-01.csv - ISIL registry (364 codes)
  • Various country/region instance files in /data/instances/

Changelog

2025-11-17

  • Initial conversion from CSV to YAML
  • Created LinkML schemas for source and target
  • Documented transformation mapping
  • Validated with comprehensive checks
  • All 1,351 records successfully converted
  • All 6,980 non-empty cells preserved

License & Attribution

Source Data: NDE (Netwerk Digitaal Erfgoed) - Dutch Digital Heritage Network
Conversion & Schemas: GLAM Heritage Custodian Project
License: Original data license applies (check with NDE)

Contact

For questions about this dataset or the LinkML conversion:

  • See main project README at /README.md
  • Check AGENTS.md for data processing guidelines

Archive Version: 1.0
Archive Date: 2025-11-17
Status: ✓ Validated & Complete