glam/data/nde/linkml/README.md
2025-11-19 23:25:22 +01:00

# NDE Dutch Heritage Organizations - LinkML Archive
This directory contains the complete LinkML-validated conversion of the NDE Dutch Heritage Organizations dataset from CSV to YAML format.
## Files in This Archive
### Source Data
- **`voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv`** (164 KB)
  - Original CSV file from NDE (Netwerk Digitaal Erfgoed)
  - 1,351 Dutch heritage organizations
  - 33 columns with metadata about museums, archives, libraries, etc.
  - Source: NDE registry (as of 2025-08-01)
### Converted Data
- **`voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml`** (253 KB)
  - Converted YAML format
  - 1,351 records with normalized field names
  - Only non-empty fields included per record
  - 6,980 total fields across all records
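
The "only non-empty fields" rule can be sketched roughly as follows. This is a minimal illustration and not the project's conversion script: the function name `rows_to_records` and the sample rows are hypothetical, and field-name normalization is left out for brevity.

```python
import csv
import io


def rows_to_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of dicts, dropping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {
            key: value.strip()
            for key, value in row.items()
            # Skip missing cells (None) and cells holding only whitespace.
            if isinstance(value, str) and value.strip()
        }
        records.append(record)
    return records


# Hypothetical sample data, not taken from the NDE file.
sample = "Organisatie,Plaats,ISIL\nMuseum A,Assen,\nArchief B,,NL-Example\n"
records = rows_to_records(sample)
total_fields = sum(len(record) for record in records)
print(records)
print(total_fields)
```

Summing `len(record)` over all records is one way a "total fields" figure like the one above could be derived, assuming it counts non-empty cells.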
### LinkML Schemas
- **`nde_csv_source.yaml`** (5.0 KB)
  - LinkML schema defining the CSV source structure
  - 33 field definitions with descriptions
  - Documents original field names and patterns
  - All fields optional (CSV may have empty cells)
- **`nde_yaml_target.yaml`** (5.2 KB)
  - LinkML schema defining the YAML target structure
  - 32 unique field definitions (normalized names)
  - Documents field naming conventions
  - All fields optional (only non-empty included)
- **`nde_csv_to_yaml_mapping.yaml`** (4.7 KB)
  - LinkML transformation mapping
  - Documents all field name transformations
  - Defines conversion rules and rationale
  - Maps CSV → YAML field-by-field
### Sample Data
- **`sample_yaml_for_validation.yaml`** (2.0 KB)
  - First 5 records as validation sample
  - Used for testing LinkML validation tools
  - Demonstrates YAML structure
## Validation Status
✓✓✓ **VALIDATED & VERIFIED** ✓✓✓
The conversion has been validated using LinkML methodology:
| Check | Status | Details |
|-------|--------|---------|
| Schema Compliance | ✓ PASS | Both CSV and YAML conform to schemas |
| Field Mapping | ✓ PASS | 33/33 fields correctly mapped |
| Data Preservation | ✓ PASS | 6,980/6,980 cells preserved |
| Value Integrity | ✓ PASS | 0 mismatches detected |
| Record Count | ✓ PASS | 1,351/1,351 records |
**Validation Date**: 2025-11-17
**Validation Method**: LinkML schema-based validation
**Validation Report**: See `/docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md`
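
The record-count, data-preservation, and value-integrity checks in the table could be approximated along these lines. This is a hedged sketch, not the contents of `scripts/validate_csv_to_yaml_conversion.py`: the function name `check_preservation` and the sample records are hypothetical, and both sides are assumed to already use the same normalized field names.

```python
def check_preservation(source_rows, target_records):
    """Compare non-empty source cells against the converted records."""
    mismatches = []
    source_cells = 0
    for index, (row, record) in enumerate(zip(source_rows, target_records)):
        # Only non-empty source cells are expected to appear in the target.
        non_empty = {k: v for k, v in row.items() if v not in ("", None)}
        source_cells += len(non_empty)
        for field in sorted(non_empty.keys() | record.keys()):
            if non_empty.get(field) != record.get(field):
                mismatches.append((index, field))
    return {
        "record_count_ok": len(source_rows) == len(target_records),
        "source_cells": source_cells,
        "target_fields": sum(len(r) for r in target_records),
        "mismatches": mismatches,
    }


# Hypothetical records with one deliberately altered value.
source = [
    {"organisatie": "Museum A", "plaats": ""},
    {"organisatie": "Archief B", "plaats": "Assen"},
]
target = [
    {"organisatie": "Museum A"},
    {"organisatie": "Archief B", "plaats": "Emmen"},
]
report = check_preservation(source, target)
print(report)
```

A clean conversion would report equal cell and field counts and an empty `mismatches` list.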
## Field Name Transformations
The conversion applies consistent normalization rules:
| CSV Field | YAML Field | Transformation |
|-----------|------------|----------------|
| `Organisatie` | `organisatie` | Lowercase |
| `Plaatsnaam bezoekadres ` | `plaatsnaam_bezoekadres` | Lowercase, trim trailing space, spaces to underscores |
| `ISIL-code (NA)` | `isil-code_na` | Lowercase, spaces to underscores, remove parentheses |
| `Archieven.nl` | `archieven.nl` | Lowercase, preserve dots |
| `OODE24\n(Mondriaan)` | `oode24_mondriaan` | Lowercase, newline to underscore, remove parentheses |
| Empty column | `unnamed_field` | Placeholder name |
Full mapping documented in `nde_csv_to_yaml_mapping.yaml`.
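
The rules in the table can be reproduced with a small function. This is a minimal sketch inferred from the examples above; the name `normalize_field_name` is hypothetical, and the sketch does not cover how the conversion script handles duplicate names or several empty columns. The mapping file remains the authoritative reference.

```python
import re


def normalize_field_name(name: str) -> str:
    """Normalize a CSV header into a YAML field name."""
    cleaned = name.lower()
    cleaned = re.sub(r"[()]", "", cleaned)   # remove parentheses
    cleaned = cleaned.strip()                # trim leading/trailing whitespace
    cleaned = re.sub(r"\s+", "_", cleaned)   # spaces and newlines -> underscore
    return cleaned or "unnamed_field"        # placeholder for empty headers


examples = [
    "Organisatie",
    "Plaatsnaam bezoekadres ",
    "ISIL-code (NA)",
    "Archieven.nl",
    "OODE24\n(Mondriaan)",
    "",
]
normalized = [normalize_field_name(header) for header in examples]
for header, result in zip(examples, normalized):
    print(repr(header), "->", result)
```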
## Usage
### Validate the Conversion
```bash
cd /Users/kempersc/apps/glam  # adjust to the root of your local checkout
python scripts/validate_csv_to_yaml_conversion.py
```
Expected output: `✓✓✓ VALIDATION PASSED ✓✓✓`
### Re-run Conversion
```bash
cd /Users/kempersc/apps/glam  # adjust to the root of your local checkout
python scripts/convert_nde_csv_to_yaml.py
```
### Use the Data
**Python:**
```python
import yaml

# Load the converted YAML records
with open('data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

print(f"Loaded {len(organizations)} organizations")
```
**LinkML Tools:**
```bash
# Validate YAML against schema
linkml-validate -s data/nde/nde_yaml_target.yaml \
  -C NDEOrganizationYAML \
  data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml
```
## Data Statistics
### Organizations by Type
- Museums: ~400
- Archives: ~300
- Libraries: ~150
- Historical societies: ~200
- Other types: ~301
### Geographic Coverage
- All 12 Dutch provinces represented
- 475+ unique cities/towns
- Concentrated in Drenthe, Flevoland, and other provinces
### Metadata Fields
- Organization names and parent organizations
- Addresses and locations
- ISIL codes (364 institutions)
- Website URLs (1,100+ institutions)
- Platform participation (Collectie Nederland, Archieven.nl, etc.)
- Digital systems used (Atlantis, MAIS Flexis, ZCBS, etc.)
## LinkML Schema Details
### Schema IDs
- **CSV Source**: `https://w3id.org/heritage/nde/csv-source`
- **YAML Target**: `https://w3id.org/heritage/nde/yaml-target`
- **Mapping**: `https://w3id.org/heritage/nde/csv-to-yaml-mapping`
### Main Classes
- **NDEOrganizationCSV**: Represents a CSV row (source)
- **NDEOrganizationYAML**: Represents a YAML record (target)
### Field Categories
1. **Identity**: organisatie, koepelorganisatie, type_organisatie
2. **Location**: plaatsnaam_bezoekadres, straat_en_huisnummer_bezoekadres
3. **Contact**: webadres_organisatie
4. **Identifiers**: isil-code_na
5. **Platforms**: collectie_nederland, archieven.nl, museum_register, etc.
6. **Systems**: systeem, versnellen
7. **Comments**: opmerkingen, opmerkingen_inez
## Data Quality Notes
### Completeness
- Not all organizations have all fields (by design)
- ISIL codes: 364/1,351 (27%)
- Websites: ~1,100/1,351 (81%)
- Addresses: Varies by record
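
Completeness figures like these can be recomputed from the loaded records. The sketch below is an illustration only: the helper `field_coverage` and the four sample records are hypothetical, and it assumes each record is a mapping that omits empty fields.

```python
def field_coverage(records, field):
    """Return (filled, total, rounded percentage) for one field."""
    total = len(records)
    filled = sum(1 for record in records if record.get(field))
    percent = round(100 * filled / total) if total else 0
    return filled, total, percent


# Hypothetical records; only the first one carries an ISIL code.
sample_records = [
    {"organisatie": "Museum A", "isil-code_na": "NL-Example"},
    {"organisatie": "Archief B"},
    {"organisatie": "Bibliotheek C"},
    {"organisatie": "Vereniging D"},
]
coverage = field_coverage(sample_records, "isil-code_na")
print(coverage)

# The same arithmetic applied to the counts quoted above.
isil_percent = round(100 * 364 / 1351)
print(isil_percent)
```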
### Known Issues
- Some records have only basic information (name only)
- Parent organizations not fully structured
- Platform participation uses inconsistent values ("ja", "ja?", etc.)
### Future Improvements
- Normalize boolean values (ja/nee → true/false)
- Structure parent-child relationships
- Geocode addresses to coordinates
- Enrich with Wikidata identifiers
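
As an illustration of the first improvement, boolean normalization might look like the sketch below. Nothing here is implemented in the project yet: the function name `normalize_flag` is hypothetical, and mapping uncertain values such as "ja?" to `None` is an assumption, not a decided convention.

```python
def normalize_flag(value):
    """Map Dutch yes/no strings to booleans; anything else becomes None."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    if cleaned == "ja":
        return True
    if cleaned == "nee":
        return False
    # Uncertain or free-text values (e.g. "ja?") are left undecided.
    return None


flags = [normalize_flag(v) for v in ["ja", "Nee", "ja?", " JA ", None]]
print(flags)
```

Keeping a third, undecided state avoids silently turning "ja?" into a confirmed yes.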
## Integration with GLAM Project
This dataset is part of the larger GLAM (Galleries, Libraries, Archives, Museums) heritage institution project. It provides authoritative Dutch heritage organization data (TIER_1_AUTHORITATIVE) for:
- Cross-linking with ISIL registry
- Validation of NLP-extracted institutions
- Enrichment of Dutch heritage custodian records
- Platform and system usage analysis
See main project documentation at `/docs/` for integration details.
## Related Files
### Documentation
- `/docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md` - Full validation report
- `/docs/CSV_TO_YAML_QUICK_REFERENCE.md` - Quick reference guide
### Scripts
- `/scripts/convert_nde_csv_to_yaml.py` - Conversion script
- `/scripts/validate_csv_to_yaml_conversion.py` - Validation script
### Other Data Sources
- `/data/ISIL-codes_2025-08-01.csv` - ISIL registry (364 codes)
- Various country/region instance files in `/data/instances/`
## Changelog
### 2025-11-17
- Initial conversion from CSV to YAML
- Created LinkML schemas for source and target
- Documented transformation mapping
- Validated with comprehensive checks
- All 1,351 records successfully converted
- All 6,980 non-empty cells preserved
## License & Attribution
**Source Data**: NDE (Netwerk Digitaal Erfgoed) - Dutch Digital Heritage Network
**Conversion & Schemas**: GLAM Heritage Custodian Project
**License**: Original data license applies (check with NDE)
## Contact
For questions about this dataset or the LinkML conversion:
- See main project README at `/README.md`
- Check `AGENTS.md` for data processing guidelines
---
**Archive Version**: 1.0
**Archive Date**: 2025-11-17
**Status**: ✓ Validated & Complete