# NDE Dutch Heritage Organizations - LinkML Archive

This directory contains the complete LinkML-validated conversion of the NDE Dutch Heritage Organizations dataset from CSV to YAML format.

## Files in This Archive

### Source Data
- **`voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv`** (164 KB)
  - Original CSV file from NDE (Netwerk Digitaal Erfgoed)
  - 1,351 Dutch heritage organizations
  - 33 columns with metadata about museums, archives, libraries, etc.
  - Source: NDE registry (as of 2025-08-01)

### Converted Data
- **`voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml`** (253 KB)
  - Converted YAML format
  - 1,351 records with normalized field names
  - Only non-empty fields included per record
  - 6,980 total fields across all records

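
The "only non-empty fields" rule can be sketched as follows. This is a minimal illustration, not the project's conversion script: the function name `row_to_record`, the sample rows, and the placeholder identifier are hypothetical, and treating whitespace-only cells as empty is an assumption.

```python
import csv
import io


def row_to_record(row):
    """Return a copy of a CSV row that keeps only non-empty cells.

    Assumption: a cell counts as empty when it is missing or contains
    only whitespace.
    """
    return {
        key: value
        for key, value in row.items()
        if isinstance(value, str) and value.strip()
    }


# Hypothetical two-row sample that already uses normalized column names.
sample_csv = io.StringIO(
    "organisatie,plaatsnaam_bezoekadres,isil-code_na\n"
    "Voorbeeld Museum,Assen,\n"
    "Voorbeeld Archief,,NL-EXAMPLE\n"
)

records = [row_to_record(row) for row in csv.DictReader(sample_csv)]

# Counting the fields that survive is one way to arrive at a total such as
# the "total fields across all records" figure.
total_fields = sum(len(record) for record in records)
print(records)
print(total_fields)
```

Each record ends up with a different set of keys, which is why the target schema has to declare every field as optional.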
### LinkML Schemas
- **`nde_csv_source.yaml`** (5.0 KB)
  - LinkML schema defining the CSV source structure
  - 33 field definitions with descriptions
  - Documents original field names and patterns
  - All fields optional (CSV may have empty cells)

- **`nde_yaml_target.yaml`** (5.2 KB)
  - LinkML schema defining the YAML target structure
  - 32 unique field definitions (normalized names)
  - Documents field naming conventions
  - All fields optional (only non-empty included)

- **`nde_csv_to_yaml_mapping.yaml`** (4.7 KB)
  - LinkML transformation mapping
  - Documents all field name transformations
  - Defines conversion rules and rationale
  - Maps CSV → YAML field-by-field

### Sample Data
- **`sample_yaml_for_validation.yaml`** (2.0 KB)
  - First 5 records as validation sample
  - Used for testing LinkML validation tools
  - Demonstrates YAML structure

## Validation Status

✓✓✓ **VALIDATED & VERIFIED** ✓✓✓

The conversion has been validated using LinkML methodology:

| Check | Status | Details |
|-------|--------|---------|
| Schema Compliance | ✓ PASS | Both CSV and YAML conform to schemas |
| Field Mapping | ✓ PASS | 33/33 fields correctly mapped |
| Data Preservation | ✓ PASS | 6,980/6,980 cells preserved |
| Value Integrity | ✓ PASS | 0 mismatches detected |
| Record Count | ✓ PASS | 1,351/1,351 records |

**Validation Date**: 2025-11-17
**Validation Method**: LinkML schema-based validation
**Validation Report**: See `/docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md`

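
The data preservation, value integrity, and record count checks can be pictured with a short sketch. It is not the code in `scripts/validate_csv_to_yaml_conversion.py`; the function `check_preservation`, its `rename` parameter, and the sample rows are hypothetical, and it assumes that cell values are copied to the target verbatim.

```python
def check_preservation(source_rows, target_records, rename):
    """Compare non-empty source cells with the converted records.

    `rename` maps a source column name to its target field name.
    Assumption: values are copied verbatim, so equality is a fair test.
    """
    source_cells = 0
    mismatches = 0
    for row, record in zip(source_rows, target_records):
        for column, value in row.items():
            if not value.strip():
                continue  # empty cells are not expected in the target
            source_cells += 1
            if record.get(rename(column)) != value:
                mismatches += 1
    return {
        "records_match": len(source_rows) == len(target_records),
        "source_cells": source_cells,
        "target_fields": sum(len(record) for record in target_records),
        "mismatches": mismatches,
    }


# Hypothetical sample data standing in for the CSV rows and YAML records.
source_rows = [
    {"Organisatie": "Voorbeeld Museum", "Plaatsnaam bezoekadres ": "Assen"},
    {"Organisatie": "Voorbeeld Archief", "Plaatsnaam bezoekadres ": ""},
]
target_records = [
    {"organisatie": "Voorbeeld Museum", "plaatsnaam_bezoekadres": "Assen"},
    {"organisatie": "Voorbeeld Archief"},
]

report = check_preservation(
    source_rows,
    target_records,
    rename=lambda name: name.strip().lower().replace(" ", "_"),
)
print(report)
```

A conversion passes this sketch when `records_match` is true, `source_cells` equals `target_fields`, and `mismatches` is zero, which mirrors the last three rows of the table.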
## Field Name Transformations

The conversion applies consistent normalization rules:

| CSV Field | YAML Field | Transformation |
|-----------|------------|----------------|
| `Organisatie` | `organisatie` | Lowercase |
| `Plaatsnaam bezoekadres ` | `plaatsnaam_bezoekadres` | Lowercase, trim, spaces to underscores |
| `ISIL-code (NA)` | `isil-code_na` | Lowercase, space to underscore, remove parentheses |
| `Archieven.nl` | `archieven.nl` | Lowercase, preserve dots |
| `OODE24\n(Mondriaan)` | `oode24_mondriaan` | Lowercase, newline to underscore, remove parentheses |
| Empty column | `unnamed_field` | Placeholder name |

The full mapping is documented in `nde_csv_to_yaml_mapping.yaml`.

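
The rules in the table above can be approximated in a few lines of Python. The function below is a sketch inferred from the example rows only; the name `normalize_field_name` is hypothetical, and `nde_csv_to_yaml_mapping.yaml` remains the authoritative source for the actual mapping.

```python
import re


def normalize_field_name(name):
    """Normalize a CSV column header following the rules in the table.

    Steps: trim, lowercase, drop parentheses, and collapse every run of
    whitespace (including newlines) into a single underscore. Dots and
    hyphens are left untouched. An empty header gets a placeholder name.
    """
    cleaned = name.strip().lower()
    cleaned = re.sub(r"[()]", "", cleaned).strip()
    cleaned = re.sub(r"\s+", "_", cleaned)
    return cleaned or "unnamed_field"


# The example headers from the table, with a real newline in the fifth one.
examples = [
    "Organisatie",
    "Plaatsnaam bezoekadres ",
    "ISIL-code (NA)",
    "Archieven.nl",
    "OODE24\n(Mondriaan)",
    "",
]
for header in examples:
    print(repr(header), "->", normalize_field_name(header))
```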
## Usage

### Validate the Conversion

```bash
# Run from the repository root; adjust this path to your local checkout
cd /Users/kempersc/apps/glam
python scripts/validate_csv_to_yaml_conversion.py
```

Expected output: `✓✓✓ VALIDATION PASSED ✓✓✓`

### Re-run Conversion

```bash
# Run from the repository root; adjust this path to your local checkout
cd /Users/kempersc/apps/glam
python scripts/convert_nde_csv_to_yaml.py
```

### Use the Data

**Python:**
```python
import yaml

# Load the converted records (path is relative to the repository root)
with open('data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

print(f"Loaded {len(organizations)} organizations")
```

**LinkML Tools:**
```bash
# Validate YAML against schema
linkml-validate -s data/nde/nde_yaml_target.yaml \
  -C NDEOrganizationYAML \
  data/nde/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml
```

## Data Statistics

### Organizations by Type
- Museums: ~400
- Archives: ~300
- Libraries: ~150
- Historical societies: ~200
- Other types: ~301

### Geographic Coverage
- All 12 Dutch provinces represented
- 475+ unique cities/towns
- Concentrated in Drenthe, Flevoland, and other provinces

### Metadata Fields
- Organization names and parent organizations
- Addresses and locations
- ISIL codes (364 institutions)
- Website URLs (1,100+ institutions)
- Platform participation (Collectie Nederland, Archieven.nl, etc.)
- Digital systems used (Atlantis, MAIS Flexis, ZCBS, etc.)

## LinkML Schema Details

### Schema IDs
- **CSV Source**: `https://w3id.org/heritage/nde/csv-source`
- **YAML Target**: `https://w3id.org/heritage/nde/yaml-target`
- **Mapping**: `https://w3id.org/heritage/nde/csv-to-yaml-mapping`

### Main Classes
- **NDEOrganizationCSV**: Represents a CSV row (source)
- **NDEOrganizationYAML**: Represents a YAML record (target)

### Field Categories
1. **Identity**: `organisatie`, `koepelorganisatie`, `type_organisatie`
2. **Location**: `plaatsnaam_bezoekadres`, `straat_en_huisnummer_bezoekadres`
3. **Contact**: `webadres_organisatie`
4. **Identifiers**: `isil-code_na`
5. **Platforms**: `collectie_nederland`, `archieven.nl`, `museum_register`, etc.
6. **Systems**: `systeem`, `versnellen`
7. **Comments**: `opmerkingen`, `opmerkingen_inez`

## Data Quality Notes

### Completeness
- Not all organizations have all fields (by design)
- ISIL codes: 364/1,351 (27%)
- Websites: ~1,100/1,351 (81%)
- Addresses: Varies by record

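
Because empty fields are omitted from the YAML records, completeness figures like these can be recomputed by counting how many records contain a given key. The helper below is a sketch under that assumption; `field_coverage` and the sample records are hypothetical, and the real data would be loaded with `yaml.safe_load` as shown under Usage.

```python
def field_coverage(records, field):
    """Return (count, percentage) of records that contain `field`.

    Assumption: a field that is present is non-empty, since the
    conversion only writes non-empty fields.
    """
    count = sum(1 for record in records if field in record)
    percentage = round(100 * count / len(records)) if records else 0
    return count, percentage


# Hypothetical records standing in for the loaded YAML list.
sample = [
    {"organisatie": "A", "isil-code_na": "NL-EXAMPLE"},
    {"organisatie": "B"},
    {"organisatie": "C", "webadres_organisatie": "https://example.org"},
    {"organisatie": "D"},
]
print(field_coverage(sample, "isil-code_na"))

# The percentages quoted above follow from the same arithmetic.
print(round(100 * 364 / 1351))
print(round(100 * 1100 / 1351))
```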
### Known Issues
- Some records have only basic information (name only)
- Parent organizations not fully structured
- Platform participation uses inconsistent values ("ja", "ja?", etc.)

### Future Improvements
- Normalize boolean values (ja/nee → true/false)
- Structure parent-child relationships
- Geocode addresses to coordinates
- Enrich with Wikidata identifiers

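
As an illustration of the first item, boolean normalization could look like the sketch below. It is not implemented in this archive; `normalize_flag` is a hypothetical name, and mapping uncertain values such as "ja?" to `None` instead of `True` is an assumption about how the ambiguity might be handled.

```python
def normalize_flag(value):
    """Map a free-text participation value to True, False, or None.

    "ja" becomes True and "nee" becomes False (case and surrounding
    whitespace are ignored). Anything else, including "ja?", is treated
    as unknown and returned as None.
    """
    text = value.strip().lower()
    if text == "ja":
        return True
    if text == "nee":
        return False
    return None


for raw in ["ja", "Ja ", "nee", "ja?"]:
    print(repr(raw), "->", normalize_flag(raw))
```

Returning `None` keeps the uncertainty visible instead of silently promoting "ja?" to a confirmed yes.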
## Integration with GLAM Project

This dataset is part of the larger GLAM (Galleries, Libraries, Archives, Museums) heritage institution project. It provides authoritative Dutch heritage organization data (TIER_1_AUTHORITATIVE) for:

- Cross-linking with the ISIL registry
- Validation of NLP-extracted institutions
- Enrichment of Dutch heritage custodian records
- Platform and system usage analysis

See the main project documentation at `/docs/` for integration details.

## Related Files

### Documentation
- `/docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md` - Full validation report
- `/docs/CSV_TO_YAML_QUICK_REFERENCE.md` - Quick reference guide

### Scripts
- `/scripts/convert_nde_csv_to_yaml.py` - Conversion script
- `/scripts/validate_csv_to_yaml_conversion.py` - Validation script

### Other Data Sources
- `/data/ISIL-codes_2025-08-01.csv` - ISIL registry (364 codes)
- Various country/region instance files in `/data/instances/`

## Changelog

### 2025-11-17
- Initial conversion from CSV to YAML
- Created LinkML schemas for source and target
- Documented transformation mapping
- Validated with comprehensive checks
- All 1,351 records successfully converted
- All 6,980 non-empty cells preserved

## License & Attribution

**Source Data**: NDE (Netwerk Digitaal Erfgoed) - Dutch Digital Heritage Network
**Conversion & Schemas**: GLAM Heritage Custodian Project
**License**: Original data license applies (check with NDE)

## Contact

For questions about this dataset or the LinkML conversion:
- See the main project README at `/README.md`
- Check `AGENTS.md` for data processing guidelines

---

**Archive Version**: 1.0
**Archive Date**: 2025-11-17
**Status**: ✓ Validated & Complete