NDE Dutch Heritage Organizations Dataset
Dataset Name: Voorbeeld lijst organisaties en diensten - Totaallijst Nederland
Source: Netwerk Digitaal Erfgoed (NDE, the Dutch Digital Heritage Network)
Records: 1,351 Dutch heritage organizations
Last Updated: 2025-11-17
Enrichment Status: Test batch complete (10 records processed, 8 matched to Wikidata IDs)
Dataset Overview
This directory contains the NDE dataset of Dutch heritage organizations, converted from CSV to YAML format with Wikidata enrichment.
Files
| File | Size | Description |
|---|---|---|
| voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv | 168 KB | Original CSV source (1,351 records) |
| voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml | 259 KB | Converted YAML with enrichment |
| voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.backup.*.yaml | 259 KB | Backup before enrichment |
| sample_yaml_for_validation.yaml | 2 KB | Sample for validation testing |
Subdirectories
- linkml/ - LinkML schemas for CSV source, YAML target, and field mappings
- sparql/ - SPARQL query logs and enrichment results
Dataset Statistics
Record Counts by Type
| Type | Count | Percentage |
|---|---|---|
| Archive (archief) | ~600 | 44% |
| Museum | ~500 | 37% |
| Library (bibliotheek) | ~150 | 11% |
| Historical Society (historische vereniging) | ~100 | 7% |
| Total | 1,351 | 100% |
Geographic Coverage
- Provinces: All 12 Dutch provinces
- Places: 475+ cities and towns
- Focus: Drenthe province (test batch)
Data Quality
- ISIL Codes: 1,119 records (83%)
- Websites: 1,200+ records (89%)
- Digital Platforms: 1,119 records (83%)
- Wikidata IDs: 8 records (0.6%) - test batch only
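Coverage figures like the ones above can be recomputed from the loaded YAML. The following is a minimal sketch, assuming that a missing key, `None`, or an empty string all mean "no value"; the helper name `field_coverage` and the second sample record are illustrative and not part of the project scripts.

```python
def field_coverage(records, field):
    """Return (count, percentage) of records with a non-empty value for `field`."""
    filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
    pct = round(100 * filled / len(records), 1) if records else 0.0
    return filled, pct


# Illustrative sample: the first record mirrors the README example,
# the second is a made-up placeholder.
sample = [
    {
        "organisatie": "Stichting Drents Museum",
        "webadres_organisatie": "https://drentsmuseum.nl/",
        "isil-code_na": "NL-AsnDM",
        "wikidata_id": "Q1258370",
    },
    {"organisatie": "Example Society", "isil-code_na": ""},
]

for field in ("isil-code_na", "webadres_organisatie", "wikidata_id"):
    count, pct = field_coverage(sample, field)
    print(f"{field}: {count} records ({pct}%)")
```

Passing the full list returned by `yaml.safe_load` instead of `sample` would give dataset-wide numbers.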
Wikidata Enrichment Status
Current Progress
- Test Batch: 10 records processed ✓
- Success Rate: 80% (8/10 matched)
- Full Dataset: Pending (1,341 records remaining)
Enriched Records
See /docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md for complete enrichment results.
Sample enriched record:
- plaatsnaam_bezoekadres: Assen
  straat_en_huisnummer_bezoekadres: Brink 1
  organisatie: Stichting Drents Museum
  webadres_organisatie: https://drentsmuseum.nl/
  type_organisatie: museum
  isil-code_na: NL-AsnDM
  wikidata_id: Q1258370  # ← Wikidata enrichment
No-Match Records
Records flagged with wikidata_enrichment_status: no_match_found:
- Branch locations (e.g., museum extensions)
- Inter-municipal partnerships
- Small local societies
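To see how many records carry each status flag and which ones need manual review, something along these lines may help. It is a sketch only: `no_match_found` is the one status value documented here, the `(no status)` bucket is a label invented for this example, and both helper names and the placeholder organizations are hypothetical.

```python
from collections import Counter


def enrichment_summary(records):
    """Count records per wikidata_enrichment_status value."""
    return Counter(
        rec.get("wikidata_enrichment_status", "(no status)") for rec in records
    )


def review_queue(records):
    """Names of records flagged as having no Wikidata match."""
    return [
        rec.get("organisatie")
        for rec in records
        if rec.get("wikidata_enrichment_status") == "no_match_found"
    ]


# Illustrative records; only the first name comes from the dataset.
sample = [
    {"organisatie": "Stichting Drents Museum", "wikidata_id": "Q1258370"},
    {"organisatie": "Example Branch Location",
     "wikidata_enrichment_status": "no_match_found"},
    {"organisatie": "Example Unprocessed Archive"},
]

print(enrichment_summary(sample))
print(review_queue(sample))
```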
Schema Documentation
LinkML Schemas
Located in linkml/ subdirectory:
- nde_csv_source.yaml - Original CSV structure (33 columns)
- nde_yaml_target.yaml - Normalized YAML structure (34 fields including Wikidata)
- nde_csv_to_yaml_mapping.yaml - Field transformation documentation
Field Definitions
Core Fields:
- organisatie - Organization name
- type_organisatie - Organization type (museum, archief, bibliotheek, etc.)
- plaatsnaam_bezoekadres - City/town
- straat_en_huisnummer_bezoekadres - Street address
- webadres_organisatie - Website URL
- isil-code_na - ISIL identifier (NL-XXX format)
Enrichment Fields (NEW):
- wikidata_id - Wikidata Q-number (e.g., Q1258370)
- wikidata_enrichment_status - Enrichment status flag
Platform Integration (40+ fields):
- Collection management systems (Atlantis, MAIS, etc.)
- Aggregation platforms (Collectie Nederland, Archieven.nl, etc.)
- Thematic networks (WO2Net, Modemuze, Van Gogh Worldwide, etc.)
See /docs/CSV_TO_YAML_QUICK_REFERENCE.md for complete field reference.
Usage Examples
Load YAML Data (Python)
import yaml

with open('voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

# Filter by type
museums = [org for org in organizations if org.get('type_organisatie') == 'museum']

# Find organizations with a non-empty Wikidata ID
enriched = [org for org in organizations if org.get('wikidata_id')]

# Find organizations with a non-empty ISIL code
with_isil = [org for org in organizations if org.get('isil-code_na')]
Query Wikidata-Enriched Records
import yaml

with open('voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

# Get all enriched records
enriched = [
    org for org in organizations
    if org.get('wikidata_id')
]

# Print each organization with a link to its Wikidata page
for org in enriched:
    print(f"{org['organisatie']}: https://www.wikidata.org/wiki/{org['wikidata_id']}")
Validate Against LinkML Schema
linkml-validate \
  -s linkml/nde_yaml_target.yaml \
  voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml
Conversion & Enrichment Scripts
Located in /scripts/:
CSV to YAML Conversion
- convert_nde_csv_to_yaml.py - Initial CSV → YAML conversion
- validate_csv_to_yaml_conversion.py - Validation script (zero data loss verified)
Wikidata Enrichment
- update_nde_yaml_with_wikidata_test_batch.py - Test batch enrichment (10 records) ✓
- enrich_nde_with_wikidata.py - Full dataset enrichment (prepared, not yet run)
- prepare_wikidata_enrichment.py - Interactive enrichment helper
SPARQL Query Logs
All Wikidata queries logged in sparql/ subdirectory:
Query Types
- Direct entity search - By organization name
- SPARQL queries - For municipalities and specialized searches
- Metadata verification - Confirm Q-number matches
Log Files
- *_prepared.json - Prepared SPARQL queries (10 files)
- enrichment_log_test_batch_*.json - Enrichment results
- master_query_log_*.json - Consolidated query history
Example SPARQL Query
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q2039348 .        # Instance of: Dutch municipality
  ?item wdt:P131 wd:Q770 .           # Located in: Drenthe
  ?item rdfs:label "Coevorden"@nl .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
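A query of this shape can be generated per municipality instead of being written by hand. The sketch below only builds the query string and does not contact the Wikidata endpoint; the function name `municipality_query` and its parameters are assumptions made for illustration, not the interface of the enrichment scripts.

```python
import re


def municipality_query(name, province_qid, lang="nl"):
    """Build a SPARQL query for a Dutch municipality by label and province."""
    if not re.fullmatch(r"Q[1-9]\d*", province_qid):
        raise ValueError(f"not a Wikidata Q-number: {province_qid!r}")
    # Escape backslashes and double quotes so the name stays a valid string literal
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        "  ?item wdt:P31 wd:Q2039348 .\n"
        f"  ?item wdt:P131 wd:{province_qid} .\n"
        f'  ?item rdfs:label "{safe_name}"@{lang} .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }\n'
        "}"
    )


# Reproduces the example query above, using the same Q-numbers
print(municipality_query("Coevorden", "Q770"))
```

Validating the Q-number and escaping the label keeps malformed organization data from producing a broken query.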
Integration with Main GLAM Project
Mapping to HeritageCustodian Schema
NDE organizations will be converted to the main project's HeritageCustodian LinkML schema:
Field Mappings:
HeritageCustodian:
  name: organisatie
  institution_type: type_organisatie  # Mapped to GLAMORCUBESFIXPHDNT taxonomy
  locations:
    - city: plaatsnaam_bezoekadres
      street_address: straat_en_huisnummer_bezoekadres
  identifiers:
    - identifier_scheme: "ISIL"
      identifier_value: isil-code_na
    - identifier_scheme: "Wikidata"
      identifier_value: wikidata_id
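The mapping above could be applied to a single record roughly as follows. This is a sketch under stated assumptions: the output is a plain dictionary rather than an instance of the real HeritageCustodian class, `type_organisatie` is passed through unchanged because the taxonomy mapping is not specified here, and the function name `to_heritage_custodian` is hypothetical.

```python
def to_heritage_custodian(record):
    """Map one NDE record to a HeritageCustodian-shaped dictionary."""
    identifiers = []
    # Only emit identifiers that are actually present on the record
    if record.get("isil-code_na"):
        identifiers.append(
            {"identifier_scheme": "ISIL", "identifier_value": record["isil-code_na"]}
        )
    if record.get("wikidata_id"):
        identifiers.append(
            {"identifier_scheme": "Wikidata", "identifier_value": record["wikidata_id"]}
        )
    return {
        "name": record.get("organisatie"),
        # Raw NDE value; the taxonomy mapping is not applied in this sketch
        "institution_type": record.get("type_organisatie"),
        "locations": [
            {
                "city": record.get("plaatsnaam_bezoekadres"),
                "street_address": record.get("straat_en_huisnummer_bezoekadres"),
            }
        ],
        "identifiers": identifiers,
    }


drents_museum = {
    "plaatsnaam_bezoekadres": "Assen",
    "straat_en_huisnummer_bezoekadres": "Brink 1",
    "organisatie": "Stichting Drents Museum",
    "type_organisatie": "museum",
    "isil-code_na": "NL-AsnDM",
    "wikidata_id": "Q1258370",
}
print(to_heritage_custodian(drents_museum))
```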
GHCID Generation
All NDE organizations will receive Global Heritage Custodian Identifiers:
NL-DR-ASN-M-DM # Stichting Drents Museum
NL-DR-ASN-A-DA # Drents Archief
NL-DR-BOR-M-HC # Hunebedcentrum
Format: {Country}-{Province}-{City}-{Type}-{Abbreviation}
See /docs/PERSISTENT_IDENTIFIERS.md for GHCID specification.
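As an illustration of the five-part format, the sketch below joins and splits the components with hyphens. It assumes that no component contains a hyphen itself and does not derive the codes from record data; the actual rules live in the GHCID specification, and the function names `make_ghcid` and `parse_ghcid` are hypothetical.

```python
GHCID_PARTS = ("country", "province", "city", "type", "abbreviation")


def make_ghcid(country, province, city, inst_type, abbreviation):
    """Join the five components into a GHCID string."""
    parts = [country, province, city, inst_type, abbreviation]
    if any(not part or "-" in part for part in parts):
        raise ValueError("components must be non-empty and contain no hyphen")
    return "-".join(parts)


def parse_ghcid(ghcid):
    """Split a GHCID string back into its named components."""
    parts = ghcid.split("-")
    if len(parts) != len(GHCID_PARTS):
        raise ValueError(f"expected {len(GHCID_PARTS)} components: {ghcid!r}")
    return dict(zip(GHCID_PARTS, parts))


print(make_ghcid("NL", "DR", "ASN", "M", "DM"))
print(parse_ghcid("NL-DR-ASN-A-DA"))
```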
Data Quality Notes
Known Issues
- Unnamed first column: Some records have province/region in unnamed column
- ISIL code format: Some non-standard codes (e.g., "Drente" instead of NL-XXX format)
- Multiline addresses: Some addresses span multiple fields
- Closed institutions: Some organizations marked as closed (check unnamed_field)
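Non-standard ISIL values such as "Drente" can be flagged with a simple pattern check. The regular expression below is a deliberately loose assumption (the `NL-` prefix followed by 1 to 11 letters, digits, or the characters `:`, `/`, `-`), not a full implementation of the ISIL standard, and the helper name `has_standard_isil` is hypothetical.

```python
import re

# Loose pattern for Dutch ISIL codes; an approximation for screening only
ISIL_NL = re.compile(r"^NL-[A-Za-z0-9:/-]{1,11}$")


def has_standard_isil(record):
    """True if the record's isil-code_na value looks like an NL-XXX ISIL code."""
    code = record.get("isil-code_na")
    return bool(code) and ISIL_NL.match(str(code).strip()) is not None


print(has_standard_isil({"isil-code_na": "NL-AsnDM"}))
print(has_standard_isil({"isil-code_na": "Drente"}))
```

Records that fail the check are candidates for manual correction rather than automatic removal, since the value may belong in another field.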
Validation Results
From scripts/validate_csv_to_yaml_conversion.py:
- ✓ All 33 CSV columns mapped
- ✓ All 6,980 non-empty cells preserved
- ✓ Zero data loss
- ✓ Zero mismatches
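The idea behind the cell-preservation check can be sketched as a comparison of non-empty cell counts before and after conversion. This is not the code of validate_csv_to_yaml_conversion.py; it assumes that empty CSV cells become missing, `None`, or empty values in YAML, and it ignores the two enrichment fields, which have no CSV counterpart. The inline sample data is illustrative.

```python
import csv
import io

ENRICHMENT_FIELDS = ("wikidata_id", "wikidata_enrichment_status")


def count_nonempty_csv_cells(csv_text):
    """Count non-empty data cells in CSV text, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip header
    return sum(1 for row in reader for cell in row if cell.strip())


def count_nonempty_values(records, ignore=ENRICHMENT_FIELDS):
    """Count non-empty values in converted records, ignoring enrichment fields."""
    return sum(
        1
        for rec in records
        for key, value in rec.items()
        if key not in ignore and value not in (None, "")
    )


csv_text = (
    "organisatie,type_organisatie,isil-code_na\n"
    "Stichting Drents Museum,museum,NL-AsnDM\n"
    "Example Society,historische vereniging,\n"
)
records = [
    {"organisatie": "Stichting Drents Museum", "type_organisatie": "museum",
     "isil-code_na": "NL-AsnDM", "wikidata_id": "Q1258370"},
    {"organisatie": "Example Society", "type_organisatie": "historische vereniging"},
]

print(count_nonempty_csv_cells(csv_text) == count_nonempty_values(records))
```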
Next Steps
Immediate Tasks
- Scale Wikidata enrichment to full dataset (1,341 records)
- Handle ambiguous matches - Set up manual review queue
- Create Wikidata entries for missing high-priority organizations
- Validate all Q-numbers - Verify they resolve correctly
Integration Tasks
- Convert to HeritageCustodian format - Map to main LinkML schema
- Generate GHCIDs - Create persistent identifiers
- Export to RDF/JSON-LD - With Wikidata links
- Merge with ISIL registry - Cross-link with Dutch ISIL dataset
Documentation Updates
- Update project PROGRESS.md with NDE statistics
- Create NDE-specific extraction guide
- Document manual Wikidata creation workflow
References
- Main Documentation: /docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md
- Schema Reference: /docs/CSV_TO_YAML_QUICK_REFERENCE.md
- Validation Report: /docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md
- Project Guide: /AGENTS.md (AI agent instructions)
Contact & Support
Project: GLAM Data Extraction Project
Repository: /Users/kempersc/apps/glam
Dataset Version: v1.1 (with Wikidata enrichment)
Last Enrichment: 2025-11-17 (test batch)
End of README