# NDE Dutch Heritage Organizations Dataset

**Dataset Name**: Voorbeeld lijst organisaties en diensten - Totaallijst Nederland (English: example list of organizations and services, complete list for the Netherlands)
**Source**: Network Digital Heritage (NDE)
**Records**: 1,351 Dutch heritage organizations
**Last Updated**: 2025-11-17
**Enrichment Status**: Test batch complete (10 records processed, 8 matched to Wikidata IDs)

---

## Dataset Overview

This directory contains the NDE dataset of Dutch heritage organizations, converted from CSV to YAML and enriched with Wikidata identifiers.

### Files

| File | Size | Description |
|------|------|-------------|
| `voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv` | 168 KB | Original CSV source (1,351 records) |
| `voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml` | 259 KB | Converted YAML with enrichment |
| `voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.backup.*.yaml` | 259 KB | Backup taken before enrichment |
| `sample_yaml_for_validation.yaml` | 2 KB | Sample for validation testing |

### Subdirectories

- **`linkml/`** - LinkML schemas for the CSV source, the YAML target, and field mappings
- **`sparql/`** - SPARQL query logs and enrichment results

---

## Dataset Statistics

### Record Counts by Type

| Type | Count | Percentage |
|------|-------|------------|
| Archive (archief) | ~600 | 44% |
| Museum | ~500 | 37% |
| Library (bibliotheek) | ~150 | 11% |
| Historical Society (historische vereniging) | ~100 | 7% |
| **Total** | **1,351** | **100%** |

### Geographic Coverage

- **Provinces**: All 12 Dutch provinces
- **Places**: 475+ distinct cities and towns
- **Test batch focus**: Drenthe province

### Data Quality

- **ISIL Codes**: 1,119 records (83%)
- **Websites**: 1,200+ records (89%)
- **Digital Platforms**: 1,119 records (83%)
- **Wikidata IDs**: 8 records (0.6%) - *test batch only*

---

## Wikidata Enrichment Status

### Current Progress

- **Test Batch**: 10 records processed ✓
- **Success Rate**: 80% (8/10 matched)
- **Full Dataset**: Pending (1,341 records remaining)

### Enriched Records

See `/docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md` for complete enrichment results.

**Sample enriched record**:

```yaml
- plaatsnaam_bezoekadres: Assen
  straat_en_huisnummer_bezoekadres: Brink 1
  organisatie: Stichting Drents Museum
  webadres_organisatie: https://drentsmuseum.nl/
  type_organisatie: museum
  isil-code_na: NL-AsnDM
  wikidata_id: Q1258370  # ← Wikidata enrichment
```

### No-Match Records

Records flagged with `wikidata_enrichment_status: no_match_found` fall into these groups:

1. Branch locations (e.g., museum extensions)
2. Inter-municipal partnerships
3. Small local societies

---

## Schema Documentation

### LinkML Schemas

Located in the `linkml/` subdirectory:

1. **`nde_csv_source.yaml`** - Original CSV structure (33 columns)
2. **`nde_yaml_target.yaml`** - Normalized YAML structure (34 fields, including Wikidata)
3. **`nde_csv_to_yaml_mapping.yaml`** - Field transformation documentation

### Field Definitions

**Core Fields**:

- `organisatie` - Organization name
- `type_organisatie` - Organization type (museum, archief, bibliotheek, etc.)
- `plaatsnaam_bezoekadres` - City/town
- `straat_en_huisnummer_bezoekadres` - Street address
- `webadres_organisatie` - Website URL
- `isil-code_na` - ISIL identifier (NL-XXX format)

**Enrichment Fields** (NEW):

- `wikidata_id` - Wikidata Q-number (e.g., Q1258370)
- `wikidata_enrichment_status` - Enrichment status flag (e.g., `no_match_found`)

**Platform Integration** (40+ fields):

- Collection management systems (Atlantis, MAIS, etc.)
- Aggregation platforms (Collectie Nederland, Archieven.nl, etc.)
- Thematic networks (WO2Net, Modemuze, Van Gogh Worldwide, etc.)

See `/docs/CSV_TO_YAML_QUICK_REFERENCE.md` for the complete field reference.
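
As a rough pre-check on the identifier fields defined above, each record's `isil-code_na` and `wikidata_id` values can be tested against their expected shapes. This is a minimal sketch, not one of the project's scripts: `check_identifiers` is a hypothetical helper, and the patterns are simplified assumptions (a Dutch ISIL code is taken to be `NL-` followed by letters and digits, a Wikidata item ID to be `Q` followed by a positive integer).

```python
import re
from typing import Any

# Simplified patterns (assumptions, not taken from the project's LinkML schema):
# a Dutch ISIL code is "NL-" plus letters/digits (e.g. NL-AsnDM), and a
# Wikidata item ID is "Q" plus a positive integer (e.g. Q1258370).
ISIL_NL = re.compile(r"NL-[A-Za-z0-9]+")
WIKIDATA_QID = re.compile(r"Q[1-9][0-9]*")


def check_identifiers(org: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list format problems in one record's identifier fields.

    Missing or empty fields are skipped; only values that are present are checked.
    """
    problems = []
    isil = org.get("isil-code_na")
    if isil and not ISIL_NL.fullmatch(str(isil).strip()):
        problems.append(f"non-standard ISIL code: {isil!r}")
    qid = org.get("wikidata_id")
    if qid and not WIKIDATA_QID.fullmatch(str(qid).strip()):
        problems.append(f"malformed Wikidata ID: {qid!r}")
    return problems


# The enriched Drents Museum record passes; a value like "Drente" is flagged.
print(check_identifiers({"isil-code_na": "NL-AsnDM", "wikidata_id": "Q1258370"}))  # []
print(check_identifiers({"isil-code_na": "Drente"}))
# ["non-standard ISIL code: 'Drente'"]
```

Applied to each record loaded with `yaml.safe_load`, a check like this would surface non-standard ISIL values for manual review. Real ISIL codes (ISO 15511) may also contain characters such as `/` or `:`, so a flag is a prompt to inspect the record, not proof of an error.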
---

## Usage Examples

### Load YAML Data (Python)

```python
import yaml

with open('voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)  # list of record dicts

# Filter by type
museums = [org for org in organizations if org.get('type_organisatie') == 'museum']

# Find organizations with a (non-empty) Wikidata ID
enriched = [org for org in organizations if org.get('wikidata_id')]

# Find organizations with a (non-empty) ISIL code
with_isil = [org for org in organizations if org.get('isil-code_na')]
```

### Query Wikidata-Enriched Records

```python
import yaml

with open('voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

# Get all enriched records (skips missing or empty wikidata_id values)
enriched = [
    org for org in organizations
    if org.get('wikidata_id')
]

for org in enriched:
    print(f"{org['organisatie']}: https://www.wikidata.org/wiki/{org['wikidata_id']}")
```

### Validate Against LinkML Schema

```bash
linkml-validate \
  -s linkml/nde_yaml_target.yaml \
  voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml
```

---

## Conversion & Enrichment Scripts

Located in `/scripts/`:

### CSV to YAML Conversion

- `convert_nde_csv_to_yaml.py` - Initial CSV → YAML conversion
- `validate_csv_to_yaml_conversion.py` - Validation script (verified zero data loss)

### Wikidata Enrichment

- `update_nde_yaml_with_wikidata_test_batch.py` - Test batch enrichment (10 records) ✓
- `enrich_nde_with_wikidata.py` - Full dataset enrichment (prepared, not yet run)
- `prepare_wikidata_enrichment.py` - Interactive enrichment helper

---

## SPARQL Query Logs

All Wikidata queries are logged in the `sparql/` subdirectory.

### Query Types

1. **Direct entity search** - By organization name
2. **SPARQL queries** - For municipalities and specialized searches
3. **Metadata verification** - Confirm that Q-numbers match

### Log Files

- `*_prepared.json` - Prepared SPARQL queries (10 files)
- `enrichment_log_test_batch_*.json` - Enrichment results
- `master_query_log_*.json` - Consolidated query history

### Example SPARQL Query

```sparql
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q2039348 .    # Instance of: Dutch municipality
  ?item wdt:P131 wd:Q770 .       # Located in: Drenthe
  ?item rdfs:label "Coevorden"@nl .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
```

---

## Integration with Main GLAM Project

### Mapping to HeritageCustodian Schema

NDE organizations will be converted to the main project's `HeritageCustodian` LinkML schema.

**Field Mappings**:

```yaml
HeritageCustodian:
  name: organisatie
  institution_type: type_organisatie  # Mapped to GLAMORCUBESFIXPHDNT taxonomy
  locations:
    - city: plaatsnaam_bezoekadres
      street_address: straat_en_huisnummer_bezoekadres
  identifiers:
    - identifier_scheme: "ISIL"
      identifier_value: isil-code_na
    - identifier_scheme: "Wikidata"
      identifier_value: wikidata_id
```

### GHCID Generation

All NDE organizations will receive Global Heritage Custodian Identifiers (GHCIDs):

```
NL-DR-ASN-M-DM   # Stichting Drents Museum
NL-DR-ASN-A-DA   # Drents Archief
NL-DR-BOR-M-HC   # Hunebedcentrum
```

Format: `{Country}-{Province}-{City}-{Type}-{Abbreviation}`

See `/docs/PERSISTENT_IDENTIFIERS.md` for the GHCID specification.

---

## Data Quality Notes

### Known Issues

1. **Unnamed first column**: Some records store the province or region in an unnamed first column
2. **ISIL code format**: Some codes are non-standard (e.g., "Drente" instead of the NL-XXX format)
3. **Multiline addresses**: Some addresses span multiple fields
4. **Closed institutions**: Some organizations are marked as closed (check `unnamed_field`)

### Validation Results

From `scripts/validate_csv_to_yaml_conversion.py`:

- ✓ All 33 CSV columns mapped
- ✓ All 6,980 non-empty cells preserved
- ✓ Zero data loss
- ✓ Zero mismatches

---

## Next Steps

### Immediate Tasks

1. **Scale Wikidata enrichment** to the full dataset (1,341 records)
2. **Handle ambiguous matches** - Set up a manual review queue
3. **Create Wikidata entries** for missing high-priority organizations
4. **Validate all Q-numbers** - Verify that they resolve correctly

### Integration Tasks

5. **Convert to HeritageCustodian format** - Map to the main LinkML schema
6. **Generate GHCIDs** - Create persistent identifiers
7. **Export to RDF/JSON-LD** - With Wikidata links
8. **Merge with ISIL registry** - Cross-link with the Dutch ISIL dataset

### Documentation Updates

9. Update the project `PROGRESS.md` with NDE statistics
10. Create an NDE-specific extraction guide
11. Document the manual Wikidata creation workflow

---

## References

- **Main Documentation**: `/docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md`
- **Schema Reference**: `/docs/CSV_TO_YAML_QUICK_REFERENCE.md`
- **Validation Report**: `/docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md`
- **Project Guide**: `/AGENTS.md` (AI agent instructions)

---

## Contact & Support

**Project**: GLAM Data Extraction Project
**Repository**: `/Users/kempersc/apps/glam`
**Dataset Version**: v1.1 (with Wikidata enrichment)
**Last Enrichment**: 2025-11-17 (test batch)

---

**End of README**