# NDE Dutch Heritage Organizations Dataset

**Dataset Name**: Voorbeeld lijst organisaties en diensten - Totaallijst Nederland  
**Source**: Netwerk Digitaal Erfgoed (NDE, the Dutch Digital Heritage Network)  
**Records**: 1,351 Dutch heritage organizations  
**Last Updated**: 2025-11-17  
**Enrichment Status**: Test batch complete (10 records processed, 8 matched to Wikidata IDs)

---

## Dataset Overview

This directory contains the NDE dataset of Dutch heritage organizations, converted from CSV to YAML format with Wikidata enrichment.

### Files

| File | Size | Description |
|------|------|-------------|
| `voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv` | 168 KB | Original CSV source (1,351 records) |
| `voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml` | 259 KB | Converted YAML with enrichment |
| `voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.backup.*.yaml` | 259 KB | Backup before enrichment |
| `sample_yaml_for_validation.yaml` | 2 KB | Sample for validation testing |

### Subdirectories

- **`linkml/`** - LinkML schemas for CSV source, YAML target, and field mappings
- **`sparql/`** - SPARQL query logs and enrichment results

---

## Dataset Statistics

### Record Counts by Type

| Type | Count | Percentage |
|------|-------|------------|
| Archive (archief) | ~600 | 44% |
| Museum | ~500 | 37% |
| Library (bibliotheek) | ~150 | 11% |
| Historical Society (historische vereniging) | ~100 | 7% |
| **Total** | **1,351** | **100%** |

Per-type counts are approximate and percentages are rounded, so the rows do not sum exactly to the total.

### Geographic Coverage

- **Provinces**: All 12 Dutch provinces
- **Cities**: 475+ distinct cities and towns
- **Focus**: Drenthe province (test batch)

### Data Quality

- **ISIL Codes**: 1,119 records (83%)
- **Websites**: 1,200+ records (89%)
- **Digital Platforms**: 1,119 records (83%)
- **Wikidata IDs**: 8 records (0.6%) - *test batch only*
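
Coverage figures like the ones above can be recomputed from the loaded YAML records. The following is a minimal sketch, assuming each record is a dict keyed by the field names documented in this README; the helper name `field_coverage` and the sample rows are hypothetical and are not real dataset rows.

```python
def field_coverage(records, field):
    """Return (count, percentage) of records with a non-empty value for `field`."""
    filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
    pct = 100.0 * filled / len(records) if records else 0.0
    return filled, round(pct, 1)


# Illustrative rows only (hypothetical, not taken from the dataset)
sample = [
    {"organisatie": "A", "isil-code_na": "NL-AsnDM", "wikidata_id": "Q1258370"},
    {"organisatie": "B", "isil-code_na": ""},
    {"organisatie": "C", "webadres_organisatie": "https://example.org/"},
    {"organisatie": "D", "isil-code_na": "NL-XXX"},
]

for field in ("isil-code_na", "webadres_organisatie", "wikidata_id"):
    count, pct = field_coverage(sample, field)
    print(f"{field}: {count} records ({pct}%)")
```

Empty strings and missing keys are both treated as "not filled", which seems the safer reading for a CSV-derived dataset.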

---

## Wikidata Enrichment Status

### Current Progress

- **Test Batch**: 10 records processed ✓
- **Success Rate**: 80% (8/10 matched)
- **Full Dataset**: Pending (1,341 records remaining)

### Enriched Records

See `/docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md` for complete enrichment results.

**Sample enriched record**:
```yaml
- plaatsnaam_bezoekadres: Assen
  straat_en_huisnummer_bezoekadres: Brink 1
  organisatie: Stichting Drents Museum
  webadres_organisatie: https://drentsmuseum.nl/
  type_organisatie: museum
  isil-code_na: NL-AsnDM
  wikidata_id: Q1258370  # ← Wikidata enrichment
```

### No-Match Records

Records without a Wikidata match are flagged with `wikidata_enrichment_status: no_match_found`. Typical cases:
1. Branch locations (e.g., museum extensions)
2. Inter-municipal partnerships
3. Small local societies
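
To see how many records fall into each enrichment outcome, the records can be tallied by the two enrichment fields. This is only a sketch: the bucket labels `matched` and `not_processed` are hypothetical names chosen for illustration, and `no_match_found` is the only status value this README documents.

```python
from collections import Counter


def enrichment_summary(records):
    """Tally records by enrichment outcome."""
    summary = Counter()
    for rec in records:
        if rec.get("wikidata_id"):
            summary["matched"] += 1
        elif rec.get("wikidata_enrichment_status") == "no_match_found":
            summary["no_match_found"] += 1
        else:
            # No Wikidata ID and no status flag: enrichment has not run yet
            summary["not_processed"] += 1
    return summary


# Illustrative rows only (hypothetical organization names)
sample = [
    {"organisatie": "A", "wikidata_id": "Q1258370"},
    {"organisatie": "B", "wikidata_enrichment_status": "no_match_found"},
    {"organisatie": "C"},
]
print(dict(enrichment_summary(sample)))
```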

---

## Schema Documentation

### LinkML Schemas

Located in the `linkml/` subdirectory:

1. **`nde_csv_source.yaml`** - Original CSV structure (33 columns)
2. **`nde_yaml_target.yaml`** - Normalized YAML structure (34 fields including Wikidata)
3. **`nde_csv_to_yaml_mapping.yaml`** - Field transformation documentation

### Field Definitions

**Core Fields**:
- `organisatie` - Organization name
- `type_organisatie` - Organization type (museum, archief, bibliotheek, etc.)
- `plaatsnaam_bezoekadres` - City/town
- `straat_en_huisnummer_bezoekadres` - Street address
- `webadres_organisatie` - Website URL
- `isil-code_na` - ISIL identifier (NL-XXX format)

**Enrichment Fields** (NEW):
- `wikidata_id` - Wikidata Q-number (e.g., Q1258370)
- `wikidata_enrichment_status` - Enrichment status flag (e.g., `no_match_found`)

**Platform Integration** (40+ fields):
- Collection management systems (Atlantis, MAIS, etc.)
- Aggregation platforms (Collectie Nederland, Archieven.nl, etc.)
- Thematic networks (WO2Net, Modemuze, Van Gogh Worldwide, etc.)

See `/docs/CSV_TO_YAML_QUICK_REFERENCE.md` for the complete field reference.

---

## Usage Examples

### Load YAML Data (Python)

```python
import yaml

# Read as UTF-8 so organization names with non-ASCII characters load correctly
with open('voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

# Filter by type
museums = [org for org in organizations if org.get('type_organisatie') == 'museum']

# Find organizations with a non-empty Wikidata ID
enriched = [org for org in organizations if org.get('wikidata_id')]

# Find organizations with a non-empty ISIL code
with_isil = [org for org in organizations if org.get('isil-code_na')]
```

### Query Wikidata-Enriched Records

```python
import yaml

with open('voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml', 'r', encoding='utf-8') as f:
    organizations = yaml.safe_load(f)

# Get all enriched records
enriched = [
    org for org in organizations
    if org.get('wikidata_id')
]

# Print each organization with a link to its Wikidata page
for org in enriched:
    print(f"{org['organisatie']}: https://www.wikidata.org/wiki/{org['wikidata_id']}")
```

### Validate Against LinkML Schema

```bash
linkml-validate \
  -s linkml/nde_yaml_target.yaml \
  voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.yaml
```

---

## Conversion & Enrichment Scripts

Located in `/scripts/`:

### CSV to YAML Conversion
- `convert_nde_csv_to_yaml.py` - Initial CSV → YAML conversion
- `validate_csv_to_yaml_conversion.py` - Validation script (zero data loss verified)

### Wikidata Enrichment
- `update_nde_yaml_with_wikidata_test_batch.py` - Test batch enrichment (10 records) ✓
- `enrich_nde_with_wikidata.py` - Full dataset enrichment (prepared, not yet run)
- `prepare_wikidata_enrichment.py` - Interactive enrichment helper

---

## SPARQL Query Logs

All Wikidata queries are logged in the `sparql/` subdirectory:

### Query Types
1. **Direct entity search** - By organization name
2. **SPARQL queries** - For municipalities and specialized searches
3. **Metadata verification** - Confirm Q-number matches

### Log Files
- `*_prepared.json` - Prepared SPARQL queries (10 files)
- `enrichment_log_test_batch_*.json` - Enrichment results
- `master_query_log_*.json` - Consolidated query history

### Example SPARQL Query

```sparql
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q2039348 .        # Instance of: Dutch municipality
  ?item wdt:P131 wd:Q770 .           # Located in: Drenthe
  ?item rdfs:label "Coevorden"@nl .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
```
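
A query of this shape can be sent to the public Wikidata Query Service over HTTP. The sketch below only builds the request and leaves the network call commented out. It uses the standard library alone, and the function names, the User-Agent string, and the parameterization by label and province Q-number are assumptions for illustration, not the project's actual enrichment code. The Q-numbers are copied from the example above.

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_municipality_query(label, province_qid, lang="nl"):
    """Build the municipality lookup query for a given label and province."""
    # Escape backslashes and double quotes so the label stays a valid SPARQL literal
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:Q2039348 .
  ?item wdt:P131 wd:{province_qid} .
  ?item rdfs:label "{safe_label}"@{lang} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en". }}
}}"""


def build_request(query, user_agent="nde-enrichment-sketch/0.1 (example)"):
    """Wrap a SPARQL query in a GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )


query = build_municipality_query("Coevorden", "Q770")
request = build_request(query)
print(request.full_url)

# To execute the query (requires network access):
# import json
# with urllib.request.urlopen(request, timeout=30) as resp:
#     bindings = json.load(resp)["results"]["bindings"]
```

Setting a descriptive User-Agent is generally expected by the Wikidata Query Service; the string used here is a placeholder.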

---

## Integration with Main GLAM Project

### Mapping to HeritageCustodian Schema

NDE organizations will be converted to the main project's `HeritageCustodian` LinkML schema:

**Field Mappings**:
```yaml
HeritageCustodian:
  name: organisatie
  institution_type: type_organisatie  # Mapped to GLAMORCUBESFIXPHDNT taxonomy
  locations:
    - city: plaatsnaam_bezoekadres
      street_address: straat_en_huisnummer_bezoekadres
  identifiers:
    - identifier_scheme: "ISIL"
      identifier_value: isil-code_na
    - identifier_scheme: "Wikidata"
      identifier_value: wikidata_id
```
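
The mapping above could be applied to a single record roughly as follows. This is a sketch under stated assumptions: the function name `to_heritage_custodian` is hypothetical, the output is a plain dict rather than a generated LinkML class, and the taxonomy mapping for `institution_type` is not shown, so the raw `type_organisatie` value is passed through unchanged.

```python
def to_heritage_custodian(record):
    """Convert one NDE record (dict) into a HeritageCustodian-shaped dict."""
    identifiers = []
    if record.get("isil-code_na"):
        identifiers.append(
            {"identifier_scheme": "ISIL", "identifier_value": record["isil-code_na"]}
        )
    if record.get("wikidata_id"):
        identifiers.append(
            {"identifier_scheme": "Wikidata", "identifier_value": record["wikidata_id"]}
        )
    return {
        "name": record.get("organisatie"),
        # Raw NDE value; mapping to the project taxonomy is left out of this sketch
        "institution_type": record.get("type_organisatie"),
        "locations": [
            {
                "city": record.get("plaatsnaam_bezoekadres"),
                "street_address": record.get("straat_en_huisnummer_bezoekadres"),
            }
        ],
        "identifiers": identifiers,
    }


# The sample enriched record from this README
drents_museum = {
    "plaatsnaam_bezoekadres": "Assen",
    "straat_en_huisnummer_bezoekadres": "Brink 1",
    "organisatie": "Stichting Drents Museum",
    "webadres_organisatie": "https://drentsmuseum.nl/",
    "type_organisatie": "museum",
    "isil-code_na": "NL-AsnDM",
    "wikidata_id": "Q1258370",
}
custodian = to_heritage_custodian(drents_museum)
print(custodian["name"], custodian["identifiers"])
```

Identifiers are only emitted when the source field is non-empty, so records without a Wikidata match simply carry fewer identifier entries.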

### GHCID Generation

All NDE organizations will receive Global Heritage Custodian Identifiers (GHCIDs):

```
NL-DR-ASN-M-DM    # Stichting Drents Museum
NL-DR-ASN-A-DA    # Drents Archief
NL-DR-BOR-M-HC    # Hunebedcentrum
```

Format: `{Country}-{Province}-{City}-{Type}-{Abbreviation}`

See `/docs/PERSISTENT_IDENTIFIERS.md` for the GHCID specification.
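
The five-part format can be modelled as a small value type. The sketch below only assembles and splits identifiers that follow the pattern shown here. It does not derive city codes or abbreviations, and the class and function names are hypothetical; the authoritative rules are in the GHCID specification.

```python
from typing import NamedTuple


class GHCID(NamedTuple):
    """The five components of a GHCID, in order."""
    country: str
    province: str
    city: str
    type_code: str
    abbreviation: str

    def __str__(self):
        # Join the components with hyphens, e.g. NL-DR-ASN-M-DM
        return "-".join(self)


def parse_ghcid(value):
    """Split a GHCID string into its components, rejecting malformed input."""
    parts = value.split("-")
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"Expected 5 non-empty hyphen-separated parts: {value!r}")
    return GHCID(*parts)


drents_museum = GHCID("NL", "DR", "ASN", "M", "DM")
print(drents_museum)
print(parse_ghcid("NL-DR-ASN-A-DA").type_code)
```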

---

## Data Quality Notes

### Known Issues

1. **Unnamed first column**: Some records store the province or region in an unnamed first column
2. **ISIL code format**: Some codes are non-standard (e.g., "Drente" instead of the NL-XXX format)
3. **Multiline addresses**: Some addresses span multiple fields
4. **Closed institutions**: Some organizations are marked as closed (check `unnamed_field`)
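
Non-standard ISIL codes such as the one in issue 2 can be flagged with a pattern check. This is a sketch only: the regular expression is an assumption about what a well-formed Dutch ISIL code looks like (`NL-` followed by 1 to 11 letters, digits, `:`, `/`, or `-`), and the function name is hypothetical.

```python
import re

# Assumed shape of a Dutch ISIL code; adjust if the registry uses stricter rules
ISIL_NL_PATTERN = re.compile(r"NL-[A-Za-z0-9:/-]{1,11}")


def find_nonstandard_isil(records):
    """Return records whose ISIL value is present but does not match the pattern."""
    return [
        rec
        for rec in records
        if rec.get("isil-code_na")
        and not ISIL_NL_PATTERN.fullmatch(str(rec["isil-code_na"]).strip())
    ]


# Illustrative rows: one valid code, one province name in the ISIL column, one empty
sample = [
    {"organisatie": "Stichting Drents Museum", "isil-code_na": "NL-AsnDM"},
    {"organisatie": "Example A", "isil-code_na": "Drente"},
    {"organisatie": "Example B"},
]
for rec in find_nonstandard_isil(sample):
    print(rec["organisatie"], "->", rec["isil-code_na"])
```

Records without any ISIL value are skipped rather than reported, since a missing code is a coverage gap and not a format error.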

### Validation Results

From `scripts/validate_csv_to_yaml_conversion.py`:
- ✓ All 33 CSV columns mapped
- ✓ All 6,980 non-empty cells preserved
- ✓ Zero data loss
- ✓ Zero mismatches
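
A cell-preservation check of this kind can be approximated by counting non-empty CSV cells and non-empty YAML values and comparing the totals. The sketch below is not the project's validation script. The function names are hypothetical, the enrichment fields are excluded because they do not exist in the CSV, and the sample data is illustrative.

```python
import csv
import io

# Fields added after conversion, so they have no counterpart in the CSV
ENRICHMENT_FIELDS = {"wikidata_id", "wikidata_enrichment_status"}


def count_csv_cells(csv_text):
    """Count non-empty cells in the data rows of a CSV string (header excluded)."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    return sum(1 for row in rows for cell in row if cell.strip())


def count_yaml_values(records, ignore=ENRICHMENT_FIELDS):
    """Count non-empty values across records, skipping the ignored fields."""
    return sum(
        1
        for rec in records
        for key, value in rec.items()
        if key not in ignore and value not in (None, "")
    )


csv_text = (
    "organisatie,type_organisatie,isil-code_na\n"
    "Stichting Drents Museum,museum,NL-AsnDM\n"
    "Drents Archief,archief,\n"
)
records = [
    {
        "organisatie": "Stichting Drents Museum",
        "type_organisatie": "museum",
        "isil-code_na": "NL-AsnDM",
        "wikidata_id": "Q1258370",
    },
    {"organisatie": "Drents Archief", "type_organisatie": "archief"},
]
print(count_csv_cells(csv_text), count_yaml_values(records))
```

Matching totals are a necessary condition for zero data loss, not a sufficient one; a full check would also compare values cell by cell.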

---

## Next Steps

### Immediate Tasks

1. **Scale Wikidata enrichment** to full dataset (1,341 records)
2. **Handle ambiguous matches** - Set up manual review queue
3. **Create Wikidata entries** for missing high-priority organizations
4. **Validate all Q-numbers** - Verify they resolve correctly
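
Before checking that Q-numbers resolve, a cheap first pass is to confirm they are syntactically valid. The following sketch assumes a Q-number is the letter `Q` followed by a positive integer without leading zeros; the function names are hypothetical, and actually resolving each URL over the network is left out.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def invalid_qids(records):
    """Return records whose wikidata_id is present but not a well-formed Q-number."""
    return [
        rec
        for rec in records
        if rec.get("wikidata_id")
        and not QID_PATTERN.fullmatch(str(rec["wikidata_id"]))
    ]


def entity_url(qid):
    """Build the Wikidata page URL used elsewhere in this README."""
    return f"https://www.wikidata.org/wiki/{qid}"


# Illustrative rows: one valid Q-number and two malformed ones
sample = [
    {"organisatie": "Stichting Drents Museum", "wikidata_id": "Q1258370"},
    {"organisatie": "Example A", "wikidata_id": "1258370"},
    {"organisatie": "Example B", "wikidata_id": "q12"},
]
print([rec["organisatie"] for rec in invalid_qids(sample)])
print(entity_url("Q1258370"))
```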

### Integration Tasks

5. **Convert to HeritageCustodian format** - Map to main LinkML schema
6. **Generate GHCIDs** - Create persistent identifiers
7. **Export to RDF/JSON-LD** - With Wikidata links
8. **Merge with ISIL registry** - Cross-link with Dutch ISIL dataset

### Documentation Updates

9. Update project `PROGRESS.md` with NDE statistics
10. Create NDE-specific extraction guide
11. Document manual Wikidata creation workflow

---

## References

- **Main Documentation**: `/docs/NDE_WIKIDATA_ENRICHMENT_REPORT.md`
- **Schema Reference**: `/docs/CSV_TO_YAML_QUICK_REFERENCE.md`
- **Validation Report**: `/docs/NDE_CSV_TO_YAML_LINKML_VALIDATION.md`
- **Project Guide**: `/AGENTS.md` (AI agent instructions)

---

## Contact & Support

**Project**: GLAM Data Extraction Project  
**Repository**: `/Users/kempersc/apps/glam`  
**Dataset Version**: v1.1 (with Wikidata enrichment)  
**Last Enrichment**: 2025-11-17 (test batch)

---

**End of README**