# NDE URL Discovery Report

**Date**: 2025-12-01

**Dataset**: NDE Enriched Entries

**Total Entries**: 1,674

## Summary Statistics

| Category | Count | Percentage |
|----------|-------|------------|
| With URL (any source) | 1,621 | 96.8% |
| Without URL | 53 | 3.2% |
| - With Wikidata ID | 31 | - |
| - Without Wikidata ID | 22 | - |

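Each percentage is the count divided by the 1,674 total entries. The following minimal Python sketch rechecks the figures. It assumes one-decimal, half-up rounding, because the report does not state its rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_ENTRIES = 1_674  # from the report header


def share(count: int, total: int = TOTAL_ENTRIES) -> str:
    """Return count/total as a percentage string with one decimal place."""
    pct = Decimal(count * 100) / Decimal(total)
    return f"{pct.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)}%"


summary = {"With URL (any source)": 1_621, "Without URL": 53}

# The two top-level categories should partition the dataset.
assert sum(summary.values()) == TOTAL_ENTRIES
# The Wikidata split should partition the entries without a URL.
assert 31 + 22 == summary["Without URL"]

for label, count in summary.items():
    print(f"{label}: {count} ({share(count)})")
```

The same helper reproduces the Final URL Coverage figures. For example, `share(1_619)` gives `96.7%`.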
## URLs Discovered via Wikidata P856

Querying Wikidata property P856 (official website) resolved **16 entries**. Fifteen received an official website URL. The sixteenth (0212, Geldmuseum) turned out to have merged into another institution:

| Entry ID | Wikidata | Institution | URL |
|----------|----------|-------------|-----|
| 0006 | Q81181279 | Rijckheyt | http://www.rijckheyt.nl |
| 0011 | Q110907483 | Stichting Nyen ä Wes van Nassau | https://www.nyenaenwasvannassau.nl/site/ |
| 0047 | Q110907518 | HKK Zuidkwartier | https://www.hkk-zuidkwartier.nl/ |
| 0059 | Q110907493 | Heemkunde Ravenstein | http://www.heemkunderavenstein.nl/ |
| 0107 | Q110907545 | Heemkunde Gemonde | http://www.heemkundegemonde.nl/ |
| 0135 | Q110995942 | Saet en Cruyt | https://www.saetencruyt.nl/ |
| 0211 | Q701 | Provincie Noord-Holland | https://www.noord-holland.nl/ |
| 0387 | Q2370540 | Collectie Six | http://nl.collectiesix.nl/ |
| 0507 | Q113119338 | Stichting Omroep Muziek | https://www.omroepmuziek.nl/ |
| 0700 | Q121225126 | Stadsarchief Oldenzaal | https://www.oldenzaal.nl/stadsarchief |
| 0839 | Q110279958 | Stichting Utrecht Altijd | https://www.utrechtaltijd.nl/ |
| 1022 | Q111509619 | Stichting Historische Behangsels | https://www.historischebehangsels.nl |
| 1195 | Q2334800 | Verzetsmuseum Zuid-Holland | http://www.verzetsmuseum-zh.nl |
| 1268 | Q2248381 | Spaarnestad Photo | http://www.spaarnestadphoto.nl |
| 1637 | Q59962362 | Bibliotheek De Groene Venen | https://www.bibliotheekdegroenevenen.nl |
| 0212 | Q1897962 | Geldmuseum | *Merged into DNB "De Nieuwe Schatkamer"* |

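The P856 lookup can be batched into a single SPARQL query against the public Wikidata Query Service. The sketch below shows one possible approach; it is not the workflow's actual code. The function names and the User-Agent string are hypothetical. An item with no P856 statement produces no result row, which is how entries end up in the "with Wikidata ID, without URL" group.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service


def build_p856_query(qids: list[str]) -> str:
    """SPARQL query returning every official website (P856) for the given items."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?website WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item wdt:P856 ?website .\n"
        "}"
    )


def parse_p856_results(payload: dict) -> dict[str, list[str]]:
    """Map each Q-number to its P856 URLs from a SPARQL JSON result."""
    websites: dict[str, list[str]] = {}
    for row in payload["results"]["bindings"]:
        # Entity URIs look like http://www.wikidata.org/entity/Q701; keep the Q-number.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        websites.setdefault(qid, []).append(row["website"]["value"])
    return websites


def fetch_p856(qids: list[str]) -> dict[str, list[str]]:
    """Run the query against WDQS (requires network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": build_p856_query(qids)})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; WDQS asks for a descriptive User-Agent.
            "User-Agent": "nde-url-discovery-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_p856_results(json.load(response))
```

A call such as `fetch_p856(["Q81181279", "Q701"])` should return the current P856 values for Rijckheyt and Provincie Noord-Holland. Results can drift as Wikidata is edited.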
## URLs Discovered via Web Search (Exa)

Found URLs for **10 additional entries**:

| Entry ID | Institution | URL | Notes |
|----------|-------------|-----|-------|
| 0199 | Bibliotheek TU Kampen | https://tuu.nl/bibliotheek/ | Merged with TU Utrecht |
| 0389 | Heemkunde Ambt-Delden | https://www.heemkundedelden.nl/ | |
| 0413 | Historische Vereniging Old Deep'n | https://www.olddeepn.nl/ | Diepenveen |
| 0594 | Sambeeks Heem | https://www.sambeeksheem.nl/ | |
| 0627 | Stichting Weeshuisjes | https://www.weeshuisjes.nl/ | |
| 0715 | HDC Protestants Erfgoed | https://www.hdcvu.nl/ | Within VU Library |
| 0729 | Historische Vereniging Staphorst | https://www.historischeverenigingstaphorst.nl/ | |
| 0851 | Historische Vereniging Den Dolder | https://www.historischeverenigingdendolder.nl/ | |
| 1170 | Nederlandse Vereniging voor Papierknipkunst | https://papierknippen.nl/ | Original NDE entry |
| 1504 | Nederlandse Vereniging voor Papierknipkunst | https://papierknippen.nl/ | Same org, from NAN ISIL 2025-11-06 |

## Entries Without Dedicated Websites (Parent Organization Only)

| Entry ID | Institution | Parent Organization | Notes |
|----------|-------------|---------------------|-------|
| 1512 | Diocesane Commissie Kerkelijk Kunstbezit | Bisdom Roermond (https://bisdom-roermond.org/) | Commission managing diocesan church art; operates under the diocese and has no dedicated website |

## Problematic Entries

### NOT Heritage Custodians

| Entry ID | Wikidata | Name | Issue |
|----------|----------|------|-------|
| 0874 | Q2789869 | HEEMAF | Was an electrical equipment manufacturer (1906-1973), NOT a heritage institution |

**Recommendation**: Remove from dataset or mark as `status: NOT_HERITAGE`

### Closed/Defunct

| Entry ID | Wikidata | Name | Issue |
|----------|----------|------|-------|
| 1130 | Q110282061 | Museum Oud Westdorpe | Permanently closed |

**Recommendation**: Mark as `status: CLOSED`

### Duplicate Entries

| Entry IDs | Wikidata | Name | Issue |
|-----------|----------|------|-------|
| 0875, 0993 | Q110891769 | (Same institution) | Duplicate Wikidata ID |
| 0950, 0967 | Q110891782 (0967 only) | (Same institution) | Duplicate entries |

**Recommendation**: Merge each pair into one canonical entry

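Duplicates that share a Q-number can be detected mechanically from the entry file names. The following minimal sketch assumes the `<entry_id>_<Q-number or slug>.yaml` naming used in Actions Taken. `find_duplicate_qids` is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicate_qids(filenames: list[str]) -> dict[str, list[str]]:
    """Group entry IDs that share a Wikidata Q-number.

    Assumes file names like ``0875_Q110891769.yaml``; names whose suffix
    is not a Q-number (e.g. ``0950_unknown.yaml``) are skipped.
    """
    by_qid: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        stem = name.removesuffix(".yaml")
        entry_id, _, rest = stem.partition("_")
        if rest.startswith("Q") and rest[1:].isdigit():
            by_qid[rest].append(entry_id)
    # Keep only Q-numbers claimed by more than one entry.
    return {qid: sorted(ids) for qid, ids in by_qid.items() if len(ids) > 1}


files = [
    "0875_Q110891769.yaml",
    "0993_Q110891769.yaml",
    "0950_unknown.yaml",
    "0967_Q110891782.yaml",
]
print(find_duplicate_qids(files))
```

This approach catches 0875/0993 but not 0950/0967, because `0950_unknown.yaml` carries no Q-number. Pairs like that still need name-based comparison.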
## Entries Still Without URLs (~16)

These entries have no discoverable website:

| Entry ID | Name | Notes |
|----------|------|-------|
| Various | Small heemkundige kringen (local history societies) | May only have a Facebook presence |
| Various | Historical societies | May be defunct or volunteer-run |
| 0210 | Greccio Museum (Leiden) | Located inside the Hartebrug church; no dedicated website |

## Actions Taken

1. ✅ Created `docs/nde/` directory
2. ✅ Created this URL Discovery Report
3. ✅ Updated 25 YAML files with discovered URLs (15 Wikidata + 9 web search + 1 dedicated page)
4. ✅ Flagged problematic entries:
   - `0874_Q2789869.yaml` (HEEMAF): `NOT_HERITAGE` - was an electrical equipment manufacturer
   - `1130_Q110282061.yaml` (Museum Oud Westdorpe): `CLOSED` - permanently closed
   - `0875_Q110891769.yaml` & `0993_Q110891769.yaml`: `DUPLICATE` - same Wikidata ID
   - `0950_unknown.yaml` & `0967_Q110891782.yaml`: `DUPLICATE` - same institution

## Final URL Coverage

| Category | Count | Percentage |
|----------|-------|------------|
| **With URL** | 1,619 | **96.7%** |
| **Without URL** | 55 | 3.3% |
| **Flagged (duplicates, not heritage, closed)** | 5 | 0.3% |

## Remaining Entries Without URLs (55)

These entries have no discoverable website. Many fall into one of these groups:

- Small heemkundige kringen (local history societies)
- Entries with Wikidata IDs but no P856 property
- Defunct or volunteer-run organizations
- Organizations with only a Facebook presence

**Example entries still missing URLs:**

- `0001_Q2679819`: Stichting Hunebedcentrum (has Wikidata, needs P856 check)
- `0139_de_hollandse_cirkel`: De Hollandse Cirkel (no Wikidata)
- `0144_Q2710899`: Nationaal Onderduik Museum (has Wikidata)
- `0148_Q69725772`: CollectieGelderland (has Wikidata)
- Various small historical societies in Overijssel

## Recommendations

1. **Query Wikidata again** for entries that have a Q-number but no P856 value yet
2. **Web search** the remaining entries by organization name
3. **Check for dedicated pages** on umbrella organization websites (like Museum Greccio on the Hartebrug church site)
4. **Mark as `url_status: NOT_FOUND`** entries for which no web presence exists
5. **Consider Facebook URLs** as a fallback for small volunteer organizations

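If the recommendations are applied in the order listed, they form a fallback chain that ends in `url_status: NOT_FOUND`. The following sketch follows that assumption. The candidate keys and the `url_source` field are hypothetical; only `url_status: NOT_FOUND` comes from this report.

```python
from typing import Optional

# Hypothetical source keys, ordered like the recommendations above.
SOURCE_ORDER = ("wikidata_p856", "web_search", "dedicated_page", "facebook")


def resolve_url(candidates: dict[str, Optional[str]]) -> dict[str, str]:
    """Pick the first available URL in priority order, else mark NOT_FOUND."""
    for source in SOURCE_ORDER:
        url = candidates.get(source)
        if url:  # skips missing keys, None, and empty strings
            return {"url": url, "url_source": source}
    return {"url_status": "NOT_FOUND"}


print(resolve_url({"web_search": "https://www.sambeeksheem.nl/"}))
print(resolve_url({}))
```

Keeping the chain explicit also makes it easy to record which step produced each URL. That record feeds the `data_tier` choice in the Technical Notes.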
## Technical Notes

- Added a `digital_platforms` section with `platform_type` classification:
  - `OFFICIAL_WEBSITE` for standalone websites
  - `DEDICATED_PAGE` for pages on host websites (like Museum Greccio)
- Preserved `data_tier` in provenance:
  - `TIER_2_VERIFIED` for Wikidata P856 URLs
  - `TIER_4_INFERRED` for URLs discovered via web search

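The following sketch builds one `digital_platforms` item in Python, assuming the entry YAML is loaded as a plain dict. The `platform_type` and `data_tier` values come from the notes above. `platform_url`, `discovery_method`, `retrieved_at`, and both helper functions are hypothetical.

```python
from datetime import datetime, timezone

# Tier per discovery method, as listed in the Technical Notes.
DATA_TIER_BY_METHOD = {
    "wikidata_p856": "TIER_2_VERIFIED",
    "web_search": "TIER_4_INFERRED",
}


def make_platform(url: str, method: str, dedicated_page: bool = False) -> dict:
    """Build one digital_platforms item (hypothetical field layout)."""
    return {
        "platform_url": url,
        "platform_type": "DEDICATED_PAGE" if dedicated_page else "OFFICIAL_WEBSITE",
        "provenance": {
            "data_tier": DATA_TIER_BY_METHOD[method],  # KeyError for unknown methods
            "discovery_method": method,
            "retrieved_at": datetime.now(timezone.utc).isoformat(),
        },
    }


def add_platform(entry: dict, platform: dict) -> None:
    """Append to the entry's digital_platforms list, creating it if needed."""
    entry.setdefault("digital_platforms", []).append(platform)


entry: dict = {}
add_platform(entry, make_platform("http://www.rijckheyt.nl", "wikidata_p856"))
print(entry["digital_platforms"][0]["provenance"]["data_tier"])
```

Failing loudly on an unknown method keeps unlabeled URLs from entering the data without a tier.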
---

*Generated by NDE URL Discovery workflow*

*Last updated: 2025-12-01T16:30:00+00:00*

## Session Updates - December 2025

### 2025-12-01: NAN ISIL Batch Corrections

Fixed several issues with entries 1502-1513 (NAN ISIL 2025-11-06 batch):

| Entry ID | Issue | Resolution |
|----------|-------|------------|
| 1504 | Missing URL | Added https://papierknippen.nl/ (discovered via Exa) |
| 1508 | Wrong custodian_name ("Cookiesbeleid") | Corrected to "Parochiearchief Kampen" from NAN ISIL registry |
| 1508 | Wrong institution type (U) | Changed to A (Archive) |
| 1508 | Incorrect GHCID | Regenerated: NL-OV-KAM-A-PK |
| 1511 | Wrong custodian_name (exhibition title) | Already corrected to "Wereldmuseum Leiden" |
| 1511 | Wrong institution type (U) | Changed to M (Museum) |
| 1511 | GHCID based on exhibition title | Regenerated: NL-ZH-LEI-M-WL |
| 1512 | Wrong institution type (U) | Changed to H (Holy Sites - diocesan heritage commission) |
| 1512 | No website info | Added parent organization note (Bisdom Roermond) |
| 1512 | Incorrect GHCID | Regenerated: NL-LI-ROE-H-DCKK |
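The regenerated GHCIDs follow a visible pattern: country, province code, a three-letter city code, the institution type letter, and the initials of the custodian name. The following sketch reproduces the three examples above. Its rules are inferred from those examples, not taken from a GHCID specification, and `make_ghcid` is a hypothetical helper. Real identifiers may need extra handling, for example for stop words or collisions between institutions.

```python
def make_ghcid(country: str, region: str, city: str, inst_type: str, name: str) -> str:
    """Compose an ID in the NL-OV-KAM-A-PK pattern (rules inferred from examples)."""
    city_code = city[:3].upper()  # assumption: first three letters of the city
    initials = "".join(word[0].upper() for word in name.split())  # assumption: name initials
    return "-".join([country, region, city_code, inst_type, initials])


print(make_ghcid("NL", "OV", "Kampen", "A", "Parochiearchief Kampen"))
```

This call prints `NL-OV-KAM-A-PK`, matching entry 1508.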