NDE URL Discovery Report
Date: 2025-12-01
Dataset: NDE Enriched Entries
Total Entries: 1,674
Summary Statistics
| Category | Count | Percentage |
|---|---|---|
| With URL (any source) | 1,621 | 96.8% |
| Without URL | 53 | 3.2% |
| - With Wikidata ID | 31 | - |
| - Without Wikidata ID | 22 | - |
URLs Discovered via Wikidata P856
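Queried Wikidata property P856 (official website) for entries with a Wikidata ID. Of the 16 entries below, 15 yielded a URL; entry 0212 (Geldmuseum) is listed with a merger note instead of a URL: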
| Entry ID | Wikidata | Institution | URL |
|---|---|---|---|
| 0006 | Q81181279 | Rijckheyt | http://www.rijckheyt.nl |
| 0011 | Q110907483 | Stichting Nyen ä Wes van Nassau | https://www.nyenaenwasvannassau.nl/site/ |
| 0047 | Q110907518 | HKK Zuidkwartier | https://www.hkk-zuidkwartier.nl/ |
| 0059 | Q110907493 | Heemkunde Ravenstein | http://www.heemkunderavenstein.nl/ |
| 0107 | Q110907545 | Heemkunde Gemonde | http://www.heemkundegemonde.nl/ |
| 0135 | Q110995942 | Saet en Cruyt | https://www.saetencruyt.nl/ |
| 0211 | Q701 | Provincie Noord-Holland | https://www.noord-holland.nl/ |
| 0387 | Q2370540 | Collectie Six | http://nl.collectiesix.nl/ |
| 0507 | Q113119338 | Stichting Omroep Muziek | https://www.omroepmuziek.nl/ |
| 0700 | Q121225126 | Stadsarchief Oldenzaal | https://www.oldenzaal.nl/stadsarchief |
| 0839 | Q110279958 | Stichting Utrecht Altijd | https://www.utrechtaltijd.nl/ |
| 1022 | Q111509619 | Stichting Historische Behangsels | https://www.historischebehangsels.nl |
| 1195 | Q2334800 | Verzetsmuseum Zuid-Holland | http://www.verzetsmuseum-zh.nl |
| 1268 | Q2248381 | Spaarnestad Photo | http://www.spaarnestadphoto.nl |
| 1637 | Q59962362 | Bibliotheek De Groene Venen | https://www.bibliotheekdegroenevenen.nl |
| 0212 | Q1897962 | Geldmuseum | Merged into DNB "De Nieuwe Schatkamer" |
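The P856 lookup described above can be sketched as a batched SPARQL query against the public Wikidata Query Service. This is a minimal illustration, not the workflow's actual code: the function names `build_p856_query` and `parse_p856_bindings` are hypothetical, and `SAMPLE_RESPONSE` is a hand-written stand-in in the standard SPARQL JSON results shape, filled with two rows from the table above.

```python
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_p856_query(qids):
    """Build a SPARQL query for the official website (P856) of each Q-ID.

    Hypothetical helper; the wd: and wdt: prefixes are predefined
    by the Wikidata Query Service.
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?website WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item wdt:P856 ?website .\n"
        "}"
    )


def parse_p856_bindings(response_json):
    """Map Q-ID -> list of website URLs from a SPARQL JSON results document."""
    websites = {}
    for row in response_json["results"]["bindings"]:
        # Item URIs look like http://www.wikidata.org/entity/Q701
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        websites.setdefault(qid, []).append(row["website"]["value"])
    return websites


# Hand-written sample in SPARQL JSON results format (not captured API output).
SAMPLE_RESPONSE = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q701"},
                "website": {"type": "uri", "value": "https://www.noord-holland.nl/"},
            },
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2370540"},
                "website": {"type": "uri", "value": "http://nl.collectiesix.nl/"},
            },
        ]
    }
}

query = build_p856_query(["Q701", "Q2370540"])
found = parse_p856_bindings(SAMPLE_RESPONSE)
```

Q-IDs that have no P856 statement simply do not appear in the parsed result, which is how entries such as those listed later as "has Wikidata, needs P856 check" would surface.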
URLs Discovered via Web Search (Exa)
Found URLs for 10 additional entries:
| Entry ID | Institution | URL | Notes |
|---|---|---|---|
| 0199 | Bibliotheek TU Kampen | https://tuu.nl/bibliotheek/ | Merged with TU Utrecht |
| 0389 | Heemkunde Ambt-Delden | https://www.heemkundedelden.nl/ | |
| 0413 | Historische Vereniging Old Deep'n | https://www.olddeepn.nl/ | Diepenveen |
| 0594 | Sambeeks Heem | https://www.sambeeksheem.nl/ | |
| 0627 | Stichting Weeshuisjes | https://www.weeshuisjes.nl/ | |
| 0715 | HDC Protestants Erfgoed | https://www.hdcvu.nl/ | Within VU Library |
| 0729 | Historische Vereniging Staphorst | https://www.historischeverenigingstaphorst.nl/ | |
| 0851 | Historische Vereniging Den Dolder | https://www.historischeverenigingdendolder.nl/ | |
| 1170 | Nederlandse Vereniging voor Papierknipkunst | https://papierknippen.nl/ | Original NDE entry |
| 1504 | Nederlandse Vereniging voor Papierknipkunst | https://papierknippen.nl/ | Same org, from NAN ISIL 2025-11-06 |
Entries Without Dedicated Websites (Parent Organization Only)
| Entry ID | Institution | Parent Organization | Notes |
|---|---|---|---|
| 1512 | Diocesane Commissie Kerkelijk Kunstbezit | Bisdom Roermond (https://bisdom-roermond.org/) | Commission managing diocesan church art - operates under diocese, no dedicated website |
Problematic Entries
NOT Heritage Custodians
| Entry ID | Wikidata | Name | Issue |
|---|---|---|---|
| 0874 | Q2789869 | HEEMAF | Was an electrical equipment manufacturer (1906-1973), NOT a heritage institution |
Recommendation: Remove from dataset or mark as status: NOT_HERITAGE
Closed/Defunct
| Entry ID | Wikidata | Name | Issue |
|---|---|---|---|
| 1130 | Q110282061 | Museum Oud Westdorpe | Permanently closed |
Recommendation: Mark as status: CLOSED
Duplicate Entries
| Entry IDs | Wikidata | Name | Issue |
|---|---|---|---|
| 0875, 0993 | Q110891769 | (Same institution) | Duplicate Wikidata ID |
| 0967, 0950 | - | (Same institution) | Duplicate entries |
Recommendation: Merge records, keep one canonical entry
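Duplicates that share a Wikidata ID can be detected mechanically from the entry filenames. The sketch below assumes the `NNNN_Q….yaml` naming seen in the flagged files of this report; the helper name `find_duplicate_qids` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed filename shape: four-digit entry ID, underscore, Wikidata Q-ID.
FILENAME_PATTERN = re.compile(r"^(?P<entry>\d{4})_(?P<qid>Q\d+)\.yaml$")


def find_duplicate_qids(filenames):
    """Return {Q-ID: [entry IDs]} for every Q-ID used by more than one file."""
    entries_by_qid = defaultdict(list)
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match:  # files without a Q-ID (e.g. *_unknown.yaml) are skipped
            entries_by_qid[match.group("qid")].append(match.group("entry"))
    return {
        qid: sorted(entries)
        for qid, entries in entries_by_qid.items()
        if len(entries) > 1
    }


duplicates = find_duplicate_qids([
    "0874_Q2789869.yaml",
    "0875_Q110891769.yaml",
    "0950_unknown.yaml",
    "0993_Q110891769.yaml",
])
```

This only catches the first kind of duplicate in the table. A pair like 0967 and 0950, where one record lacks a Wikidata ID, would still need a comparison on institution name or address.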
Entries Still Without URLs (~16)
These entries have no discoverable website:
| Entry ID | Name | Notes |
|---|---|---|
| Various | Small heemkundige kringen | May only have Facebook presence |
| Various | Historical societies | May be defunct or volunteer-run |
| 0210 | Greccio Museum (Leiden) | Located inside Hartebrug church, no dedicated website |
Actions Taken
- ✅ Created `docs/nde/` directory
- ✅ Created this URL Discovery Report
- ✅ Updated 25 YAML files with discovered URLs (15 Wikidata + 9 web search + 1 dedicated page)
- ✅ Flagged problematic entries:
  - `0874_Q2789869.yaml` (HEEMAF): `NOT_HERITAGE` - was electrical manufacturer
  - `1130_Q110282061.yaml` (Museum Oud Westdorpe): `CLOSED` - permanently closed
  - `0875_Q110891769.yaml` & `0993_Q110891769.yaml`: `DUPLICATE` - same Wikidata ID
  - `0950_unknown.yaml` & `0967_Q110891782.yaml`: `DUPLICATE` - same institution
Final URL Coverage
| Category | Count | Percentage |
|---|---|---|
| With URL | 1,619 | 96.7% |
| Without URL | 55 | 3.3% |
| Flagged (duplicates, not heritage, closed) | 5 | 0.3% |
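The percentages in the coverage tables are each count divided by the 1,674 total entries, rounded to one decimal. A small check, using a hypothetical helper `coverage_percentage`:

```python
TOTAL_ENTRIES = 1674


def coverage_percentage(count, total=TOTAL_ENTRIES):
    """Format count/total as a percentage with one decimal place."""
    return f"{count / total:.1%}"


with_url = coverage_percentage(1619)
without_url = coverage_percentage(55)
flagged = coverage_percentage(5)
```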
Remaining Entries Without URLs (55)
These entries have no discoverable website. Many are:
- Small heemkundige kringen (local history societies)
- Entries with Wikidata IDs but no P856 property
- Defunct or volunteer-run organizations
- Facebook-only presence
Example entries still missing URLs:
- `0001_Q2679819`: Stichting Hunebedcentrum (has Wikidata, needs P856 check)
- `0139_de_hollandse_cirkel`: De Hollandse Cirkel (no Wikidata)
- `0144_Q2710899`: Nationaal Onderduik Museum (has Wikidata)
- `0148_Q69725772`: CollectieGelderland (has Wikidata)
- Various small historical societies in Overijssel
Recommendations
- Query Wikidata again for entries with Q-numbers but no P856 discovered
- Web search remaining entries with organization names
- Check for dedicated pages on umbrella organization websites (like Museum Greccio on Hartebrug church site)
- Mark as `url_status: NOT_FOUND` for entries where no web presence exists
- Consider Facebook URLs as fallback for small volunteer organizations
Technical Notes
- Added `digital_platforms` section with proper `platform_type` classification:
  - `OFFICIAL_WEBSITE` for standalone websites
  - `DEDICATED_PAGE` for pages on host websites (like Museum Greccio)
- Preserved `data_tier` in provenance:
  - `TIER_2_VERIFIED` for Wikidata P856 URLs
  - `TIER_4_INFERRED` for web search discovered URLs
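To make the classification concrete, the sketch below builds the fragment that might be written into an entry for a discovered URL. Only `digital_platforms`, `platform_type`, `data_tier`, and the four enum values come from the notes above; the keys `platform_url` and `provenance`, the source labels, and the function name are assumptions for illustration, and the real YAML schema may differ.

```python
# Source label -> data tier, following the two rules in the notes above.
# The source labels themselves are hypothetical.
TIER_BY_SOURCE = {
    "wikidata_p856": "TIER_2_VERIFIED",
    "web_search": "TIER_4_INFERRED",
}


def build_platform_record(url, source, hosted_on_parent_site=False):
    """Return a dict fragment describing one discovered URL.

    Hypothetical structure: 'platform_url' and 'provenance' are assumed keys.
    """
    platform_type = "DEDICATED_PAGE" if hosted_on_parent_site else "OFFICIAL_WEBSITE"
    return {
        "digital_platforms": [
            {"platform_url": url, "platform_type": platform_type},
        ],
        "provenance": {"data_tier": TIER_BY_SOURCE[source]},
    }


# Entry 0006 (Rijckheyt), whose URL came from Wikidata P856.
record = build_platform_record("http://www.rijckheyt.nl", "wikidata_p856")
```

Keeping the tier lookup in one table means an unknown source raises a `KeyError` instead of silently receiving a default tier.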
Generated by NDE URL Discovery workflow
Last updated: 2025-12-01T16:30:00+00:00
Session Updates - December 2025
2025-12-01: NAN ISIL Batch Corrections
Fixed several issues with entries 1502-1513 (NAN ISIL 2025-11-06 batch):
| Entry ID | Issue | Resolution |
|---|---|---|
| 1504 | Missing URL | Added https://papierknippen.nl/ (discovered via Exa) |
| 1508 | Wrong custodian_name ("Cookiesbeleid") | Corrected to "Parochiearchief Kampen" from NAN ISIL registry |
| 1508 | Wrong institution type (U) | Changed to A (Archive) |
| 1508 | Incorrect GHCID | Regenerated: NL-OV-KAM-A-PK |
| 1511 | Wrong custodian_name (exhibition title) | Already corrected to "Wereldmuseum Leiden" |
| 1511 | Wrong institution type (U) | Changed to M (Museum) |
| 1511 | GHCID based on exhibition title | Regenerated: NL-ZH-LEI-M-WL |
| 1512 | Wrong institution type (U) | Changed to H (Holy Sites - diocesan heritage commission) |
| 1512 | No website info | Added parent organization note (Bisdom Roermond) |
| 1512 | Incorrect GHCID | Regenerated: NL-LI-ROE-H-DCKK |
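The three regenerated GHCIDs above all read as five hyphen-separated parts: country, region, city code, institution type letter, and an abbreviation matching the initials of the corrected custodian name. The sketch below reproduces them under that reading. The actual GHCID generation rules are not stated in this report, so the initials rule and both function names are assumptions inferred from these three examples only.

```python
def name_initials(custodian_name):
    """Upper-cased first letter of each whitespace-separated word (assumed rule)."""
    return "".join(word[0].upper() for word in custodian_name.split())


def compose_ghcid(country, region, city_code, type_code, custodian_name):
    """Join the five parts with hyphens; the first four are taken as given."""
    return "-".join(
        [country, region, city_code, type_code, name_initials(custodian_name)]
    )


ghcid_1508 = compose_ghcid("NL", "OV", "KAM", "A", "Parochiearchief Kampen")
ghcid_1511 = compose_ghcid("NL", "ZH", "LEI", "M", "Wereldmuseum Leiden")
ghcid_1512 = compose_ghcid("NL", "LI", "ROE", "H", "Diocesane Commissie Kerkelijk Kunstbezit")
```

This also shows why a wrong `custodian_name` (such as "Cookiesbeleid" or an exhibition title) produced a wrong identifier: the last part is derived from the name, so the GHCID has to be regenerated after the name is corrected.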