NDE URL Discovery Report

Date: 2025-12-01
Dataset: NDE Enriched Entries
Total Entries: 1,674

Summary Statistics

| Category | Count | Percentage |
|---|---|---|
| With URL (any source) | 1,621 | 96.8% |
| Without URL | 53 | 3.2% |
| - With Wikidata ID | 31 | - |
| - Without Wikidata ID | 22 | - |
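
The percentages follow directly from the counts. A minimal Python sketch of the arithmetic; the function name `coverage` is illustrative and not part of the discovery workflow:

```python
def coverage(with_url: int, total: int) -> tuple[float, float]:
    """Return (with-URL %, without-URL %), each rounded to one decimal place."""
    with_pct = round(100 * with_url / total, 1)
    without_pct = round(100 * (total - with_url) / total, 1)
    return with_pct, without_pct


# Counts taken from the summary statistics: 1,621 of 1,674 entries have a URL.
print(coverage(1621, 1674))  # (96.8, 3.2)
```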

URLs Discovered via Wikidata P856

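Querying Wikidata property P856 (official website) resolved 16 entries: 15 with an official website URL, plus entry 0212, whose institution has merged and has no URL of its own: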

| Entry ID | Wikidata | Institution | URL |
|---|---|---|---|
| 0006 | Q81181279 | Rijckheyt | http://www.rijckheyt.nl |
| 0011 | Q110907483 | Stichting Nyen ä Wes van Nassau | https://www.nyenaenwasvannassau.nl/site/ |
| 0047 | Q110907518 | HKK Zuidkwartier | https://www.hkk-zuidkwartier.nl/ |
| 0059 | Q110907493 | Heemkunde Ravenstein | http://www.heemkunderavenstein.nl/ |
| 0107 | Q110907545 | Heemkunde Gemonde | http://www.heemkundegemonde.nl/ |
| 0135 | Q110995942 | Saet en Cruyt | https://www.saetencruyt.nl/ |
| 0211 | Q701 | Provincie Noord-Holland | https://www.noord-holland.nl/ |
| 0387 | Q2370540 | Collectie Six | http://nl.collectiesix.nl/ |
| 0507 | Q113119338 | Stichting Omroep Muziek | https://www.omroepmuziek.nl/ |
| 0700 | Q121225126 | Stadsarchief Oldenzaal | https://www.oldenzaal.nl/stadsarchief |
| 0839 | Q110279958 | Stichting Utrecht Altijd | https://www.utrechtaltijd.nl/ |
| 1022 | Q111509619 | Stichting Historische Behangsels | https://www.historischebehangsels.nl |
| 1195 | Q2334800 | Verzetsmuseum Zuid-Holland | http://www.verzetsmuseum-zh.nl |
| 1268 | Q2248381 | Spaarnestad Photo | http://www.spaarnestadphoto.nl |
| 1637 | Q59962362 | Bibliotheek De Groene Venen | https://www.bibliotheekdegroenevenen.nl |
| 0212 | Q1897962 | Geldmuseum | Merged into DNB "De Nieuwe Schatkamer" |
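
The report does not say how the P856 lookup was issued. As a rough sketch, assuming the public Wikidata Query Service (`https://query.wikidata.org/sparql`, where the `wd:` and `wdt:` prefixes are predefined), a batch lookup could build one SPARQL query and parse the standard SPARQL JSON result. The function names below are hypothetical, and the network call itself is left out:

```python
def build_p856_query(qids: list[str]) -> str:
    """Build a SPARQL query returning the P856 (official website) value per item."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?url WHERE { "
        f"VALUES ?item {{ {values} }} "
        "?item wdt:P856 ?url . }"
    )


def parse_p856_response(data: dict) -> dict[str, str]:
    """Map each Q-number to its first official-website URL in a SPARQL JSON result."""
    urls: dict[str, str] = {}
    for row in data["results"]["bindings"]:
        # Item URIs look like http://www.wikidata.org/entity/Q81181279
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        urls.setdefault(qid, row["url"]["value"])
    return urls


# Hand-written sample in the SPARQL JSON results shape (not a live response).
sample = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q81181279"},
                "url": {"type": "uri", "value": "http://www.rijckheyt.nl"},
            }
        ]
    }
}
print(parse_p856_response(sample))
```

Items without a P856 statement simply do not appear in the result, so any Q-number missing from the returned mapping is a candidate for web search.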

URLs Discovered via Web Search (Exa)

Found URLs for 10 additional entries:

| Entry ID | Institution | URL | Notes |
|---|---|---|---|
| 0199 | Bibliotheek TU Kampen | https://tuu.nl/bibliotheek/ | Merged with TU Utrecht |
| 0389 | Heemkunde Ambt-Delden | https://www.heemkundedelden.nl/ | |
| 0413 | Historische Vereniging Old Deep'n | https://www.olddeepn.nl/ | Diepenveen |
| 0594 | Sambeeks Heem | https://www.sambeeksheem.nl/ | |
| 0627 | Stichting Weeshuisjes | https://www.weeshuisjes.nl/ | |
| 0715 | HDC Protestants Erfgoed | https://www.hdcvu.nl/ | Within VU Library |
| 0729 | Historische Vereniging Staphorst | https://www.historischeverenigingstaphorst.nl/ | |
| 0851 | Historische Vereniging Den Dolder | https://www.historischeverenigingdendolder.nl/ | |
| 1170 | Nederlandse Vereniging voor Papierknipkunst | https://papierknippen.nl/ | Original NDE entry |
| 1504 | Nederlandse Vereniging voor Papierknipkunst | https://papierknippen.nl/ | Same org, from NAN ISIL 2025-11-06 |

Entries Without Dedicated Websites (Parent Organization Only)

| Entry ID | Institution | Parent Organization | Notes |
|---|---|---|---|
| 1512 | Diocesane Commissie Kerkelijk Kunstbezit | Bisdom Roermond (https://bisdom-roermond.org/) | Commission managing diocesan church art; operates under the diocese, no dedicated website |

Problematic Entries

NOT Heritage Custodians

| Entry ID | Wikidata | Name | Issue |
|---|---|---|---|
| 0874 | Q2789869 | HEEMAF | Was an electrical equipment manufacturer (1906-1973), NOT a heritage institution |

Recommendation: Remove from dataset or mark as status: NOT_HERITAGE

Closed/Defunct

| Entry ID | Wikidata | Name | Issue |
|---|---|---|---|
| 1130 | Q110282061 | Museum Oud Westdorpe | Permanently closed |

Recommendation: Mark as status: CLOSED

Duplicate Entries

| Entry IDs | Wikidata | Name | Issue |
|---|---|---|---|
| 0875, 0993 | Q110891769 | (Same institution) | Duplicate Wikidata ID |
| 0950, 0967 | Q110891782 (0967 only) | (Same institution) | Duplicate entries |

Recommendation: Merge records, keep one canonical entry
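
Duplicates of the first kind (same Wikidata ID) can be found mechanically. A minimal sketch, assuming entry files are named `<entry id>_<Q-number>.yaml` as in the flagged files listed under Actions Taken; the helper name is hypothetical. It does not catch pairs like 0950/0967, where only one file carries a Q-number and the match has to be made on the institution itself:

```python
import re
from collections import defaultdict

# Matches names such as "0875_Q110891769.yaml"; other names are ignored.
QID_IN_FILENAME = re.compile(r"^(\d{4})_(Q\d+)\.yaml$")


def find_duplicate_qids(filenames: list[str]) -> dict[str, list[str]]:
    """Group entry IDs by Wikidata ID and keep only IDs shared by several entries."""
    by_qid: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        match = QID_IN_FILENAME.match(name)
        if match:
            entry_id, qid = match.groups()
            by_qid[qid].append(entry_id)
    return {qid: sorted(ids) for qid, ids in by_qid.items() if len(ids) > 1}


files = [
    "0874_Q2789869.yaml",
    "0875_Q110891769.yaml",
    "0993_Q110891769.yaml",
    "0950_unknown.yaml",
]
print(find_duplicate_qids(files))  # {'Q110891769': ['0875', '0993']}
```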

Entries Still Without URLs (~16)

These entries have no discoverable website:

| Entry ID | Name | Notes |
|---|---|---|
| Various | Small heemkundige kringen | May only have Facebook presence |
| Various | Historical societies | May be defunct or volunteer-run |
| 0210 | Greccio Museum (Leiden) | Located inside Hartebrug church, no dedicated website |

Actions Taken

  1. Created docs/nde/ directory
  2. Created this URL Discovery Report
  3. Updated 25 YAML files with discovered URLs (15 Wikidata + 9 web search + 1 dedicated page)
  4. Flagged problematic entries:
    • 0874_Q2789869.yaml (HEEMAF): NOT_HERITAGE - was an electrical equipment manufacturer
    • 1130_Q110282061.yaml (Museum Oud Westdorpe): CLOSED - permanently closed
    • 0875_Q110891769.yaml & 0993_Q110891769.yaml: DUPLICATE - same Wikidata ID
    • 0950_unknown.yaml & 0967_Q110891782.yaml: DUPLICATE - same institution

Final URL Coverage

| Category | Count | Percentage |
|---|---|---|
| With URL | 1,619 | 96.7% |
| Without URL | 55 | 3.3% |
| Flagged (duplicates, not heritage, closed) | 5 | 0.3% |

Remaining Entries Without URLs (55)

These entries have no discoverable website. Many are:

  • Small heemkundige kringen (local history societies)
  • Entries with Wikidata IDs but no P856 property
  • Defunct or volunteer-run organizations
  • Facebook-only presence

Example entries still missing URLs:

  • 0001_Q2679819: Stichting Hunebedcentrum (has Wikidata, needs P856 check)
  • 0139_de_hollandse_cirkel: De Hollandse Cirkel (no Wikidata)
  • 0144_Q2710899: Nationaal Onderduik Museum (has Wikidata)
  • 0148_Q69725772: CollectieGelderland (has Wikidata)
  • Various small historical societies in Overijssel

Recommendations

  1. Re-query Wikidata for entries that have a Q-number but for which no P856 value was found
  2. Run a web search on the organization name for the remaining entries
  3. Check for dedicated pages on umbrella organization websites (like Museum Greccio on the Hartebrug church site)
  4. Mark entries with no web presence as url_status: NOT_FOUND
  5. Consider Facebook URLs as a fallback for small volunteer organizations
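
The first two recommendations amount to a simple triage on whether an entry has a Q-number. A minimal sketch, assuming entry keys of the form shown in the examples above (`0001_Q2679819`, `0139_de_hollandse_cirkel`); the function name and the returned action labels are hypothetical:

```python
import re


def next_action(entry_key: str) -> str:
    """Pick the next discovery step from the part of the key after the entry ID."""
    _, _, suffix = entry_key.partition("_")
    if re.fullmatch(r"Q\d+", suffix):
        # Entry has a Wikidata ID: check P856 again before anything else.
        return "requery_wikidata_p856"
    # No Wikidata ID: fall back to a web search on the organization name.
    return "web_search"


for key in ["0001_Q2679819", "0139_de_hollandse_cirkel"]:
    print(key, "->", next_action(key))
```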

Technical Notes

  • Added digital_platforms section with proper platform_type classification:
    • OFFICIAL_WEBSITE for standalone websites
    • DEDICATED_PAGE for pages on host websites (like Museum Greccio)
  • Preserved data_tier in provenance:
    • TIER_2_VERIFIED for Wikidata P856 URLs
    • TIER_4_INFERRED for web search discovered URLs
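
As an illustration of how these two classifications combine, the sketch below builds the fragment that would be written to an entry's YAML file. Only `digital_platforms`, `platform_type`, `provenance`, `data_tier` and the enum values come from this report; the nesting, the `platform_url` key, and the source labels are assumptions:

```python
# Assumed mapping from discovery source to data tier, per the notes above.
TIER_BY_SOURCE = {
    "wikidata_p856": "TIER_2_VERIFIED",
    "web_search": "TIER_4_INFERRED",
}


def platform_entry(url: str, source: str, dedicated_page: bool = False) -> dict:
    """Build a digital_platforms/provenance fragment for one discovered URL."""
    platform_type = "DEDICATED_PAGE" if dedicated_page else "OFFICIAL_WEBSITE"
    return {
        "digital_platforms": [
            {
                "platform_type": platform_type,
                "platform_url": url,  # hypothetical key name
            }
        ],
        "provenance": {"data_tier": TIER_BY_SOURCE[source]},
    }


print(platform_entry("http://www.rijckheyt.nl", "wikidata_p856"))
```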

Generated by NDE URL Discovery workflow
Last updated: 2025-12-01T16:30:00+00:00

Session Updates - December 2025

2025-12-01: NAN ISIL Batch Corrections

Fixed several issues with entries 1502-1513 (NAN ISIL 2025-11-06 batch):

| Entry ID | Issue | Resolution |
|---|---|---|
| 1504 | Missing URL | Added https://papierknippen.nl/ (discovered via Exa) |
| 1508 | Wrong custodian_name ("Cookiesbeleid") | Corrected to "Parochiearchief Kampen" from NAN ISIL registry |
| 1508 | Wrong institution type (U) | Changed to A (Archive) |
| 1508 | Incorrect GHCID | Regenerated: NL-OV-KAM-A-PK |
| 1511 | Wrong custodian_name (exhibition title) | Already corrected to "Wereldmuseum Leiden" |
| 1511 | Wrong institution type (U) | Changed to M (Museum) |
| 1511 | GHCID based on exhibition title | Regenerated: NL-ZH-LEI-M-WL |
| 1512 | Wrong institution type (U) | Changed to H (Holy Sites - diocesan heritage commission) |
| 1512 | No website info | Added parent organization note (Bisdom Roermond) |
| 1512 | Incorrect GHCID | Regenerated: NL-LI-ROE-H-DCKK |
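
The three regenerated identifiers share a visible pattern: country, province code, a three-letter city code, institution type, and the initials of the custodian name. The sketch below reproduces that pattern only. It is inferred from these three examples, and the actual GHCID rules (for instance collision handling or skipped words) are not described in this report:

```python
def ghcid(country: str, province: str, city: str, inst_type: str, name: str) -> str:
    """Compose an identifier in the pattern seen above (inferred, not authoritative)."""
    city_code = city[:3].upper()
    # Initials of each word in the custodian name, e.g. "Parochiearchief Kampen" -> "PK".
    abbreviation = "".join(word[0].upper() for word in name.split())
    return "-".join([country, province, city_code, inst_type, abbreviation])


print(ghcid("NL", "OV", "Kampen", "A", "Parochiearchief Kampen"))  # NL-OV-KAM-A-PK
print(ghcid("NL", "ZH", "Leiden", "M", "Wereldmuseum Leiden"))     # NL-ZH-LEI-M-WL
```

This makes clear why a wrong custodian_name or institution type also invalidates the GHCID: both feed directly into the identifier.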