Entity Duplicate Analysis Report
Generated: 2025-12-13 17:34:36
Data Quality Issues Summary
| Issue Type | Distinct Entities | Total Occurrences |
|---|---|---|
| Language Codes | 31 | 1098 |
| Generic Labels | 7 | 87 |
| Numeric Only | 235 | 366 |
| Single Char | 48 | 476 |
| Type Mismatches | 475 | 2652 |
| Variant Spellings | 0 | 0 |
| TOTAL | 796 | 4679 |
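The TOTAL row is a plain column-wise sum of the rows above it. The categories are not mutually exclusive: nl-NL, Home and '2025', for example, appear both in their own category and under Type Mismatches. The 4,679 figure therefore counts some occurrences more than once. A minimal check of the sums, using only the figures from this table (the helper name is illustrative):

```python
# Rows of the "Data Quality Issues Summary" table: (issue type, distinct entities, occurrences)
SUMMARY = [
    ("Language Codes", 31, 1098),
    ("Generic Labels", 7, 87),
    ("Numeric Only", 235, 366),
    ("Single Char", 48, 476),
    ("Type Mismatches", 475, 2652),
    ("Variant Spellings", 0, 0),
]


def summary_totals(rows):
    """Return (distinct entities, occurrences) summed over all issue types.

    Categories overlap, so the occurrence total is not a count of unique occurrences.
    """
    total_entities = sum(count for _, count, _ in rows)
    total_occurrences = sum(occ for _, _, occ in rows)
    return total_entities, total_occurrences


print(summary_totals(SUMMARY))  # (796, 4679)
```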
Language Code Entities (Should Be Filtered)
These are HTML lang attribute values, not real entities (20 of 31 listed):
| Entity | Occurrences | Action |
|---|---|---|
| nl-NL | 527 | Filter out |
| nl | 370 | Filter out |
| nl_NL | 86 | Filter out |
| EN | 36 | Filter out |
| en-GB | 20 | Filter out |
| en-US | 12 | Filter out |
| de | 10 | Filter out |
| en_US | 6 | Filter out |
| de-de | 4 | Filter out |
| NH | 4 | Filter out |
| UB | 2 | Filter out |
| fy | 2 | Filter out |
| fy-nl | 1 | Filter out |
| PI | 1 | Filter out |
| ym | 1 | Filter out |
| en_GB | 1 | Filter out |
| VU | 1 | Filter out |
| fr-FR | 1 | Filter out |
| MA | 1 | Filter out |
| US | 1 | Filter out |
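A minimal sketch of a filter for this category, assuming entity values arrive as plain strings. The regex, the KNOWN_LANGUAGES allow-list and both function names are hypothetical and are not taken from the pipeline that produced this report. Requiring a known primary language keeps the rule from discarding real two-letter abbreviations. As a consequence, entries such as NH, UB, PI, VU, MA and US would still need a separate rule. normalize_tag shows how the nl-NL / nl_NL / de-de spelling variants collapse to one key.

```python
import re

# Hypothetical shape check: a 2-3 letter language subtag, optionally followed by
# "-" or "_" and a 2-letter region subtag (nl, nl-NL, nl_NL, en-GB, de-de, fy-nl).
LANG_TAG = re.compile(r"[A-Za-z]{2,3}(?:[-_][A-Za-z]{2})?")

# Hypothetical allow-list of primary language codes seen in this report.
KNOWN_LANGUAGES = {"nl", "en", "de", "fy", "fr"}


def is_language_code(value: str) -> bool:
    """True if value has the shape of a language tag and a known primary language."""
    if not LANG_TAG.fullmatch(value):
        return False
    primary = re.split(r"[-_]", value)[0].lower()
    return primary in KNOWN_LANGUAGES


def normalize_tag(value: str) -> str:
    """Canonicalise case and separator: 'nl_NL' -> 'nl-NL', 'de-de' -> 'de-DE', 'EN' -> 'en'."""
    parts = re.split(r"[-_]", value)
    primary = parts[0].lower()
    return primary if len(parts) == 1 else f"{primary}-{parts[1].upper()}"


values = ["nl-NL", "nl_NL", "EN", "fy-nl", "NH", "Nederland"]
print([v for v in values if is_language_code(v)])  # ['nl-NL', 'nl_NL', 'EN', 'fy-nl']
```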
Generic Navigation Labels (Low Value)
These are website navigation labels, not heritage entities:
| Entity | Occurrences | Action |
|---|---|---|
| Home | 42 | Filter out |
| collectie | 22 | Filter out |
| Archief | 8 | Filter out |
| Contact | 5 | Filter out |
| Nieuws | 5 | Filter out |
| AGENDA | 4 | Filter out |
| collection | 1 | Filter out |
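Because the same label shows up in several casings (AGENDA, collectie/collection), a case-insensitive exact match against a stop-list is enough to drop these values. A minimal sketch, assuming the stop-list is seeded from the table above; NAVIGATION_LABELS and is_navigation_label are hypothetical names. Exact matching keeps compound names such as Gemeentearchief, which contain a listed label but are real entities.

```python
# Hypothetical stop-list seeded from the navigation labels above.
NAVIGATION_LABELS = {"home", "collectie", "collection", "archief", "contact", "nieuws", "agenda"}


def is_navigation_label(value: str) -> bool:
    """True if the trimmed, case-folded value is exactly a known navigation label."""
    return value.strip().casefold() in NAVIGATION_LABELS


entities = ["Home", "AGENDA", "Rijksmuseum", " Contact ", "Kasteel Doorwerth"]
kept = [e for e in entities if not is_navigation_label(e)]
print(kept)  # ['Rijksmuseum', 'Kasteel Doorwerth']
```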
Numeric-Only Entities (Often Dimensions)
These are often image dimensions or years without context (20 of 235 listed):
| Entity | Occurrences | Action |
|---|---|---|
| '2025' | 26 | Review context |
| '2560' | 10 | Review context |
| '1200' | 10 | Review context |
| '1920' | 10 | Review context |
| '2024' | 8 | Review context |
| '800' | 6 | Review context |
| '1280' | 6 | Review context |
| '670' | 5 | Review context |
| '700' | 4 | Review context |
| '2021' | 4 | Review context |
| '2023' | 4 | Review context |
| '20' | 3 | Review context |
| '1080' | 3 | Review context |
| '10' | 3 | Review context |
| '150' | 3 | Review context |
| '1995' | 3 | Review context |
| '630' | 3 | Review context |
| '100' | 3 | Review context |
| '2000' | 3 | Review context |
| '2048' | 3 | Review context |
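Values such as '1920', '2000' and '2048' are both plausible years and common pixel sizes, so the number alone cannot settle the case; the surrounding text has to. A minimal sketch of a context check, assuming the extractor can supply a snippet of text around each match. The cue regex, the 1500-2100 year window and classify_numeric are all hypothetical choices, not the report's rules. Anything the heuristic cannot place stays in "review", matching the action column above.

```python
import re

# Hypothetical cues suggesting an image size: width/height attributes, "px" units,
# or an "NNNNxNNNN" pattern in the text surrounding the number.
DIMENSION_CUES = re.compile(r"\b(?:width|height)\b|\d\s*px\b|\d\s*[x×]\s*\d", re.IGNORECASE)

# Hypothetical year window; 1200 and 1280 fall outside it on purpose.
YEAR_MIN, YEAR_MAX = 1500, 2100


def classify_numeric(value: str, context: str) -> str:
    """Label a numeric-only entity 'dimension', 'year' or 'review' from its context."""
    if DIMENSION_CUES.search(context):
        return "dimension"  # context takes precedence over the value's range
    if YEAR_MIN <= int(value) <= YEAR_MAX:
        return "year"
    return "review"


print(classify_numeric("1920", '<img width="1920" height="1080">'))  # dimension
print(classify_numeric("2025", "Jaarverslag 2025"))  # year
print(classify_numeric("670", ""))  # review
```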
Type Mismatches (Need Resolution)
The same entity string is classified with different types (50 of 475 listed):
| Entity | Types | Occurrences | Action |
|---|---|---|---|
| nl-NL | THG.LNG, TOP.CTY, TOP.LNG | 527 | Resolve type |
| nl | THG.LNG, TOP.CTY | 370 | Resolve type |
| Nederland | THG.LNG, TOP.CTY, TOP.NAT, TOP.REG, TOP.SET | 87 | Resolve type |
| nl_NL | THG.LNG, TOP.CTY | 86 | Resolve type |
| Home | APP.EXH, APP.TIT, APP.TTL, DOC.WEB, THG.CON, TOP.SET, WRK.TXT, WRK.WEB | 42 | Resolve type |
| '2025' | TMP.DAB, TMP.DRL | 26 | Resolve type |
| Netherlands | TOP.CTY, TOP.REG, TOP.SET | 24 | Resolve type |
| collectie | APP.COL, APP.EXH, THG.COL, THG.CON, WRK.COL, WRK.SER | 22 | Resolve type |
| Groningen | TOP.CTY, TOP.REG, TOP.SET | 21 | Resolve type |
| Den Haag | APP.PNM, TOP.SET | 19 | Resolve type |
| Onder de paramariboom | APP.TIT, WRK.TTL, WRK.TXT, WRK.VIS, WRK.WRK | 18 | Resolve type |
| bibliotheek | APP.COL, GRP.EDU, GRP.HER, THG.CON, TOP.BLD, WRK.COL, WRK.WEB | 16 | Resolve type |
| Zwolle | TOP.ADR, TOP.SET | 16 | Resolve type |
| Beeldbank | APP.COL, APP.EXH, WRK.COL, WRK.VIS, WRK.WEB | 15 | Resolve type |
| Heel Nederland Leest | APP.EXH, APP.TTL, GRP.HIS, THG.CON, THG.EVT, WRK.EXH, WRK.SER | 15 | Resolve type |
| Rotterdam | TOP.CTY, TOP.REG, TOP.SET | 14 | Resolve type |
| Nederlandse | THG.LNG, TOP.CTY | 12 | Resolve type |
| Utrecht | TOP.CTY, TOP.REG, TOP.SET | 12 | Resolve type |
|  | APP.NAM, APP.SOC, GRP.COR, GRP.GOV, THG.CON, WRK.SOC, WRK.WEB | 11 | Resolve type |
| archieven | APP.COL, THG.CON, WRK.ARC, WRK.COL | 11 | Resolve type |
| Gemeentearchief | APP.TIT, APP.TTL, GRP.HER, WRK.WEB | 11 | Resolve type |
| Overijssel | GRP.GOV, TOP.REG | 11 | Resolve type |
| Friesland | TOP.CTY, TOP.REG | 10 | Resolve type |
| '1200' | QTY.CNT, QTY.MSR | 10 | Resolve type |
| Twente | TOP.REG, TOP.SET | 10 | Resolve type |
| Leiden | TOP.REG, TOP.SET | 10 | Resolve type |
| '1920' | QTY.MSR, TMP.CEN, TMP.DAB | 10 | Resolve type |
| de | THG.LNG, TOP.CTY | 10 | Resolve type |
| Noord-Holland | TOP.CTY, TOP.REG | 9 | Resolve type |
| Tweede Wereldoorlog | THG.CON, THG.EVT, TMP.ERA | 9 | Resolve type |
| Limburg | TOP.CTY, TOP.REG, TOP.SET | 8 | Resolve type |
| Zuid-Holland | GRP.GOV, TOP.REG | 8 | Resolve type |
| Brabant | TOP.CTY, TOP.REG | 8 | Resolve type |
| Archief | APP.COL, APP.TIT, GRP.UNT, THG.CON, WRK.ARC, WRK.COL, WRK.WEB | 8 | Resolve type |
|  | GRP.COR, GRP.GOV | 7 | Resolve type |
| Ministerie van Algemene Zaken | GRP.GOV, GRP.PAR | 7 | Resolve type |
| foto's | THG.CON, THG.DOC, THG.PHO | 7 | Resolve type |
| Kasteel Doorwerth | APP.TTL, GRP.HER, THG.AFT, TOP.BLD | 7 | Resolve type |
| Boeken | THG.CON, THG.DOC, WRK.COL, WRK.PUB, WRK.TXT | 7 | Resolve type |
| Delft | TOP.REG, TOP.SET | 7 | Resolve type |
| Gelderland | GRP.GOV, TOP.REG, TOP.SET | 7 | Resolve type |
|  | APP.NAM, APP.SOC, THG.CON, WRK.SOC, WRK.WEB | 6 | Resolve type |
| '800' | QTY.CNT, QTY.MSR | 6 | Resolve type |
| Haarlem | TOP.REG, TOP.SET | 6 | Resolve type |
| museum | GRP.HER, ROL.OCC, THG.CON, THG.EVT | 6 | Resolve type |
| Tijdschriften | APP.TTL, THG.CON, THG.DOC, WRK.PUB, WRK.SER | 6 | Resolve type |
| Archeologie | GRP.HER, GRP.UNT, THG.CON | 6 | Resolve type |
| Provincie Overijssel | APP.TIT, GRP.GOV, TOP.REG | 6 | Resolve type |
| '1280' | QTY.CNT, QTY.MSR | 6 | Resolve type |
| november 2025 | TMP.DAB, TMP.DRL, TMP.DUR, TMP.EXP, TMP.RNG | 6 | Resolve type |
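One simple way to resolve a mismatch is a majority vote over the type assigned at each occurrence, with ties left for a human. This is a swapped-in illustration, not necessarily how the report's pipeline resolves types. The report gives only the set of types and the total occurrences per entity. The per-type splits below are therefore hypothetical, chosen only so that each total matches the table. Entries already flagged as language codes or navigation labels (nl-NL, Home) would presumably be filtered first rather than resolved.

```python
from collections import Counter
from typing import List, Optional


def resolve_type(type_labels: List[str]) -> Optional[str]:
    """Majority vote over per-occurrence type labels; None for an empty list or a tie."""
    ranked = Counter(type_labels).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tied top types: leave for manual review
    return ranked[0][0]


# Hypothetical per-occurrence labels (totals match the table: 16, 19, 10).
labels = {
    "Zwolle": ["TOP.SET"] * 14 + ["TOP.ADR"] * 2,
    "Den Haag": ["TOP.SET"] * 17 + ["APP.PNM"] * 2,
    "Twente": ["TOP.REG"] * 5 + ["TOP.SET"] * 5,
}
print({entity: resolve_type(types) for entity, types in labels.items()})
# {'Zwolle': 'TOP.SET', 'Den Haag': 'TOP.SET', 'Twente': None}
```

A tie such as the one for Twente (region versus settlement) is a case where the vote should defer to a reviewer rather than guess.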
Cleanup Impact Analysis
| Metric | Value |
|---|---|
| Total entity occurrences | 13,268 |
| Candidate cleanup occurrences | 1,551 |
| Cleanup percentage | 11.7% |
| Remaining after cleanup | 11,717 |
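The candidate figure of 1,551 equals the sum of the Language Codes (1,098), Generic Labels (87) and Numeric Only (366) occurrences from the summary table. Single-character entities (476) and type mismatches therefore appear not to be counted as cleanup candidates. Under that reading, the other three rows follow directly. cleanup_impact is a hypothetical helper:

```python
from typing import Dict, Tuple


def cleanup_impact(total: int, removable: Dict[str, int]) -> Tuple[int, float, int]:
    """Return (candidate occurrences, cleanup percentage to one decimal, remaining occurrences)."""
    candidates = sum(removable.values())
    return candidates, round(100 * candidates / total, 1), total - candidates


impact = cleanup_impact(13268, {"Language Codes": 1098, "Generic Labels": 87, "Numeric Only": 366})
print(impact)  # (1551, 11.7, 11717)
```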
High-Value Entities for Linking
Entities that appear frequently and are good candidates for Wikidata/VIAF linking:
| Entity | Type | Occurrences | Wikidata? |
|---|---|---|---|
| Nederland | TOP.CTY | 87 | 🔍 Search |
| Amsterdam | TOP.SET | 42 | 🔍 Search |
| Netherlands | TOP.CTY | 24 | 🔍 Search |
| Groningen | TOP.SET | 21 | 🔍 Search |
| Den Haag | TOP.SET | 19 | 🔍 Search |
| bibliotheek | GRP.HER | 16 | 🔍 Search |
| Zwolle | TOP.SET | 16 | 🔍 Search |
| Rotterdam | TOP.SET | 14 | 🔍 Search |
| Johan Fretz | AGT.PER | 13 | 🔍 Search |
| Nederlandse | TOP.CTY | 12 | 🔍 Search |
| Utrecht | TOP.SET | 12 | 🔍 Search |
|  | WRK.SOC | 11 | 🔍 Search |
| Roermond | TOP.SET | 11 | 🔍 Search |
| Gemeentearchief | GRP.HER | 11 | 🔍 Search |
| Overijssel | TOP.REG | 11 | 🔍 Search |
| Deventer | TOP.SET | 11 | 🔍 Search |
| Friesland | TOP.REG | 10 | 🔍 Search |
| Twente | TOP.REG | 10 | 🔍 Search |
| Leiden | TOP.SET | 10 | 🔍 Search |
| Noord-Holland | TOP.REG | 9 | 🔍 Search |
| Geldersch Landschap en Kasteelen | GRP.HER | 9 | 🔍 Search |
| Nijmegen | TOP.SET | 8 | 🔍 Search |
| Limburg | TOP.REG | 8 | 🔍 Search |
| Enschede | TOP.SET | 8 | 🔍 Search |
| Zuid-Holland | TOP.REG | 8 | 🔍 Search |
| Brabant | TOP.REG | 8 | 🔍 Search |
|  | GRP.COR | 7 | 🔍 Search |
| Heerlen | TOP.SET | 7 | 🔍 Search |
| Maastricht | TOP.SET | 7 | 🔍 Search |
| Ministerie van Algemene Zaken | GRP.GOV | 7 | 🔍 Search |
| Kasteel Doorwerth | GRP.HER | 7 | 🔍 Search |
| Delft | TOP.SET | 7 | 🔍 Search |
| Gelderland | TOP.REG | 7 | 🔍 Search |
| Schiedam | TOP.SET | 6 | 🔍 Search |
| Tilburg | TOP.SET | 6 | 🔍 Search |
| Amelander Musea | GRP.HER | 6 | 🔍 Search |
| Alkmaar | TOP.SET | 6 | 🔍 Search |
| Franeker | TOP.SET | 6 | 🔍 Search |
| Ommen | TOP.SET | 6 | 🔍 Search |
| Zutphen | TOP.SET | 6 | 🔍 Search |
| Haarlem | TOP.SET | 6 | 🔍 Search |
| museum | GRP.HER | 6 | 🔍 Search |
| Maassluis | TOP.SET | 6 | 🔍 Search |
| Dordrecht | TOP.SET | 6 | 🔍 Search |
| Arnhem | TOP.SET | 6 | 🔍 Search |
| Government of the Netherlands | GRP.GOV | 6 | 🔍 Search |
| Leeuwarden | TOP.SET | 6 | 🔍 Search |
| Apeldoorn | TOP.SET | 6 | 🔍 Search |
| Archeologie | THG.CON | 6 | 🔍 Search |
| Provincie Overijssel | GRP.GOV | 6 | 🔍 Search |
| Rijksmuseum | GRP.HER | 6 | 🔍 Search |
| Zeeland | TOP.SET | 6 | 🔍 Search |
| Kampen | TOP.SET | 5 | 🔍 Search |
| Breda | TOP.SET | 5 | 🔍 Search |
| Hengelo | TOP.SET | 5 | 🔍 Search |
| Midden-Groningen | TOP.REG | 5 | 🔍 Search |
| The Netherlands | TOP.CTY | 5 | 🔍 Search |
| Erfgoed 's-Hertogenbosch | GRP.HER | 5 | 🔍 Search |
| Amersfoort | TOP.SET | 5 | 🔍 Search |
| Eindhoven | TOP.SET | 5 | 🔍 Search |
| Hoorn | TOP.SET | 5 | 🔍 Search |
| Dalfsen | TOP.SET | 5 | 🔍 Search |
| Drenthe | TOP.REG | 5 | 🔍 Search |
| Oirschot | TOP.SET | 5 | 🔍 Search |
| Bollenstreek | TOP.REG | 5 | 🔍 Search |
| Medemblik | TOP.SET | 5 | 🔍 Search |
| Wageningen | TOP.SET | 5 | 🔍 Search |
| Rijksuniversiteit Groningen | GRP.EDU | 5 | 🔍 Search |
| Texel | TOP.SET | 5 | 🔍 Search |
| Collectie Overijssel | GRP.HER | 5 | 🔍 Search |
| Flevoland | TOP.REG | 5 | 🔍 Search |
| Venlo | TOP.SET | 5 | 🔍 Search |
|  | APP.SOC | 4 | 🔍 Search |
| Gemeente Waadhoeke | GRP.GOV | 4 | 🔍 Search |
| St.-Annaparochie | TOP.SET | 4 | 🔍 Search |
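The Wikidata? column indicates that each of these entities still needs a lookup. A minimal sketch of building such a lookup with the public Wikidata API's wbsearchentities action. The helper name, the choice of Dutch (language=nl) as the search language and the limit of 5 candidates are assumptions. The function only builds the request URL and sends nothing. Several labels are ambiguous (Groningen is both a city and a province), so the assigned type would still be needed to choose among the returned candidates. VIAF lookups use a separate service and are not shown.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikidata_search_url(label: str, language: str = "nl", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for an entity label (no request is sent)."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,  # assumption: labels in this report are mostly Dutch
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


print(wikidata_search_url("Kasteel Doorwerth"))
# https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Kasteel+Doorwerth&language=nl&type=item&limit=5&format=json
```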