Duplicate Q-Numbers in Curated Vocabulary
Date: 2025-11-16
Source: data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml
Summary
- Total duplicate mappings: 18
- Purpose: Exclude alternative Wikidata IDs for entities already curated
- Reason for duplicates: Wikidata merge operations, initial misidentification, or alternative external database IDs
Complete List of Duplicate Mappings
| Primary Label | Duplicate Q-number | Context |
|---|---|---|
| Q474 | Q111437426 | |
| Q9259 | Q31838911 | Heritage site |
| Q59772 | Q107417094 | |
| Q167346 | Q3162438 | Botanical garden |
| Q275901 | Q115620712 | |
| Q4344572 | Q116144938 | |
| Q5403345 | Q100994575 | |
| Q7839970 | Q1257025 | Garden |
| Q7888843 | Q20076926 | |
| Q9480202 | Q21164403 | Safari park |
| Q11609363 | Q383092 | |
| Q12373252 | Q25516833 | Estonia protected area |
| Q25105971 | Q135941649 | |
| Q27032347 | Q116144936 | |
| Q27670147 | Q113550465 | |
| Q35800652 | Q3457217 | USA protected area |
| Q113583759 | Q134886297 | |
| Q124022770 | Q108640285 | |
Example Entry from YAML
- label: Q9259
  hypernym:
    - heritage site
  type:
    - B
    - F
    - L
    - A
    - M
  duplicate:
    - Q31838911  # Alternative Wikidata ID for same entity
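As a rough sketch of how such entries might be consumed, the snippet below flattens already-parsed entries into (primary label, duplicate Q-number) pairs. It assumes the YAML has been loaded into a list of dicts shaped like the entry above. `duplicate_pairs` is a hypothetical helper name for illustration, not a function from the project's scripts.

```python
from typing import Iterable


def duplicate_pairs(entries: Iterable[dict]) -> list[tuple[str, str]]:
    """Return (primary label, duplicate Q-number) pairs from parsed entries.

    Assumes each entry is a dict with a "label" key and an optional
    "duplicate" list, mirroring the YAML example above.
    """
    pairs = []
    for entry in entries:
        # Entries with no "duplicate" key (or an empty one) contribute nothing.
        for dup in entry.get("duplicate") or []:
            pairs.append((entry["label"], dup))
    return pairs


# The example entry as it would look after parsing, plus one table row
# trimmed to the two fields this helper actually reads.
entries = [
    {
        "label": "Q9259",
        "hypernym": ["heritage site"],
        "type": ["B", "F", "L", "A", "M"],
        "duplicate": ["Q31838911"],
    },
    {"label": "Q474", "duplicate": ["Q111437426"]},
]

pairs = duplicate_pairs(entries)
print(len(pairs), pairs)
```

Running the same helper over the full curated file would give one pair per table row, which is one way to recompute the total in the summary.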
Impact on Query Generation
When generating SPARQL queries, both the primary label and duplicate Q-numbers are excluded:
FILTER(?hyponym NOT IN (
  wd:Q9259,      # Primary ID
  wd:Q31838911,  # Duplicate ID
  ...
))
This ensures that if Wikidata search discovers the duplicate ID, it won't be returned as a "new" hyponym since we've already curated the entity under its primary ID.
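The exclusion logic described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in the generation script: `build_exclusion_filter` is a hypothetical name, the input is assumed to be a dict mapping each primary label to its list of duplicates, and the sketch emits a single FILTER rather than splitting the list into chunks.

```python
def build_exclusion_filter(
    duplicates: dict[str, list[str]], variable: str = "?hyponym"
) -> str:
    """Build a SPARQL FILTER excluding primary and duplicate Q-numbers.

    Hypothetical helper: `duplicates` maps a primary label to its list of
    duplicate Q-numbers (the list may be empty).
    """
    qids = []
    for primary, dups in duplicates.items():
        qids.append(primary)
        qids.extend(dups)
    # dict.fromkeys removes repeats while keeping first-seen order.
    unique = list(dict.fromkeys(qids))
    terms = ",\n".join(f"  wd:{qid}" for qid in unique)
    return f"FILTER({variable} NOT IN (\n{terms}\n))"


print(build_exclusion_filter({"Q9259": ["Q31838911"]}))
```

For the example entry this prints a FILTER containing `wd:Q9259` and `wd:Q31838911`, matching the shape of the fragment above.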
Verification in Generated Query
File: botanical_query_updated_20251116T093744.sparql
All 18 duplicate Q-numbers are present in FILTER statements. The following command spot-checks four of them:
grep -o "wd:Q31838911\|wd:Q1257025\|wd:Q21164403\|wd:Q3457217" \
data/wikidata/GLAMORCUBEPSXHFN/B/queries/botanical_query_updated_20251116T093744.sparql
Expected output:
wd:Q21164403
wd:Q1257025
wd:Q3457217
wd:Q21164403 # Appears twice (in different FILTER chunks)
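To check every duplicate rather than a handful, a small Python helper along the following lines could report which Q-numbers are absent from the query text. This is a sketch: `missing_qids` is a hypothetical name, and the query is passed in as a string, so reading the `.sparql` file is left to the caller.

```python
import re


def missing_qids(query_text: str, qids: list[str]) -> list[str]:
    """Return the Q-numbers from `qids` that never appear as wd:Q... terms."""
    # \d+ consumes the whole number, so Q1257025 is not treated as present
    # just because a longer ID such as Q12570251 occurs in the query.
    present = set(re.findall(r"wd:(Q\d+)", query_text))
    return [qid for qid in qids if qid not in present]


# Illustrative query fragment, not the generated file itself.
query_text = "FILTER(?hyponym NOT IN (\n  wd:Q9259,\n  wd:Q31838911\n))"
print(missing_qids(query_text, ["Q31838911", "Q1257025"]))
```

An empty result would mean every listed Q-number occurs at least once in the query.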
Recommendations
- Keep duplicate mappings: These are valuable for preventing rediscovery
- Document reasons: When adding new duplicates, add comments explaining why
- Verify Wikidata: Check whether Q-numbers have been merged in Wikidata (use a redirect checker)
- Update regularly: As Wikidata evolves, more duplicates may emerge
Related Documentation
- Query Generation: data/wikidata/GLAMORCUBEPSXHFN/B/README_QUERY_GENERATION.md
- Script: scripts/generate_botanical_query_with_exclusions.py
- Session Notes: docs/sessions/SESSION_SUMMARY_20251116_B_CLASS_QUERY_AUTOMATION.md