Duplicate Q-Numbers in Curated Vocabulary

Date: 2025-11-16
Source: data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml

Summary

  • Total duplicate mappings: 18
  • Purpose: Exclude alternative Wikidata IDs for entities already curated
  • Reason for duplicates: Wikidata merge operations, initial misidentification, or alternative external database IDs

Complete List of Duplicate Mappings

Primary Label   Duplicate Q-number   Context
-------------   ------------------   ----------------------
Q474            Q111437426
Q9259           Q31838911            Heritage site
Q59772          Q107417094
Q167346         Q3162438             Botanical garden
Q275901         Q115620712
Q4344572        Q116144938
Q5403345        Q100994575
Q7839970        Q1257025             Garden
Q7888843        Q20076926
Q9480202        Q21164403            Safari park
Q11609363       Q383092
Q12373252       Q25516833            Estonia protected area
Q25105971       Q135941649
Q27032347       Q116144936
Q27670147       Q113550465
Q35800652       Q3457217             USA protected area
Q113583759      Q134886297
Q124022770      Q108640285

Example Entry from YAML

- label: Q9259
  hypernym:
    - heritage site
  type:
    - B
    - F
    - L
    - A
    - M
  duplicate:
    - Q31838911  # Alternative Wikidata ID for same entity
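
As a minimal sketch of how entries like the one above might be turned into an exclusion list, the following Python assumes the YAML has already been parsed into a list of dictionaries (for example by a YAML loader). The function name `collect_exclusions` and the sample `entries` list are hypothetical and are not taken from the project's script.

```python
from typing import Iterable


def collect_exclusions(entries: Iterable[dict]) -> list[str]:
    """Return primary and duplicate Q-numbers, in order, without repeats."""
    seen: set[str] = set()
    ordered: list[str] = []
    for entry in entries:
        # "duplicate" is optional; entries without it contribute only the label.
        for qid in [entry["label"], *entry.get("duplicate", [])]:
            if qid not in seen:
                seen.add(qid)
                ordered.append(qid)
    return ordered


# Hypothetical, already-parsed entries using Q-numbers from the table above.
entries = [
    {"label": "Q9259", "hypernym": ["heritage site"], "duplicate": ["Q31838911"]},
    {"label": "Q474", "duplicate": ["Q111437426"]},
]
print(collect_exclusions(entries))
# ['Q9259', 'Q31838911', 'Q474', 'Q111437426']
```

Keeping the primary ID and its duplicates adjacent in the result makes the generated exclusion list easier to audit against the table.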

Impact on Query Generation

When generating SPARQL queries, both the primary label and duplicate Q-numbers are excluded:

FILTER(?hyponym NOT IN (
  wd:Q9259,      # Primary ID
  wd:Q31838911,  # Duplicate ID
  ...
))

This ensures that if Wikidata search discovers the duplicate ID, it won't be returned as a "new" hyponym since we've already curated the entity under its primary ID.
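
The exclusion step could be sketched roughly as follows. This is an illustration only, not the project's generator: the function name `build_filters`, the `chunk_size` parameter, and its default of 50 are assumptions. Splitting the list across several FILTER clauses is suggested only by the mention of "FILTER chunks" in the verification section.

```python
def build_filters(qids: list[str], chunk_size: int = 50, var: str = "?hyponym") -> str:
    """Render one FILTER(... NOT IN (...)) clause per chunk of Q-numbers."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    clauses = []
    for start in range(0, len(qids), chunk_size):
        chunk = qids[start:start + chunk_size]
        # Prefix each Q-number with the wd: namespace used in the query.
        values = ", ".join(f"wd:{qid}" for qid in chunk)
        clauses.append(f"FILTER({var} NOT IN ({values}))")
    return "\n".join(clauses)


print(build_filters(["Q9259", "Q31838911", "Q167346"], chunk_size=2))
# FILTER(?hyponym NOT IN (wd:Q9259, wd:Q31838911))
# FILTER(?hyponym NOT IN (wd:Q167346))
```

Because consecutive FILTER clauses in the same group are combined conjunctively, several small `NOT IN` lists exclude the same set as one large list.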

Verification in Generated Query

File: botanical_query_updated_20251116T093744.sparql

All 18 duplicate Q-numbers are present in FILTER statements. You can spot-check four of them with:

grep -o "wd:Q31838911\|wd:Q1257025\|wd:Q21164403\|wd:Q3457217" \
  data/wikidata/GLAMORCUBEPSXHFN/B/queries/botanical_query_updated_20251116T093744.sparql

Expected output (the trailing comment is an annotation, not part of grep's output):

wd:Q21164403
wd:Q1257025
wd:Q3457217
wd:Q21164403  # Appears twice (in different FILTER chunks)
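
The grep command checks only four IDs. To check a full list, a small Python helper along these lines might be used. It is a sketch under stated assumptions: `missing_qids` is a hypothetical name, and `sample_query` is made-up text standing in for the contents of the generated .sparql file.

```python
import re


def missing_qids(query_text: str, expected: list[str]) -> list[str]:
    """Return the expected Q-numbers that never occur as wd:Q... in the query."""
    # Capture whole Q-numbers so that Q1257 is not mistaken for Q1257025,
    # which a plain substring search would do.
    found = set(re.findall(r"wd:(Q\d+)", query_text))
    return [qid for qid in expected if qid not in found]


# Hypothetical query fragment; in practice, read the generated query file.
sample_query = """
FILTER(?hyponym NOT IN (wd:Q9259, wd:Q31838911))
FILTER(?hyponym NOT IN (wd:Q7839970, wd:Q1257025))
"""
print(missing_qids(sample_query, ["Q31838911", "Q1257025", "Q21164403"]))
# ['Q21164403']
```

An empty result means every expected Q-number appears at least once in the query text.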

Recommendations

  1. Keep duplicate mappings: These are valuable for preventing rediscovery
  2. Document reasons: When adding new duplicates, add comments explaining why
  3. Verify Wikidata: Check whether Q-numbers have been merged in Wikidata (use a redirect checker)
  4. Update regularly: As Wikidata evolves, more duplicates may emerge

Related Files

  • Query Generation: data/wikidata/GLAMORCUBEPSXHFN/B/README_QUERY_GENERATION.md
  • Script: scripts/generate_botanical_query_with_exclusions.py
  • Session Notes: docs/sessions/SESSION_SUMMARY_20251116_B_CLASS_QUERY_AUTOMATION.md