Enhanced G-class SPARQL Query v2.0
Date: 2025-11-16
Purpose: Discover hidden G-class (Gallery) heritage custodians using Q118554787 and curated hypernyms
Improvement: Uses the broadest hypernym (Q118554787 "gallery"), which had been identified but was not yet included in our dataset
Key Innovation: Q118554787
Q118554787 ("gallery" - collection of physical or digital images intended to be publicly visible) is the broadest hypernym within the G-class taxonomy but was not used in previous queries. This entity should capture many gallery types that were missed.
Query Strategy
1. Core Hypernyms (New)
- Q118554787 - gallery (broadest, not yet exploited)
- Q207694 - art gallery (instance level)
- Q18761864 - exhibition space
2. Curated G-class Hypernyms (From Existing Data)
Pure G-class entries already in our data that can help find more:
- Q2190251 - arts center
- Q98818526 - art gallery
- Q20897549 - art institution
- Q3844310 - national gallery
- Q125501487 - map gallery
- Q127346204 - design gallery
- Q109038036 - Galeries Fnac
- Q29380643 - cast collection
- Q114023739, Q3768550, Q17111940, Q56317084 - art institution, gallery
- Q1400264, Q11900212 - art institution
3. Mixed Type G+M Hypernyms
- Q1030034 - GLAM
- Q3196771 - art museum
- Q1475403 - kunsthalle
- Q740437 - pinacotheca
- Q1747681 - artist museum
- Q135926044 - phototheque
- Q1759852 - sculpture garden
- Q15090615 - arts venue
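The hypernym lists above translate mechanically into SPARQL VALUES clauses. The following is a minimal sketch of that translation, assuming the lists are kept as plain Python lists; the helper names `values_clause` and `subclass_branch` are hypothetical and not part of the project.

```python
import re

# Q-numbers are "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"^Q[1-9]\d*$")


def values_clause(variable: str, qids: list[str]) -> str:
    """Render a SPARQL VALUES clause binding ?variable to each Q-number."""
    bad = [q for q in qids if not QID_PATTERN.match(q)]
    if bad:
        raise ValueError(f"not valid Q-numbers: {bad}")
    # dict.fromkeys drops duplicates while preserving the original order.
    terms = " ".join(f"wd:{q}" for q in dict.fromkeys(qids))
    return f"VALUES ?{variable} {{ {terms} }}"


def subclass_branch(variable: str, qids: list[str]) -> str:
    """Render one UNION branch matching transitive subclasses (P279+)."""
    return f"{{ {values_clause(variable, qids)} ?item wdt:P279+ ?{variable} . }}"


# Mixed G+M hypernyms, copied from the list above.
MIXED_GM = [
    "Q1030034", "Q3196771", "Q1475403", "Q740437",
    "Q1747681", "Q135926044", "Q1759852", "Q15090615",
]
print(subclass_branch("gm_hypernym", MIXED_GM))
```

Generating the clause from a list keeps the query and the curated data in step when hypernyms are added or removed, at the cost of losing the inline label comments.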
Full SPARQL Query
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?item ?itemLabel ?itemDescription WHERE {
  {
    # Strategy 1: Q118554787 (gallery) - Broadest hypernym
    ?item wdt:P279+ wd:Q118554787 .
  } UNION {
    # Strategy 2: Direct instances of Q118554787
    ?item wdt:P31 wd:Q118554787 .
  } UNION {
    # Strategy 3: Pure G-class hypernyms from curated data
    VALUES ?g_hypernym {
      wd:Q2190251    # arts center
      wd:Q98818526   # art gallery
      wd:Q20897549   # art institution
      wd:Q3844310    # national gallery
      wd:Q125501487  # map gallery
      wd:Q127346204  # design gallery
      wd:Q109038036  # Galeries Fnac
      wd:Q29380643   # cast collection
      wd:Q114023739  # art institution, gallery
      wd:Q1400264    # art institution
      wd:Q3768550    # art institution, gallery
      wd:Q17111940   # art institution, gallery
      wd:Q11900212   # art institution
      wd:Q56317084   # art institution, gallery
    }
    ?item wdt:P279+ ?g_hypernym .
  } UNION {
    # Strategy 4: Mixed G+M hypernyms
    VALUES ?gm_hypernym {
      wd:Q1030034    # GLAM
      wd:Q3196771    # art museum
      wd:Q1475403    # kunsthalle
      wd:Q740437     # pinacotheca
      wd:Q1747681    # artist museum
      wd:Q135926044  # phototheque
      wd:Q1759852    # sculpture garden
      wd:Q15090615   # arts venue
    }
    ?item wdt:P279+ ?gm_hypernym .
  } UNION {
    # Strategy 5: Q207694 (art gallery) instances
    ?item wdt:P31 wd:Q207694 .
  } UNION {
    # Strategy 6: Q18761864 (exhibition space)
    ?item wdt:P31/wdt:P279* wd:Q18761864 .
  } UNION {
    # Strategy 7: Cast collections
    ?item wdt:P31/wdt:P279* wd:Q29380643 .
  }

  # Quality filters
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167410 . }   # disambiguation
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q13406463 . }  # list article
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167836 . }   # category
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q5398426 . }   # TV series
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q11424 . }     # film

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,fr,de,es,nl,it,pt" .
  }
}
LIMIT 2000
Expected Results
Based on preliminary counts:
- Q118554787 alone: ~26 entities
- Curated G-class hypernyms: ~50-100 additional entities
- Mixed type hypernyms: ~30-50 entities
- Exhibition spaces: ~20-30 entities
Total estimated: 150-250 new G-class entities
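As a quick sanity check, the itemised ranges above can be summed directly. This is only arithmetic on the figures already listed; the dictionary keys are shortened labels for those list items.

```python
# (low, high) estimates copied from the list above.
estimates = {
    "Q118554787 alone": (26, 26),
    "Curated G-class hypernyms": (50, 100),
    "Mixed type hypernyms": (30, 50),
    "Exhibition spaces": (20, 30),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())
print(f"Itemised estimates sum to {low}-{high}")
# prints "Itemised estimates sum to 126-206"
```

The itemised sum (126-206) sits below the quoted total of 150-250. The gap presumably comes from strategies that are not itemised, such as Q207694 instances and cast collections, but that is an assumption rather than something the preliminary counts state.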
Post-Processing Required
After running the query:
- Deduplicate against existing hyponyms_curated.yaml (1,896 Q-numbers)
- Manual curation to assign type codes:
  - Pure G: galleries, arts centers, exhibition spaces
  - G+M: kunsthalles, art museums with gallery functions
  - G+L: galleries with library collections
  - G+A: galleries with archival materials
- Add metadata:
  - country: Geographic location
  - hypernym: Parent class(es) from query
  - notes: Special characteristics
Usage
Step 1: Run Query
Copy the query into the Wikidata Query Service (https://query.wikidata.org/) and run it
Step 2: Export Results
Download as JSON: enhanced_g_query_results_v2.json
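Steps 1 and 2 can also be scripted against the public SPARQL endpoint instead of using the web interface. The following is a minimal sketch using only the standard library; the query file name `enhanced_g_query_v2.rq`, the User-Agent string, and the helper names are assumptions for illustration, not part of the project.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks automated clients to identify themselves.
            "User-Agent": user_agent,
        },
    )


def export_results(query_path: str, out_path: str, user_agent: str) -> int:
    """Run the query stored in query_path and save the JSON to out_path.

    Returns the number of result bindings written.
    """
    with open(query_path, encoding="utf-8") as f:
        query = f.read()
    request = build_request(query, user_agent)
    with urllib.request.urlopen(request, timeout=120) as response:
        results = json.load(response)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
    return len(results["results"]["bindings"])


# Example call (hypothetical file name and contact address):
# export_results(
#     "enhanced_g_query_v2.rq",
#     "enhanced_g_query_results_v2.json",
#     "g-class-discovery/0.1 (contact: you@example.org)",
# )
```

The saved file has the same `results.bindings` layout as the JSON download from the web interface, so the deduplication step can read it unchanged.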
Step 3: Deduplicate
import json

import yaml

# Load existing curated entries
with open('data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml', 'r', encoding='utf-8') as f:
    existing = yaml.safe_load(f)

# Collect every Q-number already present in any section
existing_qids = set()
for section in ['hypernym', 'entity', 'entity_list', 'standard', 'collection', 'exclude']:
    # "or []" tolerates sections that are missing or present but empty
    for item in existing.get(section) or []:
        if isinstance(item, dict):
            label = item.get('label')
            if isinstance(label, str) and label.startswith('Q'):
                existing_qids.add(label)

# Load new results
with open('enhanced_g_query_results_v2.json', 'r', encoding='utf-8') as f:
    results = json.load(f)

# Keep only entities whose Q-number is not yet curated
new_qids = []
for binding in results['results']['bindings']:
    # The item value is an entity URI; the Q-number is its last path segment
    qid = binding['item']['value'].split('/')[-1]
    if qid not in existing_qids:
        new_qids.append({
            'qid': qid,
            'label': binding.get('itemLabel', {}).get('value', ''),
            'description': binding.get('itemDescription', {}).get('value', '')
        })

print(f"Found {len(new_qids)} new G-class entities")
Step 4: Curate
Review each new entity and add to hyponyms_curated.yaml:
hypernym:
  - label: Q12345678
    hypernym:
      - gallery
      - art institution
    type:
      - G
    country: NL  # Add if known
    notes: "Discovered via Q118554787 query"
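Stub entries in the shape shown above can be prepared in code before manual review. The following is a minimal sketch, assuming the four type codes named in this document (G, M, L, A) are the only ones needed here; the full scheme may define more, and the helper `make_stub_entry` is hypothetical.

```python
# Type codes mentioned in this document; an assumption, the full scheme may be larger.
ALLOWED_TYPES = {"G", "M", "L", "A"}


def make_stub_entry(qid, hypernyms, type_codes, country=None, notes=""):
    """Build a dict in the same shape as a hyponyms_curated.yaml entry."""
    unknown = set(type_codes) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unknown type codes: {sorted(unknown)}")
    entry = {
        "label": qid,
        "hypernym": list(hypernyms),
        "type": list(type_codes),
    }
    # Optional fields are only written when a value is known.
    if country:
        entry["country"] = country
    if notes:
        entry["notes"] = notes
    return entry


stub = make_stub_entry(
    "Q12345678",
    ["gallery", "art institution"],
    ["G"],
    country="NL",
    notes="Discovered via Q118554787 query",
)
print(stub)
```

The returned dict can be appended to the `hypernym` list of the loaded YAML data and written back with a YAML serializer. The type code should still be assigned by a curator and not inferred automatically.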
Validation
After adding new entries:
- Run enrichment script:
  python scripts/enrich_hyponyms_with_wikidata.py
- Verify property extraction works correctly
- Check for duplicates or conflicts
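The duplicate check can be automated once the YAML file has been loaded into a dict. The following is a minimal sketch that reuses the section names from the deduplication script; the function name `find_duplicate_labels` and the sample data are hypothetical.

```python
from collections import defaultdict

# Section names taken from the deduplication script in Step 3.
SECTIONS = ["hypernym", "entity", "entity_list", "standard", "collection", "exclude"]


def find_duplicate_labels(data: dict) -> dict[str, list[str]]:
    """Map each Q-number that occurs more than once to the sections it occurs in."""
    seen = defaultdict(list)
    for section in SECTIONS:
        for item in data.get(section) or []:
            if isinstance(item, dict):
                label = item.get("label")
                if isinstance(label, str) and label.startswith("Q"):
                    seen[label].append(section)
    return {qid: sections for qid, sections in seen.items() if len(sections) > 1}


# Hypothetical sample: Q1 appears both as a hypernym and in the exclude list.
sample = {
    "hypernym": [{"label": "Q1"}, {"label": "Q2"}],
    "entity": None,
    "exclude": [{"label": "Q1"}],
}
print(find_duplicate_labels(sample))
# prints {'Q1': ['hypernym', 'exclude']}
```

A Q-number listed under both `hypernym` and `exclude` is a conflict, not just a duplicate, so the reported section names are worth reviewing and not only the counts.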
Notes
- Q118554787 is a key discovery - it's the broadest gallery hypernym not previously exploited
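- This query focuses on hyponyms (P279+ subclass relationships); Strategies 2 and 5-7 also match instances (P31), so results can include individual institutions as well as classes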
- Some results may be mixed types (G+M, G+L) - curate carefully
- Exhibition spaces (Q18761864) may overlap with museum types - check carefully
Version History
- v1.0 (2025-11-12): Initial G-class query without Q118554787
- v2.0 (2025-11-16): Added Q118554787 and curated hypernyms from existing data