glam/data/wikidata/GLAMORCUBEPSXHFN/G/sparql/enhanced_query_v2.md
2025-11-19 23:25:22 +01:00

Enhanced G-class SPARQL Query v2.0

Date: 2025-11-16
Purpose: Discover G-class (Gallery) heritage custodians that are still missing from our dataset, using Q118554787 and curated hypernyms
Improvement: Adds the broadest hypernym, Q118554787 ("gallery"), which had been identified but was not yet included in our dataset

Key Innovation: Q118554787

Q118554787 ("gallery" - collection of physical or digital images intended to be publicly visible) is the broadest hypernym within the G-class taxonomy, but it was not used in previous queries. Querying its subclasses and direct instances should capture gallery types that those queries missed.

Query Strategy

1. Core Hypernyms (New)

  • Q118554787 - gallery (broadest, not yet exploited)
  • Q207694 - art gallery (instance level)
  • Q18761864 - exhibition space

2. Curated G-class Hypernyms (From Existing Data)

Pure G-class entries already in our data that can help find more:

  • Q2190251 - arts center
  • Q98818526 - art gallery
  • Q20897549 - art institution
  • Q3844310 - national gallery
  • Q125501487 - map gallery
  • Q127346204 - design gallery
  • Q109038036 - Galeries Fnac
  • Q29380643 - cast collection
  • Q114023739, Q1400264, Q3768550, Q17111940, Q11900212, Q56317084 - further art institution and gallery classes (annotated individually in the query comments)

3. Mixed Type G+M Hypernyms

  • Q1030034 - GLAM
  • Q3196771 - art museum
  • Q1475403 - kunsthalle
  • Q740437 - pinacotheca
  • Q1747681 - artist museum
  • Q135926044 - phototheque
  • Q1759852 - sculpture garden
  • Q15090615 - arts venue

Full SPARQL Query

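PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# The next two prefixes are predefined on the Wikidata Query Service;
# declaring them keeps the query valid when sent to other SPARQL clients.
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>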

SELECT DISTINCT ?item ?itemLabel ?itemDescription WHERE {
  {
    # Strategy 1: Q118554787 (gallery) - Broadest hypernym
    ?item wdt:P279+ wd:Q118554787 .
  } UNION {
    # Strategy 2: Direct instances of Q118554787
    ?item wdt:P31 wd:Q118554787 .
  } UNION {
    # Strategy 3: Pure G-class hypernyms from curated data
    VALUES ?g_hypernym {
      wd:Q2190251    # arts center
      wd:Q98818526   # art gallery  
      wd:Q20897549   # art institution
      wd:Q3844310    # national gallery
      wd:Q125501487  # map gallery
      wd:Q127346204  # design gallery
      wd:Q109038036  # Galeries Fnac
      wd:Q29380643   # cast collection
      wd:Q114023739  # art institution, gallery
      wd:Q1400264    # art institution
      wd:Q3768550    # art institution, gallery
      wd:Q17111940   # art institution, gallery
      wd:Q11900212   # art institution
      wd:Q56317084   # art institution, gallery
    }
    ?item wdt:P279+ ?g_hypernym .
  } UNION {
    # Strategy 4: Mixed G+M hypernyms
    VALUES ?gm_hypernym {
      wd:Q1030034    # GLAM
      wd:Q3196771    # art museum
      wd:Q1475403    # kunsthalle
      wd:Q740437     # pinacotheca
      wd:Q1747681    # artist museum
      wd:Q135926044  # phototheque
      wd:Q1759852    # sculpture garden
      wd:Q15090615   # arts venue
    }
    ?item wdt:P279+ ?gm_hypernym .
  } UNION {
    # Strategy 5: Q207694 (art gallery) instances
    ?item wdt:P31 wd:Q207694 .
  } UNION {
    # Strategy 6: Q18761864 (exhibition space)
    ?item wdt:P31/wdt:P279* wd:Q18761864 .
  } UNION {
    # Strategy 7: Cast collections
    ?item wdt:P31/wdt:P279* wd:Q29380643 .
  }
  
  # Quality filters
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167410 . }   # disambiguation
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q13406463 . }  # list article
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167836 . }   # category
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q5398426 . }   # TV series
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q11424 . }     # film
  
  SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,fr,de,es,nl,it,pt" . 
  }
}
LIMIT 2000
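
The two VALUES blocks have to stay in sync with the hypernym lists in the Query Strategy section. One way to reduce drift is to generate them from a single mapping. The following is a minimal sketch only: `values_block` and `G_HYPERNYMS` are hypothetical names that do not exist in the current scripts, and only the first three entries of the list are shown.

```python
# Hypothetical helper: render a SPARQL VALUES block from a QID -> comment mapping.
G_HYPERNYMS = {
    "Q2190251": "arts center",
    "Q98818526": "art gallery",
    "Q20897549": "art institution",
}


def values_block(variable: str, qids: dict, indent: str = "    ") -> str:
    """Return a VALUES clause listing each QID with its comment on the same line."""
    # Pad every "wd:Q..." term to the same width so the comments line up.
    width = max(len(qid) for qid in qids) + len("wd:")
    lines = [f"{indent}VALUES ?{variable} {{"]
    for qid, comment in qids.items():
        term = ("wd:" + qid).ljust(width)
        lines.append(f"{indent}  {term}  # {comment}")
    lines.append(f"{indent}}}")
    return "\n".join(lines)


print(values_block("g_hypernym", G_HYPERNYMS))
```

The printed block can be pasted into Strategy 3; calling the same function with the mixed G+M mapping would produce the block for Strategy 4.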

Expected Results

Based on preliminary counts:

  • Q118554787 alone: ~26 entities
  • Curated G-class hypernyms: ~50-100 additional entities
  • Mixed type hypernyms: ~30-50 entities
  • Exhibition spaces: ~20-30 entities

Total estimated: 150-250 new G-class entities
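
As a quick sanity check, the itemized ranges can be summed. The sketch below only restates the figures from the list above. It shows that they add up to somewhat less than the stated total. Two caveats apply: the list does not itemize Q207694 instances (Strategy 5) or cast collections (Strategy 7), and the sum ignores any overlap between strategies.

```python
# Itemized estimates copied from the list above, as (low, high) ranges.
estimates = {
    "Q118554787 alone": (26, 26),
    "Curated G-class hypernyms": (50, 100),
    "Mixed type hypernyms": (30, 50),
    "Exhibition spaces": (20, 30),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())
print(f"Itemized total: {low}-{high}")  # Itemized total: 126-206
```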

Post-Processing Required

After running the query:

  1. Deduplicate against existing hyponyms_curated.yaml (1,896 Q-numbers)
  2. Manual curation to assign type codes:
    • Pure G: galleries, arts centers, exhibition spaces
    • G+M: kunsthalles, art museums with gallery functions
    • G+L: galleries with library collections
    • G+A: galleries with archival materials
  3. Add metadata:
    • country: Geographic location
    • hypernym: Parent class(es) from query
    • notes: Special characteristics

Usage

Step 1: Run Query

Copy the query into the Wikidata Query Service (https://query.wikidata.org/) and run it.

Step 2: Export Results

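Download the results as JSON and save them as enhanced_g_query_results_v2.json. The Step 3 script reads results['results']['bindings'], so export the full SPARQL JSON results format (the verbose JSON option in the Query Service download menu, not the simplified one).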
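
Steps 1 and 2 could also be scripted against the public SPARQL endpoint. The sketch below uses only the standard library. The function names, the `enhanced_query_v2.rq` file name, and the User-Agent string are assumptions made for illustration; the contact address in particular is a placeholder that should be replaced with real project details.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Assumption: placeholder identification string, replace before real use.
USER_AGENT = "glam-g-class-query/0.1 (contact: example@example.org)"


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request that asks for standard SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    }
    return urllib.request.Request(url, headers=headers)


def run_and_save(query_path: str, out_path: str, timeout: float = 120.0) -> int:
    """Run the query stored in query_path, save the JSON, return the row count."""
    with open(query_path, encoding="utf-8") as f:
        query = f.read()
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        results = json.load(resp)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
    return len(results["results"]["bindings"])


# Example call (needs network access and a saved copy of the query):
# run_and_save("enhanced_query_v2.rq", "enhanced_g_query_results_v2.json")
```

The saved file uses the results/bindings layout that the deduplication script in Step 3 reads.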

Step 3: Deduplicate

import json

import yaml

# Load the existing curated hyponyms (an empty file yields an empty dict)
with open('data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml', 'r', encoding='utf-8') as f:
    existing = yaml.safe_load(f) or {}

# Collect every Q-number already present, across all sections
existing_qids = set()
for section in ['hypernym', 'entity', 'entity_list', 'standard', 'collection', 'exclude']:
    for item in existing.get(section) or []:  # tolerate missing or empty sections
        if isinstance(item, dict):
            label = item.get('label')
            if isinstance(label, str) and label.startswith('Q'):
                existing_qids.add(label)

# Load new results (standard SPARQL JSON layout: results -> bindings)
with open('enhanced_g_query_results_v2.json', 'r', encoding='utf-8') as f:
    results = json.load(f)

# Keep only entities that are not yet curated, skipping repeated rows
new_qids = []
seen = set()
for binding in results['results']['bindings']:
    qid = binding['item']['value'].split('/')[-1]  # entity URI -> Q-number
    if qid in existing_qids or qid in seen:
        continue
    seen.add(qid)
    new_qids.append({
        'qid': qid,
        'label': binding.get('itemLabel', {}).get('value', ''),
        'description': binding.get('itemDescription', {}).get('value', '')
    })

print(f"Found {len(new_qids)} new G-class entities")

Step 4: Curate

Review each new entity and add it to hyponyms_curated.yaml:

hypernym:
  - label: Q12345678
    hypernym:
      - gallery
      - art institution
    type:
      - G
    country: NL  # Add if known
    notes: "Discovered via Q118554787 query"
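
To reduce typing during curation, stub entries in the shape shown above could be pre-generated from the Step 3 output and then corrected by hand. This is a sketch under assumptions: `make_stub` is a hypothetical helper, and the default hypernym and type code are only starting values that the reviewer still has to confirm. It prints JSON so that it needs only the standard library; serializing to YAML (for example with `yaml.safe_dump` from the PyYAML package already used in Step 3) is left out.

```python
import json


def make_stub(qid, hypernyms, type_codes, country=None, notes=""):
    """Build one entry with the keys of the YAML example (label, hypernym, type, ...)."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Q-number: {qid!r}")
    entry = {"label": qid, "hypernym": list(hypernyms), "type": list(type_codes)}
    if country:  # only record the country when it is known
        entry["country"] = country
    if notes:
        entry["notes"] = notes
    return entry


# Illustrative input in the shape produced by the Step 3 script.
new_qids = [{"qid": "Q12345678", "label": "", "description": ""}]

stubs = [
    make_stub(row["qid"], ["gallery"], ["G"], notes="Discovered via Q118554787 query")
    for row in new_qids
]
print(json.dumps(stubs, indent=2))
```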

Validation

After adding new entries:

  1. Run enrichment script: python scripts/enrich_hyponyms_with_wikidata.py
  2. Verify property extraction works correctly
  3. Check for duplicates or conflicts
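
The duplicate check in step 3 can be partly automated. The sketch below assumes the file structure implied by the deduplication script (top-level sections that each hold a list of dicts with a `label` key); `find_duplicates` is a hypothetical helper, not an existing script. It reports every Q-number that occurs more than once, together with the sections it occurs in, so an entry that is both curated and excluded shows up as a conflict.

```python
from collections import defaultdict

SECTIONS = ["hypernym", "entity", "entity_list", "standard", "collection", "exclude"]


def find_duplicates(curated: dict) -> dict:
    """Map each Q-number that occurs more than once to the sections it occurs in."""
    seen = defaultdict(list)
    for section in SECTIONS:
        for item in curated.get(section) or []:
            if isinstance(item, dict) and isinstance(item.get("label"), str):
                seen[item["label"]].append(section)
    return {qid: secs for qid, secs in seen.items() if len(secs) > 1}


# Tiny in-memory example; in practice pass the dict loaded from hyponyms_curated.yaml.
sample = {
    "hypernym": [{"label": "Q1"}, {"label": "Q2"}],
    "exclude": [{"label": "Q2"}],
}
print(find_duplicates(sample))  # {'Q2': ['hypernym', 'exclude']}
```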

Notes

  • Q118554787 is a key discovery - it's the broadest gallery hypernym not previously exploited
  • This query focuses mainly on hyponyms (P279+ subclass relationships); Strategies 2, 5, 6, and 7 also match instances via P31
  • Some results may be mixed types (G+M, G+L) - curate carefully
  • Exhibition spaces (Q18761864) may overlap with museum types - check carefully

Version History

  • v1.0 (2025-11-12): Initial G-class query without Q118554787
  • v2.0 (2025-11-16): Added Q118554787 and curated hypernyms from existing data