G-class Query History

This directory contains all SPARQL queries used to discover G-class (Gallery) heritage custodians from Wikidata.

Query Evolution

Phase 1: Initial Discovery (Nov 13, 2025)

gallery_query_complete_20251113T130920.sparql

  • First comprehensive G-class query
  • 6 base classes
  • Basic subclass discovery (P279+)
  • No exclusions

Phase 2: First Update (Nov 13, 2025)

gallery_query_updated_20251113T160823.sparql

  • Refined base classes
  • Added quality filters
  • Improved pattern matching

Phase 3: Verification Round 1 (Nov 16, 2025 AM)

gallery_query_updated_20251116T095704.sparql

  • Q-number verification pass
  • Updated base classes based on validation
  • Better type distinctions (institution vs. building)

Phase 4: Verification Round 2 (Nov 16, 2025 Midday)

gallery_query_updated_20251116T104506.sparql

  • 14 verified base classes
  • Comprehensive exclusions (1,819 Q-numbers)
  • 37 filter chunks for optimization
  • Key discoveries:
    • Institution vs building distinction
    • Artist-run and alternative spaces
    • Online/digital galleries
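
The chunked exclusion filters can be sketched as follows. This is a minimal illustration, not the code used to generate the Phase 4 query: the function name `build_exclusion_filters` and the chunk size of 50 are assumptions (50 per chunk happens to yield 37 chunks for 1,819 Q-numbers), and the IDs below are placeholders.

```python
from typing import Iterable, List


def build_exclusion_filters(qids: Iterable[str], chunk_size: int = 50) -> List[str]:
    """Split excluded Q-numbers into FILTER clauses of at most chunk_size IDs each."""
    # De-duplicate and sort numerically so the output is reproducible.
    ordered = sorted(set(qids), key=lambda q: int(q[1:]))
    filters = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        values = ", ".join(f"wd:{qid}" for qid in chunk)
        filters.append(f"FILTER(?item NOT IN ({values}))")
    return filters


# Placeholder IDs standing in for the real exclusion list (assumption).
example_filters = build_exclusion_filters(f"Q{n}" for n in range(1, 1820))
print(len(example_filters))  # 37 chunks for 1,819 IDs at 50 per chunk
```

Each returned string can be pasted into the WHERE block of the query as its own FILTER line.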

Phase 5: Enhanced Recursive Discovery (Nov 16, 2025 12:54) ⚠️ SUPERSEDED

gallery_query_enhanced_20251116T125414.sparql

Issue Discovered: Returns individual gallery instances (P31), not gallery classes (P279)

Why Superseded:

  • Mixed P31 (instance of) and P279 (subclass of) patterns
  • Would return "Tate Modern", "Louvre", etc. (individual galleries)
  • We need "contemporary art gallery", "sculpture gallery", etc. (gallery types)

Phase 6: Classes-Only Query (Nov 16, 2025 13:45) CURRENT

gallery_classes_query_20251116T134500.sparql (USE THIS ONE)

Major Innovation: ONLY P279 (subclass) relationships - NO P31 (instance) patterns

Critical Distinction:

  • P279 (subclass of) → Gallery CLASSES we want (e.g., "photography gallery")
  • P31 (instance of) → Individual galleries we exclude (e.g., "MoMA")

8 Search Strategies (all P279-based):

  1. Q118554787 direct hyponyms (1 level)
  2. Q118554787 transitive hyponyms (all levels)
  3. 14 curated G-class hypernyms - direct hyponyms
  4. 14 curated G-class hypernyms - transitive hyponyms
  5. 8 mixed G+M hypernyms - direct hyponyms
  6. 8 mixed G+M hypernyms - transitive hyponyms
  7. Q207694 (art gallery) transitive hyponyms
  8. Q18761864 (exhibition space) transitive hyponyms

Key Features:

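  • FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance } - excludes every item that carries any P31 (instance of) statement; this removes individual galleries, and also any class item that happens to have a P31 value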
  • Only returns classes/types (P279 relationships)
  • Uses Q118554787 (broadest gallery hypernym)
  • Recursive discovery via curated G-class entries
  • Expected: 30-100 new gallery classes
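
A single classes-only strategy might look like the sketch below. It is an illustration of the P279-only pattern described above, not a copy of the actual .sparql file: the helper name `build_classes_only_query` is hypothetical, and the query relies on the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes that the Wikidata Query Service predefines.

```python
def build_classes_only_query(root_qid: str, transitive: bool = True) -> str:
    """Return a SPARQL query for subclasses (P279) of root_qid that have no P31 statement."""
    # "wdt:P279+" follows the subclass chain at any depth; "wdt:P279" is one level only.
    path = "wdt:P279+" if transitive else "wdt:P279"
    return "\n".join([
        "SELECT DISTINCT ?item ?itemLabel WHERE {",
        f"  ?item {path} wd:{root_qid} .",
        "  FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance }",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
    ])


# Strategy 2 from the list above: transitive hyponyms of Q118554787.
print(build_classes_only_query("Q118554787"))
```

Passing `transitive=False` gives the one-level variant used by the "direct hyponyms" strategies.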

Improvements over Phase 5:

  • Focused on classes, not instances
  • Added critical instance exclusion filter
  • All strategies now use P279 (subclass) patterns only
  • More accurate for taxonomy building

File Structure

Each query version includes:

  • .sparql - SPARQL query code
  • .yaml - Metadata with:
    • Base classes used
    • Search strategies
    • Expected results
    • Deduplication notes
    • Usage instructions
    • Related files

Current Statistics

hyponyms_curated.yaml (as of Nov 16, 2025):

  • Total Q-numbers: 1,896
  • G-class entries: 47 (25 pure G + 22 mixed)
  • Q118554787: NOT INCLUDED ← Key opportunity!

Usage Workflow

  1. Select Query: Use latest classes-only query (gallery_classes_query_20251116T134500.sparql)
  2. Execute: Copy to Wikidata Query Service
  3. Download: Export results as JSON
  4. Deduplicate: Filter against existing 1,896 Q-numbers
  5. Curate: Assign type codes, country, hypernym metadata
  6. Add: Append to hyponyms_curated.yaml
  7. Enrich: Run python scripts/enrich_hyponyms_with_wikidata.py
  8. Validate: Check for duplicates/conflicts
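
Steps 3 and 4 can be sketched as below, assuming the export uses the standard SPARQL JSON results layout (`results.bindings`) and that the query variable is named `item`. The function name `new_qids` and the sample IDs are placeholders, not part of the project scripts.

```python
import json
from typing import List, Set


def new_qids(results_json: str, known_qids: Set[str], var: str = "item") -> List[str]:
    """Return Q-numbers from a SPARQL JSON export that are not already curated."""
    data = json.loads(results_json)
    seen: Set[str] = set()
    fresh: List[str] = []
    for binding in data["results"]["bindings"]:
        # Entity URIs look like http://www.wikidata.org/entity/Q123; keep the last segment.
        qid = binding[var]["value"].rsplit("/", 1)[-1]
        if qid not in known_qids and qid not in seen:
            seen.add(qid)
            fresh.append(qid)
    return fresh


# Tiny hand-made export with placeholder IDs (assumption, not real query output).
sample = json.dumps({
    "head": {"vars": ["item"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q3"}},
    ]},
})
print(new_qids(sample, known_qids={"Q1"}))  # ['Q2', 'Q3']
```

In practice `known_qids` would be loaded from hyponyms_curated.yaml; how that file is structured is not shown here, so the loading step is left out.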

Expected Impact

Before (Phase 5):

  • 47 G-class entries
  • Limited coverage

After (Phase 6 - Projected):

  • 77-147 G-class entries (47 + 30-100 new classes)
  • Roughly 1.6x-3.1x the current number of G-class entries (77/47 to 147/47)
  • Gallery classes like "contemporary art gallery", "sculpture gallery", etc.
  • Better classification for future institution extraction
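
The projected range follows directly from the figures above; a quick check of the arithmetic, using only the 47 existing entries and the expected 30-100 new classes:

```python
current_entries = 47
expected_new = (30, 100)  # low and high estimates from the Phase 6 notes

projected = tuple(current_entries + n for n in expected_new)
growth = tuple(round(p / current_entries, 2) for p in projected)

print(projected)  # (77, 147)
print(growth)     # (1.64, 3.13)
```

So the growth factor works out to roughly 1.6x at the low end and 3.1x at the high end.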

Documentation

See also:

  • Query documentation: ../G_QUERY_UPDATE_2025-11-16.md
  • Source data: ../../hyponyms_curated.yaml
  • Enriched data: ../../hyponyms_curated_full.yaml
  • SPARQL copies: ../sparql/ (for easy access)

Next Queries to Develop

After G-class completion, develop similar enhanced queries for:

  • L-class (Libraries) - use curated library hypernyms
  • A-class (Archives) - use curated archive hypernyms
  • M-class (Museums) - largest class, needs careful strategy
  • Other classes - R, C, O, B, E, S, F, I, X, P, H, D, N, T

Query Best Practices

Based on lessons learned:

  1. Verify Q-numbers before using as base classes
  2. Use curated entries for recursive discovery
  3. Defer deduplication to post-processing to keep query execution fast
  4. Use multiple strategies for comprehensive coverage
  5. Document metadata in accompanying YAML files
  6. Version queries with timestamped filenames
  7. Test with COUNT queries first to estimate results
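
Practice 7 can be illustrated with a small helper that wraps a strategy in a COUNT query before the full result set is requested. This is a sketch under the same assumptions as the rest of this README (P279-only pattern, WDQS predefined prefixes); the name `build_count_query` is hypothetical.

```python
def build_count_query(root_qid: str) -> str:
    """Return a SPARQL query that counts class candidates under root_qid."""
    return "\n".join([
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {",
        f"  ?item wdt:P279+ wd:{root_qid} .",
        "  FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance }",
        "}",
    ])


print(build_count_query("Q207694"))
```

If the count is far outside the expected range, the strategy can be adjusted before running the full query.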

Contact

For questions about query strategy or to report issues, see project documentation in AGENTS.md.