G-class Query History
This directory contains all SPARQL queries used to discover G-class (Gallery) heritage custodians from Wikidata.
Query Evolution
Phase 1: Initial Discovery (Nov 13, 2025)
gallery_query_complete_20251113T130920.sparql
- First comprehensive G-class query
- 6 base classes
- Basic subclass discovery (P279+)
- No exclusions
Phase 2: First Update (Nov 13, 2025)
gallery_query_updated_20251113T160823.sparql
- Refined base classes
- Added quality filters
- Improved pattern matching
Phase 3: Verification Round 1 (Nov 16, 2025 AM)
gallery_query_updated_20251116T095704.sparql
- Q-number verification pass
- Updated base classes based on validation
- Better type distinctions (institution vs. building)
Phase 4: Verification Round 2 (Nov 16, 2025 Midday)
gallery_query_updated_20251116T104506.sparql
- 14 verified base classes
- Comprehensive exclusions (1,819 Q-numbers)
- 37 filter chunks for optimization
- Key discoveries:
  - Institution vs building distinction
  - Artist-run and alternative spaces
  - Online/digital galleries
Phase 5: Enhanced Recursive Discovery (Nov 16, 2025 12:54) ⚠️ SUPERSEDED
gallery_query_enhanced_20251116T125414.sparql
Issue Discovered: Returns individual gallery instances (P31), not gallery classes (P279)
Why Superseded:
- Mixed P31 (instance of) and P279 (subclass of) patterns
- Would return "Tate Modern", "Louvre", etc. (individual galleries)
- We need "contemporary art gallery", "sculpture gallery", etc. (gallery types)
Phase 6: Classes-Only Query (Nov 16, 2025 13:45) ⭐ CURRENT
gallery_classes_query_20251116T134500.sparql ← USE THIS ONE
Major Innovation: ONLY P279 (subclass) relationships - NO P31 (instance) patterns
Critical Distinction:
- ✅ P279 (subclass of) → Gallery CLASSES we want (e.g., "photography gallery")
- ❌ P31 (instance of) → Individual galleries we exclude (e.g., "MoMA")
8 Search Strategies (all P279-based):
- ✨ Q118554787 direct hyponyms (1 level)
- ✨ Q118554787 transitive hyponyms (all levels)
- ✨ 14 curated G-class hypernyms - direct hyponyms
- ✨ 14 curated G-class hypernyms - transitive hyponyms
- ✨ 8 mixed G+M hypernyms - direct hyponyms
- ✨ 8 mixed G+M hypernyms - transitive hyponyms
- Q207694 (art gallery) transitive hyponyms
- Q18761864 (exhibition space) transitive hyponyms
Key Features:
- FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance } - excludes every item that has any P31 (instance of) statement
- Only returns classes/types (P279 relationships)
- Uses Q118554787 (broadest gallery hypernym)
- Recursive discovery via curated G-class entries
- Expected: 30-100 new gallery classes
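The P279-only strategies and the instance filter listed above can be sketched as a small query builder. This is a minimal illustration, not the contents of the actual .sparql file: the function name build_hyponym_query and its parameters are hypothetical, and the query relies on the prefixes (wd:, wdt:, wikibase:, bd:) that the Wikidata Query Service predefines.

```python
def build_hyponym_query(root_qids, transitive=True, exclude_instances=True):
    """Build a classes-only SPARQL query for subclasses of the given roots.

    Hypothetical helper for illustration; the real query files may differ.
    """
    # wdt:P279 matches one level; wdt:P279+ follows the subclass chain to any depth.
    path = "wdt:P279+" if transitive else "wdt:P279"
    values = " ".join(f"wd:{qid}" for qid in root_qids)
    lines = [
        "SELECT DISTINCT ?item ?itemLabel WHERE {",
        f"  VALUES ?root {{ {values} }}",
        f"  ?item {path} ?root .",
    ]
    if exclude_instances:
        # Drop every item that carries any P31 (instance of) statement.
        lines.append("  FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance }")
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }'
    )
    lines.append("}")
    return "\n".join(lines)


# Direct hyponyms (1 level) of Q118554787
print(build_hyponym_query(["Q118554787"], transitive=False))

# Transitive hyponyms of two roots in one query
print(build_hyponym_query(["Q207694", "Q18761864"]))
```

Passing several roots through a single VALUES clause keeps each strategy to one pattern, at the cost of not recording which root matched unless ?root is added to the SELECT.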
Improvements over Phase 5:
- Focused on classes, not instances
- Added critical instance exclusion filter
- All strategies now use P279 (subclass) patterns only
- More accurate for taxonomy building
File Structure
Each query version includes:
- .sparql - SPARQL query code
- .yaml - Metadata with:
  - Base classes used
  - Search strategies
  - Expected results
  - Deduplication notes
  - Usage instructions
  - Related files
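The query files named in this README follow a name_YYYYMMDDTHHMMSS.ext pattern, so the newest version can be picked programmatically. The sketch below assumes that pattern holds for every query file; the helper names parse_query_filename and latest_query are hypothetical and not part of the project.

```python
import re
from datetime import datetime

# Assumed naming convention: <name>_<YYYYMMDD>T<HHMMSS>.<sparql|yaml>
PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<stamp>\d{8}T\d{6})\.(?P<ext>sparql|yaml)$"
)


def parse_query_filename(filename):
    """Return (name, timestamp, extension), or None if the name does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    stamp = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S")
    return match["name"], stamp, match["ext"]


def latest_query(filenames, name=None):
    """Return the newest .sparql file, optionally restricted to one base name."""
    candidates = []
    for filename in filenames:
        parsed = parse_query_filename(filename)
        if parsed is None or parsed[2] != "sparql":
            continue
        if name is not None and parsed[0] != name:
            continue
        candidates.append((parsed[1], filename))
    return max(candidates)[1] if candidates else None


files = [
    "gallery_query_complete_20251113T130920.sparql",
    "gallery_query_updated_20251116T104506.sparql",
    "gallery_query_enhanced_20251116T125414.sparql",
    "gallery_classes_query_20251116T134500.sparql",
    "README.md",
]
print(latest_query(files))
# gallery_classes_query_20251116T134500.sparql
```

Note that the newest file is not necessarily the one to use: a superseded query can carry a later timestamp than a valid one, so the phase notes above remain the authority.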
Current Statistics
hyponyms_curated.yaml (as of Nov 16, 2025):
- Total Q-numbers: 1,896
- G-class entries: 47 (25 pure G + 22 mixed)
- Q118554787: NOT INCLUDED ← Key opportunity!
Usage Workflow
- Select Query: Use latest classes-only query (gallery_classes_query_20251116T134500.sparql)
- Execute: Copy to Wikidata Query Service
- Download: Export results as JSON
- Deduplicate: Filter against existing 1,896 Q-numbers
- Curate: Assign type codes, country, hypernym metadata
- Add: Append to hyponyms_curated.yaml
- Enrich: Run python scripts/enrich_hyponyms_with_wikidata.py
- Validate: Check for duplicates/conflicts
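The Deduplicate step can be sketched as follows. This is an illustration under stated assumptions, not the project's own script: the function names are hypothetical, the result variable is assumed to be ?item, and the already-curated Q-numbers are assumed to be available as a plain set (how they are read from hyponyms_curated.yaml depends on that file's schema, which is not shown here). The code accepts both JSON shapes the Wikidata Query Service can export: the simple list of rows and the verbose SPARQL results format.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def extract_qids(results, var="item"):
    """Return Q-numbers from WDQS JSON in either export shape."""
    # Verbose export: {"results": {"bindings": [{"item": {"value": uri}}]}}
    # Simple export:  [{"item": uri}, ...]
    rows = results["results"]["bindings"] if isinstance(results, dict) else results
    qids = []
    for row in rows:
        cell = row.get(var)
        if cell is None:
            continue
        uri = cell["value"] if isinstance(cell, dict) else cell
        if uri.startswith(ENTITY_PREFIX):
            qids.append(uri[len(ENTITY_PREFIX):])
    return qids


def new_candidates(results, known_qids, var="item"):
    """Keep Q-numbers not already curated, preserving order and dropping repeats."""
    seen = set(known_qids)
    fresh = []
    for qid in extract_qids(results, var):
        if qid not in seen:
            seen.add(qid)
            fresh.append(qid)
    return fresh


# Illustrative rows only, not real query output.
sample = [
    {"item": ENTITY_PREFIX + "Q207694"},
    {"item": ENTITY_PREFIX + "Q18761864"},
    {"item": ENTITY_PREFIX + "Q18761864"},
]
known = {"Q207694"}  # stand-in for the 1,896 curated Q-numbers
print(new_candidates(sample, known))
# ['Q18761864']
```

Deduplicating after download, rather than inside the query, matches the best practice of deferring deduplication to post-processing.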
Expected Impact
Before (Phase 5):
- 47 G-class entries
- Limited coverage
After (Phase 6 - Projected):
- 77-147 G-class entries (47 + 30-100 new classes)
- Roughly 1.6-3.1x the current number of G-class entries (77/47 to 147/47)
- Gallery classes like "contemporary art gallery", "sculpture gallery", etc.
- Better classification for future institution extraction
Documentation
See also:
- Query documentation: ../G_QUERY_UPDATE_2025-11-16.md
- Source data: ../../hyponyms_curated.yaml
- Enriched data: ../../hyponyms_curated_full.yaml
- SPARQL copies: ../sparql/ (for easy access)
Next Queries to Develop
After G-class completion, develop similar enhanced queries for:
- L-class (Libraries) - use curated library hypernyms
- A-class (Archives) - use curated archive hypernyms
- M-class (Museums) - largest class, needs careful strategy
- Other classes - R, C, O, B, E, S, F, I, X, P, H, D, N, T
Query Best Practices
Based on lessons learned:
- ✅ Verify Q-numbers before using as base classes
- ✅ Use curated entries for recursive discovery
- ✅ Defer deduplication to post-processing (performance)
- ✅ Multiple strategies for comprehensive coverage
- ✅ Document metadata in accompanying YAML files
- ✅ Version control with timestamps
- ✅ Test with COUNT queries first to estimate results
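The last practice, testing with a COUNT query first, can be sketched as a wrapper that turns the body of a WHERE clause into a counting query. The helper name to_count_query is hypothetical, and the sketch only builds the query text; running it against the Wikidata Query Service is left to the usual workflow.

```python
def to_count_query(where_body, var="?item"):
    """Wrap a WHERE-clause body in a COUNT(DISTINCT ...) query.

    Hypothetical helper: it estimates result size without fetching rows.
    """
    return (
        f"SELECT (COUNT(DISTINCT {var}) AS ?count) WHERE {{\n"
        f"{where_body}\n"
        "}"
    )


body = (
    "  ?item wdt:P279+ wd:Q118554787 .\n"
    "  FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance }"
)
print(to_count_query(body))
```

Counting distinct items rather than rows keeps the estimate comparable to the final result when several strategies return the same class.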
Contact
For questions about query strategy or to report issues, see project documentation in AGENTS.md.