# G-class Query History

This directory contains all SPARQL queries used to discover G-class (Gallery) heritage custodians from Wikidata.

## Query Evolution

### Phase 1: Initial Discovery (Nov 13, 2025)

**`gallery_query_complete_20251113T130920.sparql`**
- First comprehensive G-class query
- 6 base classes
- Basic subclass discovery (P279+)
- No exclusions

### Phase 2: First Update (Nov 13, 2025)

**`gallery_query_updated_20251113T160823.sparql`**
- Refined base classes
- Added quality filters
- Improved pattern matching

### Phase 3: Verification Round 1 (Nov 16, 2025 AM)

**`gallery_query_updated_20251116T095704.sparql`**
- Q-number verification pass
- Updated base classes based on validation
- Better type distinctions (institution vs. building)

### Phase 4: Verification Round 2 (Nov 16, 2025 Midday)

**`gallery_query_updated_20251116T104506.sparql`**
- 14 verified base classes
- Comprehensive exclusions (1,819 Q-numbers)
- 37 filter chunks for optimization
- Key discoveries:
  - Institution vs. building distinction
  - Artist-run and alternative spaces
  - Online/digital galleries

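The exclusion list in Phase 4 is split into filter chunks rather than written as one very long filter. The sketch below shows one way such chunking could work. The chunk size of 50 is an assumption (it happens to turn 1,819 Q-numbers into 37 chunks), the `FILTER(... NOT IN ...)` form is an assumed rendering, and the Q-numbers are placeholders, so treat this as an illustration rather than a copy of the actual query.

```python
def chunk_exclusions(qids, chunk_size=50):
    """Split a list of Q-numbers into consecutive chunks of at most chunk_size."""
    return [qids[i:i + chunk_size] for i in range(0, len(qids), chunk_size)]


def filter_clause(chunk, var="?item"):
    """Render one chunk as a SPARQL FILTER that rejects the listed entities."""
    values = ", ".join(f"wd:{qid}" for qid in chunk)
    return f"FILTER({var} NOT IN ({values}))"


# Placeholder IDs only; the real exclusion list lives in the query file.
exclusions = [f"Q{n}" for n in range(1, 1820)]  # 1,819 illustrative Q-numbers

chunks = chunk_exclusions(exclusions)           # assumed chunk size: 50
clauses = [filter_clause(chunk) for chunk in chunks]

print(len(chunks))       # 37
print(len(chunks[-1]))   # 19 (the final, partial chunk)
```

Each element of `clauses` is one filter line that could be pasted into the `WHERE` body of a query.
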
### Phase 5: Enhanced Recursive Discovery (Nov 16, 2025 12:54) ⚠️ **SUPERSEDED**

**`gallery_query_enhanced_20251116T125414.sparql`**

**Issue Discovered**: Returns individual gallery **instances** (P31), not gallery **classes** (P279)

**Why Superseded**:
- Mixed P31 (instance of) and P279 (subclass of) patterns
- Would return "Tate Modern", "Louvre", etc. (individual galleries)
- We need "contemporary art gallery", "sculpture gallery", etc. (gallery types)

### Phase 6: Classes-Only Query (Nov 16, 2025 13:45) ⭐ **CURRENT**

**`gallery_classes_query_20251116T134500.sparql`** ← **USE THIS ONE**

**Major Innovation**: **ONLY P279 (subclass) relationships** - NO P31 (instance) patterns

**Critical Distinction**:
- ✅ **P279 (subclass of)** → Gallery CLASSES we want (e.g., "photography gallery")
- ❌ **P31 (instance of)** → Individual galleries we exclude (e.g., "MoMA")

**8 Search Strategies** (all P279-based):
1. ✨ Q118554787 direct hyponyms (1 level)
2. ✨ Q118554787 transitive hyponyms (all levels)
3. ✨ 14 curated G-class hypernyms - direct hyponyms
4. ✨ 14 curated G-class hypernyms - transitive hyponyms
5. ✨ 8 mixed G+M hypernyms - direct hyponyms
6. ✨ 8 mixed G+M hypernyms - transitive hyponyms
7. Q207694 (art gallery) transitive hyponyms
8. Q18761864 (exhibition space) transitive hyponyms

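The strategies above differ only in the root Q-number and in whether the subclass path is followed one level (`wdt:P279`) or transitively (`wdt:P279+`). As a rough sketch of that idea, the helpers below generate the corresponding triple patterns. The function names and the `?item` variable are illustrative assumptions, not taken from the query file.

```python
def hyponym_pattern(qid, transitive=False, var="?item"):
    """Return a triple pattern matching subclasses of qid.

    wdt:P279 matches exactly one level; wdt:P279+ follows the
    subclass chain through one or more levels.
    """
    path = "wdt:P279+" if transitive else "wdt:P279"
    return f"{var} {path} wd:{qid} ."


def union_block(qids, transitive=False):
    """Combine one pattern per hypernym into a single UNION block."""
    branches = ["{ " + hyponym_pattern(q, transitive) + " }" for q in qids]
    return "\n  UNION\n  ".join(branches)


# The three root Q-numbers named in this README.
roots = ["Q118554787", "Q207694", "Q18761864"]

print(hyponym_pattern("Q118554787"))        # ?item wdt:P279 wd:Q118554787 .
print(union_block(roots, transitive=True))
```

The same `union_block` call could be fed the curated G-class or mixed G+M hypernym lists to produce the patterns for strategies 3 through 6.
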
**Key Features**:
- **`FILTER NOT EXISTS { ?item wdt:P31 ?anyInstance }`** - excludes every item that has any P31 (instance of) statement
- Only returns classes/types (P279 relationships)
- Uses Q118554787 (broadest gallery hypernym)
- Recursive discovery via curated G-class entries
- Expected: 30-100 new gallery classes

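To make the effect of the instance-exclusion filter concrete, here is a toy in-memory model of it. It is not how the Wikidata Query Service evaluates the query, and the labels stand in for Q-numbers (the "dual-typed item" and "some metaclass" entries are invented for illustration). One consequence worth noticing: an item that carries both a P279 and a P31 statement is dropped as well.

```python
def classes_only(statements, root):
    """Return items that are transitive subclasses of root and have no P31.

    statements: iterable of (subject, property, object) tuples.
    """
    parents = {}             # subject -> set of direct superclasses (P279)
    has_instance_of = set()  # subjects with at least one P31 statement
    for subj, prop, obj in statements:
        if prop == "P279":
            parents.setdefault(subj, set()).add(obj)
        elif prop == "P31":
            has_instance_of.add(subj)

    def is_hyponym(item):
        """Walk up the P279 chain and report whether root is reached."""
        stack, seen = list(parents.get(item, ())), set()
        while stack:
            node = stack.pop()
            if node == root:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    return sorted(
        item for item in parents
        if is_hyponym(item) and item not in has_instance_of
    )


statements = [
    ("photography gallery", "P279", "gallery"),
    ("contemporary art gallery", "P279", "gallery"),
    ("artist-run photo space", "P279", "photography gallery"),
    ("MoMA", "P31", "gallery"),                    # an instance, never matched
    ("dual-typed item", "P279", "gallery"),        # hypothetical item
    ("dual-typed item", "P31", "some metaclass"),  # ...removed by the filter
]

print(classes_only(statements, "gallery"))
# ['artist-run photo space', 'contemporary art gallery', 'photography gallery']
```
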
**Improvements over Phase 5**:
- Focused on classes, not instances
- Added critical instance exclusion filter
- All strategies now use P279 (subclass) patterns only
- More accurate for taxonomy building

## File Structure

Each query version includes:
- **`.sparql`** - SPARQL query code
- **`.yaml`** - Metadata with:
  - Base classes used
  - Search strategies
  - Expected results
  - Deduplication notes
  - Usage instructions
  - Related files

## Current Statistics

**hyponyms_curated.yaml** (as of Nov 16, 2025):
- Total Q-numbers: 1,896
- G-class entries: 47 (25 pure G + 22 mixed)
- Q118554787: **NOT INCLUDED** ← Key opportunity!

## Usage Workflow

1. **Select Query**: Use latest classes-only query (`gallery_classes_query_20251116T134500.sparql`)
2. **Execute**: Copy to [Wikidata Query Service](https://query.wikidata.org/)
3. **Download**: Export results as JSON
4. **Deduplicate**: Filter against existing 1,896 Q-numbers
5. **Curate**: Assign type codes, country, hypernym metadata
6. **Add**: Append to `hyponyms_curated.yaml`
7. **Enrich**: Run `python scripts/enrich_hyponyms_with_wikidata.py`
8. **Validate**: Check for duplicates/conflicts

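The deduplication step in the workflow above can be sketched in a few lines of Python. This assumes the export uses the standard SPARQL JSON results layout (`results.bindings`) and that the query's result variable is named `item`. The function names, the sample Q-number `Q999999901`, and the contents of `known_qids` are placeholders, not real project data.

```python
import json


def qid_from_uri(uri):
    """Extract 'Q123' from 'http://www.wikidata.org/entity/Q123'."""
    return uri.rsplit("/", 1)[-1]


def new_qids(results_json, known_qids, var="item"):
    """Return Q-numbers in the query results that are not already curated."""
    data = json.loads(results_json)
    seen, fresh = set(known_qids), []
    for binding in data["results"]["bindings"]:
        qid = qid_from_uri(binding[var]["value"])
        if qid not in seen:
            seen.add(qid)  # also drops repeats within the result set
            fresh.append(qid)
    return fresh


# Tiny stand-in for a downloaded result file.
sample = json.dumps({
    "head": {"vars": ["item"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q207694"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q999999901"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q999999901"}},
    ]},
})

# known_qids would normally be read from hyponyms_curated.yaml.
print(new_qids(sample, known_qids={"Q207694"}))  # ['Q999999901']
```
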
## Expected Impact

**Before** (Phase 5):
- 47 G-class entries
- Limited coverage

**After** (Phase 6 - Projected):
- 77-147 G-class entries (47 + 30-100 new classes)
- Roughly 1.6x to 3.1x the current taxonomy coverage, measured by G-class entry count
- Gallery classes like "contemporary art gallery", "sculpture gallery", etc.
- Better classification for future institution extraction

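The projection is simple arithmetic on the figures quoted in this README. A quick check of the totals and the implied growth factor:

```python
current = 47                   # G-class entries today
new_low, new_high = 30, 100    # expected new gallery classes

total_low, total_high = current + new_low, current + new_high
growth_low, growth_high = total_low / current, total_high / current

print(total_low, total_high)                        # 77 147
print(f"{growth_low:.2f}x to {growth_high:.2f}x")   # 1.64x to 3.13x
```
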
## Documentation

See also:
- **Query documentation**: `../G_QUERY_UPDATE_2025-11-16.md`
- **Source data**: `../../hyponyms_curated.yaml`
- **Enriched data**: `../../hyponyms_curated_full.yaml`
- **SPARQL copies**: `../sparql/` (for easy access)

## Next Queries to Develop

After G-class completion, develop similar enhanced queries for:
- **L-class** (Libraries) - use curated library hypernyms
- **A-class** (Archives) - use curated archive hypernyms
- **M-class** (Museums) - largest class, needs careful strategy
- **Other classes** - R, C, O, B, E, S, F, I, X, P, H, D, N, T

## Query Best Practices

Based on lessons learned:

1. ✅ **Verify Q-numbers** before using as base classes
2. ✅ **Use curated entries** for recursive discovery
3. ✅ **Defer deduplication** to post-processing (performance)
4. ✅ **Multiple strategies** for comprehensive coverage
5. ✅ **Document metadata** in accompanying YAML files
6. ✅ **Version control** with timestamps
7. ✅ **Test with COUNT** queries first to estimate results

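Two of these practices are easy to automate. The sketch below builds a timestamped filename in the `<stem>_<YYYYMMDD>T<HHMMSS>.sparql` pattern seen in this directory, and wraps a `WHERE` body in a `COUNT` query for a size estimate before running the full query. The helper names are illustrative and not part of the project's scripts.

```python
from datetime import datetime


def versioned_filename(stem, when):
    """Build '<stem>_<YYYYMMDD>T<HHMMSS>.sparql' from a datetime."""
    return f"{stem}_{when.strftime('%Y%m%dT%H%M%S')}.sparql"


def count_query(where_body, var="?item"):
    """Wrap a WHERE body in a COUNT(DISTINCT ...) query to estimate result size."""
    return (
        f"SELECT (COUNT(DISTINCT {var}) AS ?count) WHERE {{\n"
        f"  {where_body}\n"
        f"}}"
    )


print(versioned_filename("gallery_classes_query", datetime(2025, 11, 16, 13, 45, 0)))
# gallery_classes_query_20251116T134500.sparql

print(count_query("?item wdt:P279+ wd:Q207694 ."))
```

The `wd:` and `wdt:` prefixes are predefined on the Wikidata Query Service, so the generated count query can be pasted there as-is.
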
## Contact

For questions about query strategy or to report issues, see project documentation in `AGENTS.md`.