GLAMORCUBEPSXHFN Query Execution Guide
Date: 2025-11-13
Status: Ready for execution
Quick Start
1. Choose a Query
Navigate to the query directory for your target class:
cd /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN
# Example: Museum queries
cat M/queries/museum_query_complete_20251113T131027.sparql
2. Execute Query
- Open https://query.wikidata.org/
- Paste SPARQL query content
- Click "Run" (or press Ctrl+Enter)
- Wait for results (may take 10-60 seconds for large result sets)
3. Download Results
Click "Download" → Choose format:
- JSON (recommended for processing)
- CSV (for spreadsheet analysis)
- TSV (for data import)
4. Review Results
Check the downloaded file for:
- Valid heritage institution subtypes
- False positives (non-heritage classes)
- Semantic correctness of labels
Query Inventory
✅ Complete Queries (13 classes)
| Class | File | Base Classes | Expected Results |
|---|---|---|---|
| A (Archive) | A/queries/archive_query_missing_complete_20251113T130052.sparql | 1 | 0 (already captured) |
| B (Botanical/Zoo) | B/queries/botanical_zoo_query_complete_20251113T130659.sparql | 18 | High (many subtypes) |
| G (Gallery) | G/queries/gallery_query_complete_20251113T130920.sparql | 4 | Low-Medium |
| L (Library) | L/queries/library_query_complete_20251113T131006.sparql | 11 | Medium-High |
| M (Museum) | M/queries/museum_query_complete_20251113T131027.sparql | 14 | Very High |
| O (Official Inst.) | O/queries/official_query_complete_20251113T131055.sparql | 4 | Low-Medium |
| R (Research Ctr) | R/queries/research_query_complete_20251113T131055.sparql | 3 | Medium |
| C (Corporation) | C/queries/corporation_query_complete_20251113T131055.sparql | 3 | Low |
| E (Education) | E/queries/education_query_complete_20251113T131055.sparql | 6 | High |
| P (Personal Coll.) | P/queries/personal_query_complete_20251113T131055.sparql | 2 | Low |
| S (Coll. Society) | S/queries/collecting_query_complete_20251113T131055.sparql | 3 | Low-Medium |
| H (Holy Sites) | H/queries/holy_query_complete_20251113T131055.sparql | 6 | Medium |
| F (Features) | F/queries/features_query_complete_20251113T131055.sparql | 4 | Medium-High |
Note: U (Unknown) and X (Mixed) classes do not have queries (special classification states).
Recommended Execution Order
Priority 1: High-Value, Low-Noise Classes
Start with well-defined institutional classes:
- Museum (M) - Expected to return many valid museum subtypes
- Library (L) - Well-structured taxonomy in Wikidata
- Gallery (G) - Focused domain, clear boundaries
Why first? These classes have:
- Clear semantic boundaries
- High-quality Wikidata curation
- Low false positive rate
- Immediate value for GLAM curation
Priority 2: Specialized Heritage Classes
Continue with niche heritage types:
- Archive (A) - Verify completeness (should return 0 results)
- Botanical/Zoo (B) - Large taxonomy, needs careful review
- Features (F) - Monuments, memorials, sculptures
Curation note: Features (F) may include non-heritage physical objects. Review each result carefully.
Priority 3: Organizational Classes
Proceed to organizational entities:
- Education Provider (E) - Universities, colleges, schools with collections
- Research Center (R) - Scientific institutes, documentation centers
- Official Institution (O) - Government heritage agencies
Curation note: Filter for institutions that actually maintain heritage collections (not all universities have museums/archives).
Priority 4: Niche/Low-Volume Classes
Finish with specialized collection types:
- Holy Sites (H) - Religious institutions with heritage collections
- Collecting Society (S) - Historical societies, numismatic clubs
- Personal Collection (P) - Private collections
- Corporation (C) - Corporate archives/museums
Curation note: These classes often overlap with others (e.g., corporate museums are also museums). Document multi-type classifications.
Query Execution Checklist
For each query execution:
- Copy SPARQL from [class]/queries/*_complete_*.sparql
- Execute at https://query.wikidata.org/
- Download results as JSON
- Save JSON to [class]/sparql/results_[YYYYMMDD].json
- Review results for:
  - Valid heritage institution subtypes
  - False positives (non-heritage)
  - Semantic correctness
  - Geographic diversity
- Document results in [class]/CURATION_LOG.md
- Add validated Q-numbers to hyponyms_curated.yaml
- Re-run query to discover next batch
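The "Save JSON" step of the checklist can be scripted so that every run lands in the same dated location. The following is a minimal sketch, assuming results are kept relative to the GLAMORCUBEPSXHFN directory; the helper names `results_path` and `save_results` are hypothetical and not part of the existing tooling.

```python
import json
from datetime import date
from pathlib import Path


def results_path(class_code: str, run_date: date, root: Path = Path(".")) -> Path:
    """Build [class]/sparql/results_[YYYYMMDD].json under root."""
    return root / class_code / "sparql" / f"results_{run_date:%Y%m%d}.json"


def save_results(results: dict, class_code: str, run_date: date,
                 root: Path = Path(".")) -> Path:
    """Write a WDQS JSON result to its dated file, creating directories as needed."""
    target = results_path(class_code, run_date, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(results, ensure_ascii=False, indent=2),
                      encoding="utf-8")
    return target
```

For example, `results_path("M", date(2025, 11, 13))` yields `M/sparql/results_20251113.json`, matching the naming pattern in the checklist.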
Curation Workflow
Step 1: Review Query Results
Open the downloaded JSON file:
cat M/sparql/results_20251113.json | jq '.results.bindings[] | {q: .hyponym.value, label: .hyponymLabel.value}'
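The same extraction can be done in Python when the results feed further processing. This is a rough equivalent of the jq filter above, assuming the query binds `?hyponym` and `?hyponymLabel` as that filter implies; the function name `extract_hyponyms` is hypothetical.

```python
import json
from pathlib import Path


def extract_hyponyms(results: dict) -> list[dict]:
    """Return one {"q", "label"} dict per binding in a WDQS JSON result."""
    rows = []
    for binding in results.get("results", {}).get("bindings", []):
        uri = binding["hyponym"]["value"]
        rows.append({
            # Entity URIs end in the Q-number, e.g. .../entity/Q207694
            "q": uri.rsplit("/", 1)[-1],
            # The label is absent when no label exists in the requested languages
            "label": binding.get("hyponymLabel", {}).get("value", ""),
        })
    return rows


def load_hyponyms(path: Path) -> list[dict]:
    """Read a downloaded results file and extract its hyponym rows."""
    return extract_hyponyms(json.loads(path.read_text(encoding="utf-8")))
```

Unlike the jq filter, this strips the entity URI down to the bare Q-number, which is the form used in hyponyms_curated.yaml.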
Step 2: Validate Each Q-number
For each result, check:
- Is it a heritage institution type?
  - Museums, libraries, archives, galleries, etc.
  - Collections, societies, cultural organizations
  - NOT: administrative units, geographic features (unless F-class)
- What GLAMORCUBEPSXHFN class(es)?
  - Single type: M (museum), L (library), A (archive), etc.
  - Multiple types: Use X (mixed) or list all applicable codes
- Geographic/cultural context?
  - Country-specific types (note in country: field)
  - Regional variations (note in subregion: field)
- Historical context?
  - Defunct institution types (note in time: field)
  - Historical periods (e.g., "Imperial Russia", "Medieval")
Step 3: Add to Curated Vocabulary
Edit hyponyms_curated.yaml:
hyponym:
  - label: Q[NUMBER]
    hypernym:
      - [descriptive term from Wikidata label]
    type:
      - [GLAMORCUBEPSXHFN code: A, B, C, E, F, G, H, L, M, O, P, R, S, or X]
    country:  # optional
      - [ISO 3166-1 alpha-2 country code]
    subregion:  # optional
      - [region name]
    time:  # optional
      - [temporal context, e.g., "1900-1950", "< 1948"]
    rico:  # optional (for archival record types)
      - label: recordSetTypes
    duplicate:  # optional (if merged with another Q-number)
      - Q[DUPLICATE_NUMBER]
Example:
  - label: Q123456
    hypernym:
      - maritime museum
    type:
      - M
    country:
      - NL
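Before committing a new entry, a quick structural check can catch slips such as an unknown type code or a country name where a two-letter code is expected. The sketch below works on an entry already loaded as a Python dict (for instance via a YAML parser). The rules are inferred from the template above rather than from a published schema, and `validate_entry` is a hypothetical helper.

```python
import re

# Type codes as listed in the template above
VALID_TYPES = set("ABCEFGHLMOPRSX")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one curated entry (empty if none)."""
    problems = []
    if not re.fullmatch(r"Q[1-9]\d*", str(entry.get("label", ""))):
        problems.append("label must be a Q-number such as Q123456")
    if not entry.get("hypernym"):
        problems.append("hypernym must list at least one term")
    types = entry.get("type") or []
    if not types:
        problems.append("type must list at least one code")
    for code in types:
        if code not in VALID_TYPES:
            problems.append(f"unknown type code: {code}")
    # Optional field: only checked when present
    for country in entry.get("country") or []:
        if not re.fullmatch(r"[A-Z]{2}", str(country)):
            problems.append(f"country is not an ISO 3166-1 alpha-2 code: {country}")
    return problems
```

The country check only tests the two-letter shape; it does not confirm that the code is actually assigned in ISO 3166-1.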
Step 4: Re-run Query (Iterative Discovery)
After adding Q-numbers to hyponyms_curated.yaml:
- Queries automatically exclude newly curated Q-numbers (next execution)
- Run query again to discover transitive subclasses
- Continue until no new relevant results found
- Mark class as "complete" in tracking doc
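Each iteration only needs the Q-numbers that are not yet curated. Although the queries are described as handling the exclusion themselves, a client-side cross-check is cheap and also shows at a glance whether a re-run produced anything new. A minimal sketch, with the hypothetical helper `new_candidates`:

```python
from collections.abc import Iterable


def new_candidates(result_qids: Iterable[str], curated_qids: Iterable[str]) -> list[str]:
    """Return Q-numbers from a query run that are not yet curated.

    Order of first appearance is preserved and duplicates are dropped.
    """
    curated = set(curated_qids)
    seen = set()
    fresh = []
    for qid in result_qids:
        if qid not in curated and qid not in seen:
            seen.add(qid)
            fresh.append(qid)
    return fresh
```

An empty return value after a re-run is one signal that the class may be ready to be marked complete.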
Common False Positives
Museums (M)
- ❌ Museum websites (Q386724) - Digital platforms, not institution types
- ❌ Museum collections (Q2668072) - Collection types, not institutions
- ❌ Museum buildings (Q41176) - Architecture, not organizations
- ✅ Museum subtypes (e.g., Q207694 "art museum") - Valid!
Libraries (L)
- ❌ Library catalogs (Q5994) - Systems, not institutions
- ❌ Library software (Q7375) - Technology, not organizations
- ✅ Library types (e.g., Q28564 "public library") - Valid!
Education (E)
- ❌ All universities - Only include if they maintain heritage collections
- ❌ Primary schools - Rarely have heritage significance
- ✅ Universities with archives/museums - Valid if documented
Features (F)
- ❌ Natural features (mountains, rivers) - Not heritage custodians
- ❌ Living people - Not physical features
- ✅ Monuments, memorials, sculptures, cemeteries - Valid!
Query Performance Tips
Timeout Issues
If query times out (>60 seconds):
- Split query by base class: Run separate queries for each UNION clause
- Add temporal filter: Limit to items created after a certain year
- Reduce language list: Focus on 10-15 major languages
- Use LIMIT: Add LIMIT 1000 for initial exploration
Large Result Sets
If query returns >10,000 results:
- Prioritize by usage: Add ORDER BY DESC(?usageCount) (count statements)
- Filter by sitelinks: FILTER(?sitelinks > 5) to focus on well-documented items
- Geographic focus: Add country/region filters for phased curation
Memory Issues
If browser/WDQS crashes:
- Use LIMIT: Start with LIMIT 100, increase gradually
- Download in batches: Run query multiple times with OFFSET
- Use API: Query via https://query.wikidata.org/sparql (programmatic)
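The batching and API tips can be combined in a short script. The sketch below uses only the Python standard library and sends the query as a form-encoded POST, which avoids URL length limits for long queries. It assumes the query text does not already end in its own LIMIT or OFFSET clause; the function names and the User-Agent string are placeholders to adapt.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def paged_query(query: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a query that has no such clause yet."""
    return f"{query.rstrip()}\nLIMIT {limit}\nOFFSET {offset}\n"


def build_request(query: str, user_agent: str) -> urllib.request.Request:
    """Prepare a POST request asking for SPARQL JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        WDQS_ENDPOINT,
        data=body,  # supplying data makes urllib send a POST
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia services expect a descriptive User-Agent
            "User-Agent": user_agent,
        },
    )


def run_query(query: str, user_agent: str) -> dict:
    """Execute one query against WDQS and return the parsed JSON."""
    with urllib.request.urlopen(build_request(query, user_agent), timeout=90) as resp:
        return json.load(resp)
```

A batch loop would call `run_query(paged_query(q, 1000, n * 1000), "glam-curation/0.1 (contact address here)")` for n = 0, 1, 2, ... until a page comes back with no bindings. Paging is only reliable when the query has a stable ORDER BY, since otherwise rows may repeat or be skipped between pages.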
Automation Scripts (Future)
Batch Query Execution
# Planned: scripts/execute_wikidata_queries.py
# - Read all *.sparql files
# - Execute via WDQS API
# - Save results to class-specific directories
# - Generate curation dashboard
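As a possible starting point for the planned batch script, the first step (reading all *.sparql files) might look like the sketch below. It assumes the one-letter class directories and the `*_complete_*.sparql` naming shown in the Query Inventory; `find_query_files` is a hypothetical name, and nothing here reflects an existing implementation.

```python
from pathlib import Path


def find_query_files(root: Path) -> dict[str, list[Path]]:
    """Map each class directory name to its complete query files under root."""
    found: dict[str, list[Path]] = {}
    for path in sorted(root.glob("*/queries/*_complete_*.sparql")):
        # Layout is <root>/<class>/queries/<file>, so the class is two levels up
        class_code = path.parent.parent.name
        found.setdefault(class_code, []).append(path)
    return found
```

Each returned path can then be read with `path.read_text(encoding="utf-8")` and passed to whatever executes the query.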
Result Analysis
# Planned: scripts/analyze_query_results.py
# - Parse JSON results
# - Identify potential false positives
# - Suggest hypernym relationships
# - Generate curation candidates
Iterative Curation
# Planned: scripts/iterative_hyponym_discovery.py
# - Execute query
# - Present results for human review
# - Add validated Q-numbers to hyponyms_curated.yaml
# - Re-run query
# - Repeat until no new results
Troubleshooting
"Query timeout" error
Cause: Query takes >60 seconds to execute.
Solution: Simplify query (see "Query Performance Tips" above).
"Too many results" warning
Cause: Result set >10,000 rows.
Solution: Add LIMIT 1000 or use batch execution with OFFSET.
"Malformed query" error
Cause: SPARQL syntax error (rare, all queries pre-validated).
Solution: Check FILTER clauses are correctly closed with parentheses.
Query returns base classes
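Cause: Using wdt:P279* instead of wdt:P279+. The * path matches zero or more subclass steps, so every base class also matches itself; + requires at least one step.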
Solution: Already corrected in all queries (use wdt:P279+).
Progress Tracking
Execution Log Template
Create [class]/EXECUTION_LOG.md for each class:
# [Class] Query Execution Log
## Execution 1
- **Date**: 2025-11-13
- **Query**: [filename]
- **Results**: [count] hyponyms
- **Curated**: [count] added to hyponyms_curated.yaml
- **Rejected**: [count] false positives
- **Notes**: [observations]
## Execution 2
- **Date**: 2025-11-XX
- **Query**: [filename] (re-run)
- **Results**: [count] new hyponyms (after exclusion)
- **Status**: [Complete / Continue / Review]
Completion Criteria
Mark class as "complete" when:
- Query returns <10 new relevant results
- All major institution subtypes are captured
- Geographic coverage is adequate
- Iterative discovery yields diminishing returns
References
- Query Files: /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/[A-Z]/queries/
- Curated Vocabulary: /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml
- Wikidata Query Service: https://query.wikidata.org/
- SPARQL Tutorial: https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
- Session Summary: docs/sessions/SESSION_SUMMARY_20251113_SPARQL_GENERATION.md
Version: 1.0
Last Updated: 2025-11-13
Status: Ready for execution