# GLAMORCUBEPSXHFN Query Execution Guide

**Date**: 2025-11-13
**Status**: Ready for execution

---

## Quick Start

### 1. Choose a Query

Navigate to the query directory for your target class:

```bash
cd /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN

# Example: Museum queries
cat M/queries/museum_query_complete_20251113T131027.sparql
```

### 2. Execute Query

1. Open https://query.wikidata.org/
2. Paste the SPARQL query content
3. Click **"Run"** (or press Ctrl+Enter)
4. Wait for results (large result sets may take 10-60 seconds)

### 3. Download Results

Click **"Download"** → choose a format:

- **JSON** (recommended for processing)
- **CSV** (for spreadsheet analysis)
- **TSV** (for data import)

### 4. Review Results

Check the downloaded file for:

- Valid heritage institution subtypes
- False positives (non-heritage classes)
- Semantic correctness of labels

---

## Query Inventory

### ✅ Complete Queries (13 classes)

| Class | File | Base Classes | Expected Results |
|-------|------|--------------|------------------|
| **A** (Archive) | `A/queries/archive_query_missing_complete_20251113T130052.sparql` | 1 | 0 (already captured) |
| **B** (Botanical/Zoo) | `B/queries/botanical_zoo_query_complete_20251113T130659.sparql` | 18 | High (many subtypes) |
| **G** (Gallery) | `G/queries/gallery_query_complete_20251113T130920.sparql` | 4 | Low-Medium |
| **L** (Library) | `L/queries/library_query_complete_20251113T131006.sparql` | 11 | Medium-High |
| **M** (Museum) | `M/queries/museum_query_complete_20251113T131027.sparql` | 14 | Very High |
| **O** (Official Inst.) | `O/queries/official_query_complete_20251113T131055.sparql` | 4 | Low-Medium |
| **R** (Research Ctr) | `R/queries/research_query_complete_20251113T131055.sparql` | 3 | Medium |
| **C** (Corporation) | `C/queries/corporation_query_complete_20251113T131055.sparql` | 3 | Low |
| **E** (Education) | `E/queries/education_query_complete_20251113T131055.sparql` | 6 | High |
| **P** (Personal Coll.) | `P/queries/personal_query_complete_20251113T131055.sparql` | 2 | Low |
| **S** (Coll. Society) | `S/queries/collecting_query_complete_20251113T131055.sparql` | 3 | Low-Medium |
| **H** (Holy Sites) | `H/queries/holy_query_complete_20251113T131055.sparql` | 6 | Medium |
| **F** (Features) | `F/queries/features_query_complete_20251113T131055.sparql` | 4 | Medium-High |

**Note**: The U (Unknown) and X (Mixed) classes have no queries because they are special classification states.

---

## Recommended Execution Order

### Priority 1: High-Value, Low-Noise Classes

Start with well-defined institutional classes:

1. **Museum (M)** - Expected to return many valid museum subtypes
2. **Library (L)** - Well-structured taxonomy in Wikidata
3. **Gallery (G)** - Focused domain, clear boundaries

**Why first?** These classes have:

- Clear semantic boundaries
- High-quality Wikidata curation
- Low false positive rate
- Immediate value for GLAM curation

### Priority 2: Specialized Heritage Classes

Continue with niche heritage types:

4. **Archive (A)** - Verify completeness (should return 0 results)
5. **Botanical/Zoo (B)** - Large taxonomy, needs careful review
6. **Features (F)** - Monuments, memorials, sculptures

**Curation note**: Features (F) may include non-heritage physical objects. Review each result carefully.

### Priority 3: Organizational Classes

Proceed to organizational entities:

7. **Education Provider (E)** - Universities, colleges, schools with collections
8. **Research Center (R)** - Scientific institutes, documentation centers
9. **Official Institution (O)** - Government heritage agencies

**Curation note**: Keep only institutions that actually maintain heritage collections (not every university has a museum or archive).

### Priority 4: Niche/Low-Volume Classes

Finish with specialized collection types:

10. **Holy Sites (H)** - Religious institutions with heritage collections
11. **Collecting Society (S)** - Historical societies, numismatic clubs
12. **Personal Collection (P)** - Private collections
13. **Corporation (C)** - Corporate archives/museums

**Curation note**: These classes often overlap with others (e.g., corporate museums are also museums). Document multi-type classifications.

---

## Query Execution Checklist

For each query execution:

- [ ] Copy SPARQL from `[class]/queries/*_complete_*.sparql`
- [ ] Execute at https://query.wikidata.org/
- [ ] Download results as JSON
- [ ] Save JSON to `[class]/sparql/results_[YYYYMMDD].json`
- [ ] Review results for:
  - [ ] Valid heritage institution subtypes
  - [ ] False positives (non-heritage)
  - [ ] Semantic correctness
  - [ ] Geographic diversity
- [ ] Document results in `[class]/CURATION_LOG.md`
- [ ] Add validated Q-numbers to `hyponyms_curated.yaml`
- [ ] Re-run query to discover next batch

---

## Curation Workflow

### Step 1: Review Query Results

Open the downloaded JSON file:

```bash
jq '.results.bindings[] | {q: .hyponym.value, label: .hyponymLabel.value}' M/sparql/results_20251113.json
```

### Step 2: Validate Each Q-number

For each result, check:

1. **Is it a heritage institution type?**
   - Museums, libraries, archives, galleries, etc.
   - Collections, societies, cultural organizations
   - NOT: administrative units, geographic features (unless F-class)
2. **Which GLAMORCUBEPSXHFN class(es)?**
   - Single type: M (museum), L (library), A (archive), etc.
   - Multiple types: Use X (mixed) or list all applicable codes
3. **Geographic/cultural context?**
   - Country-specific types (note in `country:` field)
   - Regional variations (note in `subregion:` field)
4. **Historical context?**
   - Defunct institution types (note in `time:` field)
   - Historical periods (e.g., "Imperial Russia", "Medieval")

### Step 3: Add to Curated Vocabulary

Edit `hyponyms_curated.yaml`:

```yaml
hyponym:
  - label: Q[NUMBER]
    hypernym:
      - [descriptive term from Wikidata label]
    type:
      - [GLAMORCUBEPSXHFN code: A, B, C, E, F, G, H, L, M, O, P, R, S, or X]
    country:  # optional
      - [ISO 3166-1 alpha-2 country code]
    subregion:  # optional
      - [region name]
    time:  # optional
      - [temporal context, e.g., "1900-1950", "< 1948"]
    rico:  # optional (for archival record types)
      - label: recordSetTypes
    duplicate:  # optional (if merged with another Q-number)
      - Q[DUPLICATE_NUMBER]
```

**Example**:

```yaml
- label: Q123456
  hypernym:
    - maritime museum
  type:
    - M
  country:
    - Netherlands
```

### Step 4: Re-run Query (Iterative Discovery)

After adding Q-numbers to `hyponyms_curated.yaml`:

1. Queries automatically exclude newly curated Q-numbers (on the next execution)
2. Run the query again to discover transitive subclasses
3. Continue until no new relevant results are found
4. Mark the class as "complete" in the tracking doc

---

## Common False Positives

### Museums (M)

- ❌ **Museum websites** (Q386724) - Digital platforms, not institution types
- ❌ **Museum collections** (Q2668072) - Collection types, not institutions
- ❌ **Museum buildings** (Q41176) - Architecture, not organizations
- ✅ **Museum subtypes** (e.g., Q207694 "art museum") - Valid!

### Libraries (L)

- ❌ **Library catalogs** (Q5994) - Systems, not institutions
- ❌ **Library software** (Q7375) - Technology, not organizations
- ✅ **Library types** (e.g., Q28564 "public library") - Valid!
### Education (E)

- ❌ **Universities in general** - Include only those that maintain heritage collections
- ❌ **Primary schools** - Rarely have heritage significance
- ✅ **Universities with archives/museums** - Valid if documented

### Features (F)

- ❌ **Natural features** (mountains, rivers) - Not heritage custodians
- ❌ **Living people** - Not physical features
- ✅ **Monuments, memorials, sculptures, cemeteries** - Valid!

---

## Query Performance Tips

### Timeout Issues

If a query times out (>60 seconds):

1. **Split the query by base class**: Run a separate query for each `UNION` branch
2. **Add a temporal filter**: Limit to items created after a certain year
3. **Reduce the language list**: Focus on 10-15 major languages
4. **Use LIMIT**: Add `LIMIT 1000` for initial exploration

### Large Result Sets

If a query returns >10,000 results:

1. **Prioritize by usage**: Compute a `?usageCount` for each hyponym and add `ORDER BY DESC(?usageCount)`. One example of such a count is the number of items with a `wdt:P31` statement pointing at the hyponym.
2. **Filter by sitelinks**: Bind `?hyponym wikibase:sitelinks ?sitelinks` and add `FILTER(?sitelinks > 5)` to focus on well-documented items
3. **Geographic focus**: Add country/region filters for phased curation

### Memory Issues

If the browser or WDQS crashes:

1. **Use LIMIT**: Start with `LIMIT 100` and increase gradually
2. **Download in batches**: Run the query multiple times with `OFFSET`. Keep a fixed `ORDER BY` so that pages neither overlap nor skip rows.
3. **Use the API**: Query https://query.wikidata.org/sparql programmatically

---

## Automation Scripts (Future)

### Batch Query Execution

```python
# Planned: scripts/execute_wikidata_queries.py
# - Read all *.sparql files
# - Execute via WDQS API
# - Save results to class-specific directories
# - Generate curation dashboard
```

### Result Analysis

```python
# Planned: scripts/analyze_query_results.py
# - Parse JSON results
# - Identify potential false positives
# - Suggest hypernym relationships
# - Generate curation candidates
```

### Iterative Curation

```python
# Planned: scripts/iterative_hyponym_discovery.py
# - Execute query
# - Present results for human review
# - Add validated Q-numbers to hyponyms_curated.yaml
# - Re-run query
# - Repeat until no new results
```

---

## Troubleshooting

### "Query timeout" error

**Cause**: The query takes more than 60 seconds to execute.

**Solution**: Simplify the query (see "Query Performance Tips" above).

### "Too many results" warning

**Cause**: The result set exceeds 10,000 rows.

**Solution**: Add `LIMIT 1000` or use batch execution with `OFFSET`.

### "Malformed query" error

**Cause**: SPARQL syntax error (rare; all queries were pre-validated).

**Solution**: Check that FILTER clauses are correctly closed with parentheses.

### Query returns base classes

**Cause**: The query uses `wdt:P279*` instead of `wdt:P279+`. `wdt:P279*` matches zero or more subclass hops, so the base class itself is returned. `wdt:P279+` requires at least one hop.

**Solution**: Already corrected in all queries (use `wdt:P279+`).
---

## Progress Tracking

### Execution Log Template

Create `[class]/EXECUTION_LOG.md` for each class:

```markdown
# [Class] Query Execution Log

## Execution 1
- **Date**: 2025-11-13
- **Query**: [filename]
- **Results**: [count] hyponyms
- **Curated**: [count] added to hyponyms_curated.yaml
- **Rejected**: [count] false positives
- **Notes**: [observations]

## Execution 2
- **Date**: 2025-11-XX
- **Query**: [filename] (re-run)
- **Results**: [count] new hyponyms (after exclusion)
- **Status**: [Complete / Continue / Review]
```

### Completion Criteria

Mark a class as "complete" when:

- [ ] The query returns <10 new relevant results
- [ ] All major institution subtypes are captured
- [ ] Geographic coverage is adequate
- [ ] Iterative discovery yields diminishing returns

---

## References

- **Query Files**: `/Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/[A-Z]/queries/`
- **Curated Vocabulary**: `/Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml`
- **Wikidata Query Service**: https://query.wikidata.org/
- **SPARQL Tutorial**: https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
- **Session Summary**: `docs/sessions/SESSION_SUMMARY_20251113_SPARQL_GENERATION.md`

---

**Version**: 1.0
**Last Updated**: 2025-11-13
**Status**: Ready for execution