glam/data/wikidata/GLAMORCUBEPSXHFN/00-QUERY_EXECUTION_GUIDE.md
2025-11-19 23:25:22 +01:00

# GLAMORCUBEPSXHF SPARQL Query Execution Guide
**Project**: Multilingual vocabulary/thesaurus extraction for 15 heritage institution types
**Date**: 2025-11-12
**Purpose**: Execute all SPARQL queries against Wikidata to retrieve terminology
---
## Query Execution Order
Execute these queries in the order listed below. Save results to the specified output files.
## Completed Classes (8/15 = 53%)
**G-Class (GALLERY)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/G/sparql/gallery_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/G/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
**L-Class (LIBRARY)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/L/sparql/library_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/L/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
**A-Class (ARCHIVE)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/A/sparql/archive_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/A/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
**M-Class (MUSEUM)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/M/sparql/museum_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/M/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
**O-Class (OFFICIAL_INSTITUTION)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/O/sparql/query.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/O/sparql/hyponyms_merged.json`
- Status: Results saved, analysis complete
**H-Class (HOLY_SITES)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/H/sparql/holy_sites_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/H/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
**F-Class (FEATURES)** - COMPLETED
- Query: Multiple feature-specific queries in `data/wikidata/GLAMORCUBEPSXH/F/sparql/`
- Output: `data/wikidata/GLAMORCUBEPSXH/F/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
**R-Class (RESEARCH_CENTER)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/R/sparql/research_center_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/R/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete
---
## Remaining Classes (7/15 = 47%)
### Priority 1: C-Class (CORPORATION)
**Status**: Query ready, awaiting execution
**Query File**: `data/wikidata/GLAMORCUBEPSXH/C/sparql/corporation_hyponyms.sparql`
**Target Classes**:
- Q18631232 (corporate archive)
- Q33506 (museum) + Q4830453 (business)
- Q1616075 (company museum)
**Output File**: `data/wikidata/GLAMORCUBEPSXH/C/sparql/hyponyms_raw.json`
**Expected Terms**: Corporate museum, company archive, brand heritage center, business archive, Firmenmuseum (de), 企業博物館 (ja)
**Execution Command** (Wikidata Query Service):
```bash
# Copy query from corporation_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/C/sparql/hyponyms_raw.json
```
---
### Priority 2: U-Class (UNKNOWN)
**Status**: Not applicable - UNKNOWN is assigned during data extraction
**Query File**: N/A
**Target Classes**: No Wikidata query needed for U-class
**Explanation**: The U-class represents institutions where the type cannot be determined during data extraction. This is not a Wikidata class but rather a fallback classification used when:
- Source data lacks type information
- Institution description is ambiguous
- Multiple conflicting type indicators exist
U-class institutions should be manually reviewed and reclassified to appropriate types (G, L, A, M, etc.) when more information becomes available.
**Note**: Universities are classified under **E (EDUCATION_PROVIDER)**, not U-class.
---
### Priority 3: B-Class (BOTANICAL_ZOO)
**Status**: Query ready, awaiting execution
**Query File**: `data/wikidata/GLAMORCUBEPSXH/B/sparql/botanical_zoo_hyponyms.sparql`
**Target Classes**:
- Q167346 (botanical garden)
- Q43501 (zoo)
- Q27686 (aquarium)
- Q1855774 (natural history museum)
**Output File**: `data/wikidata/GLAMORCUBEPSXH/B/sparql/hyponyms_raw.json`
**Expected Terms**: Botanical garden, zoo, aquarium, arboretum, jardin botanique (fr), zoológico (es)
**Execution Command**:
```bash
# Copy query from botanical_zoo_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/B/sparql/hyponyms_raw.json
```
---
### Priority 4: E-Class (EDUCATION_PROVIDER)
**Status**: Query ready, awaiting execution
**Query File**: `data/wikidata/GLAMORCUBEPSXH/E/sparql/education_provider_hyponyms.sparql`
**Target Classes**:
- Q3914 (school)
- Q15936437 (training center)
- Q1390872 (vocational school)
- Q2385804 (educational institution)
**Output File**: `data/wikidata/GLAMORCUBEPSXH/E/sparql/hyponyms_raw.json`
**Expected Terms**: School, training center, vocational school, Schule (de), escuela (es), 学校 (ja)
**Execution Command**:
```bash
# Copy query from education_provider_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/E/sparql/hyponyms_raw.json
```
---
### Priority 5: S-Class (COLLECTING_SOCIETY)
**Status**: Query ready, awaiting execution
**Query File**: `data/wikidata/GLAMORCUBEPSXH/S/sparql/collecting_society_hyponyms.sparql`
**Target Classes**:
- Q1391145 (historical society)
- Q2900544 (heritage society)
- Q955824 (learned society)
- Q5533467 (genealogical society)
- Q564323 (antiquarian society)
**Output File**: `data/wikidata/GLAMORCUBEPSXH/S/sparql/hyponyms_raw.json`
**Expected Terms**: Historical society, heritage society, heemkundige kring (nl), genealogical society
**Execution Command**:
```bash
# Copy query from collecting_society_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/S/sparql/hyponyms_raw.json
```
---
### Priority 6: P-Class (PERSONAL_COLLECTION)
**Status**: Query ready, awaiting execution
**Query File**: `data/wikidata/GLAMORCUBEPSXH/P/sparql/personal_collection_hyponyms.sparql`
**Target Classes**:
- Q768717 (private collection)
- Private museums, archives, libraries
**Output File**: `data/wikidata/GLAMORCUBEPSXH/P/sparql/hyponyms_raw.json`
**Expected Terms**: Private collection, personal collection, private museum, colección privada (es)
**Execution Command**:
```bash
# Copy query from personal_collection_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/P/sparql/hyponyms_raw.json
```
---
### Priority 7: X-Class (MIXED)
**Status**: Query ready, awaiting execution
**Query File**: `data/wikidata/GLAMORCUBEPSXH/X/sparql/mixed_hyponyms.sparql`
**Target Classes**:
- Q207694 (cultural center)
- Q22808320 (heritage center)
- Q3152824 (cultural institution)
- Q1030034 (memory institution)
**Output File**: `data/wikidata/GLAMORCUBEPSXH/X/sparql/hyponyms_raw.json`
**Expected Terms**: Cultural center, heritage center, memory institution, multi-purpose institution
**Execution Command**:
```bash
# Copy query from mixed_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/X/sparql/hyponyms_raw.json
```
---
## Batch Execution Workflow
### Step 1: Execute All Queries
For each remaining class that has a query (C, B, E, S, P, X; U-Class needs no query):
1. Open Wikidata Query Service: https://query.wikidata.org/
2. Copy SPARQL query from the respective `.sparql` file
3. Paste into query editor
4. Click "Execute" button
5. Wait for results (may take 30-60 seconds)
6. Click "Download" → "JSON"
7. Save to the specified output file (`hyponyms_raw.json`)
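
The manual steps above can also be scripted. The following is a minimal sketch, assuming the public SPARQL endpoint at `https://query.wikidata.org/sparql` and the standard `application/sparql-results+json` result format. The function names, the `User-Agent` string, and the 90-second timeout are illustrative choices, not part of this project.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Public Wikidata SPARQL endpoint behind the query.wikidata.org web UI.
ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical identifier; replace it with your own tool name and contact details.
USER_AGENT = "glam-hyponym-fetch/0.1 (example sketch)"


def build_request(query_text: str) -> urllib.request.Request:
    """Build a POST request that asks the endpoint for JSON results."""
    body = urllib.parse.urlencode({"query": query_text}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": USER_AGENT,
        },
    )


def run_query_file(query_path: str, output_path: str, timeout: float = 90.0) -> int:
    """Run one .sparql file, save the JSON, and return the number of bindings."""
    query_text = Path(query_path).read_text(encoding="utf-8")
    with urllib.request.urlopen(build_request(query_text), timeout=timeout) as resp:
        data = json.load(resp)
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps labels such as 企業博物館 readable in the saved file.
    out.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(data["results"]["bindings"])
```

For the C-Class, this would be called as `run_query_file("data/wikidata/GLAMORCUBEPSXH/C/sparql/corporation_hyponyms.sparql", "data/wikidata/GLAMORCUBEPSXH/C/sparql/hyponyms_raw.json")`. Building the request separately from sending it makes the request easy to inspect without touching the network.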
### Step 2: Verify Results
After downloading each result file, verify:
- File size > 0 bytes
- Valid JSON format
- Contains `results.bindings` array
- Has expected fields: `class`, `classLabel`, `altLabels`
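
These checks can be automated. The sketch below is one possible version, assuming the files follow the SPARQL JSON results format, in which the selected variables are listed under `head.vars`. It checks the expected fields there rather than in each binding, because unbound variables are omitted from individual bindings. The function name `verify_result_file` is hypothetical.

```python
import json
from pathlib import Path

# Field names taken from the verification list in this guide.
EXPECTED_FIELDS = ("class", "classLabel", "altLabels")


def verify_result_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return ["file is missing or empty"]
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    results = data.get("results")
    bindings = results.get("bindings") if isinstance(results, dict) else None
    if not isinstance(bindings, list):
        return ["missing results.bindings array"]
    head = data.get("head")
    declared = head.get("vars", []) if isinstance(head, dict) else []
    # Report every expected field that the query did not select.
    return [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in declared]
```

Returning a list of problems instead of raising on the first one lets a caller print a full report for each file in one pass.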
### Step 3: Processing
Once the 6 raw result files (C, B, E, S, P, X) are saved, notify the AI agent to:
1. Deduplicate by QID
2. Generate statistics
3. Create analysis documents
4. Update master checklist
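
As an illustration of the first two processing steps, here is a minimal sketch of deduplication by QID and basic statistics. It assumes each binding carries the entity URI under `class` (for example `http://www.wikidata.org/entity/Q33506`) and keeps the first row seen per QID. The actual processing done by the agent may differ, and all function names are hypothetical.

```python
def qid_of(binding: dict) -> str:
    """Extract the QID from a binding's `class` URI (the part after the last slash)."""
    return binding["class"]["value"].rsplit("/", 1)[-1]


def deduplicate_by_qid(bindings: list[dict]) -> list[dict]:
    """Keep the first binding seen for each QID, preserving input order."""
    first_seen: dict[str, dict] = {}
    for binding in bindings:
        first_seen.setdefault(qid_of(binding), binding)
    return list(first_seen.values())


def summarize(bindings: list[dict]) -> dict[str, int]:
    """Count raw rows, unique QIDs, and unique entries with a non-empty altLabels value."""
    unique = deduplicate_by_qid(bindings)
    with_alt = sum(1 for b in unique if b.get("altLabels", {}).get("value"))
    return {
        "rows": len(bindings),
        "unique_qids": len(unique),
        "with_alt_labels": with_alt,
    }
```

Keeping only the first row is the simplest policy. If duplicate rows for one QID carry different `altLabels`, merging them instead would avoid losing terms.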
---
## Query Execution Checklist
- [ ] C-Class (CORPORATION) - `corporation_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] U-Class (UNKNOWN) - no query needed; assigned during data extraction
- [ ] B-Class (BOTANICAL_ZOO) - `botanical_zoo_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] E-Class (EDUCATION_PROVIDER) - `education_provider_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] S-Class (COLLECTING_SOCIETY) - `collecting_society_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] P-Class (PERSONAL_COLLECTION) - `personal_collection_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] X-Class (MIXED) - `mixed_hyponyms.sparql` → `hyponyms_raw.json`
---
## Expected Timeline
- **Query execution**: 5-10 minutes per class (6 queries × 10 min = ~60 minutes; U-Class has no query)
- **Data processing**: 30-60 minutes (automated by AI agent)
- **Total completion**: 2-3 hours
---
## Success Criteria
All queries executed successfully when:
- 6 new `hyponyms_raw.json` files created (C, B, E, S, P, X; U-Class has no query)
- Each file > 1 KB (contains results)
- Valid JSON format in all files
- Master checklist updated to 15/15 (100%)
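
The file-level criteria can be checked across class directories in one pass. This sketch assumes the `<base>/<letter>/sparql/hyponyms_raw.json` layout used throughout this guide and treats 1 KB as 1024 bytes. The function name `check_outputs` and its status strings are hypothetical.

```python
from pathlib import Path


def check_outputs(base_dir: str, classes: list[str], min_bytes: int = 1024) -> dict[str, str]:
    """Map each class letter to 'ok', 'missing', or 'too small'."""
    status: dict[str, str] = {}
    for letter in classes:
        path = Path(base_dir) / letter / "sparql" / "hyponyms_raw.json"
        if not path.is_file():
            status[letter] = "missing"
        elif path.stat().st_size <= min_bytes:
            # The criterion is strictly "greater than 1 KB".
            status[letter] = "too small"
        else:
            status[letter] = "ok"
    return status
```

A call such as `check_outputs("data/wikidata/GLAMORCUBEPSXH", ["C", "B"])` reports the state of each listed class. JSON validity is a separate check and is not covered here.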
---
## Notes
- Wikidata Query Service may time out on very large result sets
- If a timeout occurs, consider splitting the query into smaller geographic regions
- Save each result file as soon as its query finishes so that progress is not lost
- JSON export preserves all language labels and altLabels
---
**Next Action**: Execute the C-Class query first, then proceed through priorities 3-7 (priority 2, U-Class, needs no query).