# GLAMORCUBEPSXHF SPARQL Query Execution Guide

**Project**: Multilingual vocabulary/thesaurus extraction for 15 heritage institution types
**Date**: 2025-11-12
**Purpose**: Execute all SPARQL queries against Wikidata to retrieve terminology

---

## Query Execution Order

Execute these queries in the order listed below. Save results to the specified output files.

### Completed Classes (8/15 = 53%)

✅ **G-Class (GALLERY)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/G/sparql/gallery_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/G/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

✅ **L-Class (LIBRARY)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/L/sparql/library_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/L/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

✅ **A-Class (ARCHIVE)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/A/sparql/archive_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/A/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

✅ **M-Class (MUSEUM)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/M/sparql/museum_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/M/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

✅ **O-Class (OFFICIAL_INSTITUTION)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/O/sparql/query.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/O/sparql/hyponyms_merged.json`
- Status: Results saved, analysis complete

✅ **H-Class (HOLY_SITES)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/H/sparql/holy_sites_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/H/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

✅ **F-Class (FEATURES)** - COMPLETED
- Query: Multiple feature-specific queries in `data/wikidata/GLAMORCUBEPSXH/F/sparql/`
- Output: `data/wikidata/GLAMORCUBEPSXH/F/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

✅ **R-Class (RESEARCH_CENTER)** - COMPLETED
- Query: `data/wikidata/GLAMORCUBEPSXH/R/sparql/research_center_hyponyms.sparql`
- Output: `data/wikidata/GLAMORCUBEPSXH/R/sparql/hyponyms_raw.json`
- Status: Results saved, analysis complete

---

## Remaining Classes (7/15 = 47%)

### Priority 1: C-Class (CORPORATION)

**Status**: Query ready, awaiting execution

**Query File**: `data/wikidata/GLAMORCUBEPSXH/C/sparql/corporation_hyponyms.sparql`

**Target Classes**:
- Q18631232 (corporate archive)
- Q33506 (museum) + Q4830453 (business)
- Q1616075 (company museum)

**Output File**: `data/wikidata/GLAMORCUBEPSXH/C/sparql/hyponyms_raw.json`

**Expected Terms**: Corporate museum, company archive, brand heritage center, business archive, Firmenmuseum (de), 企業博物館 (ja)

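
The query file itself is not reproduced in this guide. Purely as an illustration, a hyponym query over the target classes listed above might be assembled as in the sketch below. The helper name `build_hyponym_query`, the use of `wdt:P279*` (transitive "subclass of") and `skos:altLabel`, and the language list are all assumptions; the actual contents of `corporation_hyponyms.sparql` may differ. Calling `build_hyponym_query(["Q18631232", "Q1616075"])` returns a string that could be pasted into the query editor.

```python
def build_hyponym_query(root_qids, languages=("en", "de", "ja")):
    """Return a SPARQL string listing subclasses of the given root classes.

    Hypothetical helper for illustration; not the project's actual query.
    """
    roots = " ".join(f"wd:{qid}" for qid in root_qids)
    lang_filter = ", ".join(f'"{lang}"' for lang in languages)
    lang_param = ",".join(languages)
    return f"""
SELECT ?class ?classLabel
       (GROUP_CONCAT(DISTINCT ?alt; separator="|") AS ?altLabels)
WHERE {{
  VALUES ?root {{ {roots} }}
  ?class wdt:P279* ?root .
  OPTIONAL {{
    ?class skos:altLabel ?alt .
    FILTER(LANG(?alt) IN ({lang_filter}))
  }}
  SERVICE wikibase:label {{
    bd:serviceParam wikibase:language "{lang_param}" .
  }}
}}
GROUP BY ?class ?classLabel
""".strip()
```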

**Execution Command** (Wikidata Query Service):
```bash
# Copy query from corporation_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/C/sparql/hyponyms_raw.json
```

---

### Priority 2: U-Class (UNKNOWN)

**Status**: Not applicable - UNKNOWN is assigned during data extraction

**Query File**: N/A

**Target Classes**: No Wikidata query needed for U-class

**Explanation**: The U-class represents institutions where the type cannot be determined during data extraction. This is not a Wikidata class but rather a fallback classification used when:
- Source data lacks type information
- Institution description is ambiguous
- Multiple conflicting type indicators exist

U-class institutions should be manually reviewed and reclassified to appropriate types (G, L, A, M, etc.) when more information becomes available.

**Note**: Universities are classified under **E (EDUCATION_PROVIDER)**, not U-class.

---

### Priority 3: B-Class (BOTANICAL_ZOO)

**Status**: Query ready, awaiting execution

**Query File**: `data/wikidata/GLAMORCUBEPSXH/B/sparql/botanical_zoo_hyponyms.sparql`

**Target Classes**:
- Q167346 (botanical garden)
- Q43501 (zoo)
- Q27686 (aquarium)
- Q1855774 (natural history museum)

**Output File**: `data/wikidata/GLAMORCUBEPSXH/B/sparql/hyponyms_raw.json`

**Expected Terms**: Botanical garden, zoo, aquarium, arboretum, jardin botanique (fr), zoológico (es)

**Execution Command**:
```bash
# Copy query from botanical_zoo_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/B/sparql/hyponyms_raw.json
```

---

### Priority 4: E-Class (EDUCATION_PROVIDER)

**Status**: Query ready, awaiting execution

**Query File**: `data/wikidata/GLAMORCUBEPSXH/E/sparql/education_provider_hyponyms.sparql`

**Target Classes**:
- Q3914 (school)
- Q15936437 (training center)
- Q1390872 (vocational school)
- Q2385804 (educational institution)

**Output File**: `data/wikidata/GLAMORCUBEPSXH/E/sparql/hyponyms_raw.json`

**Expected Terms**: School, training center, vocational school, Schule (de), escuela (es), 学校 (ja)

**Execution Command**:
```bash
# Copy query from education_provider_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/E/sparql/hyponyms_raw.json
```

---

### Priority 5: S-Class (COLLECTING_SOCIETY)

**Status**: Query ready, awaiting execution

**Query File**: `data/wikidata/GLAMORCUBEPSXH/S/sparql/collecting_society_hyponyms.sparql`

**Target Classes**:
- Q1391145 (historical society)
- Q2900544 (heritage society)
- Q955824 (learned society)
- Q5533467 (genealogical society)
- Q564323 (antiquarian society)

**Output File**: `data/wikidata/GLAMORCUBEPSXH/S/sparql/hyponyms_raw.json`

**Expected Terms**: Historical society, heritage society, heemkundige kring (nl), genealogical society

**Execution Command**:
```bash
# Copy query from collecting_society_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/S/sparql/hyponyms_raw.json
```

---

### Priority 6: P-Class (PERSONAL_COLLECTION)

**Status**: Query ready, awaiting execution

**Query File**: `data/wikidata/GLAMORCUBEPSXH/P/sparql/personal_collection_hyponyms.sparql`

**Target Classes**:
- Q768717 (private collection)
- Private museums, archives, libraries

**Output File**: `data/wikidata/GLAMORCUBEPSXH/P/sparql/hyponyms_raw.json`

**Expected Terms**: Private collection, personal collection, private museum, colección privada (es)

**Execution Command**:
```bash
# Copy query from personal_collection_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/P/sparql/hyponyms_raw.json
```

---

### Priority 7: X-Class (MIXED)

**Status**: Query ready, awaiting execution

**Query File**: `data/wikidata/GLAMORCUBEPSXH/X/sparql/mixed_hyponyms.sparql`

**Target Classes**:
- Q207694 (cultural center)
- Q22808320 (heritage center)
- Q3152824 (cultural institution)
- Q1030034 (memory institution)

**Output File**: `data/wikidata/GLAMORCUBEPSXH/X/sparql/hyponyms_raw.json`

**Expected Terms**: Cultural center, heritage center, memory institution, multi-purpose institution

**Execution Command**:
```bash
# Copy query from mixed_hyponyms.sparql
# Execute at: https://query.wikidata.org/
# Export results as JSON
# Save to: data/wikidata/GLAMORCUBEPSXH/X/sparql/hyponyms_raw.json
```

---

## Batch Execution Workflow

### Step 1: Execute All Queries

For each remaining class that has a query file (C, B, E, S, P, X; U-Class needs no query):

1. Open Wikidata Query Service: https://query.wikidata.org/
2. Copy the SPARQL query from the respective `.sparql` file
3. Paste it into the query editor
4. Click the "Execute" button
5. Wait for results (may take 30-60 seconds)
6. Click "Download" → "JSON"
7. Save to the specified output file (`hyponyms_raw.json`)

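
The manual steps above could also be scripted against the service's SPARQL endpoint. The following is a minimal sketch, not a tested pipeline: the function names, the `USER_AGENT` placeholder, and the 90-second client timeout are assumptions for illustration. It sends the query as a GET parameter, so very long queries may need a POST request instead. A call such as `run_query_file(Path("data/wikidata/GLAMORCUBEPSXH/C/sparql/corporation_hyponyms.sparql"), Path("data/wikidata/GLAMORCUBEPSXH/C/sparql/hyponyms_raw.json"))` would cover steps 1 through 7 for the C-Class.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder User-Agent (assumption); replace with real contact details.
USER_AGENT = "glam-vocabulary-extraction/0.1 (contact: you@example.org)"


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    }
    return urllib.request.Request(url, headers=headers)


def run_query_file(query_path: Path, output_path: Path, timeout: float = 90.0) -> int:
    """Run the query stored in query_path, save the raw JSON, return the row count."""
    query = query_path.read_text(encoding="utf-8")
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        payload = json.load(resp)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(payload["results"]["bindings"])
```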

### Step 2: Verify Results

After downloading each result file, verify:
- File size > 0 bytes
- Valid JSON format
- Contains `results.bindings` array
- Has expected fields: `class`, `classLabel`, `altLabels`

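
These checks can be automated. The sketch below is one possible approach, with a hypothetical function name; it looks for the expected fields in `head.vars`, because individual rows in the SPARQL JSON results format may omit variables that are unbound. An empty return value means the file passed all four checks.

```python
import json
from pathlib import Path

EXPECTED_FIELDS = ("class", "classLabel", "altLabels")


def verify_result_file(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    if not path.is_file() or path.stat().st_size == 0:
        return ["file is missing or empty"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["missing results.bindings array"]
    bindings = data.get("results", {}).get("bindings")
    if not isinstance(bindings, list):
        return ["missing results.bindings array"]
    # Variables selected by the query are declared once in head.vars.
    declared = data.get("head", {}).get("vars", [])
    return [
        f"field not declared in head.vars: {field}"
        for field in EXPECTED_FIELDS
        if field not in declared
    ]
```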

### Step 3: Processing

Once all 6 raw result files are saved (U-Class has no query and therefore no result file), notify the AI agent to:
1. Deduplicate by QID
2. Generate statistics
3. Create analysis documents
4. Update master checklist

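
To make the first two processing steps concrete, here is a rough sketch of deduplicating bindings by QID and producing simple counts. It assumes each binding carries the entity URI under `class` (as listed in the expected fields), keeps the first row seen per QID, and counts label languages as an example statistic; the agent's actual processing may merge rows or report different figures.

```python
from collections import Counter


def qid_of(binding: dict) -> str:
    """Extract the QID from a binding's `class` URI, e.g. .../entity/Q33506."""
    return binding["class"]["value"].rsplit("/", 1)[-1]


def deduplicate_by_qid(bindings: list[dict]) -> list[dict]:
    """Keep the first binding seen for each QID, preserving input order."""
    seen: dict[str, dict] = {}
    for binding in bindings:
        seen.setdefault(qid_of(binding), binding)
    return list(seen.values())


def summarize(bindings: list[dict]) -> dict:
    """Count rows, unique QIDs, and label languages (illustrative statistics)."""
    unique = deduplicate_by_qid(bindings)
    languages = Counter(
        b["classLabel"].get("xml:lang", "und") for b in unique if "classLabel" in b
    )
    return {
        "rows": len(bindings),
        "unique_qids": len(unique),
        "label_languages": dict(languages),
    }
```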

---

## Query Execution Checklist

- [ ] C-Class (CORPORATION) - `corporation_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] U-Class (UNKNOWN) - no query file; assigned during data extraction
- [ ] B-Class (BOTANICAL_ZOO) - `botanical_zoo_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] E-Class (EDUCATION_PROVIDER) - `education_provider_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] S-Class (COLLECTING_SOCIETY) - `collecting_society_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] P-Class (PERSONAL_COLLECTION) - `personal_collection_hyponyms.sparql` → `hyponyms_raw.json`
- [ ] X-Class (MIXED) - `mixed_hyponyms.sparql` → `hyponyms_raw.json`

---

## Expected Timeline

- **Query execution**: 5-10 minutes per class (6 queries × 10 min = ~60 minutes; U-Class has no query)
- **Data processing**: 30-60 minutes (automated by AI agent)
- **Total completion**: 2-3 hours

---

## Success Criteria

All queries have executed successfully when:
- 6 new `hyponyms_raw.json` files are created (one per class with a query file; U-Class has none)
- Each file is > 1 KB (contains results)
- All files are valid JSON
- Master checklist is updated to 15/15 (100%)

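
The file-existence and size criteria can be checked in one pass over the class directories. This sketch assumes the directory layout used by the paths in this guide and iterates over the six classes that have a query file (U-Class has none, per its section above); the function name and constants are hypothetical. An empty result from `missing_or_small()` means every expected file exists and exceeds 1 KB.

```python
from pathlib import Path

BASE_DIR = Path("data/wikidata/GLAMORCUBEPSXH")
QUERY_CLASSES = ("C", "B", "E", "S", "P", "X")  # U-Class has no query file
MIN_BYTES = 1024  # threshold for "each file > 1 KB"


def missing_or_small(base_dir: Path = BASE_DIR, classes=QUERY_CLASSES) -> list[str]:
    """Return the class letters whose hyponyms_raw.json is absent or not > 1 KB."""
    failures = []
    for letter in classes:
        path = base_dir / letter / "sparql" / "hyponyms_raw.json"
        if not path.is_file() or path.stat().st_size <= MIN_BYTES:
            failures.append(letter)
    return failures
```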

---

## Notes

- Wikidata Query Service may time out for very large result sets
- If a timeout occurs, consider splitting the query into smaller geographic regions
- Save results incrementally, after each query, so progress is not lost
- JSON export preserves all language labels and altLabels

---

**Next Action**: Execute the C-Class query first, then proceed through priorities 2-7 (Priority 2, U-Class, needs no query).