glam/data/wikidata/GLAMORCUBEPSXHFN/QUERY_AUDIT_REPORT.md
2025-11-19 23:25:22 +01:00

GLAMORCUBEPSXHFN Query Audit Report

Date: 2025-11-13
Auditor: Simon C. Kemper
Status: Critical issues found - requires immediate fixes


Executive Summary

Audited 58 unique Q-numbers across 11 GLAMORCUBEPSXHFN class queries. Found 2 critical issues (incorrect Q-numbers) and 7 overly broad base classes, grouped below as warnings W1-W6, that may generate noise.


Critical Issues (MUST FIX)

1. Q7075 in F (Features) Class - INCORRECT

Problem: Q7075 is "library" (an institution type), NOT a physical feature.

Current Query: data/wikidata/GLAMORCUBEPSXHFN/F/queries/features_query_complete_20251113T131055.sparql

Found In:

UNION {
  # sculpture - Q7075  ← WRONG!
  ?hyponym wdt:P279+ wd:Q7075 .
}

Fix: Remove Q7075 from the F-class query; it is correctly used only in L-class (Library).

Correct Q-number for sculpture: Q860861 (sculpture)
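A minimal sketch of applying this fix to the query text, assuming the block has exactly the shape shown above. The function name `replace_base_class` and the sample string are illustrative and not part of the project's tooling.

```python
import re


def replace_base_class(query: str, old_qid: str, new_qid: str) -> str:
    """Swap one Q-number for another wherever it appears as a whole token.

    The word-boundary anchors keep Q7075 from matching inside a longer
    identifier such as Q70751.
    """
    return re.sub(rf"\b{re.escape(old_qid)}\b", new_qid, query)


# Sample block copied from the F-class finding (audit annotation omitted).
f_block = """UNION {
  # sculpture - Q7075
  ?hyponym wdt:P279+ wd:Q7075 .
}"""

fixed = replace_base_class(f_block, "Q7075", "Q860861")
print(fixed)
```

Because the replacement touches both the comment and the triple pattern, the comment and the Q-number stay in sync after the fix.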


2. Q1193702 in O (Official Institution) Class - COMPLETELY WRONG

Problem: Q1193702 is "parliamentary opposition" (political opposition), NOT a heritage institution!

Current Query: data/wikidata/GLAMORCUBEPSXHFN/O/queries/official_query_complete_20251113T131055.sparql

Found In:

UNION {
  # statistical office - Q1193702  ← WRONG!
  ?hyponym wdt:P279+ wd:Q1193702 .
}

Fix: Remove Q1193702 entirely. Statistical offices are NOT heritage institutions.

Alternative: If government heritage agencies are desired, use:

  • Q1664720 (cultural heritage institution) - already included ✓
  • Q7257748 (public heritage organization) - already included ✓
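Both critical issues share one pattern: the query comment names one concept while the Q-number points to another. A check along the following lines could flag such mismatches automatically. It is a sketch that assumes every block carries a `# label - Qnnn` comment as in the excerpts above; the label table is hand-filled from this audit's findings and is not fetched from Wikidata, and the second sample block is an assumed example of that comment format.

```python
import re

# Labels as reported in this audit; a real run would look them up on Wikidata.
AUDITED_LABELS = {
    "Q7075": "library",
    "Q1193702": "parliamentary opposition",
    "Q1664720": "cultural heritage institution",
}

# Matches comments of the form "# <label> - Q<digits>" on a single line.
COMMENT_RE = re.compile(r"#\s*(?P<label>.+?)\s*-\s*(?P<qid>Q\d+)")


def find_label_mismatches(query: str, labels: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (qid, label claimed in comment, audited label) for each mismatch."""
    mismatches = []
    for match in COMMENT_RE.finditer(query):
        qid = match["qid"]
        claimed = match["label"].strip().lower()
        actual = labels.get(qid)
        # Q-numbers without an audited label are skipped rather than guessed.
        if actual is not None and actual.lower() != claimed:
            mismatches.append((qid, claimed, actual))
    return mismatches


sample = """UNION {
  # statistical office - Q1193702
  ?hyponym wdt:P279+ wd:Q1193702 .
}
UNION {
  # cultural heritage institution - Q1664720
  ?hyponym wdt:P279+ wd:Q1664720 .
}"""

for qid, claimed, actual in find_label_mismatches(sample, AUDITED_LABELS):
    print(f"{qid}: comment says '{claimed}', audit found '{actual}'")
```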

Warnings (Overly Broad - May Need Refinement)

W1. Q3918 (university) in E-class

Issue: Q3918 is the root "university" class - will return ALL universities, not just those with heritage collections.

Impact: May generate 10,000+ results, most of which are not heritage custodians.

Recommendation: Keep, but add filtering for universities with:

  • Museums (P527 has part / P31 museum)
  • Archives (P527 has part / P31 archive)
  • Special collections

Alternative: Use more specific classes:

  • Q15936437 (research university) - already included ✓
  • Q875538 (public university) - already included ✓
  • Q902104 (private university) - already included ✓
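The W1 filtering idea could be expressed as a query along these lines. This is a sketch under several assumptions: it works at the instance level (P31) rather than on the `?hyponym` subclass pattern used in the audited queries, it accepts either P527 (has part) on the university or the inverse P361 (part of) on the unit, and the function name and `limit` parameter are illustrative. Only Q-numbers that appear in this report are used.

```python
def build_university_filter_query(part_classes=("Q33506",), limit=100):
    """Build a SPARQL query for universities linked to a unit of the given classes.

    part_classes defaults to Q33506 (museum); further classes can be passed in.
    """
    values = " ".join(f"wd:{qid}" for qid in part_classes)
    return f"""SELECT DISTINCT ?university WHERE {{
  ?university wdt:P31/wdt:P279* wd:Q3918 .
  {{ ?university wdt:P527 ?part . }}
  UNION
  {{ ?part wdt:P361 ?university . }}
  VALUES ?partClass {{ {values} }}
  ?part wdt:P31/wdt:P279* ?partClass .
}}
LIMIT {limit}"""


print(build_university_filter_query())
```

Transitive property paths over Q3918 can be slow on the public endpoint, so a small `limit` is a sensible starting point when trying this out.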

W2. Q43229 (organization) in S-class

Issue: Q43229 is the root "organization" class - too broad for Collecting Society.

Impact: Will return millions of organizations, most unrelated to heritage.

Recommendation: Remove Q43229, keep only:

  • Q1065742 (historical society) ✓
  • Q2668072 (collection) - see W3

W3. Q2668072 (collection) in P and S classes

Issue: Q2668072 is "collection" - very broad, used in both Personal Collection (P) and Collecting Society (S).

Impact: May return non-heritage collections (data collections, software collections, etc.).

Recommendation: Keep, but consider more specific subclasses:

  • Q160554 (private collection) - already in P-class ✓
  • Add: Q29642811 (art collection)
  • Add: Q2822856 (archival fonds)

W4. Q480242 (governmental organization) in O-class

Issue: Too broad - includes all government organizations, not just heritage.

Impact: May return tax agencies, military units, courts, etc.

Recommendation: Keep Q1664720 and Q7257748, which are specific to heritage, and remove Q480242.


W5. Q4830453 (business) and Q6881511 (enterprise) in C-class

Issue: Too broad - includes all businesses/enterprises, not just those with heritage collections.

Impact: Will return millions of companies without heritage activities.

Recommendation: Use more specific classes:

  • Q16917 (hospital) - historic hospital collections
  • Q783794 (company) - keep as is (already targeted)
  • Add: Q20832995 (corporate archive)
  • Add: Q96358696 (company museum)

W6. Q1365560 (school) in E-class

Issue: Root "school" class includes primary schools, kindergartens, driving schools, etc.

Impact: Will return many non-heritage educational institutions.

Recommendation: Use more specific subclasses:

  • Keep universities (Q3918, Q875538, Q902104, Q15936437) ✓
  • Keep academy (Q383092) ✓
  • Add: Q4358176 (art school)
  • Add: Q38723 (higher education institution)
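The impact figures in the warnings above are estimates. One way to put numbers on them is to count what each base class returns before deciding whether to keep it. The sketch below only builds the query and request URL. It assumes the `?hyponym wdt:P279+ wd:Q...` pattern from the audited queries and the public Wikidata Query Service endpoint; the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed to be the target here).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(qid: str) -> str:
    """Count distinct subclasses reachable from a base class via P279+."""
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Q-number: {qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?hyponym) AS ?count) WHERE {\n"
        f"  ?hyponym wdt:P279+ wd:{qid} .\n"
        "}"
    )


def build_request_url(qid: str) -> str:
    """URL that asks the endpoint for JSON results of the count query."""
    params = {"query": build_count_query(qid), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


# Base classes flagged as too broad in W2 and W6.
for qid in ("Q43229", "Q1365560"):
    print(build_count_query(qid))
```

Comparing the counts for a root class and its proposed replacements gives a concrete basis for the broad versus narrow decision.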

Validation Results by Class

C (Corporation) - 3 base classes

  • ⚠️ Q783794 (company) - OK but broad
  • ⚠️ Q4830453 (business) - Too broad
  • ⚠️ Q6881511 (enterprise) - Too broad

Action: Remove Q4830453 and Q6881511, add specialized corporate heritage classes.


E (Education Provider) - 6 base classes

  • ⚠️ Q3918 (university) - Too broad but keep
  • Q383092 (academy) - OK
  • Q875538 (public university) - OK
  • Q902104 (private university) - OK
  • ⚠️ Q1365560 (school) - Too broad
  • Q15936437 (research university) - OK

Action: Consider removing Q1365560, or accept noise and filter during curation.


F (Features) - 4 base classes

  • Q7075 (library) - WRONG! Should be Q860861 (sculpture)
  • Q39614 (cemetery) - OK
  • Q4989906 (monument) - OK
  • Q5003624 (memorial) - OK

Action: Replace Q7075 with Q860861.


G (Gallery) - 4 base classes

  • Q194195 (kunsthalle) - OK
  • Q445396 (print room) - OK
  • Q1007870 (art gallery) - OK
  • Q1007871 (commercial art gallery) - OK

Action: None - all valid.


H (Holy Sites) - 6 base classes

  • Q16970 (church building) - OK
  • Q32815 (mosque) - OK
  • Q34627 (synagogue) - OK
  • Q44539 (temple) - OK
  • Q44613 (monastery) - OK
  • Q160742 (abbey) - OK

Action: None - all valid.


L (Library) - 11 base classes

  • Q7075 (library) - OK (correctly used here)
  • Q28326 (digital library) - OK
  • Q28564 (public library) - OK
  • Q212805 (special library) - OK
  • Q811979 (architectural library) - OK
  • Q856234 (academic library) - OK
  • Q1479318 (music library) - OK
  • Q1543654 (law library) - OK
  • Q1622062 (national library) - OK
  • Q2326815 (research library) - OK
  • Q5995078 (map library) - OK

Action: None - all valid.


M (Museum) - 14 base classes

  • Q33506 (museum) - OK
  • Q42177 (memorial) - OK (memorials can be museums)
  • Q207694 (art museum) - OK
  • Q588140 (history museum) - OK
  • Q679200 (ethnographic museum) - OK
  • Q1231145 (ecomuseum) - OK
  • Q1499158 (folk museum) - OK
  • Q1535661 (city museum) - OK
  • Q1568346 (archaeological museum) - OK
  • Q1662358 (natural history museum) - OK
  • Q2860896 (private museum) - OK
  • Q6589660 (local museum) - OK
  • Q16735822 (science museum) - OK
  • Q17431399 (open-air museum) - OK

Action: None - all valid.


O (Official Institution) - 4 base classes

  • ⚠️ Q480242 (governmental organization) - Too broad
  • Q1193702 (parliamentary opposition) - WRONG!
  • Q1664720 (cultural heritage institution) - OK
  • Q7257748 (public heritage organization) - OK

Action: Remove Q480242 and Q1193702.


P (Personal Collection) - 2 base classes

  • Q160554 (private collection) - OK
  • ⚠️ Q2668072 (collection) - Broad but acceptable

Action: None - acceptable for niche class.


R (Research Center) - 3 base classes

  • Q31855 (research institute) - OK
  • Q4671277 (academic research center) - OK
  • Q13226383 (research facility) - OK

Action: None - all valid.


S (Collecting Society) - 3 base classes

  • ⚠️ Q43229 (organization) - Too broad!
  • Q1065742 (historical society) - OK
  • ⚠️ Q2668072 (collection) - Broad but acceptable

Action: Remove Q43229.
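The per-class lists above can be cross-checked against the totals in the Executive Summary. The table below is transcribed from this section. The key "G" for the four gallery entries is an assumption based on the taxonomy acronym, and the variable names are illustrative.

```python
from collections import defaultdict

# Base classes per query, transcribed from the validation results.
BASE_CLASSES = {
    "C": ["Q783794", "Q4830453", "Q6881511"],
    "E": ["Q3918", "Q383092", "Q875538", "Q902104", "Q1365560", "Q15936437"],
    "F": ["Q7075", "Q39614", "Q4989906", "Q5003624"],
    "G": ["Q194195", "Q445396", "Q1007870", "Q1007871"],  # class letter assumed
    "H": ["Q16970", "Q32815", "Q34627", "Q44539", "Q44613", "Q160742"],
    "L": ["Q7075", "Q28326", "Q28564", "Q212805", "Q811979", "Q856234",
          "Q1479318", "Q1543654", "Q1622062", "Q2326815", "Q5995078"],
    "M": ["Q33506", "Q42177", "Q207694", "Q588140", "Q679200", "Q1231145",
          "Q1499158", "Q1535661", "Q1568346", "Q1662358", "Q2860896",
          "Q6589660", "Q16735822", "Q17431399"],
    "O": ["Q480242", "Q1193702", "Q1664720", "Q7257748"],
    "P": ["Q160554", "Q2668072"],
    "R": ["Q31855", "Q4671277", "Q13226383"],
    "S": ["Q43229", "Q1065742", "Q2668072"],
}

# Invert the table: which classes use each Q-number?
classes_by_qid = defaultdict(list)
for cls, qids in BASE_CLASSES.items():
    for qid in qids:
        classes_by_qid[qid].append(cls)

shared = {qid: classes for qid, classes in classes_by_qid.items() if len(classes) > 1}

print(len(BASE_CLASSES), "classes,", len(classes_by_qid), "unique Q-numbers")
# prints: 11 classes, 58 unique Q-numbers
print(shared)
# prints: {'Q7075': ['F', 'L'], 'Q2668072': ['P', 'S']}
```

The 60 listed entries reduce to 58 unique Q-numbers because two are shared: Q7075 (the critical F-class error) and Q2668072 (the overlap discussed in W3).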


Priority 1: Critical Fixes (DO NOW)

  1. F-class (Features): Replace Q7075 with Q860861 (sculpture)
  2. O-class (Official): Remove Q1193702 (critical issue 2) and Q480242 (warning W4, same query)

Priority 2: Refinement (Consider)

  1. C-class (Corporation): Remove Q4830453 and Q6881511, add Q20832995 and Q96358696
  2. E-class (Education): Consider removing Q1365560 (school)
  3. S-class (Society): Remove Q43229 (organization)

Priority 3: Enhancement (Optional)

  1. C-class: Add specialized corporate heritage classes
  2. E-class: Add art schools and higher education institutions
  3. P-class: Add art collections and archival fonds subtypes

Validation Methodology

  1. Extracted all Q-numbers from SPARQL queries (58 unique)
  2. Manually reviewed each Q-number against Wikidata
  3. Validated against GLAMORCUBEPSXHFN taxonomy definitions
  4. Checked for semantic appropriateness (heritage vs non-heritage)
  5. Identified cross-class conflicts (Q7075 in both L and F)
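Steps 1 and 5 lend themselves to automation. The following sketch assumes the `<class>/queries/*.sparql` layout implied by the query paths cited in this report; `extract_qids` and `collect_qids` are hypothetical helper names, not existing project code.

```python
import re
from pathlib import Path

# Only wd:-prefixed identifiers are collected, so a Q-number that appears
# solely in a comment does not count as part of the query.
QID_RE = re.compile(r"\bwd:(Q\d+)\b")


def extract_qids(sparql_text: str) -> set[str]:
    """Return the set of Q-numbers referenced as wd:Q... in a query."""
    return set(QID_RE.findall(sparql_text))


def collect_qids(root: Path) -> dict[str, set[str]]:
    """Map each class directory name (F, O, ...) to the Q-numbers in its queries."""
    per_class: dict[str, set[str]] = {}
    for path in sorted(root.glob("*/queries/*.sparql")):
        cls = path.parent.parent.name
        text = path.read_text(encoding="utf-8")
        per_class.setdefault(cls, set()).update(extract_qids(text))
    return per_class


def find_cross_class_conflicts(per_class: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return Q-numbers that occur in more than one class, with those classes."""
    seen: dict[str, list[str]] = {}
    for cls in sorted(per_class):
        for qid in per_class[cls]:
            seen.setdefault(qid, []).append(cls)
    return {qid: classes for qid, classes in seen.items() if len(classes) > 1}


sample = "UNION {\n  # sculpture - Q7075\n  ?hyponym wdt:P279+ wd:Q7075 .\n}"
print(extract_qids(sample))
```

Calling `collect_qids(Path("data/wikidata/GLAMORCUBEPSXHFN"))` and passing the result to `find_cross_class_conflicts` would reproduce the cross-class check, provided the directory layout matches the assumption above.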

Next Steps

  1. Fix F-class query (replace Q7075 with Q860861)
  2. Fix O-class query (remove Q1193702 and Q480242)
  3. ⚠️ Review warnings with user - decide on broad vs narrow approach
  4. 🔄 Re-generate all affected YAML metadata files
  5. 📝 Document changes in query comments
  6. Update session summary

Appendix: Correct Q-numbers Reference

Sculpture (for F-class)

  • Q860861 - sculpture (artwork/art form)
  • Q179700 - statue (free-standing sculpture)

Corporate Heritage (for C-class)

  • Q20832995 - corporate archive
  • Q96358696 - company museum
  • Q16917 - hospital (historic medical collections)

Education (for E-class)

  • Q38723 - higher education institution (broader than Q3918 university, not more specific; suggested in W6 in place of the root school class)
  • Q4358176 - art school
  • Q875538 - public university ✓ (already included)

Collections (for P and S classes)

  • Q29642811 - art collection
  • Q2822856 - archival fonds
  • Q160554 - private collection ✓ (already in P)

Report End