# U-Class (UNKNOWN) - Not a Wikidata Classification

## Overview

The **U-class** represents institutions where the type **cannot be determined** during data extraction. This is NOT a Wikidata class or concept, but rather a **fallback classification** used in the GLAMORCUBEPSXHF taxonomy.

## Key Points

### What U-Class Represents

- **UNKNOWN institution type** - Cannot be classified into G, L, A, M, O, R, C, B, E, P, S, H, or F categories
- **Ambiguous organizations** - Source data provides conflicting or unclear type information
- **Incomplete data** - Institution description lacks sufficient details for classification

### What U-Class Is NOT

- ❌ **NOT for Universities** - Universities are classified under **E (EDUCATION_PROVIDER)**
- ❌ **NOT a Wikidata class** - No SPARQL query or hyponym extraction needed
- ❌ **NOT a permanent classification** - Institutions should be reclassified when more information becomes available

## No SPARQL Query Required

Unlike other GLAMORCUBEPSXHF classes (G, L, A, M, O, R, C, B, E, P, S, H, F, X), the U-class does **NOT have a Wikidata query** because:

1. UNKNOWN is a data quality indicator, not a heritage institution type
2. No Wikidata classes correspond to "institution of unknown type"
3. U-class is assigned during data extraction when type cannot be determined

## When to Use U-Class

### ✅ Appropriate Use Cases

- Extracting from incomplete source data (CSV missing type column)
- Parsing conversations where institution purpose is unclear
- Processing records with conflicting type indicators
- Temporary classification pending manual review

### ❌ Inappropriate Use Cases

- **Universities** → Use **E (EDUCATION_PROVIDER)** instead
- **Mixed-type institutions** → Use **X (MIXED)** and document all actual types
- **Unknown location** → Use U-class only for unknown TYPE, not location
- **Lazy classification** → Always attempt proper classification first

## Workflow for U-Class Institutions

### 1. Initial Extraction

```yaml
- id: https://w3id.org/heritage/custodian/unknown/inst-001
  name: Heritage Organization XYZ
  institution_type: UNKNOWN  # U-class assigned
  description: "Organization mentioned in conversation with unclear purpose"
  provenance:
    data_source: CONVERSATION_NLP
    data_tier: TIER_4_INFERRED
    confidence_score: 0.45  # Low confidence
    notes: "Institution type unclear - requires manual review"
```

### 2. Manual Review

- Research institution's website, Wikipedia, or authoritative sources
- Determine actual institution type (museum, library, archive, etc.)
- Update `institution_type` field with correct classification

### 3. Reclassification

```yaml
- id: https://w3id.org/heritage/custodian/unknown/inst-001
  name: Heritage Organization XYZ
  institution_type: MUSEUM  # Reclassified from UNKNOWN to M-class
  description: "Local history museum (verified via website)"
  provenance:
    data_source: CONVERSATION_NLP
    data_tier: TIER_4_INFERRED
    confidence_score: 0.85  # Higher confidence after verification
    notes: "Originally classified as UNKNOWN, reclassified to MUSEUM after website verification"
    verified_date: "2025-11-12T..."
    verified_by: "manual_review"
```

## Universities Clarification

**CRITICAL**: Universities are **NOT U-class**.

### Correct Classification for Universities

| Institution Type | Correct Class | Example |
|------------------|---------------|---------|
| University | **E (EDUCATION_PROVIDER)** | Harvard University, Oxford University |
| University library | **E (EDUCATION_PROVIDER)** | Bodleian Library, Widener Library |
| University museum | **E (EDUCATION_PROVIDER)** | Yale University Art Gallery |
| University archive | **E (EDUCATION_PROVIDER)** | MIT Archives and Special Collections |

### Why Universities Are E-Class

- Universities are **educational institutions** with heritage collections
- E-class includes schools, training centers, vocational schools, AND universities
- Wikidata classes: Q3918 (university), Q875538 (university library), Q1143635 (academic library)
- See updated E-class query: `/data/wikidata/GLAMORCUBEPSXH/E/sparql/education_provider_hyponyms.sparql`

## Directory Structure

```
data/wikidata/GLAMORCUBEPSXH/U/
├── README.md              # This file (explanation of U-class)
├── sparql/                # Empty - no SPARQL query needed
└── (no query files)       # UNKNOWN is not a Wikidata concept
```

## Data Quality Implications

### TIER Assignment for U-Class

U-class institutions typically receive **TIER_4_INFERRED** (lowest tier) because:

- Type cannot be determined from available data
- Requires manual verification
- May contain errors or ambiguities

### Confidence Scoring

U-class institutions should have **low confidence scores** (< 0.6):

- Signals need for human review
- Flags incomplete data extraction
- Prioritizes records for verification

### Review Priority

U-class institutions should be **high priority for manual review**:

1. Search institution's official website
2. Query Wikidata for existing records
3. Check ISIL registry or national heritage registries
4. Reclassify to appropriate type (G, L, A, M, etc.)

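
The confidence and review-priority guidance above could be applied with a small triage helper. The following is a minimal sketch, not part of the project's tooling: the function names `build_review_queue` and `has_inconsistent_confidence` are hypothetical, the sample records are made up for illustration, and the field names (`institution_type`, `provenance.confidence_score`) are assumed to match the YAML examples in this README.

```python
# Hypothetical triage helpers; field names follow the YAML examples in this README.
# 0.6 is the "low confidence" cut-off stated in the Confidence Scoring section.
LOW_CONFIDENCE_THRESHOLD = 0.6


def build_review_queue(records):
    """Return U-class (UNKNOWN) records, lowest confidence first."""
    unknown = [r for r in records if r.get("institution_type") == "UNKNOWN"]
    return sorted(
        unknown,
        key=lambda r: r.get("provenance", {}).get("confidence_score", 0.0),
    )


def has_inconsistent_confidence(record):
    """Flag U-class records whose confidence is not below the threshold."""
    score = record.get("provenance", {}).get("confidence_score")
    return (
        record.get("institution_type") == "UNKNOWN"
        and score is not None
        and score >= LOW_CONFIDENCE_THRESHOLD
    )


# Illustrative sample data (not real records)
records = [
    {"name": "Heritage Organization XYZ", "institution_type": "UNKNOWN",
     "provenance": {"confidence_score": 0.45}},
    {"name": "Example Museum", "institution_type": "MUSEUM",
     "provenance": {"confidence_score": 0.85}},
    {"name": "Unnamed Society", "institution_type": "UNKNOWN",
     "provenance": {"confidence_score": 0.30}},
]

queue = build_review_queue(records)
print([r["name"] for r in queue])
```

Sorting by ascending confidence is one possible ordering; a project might instead prioritize by data source or record age.
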
## Statistics and Monitoring

### Expected U-Class Distribution

In a well-extracted dataset:

- **< 5%** of institutions should be U-class (most types should be determinable)
- **High U-class percentage** indicates data quality issues
- **Zero U-class** is ideal (all institutions properly classified)

### Monitoring Metrics

```python
# Sketch for monitoring U-class usage; pass in the institution_type value of every record
def report_unknown_rate(institution_types):
    """Print a data quality message based on the share of UNKNOWN records."""
    total_institutions = len(institution_types)
    if total_institutions == 0:
        print("No institutions to evaluate")
        return

    u_class_count = sum(1 for t in institution_types if t == "UNKNOWN")
    u_class_percentage = (u_class_count / total_institutions) * 100

    if u_class_percentage > 10:
        print("WARNING: High UNKNOWN classification rate - review extraction logic")
    elif u_class_percentage > 5:
        print("CAUTION: Elevated UNKNOWN rate - some manual review needed")
    else:
        print("OK: Low UNKNOWN rate - extraction quality acceptable")
```

## References

- **Taxonomy Table**: `/AGENTS.md` (lines 391-419)
- **Schema Definition**: `/schemas/enums.yaml` (InstitutionTypeEnum)
- **E-Class Query**: `/data/wikidata/GLAMORCUBEPSXH/E/sparql/education_provider_hyponyms.sparql`
- **Query Execution Guide**: `/data/wikidata/GLAMORCUBEPSXH/00-QUERY_EXECUTION_GUIDE.md`

---

**Last Updated**: 2025-11-12

**Taxonomy Version**: GLAMORCUBEPSXHF v2.0

**Status**: U-class correctly defined as UNKNOWN (not UNIVERSITY)