U-Class (UNKNOWN) - Not a Wikidata Classification
Overview
The U-class represents institutions where the type cannot be determined during data extraction. This is NOT a Wikidata class or concept, but rather a fallback classification used in the GLAMORCUBEPSXHF taxonomy.
Key Points
What U-Class Represents
- UNKNOWN institution type - Cannot be classified into any of the G, L, A, M, O, R, C, B, E, P, S, H, F, or X categories
- Ambiguous organizations - Source data provides conflicting or unclear type information
- Incomplete data - Institution description lacks sufficient details for classification
What U-Class Is NOT
- ❌ NOT for Universities - Universities are classified under E (EDUCATION_PROVIDER)
- ❌ NOT a Wikidata class - No SPARQL query or hyponym extraction needed
- ❌ NOT a permanent classification - Institutions should be reclassified when more information becomes available
No SPARQL Query Required
Unlike other GLAMORCUBEPSXHF classes (G, L, A, M, O, R, C, B, E, P, S, H, F, X), the U-class does NOT have a Wikidata query because:
- UNKNOWN is a data quality indicator, not a heritage institution type
- No Wikidata classes correspond to "institution of unknown type"
- U-class is assigned during data extraction when type cannot be determined
When to Use U-Class
✅ Appropriate Use Cases
- Extracting from incomplete source data (CSV missing type column)
- Parsing conversations where institution purpose is unclear
- Processing records with conflicting type indicators
- Temporary classification pending manual review
❌ Inappropriate Use Cases
- Universities → Use E (EDUCATION_PROVIDER) instead
- Mixed-type institutions → Use X (MIXED) and document all actual types
- Unknown location → Do not use U-class; it marks an unknown TYPE only, not a missing location
- Lazy classification → Always attempt proper classification first
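The appropriate and inappropriate cases above can be condensed into a small decision helper. This is a minimal sketch, not the project's extraction code: the function name `choose_institution_type`, its parameters, and the idea that extraction yields a set of confirmed type names plus a conflicting-evidence flag are all assumptions made for illustration.

```python
def choose_institution_type(confirmed_types, conflicting_evidence=False, is_university=False):
    """Pick an institution_type following the U-class usage rules (hypothetical helper).

    confirmed_types: set of type names (e.g. "MUSEUM") that the source data supports.
    """
    # Universities and their libraries, museums and archives are E-class, never U-class.
    if is_university:
        return "EDUCATION_PROVIDER"
    # Conflicting or unclear type indicators: U-class, pending manual review.
    if conflicting_evidence:
        return "UNKNOWN"
    types = {t for t in confirmed_types if t and t != "UNKNOWN"}
    if len(types) > 1:
        # Several types that genuinely apply: X (MIXED); document the actual types separately.
        return "MIXED"
    if len(types) == 1:
        # Exactly one clear type: use it.
        return next(iter(types))
    # No type evidence at all (e.g. a CSV with no type column).
    return "UNKNOWN"
```

Location is deliberately not an input: a missing location never makes a record U-class.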
Workflow for U-Class Institutions
1. Initial Extraction
```yaml
- id: https://w3id.org/heritage/custodian/unknown/inst-001
  name: Heritage Organization XYZ
  institution_type: UNKNOWN  # U-class assigned
  description: "Organization mentioned in conversation with unclear purpose"
  provenance:
    data_source: CONVERSATION_NLP
    data_tier: TIER_4_INFERRED
    confidence_score: 0.45  # Low confidence
    notes: "Institution type unclear - requires manual review"
```
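An extractor could emit the step 1 record through a helper like the one below, which returns a Python dict with the same shape as the YAML above. The helper name and its parameters are hypothetical; the URI prefix, the TIER_4_INFERRED tier, the note text, and the 0.45 default are copied from the example, and the < 0.6 guard reflects the confidence guidance for U-class records given in the Confidence Scoring section.

```python
def make_unknown_record(local_id, name, description, data_source, confidence_score=0.45):
    """Build an initial U-class record shaped like the YAML example (hypothetical helper)."""
    # U-class records are expected to carry a low confidence score (< 0.6).
    if not 0.0 <= confidence_score < 0.6:
        raise ValueError("U-class records should carry a confidence score below 0.6")
    return {
        "id": f"https://w3id.org/heritage/custodian/unknown/{local_id}",
        "name": name,
        "institution_type": "UNKNOWN",  # U-class assigned
        "description": description,
        "provenance": {
            "data_source": data_source,
            "data_tier": "TIER_4_INFERRED",
            "confidence_score": confidence_score,
            "notes": "Institution type unclear - requires manual review",
        },
    }
```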
2. Manual Review
- Research institution's website, Wikipedia, or authoritative sources
- Determine actual institution type (museum, library, archive, etc.)
- Update the institution_type field with the correct classification
3. Reclassification
```yaml
- id: https://w3id.org/heritage/custodian/unknown/inst-001
  name: Heritage Organization XYZ
  institution_type: MUSEUM  # Reclassified from UNKNOWN to M-class
  description: "Local history museum (verified via website)"
  provenance:
    data_source: CONVERSATION_NLP
    data_tier: TIER_4_INFERRED
    confidence_score: 0.85  # Higher confidence after verification
    notes: "Originally classified as UNKNOWN, reclassified to MUSEUM after website verification"
    verified_date: "2025-11-12T..."
    verified_by: "manual_review"
```
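Steps 2 and 3 amount to updating a few fields while keeping the id and the rest of the provenance. A minimal sketch, assuming records are plain dicts shaped like the YAML above; reclassify is a hypothetical helper, and because the exact verified_date format is elided in the example, the UTC ISO 8601 timestamp used here is an assumption.

```python
import copy
from datetime import datetime, timezone


def reclassify(record, new_type, confidence_score, evidence,
               description=None, verified_by="manual_review"):
    """Return a copy of a U-class record reclassified after manual review (hypothetical)."""
    if record.get("institution_type") != "UNKNOWN":
        raise ValueError("only UNKNOWN records are reclassified by this helper")
    if new_type == "UNKNOWN":
        raise ValueError("reclassification must assign a concrete type")
    updated = copy.deepcopy(record)  # leave the original record untouched
    updated["institution_type"] = new_type  # the id is kept unchanged, as in the example
    if description is not None:
        updated["description"] = description
    prov = updated.setdefault("provenance", {})
    prov["confidence_score"] = confidence_score
    prov["notes"] = f"Originally classified as UNKNOWN, reclassified to {new_type} after {evidence}"
    # Assumed format: ISO 8601 timestamp in UTC, to the second.
    prov["verified_date"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    prov["verified_by"] = verified_by
    return updated
```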
Universities Clarification
CRITICAL: Universities are NOT U-class.
Correct Classification for Universities
| Institution Type | Correct Class | Example |
|---|---|---|
| University | E (EDUCATION_PROVIDER) | Harvard University, Oxford University |
| University library | E (EDUCATION_PROVIDER) | Bodleian Library, Widener Library |
| University museum | E (EDUCATION_PROVIDER) | Yale University Art Gallery |
| University archive | E (EDUCATION_PROVIDER) | MIT Archives and Special Collections |
Why Universities Are E-Class
- Universities are educational institutions with heritage collections
- E-class includes schools, training centers, vocational schools, AND universities
- Wikidata classes: Q3918 (university), Q875538 (university library), Q1143635 (academic library)
- See the updated E-class query: /data/wikidata/GLAMORCUBEPSXH/E/sparql/education_provider_hyponyms.sparql
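When a record's Wikidata "instance of" (P31) classes are already known, the rule can be applied directly. This is a minimal sketch with a hypothetical function name; it checks only the three QIDs listed here, not their subclasses, which the E-class hyponym query is meant to cover.

```python
# Wikidata classes listed above as E-class for university-related institutions.
UNIVERSITY_WIKIDATA_CLASSES = {
    "Q3918": "university",
    "Q875538": "university library",
    "Q1143635": "academic library",
}


def type_for_wikidata_classes(qids):
    """Return EDUCATION_PROVIDER if any instance-of class is university-related, else None."""
    if any(qid in UNIVERSITY_WIKIDATA_CLASSES for qid in qids):
        return "EDUCATION_PROVIDER"
    return None  # not decided by this rule; continue with the other class checks
```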
Directory Structure
```
data/wikidata/GLAMORCUBEPSXH/U/
├── README.md          # This file (explanation of U-class)
├── sparql/            # Empty - no SPARQL query needed
└── (no query files)   # UNKNOWN is not a Wikidata concept
```
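A layout check along these lines could keep the U/sparql folder empty while making sure the other classes ship a query. It is a sketch under assumptions: each class letter is taken to have a <letter>/sparql/ folder holding .sparql files (as the E-class path above does), and the function name and default letter string are illustrative.

```python
from pathlib import Path


def check_query_layout(root, letters="GLAMORCUBEPSXHF"):
    """Return layout problems under root (e.g. data/wikidata/GLAMORCUBEPSXH).

    Assumption: every class except U should hold at least one .sparql file in
    <letter>/sparql/, and U must hold none.
    """
    problems = []
    for letter in letters:
        sparql_dir = Path(root) / letter / "sparql"
        queries = sorted(sparql_dir.glob("*.sparql")) if sparql_dir.is_dir() else []
        if letter == "U" and queries:
            problems.append(f"U/sparql should be empty, found {[q.name for q in queries]}")
        elif letter != "U" and not queries:
            problems.append(f"{letter}/sparql has no .sparql query")
    return problems
```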
Data Quality Implications
TIER Assignment for U-Class
U-class institutions typically receive TIER_4_INFERRED (lowest tier) because:
- Type cannot be determined from available data
- Requires manual verification
- May contain errors or ambiguities
Confidence Scoring
U-class institutions should have low confidence scores (< 0.6):
- Signals the need for human review
- Flags incomplete data extraction
- Prioritizes records for verification
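The tier and confidence guidance can be checked mechanically. This sketch only reports deviations rather than rejecting records, since TIER_4_INFERRED is described as typical rather than mandatory; the function name and the dict-based record shape are assumptions.

```python
def u_class_issues(record, max_confidence=0.6):
    """List ways a U-class record departs from the tier/confidence guidance (sketch)."""
    if record.get("institution_type") != "UNKNOWN":
        return []  # the guidance applies to U-class records only
    prov = record.get("provenance", {})
    issues = []
    if prov.get("data_tier") != "TIER_4_INFERRED":
        issues.append("expected data_tier TIER_4_INFERRED")
    score = prov.get("confidence_score")
    if score is None or score >= max_confidence:
        issues.append(f"expected confidence_score below {max_confidence}")
    return issues
```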
Review Priority
U-class institutions should be high priority for manual review:
- Search institution's official website
- Query Wikidata for existing records
- Check ISIL registry or national heritage registries
- Reclassify to appropriate type (G, L, A, M, etc.)
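To act on this priority, a review queue could put U-class records first and, within each group, the lowest-confidence records first. A minimal sketch assuming dict records; the function name is hypothetical, and treating a missing confidence_score as 0.0 (most urgent) is an arbitrary choice made here.

```python
def review_queue(records):
    """Order records for manual review: UNKNOWN first, then lowest confidence first."""
    def priority(record):
        score = record.get("provenance", {}).get("confidence_score")
        is_unknown = record.get("institution_type") == "UNKNOWN"
        # False sorts before True, so "not is_unknown" puts UNKNOWN records first.
        return (not is_unknown, 0.0 if score is None else score)
    return sorted(records, key=priority)
```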
Statistics and Monitoring
Expected U-Class Distribution
In a well-extracted dataset:
- < 5% of institutions should be U-class (most types should be determinable)
- High U-class percentage indicates data quality issues
- Zero U-class is ideal (all institutions properly classified)
Monitoring Metrics
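```python
# Monitor U-class (UNKNOWN) usage across extracted records.
# Assumes each record is a dict with an "institution_type" key, as in the YAML examples above.
def report_u_class_rate(institutions):
    total_institutions = len(institutions)
    if total_institutions == 0:
        print("OK: No institutions to evaluate")  # avoid division by zero
        return 0.0
    u_class_count = sum(1 for inst in institutions if inst.get("institution_type") == "UNKNOWN")
    u_class_percentage = (u_class_count / total_institutions) * 100
    if u_class_percentage > 10:
        print("WARNING: High UNKNOWN classification rate - review extraction logic")
    elif u_class_percentage > 5:
        print("CAUTION: Elevated UNKNOWN rate - some manual review needed")
    else:
        print("OK: Low UNKNOWN rate - extraction quality acceptable")
    return u_class_percentage
```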
References
- Taxonomy Table: /AGENTS.md (lines 391-419)
- Schema Definition: /schemas/enums.yaml (InstitutionTypeEnum)
- E-Class Query: /data/wikidata/GLAMORCUBEPSXH/E/sparql/education_provider_hyponyms.sparql
- Query Execution Guide: /data/wikidata/GLAMORCUBEPSXH/00-QUERY_EXECUTION_GUIDE.md
Last Updated: 2025-11-12
Taxonomy Version: GLAMORCUBEPSXHF v2.0
Status: U-class correctly defined as UNKNOWN (not UNIVERSITY)