glam/COUNTRY_RESTRICTION_IMPLEMENTATION.md
kempersc 67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00

Country Restriction Implementation for FeatureTypeEnum

Date: 2025-11-22
Status: Implementation Plan
Related Files:

  • schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml
  • schemas/20251121/linkml/modules/classes/CustodianPlace.yaml
  • schemas/20251121/linkml/modules/classes/FeaturePlace.yaml
  • schemas/20251121/linkml/modules/classes/Country.yaml

Problem Statement

Some feature types in FeatureTypeEnum are country-specific and should only be used when the CustodianPlace.country matches a specific jurisdiction:

Examples:

  • CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION (Q64960148) - US only (Pittsburgh, Pennsylvania)
  • CULTURAL_HERITAGE_OF_PERU (Q16617058) - Peru only
  • BUITENPLAATS (Q2927789) - Netherlands only (Dutch country estates)
  • NATIONAL_MEMORIAL_OF_THE_UNITED_STATES (Q1967454) - US only

Current Issue: No validation mechanism enforces country restrictions on feature type usage.


Ontology Properties for Jurisdiction

1. Dublin Core Terms - dcterms:spatial (Recommended)

Property: dcterms:spatial
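Definition: "Spatial characteristics of the resource." dcterms:spatial is a sub-property of dcterms:coverage, whose definition reads: "The spatial or temporal topic of the resource, spatial applicability of the resource, or jurisdiction under which the resource is relevant." The jurisdiction wording quoted below is that parent definition, which dcterms:spatial narrows to the spatial case.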

Source: data/ontology/dublin_core_elements.rdf

<dcterms:spatial>
  rdfs:comment "The spatial or temporal topic of the resource, spatial applicability 
                of the resource, or jurisdiction under which the resource is relevant."@en
  dcterms:description "Spatial topic and spatial applicability may be a named place or 
                       a location specified by its geographic coordinates. ... 
                       A jurisdiction may be a named administrative entity or a geographic 
                       place to which the resource applies."@en
</dcterms:spatial>

Why this is perfect:

  • Explicitly covers "jurisdiction under which the resource is relevant"
  • Allows both named places and ISO country codes
  • DCMI (Dublin Core) standard, widely adopted
  • Already used in DBpedia for HistoricalPeriod → Place relationships

Example usage:

CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
  meaning: wd:Q64960148
  annotations:
    dcterms:spatial: "US"  # ISO 3166-1 alpha-2 code

2. RiC-O - rico:hasOrHadJurisdiction (Alternative)

Property: rico:hasOrHadJurisdiction
Inverse: rico:isOrWasJurisdictionOf
Domain: rico:Agent (organizations)
Range: rico:Place

Source: data/ontology/RiC-O_1-1.rdf

<rico:hasOrHadJurisdiction>
  rdfs:subPropertyOf rico:isAgentAssociatedWithPlace
  owl:inverseOf rico:isOrWasJurisdictionOf
  rdfs:domain rico:Agent
  rdfs:range rico:Place
  rdfs:comment "Inverse of 'is or was jurisdiction of' object relation"@en
</rico:hasOrHadJurisdiction>

Why this is less suitable:

  • ⚠️ Designed for organizational jurisdiction (which organization has authority over which place)
  • ⚠️ Not designed for feature type geographic applicability
  • ⚠️ Domain is Agent, not Feature or EnumValue

Conclusion: Use RiC-O for organizational jurisdiction (e.g., "Netherlands National Archives has jurisdiction over Noord-Holland"), NOT for feature type restrictions.


3. Schema.org - schema:addressCountry (Already Used)

Property: schema:addressCountry
Range: schema:Country or ISO 3166-1 alpha-2 code

Current usage: Already mapped in CustodianPlace.country:

country:
  slot_uri: schema:addressCountry
  range: Country

Why this works for validation:

  • CustodianPlace.country already uses ISO 3166-1 codes
  • Can cross-reference with dcterms:spatial in FeatureTypeEnum
  • Validation rule: "If feature_type.spatial annotation exists, CustodianPlace.country MUST match"
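
The cross-reference rule above can be sketched in plain Python. This is a minimal sketch rather than the project's validator: the helper names `normalize_alpha2` and `country_matches` are hypothetical, and it assumes the country arrives either as a bare code string or as a dict with an `alpha_2` key. It normalises case and whitespace first, so that `"us"` and `"US"` compare equal.

```python
from typing import Optional, Union

CountryValue = Union[str, dict, None]


def normalize_alpha2(country: CountryValue) -> Optional[str]:
    """Return an upper-cased two-letter code, or None if none can be read."""
    if isinstance(country, dict):
        country = country.get("alpha_2")
    if not isinstance(country, str):
        return None
    code = country.strip().upper()
    if len(code) == 2 and code.isascii() and code.isalpha():
        return code
    return None


def country_matches(spatial: Optional[str], country: CountryValue) -> bool:
    """No annotation means no restriction; otherwise both codes must agree."""
    if spatial is None:
        return True
    required = normalize_alpha2(spatial)
    actual = normalize_alpha2(country)
    # A malformed annotation fails closed instead of silently passing.
    return required is not None and actual == required


print(country_matches("US", {"alpha_2": "us"}))  # True
print(country_matches("US", "PE"))               # False
print(country_matches(None, None))               # True
```

Failing closed on a malformed annotation is a design choice of this sketch: a typo in the schema then surfaces as a validation error instead of disabling the restriction.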

LinkML Implementation Strategy

Rationale: LinkML doesn't have built-in "enum value → class field" conditional validation, so we:

  1. Add dcterms:spatial annotations to country-specific enum values
  2. Implement custom validation rules at the CustodianPlace class level

Step 1: Add dcterms:spatial Annotations to FeatureTypeEnum

# schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml

enums:
  FeatureTypeEnum:
    permissible_values:
      CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
        title: City of Pittsburgh historic designation
        meaning: wd:Q64960148
        annotations:
          wikidata_id: Q64960148
          dcterms:spatial: "US"  # ← NEW: Country restriction
          spatial_note: "Pittsburgh, Pennsylvania, United States"
      
      CULTURAL_HERITAGE_OF_PERU:
        title: cultural heritage of Peru
        meaning: wd:Q16617058
        annotations:
          wikidata_id: Q16617058
          dcterms:spatial: "PE"  # ← NEW: Country restriction
      
      BUITENPLAATS:
        title: buitenplaats
        meaning: wd:Q2927789
        annotations:
          wikidata_id: Q2927789
          dcterms:spatial: "NL"  # ← NEW: Country restriction
      
      NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
        title: National Memorial of the United States
        meaning: wd:Q1967454
        annotations:
          wikidata_id: Q1967454
          dcterms:spatial: "US"  # ← NEW: Country restriction
      
      # Global feature types have NO dcterms:spatial annotation
      MANSION:
        title: mansion
        meaning: wd:Q1802963
        annotations:
          wikidata_id: Q1802963
          # NO dcterms:spatial - applicable globally
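
Because the restriction lives in a free-text annotation, a small lint pass over the enum can catch malformed values before the validator relies on them. The sketch below works on a plain dict shaped like the parsed YAML above; the `enum_yaml` sample and the function name `lint_spatial_annotations` are hypothetical, and one value is deliberately wrong to show the output.

```python
import re
from typing import List

ALPHA2 = re.compile(r"[A-Z]{2}")

# Hypothetical stand-in for the parsed FeatureTypeEnum.yaml content.
enum_yaml = {
    "permissible_values": {
        "BUITENPLAATS": {"annotations": {"dcterms:spatial": "NL"}},
        # Deliberately malformed: a country name instead of a code.
        "CULTURAL_HERITAGE_OF_PERU": {"annotations": {"dcterms:spatial": "Peru"}},
        "MANSION": {"annotations": {"wikidata_id": "Q1802963"}},
    }
}


def lint_spatial_annotations(enum_def: dict) -> List[str]:
    """Return one message per dcterms:spatial value that is not two upper-case letters."""
    problems = []
    for name, pv in enum_def.get("permissible_values", {}).items():
        annotations = (pv or {}).get("annotations") or {}
        if "dcterms:spatial" not in annotations:
            continue  # globally applicable, nothing to check
        value = annotations["dcterms:spatial"]
        if not isinstance(value, str) or not ALPHA2.fullmatch(value):
            problems.append(f"{name}: dcterms:spatial={value!r} is not an alpha-2 code")
    return problems


for problem in lint_spatial_annotations(enum_yaml):
    print(problem)
```

This checks the format only, not whether the code is actually assigned in ISO 3166-1. The `isinstance` check also catches an unquoted `NO` (Norway), which YAML 1.1 parsers read as a boolean, so keeping the quotes shown above matters.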

Step 2: Add Validation Rules to CustodianPlace Class

# schemas/20251121/linkml/modules/classes/CustodianPlace.yaml

classes:
  CustodianPlace:
    class_uri: crm:E53_Place
    slots:
      - place_name
      - country
      - has_feature_type
      # ... other slots
    
    rules:
      - title: "Feature type country restriction validation"
        description: >-
          If a feature type has a dcterms:spatial annotation (country restriction),
          then the CustodianPlace.country MUST match that restriction.
          
          Examples:
          - CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION requires country.alpha_2 = "US"
          - CULTURAL_HERITAGE_OF_PERU requires country.alpha_2 = "PE"
          - BUITENPLAATS requires country.alpha_2 = "NL"
          
          Feature types WITHOUT dcterms:spatial are applicable globally.          
        
        preconditions:
          slot_conditions:
            has_feature_type:
              # If has_feature_type is populated
              required: true
            country:
              # And country is populated
              required: true
        
        postconditions:
          # CUSTOM VALIDATION (requires external validator)
          description: >-
            Validate that if has_feature_type.feature_type enum value has 
            a dcterms:spatial annotation, then country.alpha_2 MUST equal 
            that annotation value.
            
            Pseudocode:
              feature_enum_value = has_feature_type.feature_type
              spatial_restriction = enum_annotations[feature_enum_value]['dcterms:spatial']
              
              if spatial_restriction is not None:
                assert country.alpha_2 == spatial_restriction, \
                  f"Feature type {feature_enum_value} restricted to {spatial_restriction}, \
                   but CustodianPlace country is {country.alpha_2}"            

Limitation: LinkML's rules block cannot directly access enum annotations. We need a custom Python validator.


Approach 2: Python Custom Validator (Implementation Required)

Since LinkML rules can't access enum annotations, implement a post-validation Python script:

# scripts/validate_country_restrictions.py

from typing import Dict, Optional

from linkml_runtime.utils.schemaview import SchemaView

def load_feature_type_spatial_restrictions(schema_view: SchemaView) -> Dict[str, str]:
    """
    Extract dcterms:spatial annotations from FeatureTypeEnum permissible values.
    
    Returns:
        Dict mapping feature type enum key → ISO 3166-1 alpha-2 country code
        Example: {"CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US", ...}
    """
    restrictions = {}
    
    enum_def = schema_view.get_enum("FeatureTypeEnum")
    for pv_name, pv in enum_def.permissible_values.items():
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value
    
    return restrictions

def validate_custodian_place_country_restrictions(
    custodian_place_data: dict,
    spatial_restrictions: Dict[str, str]
) -> Optional[str]:
    """
    Validate that feature types with country restrictions match CustodianPlace.country.
    
    Returns:
        None if valid, error message string if invalid
    """
    # Extract feature type and country
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No feature type, no restriction
    
    feature_type_enum = feature_place.get("feature_type")
    if not feature_type_enum:
        return None
    
    # Check if this feature type has a country restriction
    required_country = spatial_restrictions.get(feature_type_enum)
    if not required_country:
        return None  # No restriction, globally applicable
    
    # Get actual country
    country = custodian_place_data.get("country")
    if not country:
        return f"Feature type '{feature_type_enum}' requires country='{required_country}', but no country specified"
    
    # Validate country matches
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country
    
    if actual_country != required_country:
        return (
            f"Feature type '{feature_type_enum}' restricted to country '{required_country}', "
            f"but CustodianPlace.country='{actual_country}'"
        )
    
    return None  # Valid

# Example usage
if __name__ == "__main__":
    schema_view = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    restrictions = load_feature_type_spatial_restrictions(schema_view)
    
    # Test case 1: Invalid (Pittsburgh designation in Peru)
    invalid_data = {
        "place_name": "Lima Historic Building",
        "country": {"alpha_2": "PE"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        }
    }
    error = validate_custodian_place_country_restrictions(invalid_data, restrictions)
    assert error is not None, "Should detect country mismatch"
    print(f"❌ Validation error: {error}")
    
    # Test case 2: Valid (Pittsburgh designation in US)
    valid_data = {
        "place_name": "Pittsburgh Historic Building",
        "country": {"alpha_2": "US"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        }
    }
    error = validate_custodian_place_country_restrictions(valid_data, restrictions)
    assert error is None, "Should pass validation"
    print("✅ Valid: Pittsburgh designation in US")
    
    # Test case 3: Valid (MANSION has no restriction, can be anywhere)
    global_data = {
        "place_name": "Mansion in France",
        "country": {"alpha_2": "FR"},
        "has_feature_type": {
            "feature_type": "MANSION"
        }
    }
    error = validate_custodian_place_country_restrictions(global_data, restrictions)
    assert error is None, "Should pass validation (global feature type)"
    print("✅ Valid: MANSION (global feature type) in France")

Implementation Checklist

Phase 1: Schema Annotations (Start Here)

  • Identify all country-specific feature types in FeatureTypeEnum.yaml

    • Search Wikidata descriptions for country names
    • Examples: "City of Pittsburgh", "cultural heritage of Peru", "buitenplaats"
    • Use a regex such as /(United States|Peru|Netherlands|Brazil|Mexico|France|Germany)/i, extended with further country names as needed
  • Add dcterms:spatial annotations to country-specific enum values

    • Format: dcterms:spatial: "US" (ISO 3166-1 alpha-2)
    • Add spatial_note for human readability: "Pittsburgh, Pennsylvania, United States"
  • Document annotation semantics in FeatureTypeEnum header

    # Annotations:
    #   dcterms:spatial - Country restriction (ISO 3166-1 alpha-2 code)
    #                     If present, feature type only applicable in specified country
    #                     If absent, feature type is globally applicable
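
The audit step in Phase 1 could be bootstrapped with a short script that flags candidate values for manual review. This is a sketch under assumptions: the `COUNTRY_HINTS` table is hypothetical and deliberately incomplete, and the input is a plain dict of enum key to title text, not the real schema loader.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, deliberately incomplete hint -> alpha-2 table; extend as needed.
COUNTRY_HINTS: Dict[str, str] = {
    "United States": "US",
    "Pittsburgh": "US",
    "Peru": "PE",
    "Netherlands": "NL",
    "Dutch": "NL",
}

# Word boundaries stop "Peru" from matching inside "Perugia".
HINT_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in COUNTRY_HINTS) + r")\b",
    re.IGNORECASE,
)


def suggest_restrictions(entries: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (enum key, suggested alpha-2 code) pairs for manual review."""
    lookup = {name.lower(): code for name, code in COUNTRY_HINTS.items()}
    suggestions = []
    for key, text in entries.items():
        match = HINT_PATTERN.search(text)
        if match:
            suggestions.append((key, lookup[match.group(0).lower()]))
    return suggestions


# Titles taken from the examples in this document.
titles = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "City of Pittsburgh historic designation",
    "CULTURAL_HERITAGE_OF_PERU": "cultural heritage of Peru",
    "BUITENPLAATS": "buitenplaats",
    "MANSION": "mansion",
}
print(suggest_restrictions(titles))
```

Note that BUITENPLAATS is not flagged from its title alone, which is why the audit should also search the descriptions and why the results are suggestions for review, not final annotations.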
    

Phase 2: Custom Validator Implementation

  • Create validation script scripts/validate_country_restrictions.py

    • Implement load_feature_type_spatial_restrictions()
    • Implement validate_custodian_place_country_restrictions()
    • Add comprehensive test cases
  • Integrate with LinkML validation workflow

    • Add to linkml-validate post-validation step
    • Or create standalone validate-country-restrictions CLI command
  • Add validation tests to test suite

    • Test country-restricted feature types
    • Test global feature types (no restriction)
    • Test missing country field
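
The three test categories listed above might look like the following pytest-style functions. This is a sketch only: `check` is a simplified stand-in for the validator, and `RESTRICTIONS` and `make_record` are hypothetical fixtures, not part of the schema.

```python
from typing import Dict, Optional

# Stand-in restriction table; the real one would come from the enum annotations.
RESTRICTIONS: Dict[str, str] = {"BUITENPLAATS": "NL"}


def check(record: dict, restrictions: Dict[str, str]) -> Optional[str]:
    """Simplified stand-in for the validator: None if valid, else a message."""
    feature = (record.get("has_feature_type") or {}).get("feature_type")
    required = restrictions.get(feature)
    if required is None:
        return None  # global feature type, or no feature type at all
    country = record.get("country")
    actual = country.get("alpha_2") if isinstance(country, dict) else country
    if actual != required:
        return f"{feature} requires {required}, got {actual}"
    return None


def make_record(feature: str, alpha_2: Optional[str] = None) -> dict:
    """Build a minimal CustodianPlace-like dict; omit country when alpha_2 is None."""
    record = {"has_feature_type": {"feature_type": feature}}
    if alpha_2 is not None:
        record["country"] = {"alpha_2": alpha_2}
    return record


def test_restricted_type_in_matching_country():
    assert check(make_record("BUITENPLAATS", "NL"), RESTRICTIONS) is None


def test_restricted_type_in_other_country():
    assert check(make_record("BUITENPLAATS", "BE"), RESTRICTIONS) is not None


def test_global_type_is_valid_anywhere():
    assert check(make_record("MANSION", "FR"), RESTRICTIONS) is None


def test_missing_country_is_rejected_for_restricted_type():
    error = check(make_record("BUITENPLAATS"), RESTRICTIONS)
    assert error is not None and "NL" in error
```

In the real suite the stand-in `check` would be replaced by an import of the validator function from the validation script.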

Phase 3: Documentation

  • Update CustodianPlace documentation

    • Explain country field is required when using country-specific feature types
    • Link to FeatureTypeEnum country restriction annotations
  • Update FeaturePlace documentation

    • Explain feature type country restrictions
    • Provide examples of restricted vs. global feature types
  • Create VALIDATION.md guide

    • Document validation workflow
    • Provide troubleshooting guide for country restriction errors

Rejected Approach: Split FeatureTypeEnum by Country

Create separate enums: FeatureTypeEnum_US, FeatureTypeEnum_NL, etc.

Why not:

  • Duplicates global feature types (MANSION exists in every country enum)
  • Breaks DRY principle
  • Hard to maintain (294 feature types → 294 × N entries for N countries)
  • Loses semantic clarity
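
The duplication cost can be made concrete with a little arithmetic. The numbers below are illustrative only, and the sketch assumes each country-specific type appears in exactly one country enum while every global type is repeated in all of them.

```python
def entries_if_split(total: int, restricted: int, countries: int) -> int:
    """Enum entries needed when every country enum repeats the global types."""
    global_types = total - restricted
    return global_types * countries + restricted


# Illustrative numbers only: 300 types, 20 of them country-specific, 10 country enums.
print(entries_if_split(total=300, restricted=20, countries=10))  # 2820
# With a single annotated enum the count stays at the 300 original entries.
```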

Rejected Approach: Create Country-Specific Subclasses of CustodianPlace

Create CustodianPlace_US, CustodianPlace_NL, etc., each with restricted enum ranges.

Why not:

  • Explosion of subclasses (one per country)
  • Type polymorphism issues
  • Hard to extend to new countries
  • Violates Open/Closed Principle

Rejected Approach: Use LinkML any_of Conditional Range

has_feature_type:
  range: FeaturePlace
  any_of:
    - country.alpha_2 = "US" → feature_type in [PITTSBURGH_DESIGNATION, NATIONAL_MEMORIAL, ...]
    - country.alpha_2 = "PE" → feature_type in [CULTURAL_HERITAGE_OF_PERU, ...]

Why not:

  • LinkML any_of doesn't support cross-slot conditionals
  • Would require massive any_of block for every country
  • Unreadable and unmaintainable

Rationale for Chosen Approach

Why Annotations + Custom Validator?

Separation of Concerns:

  • Schema defines what (data structure)
  • Annotations define metadata (country restrictions)
  • Validator enforces constraints (business rules)

Maintainability:

  • Add new country-specific feature type: Just add annotation
  • Change restriction: Update annotation, validator logic unchanged

Flexibility:

  • Easy to extend with other restrictions (e.g., dcterms:temporal for time periods)
  • Custom validators can implement complex logic
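
One way to keep the validator open to further restriction types is a small registry keyed by annotation name. The sketch below is hypothetical throughout: the `dcterms:temporal` value format ("start/end" years), the `year` field on the record, and the sample annotation values are all invented for illustration and are not part of the schema.

```python
from typing import Callable, Dict, List, Optional

# A check receives the annotation value and the record; it returns an error or None.
Check = Callable[[str, dict], Optional[str]]


def check_spatial(required: str, record: dict) -> Optional[str]:
    country = record.get("country")
    actual = country.get("alpha_2") if isinstance(country, dict) else country
    return None if actual == required else f"country must be {required}, got {actual}"


def check_temporal(required: str, record: dict) -> Optional[str]:
    # Assumes a hypothetical "start/end" year range and a hypothetical `year` field.
    start, end = (int(part) for part in required.split("/"))
    year = record.get("year")
    if year is None or not start <= year <= end:
        return f"year must fall in {start}-{end}, got {year}"
    return None


CHECKS: Dict[str, Check] = {
    "dcterms:spatial": check_spatial,
    "dcterms:temporal": check_temporal,
}


def validate(record: dict, annotations: Dict[str, str]) -> List[str]:
    """Run every registered check whose annotation is present; ignore other keys."""
    errors = []
    for key, value in annotations.items():
        check = CHECKS.get(key)
        if check is not None:
            error = check(value, record)
            if error:
                errors.append(f"{key}: {error}")
    return errors


# Invented annotation values for a hypothetical enum entry.
annotations = {"dcterms:spatial": "NL", "dcterms:temporal": "1600/1800", "note": "ignored"}
print(validate({"country": {"alpha_2": "NL"}, "year": 1650}, annotations))  # []
print(validate({"country": {"alpha_2": "BE"}, "year": 1900}, annotations))
```

Adding a new restriction type then means registering one function in `CHECKS`, with no change to `validate` itself.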

Ontology Alignment:

  • dcterms:spatial is a DCMI (Dublin Core) standard property
  • Aligns with DBpedia and Schema.org spatial semantics

Backward Compatibility:

  • Existing global feature types unaffected (no annotation = no restriction)
  • Gradual migration: Add annotations incrementally

Next Steps

  1. Run ontology property search to confirm dcterms:spatial is best choice
  2. Audit FeatureTypeEnum to identify all country-specific values
  3. Add annotations to schema
  4. Implement Python validator
  5. Integrate into CI/CD validation pipeline
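
For the CI/CD step, the validator needs to report failures through its exit code. The following is a minimal sketch assuming a hypothetical setup in which the restriction table and the records have been exported to two JSON files; the function names and file layout are assumptions, not the project's actual pipeline.

```python
import json
import sys
from typing import Dict, Iterable, List


def find_violations(records: Iterable[dict], restrictions: Dict[str, str]) -> List[str]:
    """Return one message per record whose country conflicts with its feature type."""
    violations = []
    for index, record in enumerate(records):
        feature = (record.get("has_feature_type") or {}).get("feature_type")
        required = restrictions.get(feature)
        if required is None:
            continue  # unrestricted feature type
        country = record.get("country")
        actual = country.get("alpha_2") if isinstance(country, dict) else country
        if actual != required:
            violations.append(f"record {index}: {feature} requires {required}, got {actual}")
    return violations


def main(argv: List[str]) -> int:
    """Return 0 when clean and 1 when any violation is found, so the CI job fails."""
    # Hypothetical inputs: argv[1] is the restrictions map, argv[2] a list of records.
    with open(argv[1], encoding="utf-8") as handle:
        restrictions = json.load(handle)
    with open(argv[2], encoding="utf-8") as handle:
        records = json.load(handle)
    violations = find_violations(records, restrictions)
    for line in violations:
        print(line, file=sys.stderr)
    return 1 if violations else 0
```

A pipeline step would call `sys.exit(main(sys.argv))` from the script entry point, so that any violation marks the job as failed.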

References

Ontology Documentation

  • Dublin Core Terms: data/ontology/dublin_core_elements.rdf
    • dcterms:spatial - Geographic/jurisdictional applicability
  • RiC-O: data/ontology/RiC-O_1-1.rdf
    • rico:hasOrHadJurisdiction - Organizational jurisdiction
  • Schema.org: data/ontology/schemaorg.owl
    • schema:addressCountry - ISO 3166-1 country codes

LinkML Documentation

  • schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml - Feature type definitions
  • schemas/20251121/linkml/modules/classes/CustodianPlace.yaml - Place class with country field
  • schemas/20251121/linkml/modules/classes/FeaturePlace.yaml - Feature type classifier
  • schemas/20251121/linkml/modules/classes/Country.yaml - ISO 3166-1 country codes
  • AGENTS.md - Agent instructions (Rule 1: Ontology Files Are Your Primary Reference)

Status: Ready for implementation
Priority: Medium (nice-to-have validation, not blocking)
Estimated Effort: 4-6 hours (annotation audit + validator + tests)