# Country Restriction Implementation for FeatureTypeEnum

**Date**: 2025-11-22

**Status**: Implementation Plan

**Related Files**:
- `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml`
- `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml`
- `schemas/20251121/linkml/modules/classes/FeaturePlace.yaml`
- `schemas/20251121/linkml/modules/classes/Country.yaml`

---

## Problem Statement

Some feature types in `FeatureTypeEnum` are **country-specific** and should only be used when `CustodianPlace.country` matches a specific jurisdiction:

**Examples**:
- `CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION` (Q64960148) - **US only** (Pittsburgh, Pennsylvania)
- `CULTURAL_HERITAGE_OF_PERU` (Q16617058) - **Peru only**
- `BUITENPLAATS` (Q2927789) - **Netherlands only** (Dutch country estates)
- `NATIONAL_MEMORIAL_OF_THE_UNITED_STATES` (Q1967454) - **US only**

**Current Issue**: No validation mechanism enforces country restrictions on feature type usage.

---

## Ontology Properties for Jurisdiction

### 1. **Dublin Core Terms - `dcterms:spatial`** ✅ RECOMMENDED

**Property**: `dcterms:spatial`

**Definition**: "Spatial characteristics of the resource." It is a sub-property of `dcterms:coverage`, whose definition supplies the jurisdiction wording relied on here: "The spatial or temporal topic of the resource, spatial applicability of the resource, or **jurisdiction under which the resource is relevant**."

**Source**: `data/ontology/dublin_core_elements.rdf` (the comment quoted below is the one attached to the super-property `coverage`)

```turtle
rdfs:comment "The spatial or temporal topic of the resource, spatial applicability of the resource, or jurisdiction under which the resource is relevant."@en
dcterms:description "Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. ...
A jurisdiction may be a named administrative entity or a geographic place to which the resource applies."@en
```

**Why this fits**:
- ✅ Covers **"jurisdiction under which the resource is relevant"** through its super-property `dcterms:coverage`
- ✅ Allows both named places and ISO country codes
- ✅ DCMI standard, widely adopted
- ✅ Already used in DBpedia for HistoricalPeriod → Place relationships

**Example usage**:
```yaml
CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
  meaning: wd:Q64960148
  annotations:
    dcterms:spatial: "US"  # ISO 3166-1 alpha-2 code
```

---

### 2. **RiC-O - `rico:hasOrHadJurisdiction`** (Alternative)

**Property**: `rico:hasOrHadJurisdiction`

**Inverse**: `rico:isOrWasJurisdictionOf`

**Domain**: `rico:Agent` (organizations)

**Range**: `rico:Place`

**Source**: `data/ontology/RiC-O_1-1.rdf`

```turtle
rdfs:subPropertyOf rico:isAgentAssociatedWithPlace
owl:inverseOf rico:isOrWasJurisdictionOf
rdfs:domain rico:Agent
rdfs:range rico:Place
rdfs:comment "Inverse of 'is or was jurisdiction of' object relation"@en
```

**Why this is less suitable**:
- ⚠️ Designed for **organizational jurisdiction** (which organization has authority over which place)
- ⚠️ Not designed for **feature type geographic applicability**
- ⚠️ Domain is `Agent`, not `Feature` or `EnumValue`

**Conclusion**: Use RiC-O for organizational jurisdiction (e.g., "Netherlands National Archives has jurisdiction over Noord-Holland"), NOT for feature type restrictions.

---

### 3. **Schema.org - `schema:addressCountry`** ✅ ALREADY USED

**Property**: `schema:addressCountry`

**Range**: `schema:Country` or ISO 3166-1 alpha-2 code

**Current usage**: Already mapped in `CustodianPlace.country`:
```yaml
country:
  slot_uri: schema:addressCountry
  range: Country
```

**Why this works for validation**:
- ✅ `CustodianPlace.country` already uses ISO 3166-1 codes
- ✅ Can cross-reference with `dcterms:spatial` in FeatureTypeEnum
- ✅ Validation rule: "If the feature type has a `dcterms:spatial` annotation, `CustodianPlace.country` MUST match it"

---

## LinkML Implementation Strategy

### Approach 1: **Annotations + Custom Validation Rules** ✅ RECOMMENDED

**Rationale**: LinkML doesn't have built-in "enum value → class field" conditional validation, so we:
1. Add `dcterms:spatial` **annotations** to country-specific enum values
2. Implement **custom validation rules** at the `CustodianPlace` class level

#### Step 1: Add `dcterms:spatial` Annotations to FeatureTypeEnum

```yaml
# schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml

enums:
  FeatureTypeEnum:
    permissible_values:
      CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
        title: City of Pittsburgh historic designation
        meaning: wd:Q64960148
        annotations:
          wikidata_id: Q64960148
          dcterms:spatial: "US"  # ← NEW: Country restriction
          spatial_note: "Pittsburgh, Pennsylvania, United States"

      CULTURAL_HERITAGE_OF_PERU:
        title: cultural heritage of Peru
        meaning: wd:Q16617058
        annotations:
          wikidata_id: Q16617058
          dcterms:spatial: "PE"  # ← NEW: Country restriction

      BUITENPLAATS:
        title: buitenplaats
        meaning: wd:Q2927789
        annotations:
          wikidata_id: Q2927789
          dcterms:spatial: "NL"  # ← NEW: Country restriction

      NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
        title: National Memorial of the United States
        meaning: wd:Q1967454
        annotations:
          wikidata_id: Q1967454
          dcterms:spatial: "US"  # ← NEW: Country restriction

      # Global feature types have NO dcterms:spatial annotation
      MANSION:
        title: mansion
        meaning: wd:Q1802963
        annotations:
          wikidata_id: Q1802963
          # NO dcterms:spatial - applicable globally
```

#### Step 2: Add Validation Rules to CustodianPlace Class

```yaml
# schemas/20251121/linkml/modules/classes/CustodianPlace.yaml

classes:
  CustodianPlace:
    class_uri: crm:E53_Place
    slots:
      - place_name
      - country
      - has_feature_type
      # ... other slots

    rules:
      - title: "Feature type country restriction validation"
        description: |-
          If a feature type has a dcterms:spatial annotation (country restriction),
          then the CustodianPlace.country MUST match that restriction.

          Examples:
          - CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION requires country.alpha_2 = "US"
          - CULTURAL_HERITAGE_OF_PERU requires country.alpha_2 = "PE"
          - BUITENPLAATS requires country.alpha_2 = "NL"

          Feature types WITHOUT dcterms:spatial are applicable globally.

        preconditions:
          slot_conditions:
            has_feature_type:
              # If has_feature_type is populated
              required: true
            country:
              # And country is populated
              required: true

        postconditions:
          # CUSTOM VALIDATION (requires external validator)
          description: |-
            Validate that if has_feature_type.feature_type enum value has
            a dcterms:spatial annotation, then country.alpha_2 MUST equal
            that annotation value.

            Pseudocode:
              feature_enum_value = has_feature_type.feature_type
              spatial_restriction = enum_annotations[feature_enum_value].get('dcterms:spatial')

              if spatial_restriction is not None:
                assert country.alpha_2 == spatial_restriction, \
                  f"Feature type {feature_enum_value} restricted to {spatial_restriction}, \
                  but CustodianPlace country is {country.alpha_2}"
```

**Limitation**: LinkML's `rules` block **cannot directly access enum annotations**, so the postcondition above is documentation only. Enforcement needs a **custom Python validator**.
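Because the `dcterms:spatial` values from Step 1 are free-form annotation strings that LinkML does not check, a malformed code such as `"USA"` would simply never match any `country.alpha_2`. A minimal sketch of a format lint is shown below. It assumes the permissible values are available as plain dictionaries shaped like the YAML in Step 1; the function name `lint_spatial_annotations` and the `HYPOTHETICAL_TYPO` entry are illustrative and not part of the schema.

```python
import re
from typing import Dict, List

# ISO 3166-1 alpha-2 codes are two uppercase ASCII letters. This checks the
# format only; it does not confirm that the code is actually assigned.
ALPHA_2_PATTERN = re.compile(r"[A-Z]{2}")


def lint_spatial_annotations(permissible_values: Dict[str, dict]) -> List[str]:
    """Return one message per malformed dcterms:spatial annotation."""
    problems = []
    for name, definition in permissible_values.items():
        annotations = (definition or {}).get("annotations") or {}
        if "dcterms:spatial" not in annotations:
            continue  # No annotation: the feature type is globally applicable
        code = annotations["dcterms:spatial"]
        if not isinstance(code, str) or not ALPHA_2_PATTERN.fullmatch(code):
            problems.append(
                f"{name}: dcterms:spatial={code!r} is not an alpha-2 code"
            )
    return problems


# Plain-dict stand-in for the permissible_values block (assumed shape)
example_values = {
    "BUITENPLAATS": {"annotations": {"dcterms:spatial": "NL"}},
    "MANSION": {"annotations": {"wikidata_id": "Q1802963"}},
    "HYPOTHETICAL_TYPO": {"annotations": {"dcterms:spatial": "USA"}},
}

for problem in lint_spatial_annotations(example_values):
    print(problem)
```

Running a check like this before the country comparison keeps annotation typos from turning into silent false negatives.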
---

### Approach 2: **Python Custom Validator** ✅ IMPLEMENTATION REQUIRED

Since LinkML rules can't access enum annotations, implement a **post-validation Python script**:

```python
# scripts/validate_country_restrictions.py

from typing import Dict, Optional

from linkml_runtime.utils.schemaview import SchemaView


def load_feature_type_spatial_restrictions(schema_view: SchemaView) -> Dict[str, str]:
    """
    Extract dcterms:spatial annotations from FeatureTypeEnum permissible values.

    Returns:
        Dict mapping feature type enum key → ISO 3166-1 alpha-2 country code

    Example:
        {"CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US", ...}
    """
    restrictions = {}

    enum_def = schema_view.get_enum("FeatureTypeEnum")

    for pv_name, pv in enum_def.permissible_values.items():
        # Only country-specific values carry the annotation
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value

    return restrictions


def validate_custodian_place_country_restrictions(
    custodian_place_data: dict,
    spatial_restrictions: Dict[str, str]
) -> Optional[str]:
    """
    Validate that feature types with country restrictions match CustodianPlace.country.
    Returns:
        None if valid, error message string if invalid
    """
    # Extract feature type and country
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No feature type, no restriction

    feature_type_enum = feature_place.get("feature_type")
    if not feature_type_enum:
        return None

    # Check if this feature type has a country restriction
    required_country = spatial_restrictions.get(feature_type_enum)
    if not required_country:
        return None  # No restriction, globally applicable

    # Get actual country
    country = custodian_place_data.get("country")
    if not country:
        return (
            f"Feature type '{feature_type_enum}' requires country='{required_country}', "
            f"but no country specified"
        )

    # Accept either a nested Country object or a bare alpha-2 string
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country

    if actual_country != required_country:
        return (
            f"Feature type '{feature_type_enum}' restricted to country '{required_country}', "
            f"but CustodianPlace.country='{actual_country}'"
        )

    return None  # Valid


# Example usage
if __name__ == "__main__":
    schema_view = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    restrictions = load_feature_type_spatial_restrictions(schema_view)

    # Test case 1: Invalid (Pittsburgh designation in Peru)
    invalid_data = {
        "place_name": "Lima Historic Building",
        "country": {"alpha_2": "PE"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        }
    }
    error = validate_custodian_place_country_restrictions(invalid_data, restrictions)
    assert error is not None, "Should detect country mismatch"
    print(f"❌ Validation error: {error}")

    # Test case 2: Valid (Pittsburgh designation in US)
    valid_data = {
        "place_name": "Pittsburgh Historic Building",
        "country": {"alpha_2": "US"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        }
    }
    error = validate_custodian_place_country_restrictions(valid_data, restrictions)
    assert error is None, "Should pass validation"
    print("✅ Valid: Pittsburgh designation in US")

    # Test case 3: Valid (MANSION has no restriction, can be anywhere)
    global_data = {
        "place_name": "Mansion in France",
        "country": {"alpha_2": "FR"},
        "has_feature_type": {
            "feature_type": "MANSION"
        }
    }
    error = validate_custodian_place_country_restrictions(global_data, restrictions)
    assert error is None, "Should pass validation (global feature type)"
    print("✅ Valid: MANSION (global feature type) in France")
```

---

## Implementation Checklist

### Phase 1: Schema Annotations ✅ START HERE

- [ ] **Identify all country-specific feature types** in `FeatureTypeEnum.yaml`
  - Search Wikidata descriptions for country names
  - Examples: "City of Pittsburgh", "cultural heritage of Peru", "buitenplaats"
  - Use regex: `/(United States|Peru|Netherlands|Brazil|Mexico|France|Germany|...)/i`

- [ ] **Add `dcterms:spatial` annotations** to country-specific enum values
  - Format: `dcterms:spatial: "US"` (ISO 3166-1 alpha-2)
  - Add `spatial_note` for human readability: "Pittsburgh, Pennsylvania, United States"

- [ ] **Document annotation semantics** in FeatureTypeEnum header
  ```yaml
  # Annotations:
  #   dcterms:spatial - Country restriction (ISO 3166-1 alpha-2 code)
  #     If present, feature type only applicable in specified country
  #     If absent, feature type is globally applicable
  ```

### Phase 2: Custom Validator Implementation

- [ ] **Create validation script** `scripts/validate_country_restrictions.py`
  - Implement `load_feature_type_spatial_restrictions()`
  - Implement `validate_custodian_place_country_restrictions()`
  - Add comprehensive test cases

- [ ] **Integrate with LinkML validation workflow**
  - Run as a post-validation step after `linkml-validate`
  - Or create a standalone `validate-country-restrictions` CLI command

- [ ] **Add validation tests** to test suite
  - Test country-restricted feature types
  - Test global feature types (no restriction)
  - Test missing country field

### Phase 3: Documentation

- [ ] **Update CustodianPlace documentation**
  - Explain that the country field is required when using country-specific feature types
  - Link to FeatureTypeEnum country restriction annotations

- [ ] **Update FeaturePlace documentation**
  - Explain feature type country restrictions
  - Provide examples of restricted vs. global feature types

- [ ] **Create VALIDATION.md guide**
  - Document validation workflow
  - Provide troubleshooting guide for country restriction errors

---

## Alternative Approaches (Not Recommended)

### ❌ Approach: Split FeatureTypeEnum by Country

Create separate enums: `FeatureTypeEnum_US`, `FeatureTypeEnum_NL`, etc.

**Why not**:
- Duplicates global feature types (MANSION exists in every country enum)
- Breaks DRY principle
- Hard to maintain (298 feature types → 298 × N countries)
- Loses semantic clarity

### ❌ Approach: Create Country-Specific Subclasses of CustodianPlace

Create `CustodianPlace_US`, `CustodianPlace_NL`, etc., each with restricted enum ranges.

**Why not**:
- Explosion of subclasses (one per country)
- Type polymorphism issues
- Hard to extend to new countries
- Violates Open/Closed Principle

### ❌ Approach: Use LinkML `any_of` Conditional Range

```yaml
has_feature_type:
  range: FeaturePlace
  any_of:
    - country.alpha_2 = "US" → feature_type in [PITTSBURGH_DESIGNATION, NATIONAL_MEMORIAL, ...]
    - country.alpha_2 = "PE" → feature_type in [CULTURAL_HERITAGE_OF_PERU, ...]
```

**Why not**:
- LinkML `any_of` doesn't support cross-slot conditionals
- Would require massive `any_of` block for every country
- Unreadable and unmaintainable

---

## Rationale for Chosen Approach

### Why Annotations + Custom Validator?

✅ **Separation of Concerns**:
- Schema defines **what** (data structure)
- Annotations define **metadata** (country restrictions)
- Validator enforces **constraints** (business rules)

✅ **Maintainability**:
- Add new country-specific feature type: Just add annotation
- Change restriction: Update annotation, validator logic unchanged

✅ **Flexibility**:
- Easy to extend with other restrictions (e.g., `dcterms:temporal` for time periods)
- Custom validators can implement complex logic

✅ **Ontology Alignment**:
- `dcterms:spatial` is a DCMI standard property
- Aligns with DBpedia and Schema.org spatial semantics

✅ **Backward Compatibility**:
- Existing global feature types unaffected (no annotation = no restriction)
- Gradual migration: Add annotations incrementally

---

## Next Steps

1. **Run ontology property search** to confirm `dcterms:spatial` is the best choice
2. **Audit FeatureTypeEnum** to identify all country-specific values
3. **Add annotations** to schema
4. **Implement Python validator**
5. **Integrate into CI/CD** validation pipeline

---

## References

### Ontology Documentation
- **Dublin Core Terms**: `data/ontology/dublin_core_elements.rdf`
  - `dcterms:spatial` - Geographic/jurisdictional applicability
- **RiC-O**: `data/ontology/RiC-O_1-1.rdf`
  - `rico:hasOrHadJurisdiction` - Organizational jurisdiction
- **Schema.org**: `data/ontology/schemaorg.owl`
  - `schema:addressCountry` - ISO 3166-1 country codes

### LinkML Documentation
- **Constraints and Rules**: https://linkml.io/linkml/schemas/constraints.html
- **Advanced Features**: https://linkml.io/linkml/schemas/advanced.html
- **Conditional Validation Examples**: https://linkml.io/linkml/faq/modeling.html#conditional-slot-ranges

### Related Files
- `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml` - Feature type definitions
- `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml` - Place class with country field
- `schemas/20251121/linkml/modules/classes/FeaturePlace.yaml` - Feature type classifier
- `schemas/20251121/linkml/modules/classes/Country.yaml` - ISO 3166-1 country codes
- `AGENTS.md` - Agent instructions (Rule 1: Ontology Files Are Your Primary Reference)

---

**Status**: Ready for implementation

**Priority**: Medium (nice-to-have validation, not blocking)

**Estimated Effort**: 4-6 hours (annotation audit + validator + tests)
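
The annotation audit counted in the effort estimate (Phase 1 of the checklist) could start from a keyword scan over the enum titles. The sketch below is one possible starting point, not a complete audit: the `COUNTRY_KEYWORDS` table and the function `suggest_country_codes` are hypothetical, the titles are the ones quoted in this plan, and every suggestion still needs manual review against Wikidata.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword-to-code table; extend it to cover the countries that
# actually occur in FeatureTypeEnum.
COUNTRY_KEYWORDS: Dict[str, str] = {
    "United States": "US",
    "Pittsburgh": "US",
    "Peru": "PE",
    "Netherlands": "NL",
    "buitenplaats": "NL",
}

# Word boundaries keep "Peru" from matching inside longer words.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in COUNTRY_KEYWORDS) + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {keyword.lower(): code for keyword, code in COUNTRY_KEYWORDS.items()}


def suggest_country_codes(titles: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (enum key, suggested alpha-2 code) for titles that name a country."""
    suggestions = []
    for key, title in titles.items():
        match = KEYWORD_PATTERN.search(title)
        if match:
            suggestions.append((key, _LOOKUP[match.group(0).lower()]))
    return suggestions


# Titles taken from the Step 1 examples in this plan
example_titles = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "City of Pittsburgh historic designation",
    "CULTURAL_HERITAGE_OF_PERU": "cultural heritage of Peru",
    "BUITENPLAATS": "buitenplaats",
    "NATIONAL_MEMORIAL_OF_THE_UNITED_STATES": "National Memorial of the United States",
    "MANSION": "mansion",
}

for enum_key, code in suggest_country_codes(example_titles):
    print(f'{enum_key}: candidate dcterms:spatial: "{code}"')
```

Titles without a keyword hit, such as `MANSION`, produce no suggestion and stay globally applicable by default.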