glam/COUNTRY_RESTRICTION_IMPLEMENTATION.md
kempersc 67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00

# Country Restriction Implementation for FeatureTypeEnum
**Date**: 2025-11-22
**Status**: Implementation Plan
**Related Files**:
- `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml`
- `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml`
- `schemas/20251121/linkml/modules/classes/FeaturePlace.yaml`
- `schemas/20251121/linkml/modules/classes/Country.yaml`
---
## Problem Statement
Some feature types in `FeatureTypeEnum` are **country-specific** and should only be used when the `CustodianPlace.country` matches a specific jurisdiction:
**Examples**:
- `CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION` (Q64960148) - **US only** (Pittsburgh, Pennsylvania)
- `CULTURAL_HERITAGE_OF_PERU` (Q16617058) - **Peru only**
- `BUITENPLAATS` (Q2927789) - **Netherlands only** (Dutch country estates)
- `NATIONAL_MEMORIAL_OF_THE_UNITED_STATES` (Q1967454) - **US only**
**Current Issue**: No validation mechanism enforces country restrictions on feature type usage.
---
## Ontology Properties for Jurisdiction
### 1. **Dublin Core Terms - `dcterms:spatial`** ✅ RECOMMENDED
**Property**: `dcterms:spatial`
**Definition**: "The spatial or temporal topic of the resource, spatial applicability of the resource, or **jurisdiction under which the resource is relevant**."
**Source**: `data/ontology/dublin_core_elements.rdf`
```turtle
<dcterms:spatial>
  rdfs:comment "The spatial or temporal topic of the resource, spatial applicability
                of the resource, or jurisdiction under which the resource is relevant."@en
  dcterms:description "Spatial topic and spatial applicability may be a named place or
                       a location specified by its geographic coordinates. ...
                       A jurisdiction may be a named administrative entity or a geographic
                       place to which the resource applies."@en
</dcterms:spatial>
```
**Why this is perfect**:
- ✅ Explicitly covers **"jurisdiction under which the resource is relevant"**
- ✅ Allows both named places and ISO country codes
- ✅ DCMI standard, widely adopted
- ✅ Already used in DBpedia for HistoricalPeriod → Place relationships
**Example usage**:
```yaml
CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
  meaning: wd:Q64960148
  annotations:
    dcterms:spatial: "US"  # ISO 3166-1 alpha-2 code
```
---
### 2. **RiC-O - `rico:hasOrHadJurisdiction`** (Alternative)
**Property**: `rico:hasOrHadJurisdiction`
**Inverse**: `rico:isOrWasJurisdictionOf`
**Domain**: `rico:Agent` (organizations)
**Range**: `rico:Place`
**Source**: `data/ontology/RiC-O_1-1.rdf`
```turtle
<rico:hasOrHadJurisdiction>
  rdfs:subPropertyOf rico:isAgentAssociatedWithPlace
  owl:inverseOf rico:isOrWasJurisdictionOf
  rdfs:domain rico:Agent
  rdfs:range rico:Place
  rdfs:comment "Inverse of 'is or was jurisdiction of' object relation"@en
</rico:hasOrHadJurisdiction>
```
**Why this is less suitable**:
- ⚠️ Designed for **organizational jurisdiction** (which organization has authority over which place)
- ⚠️ Not designed for **feature type geographic applicability**
- ⚠️ Domain is `Agent`, not `Feature` or `EnumValue`
**Conclusion**: Use RiC-O for organizational jurisdiction (e.g., "Netherlands National Archives has jurisdiction over Noord-Holland"), NOT for feature type restrictions.
---
### 3. **Schema.org - `schema:addressCountry`** ✅ ALREADY USED
**Property**: `schema:addressCountry`
**Range**: `schema:Country` or ISO 3166-1 alpha-2 code
**Current usage**: Already mapped in `CustodianPlace.country`:
```yaml
country:
  slot_uri: schema:addressCountry
  range: Country
```
**Why this works for validation**:
- ✅ `CustodianPlace.country` already uses ISO 3166-1 codes
- ✅ Can cross-reference with `dcterms:spatial` in FeatureTypeEnum
- ✅ Validation rule: "If feature_type.spatial annotation exists, CustodianPlace.country MUST match"
---
## LinkML Implementation Strategy
### Approach 1: **Annotations + Custom Validation Rules** ✅ RECOMMENDED
**Rationale**: LinkML doesn't have built-in "enum value → class field" conditional validation, so we:
1. Add `dcterms:spatial` **annotations** to country-specific enum values
2. Implement **custom validation rules** at the `CustodianPlace` class level
#### Step 1: Add `dcterms:spatial` Annotations to FeatureTypeEnum
```yaml
# schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml
enums:
  FeatureTypeEnum:
    permissible_values:
      CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
        title: City of Pittsburgh historic designation
        meaning: wd:Q64960148
        annotations:
          wikidata_id: Q64960148
          dcterms:spatial: "US"  # ← NEW: Country restriction
          spatial_note: "Pittsburgh, Pennsylvania, United States"
      CULTURAL_HERITAGE_OF_PERU:
        title: cultural heritage of Peru
        meaning: wd:Q16617058
        annotations:
          wikidata_id: Q16617058
          dcterms:spatial: "PE"  # ← NEW: Country restriction
      BUITENPLAATS:
        title: buitenplaats
        meaning: wd:Q2927789
        annotations:
          wikidata_id: Q2927789
          dcterms:spatial: "NL"  # ← NEW: Country restriction
      NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
        title: National Memorial of the United States
        meaning: wd:Q1967454
        annotations:
          wikidata_id: Q1967454
          dcterms:spatial: "US"  # ← NEW: Country restriction
      # Global feature types have NO dcterms:spatial annotation
      MANSION:
        title: mansion
        meaning: wd:Q1802963
        annotations:
          wikidata_id: Q1802963
          # NO dcterms:spatial - applicable globally
```
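Because a mistyped annotation value would silently misdirect a restriction, it may be worth linting the annotations before relying on them. The sketch below assumes the enum has already been parsed into a plain dict shaped like the YAML above; `find_malformed_spatial_codes` and the `sample` data are hypothetical, and the check covers only the two-uppercase-letter shape of an alpha-2 code, not whether the code is actually assigned in ISO 3166-1.

```python
import re

# Two uppercase ASCII letters: the shape of an ISO 3166-1 alpha-2 code.
# This checks form only; it does not confirm the code is actually assigned.
ALPHA_2_PATTERN = re.compile(r"[A-Z]{2}")


def find_malformed_spatial_codes(permissible_values: dict) -> dict:
    """Return {enum key: bad value} for dcterms:spatial values that are not alpha-2 shaped."""
    malformed = {}
    for name, definition in permissible_values.items():
        annotations = (definition or {}).get("annotations") or {}
        if "dcterms:spatial" not in annotations:
            continue  # no annotation means the feature type is globally applicable
        code = annotations["dcterms:spatial"]
        if not isinstance(code, str) or not ALPHA_2_PATTERN.fullmatch(code):
            malformed[name] = code
    return malformed


# Hypothetical parsed excerpt of FeatureTypeEnum.permissible_values
sample = {
    "BUITENPLAATS": {"annotations": {"wikidata_id": "Q2927789", "dcterms:spatial": "NL"}},
    "CULTURAL_HERITAGE_OF_PERU": {"annotations": {"dcterms:spatial": "Peru"}},  # deliberately wrong
    "MANSION": {"annotations": {"wikidata_id": "Q1802963"}},
}

print(find_malformed_spatial_codes(sample))  # {'CULTURAL_HERITAGE_OF_PERU': 'Peru'}
```

Running such a check before the validator keeps annotation typos from surfacing later as confusing country-mismatch errors.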
#### Step 2: Add Validation Rules to CustodianPlace Class
```yaml
# schemas/20251121/linkml/modules/classes/CustodianPlace.yaml
classes:
  CustodianPlace:
    class_uri: crm:E53_Place
    slots:
      - place_name
      - country
      - has_feature_type
      # ... other slots
    rules:
      - title: "Feature type country restriction validation"
        description: >-
          If a feature type has a dcterms:spatial annotation (country restriction),
          then the CustodianPlace.country MUST match that restriction.

          Examples:
            - CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION requires country.alpha_2 = "US"
            - CULTURAL_HERITAGE_OF_PERU requires country.alpha_2 = "PE"
            - BUITENPLAATS requires country.alpha_2 = "NL"

          Feature types WITHOUT dcterms:spatial are applicable globally.
        preconditions:
          slot_conditions:
            has_feature_type:
              # If has_feature_type is populated
              required: true
            country:
              # And country is populated
              required: true
        postconditions:
          # CUSTOM VALIDATION (requires external validator)
          description: >-
            Validate that if has_feature_type.feature_type enum value has
            a dcterms:spatial annotation, then country.alpha_2 MUST equal
            that annotation value.

            Pseudocode:
              feature_enum_value = has_feature_type.feature_type
              spatial_restriction = enum_annotations[feature_enum_value]['dcterms:spatial']
              if spatial_restriction is not None:
                  assert country.alpha_2 == spatial_restriction, \
                      f"Feature type {feature_enum_value} restricted to {spatial_restriction}, \
                      but CustodianPlace country is {country.alpha_2}"
```
**Limitation**: LinkML's `rules` block **cannot directly access enum annotations**. We need a **custom Python validator**.
---
### Approach 2: **Python Custom Validator** ✅ IMPLEMENTATION REQUIRED
Since LinkML rules can't access enum annotations, implement a **post-validation Python script**:
```python
# scripts/validate_country_restrictions.py
from typing import Dict, Optional

from linkml_runtime.utils.schemaview import SchemaView


def load_feature_type_spatial_restrictions(schema_view: SchemaView) -> Dict[str, str]:
    """
    Extract dcterms:spatial annotations from FeatureTypeEnum permissible values.

    Returns:
        Dict mapping feature type enum key → ISO 3166-1 alpha-2 country code
        Example: {"CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US", ...}
    """
    restrictions = {}
    enum_def = schema_view.get_enum("FeatureTypeEnum")
    for pv_name, pv in enum_def.permissible_values.items():
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value
    return restrictions


def validate_custodian_place_country_restrictions(
    custodian_place_data: dict,
    spatial_restrictions: Dict[str, str],
) -> Optional[str]:
    """
    Validate that feature types with country restrictions match CustodianPlace.country.

    Returns:
        None if valid, error message string if invalid
    """
    # Extract feature type and country
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No feature type, no restriction

    feature_type_enum = feature_place.get("feature_type")
    if not feature_type_enum:
        return None

    # Check if this feature type has a country restriction
    required_country = spatial_restrictions.get(feature_type_enum)
    if not required_country:
        return None  # No restriction, globally applicable

    # Get actual country
    country = custodian_place_data.get("country")
    if not country:
        return (
            f"Feature type '{feature_type_enum}' requires country='{required_country}', "
            "but no country specified"
        )

    # Validate country matches (country may be a nested object or a bare code)
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country
    if actual_country != required_country:
        return (
            f"Feature type '{feature_type_enum}' restricted to country '{required_country}', "
            f"but CustodianPlace.country='{actual_country}'"
        )

    return None  # Valid


# Example usage
if __name__ == "__main__":
    schema_view = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    restrictions = load_feature_type_spatial_restrictions(schema_view)

    # Test case 1: Invalid (Pittsburgh designation in Peru)
    invalid_data = {
        "place_name": "Lima Historic Building",
        "country": {"alpha_2": "PE"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        },
    }
    error = validate_custodian_place_country_restrictions(invalid_data, restrictions)
    assert error is not None, "Should detect country mismatch"
    print(f"❌ Validation error: {error}")

    # Test case 2: Valid (Pittsburgh designation in US)
    valid_data = {
        "place_name": "Pittsburgh Historic Building",
        "country": {"alpha_2": "US"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        },
    }
    error = validate_custodian_place_country_restrictions(valid_data, restrictions)
    assert error is None, "Should pass validation"
    print("✅ Valid: Pittsburgh designation in US")

    # Test case 3: Valid (MANSION has no restriction, can be anywhere)
    global_data = {
        "place_name": "Mansion in France",
        "country": {"alpha_2": "FR"},
        "has_feature_type": {
            "feature_type": "MANSION"
        },
    }
    error = validate_custodian_place_country_restrictions(global_data, restrictions)
    assert error is None, "Should pass validation (global feature type)"
    print("✅ Valid: MANSION (global feature type) in France")
```
---
## Implementation Checklist
### Phase 1: Schema Annotations ✅ START HERE
- [ ] **Identify all country-specific feature types** in `FeatureTypeEnum.yaml`
  - Search Wikidata descriptions for country names
  - Examples: "City of Pittsburgh", "cultural heritage of Peru", "buitenplaats"
  - Use a regex such as `/(United States|Peru|Netherlands|Brazil|Mexico|France|Germany)/i`, extended with further country names as needed
- [ ] **Add `dcterms:spatial` annotations** to country-specific enum values
  - Format: `dcterms:spatial: "US"` (ISO 3166-1 alpha-2)
  - Add `spatial_note` for human readability: "Pittsburgh, Pennsylvania, United States"
- [ ] **Document annotation semantics** in FeatureTypeEnum header
  ```yaml
  # Annotations:
  #   dcterms:spatial - Country restriction (ISO 3166-1 alpha-2 code)
  #     If present, feature type only applicable in specified country
  #     If absent, feature type is globally applicable
  ```
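The regex-based audit in the first checklist item could be sketched roughly as follows. This assumes the enum descriptions are available as a plain dict of enum key to description text; `COUNTRY_NAMES`, `find_country_specific_candidates`, and the sample data are illustrative names, not part of the schema. Matches are only candidates for manual review: a value such as `BUITENPLAATS` will not be found unless its description happens to mention the Netherlands.

```python
import re

# Illustrative country-name list; extend it with whatever jurisdictions the audit needs.
COUNTRY_NAMES = ["United States", "Peru", "Netherlands", "Brazil", "Mexico", "France", "Germany"]
COUNTRY_PATTERN = re.compile("|".join(re.escape(name) for name in COUNTRY_NAMES), re.IGNORECASE)


def find_country_specific_candidates(descriptions: dict) -> dict:
    """Map enum key -> sorted country names mentioned in its description."""
    candidates = {}
    for key, text in descriptions.items():
        # Title-case each hit so "peru" and "Peru" collapse into a single entry.
        found = {match.group(0).title() for match in COUNTRY_PATTERN.finditer(text or "")}
        if found:
            candidates[key] = sorted(found)
    return candidates


# Hypothetical descriptions keyed by enum value
sample_descriptions = {
    "CULTURAL_HERITAGE_OF_PERU": "cultural heritage of Peru",
    "NATIONAL_MEMORIAL_OF_THE_UNITED_STATES": "National Memorial of the United States",
    "MANSION": "mansion",
}

for key, countries in find_country_specific_candidates(sample_descriptions).items():
    print(key, countries)
```

Each candidate still needs a human decision on the correct alpha-2 code before an annotation is added.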
### Phase 2: Custom Validator Implementation
- [ ] **Create validation script** `scripts/validate_country_restrictions.py`
  - Implement `load_feature_type_spatial_restrictions()`
  - Implement `validate_custodian_place_country_restrictions()`
  - Add comprehensive test cases
- [ ] **Integrate with LinkML validation workflow**
  - Add to `linkml-validate` post-validation step
  - Or create standalone `validate-country-restrictions` CLI command
- [ ] **Add validation tests** to test suite
  - Test country-restricted feature types
  - Test global feature types (no restriction)
  - Test missing country field
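
One possible shape for the standalone `validate-country-restrictions` command is sketched below. It rests on assumptions the plan does not state: that instance records and the restriction mapping are both available as JSON files, and that a non-zero exit code is how the pipeline should detect failures. `find_violations` and `main` are hypothetical names, and the check is a compact restatement of the validator logic so the sketch stays self-contained.

```python
import argparse
import json
import sys


def find_violations(records: list, restrictions: dict) -> list:
    """Return one message per record whose feature type is restricted to another country."""
    messages = []
    for index, record in enumerate(records):
        feature_type = (record.get("has_feature_type") or {}).get("feature_type")
        required = restrictions.get(feature_type)
        if required is None:
            continue  # unrestricted or absent feature type
        country = record.get("country")
        actual = country.get("alpha_2") if isinstance(country, dict) else country
        if actual != required:
            messages.append(
                f"record {index}: {feature_type} requires country {required}, got {actual!r}"
            )
    return messages


def main(argv=None) -> int:
    """Parse arguments, report violations on stderr, and return a process exit code."""
    parser = argparse.ArgumentParser(prog="validate-country-restrictions")
    parser.add_argument("records", help="JSON file containing a list of CustodianPlace records")
    parser.add_argument("restrictions", help="JSON file mapping feature type to alpha-2 code")
    args = parser.parse_args(argv)
    with open(args.records, encoding="utf-8") as handle:
        records = json.load(handle)
    with open(args.restrictions, encoding="utf-8") as handle:
        restrictions = json.load(handle)
    messages = find_violations(records, restrictions)
    for message in messages:
        print(message, file=sys.stderr)
    return 1 if messages else 0  # non-zero exit lets a pipeline fail the run
```

To expose it as a command, `main` could be registered as a console-script entry point or invoked as `sys.exit(main())` from a script; a record with a restricted feature type and no country is reported as a violation with `got None`, which covers the missing-country test case.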
### Phase 3: Documentation
- [ ] **Update CustodianPlace documentation**
  - Explain country field is required when using country-specific feature types
  - Link to FeatureTypeEnum country restriction annotations
- [ ] **Update FeaturePlace documentation**
  - Explain feature type country restrictions
  - Provide examples of restricted vs. global feature types
- [ ] **Create VALIDATION.md guide**
  - Document validation workflow
  - Provide troubleshooting guide for country restriction errors
---
## Alternative Approaches (Not Recommended)
### ❌ Approach: Split FeatureTypeEnum by Country
Create separate enums: `FeatureTypeEnum_US`, `FeatureTypeEnum_NL`, etc.
**Why not**:
- Duplicates global feature types (MANSION exists in every country enum)
- Breaks DRY principle
- Hard to maintain (298 feature types → 298 × N countries)
- Loses semantic clarity
### ❌ Approach: Create Country-Specific Subclasses of CustodianPlace
Create `CustodianPlace_US`, `CustodianPlace_NL`, etc., each with restricted enum ranges.
**Why not**:
- Explosion of subclasses (one per country)
- Type polymorphism issues
- Hard to extend to new countries
- Violates Open/Closed Principle
### ❌ Approach: Use LinkML `any_of` Conditional Range
```yaml
# Illustrative pseudo-syntax only: this is not valid LinkML
has_feature_type:
  range: FeaturePlace
  any_of:
    - country.alpha_2 = "US" → feature_type in [PITTSBURGH_DESIGNATION, NATIONAL_MEMORIAL, ...]
    - country.alpha_2 = "PE" → feature_type in [CULTURAL_HERITAGE_OF_PERU, ...]
```
**Why not**:
- LinkML `any_of` doesn't support cross-slot conditionals
- Would require massive `any_of` block for every country
- Unreadable and unmaintainable
---
## Rationale for Chosen Approach
### Why Annotations + Custom Validator?
**Separation of Concerns**:
- Schema defines **what** (data structure)
- Annotations define **metadata** (country restrictions)
- Validator enforces **constraints** (business rules)
**Maintainability**:
- Add new country-specific feature type: Just add annotation
- Change restriction: Update annotation, validator logic unchanged
**Flexibility**:
- Easy to extend with other restrictions (e.g., `dcterms:temporal` for time periods)
- Custom validators can implement complex logic
**Ontology Alignment**:
- `dcterms:spatial` is a DCMI standard property
- Aligns with DBpedia and Schema.org spatial semantics
**Backward Compatibility**:
- Existing global feature types unaffected (no annotation = no restriction)
- Gradual migration: Add annotations incrementally
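
The flexibility point can be illustrated with a small checker registry, where each annotation tag maps to its own rule. This is a sketch under stated assumptions, not a committed design: `CHECKERS`, `check_spatial`, `check_temporal`, and `validate_record` are hypothetical names, and the `period` field used by the temporal checker does not exist in the schema described here.

```python
from typing import Callable, Dict, List, Optional

# Each checker receives the annotation value and the record, and returns an
# error message or None. The registry key is the annotation tag it enforces.
Checker = Callable[[str, dict], Optional[str]]


def check_spatial(required_country: str, record: dict) -> Optional[str]:
    country = record.get("country")
    actual = country.get("alpha_2") if isinstance(country, dict) else country
    if actual != required_country:
        return f"restricted to country {required_country}, got {actual!r}"
    return None


def check_temporal(required_period: str, record: dict) -> Optional[str]:
    # Hypothetical: assumes records carry a 'period' field, which the schema
    # described in this document does not define.
    if record.get("period") != required_period:
        return f"restricted to period {required_period}, got {record.get('period')!r}"
    return None


CHECKERS: Dict[str, Checker] = {
    "dcterms:spatial": check_spatial,
    "dcterms:temporal": check_temporal,
}


def validate_record(record: dict, annotations_by_type: Dict[str, Dict[str, str]]) -> List[str]:
    """Run every registered checker whose annotation is present on the record's feature type."""
    feature_type = (record.get("has_feature_type") or {}).get("feature_type")
    errors = []
    for tag, value in annotations_by_type.get(feature_type, {}).items():
        checker = CHECKERS.get(tag)
        if checker is None:
            continue  # annotations such as wikidata_id carry no constraint
        message = checker(value, record)
        if message:
            errors.append(f"{feature_type}: {tag} {message}")
    return errors


annotations = {"BUITENPLAATS": {"wikidata_id": "Q2927789", "dcterms:spatial": "NL"}}
record = {"country": {"alpha_2": "US"}, "has_feature_type": {"feature_type": "BUITENPLAATS"}}
print(validate_record(record, annotations))
```

With this layout, supporting a new kind of restriction means registering one more checker, while the annotations in the schema stay the single source of the restriction values.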
---
## Next Steps
1. **Run ontology property search** to confirm `dcterms:spatial` is the best choice
2. **Audit FeatureTypeEnum** to identify all country-specific values
3. **Add annotations** to schema
4. **Implement Python validator**
5. **Integrate into CI/CD** validation pipeline
---
## References
### Ontology Documentation
- **Dublin Core Terms**: `data/ontology/dublin_core_elements.rdf`
  - `dcterms:spatial` - Geographic/jurisdictional applicability
- **RiC-O**: `data/ontology/RiC-O_1-1.rdf`
  - `rico:hasOrHadJurisdiction` - Organizational jurisdiction
- **Schema.org**: `data/ontology/schemaorg.owl`
  - `schema:addressCountry` - ISO 3166-1 country codes
### LinkML Documentation
- **Constraints and Rules**: https://linkml.io/linkml/schemas/constraints.html
- **Advanced Features**: https://linkml.io/linkml/schemas/advanced.html
- **Conditional Validation Examples**: https://linkml.io/linkml/faq/modeling.html#conditional-slot-ranges
### Related Files
- `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml` - Feature type definitions
- `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml` - Place class with country field
- `schemas/20251121/linkml/modules/classes/FeaturePlace.yaml` - Feature type classifier
- `schemas/20251121/linkml/modules/classes/Country.yaml` - ISO 3166-1 country codes
- `AGENTS.md` - Agent instructions (Rule 1: Ontology Files Are Your Primary Reference)
---
**Status**: Ready for implementation
**Priority**: Medium (nice-to-have validation, not blocking)
**Estimated Effort**: 4-6 hours (annotation audit + validator + tests)