# Country Restriction Implementation for FeatureTypeEnum

**Date**: 2025-11-22

**Status**: Implementation Plan

**Related Files**:

- `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml`
- `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml`
- `schemas/20251121/linkml/modules/classes/FeaturePlace.yaml`
- `schemas/20251121/linkml/modules/classes/Country.yaml`

---

## Problem Statement

Some feature types in `FeatureTypeEnum` are **country-specific** and should only be used when `CustodianPlace.country` matches a specific jurisdiction.

**Examples**:

- `CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION` (Q64960148) - **US only** (Pittsburgh, Pennsylvania)
- `CULTURAL_HERITAGE_OF_PERU` (Q16617058) - **Peru only**
- `BUITENPLAATS` (Q2927789) - **Netherlands only** (Dutch country estates)
- `NATIONAL_MEMORIAL_OF_THE_UNITED_STATES` (Q1967454) - **US only**

**Current Issue**: No validation mechanism currently enforces country restrictions on feature type usage.

---

## Ontology Properties for Jurisdiction

### 1. **Dublin Core Terms - `dcterms:spatial`** ✅ RECOMMENDED

**Property**: `dcterms:spatial` (Spatial Coverage)

**Definition**: "Spatial characteristics of the resource." `dcterms:spatial` is a refinement of the broader `coverage` property, which is defined as "The spatial or temporal topic of the resource, spatial applicability of the resource, or **jurisdiction under which the resource is relevant**."

**Source**: `data/ontology/dublin_core_elements.rdf` (in this file, the jurisdiction wording belongs to the `coverage` entry)

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

dc:coverage
    rdfs:comment """The spatial or temporal topic of the resource, spatial applicability
        of the resource, or jurisdiction under which the resource is relevant."""@en ;
    dcterms:description """Spatial topic and spatial applicability may be a named place or
        a location specified by its geographic coordinates. ...
        A jurisdiction may be a named administrative entity or a geographic
        place to which the resource applies."""@en .
```

**Why this fits**:
- ✅ Through its parent property `coverage`, explicitly covers **"jurisdiction under which the resource is relevant"**
- ✅ Allows both named places and ISO country codes as values
- ✅ DCMI (Dublin Core) standard, widely adopted
- ✅ Already used in DBpedia for HistoricalPeriod → Place relationships

**Example usage**:

```yaml
CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
  meaning: wd:Q64960148
  annotations:
    dcterms:spatial: "US"  # ISO 3166-1 alpha-2 code
```

---

### 2. **RiC-O - `rico:hasOrHadJurisdiction`** (Alternative)

**Property**: `rico:hasOrHadJurisdiction`

**Inverse**: `rico:isOrWasJurisdictionOf`

**Domain**: `rico:Agent` (organizations)

**Range**: `rico:Place`

**Source**: `data/ontology/RiC-O_1-1.rdf`

```turtle
@prefix rico: <https://www.ica.org/standards/RiC/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

rico:hasOrHadJurisdiction
    rdfs:subPropertyOf rico:isAgentAssociatedWithPlace ;
    owl:inverseOf rico:isOrWasJurisdictionOf ;
    rdfs:domain rico:Agent ;
    rdfs:range rico:Place ;
    rdfs:comment "Inverse of 'is or was jurisdiction of' object relation"@en .
```

**Why this is less suitable**:
- ⚠️ Designed for **organizational jurisdiction** (which organization has authority over which place)
- ⚠️ Not designed for **feature type geographic applicability**
- ⚠️ Domain is `Agent`, not `Feature` or `EnumValue`

**Conclusion**: Use RiC-O for organizational jurisdiction (e.g., "Netherlands National Archives has jurisdiction over Noord-Holland"), NOT for feature type restrictions.

---

### 3. **Schema.org - `schema:addressCountry`** ✅ ALREADY USED

**Property**: `schema:addressCountry`

**Range**: `schema:Country` or text; Schema.org recommends an ISO 3166-1 alpha-2 code (e.g. "US")

**Current usage**: Already mapped in `CustodianPlace.country`:

```yaml
country:
  slot_uri: schema:addressCountry
  range: Country
```

**Why this works for validation**:
- ✅ `CustodianPlace.country` already uses ISO 3166-1 codes
- ✅ Can be cross-referenced with the `dcterms:spatial` annotations in FeatureTypeEnum
- ✅ Validation rule: "If the feature type has a `dcterms:spatial` annotation, `CustodianPlace.country` MUST match it"

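Because `country` has range `Country`, instance data may hold it as an inlined object such as `{"alpha_2": "US"}` or, in looser data, as a bare code string. Normalizing both shapes to one upper-case alpha-2 code before comparing against a `dcterms:spatial` annotation avoids false mismatches such as `"us"` vs `"US"`. A minimal sketch under those assumptions: `normalize_country_code` is a hypothetical helper, the `alpha_2` key mirrors the field name used by the validator in this document, and only the two-letter shape is checked, not membership in the official ISO 3166-1 list.

```python
import re
from typing import Optional, Union

# Shape check only: two ASCII letters. ISO 3166-1 membership is NOT verified.
_ALPHA_2 = re.compile(r"^[A-Za-z]{2}$")


def normalize_country_code(country: Union[dict, str, None]) -> Optional[str]:
    """Return an upper-case alpha-2 code, or None if absent or malformed.

    Hypothetical helper: accepts a Country-like dict with an 'alpha_2' key
    or a bare code string.
    """
    if country is None:
        return None
    code = country.get("alpha_2") if isinstance(country, dict) else country
    if not isinstance(code, str):
        return None  # e.g. dict without alpha_2, or a non-string value
    code = code.strip()
    if not _ALPHA_2.match(code):
        return None  # e.g. an alpha-3 code such as "USA"
    return code.upper()


print(normalize_country_code({"alpha_2": "us"}))
print(normalize_country_code(" NL "))
print(normalize_country_code({"alpha_3": "PER"}))
```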
---

## LinkML Implementation Strategy

### Approach 1: **Annotations + Custom Validation Rules** ✅ RECOMMENDED

**Rationale**: LinkML has no built-in conditional validation of the form "enum value → constraint on another class field", so we:

1. Add `dcterms:spatial` **annotations** to country-specific enum values
2. Implement **custom validation rules** at the `CustodianPlace` class level

#### Step 1: Add `dcterms:spatial` Annotations to FeatureTypeEnum

```yaml
# schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml

enums:
  FeatureTypeEnum:
    permissible_values:
      CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
        title: City of Pittsburgh historic designation
        meaning: wd:Q64960148
        annotations:
          wikidata_id: Q64960148
          dcterms:spatial: "US"  # ← NEW: Country restriction
          spatial_note: "Pittsburgh, Pennsylvania, United States"

      CULTURAL_HERITAGE_OF_PERU:
        title: cultural heritage of Peru
        meaning: wd:Q16617058
        annotations:
          wikidata_id: Q16617058
          dcterms:spatial: "PE"  # ← NEW: Country restriction

      BUITENPLAATS:
        title: buitenplaats
        meaning: wd:Q2927789
        annotations:
          wikidata_id: Q2927789
          dcterms:spatial: "NL"  # ← NEW: Country restriction

      NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
        title: National Memorial of the United States
        meaning: wd:Q1967454
        annotations:
          wikidata_id: Q1967454
          dcterms:spatial: "US"  # ← NEW: Country restriction

      # Global feature types have NO dcterms:spatial annotation
      MANSION:
        title: mansion
        meaning: wd:Q1802963
        annotations:
          wikidata_id: Q1802963
          # NO dcterms:spatial - applicable globally
```

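Once the annotations exist, they can also be read without loading the full schema, for example in a lightweight CI check. A minimal sketch, assuming the enum module has already been parsed into plain Python dicts (e.g. with PyYAML's `yaml.safe_load`, omitted here to keep the example dependency-free). It handles the shorthand `dcterms:spatial: "US"` form shown above and, as an assumption about how some files may be written, an expanded mapping with a `value` key. `extract_spatial_restrictions` is a hypothetical name.

```python
from typing import Any, Dict

SPATIAL_TAG = "dcterms:spatial"


def extract_spatial_restrictions(schema_doc: Dict[str, Any],
                                 enum_name: str = "FeatureTypeEnum") -> Dict[str, str]:
    """Map permissible-value names to their dcterms:spatial country code.

    Values without the annotation are globally applicable and omitted.
    """
    enum_def = schema_doc.get("enums", {}).get(enum_name, {})
    pvs = enum_def.get("permissible_values") or {}
    restrictions: Dict[str, str] = {}
    for pv_name, pv in pvs.items():
        annotations = (pv or {}).get("annotations") or {}
        raw = annotations.get(SPATIAL_TAG)
        if raw is None:
            continue  # no restriction: globally applicable
        # Shorthand `dcterms:spatial: "US"` or expanded `{tag: ..., value: "US"}`
        value = raw.get("value") if isinstance(raw, dict) else raw
        if value:
            restrictions[pv_name] = str(value).upper()
    return restrictions


# Trimmed-down version of the FeatureTypeEnum excerpt above
doc = {
    "enums": {
        "FeatureTypeEnum": {
            "permissible_values": {
                "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": {
                    "meaning": "wd:Q64960148",
                    "annotations": {"wikidata_id": "Q64960148", "dcterms:spatial": "US"},
                },
                "BUITENPLAATS": {
                    "meaning": "wd:Q2927789",
                    "annotations": {"dcterms:spatial": {"tag": "dcterms:spatial", "value": "NL"}},
                },
                "MANSION": {
                    "meaning": "wd:Q1802963",
                    "annotations": {"wikidata_id": "Q1802963"},
                },
            }
        }
    }
}

print(extract_spatial_restrictions(doc))
```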
#### Step 2: Add Validation Rules to CustodianPlace Class

```yaml
# schemas/20251121/linkml/modules/classes/CustodianPlace.yaml

classes:
  CustodianPlace:
    class_uri: crm:E53_Place
    slots:
      - place_name
      - country
      - has_feature_type
      # ... other slots

    rules:
      - title: "Feature type country restriction validation"
        description: >-
          If a feature type has a dcterms:spatial annotation (country restriction),
          then the CustodianPlace.country MUST match that restriction.

          Examples:
            - CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION requires country.alpha_2 = "US"
            - CULTURAL_HERITAGE_OF_PERU requires country.alpha_2 = "PE"
            - BUITENPLAATS requires country.alpha_2 = "NL"

          Feature types WITHOUT dcterms:spatial are applicable globally.

        preconditions:
          slot_conditions:
            has_feature_type:
              # If has_feature_type is populated
              required: true
            country:
              # And country is populated
              required: true

        postconditions:
          # CUSTOM VALIDATION (requires external validator)
          # Literal block (|-) so the pseudocode line breaks are preserved
          description: |-
            Validate that if the has_feature_type.feature_type enum value has
            a dcterms:spatial annotation, then country.alpha_2 MUST equal
            that annotation value.

            Pseudocode:
              feature_enum_value = has_feature_type.feature_type
              spatial_restriction = enum_annotations[feature_enum_value]['dcterms:spatial']

              if spatial_restriction is not None:
                  assert country.alpha_2 == spatial_restriction, (
                      f"Feature type {feature_enum_value} restricted to {spatial_restriction}, "
                      f"but CustodianPlace country is {country.alpha_2}"
                  )
```

**Limitation**: LinkML's `rules` block **cannot directly access enum annotations**, so the postcondition above can only describe the check, not enforce it. We need a **custom Python validator**.

---

### Approach 2: **Python Custom Validator** ✅ IMPLEMENTATION REQUIRED

Since LinkML rules can't access enum annotations, implement a **post-validation Python script**:

```python
# scripts/validate_country_restrictions.py

from typing import Dict, Optional

from linkml_runtime.utils.schemaview import SchemaView


def load_feature_type_spatial_restrictions(schema_view: SchemaView) -> Dict[str, str]:
    """
    Extract dcterms:spatial annotations from FeatureTypeEnum permissible values.

    Returns:
        Dict mapping feature type enum key → ISO 3166-1 alpha-2 country code
        Example: {"CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US", ...}
    """
    restrictions: Dict[str, str] = {}

    enum_def = schema_view.get_enum("FeatureTypeEnum")
    if enum_def is None:
        raise ValueError("FeatureTypeEnum not found in schema")

    for pv_name, pv in enum_def.permissible_values.items():
        # Only country-specific values carry the annotation; all others are global
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value

    return restrictions


def validate_custodian_place_country_restrictions(
    custodian_place_data: dict,
    spatial_restrictions: Dict[str, str],
) -> Optional[str]:
    """
    Validate that feature types with country restrictions match CustodianPlace.country.

    Returns:
        None if valid, error message string if invalid
    """
    # Extract feature type and country
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No feature type, no restriction

    feature_type_enum = feature_place.get("feature_type")
    if not feature_type_enum:
        return None

    # Check if this feature type has a country restriction
    required_country = spatial_restrictions.get(feature_type_enum)
    if not required_country:
        return None  # No restriction, globally applicable

    # Get actual country
    country = custodian_place_data.get("country")
    if not country:
        return (
            f"Feature type '{feature_type_enum}' requires country='{required_country}', "
            "but no country specified"
        )

    # Country may be an inlined Country object ({"alpha_2": ...}) or a bare code
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country

    if actual_country != required_country:
        return (
            f"Feature type '{feature_type_enum}' restricted to country '{required_country}', "
            f"but CustodianPlace.country='{actual_country}'"
        )

    return None  # Valid


# Example usage
if __name__ == "__main__":
    schema_view = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    restrictions = load_feature_type_spatial_restrictions(schema_view)

    # Test case 1: Invalid (Pittsburgh designation in Peru)
    invalid_data = {
        "place_name": "Lima Historic Building",
        "country": {"alpha_2": "PE"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        },
    }
    error = validate_custodian_place_country_restrictions(invalid_data, restrictions)
    assert error is not None, "Should detect country mismatch"
    print(f"❌ Validation error: {error}")

    # Test case 2: Valid (Pittsburgh designation in US)
    valid_data = {
        "place_name": "Pittsburgh Historic Building",
        "country": {"alpha_2": "US"},
        "has_feature_type": {
            "feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"
        },
    }
    error = validate_custodian_place_country_restrictions(valid_data, restrictions)
    assert error is None, "Should pass validation"
    print("✅ Valid: Pittsburgh designation in US")

    # Test case 3: Valid (MANSION has no restriction, can be anywhere)
    global_data = {
        "place_name": "Mansion in France",
        "country": {"alpha_2": "FR"},
        "has_feature_type": {
            "feature_type": "MANSION"
        },
    }
    error = validate_custodian_place_country_restrictions(global_data, restrictions)
    assert error is None, "Should pass validation (global feature type)"
    print("✅ Valid: MANSION (global feature type) in France")
```

---

## Implementation Checklist

### Phase 1: Schema Annotations ✅ START HERE

- [ ] **Identify all country-specific feature types** in `FeatureTypeEnum.yaml`
  - Search Wikidata descriptions for country names
  - Examples: "City of Pittsburgh", "cultural heritage of Peru", "buitenplaats"
  - Use a case-insensitive regex such as `/(United States|Peru|Netherlands|Brazil|Mexico|France|Germany)/i`, extended with further country names; name matching will not catch native-language terms such as "buitenplaats", so review those manually

- [ ] **Add `dcterms:spatial` annotations** to country-specific enum values
  - Format: `dcterms:spatial: "US"` (ISO 3166-1 alpha-2)
  - Add `spatial_note` for human readability: "Pittsburgh, Pennsylvania, United States"

- [ ] **Document annotation semantics** in FeatureTypeEnum header
  ```yaml
  # Annotations:
  #   dcterms:spatial - Country restriction (ISO 3166-1 alpha-2 code)
  #     If present, feature type only applicable in specified country
  #     If absent, feature type is globally applicable
  ```

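The audit in Phase 1 can be partly automated. A minimal sketch, assuming the descriptions are available as an `{enum_key: description}` dict; `COUNTRY_NAME_TO_ISO` is a hypothetical, deliberately short lookup table, and the sample descriptions are illustrative rather than the actual Wikidata text. Matches are only candidates for human review: a description that merely mentions a country is not necessarily restricted to it.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, deliberately incomplete lookup table; extend as needed.
COUNTRY_NAME_TO_ISO: Dict[str, str] = {
    "United States": "US",
    "Peru": "PE",
    "Netherlands": "NL",
    "Brazil": "BR",
    "Mexico": "MX",
    "France": "FR",
    "Germany": "DE",
}

# Case-insensitive alternation with word boundaries, longest names first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(COUNTRY_NAME_TO_ISO, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {name.lower(): code for name, code in COUNTRY_NAME_TO_ISO.items()}


def find_candidate_restrictions(descriptions: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (enum_key, suggested ISO code, matched text) tuples for manual review."""
    candidates = []
    for key, text in descriptions.items():
        for match in _PATTERN.finditer(text or ""):
            candidates.append((key, _LOOKUP[match.group(1).lower()], match.group(1)))
    return candidates


# Illustrative descriptions, not the real Wikidata text
sample = {
    "CULTURAL_HERITAGE_OF_PERU": "cultural heritage of Peru",
    "NATIONAL_MEMORIAL_OF_THE_UNITED_STATES": "National Memorial of the United States",
    "BUITENPLAATS": "Dutch country estate",  # missed: no country name in the text
    "MANSION": "large dwelling house",
}
for row in find_candidate_restrictions(sample):
    print(row)
```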
### Phase 2: Custom Validator Implementation

- [ ] **Create validation script** `scripts/validate_country_restrictions.py`
  - Implement `load_feature_type_spatial_restrictions()`
  - Implement `validate_custodian_place_country_restrictions()`
  - Add comprehensive test cases

- [ ] **Integrate with LinkML validation workflow**
  - Add to `linkml-validate` post-validation step
  - Or create standalone `validate-country-restrictions` CLI command

- [ ] **Add validation tests** to test suite
  - Test country-restricted feature types
  - Test global feature types (no restriction)
  - Test missing country field

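For the standalone-command option, a minimal sketch of a CI-friendly entry point follows. Assumptions: the restriction map has been exported to JSON (for example from `load_feature_type_spatial_restrictions()`), instance records are a JSON list of CustodianPlace-shaped dicts, and the command name and positional arguments are hypothetical. `check_record` mirrors `validate_custodian_place_country_restrictions()` in compact form so the example runs on its own; returning a non-zero status lets a CI job fail on any violation. The demo at the bottom uses temporary files; a real entry point would call `sys.exit(main())`.

```python
import argparse
import json
import os
import sys
import tempfile
from typing import Dict, List, Optional


def check_record(record: dict, restrictions: Dict[str, str]) -> Optional[str]:
    """Compact equivalent of validate_custodian_place_country_restrictions()."""
    feature_type = (record.get("has_feature_type") or {}).get("feature_type")
    required = restrictions.get(feature_type) if feature_type else None
    if not required:
        return None  # no feature type, or a globally applicable one
    country = record.get("country")
    actual = country.get("alpha_2") if isinstance(country, dict) else country
    if not actual:
        return f"'{feature_type}' requires country '{required}', but no country specified"
    if actual != required:
        return f"'{feature_type}' restricted to '{required}', but country is '{actual}'"
    return None


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        prog="validate-country-restrictions",  # hypothetical command name
        description="Check CustodianPlace records against FeatureTypeEnum country restrictions.",
    )
    parser.add_argument("restrictions", help="JSON object: enum key -> ISO alpha-2 code")
    parser.add_argument("records", help="JSON list of CustodianPlace records")
    args = parser.parse_args(argv)

    with open(args.restrictions, encoding="utf-8") as fh:
        restrictions = json.load(fh)
    with open(args.records, encoding="utf-8") as fh:
        records = json.load(fh)

    failures = 0
    for index, record in enumerate(records):
        error = check_record(record, restrictions)
        if error:
            failures += 1
            print(f"record {index}: {error}", file=sys.stderr)
    print(f"{len(records) - failures}/{len(records)} records passed")
    return 1 if failures else 0


if __name__ == "__main__":
    # Demo with temporary files
    pittsburgh = {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"}
    with tempfile.TemporaryDirectory() as tmp:
        restrictions_path = os.path.join(tmp, "restrictions.json")
        records_path = os.path.join(tmp, "places.json")
        with open(restrictions_path, "w", encoding="utf-8") as fh:
            json.dump({"CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US"}, fh)
        with open(records_path, "w", encoding="utf-8") as fh:
            json.dump([
                {"country": {"alpha_2": "US"}, "has_feature_type": pittsburgh},
                {"country": {"alpha_2": "PE"}, "has_feature_type": pittsburgh},
            ], fh)
        print("exit status:", main([restrictions_path, records_path]))
```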
### Phase 3: Documentation

- [ ] **Update CustodianPlace documentation**
  - Explain that the country field is required when using country-specific feature types
  - Link to FeatureTypeEnum country restriction annotations

- [ ] **Update FeaturePlace documentation**
  - Explain feature type country restrictions
  - Provide examples of restricted vs. global feature types

- [ ] **Create VALIDATION.md guide**
  - Document validation workflow
  - Provide troubleshooting guide for country restriction errors

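For the documentation tasks (examples of restricted vs. global feature types, troubleshooting), the restriction map can be inverted into a per-country listing. A minimal sketch: `restrictions_by_country` and `render_markdown` are hypothetical helpers, and the input is assumed to be the same `{enum key: ISO code}` mapping that `load_feature_type_spatial_restrictions()` returns. Any feature type absent from the mapping is global and simply does not appear in the table.

```python
from collections import defaultdict
from typing import Dict, List


def restrictions_by_country(restrictions: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert {enum_key: country_code} into {country_code: sorted enum keys}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for enum_key, code in restrictions.items():
        grouped[code].append(enum_key)
    return {code: sorted(keys) for code, keys in sorted(grouped.items())}


def render_markdown(grouped: Dict[str, List[str]]) -> str:
    """Render a two-column Markdown table for the documentation pages."""
    lines = ["| Country | Restricted feature types |", "|---|---|"]
    for code, keys in grouped.items():
        lines.append(f"| {code} | " + ", ".join(f"`{k}`" for k in keys) + " |")
    return "\n".join(lines)


restrictions = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US",
    "CULTURAL_HERITAGE_OF_PERU": "PE",
    "BUITENPLAATS": "NL",
    "NATIONAL_MEMORIAL_OF_THE_UNITED_STATES": "US",
}
print(render_markdown(restrictions_by_country(restrictions)))
```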
---

## Alternative Approaches (Not Recommended)

### ❌ Approach: Split FeatureTypeEnum by Country

Create separate enums: `FeatureTypeEnum_US`, `FeatureTypeEnum_NL`, etc.

**Why not**:
- Duplicates global feature types (MANSION exists in every country enum)
- Breaks DRY principle
- Hard to maintain (298 feature types → 298 × N countries)
- Loses semantic clarity

### ❌ Approach: Create Country-Specific Subclasses of CustodianPlace

Create `CustodianPlace_US`, `CustodianPlace_NL`, etc., each with restricted enum ranges.

**Why not**:
- Explosion of subclasses (one per country)
- Type polymorphism issues
- Hard to extend to new countries
- Violates Open/Closed Principle

### ❌ Approach: Use LinkML `any_of` Conditional Range

```yaml
# Illustrative pseudo-syntax only: this is NOT valid LinkML
has_feature_type:
  range: FeaturePlace
  any_of:
    - country.alpha_2 = "US" → feature_type in [PITTSBURGH_DESIGNATION, NATIONAL_MEMORIAL, ...]
    - country.alpha_2 = "PE" → feature_type in [CULTURAL_HERITAGE_OF_PERU, ...]
```

**Why not**:
- LinkML `any_of` doesn't support cross-slot conditionals
- Would require a massive `any_of` block covering every country
- Unreadable and unmaintainable

---

## Rationale for Chosen Approach

### Why Annotations + Custom Validator?

✅ **Separation of Concerns**:
- Schema defines **what** (data structure)
- Annotations define **metadata** (country restrictions)
- Validator enforces **constraints** (business rules)

✅ **Maintainability**:
- Add a new country-specific feature type: just add the annotation
- Change a restriction: update the annotation; validator logic stays unchanged

✅ **Flexibility**:
- Easy to extend with other restrictions (e.g., `dcterms:temporal` for time periods)
- Custom validators can implement complex logic

✅ **Ontology Alignment**:
- `dcterms:spatial` is a standard DCMI (Dublin Core) property
- Aligns with DBpedia and Schema.org spatial semantics

✅ **Backward Compatibility**:
- Existing global feature types unaffected (no annotation = no restriction)
- Gradual migration: add annotations incrementally

---

## Next Steps

1. **Run ontology property search** to confirm `dcterms:spatial` is the best choice
2. **Audit FeatureTypeEnum** to identify all country-specific values
3. **Add annotations** to schema
4. **Implement Python validator**
5. **Integrate into CI/CD** validation pipeline

---

## References

### Ontology Documentation
- **Dublin Core Terms**: `data/ontology/dublin_core_elements.rdf`
  - `dcterms:spatial` - Geographic/jurisdictional applicability
- **RiC-O**: `data/ontology/RiC-O_1-1.rdf`
  - `rico:hasOrHadJurisdiction` - Organizational jurisdiction
- **Schema.org**: `data/ontology/schemaorg.owl`
  - `schema:addressCountry` - ISO 3166-1 country codes

### LinkML Documentation
- **Constraints and Rules**: https://linkml.io/linkml/schemas/constraints.html
- **Advanced Features**: https://linkml.io/linkml/schemas/advanced.html
- **Conditional Validation Examples**: https://linkml.io/linkml/faq/modeling.html#conditional-slot-ranges

### Related Files
- `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml` - Feature type definitions
- `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml` - Place class with country field
- `schemas/20251121/linkml/modules/classes/FeaturePlace.yaml` - Feature type classifier
- `schemas/20251121/linkml/modules/classes/Country.yaml` - ISO 3166-1 country codes
- `AGENTS.md` - Agent instructions (Rule 1: Ontology Files Are Your Primary Reference)

---

**Status**: Ready for implementation

**Priority**: Medium (nice-to-have validation, not blocking)

**Estimated Effort**: 4-6 hours (annotation audit + validator + tests)