# Country Restriction Quick Start Guide

**Goal**: Ensure country-specific feature types (like "City of Pittsburgh historic designation") are only used in the correct country.

---

## TL;DR Solution

1. **Add `dcterms:spatial` annotations** to country-specific feature types in FeatureTypeEnum
2. **Implement Python validator** to check CustodianPlace.country matches feature type restriction
3. **Integrate validator** into data validation pipeline

---

## 3-Step Implementation

### Step 1: Annotate Country-Specific Feature Types (15 min)

Edit `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml`:

```yaml
permissible_values:
  CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
    title: City of Pittsburgh historic designation
    meaning: wd:Q64960148
    annotations:
      wikidata_id: Q64960148
      dcterms:spatial: "US"  # ← ADD THIS
      spatial_note: "Pittsburgh, Pennsylvania, United States"

  CULTURAL_HERITAGE_OF_PERU:
    meaning: wd:Q16617058
    annotations:
      dcterms:spatial: "PE"  # ← ADD THIS

  BUITENPLAATS:
    meaning: wd:Q2927789
    annotations:
      dcterms:spatial: "NL"  # ← ADD THIS

  NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
    meaning: wd:Q1967454
    annotations:
      dcterms:spatial: "US"  # ← ADD THIS

  # Global feature types have NO dcterms:spatial
  MANSION:
    meaning: wd:Q1802963
    # No dcterms:spatial - can be used anywhere
```

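Before moving on, it can help to confirm that the annotations are where you expect them. The following is a minimal sketch, not part of the project's tooling: it assumes the `permissible_values` mapping has already been parsed into plain Python dictionaries (for example by a YAML parser), and the function name `spatial_restrictions` and the sample data are hypothetical. It also assumes the single-code form of `dcterms:spatial` shown above.

```python
import re

# ISO 3166-1 alpha-2 codes are two uppercase letters
ALPHA_2 = re.compile(r"[A-Z]{2}")


def spatial_restrictions(permissible_values: dict) -> dict:
    """Map each country-restricted value name to its dcterms:spatial code."""
    restrictions = {}
    for name, definition in permissible_values.items():
        annotations = (definition or {}).get("annotations") or {}
        code = annotations.get("dcterms:spatial")
        if code is None:
            continue  # no annotation: globally applicable
        if not isinstance(code, str) or not ALPHA_2.fullmatch(code):
            raise ValueError(
                f"{name}: dcterms:spatial must be a two-letter code, got {code!r}"
            )
        restrictions[name] = code
    return restrictions


# Stand-in for the parsed "permissible_values" section of FeatureTypeEnum.yaml
sample = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": {
        "meaning": "wd:Q64960148",
        "annotations": {"wikidata_id": "Q64960148", "dcterms:spatial": "US"},
    },
    "BUITENPLAATS": {
        "meaning": "wd:Q2927789",
        "annotations": {"dcterms:spatial": "NL"},
    },
    "MANSION": {"meaning": "wd:Q1802963"},
}

restricted = spatial_restrictions(sample)
print(restricted)
```

Values without the annotation, such as `MANSION`, are simply absent from the result, which matches the "no annotation means global" convention.
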
### Step 2: Create Validator Script (30 min)

Create `scripts/validate_country_restrictions.py`:

```python
import sys

import yaml
from linkml_runtime.utils.schemaview import SchemaView


def validate_country_restrictions(custodian_place_data: dict, schema_view: SchemaView):
    """Validate feature type country restrictions.

    Returns an error message string, or None if the record is valid.
    """
    # Extract spatial restrictions from enum annotations
    enum_def = schema_view.get_enum("FeatureTypeEnum")
    restrictions = {}
    for pv_name, pv in enum_def.permissible_values.items():
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value

    # Get feature type and country from data
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No restriction if no feature type

    feature_type = feature_place.get("feature_type")
    required_country = restrictions.get(feature_type)

    if not required_country:
        return None  # No restriction for this feature type

    # Check country matches
    country = custodian_place_data.get("country", {})
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country

    if actual_country != required_country:
        return f"❌ ERROR: Feature type '{feature_type}' restricted to '{required_country}', but country is '{actual_country}'"

    return None  # Valid


# Run the test only when executed as a script, so that importing this
# module (Step 3) has no side effects.
if __name__ == "__main__":
    schema = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    if len(sys.argv) > 1:
        # Validate the YAML file given on the command line
        with open(sys.argv[1], encoding="utf-8") as f:
            test_data = yaml.safe_load(f)
    else:
        # Built-in sample that should fail
        test_data = {
            "place_name": "Lima Building",
            "country": {"alpha_2": "PE"},
            "has_feature_type": {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"},
        }
    error = validate_country_restrictions(test_data, schema)
    print(error)  # Prints the error message, or None if the record is valid
```

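The validator above rebuilds the restriction map from the schema on every call. One possible refinement, sketched here under assumptions, is to build that map once and keep the country check as a pure function that is easy to unit-test without loading a schema. The name `check_place`, the `RESTRICTIONS` sample map and the separate "no country is set" message are illustrative, not part of the existing script.

```python
from typing import Optional


def check_place(place: dict, restrictions: dict) -> Optional[str]:
    """Return an error message, or None when the place passes the check.

    `restrictions` maps feature type names to a required alpha-2 code and is
    assumed to have been built once, up front.
    """
    feature = place.get("has_feature_type") or {}
    feature_type = feature.get("feature_type")
    required = restrictions.get(feature_type)
    if required is None:
        return None  # no feature type, or a globally applicable one

    country = place.get("country")
    actual = country.get("alpha_2") if isinstance(country, dict) else country
    if actual is None:
        return (
            f"Feature type '{feature_type}' requires country '{required}', "
            "but no country is set"
        )
    if actual != required:
        return (
            f"Feature type '{feature_type}' restricted to '{required}', "
            f"but country is '{actual}'"
        )
    return None


# Hypothetical restriction map, e.g. extracted from the enum annotations once
RESTRICTIONS = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US",
    "BUITENPLAATS": "NL",
}

ok = check_place(
    {"country": {"alpha_2": "NL"}, "has_feature_type": {"feature_type": "BUITENPLAATS"}},
    RESTRICTIONS,
)
bad = check_place(
    {
        "country": {"alpha_2": "PE"},
        "has_feature_type": {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"},
    },
    RESTRICTIONS,
)
missing = check_place({"has_feature_type": {"feature_type": "BUITENPLAATS"}}, RESTRICTIONS)
print(ok, bad, missing, sep="\n")
```

Separating the two concerns means the schema is read once per run rather than once per record, which matters mainly for large datasets.
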
### Step 3: Integrate Validator (15 min)

Add to data loading pipeline:

```python
# In your data processing script
from validate_country_restrictions import validate_country_restrictions

for custodian_place in data:
    error = validate_country_restrictions(custodian_place, schema_view)
    if error:
        logger.warning(error)
        # Or raise ValidationError(error) to halt processing
```

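If you would rather see every violation before deciding whether to halt, the loop can collect failures instead of acting on the first one. This is a sketch under assumptions: `validate_records`, its `strict` flag and the stub validator are hypothetical names, and the validator is passed in as a callable so the example runs without a schema.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger("country_restrictions")


def validate_records(
    records: Iterable[dict],
    validate: Callable[[dict], Optional[str]],
    strict: bool = False,
) -> list:
    """Run `validate` on every record and collect (index, message) pairs."""
    failures = []
    for index, record in enumerate(records):
        error = validate(record)
        if error:
            logger.warning("record %d: %s", index, error)
            failures.append((index, error))
    if strict and failures:
        # Halt only after all records have been checked
        raise ValueError(f"{len(failures)} record(s) violate country restrictions")
    return failures


# Stub standing in for the real validator from Step 2
def stub_validate(record: dict) -> Optional[str]:
    return None if record.get("country") == "US" else "country mismatch"


failures = validate_records([{"country": "US"}, {"country": "PE"}], stub_validate)
print(failures)
```

With the Step 2 function, the callable could be `lambda rec: validate_country_restrictions(rec, schema_view)`.
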
---

## Quick Test

```bash
# Create test file
cat > test_country_restriction.yaml << EOF
place_name: "Lima Historic Site"
country:
  alpha_2: "PE"
has_feature_type:
  feature_type: CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION  # Should fail
EOF

# Run validator
python scripts/validate_country_restrictions.py test_country_restriction.yaml

# Expected output:
# ❌ ERROR: Feature type 'CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION'
#    restricted to 'US', but country is 'PE'
```

---

## Country-Specific Feature Types to Annotate

**Search for these patterns in FeatureTypeEnum.yaml**:

- `CITY_OF_PITTSBURGH_*` → `dcterms:spatial: "US"`
- `CULTURAL_HERITAGE_OF_PERU` → `dcterms:spatial: "PE"`
- `BUITENPLAATS` → `dcterms:spatial: "NL"`
- `NATIONAL_MEMORIAL_OF_THE_UNITED_STATES` → `dcterms:spatial: "US"`
- Search descriptions for: "United States", "Peru", "Netherlands", "Brazil", etc.

**Regex search**:
```bash
rg "(United States|Peru|Netherlands|Brazil|Mexico|France|Germany|India|China|Japan)" \
  schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml
```

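The same search can be turned into a list of candidate annotations. The sketch below assumes the enum has been parsed into plain dictionaries and that each value may carry a `description`; the function name `suggest_spatial` and the sample descriptions are made up for illustration. Matching is by plain substring, so expect false positives (for example "India" inside "Indiana") and review every suggestion by hand.

```python
# Country names from the search pattern, mapped to ISO 3166-1 alpha-2 codes
COUNTRY_CODES = {
    "United States": "US",
    "Peru": "PE",
    "Netherlands": "NL",
    "Brazil": "BR",
    "Mexico": "MX",
    "France": "FR",
    "Germany": "DE",
    "India": "IN",
    "China": "CN",
    "Japan": "JP",
}


def suggest_spatial(permissible_values: dict) -> dict:
    """Suggest alpha-2 codes for values that mention a country but lack dcterms:spatial."""
    suggestions = {}
    for name, definition in permissible_values.items():
        definition = definition or {}
        if "dcterms:spatial" in (definition.get("annotations") or {}):
            continue  # already annotated
        description = definition.get("description") or ""
        codes = sorted(
            code for country, code in COUNTRY_CODES.items() if country in description
        )
        if codes:
            suggestions[name] = codes
    return suggestions


# Hypothetical descriptions, for illustration only
sample = {
    "NATIONAL_MEMORIAL_OF_THE_UNITED_STATES": {
        "description": "Protected area of the United States",
    },
    "BUITENPLAATS": {
        "description": "Country estate in the Netherlands",
        "annotations": {"dcterms:spatial": "NL"},
    },
    "MANSION": {"description": "Large dwelling house"},
}

suggestions = suggest_spatial(sample)
print(suggestions)
```
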
---

## Why This Approach?

✅ **Ontology-aligned**: Uses the Dublin Core (DCMI Metadata Terms) `dcterms:spatial` property
✅ **Non-invasive**: No schema restructuring needed
✅ **Maintainable**: Add annotation to restrict, remove to unrestrict
✅ **Flexible**: Easy to extend to other restrictions (temporal, etc.)

---

## FAQ

**Q: What if a feature type doesn't have `dcterms:spatial`?**
A: It's globally applicable (can be used in any country).

**Q: Can a feature type apply to multiple countries?**
A: Not with the current design. For multi-country restrictions, use:
```yaml
annotations:
  dcterms:spatial: ["US", "CA"]  # List format
```
Then update the validator to check `if actual_country in required_countries`.

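A minimal sketch of that membership check follows. It accepts either a single code or a list, so existing single-country annotations keep working. The helper names `allowed_countries` and `country_allowed` are hypothetical, and whether your LinkML toolchain preserves a list-valued annotation as a Python list is an assumption worth verifying.

```python
from typing import Optional, Union


def allowed_countries(spatial: Union[str, list, None]) -> Optional[frozenset]:
    """Normalise a dcterms:spatial value to a set of codes (None means unrestricted)."""
    if spatial is None:
        return None
    if isinstance(spatial, str):
        return frozenset({spatial})  # single-country form
    return frozenset(spatial)  # list form


def country_allowed(actual_country: Optional[str], spatial) -> bool:
    """True when the place's country satisfies the feature type's restriction."""
    allowed = allowed_countries(spatial)
    return allowed is None or actual_country in allowed


print(country_allowed("CA", ["US", "CA"]))
print(country_allowed("MX", ["US", "CA"]))
```
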
**Q: What about regions (e.g., "European Union")?**
A: Use ISO 3166-1 alpha-2 codes only. For regional restrictions, list all country codes.

**Q: When is `CustodianPlace.country` required?**
A: Only when `has_feature_type` uses a country-restricted enum value.

---

## Complete Documentation

See `COUNTRY_RESTRICTION_IMPLEMENTATION.md` for:
- Full ontology property analysis
- Alternative approaches considered
- Detailed implementation steps
- Python validator code with tests

---

**Status**: Ready to implement
**Time**: ~1 hour total
**Priority**: Medium (validation enhancement, not blocking)