- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
Country Restriction Quick Start Guide
Goal: Ensure country-specific feature types (like "City of Pittsburgh historic designation") are only used in the correct country.
TL;DR Solution
- Add dcterms:spatial annotations to country-specific feature types in FeatureTypeEnum
- Implement Python validator to check CustodianPlace.country matches feature type restriction
- Integrate validator into data validation pipeline
3-Step Implementation
Step 1: Annotate Country-Specific Feature Types (15 min)
Edit schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml:
permissible_values:
  CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
    title: City of Pittsburgh historic designation
    meaning: wd:Q64960148
    annotations:
      wikidata_id: Q64960148
      dcterms:spatial: "US"  # ← ADD THIS
      spatial_note: "Pittsburgh, Pennsylvania, United States"
  CULTURAL_HERITAGE_OF_PERU:
    meaning: wd:Q16617058
    annotations:
      dcterms:spatial: "PE"  # ← ADD THIS
  BUITENPLAATS:
    meaning: wd:Q2927789
    annotations:
      dcterms:spatial: "NL"  # ← ADD THIS
  NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
    meaning: wd:Q1967454
    annotations:
      dcterms:spatial: "US"  # ← ADD THIS
  # Global feature types have NO dcterms:spatial
  MANSION:
    meaning: wd:Q1802963
    # No dcterms:spatial - can be used anywhere
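Once the annotations are in place, the restricted values can be collected into a lookup table. The following is a minimal sketch, assuming the enum has already been parsed into plain dictionaries with the same shape as the YAML shown in this step. The function name build_restrictions and the inline sample data are illustrative and not part of the project.

```python
def build_restrictions(permissible_values: dict) -> dict:
    """Map each country-restricted feature type to its dcterms:spatial code."""
    restrictions = {}
    for name, definition in permissible_values.items():
        # A permissible value may be declared with no body at all (None).
        annotations = (definition or {}).get("annotations") or {}
        code = annotations.get("dcterms:spatial")
        if code:
            restrictions[name] = code
    return restrictions


# Hypothetical in-memory form of a few annotated enum values.
sample = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": {
        "meaning": "wd:Q64960148",
        "annotations": {"wikidata_id": "Q64960148", "dcterms:spatial": "US"},
    },
    "BUITENPLAATS": {
        "meaning": "wd:Q2927789",
        "annotations": {"dcterms:spatial": "NL"},
    },
    "MANSION": {"meaning": "wd:Q1802963"},  # global: no dcterms:spatial
}

restrictions = build_restrictions(sample)
print(restrictions)
```

Values without the annotation never enter the table, which matches the rule that unannotated feature types can be used anywhere.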
Step 2: Create Validator Script (30 min)
Create scripts/validate_country_restrictions.py:
import sys

import yaml
from linkml_runtime.utils.schemaview import SchemaView


def validate_country_restrictions(custodian_place_data: dict, schema_view: SchemaView):
    """Validate feature type country restrictions.

    Returns an error message string, or None if the record is valid.
    """
    # Extract spatial restrictions from enum annotations
    enum_def = schema_view.get_enum("FeatureTypeEnum")
    restrictions = {}
    for pv_name, pv in enum_def.permissible_values.items():
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value

    # Get feature type and country from data
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No restriction if no feature type
    feature_type = feature_place.get("feature_type")
    required_country = restrictions.get(feature_type)
    if not required_country:
        return None  # No restriction for this feature type

    # Check country matches
    country = custodian_place_data.get("country", {})
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country
    if actual_country != required_country:
        return f"❌ ERROR: Feature type '{feature_type}' restricted to '{required_country}', but country is '{actual_country}'"
    return None  # Valid


if __name__ == "__main__":
    # Test: runs only when executed as a script, not when imported (Step 3)
    schema = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    if len(sys.argv) > 1:
        # Validate a YAML file passed on the command line (see Quick Test)
        with open(sys.argv[1], encoding="utf-8") as f:
            test_data = yaml.safe_load(f)
    else:
        test_data = {
            "place_name": "Lima Building",
            "country": {"alpha_2": "PE"},
            "has_feature_type": {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"},
        }
    error = validate_country_restrictions(test_data, schema)
    print(error)  # Should print error message
Step 3: Integrate Validator (15 min)
Add to data loading pipeline:
# In your data processing script
from validate_country_restrictions import validate_country_restrictions

for custodian_place in data:
    error = validate_country_restrictions(custodian_place, schema_view)
    if error:
        logger.warning(error)
        # Or raise ValidationError(error) to halt processing
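The two integration modes mentioned in the comment (log and continue, or halt) can be wrapped in one helper. This is a sketch under assumptions: the names run_validation, strict and demo_check are hypothetical, and demo_check is a stand-in with a hard-coded restriction table. In the real pipeline the check would be validate_country_restrictions bound to a loaded schema view.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger("country_restrictions")


def run_validation(
    records: Iterable[dict],
    check: Callable[[dict], Optional[str]],
    strict: bool = False,
) -> list:
    """Apply check to every record and return the collected error messages.

    With strict=True the first error raises ValueError instead of being logged.
    """
    errors = []
    for record in records:
        error = check(record)
        if error is None:
            continue
        if strict:
            raise ValueError(error)
        logger.warning(error)
        errors.append(error)
    return errors


def demo_check(record: dict) -> Optional[str]:
    """Stand-in for the real validator (hypothetical restriction table)."""
    feature_type = (record.get("has_feature_type") or {}).get("feature_type")
    required = {"BUITENPLAATS": "NL"}.get(feature_type)
    actual = (record.get("country") or {}).get("alpha_2")
    if required and actual != required:
        return f"'{feature_type}' restricted to {required!r}, but country is {actual!r}"
    return None


records = [
    {"country": {"alpha_2": "NL"}, "has_feature_type": {"feature_type": "BUITENPLAATS"}},
    {"country": {"alpha_2": "PE"}, "has_feature_type": {"feature_type": "BUITENPLAATS"}},
]
errors = run_validation(records, demo_check)
print(len(errors))
```

Returning the list of messages lets a caller decide after the loop whether the batch as a whole is acceptable, while strict=True suits loads that must not proceed with invalid records.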
Quick Test
# Create test file
cat > test_country_restriction.yaml << EOF
place_name: "Lima Historic Site"
country:
  alpha_2: "PE"
has_feature_type:
  feature_type: CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION  # Should fail
EOF
# Run validator
python scripts/validate_country_restrictions.py test_country_restriction.yaml
# Expected output:
# ❌ ERROR: Feature type 'CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION'
# restricted to 'US', but country is 'PE'
Country-Specific Feature Types to Annotate
Search for these patterns in FeatureTypeEnum.yaml:
- CITY_OF_PITTSBURGH_* → dcterms:spatial: "US"
- CULTURAL_HERITAGE_OF_PERU → dcterms:spatial: "PE"
- BUITENPLAATS → dcterms:spatial: "NL"
- NATIONAL_MEMORIAL_OF_THE_UNITED_STATES → dcterms:spatial: "US"
- Search descriptions for: "United States", "Peru", "Netherlands", "Brazil", etc.
Regex search:
rg "(United States|Peru|Netherlands|Brazil|Mexico|France|Germany|India|China|Japan)" \
schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml
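The same search can be run over parsed enum data so that values already carrying dcterms:spatial are skipped. A rough sketch, assuming each permissible value is a plain dictionary with optional title, description and annotations keys; the function name and the sample descriptions are made up for illustration.

```python
import re

# Same country names as the rg pattern; extend as needed.
COUNTRY_PATTERN = re.compile(
    r"United States|Peru|Netherlands|Brazil|Mexico|France|Germany|India|China|Japan"
)


def find_unannotated_candidates(permissible_values: dict) -> list:
    """Return names that mention a country in title/description but lack dcterms:spatial."""
    candidates = []
    for name, definition in permissible_values.items():
        definition = definition or {}
        annotations = definition.get("annotations") or {}
        if "dcterms:spatial" in annotations:
            continue  # already restricted
        text = " ".join(str(definition.get(key, "")) for key in ("title", "description"))
        if COUNTRY_PATTERN.search(text):
            candidates.append(name)
    return candidates


# Invented sample descriptions, for demonstration only.
sample = {
    "NATIONAL_MEMORIAL_OF_THE_UNITED_STATES": {
        "description": "Protected area designation in the United States",
    },
    "BUITENPLAATS": {
        "description": "Country estate in the Netherlands",
        "annotations": {"dcterms:spatial": "NL"},
    },
    "MANSION": {"description": "Large dwelling house"},
}
print(find_unannotated_candidates(sample))
```

Matches are only candidates and need manual review: a plain substring pattern such as "India" also matches "Indiana", and a description can mention a country without the feature type being restricted to it.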
Why This Approach?
✅ Ontology-aligned: Uses the Dublin Core (DCMI Metadata Terms) dcterms:spatial property
✅ Non-invasive: No schema restructuring needed
✅ Maintainable: Add annotation to restrict, remove to unrestrict
✅ Flexible: Easy to extend to other restrictions (temporal, etc.)
FAQ
Q: What if a feature type doesn't have dcterms:spatial?
A: It's globally applicable (can be used in any country).
Q: Can a feature type apply to multiple countries?
A: Not with the current design. For multi-country restrictions, use:
annotations:
  dcterms:spatial: ["US", "CA"]  # List format
Then update the validator to check that actual_country is in required_countries.
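The membership check could look like the sketch below. It only covers the comparison step and accepts either a single code or a list. The helper names are hypothetical, and how linkml_runtime exposes a list-valued annotation is not confirmed here.

```python
def allowed_countries(spatial_value) -> frozenset:
    """Normalize a dcterms:spatial value (one code or a list of codes) to a set."""
    if spatial_value is None:
        return frozenset()  # no restriction
    if isinstance(spatial_value, str):
        return frozenset({spatial_value})
    return frozenset(spatial_value)


def country_allowed(actual_country, spatial_value) -> bool:
    """True if the country satisfies the restriction (or there is none)."""
    required_countries = allowed_countries(spatial_value)
    # An empty set means the feature type is globally applicable.
    return not required_countries or actual_country in required_countries


print(country_allowed("CA", ["US", "CA"]))
print(country_allowed("PE", "US"))
```

Normalizing to a set keeps existing single-code annotations valid, so the enum does not have to be migrated in one pass.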
Q: What about regions (e.g., "European Union")?
A: Use ISO 3166-1 alpha-2 codes only. For regional restrictions, list all country codes.
Q: When is CustodianPlace.country required?
A: Only when has_feature_type uses a country-restricted enum value.
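A missing country can be reported separately from a mismatched one. The sketch below assumes a restrictions table mapping feature type names to country codes; the function name and the sample table are hypothetical.

```python
from typing import Optional


def check_country_present(record: dict, restrictions: dict) -> Optional[str]:
    """Return an error if a restricted feature type is used without a country."""
    feature_type = (record.get("has_feature_type") or {}).get("feature_type")
    if feature_type not in restrictions:
        return None  # unrestricted (or absent) feature type: country is optional
    country = record.get("country")
    code = country.get("alpha_2") if isinstance(country, dict) else country
    if not code:
        return (
            f"Feature type '{feature_type}' requires a country "
            f"(expected '{restrictions[feature_type]}')"
        )
    return None


restrictions = {"BUITENPLAATS": "NL"}  # hypothetical table
print(check_country_present({"has_feature_type": {"feature_type": "BUITENPLAATS"}}, restrictions))
print(check_country_present({"has_feature_type": {"feature_type": "MANSION"}}, restrictions))
```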
Complete Documentation
See COUNTRY_RESTRICTION_IMPLEMENTATION.md for:
- Full ontology property analysis
- Alternative approaches considered
- Detailed implementation steps
- Python validator code with tests
Status: Ready to implement
Time: ~1 hour total
Priority: Medium (validation enhancement, not blocking)