glam/COUNTRY_RESTRICTION_QUICKSTART.md
kempersc 67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00

Country Restriction Quick Start Guide

Goal: Ensure country-specific feature types (like "City of Pittsburgh historic designation") are only used in the correct country.


TL;DR Solution

  1. Add dcterms:spatial annotations to country-specific feature types in FeatureTypeEnum
  2. Implement a Python validator that checks CustodianPlace.country against the feature type's restriction
  3. Integrate the validator into the data validation pipeline

3-Step Implementation

Step 1: Annotate Country-Specific Feature Types (15 min)

Edit schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml:

permissible_values:
  CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION:
    title: City of Pittsburgh historic designation
    meaning: wd:Q64960148
    annotations:
      wikidata_id: Q64960148
      dcterms:spatial: "US"  # ← ADD THIS
      spatial_note: "Pittsburgh, Pennsylvania, United States"
  
  CULTURAL_HERITAGE_OF_PERU:
    meaning: wd:Q16617058
    annotations:
      dcterms:spatial: "PE"  # ← ADD THIS
  
  BUITENPLAATS:
    meaning: wd:Q2927789
    annotations:
      dcterms:spatial: "NL"  # ← ADD THIS
  
  NATIONAL_MEMORIAL_OF_THE_UNITED_STATES:
    meaning: wd:Q1967454
    annotations:
      dcterms:spatial: "US"  # ← ADD THIS
  
  # Global feature types have NO dcterms:spatial
  MANSION:
    meaning: wd:Q1802963
    # No dcterms:spatial - can be used anywhere
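
To check which values end up restricted after annotating, the permissible values can be reduced to a simple lookup table. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dict (for example with PyYAML). The helper name `spatial_restrictions` and the inline sample are illustrative and not part of the schema tooling.

```python
from typing import Dict, Optional

# Sample mirroring part of the annotated excerpt above, as a parsed-YAML dict
# (assumed shape; the real file has many more entries).
PERMISSIBLE_VALUES = {
    "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": {
        "meaning": "wd:Q64960148",
        "annotations": {"wikidata_id": "Q64960148", "dcterms:spatial": "US"},
    },
    "BUITENPLAATS": {
        "meaning": "wd:Q2927789",
        "annotations": {"dcterms:spatial": "NL"},
    },
    "MANSION": {"meaning": "wd:Q1802963"},  # global: no annotation
}


def spatial_restrictions(permissible_values: Dict[str, Optional[dict]]) -> Dict[str, str]:
    """Map each restricted value name to its dcterms:spatial country code."""
    restrictions = {}
    for name, definition in permissible_values.items():
        annotations = (definition or {}).get("annotations") or {}
        if "dcterms:spatial" in annotations:
            restrictions[name] = annotations["dcterms:spatial"]
    return restrictions


restricted = spatial_restrictions(PERMISSIBLE_VALUES)
```

Values without the annotation, such as MANSION, never appear in the table, which is what makes them globally usable.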

Step 2: Create Validator Script (30 min)

Create scripts/validate_country_restrictions.py:

from linkml_runtime.utils.schemaview import SchemaView

def validate_country_restrictions(custodian_place_data: dict, schema_view: SchemaView):
    """Validate feature type country restrictions."""
    
    # Extract spatial restrictions from enum annotations
    enum_def = schema_view.get_enum("FeatureTypeEnum")
    restrictions = {}
    for pv_name, pv in enum_def.permissible_values.items():
        if pv.annotations and "dcterms:spatial" in pv.annotations:
            restrictions[pv_name] = pv.annotations["dcterms:spatial"].value
    
    # Get feature type and country from data
    feature_place = custodian_place_data.get("has_feature_type")
    if not feature_place:
        return None  # No restriction if no feature type
    
    feature_type = feature_place.get("feature_type")
    required_country = restrictions.get(feature_type)
    
    if not required_country:
        return None  # No restriction for this feature type
    
    # Check country matches
    country = custodian_place_data.get("country", {})
    actual_country = country.get("alpha_2") if isinstance(country, dict) else country
    
    if actual_country != required_country:
        return f"❌ ERROR: Feature type '{feature_type}' restricted to '{required_country}', but country is '{actual_country}'"
    
    return None  # Valid

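# Test: run as a script, optionally passing a YAML file to validate.
# The guard keeps this from running when the module is imported.
if __name__ == "__main__":
    import sys
    import yaml  # PyYAML

    schema = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            test_data = yaml.safe_load(f)
    else:
        test_data = {
            "place_name": "Lima Building",
            "country": {"alpha_2": "PE"},
            "has_feature_type": {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"}
        }
    error = validate_country_restrictions(test_data, schema)
    print(error)  # Should print error message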
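
The validator above rebuilds the restriction table from the schema on every call. One possible refinement, sketched here under the assumption that the table is built once and passed in, is to isolate the comparison in a pure function that can be tested without loading a schema. The name `check_country` and the `RESTRICTIONS` table are hypothetical.

```python
from typing import Dict, Optional


def check_country(record: dict, restrictions: Dict[str, str]) -> Optional[str]:
    """Return an error message, or None if the record satisfies its restriction."""
    feature_place = record.get("has_feature_type") or {}
    feature_type = feature_place.get("feature_type")
    required = restrictions.get(feature_type)
    if not required:
        return None  # no feature type, or feature type is unrestricted
    country = record.get("country")
    # Accept both {"alpha_2": "US"} and a bare "US" string.
    actual = country.get("alpha_2") if isinstance(country, dict) else country
    if actual != required:
        return (f"Feature type '{feature_type}' restricted to '{required}', "
                f"but country is '{actual}'")
    return None


# Hypothetical lookup table; in practice it would be derived from the enum annotations.
RESTRICTIONS = {"CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION": "US"}

lima = {"country": {"alpha_2": "PE"},
        "has_feature_type": {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"}}
pittsburgh = {"country": "US",
              "has_feature_type": {"feature_type": "CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION"}}
```

With this split, `check_country(lima, RESTRICTIONS)` yields a message and `check_country(pittsburgh, RESTRICTIONS)` yields None, so the rule itself can be covered by plain unit tests.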

Step 3: Integrate Validator (15 min)

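Add the validator to the data loading pipeline:

# In your data processing script
import logging

from linkml_runtime.utils.schemaview import SchemaView
from validate_country_restrictions import validate_country_restrictions

logger = logging.getLogger(__name__)
schema_view = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")

for custodian_place in data:  # data: iterable of CustodianPlace dicts
    error = validate_country_restrictions(custodian_place, schema_view)
    if error:
        logger.warning(error)
        # Or raise ValidationError(error) to halt processing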
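
When a pipeline should report every violation rather than stop at the first, the loop can collect failures and hand them back to the caller. This is a rough sketch: `collect_errors` and the stand-in `fake_validate` are hypothetical names, and the real validator would be wrapped in place of the stand-in.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def collect_errors(records: Iterable[dict],
                   validate: Callable[[dict], Optional[str]]) -> List[Tuple[int, str]]:
    """Run `validate` over every record; return (index, message) for each failure."""
    failures = []
    for index, record in enumerate(records):
        message = validate(record)
        if message:
            failures.append((index, message))
    return failures


# Stand-in validator for illustration only; a real pipeline would wrap
# validate_country_restrictions(record, schema_view) here.
def fake_validate(record: dict) -> Optional[str]:
    return None if record.get("country") == "US" else "country mismatch"


failures = collect_errors([{"country": "US"}, {"country": "PE"}], fake_validate)
```

Returning the record index alongside the message makes it easier to locate the offending entry in the source data.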

Quick Test

# Create test file
cat > test_country_restriction.yaml << EOF
place_name: "Lima Historic Site"
country:
  alpha_2: "PE"
has_feature_type:
  feature_type: CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION  # Should fail
EOF

# Run validator
python scripts/validate_country_restrictions.py test_country_restriction.yaml

# Expected output:
# ❌ ERROR: Feature type 'CITY_OF_PITTSBURGH_HISTORIC_DESIGNATION' restricted to 'US', but country is 'PE'

Country-Specific Feature Types to Annotate

Search for these patterns in FeatureTypeEnum.yaml:

  • CITY_OF_PITTSBURGH_* → dcterms:spatial: "US"
  • CULTURAL_HERITAGE_OF_PERU → dcterms:spatial: "PE"
  • BUITENPLAATS → dcterms:spatial: "NL"
  • NATIONAL_MEMORIAL_OF_THE_UNITED_STATES → dcterms:spatial: "US"
  • Search descriptions for: "United States", "Peru", "Netherlands", "Brazil", etc.

Regex search:

rg "(United States|Peru|Netherlands|Brazil|Mexico|France|Germany|India|China|Japan)" \
   schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml
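
The same search can be done in Python when a suggested code per match is useful. The sketch below assumes the descriptions have already been extracted into a dict keyed by value name. The function name `suggest_codes` is hypothetical, and the name-to-code table covers only the countries in the search pattern above.

```python
import re
from typing import Dict, List

# ISO 3166-1 alpha-2 codes for the country names used in the search pattern.
COUNTRY_CODES = {
    "United States": "US", "Peru": "PE", "Netherlands": "NL", "Brazil": "BR",
    "Mexico": "MX", "France": "FR", "Germany": "DE", "India": "IN",
    "China": "CN", "Japan": "JP",
}
PATTERN = re.compile("|".join(re.escape(name) for name in COUNTRY_CODES))


def suggest_codes(descriptions: Dict[str, str]) -> Dict[str, List[str]]:
    """Return candidate country codes for each description that names a country."""
    suggestions = {}
    for value_name, text in descriptions.items():
        codes = sorted({COUNTRY_CODES[match] for match in PATTERN.findall(text)})
        if codes:
            suggestions[value_name] = codes
    return suggestions
```

Matches are candidates for manual review only: a plain substring pattern like this one also fires on "Indiana" (via "India"), and a description can mention a country without the feature type being restricted to it.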

Why This Approach?

Ontology-aligned: Uses the Dublin Core (DCMI Metadata Terms) dcterms:spatial property
Non-invasive: No schema restructuring needed
Maintainable: Add annotation to restrict, remove to unrestrict
Flexible: Easy to extend to other restrictions (temporal, etc.)


FAQ

Q: What if a feature type doesn't have dcterms:spatial?
A: It's globally applicable (can be used in any country).

Q: Can a feature type apply to multiple countries?
A: Not with the current design. For multi-country restrictions, use:

annotations:
  dcterms:spatial: ["US", "CA"]  # List format

Then update the validator to check whether actual_country is in required_countries.
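
One way the membership check could look is sketched below. It assumes the annotation value arrives as either a single string or a list of strings; how LinkML surfaces a list-valued annotation is not confirmed here. The helper names `allowed_countries` and `country_allowed` are hypothetical.

```python
from typing import Iterable, Optional, Set, Union


def allowed_countries(spatial: Union[str, Iterable[str], None]) -> Optional[Set[str]]:
    """Normalize a dcterms:spatial value to a set of codes; None means unrestricted."""
    if spatial is None:
        return None
    if isinstance(spatial, str):
        return {spatial}  # single-country form, e.g. "US"
    return set(spatial)   # list form, e.g. ["US", "CA"]


def country_allowed(actual_country: Optional[str],
                    spatial: Union[str, Iterable[str], None]) -> bool:
    """True if the feature type is unrestricted or the country is in the allowed set."""
    required_countries = allowed_countries(spatial)
    return required_countries is None or actual_country in required_countries
```

Normalizing to a set keeps the existing single-string annotations valid, so values annotated before the change need no edits.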

Q: What about regions (e.g., "European Union")?
A: Use ISO 3166-1 alpha-2 codes only. For regional restrictions, list all country codes.
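
Listing every member code by hand is easy to get wrong, so a small authoring-time helper could expand named groups into alpha-2 codes before they are written into the annotation. This is only a sketch: `REGION_GROUPS` and `expand_spatial` are hypothetical, and only Benelux is filled in for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical group table; only Benelux is filled in here for illustration.
REGION_GROUPS: Dict[str, List[str]] = {
    "BENELUX": ["BE", "NL", "LU"],
}


def expand_spatial(entries: Iterable[str]) -> List[str]:
    """Replace group names with their member codes; pass alpha-2 codes through."""
    codes: List[str] = []
    for entry in entries:
        for code in REGION_GROUPS.get(entry, [entry]):
            if code not in codes:  # keep first-seen order, drop duplicates
                codes.append(code)
    return codes


expanded = expand_spatial(["BENELUX", "NL", "DE"])
```

The schema itself would still store only the expanded alpha-2 codes, in line with the answer above.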

Q: When is CustodianPlace.country required?
A: Only when has_feature_type uses a country-restricted enum value.


Complete Documentation

See COUNTRY_RESTRICTION_IMPLEMENTATION.md for:

  • Full ontology property analysis
  • Alternative approaches considered
  • Detailed implementation steps
  • Python validator code with tests

Status: Ready to implement
Time: ~1 hour total
Priority: Medium (validation enhancement, not blocking)