glam/COUNTRY_CLASS_IMPLEMENTATION_COMPLETE.md
kempersc 67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00

Country Class Implementation - Complete

Date: 2025-11-22
Session: Continuation of FeaturePlace implementation


Summary

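Created a minimal Country class (ISO 3166-1 alpha-2 and alpha-3 codes only) and integrated it into CustodianPlace and LegalForm to support country-specific feature types and legal forms. Duplicate keys in FeatureTypeEnum.yaml were removed along the way.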

What Was Completed

1. Fixed Duplicate Keys in FeatureTypeEnum.yaml

  • Problem: YAML had 2 duplicate keys (RESIDENTIAL_BUILDING, SANCTUARY)
  • Resolution:
    • Removed duplicate RESIDENTIAL_BUILDING (Q11755880) at lines 3692-3714
    • Kept Catholic-specific SANCTUARY (Q21850178 "shrine of the Catholic Church")
    • Removed generic SANCTUARY (Q29553 "sacred place")
  • Result: 294 unique feature types (down from 296 with duplicates)
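
A duplicate-key check like the one described above can be sketched with the standard library alone. This is a minimal sketch, not the script used in this session: the function name `find_duplicate_keys`, the `indent` parameter, and the sample layout are assumptions, and it only compares keys that sit at one indentation level and have no inline value.

```python
import re
from collections import Counter
from typing import List


def find_duplicate_keys(yaml_text: str, indent: int) -> List[str]:
    """Return keys that appear more than once at exactly `indent` leading spaces."""
    # Matches lines such as "      SANCTUARY:" (a key with no inline value).
    key_pattern = re.compile(rf"^ {{{indent}}}([A-Za-z_][A-Za-z0-9_]*):\s*$")
    counts: Counter = Counter()
    for line in yaml_text.splitlines():
        match = key_pattern.match(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical excerpt; the real file layout and indentation may differ.
SAMPLE = """\
enums:
  FeatureTypeEnum:
    permissible_values:
      RESIDENTIAL_BUILDING:
        meaning: wd:Q11755880
      SANCTUARY:
        meaning: wd:Q21850178
      SANCTUARY:
        meaning: wd:Q29553
"""

duplicates = find_duplicate_keys(SAMPLE, indent=6)
print(duplicates)
```

A plain-text scan is used here because many YAML loaders silently keep the last value for a repeated key, which hides exactly the problem being checked.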

2. Created Country Class

File: schemas/20251121/linkml/modules/classes/Country.yaml

Design Philosophy:

  • Minimal design: ONLY ISO 3166-1 alpha-2 and alpha-3 codes
  • No other metadata: No country names, languages, capitals, regions
  • Rationale: ISO codes are authoritative, stable, language-neutral identifiers
  • All other country metadata should be resolved via external services (GeoNames, UN M49)

Schema:

Country:
  description: Country identified by ISO 3166-1 codes
  slots:
    - alpha_2  # ISO 3166-1 alpha-2 (2-letter: "NL", "PE", "US")
    - alpha_3  # ISO 3166-1 alpha-3 (3-letter: "NLD", "PER", "USA")
  
  slot_usage:
    alpha_2:
      required: true
      pattern: "^[A-Z]{2}$"
      slot_uri: schema:addressCountry
    
    alpha_3:
      required: true
      pattern: "^[A-Z]{3}$"

Examples:

  • Netherlands: alpha_2="NL", alpha_3="NLD"
  • Peru: alpha_2="PE", alpha_3="PER"
  • United States: alpha_2="US", alpha_3="USA"
  • Japan: alpha_2="JP", alpha_3="JPN"
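
The two patterns in the schema can be mirrored in application code. The following is a minimal sketch, assuming a hand-written `Country` dataclass rather than classes generated from the LinkML schema. It checks only the shape of each code, not whether the code is actually assigned in ISO 3166-1.

```python
import re
from dataclasses import dataclass

# Same patterns as in Country.yaml.
ALPHA_2_PATTERN = re.compile(r"[A-Z]{2}")
ALPHA_3_PATTERN = re.compile(r"[A-Z]{3}")


@dataclass(frozen=True)
class Country:
    """Hand-written stand-in for the Country class (illustrative only)."""

    alpha_2: str
    alpha_3: str

    def __post_init__(self) -> None:
        # fullmatch avoids accepting a trailing newline, which "$" would allow.
        if not ALPHA_2_PATTERN.fullmatch(self.alpha_2):
            raise ValueError(f"alpha_2 must be two uppercase letters: {self.alpha_2!r}")
        if not ALPHA_3_PATTERN.fullmatch(self.alpha_3):
            raise ValueError(f"alpha_3 must be three uppercase letters: {self.alpha_3!r}")


netherlands = Country(alpha_2="NL", alpha_3="NLD")
```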

3. Integrated Country into CustodianPlace

File: schemas/20251121/linkml/modules/classes/CustodianPlace.yaml

Changes:

  • Added country slot (optional, range: Country)
  • Links place to its country location
  • Enables country-specific feature type validation

Use Cases:

  • Disambiguate places across countries ("Victoria Museum" exists in multiple countries)
  • Enable country-conditional feature types (e.g., "cultural heritage of Peru")
  • Generate country-specific enum values

Example:

CustodianPlace:
  place_name: "Machu Picchu"
  place_language: "es"
  country:
    alpha_2: "PE"
    alpha_3: "PER"
  has_feature_type:
    feature_type: CULTURAL_HERITAGE_OF_PERU  # Only valid for PE!
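
The disambiguation use case can be illustrated by keying places on name plus country code. This is a hedged sketch: `place_key` and `index_places` are hypothetical helpers, places are plain dicts rather than schema instances, and "XA"/"XB" are placeholder codes from the ISO 3166-1 user-assigned range, not real countries.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PlaceKey = Tuple[str, Optional[str]]


def place_key(place_name: str, alpha_2: Optional[str]) -> PlaceKey:
    """Build a lookup key from a case-folded name and an optional alpha-2 code."""
    return (place_name.strip().casefold(), alpha_2)


def index_places(places: List[dict]) -> Dict[PlaceKey, List[dict]]:
    """Group places so that same-named places in different countries stay apart."""
    index: Dict[PlaceKey, List[dict]] = defaultdict(list)
    for place in places:
        country = place.get("country") or {}  # country is optional on CustodianPlace
        index[place_key(place["place_name"], country.get("alpha_2"))].append(place)
    return dict(index)


places = [
    {"place_name": "Victoria Museum", "country": {"alpha_2": "XA", "alpha_3": "XAA"}},
    {"place_name": "Victoria Museum", "country": {"alpha_2": "XB", "alpha_3": "XBB"}},
    {"place_name": "Victoria Museum"},  # no country recorded
]
index = index_places(places)
```

Places without a country end up under a key whose second element is `None`, which makes the records that still need a country easy to find.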

4. Integrated Country into LegalForm

File: schemas/20251121/linkml/modules/classes/LegalForm.yaml

Changes:

  • Updated country_code from string pattern to Country class reference
  • Enforces jurisdiction-specific legal forms via ontology links

Rationale:

  • Legal forms are jurisdiction-specific
  • A "Stichting" in the Netherlands is not equivalent to a "Fundación" in Spain (the legal meaning differs)
  • Country class provides canonical ISO codes for legal jurisdictions

Before (string):

country_code:
  range: string
  pattern: "^[A-Z]{2}$"

After (Country class):

country_code:
  range: Country
  required: true

Example:

LegalForm:
  elf_code: "8888"
  country_code:
    alpha_2: "NL"
    alpha_3: "NLD"
  local_name: "Stichting"
  abbreviation: "Stg."
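
Existing records that still carry the old string form need to be converted to the nested form. The sketch below shows one way this could look, assuming records are plain dicts. `migrate_country_code` and the small `ALPHA_2_TO_ALPHA_3` table are hypothetical and cover only the codes used in this document; a real migration would need the full ISO 3166-1 list.

```python
from typing import Any, Dict

# Hypothetical lookup limited to the countries mentioned in this document.
ALPHA_2_TO_ALPHA_3: Dict[str, str] = {"NL": "NLD", "PE": "PER", "US": "USA", "JP": "JPN"}


def migrate_country_code(legal_form: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy whose string country_code is expanded to a Country mapping."""
    migrated = dict(legal_form)
    value = legal_form.get("country_code")
    if not isinstance(value, str):
        return migrated  # already migrated, or no country_code present
    if value not in ALPHA_2_TO_ALPHA_3:
        raise KeyError(f"No alpha-3 code known for {value!r}")
    migrated["country_code"] = {"alpha_2": value, "alpha_3": ALPHA_2_TO_ALPHA_3[value]}
    return migrated


old_record = {"elf_code": "8888", "country_code": "NL", "local_name": "Stichting"}
new_record = migrate_country_code(old_record)
```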

5. Created Example Instance File

File: schemas/20251121/examples/country_integration_example.yaml

Shows:

  • Country instances (NL, PE, US)
  • CustodianPlace with country linking
  • LegalForm with country linking
  • Country-specific feature types (CULTURAL_HERITAGE_OF_PERU, BUITENPLAATS)

Design Decisions

Why Minimal Country Class?

Excluded Metadata:

  • Country names (language-dependent: "Netherlands" vs "Pays-Bas" vs "荷兰")
  • Capital cities (change over time: Myanmar moved its capital to Naypyidaw in 2005-2006)
  • Languages (multilingual countries: Belgium has 3 official languages)
  • Regions/continents (political: Is Turkey in Europe or Asia?)
  • Currency (changes: Eurozone adoption)
  • Phone codes (technical, not heritage-relevant)

Rationale:

  1. Language neutrality: ISO codes work across all languages
  2. Temporal stability: Country names and capitals change; ISO codes change far less often, and any changes are published by the ISO 3166 Maintenance Agency
  3. Separation of concerns: Heritage ontology shouldn't duplicate geopolitical databases
  4. External resolution: Use GeoNames, UN M49, or ISO 3166 Maintenance Agency for metadata

External Services:

  • GeoNames API: Country names in 20+ languages, capitals, regions
  • UN M49: Standard country codes and regions
  • ISO 3166 Maintenance Agency: Official ISO code updates

Why Both Alpha-2 and Alpha-3?

Alpha-2 (2-letter):

  • Used by: Internet ccTLDs (.nl, .pe, .us), Schema.org addressCountry
  • Compact, widely recognized
  • Primary for web applications

Alpha-3 (3-letter):

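  • Used by: United Nations statistics (listed alongside the M49 numeric codes) and machine-readable travel documents (ICAO Doc 9303, with a few deviations). IOC country codes and ISO 4217 currency codes are separate code lists: IOC codes often differ from alpha-3 (e.g., NED vs NLD), and ISO 4217 codes build on alpha-2.
  • Less ambiguous: two-letter codes can collide with other short tokens (e.g., "AT" for Austria vs "at" standing in for the @ sign in some systems)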
  • Primary for international standards

Both are required to ensure interoperability across different systems.
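
Requiring both codes means a record can carry a mismatched pair (for example "NL" with "PER") that still satisfies both regex patterns. A minimal consistency check could look like the sketch below. `KNOWN_PAIRS` and `codes_agree` are illustrative names, and the table lists only the examples from this document.

```python
from typing import Dict

# Illustrative subset; a real check would load all ISO 3166-1 entries.
KNOWN_PAIRS: Dict[str, str] = {"NL": "NLD", "PE": "PER", "US": "USA", "JP": "JPN"}


def codes_agree(alpha_2: str, alpha_3: str) -> bool:
    """Return True only if alpha_2 is known and maps to the given alpha_3."""
    return KNOWN_PAIRS.get(alpha_2) == alpha_3


matched = codes_agree("NL", "NLD")
mismatched = codes_agree("NL", "PER")
```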


Country-Specific Feature Types

Problem: Some feature types only apply to specific countries

Examples from FeatureTypeEnum.yaml:

  1. CULTURAL_HERITAGE_OF_PERU (Q16617058)

    • Description: "cultural heritage of Peru"
    • Only valid for: country.alpha_2 = "PE"
    • Hypernym: cultural heritage
  2. BUITENPLAATS (Q2927789)

    • Description: "summer residence for rich townspeople in the Netherlands"
    • Only valid for: country.alpha_2 = "NL"
    • Hypernym: heritage site
  3. NATIONAL_MEMORIAL_OF_THE_UNITED_STATES (Q20010800)

    • Description: "national memorial in the United States"
    • Only valid for: country.alpha_2 = "US"
    • Hypernym: heritage site

Solution: Country-Conditional Enum Values

Implementation Strategy:

When validating CustodianPlace.has_feature_type:

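# country is optional on CustodianPlace, so guard against a missing value
place_country = custodian_place.country.alpha_2 if custodian_place.country else None

# Raise instead of assert: assertions are skipped when Python runs with -O
if feature_type == "CULTURAL_HERITAGE_OF_PERU" and place_country != "PE":
    raise ValueError("CULTURAL_HERITAGE_OF_PERU only valid for Peru")

if feature_type == "BUITENPLAATS" and place_country != "NL":
    raise ValueError("BUITENPLAATS only valid for Netherlands")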

LinkML Implementation (future enhancement):

FeatureTypeEnum:
  permissible_values:
    CULTURAL_HERITAGE_OF_PERU:
      meaning: wd:Q16617058
      annotations:
        country_restriction: "PE"  # Only valid for Peru
    
    BUITENPLAATS:
      meaning: wd:Q2927789
      annotations:
        country_restriction: "NL"  # Only valid for Netherlands
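
Once the restrictions live in annotations, the per-type checks can become table-driven. The sketch below assumes the annotations have already been read into a plain dict. `COUNTRY_RESTRICTIONS` and `check_feature_type` are hypothetical names, and nothing here is generated by LinkML.

```python
from typing import Dict, Optional

# In-memory mirror of the country_restriction annotations (assumed to be loaded elsewhere).
COUNTRY_RESTRICTIONS: Dict[str, str] = {
    "CULTURAL_HERITAGE_OF_PERU": "PE",
    "BUITENPLAATS": "NL",
}


def check_feature_type(feature_type: str, place_alpha_2: Optional[str]) -> Optional[str]:
    """Return an error message for an invalid combination, or None if it is valid."""
    required = COUNTRY_RESTRICTIONS.get(feature_type)
    if required is None:
        return None  # universal feature type, valid everywhere
    if place_alpha_2 is None:
        return f"{feature_type} requires country {required}, but the place has no country"
    if place_alpha_2 != required:
        return f"{feature_type} is only valid for {required}, not {place_alpha_2}"
    return None


ok = check_feature_type("BUITENPLAATS", "NL")
error = check_feature_type("CULTURAL_HERITAGE_OF_PERU", "NL")
```

Returning a message instead of raising lets a caller collect all problems in a dataset before reporting them.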

Files Modified

Created:

  1. schemas/20251121/linkml/modules/classes/Country.yaml (new)
  2. schemas/20251121/examples/country_integration_example.yaml (new)

Modified:

  1. schemas/20251121/linkml/modules/classes/CustodianPlace.yaml

    • Added country import
    • Added country slot
    • Added slot_usage documentation
  2. schemas/20251121/linkml/modules/classes/LegalForm.yaml

    • Added Country import
    • Changed country_code from string to Country class reference
  3. schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml

    • Removed duplicate RESIDENTIAL_BUILDING (Q11755880)
    • Removed duplicate SANCTUARY (Q29553, kept Q21850178)
    • Result: 294 unique feature types

Validation Results

YAML Syntax: Valid

Total enum values: 294
No duplicate keys found

Country Class: Minimal Design

- alpha_2: Required, pattern: ^[A-Z]{2}$
- alpha_3: Required, pattern: ^[A-Z]{3}$
- No other fields (names, languages, capitals excluded)

Integration: Complete

  • CustodianPlace → Country (optional link)
  • LegalForm → Country (required link, jurisdiction-specific)
  • FeatureTypeEnum → Ready for country-conditional validation

Next Steps (Future Work)

1. Country-Conditional Enum Validation

Task: Implement validation rules for country-specific feature types

Approach:

  • Add country_restriction annotation to FeatureTypeEnum entries
  • Create LinkML validation rule to check CustodianPlace.country matches restriction
  • Generate country-specific enum subsets for UI dropdowns

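Example Rule (illustrative sketch; value_must_match is a placeholder and should be checked against the LinkML metamodel before use):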

rules:
  - title: "Country-specific feature type validation"
    preconditions:
      slot_conditions:
        has_feature_type:
          range: FeaturePlace
    postconditions:
      slot_conditions:
        country:
          value_must_match: "{has_feature_type.country_restriction}"

2. Populate Country Instances

Task: Create Country instances for all countries in the dataset

Data Source: ISO 3166-1 official list (249 countries and territories)

Implementation:

# schemas/20251121/data/countries.yaml
countries:
  - id: https://nde.nl/ontology/hc/country/NL
    alpha_2: "NL"
    alpha_3: "NLD"
  
  - id: https://nde.nl/ontology/hc/country/PE
    alpha_2: "PE"
    alpha_3: "PER"
  
  # ... 247 more entries
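
The instance file could be generated from a list of code pairs instead of being typed by hand. The sketch below is an assumption about how that might look. `build_country_records` and `to_yaml` are hypothetical helpers, the YAML is emitted as plain text to avoid a third-party dependency, and only the two pairs shown above are included.

```python
from typing import Dict, Iterable, List, Tuple

BASE_URI = "https://nde.nl/ontology/hc/country/"  # taken from the example ids above


def build_country_records(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (alpha_2, alpha_3) pairs into Country instance records."""
    return [
        {"id": BASE_URI + alpha_2, "alpha_2": alpha_2, "alpha_3": alpha_3}
        for alpha_2, alpha_3 in pairs
    ]


def to_yaml(records: List[Dict[str, str]]) -> str:
    """Serialize the records in the same layout as countries.yaml."""
    lines = ["countries:"]
    for record in records:
        lines.append("  - id: " + record["id"])
        lines.append('    alpha_2: "' + record["alpha_2"] + '"')
        lines.append('    alpha_3: "' + record["alpha_3"] + '"')
    return "\n".join(lines) + "\n"


records = build_country_records([("NL", "NLD"), ("PE", "PER")])
yaml_text = to_yaml(records)
print(yaml_text)
```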

3. Populate LegalForm Instances

Task: Populate LegalForm instances with ISO 20275 Entity Legal Form codes

Data Source: GLEIF ISO 20275 Code List (1,600+ legal forms across 150+ jurisdictions)

Example:

# Netherlands legal forms
- id: https://nde.nl/ontology/hc/legal-form/nl-8888
  elf_code: "8888"
  country_code: {alpha_2: "NL", alpha_3: "NLD"}
  local_name: "Stichting"
  abbreviation: "Stg."

- id: https://nde.nl/ontology/hc/legal-form/nl-akd2
  elf_code: "AKD2"
  country_code: {alpha_2: "NL", alpha_3: "NLD"}
  local_name: "Besloten vennootschap"
  abbreviation: "B.V."
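
The ids in the example follow a visible pattern: lowercase alpha-2 code, a hyphen, then the lowercase ELF code. A helper that builds them might look like the sketch below. `legal_form_id` is a hypothetical name, and the four-character alphanumeric check is an assumption based on the codes shown here.

```python
import re

LEGAL_FORM_BASE = "https://nde.nl/ontology/hc/legal-form/"  # from the example ids above


def legal_form_id(alpha_2: str, elf_code: str) -> str:
    """Build a LegalForm id such as .../legal-form/nl-akd2."""
    if not re.fullmatch(r"[A-Z]{2}", alpha_2):
        raise ValueError(f"alpha_2 must be two uppercase letters: {alpha_2!r}")
    # Assumption: ELF codes are four uppercase alphanumeric characters.
    if not re.fullmatch(r"[0-9A-Z]{4}", elf_code):
        raise ValueError(f"unexpected ELF code format: {elf_code!r}")
    return f"{LEGAL_FORM_BASE}{alpha_2.lower()}-{elf_code.lower()}"


bv_id = legal_form_id("NL", "AKD2")
```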

4. External Resolution Service Integration

Task: Provide helper functions to resolve country metadata via GeoNames API

Implementation:

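from typing import Dict
import requests

def resolve_country_metadata(alpha_2: str) -> Dict:
    """Resolve country metadata from the GeoNames countryInfoJSON endpoint."""
    url = "http://api.geonames.org/countryInfoJSON"
    params = {
        "country": alpha_2,
        "username": "your_geonames_username"  # replace with a registered account
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    data = response.json()

    # Error responses (e.g. an invalid username) carry no "geonames" list
    entries = data.get("geonames") or []
    if not entries:
        raise LookupError(f"No GeoNames country info for {alpha_2!r}: {data}")
    info = entries[0]

    return {
        "name_en": info["countryName"],
        "capital": info["capital"],
        "languages": info["languages"].split(","),
        "continent": info["continent"]
    }

# Usage (requires network access and a registered GeoNames username)
country_metadata = resolve_country_metadata("NL")
# Returns a dict shaped like the following; actual values come from the live API:
# {
#   "name_en": "Netherlands",
#   "capital": "Amsterdam",
#   "languages": [...],   # language tags exactly as GeoNames returns them
#   "continent": "EU"
# }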
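
Because country metadata changes rarely, repeated lookups for the same code can be served from a cache. The sketch below wraps any lookup function in a small in-memory cache. `make_cached_resolver` is a hypothetical helper, and the stand-in `fake_fetch` replaces the real network call so the example runs offline.

```python
from typing import Callable, Dict, List


def make_cached_resolver(fetch: Callable[[str], Dict]) -> Callable[[str], Dict]:
    """Wrap a lookup function so each country code is fetched at most once."""
    cache: Dict[str, Dict] = {}

    def resolve(alpha_2: str) -> Dict:
        key = alpha_2.upper()  # treat "nl" and "NL" as the same entry
        if key not in cache:
            cache[key] = fetch(key)
        return cache[key]

    return resolve


# Stand-in for a network-backed resolver; it only records which codes were requested.
calls: List[str] = []


def fake_fetch(alpha_2: str) -> Dict:
    calls.append(alpha_2)
    return {"alpha_2": alpha_2}


resolve = make_cached_resolver(fake_fetch)
first = resolve("NL")
second = resolve("nl")
```

In real use, the function passed in would be the GeoNames-backed resolver, and a persistent cache with expiry may suit better than a process-local dict.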

5. UI Dropdown Generation

Task: Generate country-filtered feature type dropdowns for data entry forms

Use Case: When a user selects "Netherlands" as the country, show only:

  • Universal feature types (MUSEUM, CHURCH, MANSION, etc.)
  • Netherlands-specific types (BUITENPLAATS)

Types restricted to other countries, such as the Peru-specific CULTURAL_HERITAGE_OF_PERU, are excluded.

Implementation:

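from typing import List

# Assumes FeatureTypeEnum is iterable and that has_country_restriction() and
# get_country_restriction() read the country_restriction annotation (not yet implemented).
def get_valid_feature_types(country_alpha_2: str) -> List[str]:
    """Get valid feature types for a given country."""
    universal_types = [ft for ft in FeatureTypeEnum if not has_country_restriction(ft)]
    country_specific = [ft for ft in FeatureTypeEnum if get_country_restriction(ft) == country_alpha_2]
    return universal_types + country_specific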
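
A self-contained version of the filter is sketched below to show the intended behavior. It makes several assumptions: `FeatureType` is a tiny stand-in for the 294-value FeatureTypeEnum, `RESTRICTIONS` stands in for the country_restriction annotations, and `valid_feature_types` is a hypothetical name.

```python
from enum import Enum
from typing import Dict, List


class FeatureType(Enum):
    """Small stand-in for FeatureTypeEnum (illustrative subset)."""

    MUSEUM = "MUSEUM"
    CHURCH = "CHURCH"
    BUITENPLAATS = "BUITENPLAATS"
    CULTURAL_HERITAGE_OF_PERU = "CULTURAL_HERITAGE_OF_PERU"


# Stand-in for the country_restriction annotations.
RESTRICTIONS: Dict[FeatureType, str] = {
    FeatureType.BUITENPLAATS: "NL",
    FeatureType.CULTURAL_HERITAGE_OF_PERU: "PE",
}


def valid_feature_types(country_alpha_2: str) -> List[str]:
    """Return names of universal types plus types restricted to the given country."""
    return [
        ft.name
        for ft in FeatureType  # iteration follows definition order
        if RESTRICTIONS.get(ft) in (None, country_alpha_2)
    ]


dutch_options = valid_feature_types("NL")
print(dutch_options)
```

A single pass over the enum keeps the dropdown in definition order, whereas concatenating two lists would move country-specific types to the end.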

Status

Country Class Implementation: COMPLETE

  • Duplicate keys fixed in FeatureTypeEnum.yaml
  • Country class created with minimal design
  • CustodianPlace integrated with country linking
  • LegalForm integrated with country linking
  • Example instance file created
  • Documentation complete

Ready for: Country-conditional enum validation and LegalForm population with ISO 20275 codes.