glam/COUNTRY_CLASS_IMPLEMENTATION_COMPLETE.md
kempersc 67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00


# Country Class Implementation - Complete
**Date**: 2025-11-22

**Session**: Continuation of FeaturePlace implementation

---
## Summary
Successfully created the **Country class** to handle country-specific feature types and legal forms.
### ✅ What Was Completed
#### 1. Fixed Duplicate Keys in FeatureTypeEnum.yaml
- **Problem**: YAML had 2 duplicate keys (RESIDENTIAL_BUILDING, SANCTUARY)
- **Resolution**:
  - Removed duplicate RESIDENTIAL_BUILDING (Q11755880) at lines 3692-3714
  - Kept Catholic-specific SANCTUARY (Q21850178 "shrine of the Catholic Church")
  - Removed generic SANCTUARY (Q29553 "sacred place")
- **Result**: 294 unique feature types (down from 296 with duplicates)
#### 2. Created Country Class
**File**: `schemas/20251121/linkml/modules/classes/Country.yaml`
**Design Philosophy**:
- **Minimal design**: ONLY ISO 3166-1 alpha-2 and alpha-3 codes
- **No other metadata**: No country names, languages, capitals, regions
- **Rationale**: ISO codes are authoritative, stable, language-neutral identifiers
- All other country metadata should be resolved via external services (GeoNames, UN M49)

**Schema**:
```yaml
Country:
  description: Country identified by ISO 3166-1 codes
  slots:
    - alpha_2  # ISO 3166-1 alpha-2 (2-letter: "NL", "PE", "US")
    - alpha_3  # ISO 3166-1 alpha-3 (3-letter: "NLD", "PER", "USA")
  slot_usage:
    alpha_2:
      required: true
      pattern: "^[A-Z]{2}$"
      slot_uri: schema:addressCountry
    alpha_3:
      required: true
      pattern: "^[A-Z]{3}$"
```
**Examples**:
- Netherlands: `alpha_2="NL"`, `alpha_3="NLD"`
- Peru: `alpha_2="PE"`, `alpha_3="PER"`
- United States: `alpha_2="US"`, `alpha_3="USA"`
- Japan: `alpha_2="JP"`, `alpha_3="JPN"`
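
The same constraints can be mirrored in application code. The following is a minimal sketch that assumes a plain Python dataclass rather than LinkML-generated classes. It checks only the *shape* of each code against the schema patterns. It does not check whether ISO has actually assigned the code.

```python
import re
from dataclasses import dataclass

# Same patterns as the LinkML slot_usage above
ALPHA_2_PATTERN = re.compile(r"^[A-Z]{2}$")
ALPHA_3_PATTERN = re.compile(r"^[A-Z]{3}$")


@dataclass(frozen=True)
class Country:
    """Country identified only by its ISO 3166-1 alpha-2 and alpha-3 codes."""

    alpha_2: str
    alpha_3: str

    def __post_init__(self) -> None:
        # Both slots are required and must match their patterns exactly
        if not ALPHA_2_PATTERN.fullmatch(self.alpha_2):
            raise ValueError(f"invalid alpha_2 code: {self.alpha_2!r}")
        if not ALPHA_3_PATTERN.fullmatch(self.alpha_3):
            raise ValueError(f"invalid alpha_3 code: {self.alpha_3!r}")


netherlands = Country(alpha_2="NL", alpha_3="NLD")
```

Lowercase input such as `"nl"` is rejected rather than normalized, which matches the uppercase-only patterns.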
#### 3. Integrated Country into CustodianPlace
**File**: `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml`
**Changes**:
- Added `country` slot (optional, range: Country)
- Links place to its country location
- Enables country-specific feature type validation

**Use Cases**:
- Disambiguate places across countries ("Victoria Museum" exists in multiple countries)
- Enable country-conditional feature types (e.g., "cultural heritage of Peru")
- Generate country-specific enum values

**Example**:
```yaml
CustodianPlace:
  place_name: "Machu Picchu"
  place_language: "es"
  country:
    alpha_2: "PE"
    alpha_3: "PER"
  has_feature_type:
    feature_type: CULTURAL_HERITAGE_OF_PERU  # Only valid for PE!
```
#### 4. Integrated Country into LegalForm
**File**: `schemas/20251121/linkml/modules/classes/LegalForm.yaml`
**Changes**:
- Updated `country_code` from string pattern to Country class reference
- Enforces jurisdiction-specific legal forms via ontology links

**Rationale**:
- Legal forms are jurisdiction-specific
- A "Stichting" in the Netherlands ≠ "Fundación" in Spain (different legal meaning)
- Country class provides canonical ISO codes for legal jurisdictions

**Before** (string):
```yaml
country_code:
  range: string
  pattern: "^[A-Z]{2}$"
```
**After** (Country class):
```yaml
country_code:
  range: Country
  required: true
```
**Example**:
```yaml
LegalForm:
  elf_code: "8888"
  country_code:
    alpha_2: "NL"
    alpha_3: "NLD"
  local_name: "Stichting"
  abbreviation: "Stg."
```
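
Data written against the old string pattern must be converted to the new structure. Below is a minimal migration sketch. It assumes legacy `country_code` values are alpha-2 strings. `ALPHA_2_TO_ALPHA_3` and `migrate_country_code` are hypothetical names. The table is deliberately partial; a real migration would load the full ISO 3166-1 list.

```python
from typing import Dict

# Hypothetical, deliberately partial lookup table (extend from ISO 3166-1)
ALPHA_2_TO_ALPHA_3: Dict[str, str] = {
    "NL": "NLD",
    "PE": "PER",
    "US": "USA",
    "JP": "JPN",
}


def migrate_country_code(legacy_code: str) -> Dict[str, str]:
    """Convert an old string country_code (e.g. "NL") into a Country mapping."""
    alpha_2 = legacy_code.strip().upper()
    try:
        alpha_3 = ALPHA_2_TO_ALPHA_3[alpha_2]
    except KeyError:
        raise ValueError(f"no alpha-3 code known for {legacy_code!r}") from None
    return {"alpha_2": alpha_2, "alpha_3": alpha_3}


# Upgrade one legacy LegalForm record in place
legal_form = {"elf_code": "8888", "country_code": "NL", "local_name": "Stichting"}
legal_form["country_code"] = migrate_country_code(legal_form["country_code"])
```

Unknown codes raise an error instead of passing through silently, so incomplete lookup data surfaces during migration rather than later.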
#### 5. Created Example Instance File
**File**: `schemas/20251121/examples/country_integration_example.yaml`
Shows:
- Country instances (NL, PE, US)
- CustodianPlace with country linking
- LegalForm with country linking
- Country-specific feature types (CULTURAL_HERITAGE_OF_PERU, BUITENPLAATS)
---
## Design Decisions
### Why Minimal Country Class?
**Excluded Metadata**:
- ❌ Country names (language-dependent: "Netherlands" vs "Pays-Bas" vs "荷兰")
- ❌ Capital cities (change over time: Myanmar moved its capital in 2006)
- ❌ Languages (multilingual countries: Belgium has 3 official languages)
- ❌ Regions/continents (political: Is Turkey in Europe or Asia?)
- ❌ Currency (changes: Eurozone adoption)
- ❌ Phone codes (technical, not heritage-relevant)

**Rationale**:
1. **Language neutrality**: ISO codes work across all languages
2. **Temporal stability**: Country names and capitals change; ISO codes change far less often
3. **Separation of concerns**: Heritage ontology shouldn't duplicate geopolitical databases
4. **External resolution**: Use GeoNames, UN M49, or ISO 3166 Maintenance Agency for metadata

**External Services**:
- GeoNames API: Country names in 20+ languages, capitals, regions
- UN M49: Standard country codes and regions
- ISO 3166 Maintenance Agency: Official ISO code updates
### Why Both Alpha-2 and Alpha-3?
**Alpha-2** (2-letter):
- Used by: Internet ccTLDs (.nl, .pe, .us), Schema.org addressCountry
- Compact, widely recognized
- **Primary for web applications**

**Alpha-3** (3-letter):
- Used by: machine-readable passports (ICAO Doc 9303) and many UN and other international statistical datasets (IOC codes are a separate system, e.g., "NED" rather than "NLD")
- More mnemonic and harder to confuse (e.g., "AUT"/"AUS" rather than "AT"/"AU" for Austria/Australia)
- **Common in international statistics and data exchange**

**Both are required** to ensure interoperability across different systems.

---
## Country-Specific Feature Types
### Problem: Some feature types only apply to specific countries
**Examples from FeatureTypeEnum.yaml**:
1. **CULTURAL_HERITAGE_OF_PERU** (Q16617058)
   - Description: "cultural heritage of Peru"
   - Only valid for: `country.alpha_2 = "PE"`
   - Hypernym: cultural heritage
2. **BUITENPLAATS** (Q2927789)
   - Description: "summer residence for rich townspeople in the Netherlands"
   - Only valid for: `country.alpha_2 = "NL"`
   - Hypernym: heritage site
3. **NATIONAL_MEMORIAL_OF_THE_UNITED_STATES** (Q20010800)
   - Description: "national memorial in the United States"
   - Only valid for: `country.alpha_2 = "US"`
   - Hypernym: heritage site
### Solution: Country-Conditional Enum Values
**Implementation Strategy**:
When validating CustodianPlace.has_feature_type:
```python
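from typing import Optional

# Feature type -> the only country (ISO 3166-1 alpha-2) where it is valid
COUNTRY_RESTRICTIONS = {"CULTURAL_HERITAGE_OF_PERU": "PE", "BUITENPLAATS": "NL"}

def check_feature_type(feature_type: str, place_country: Optional[str]) -> None:
    """place_country is custodian_place.country.alpha_2, or None if unset."""
    required = COUNTRY_RESTRICTIONS.get(feature_type)
    if required is not None and place_country != required:
        raise ValueError(f"{feature_type} only valid for {required}, got {place_country}")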
```
**LinkML Implementation** (future enhancement):
```yaml
FeatureTypeEnum:
  permissible_values:
    CULTURAL_HERITAGE_OF_PERU:
      meaning: wd:Q16617058
      annotations:
        country_restriction: "PE"  # Only valid for Peru
    BUITENPLAATS:
      meaning: wd:Q2927789
      annotations:
        country_restriction: "NL"  # Only valid for Netherlands
```
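
Once the annotations exist, a validator or UI can collect them into a single lookup table. Below is a minimal sketch. It assumes the enum file has already been parsed into plain dicts (for example, with a YAML loader) and that `country_restriction` is the annotation name proposed above. `country_restrictions` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def country_restrictions(enum_def: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map each permissible value to its country_restriction (None = universal)."""
    result: Dict[str, Optional[str]] = {}
    for key, spec in (enum_def.get("permissible_values") or {}).items():
        # A bare "KEY:" entry parses to None, so fall back to an empty dict
        annotations = (spec or {}).get("annotations") or {}
        restriction = annotations.get("country_restriction")
        # Also accept an expanded {value: ...} form, not just the compact form
        if isinstance(restriction, dict):
            restriction = restriction.get("value")
        result[key] = restriction
    return result


feature_type_enum = {
    "permissible_values": {
        "MUSEUM": {},  # no annotations -> universal
        "CULTURAL_HERITAGE_OF_PERU": {
            "meaning": "wd:Q16617058",
            "annotations": {"country_restriction": "PE"},
        },
        "BUITENPLAATS": {
            "meaning": "wd:Q2927789",
            "annotations": {"country_restriction": "NL"},
        },
    }
}
restrictions = country_restrictions(feature_type_enum)
```

The resulting mapping (`None` for universal types, an alpha-2 code otherwise) is the shape that both per-record validation and dropdown filtering need.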
---
## Files Modified
### Created:
1. `schemas/20251121/linkml/modules/classes/Country.yaml` (new)
2. `schemas/20251121/examples/country_integration_example.yaml` (new)
### Modified:
3. `schemas/20251121/linkml/modules/classes/CustodianPlace.yaml`
   - Added `Country` import
   - Added `country` slot
   - Added slot_usage documentation
4. `schemas/20251121/linkml/modules/classes/LegalForm.yaml`
   - Added `Country` import
   - Changed `country_code` from string to Country class reference
5. `schemas/20251121/linkml/modules/enums/FeatureTypeEnum.yaml`
   - Removed duplicate RESIDENTIAL_BUILDING (Q11755880)
   - Removed duplicate SANCTUARY (Q29553, kept Q21850178)
   - Result: 294 unique feature types
---
## Validation Results
### YAML Syntax: ✅ Valid
```
Total enum values: 294
No duplicate keys found
```
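
The duplicate-key check is worth automating. Many YAML loaders (PyYAML's included) accept duplicate mapping keys and silently keep the last value, so a parsed file never shows the problem. Below is a minimal line-based sketch. It assumes permissible values sit at a fixed indentation (four spaces in this hypothetical excerpt; adjust `indent` for the real file). It is not a general YAML parser, and it is not necessarily how the result above was produced.

```python
import re
from collections import Counter
from typing import List


def duplicate_enum_keys(yaml_text: str, indent: int = 4) -> List[str]:
    """Return permissible-value keys that occur more than once, sorted.

    Assumes each value starts on its own line as `KEY:` at exactly `indent`
    spaces; deeper lines (meaning, description, annotations) are skipped.
    """
    key_line = re.compile(r"^ {%d}([A-Z0-9_]+):\s*(#.*)?$" % indent)
    counts = Counter(
        match.group(1)
        for match in map(key_line.match, yaml_text.splitlines())
        if match
    )
    return sorted(key for key, n in counts.items() if n > 1)


sample = """\
FeatureTypeEnum:
  permissible_values:
    RESIDENTIAL_BUILDING:
      meaning: wd:Q11755880
    SANCTUARY:
      meaning: wd:Q21850178
    RESIDENTIAL_BUILDING:
      meaning: wd:Q11755880
"""
print(duplicate_enum_keys(sample))  # ['RESIDENTIAL_BUILDING']
```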
### Country Class: ✅ Minimal Design
```
- alpha_2: Required, pattern: ^[A-Z]{2}$
- alpha_3: Required, pattern: ^[A-Z]{3}$
- No other fields (names, languages, capitals excluded)
```
### Integration: ✅ Complete
- CustodianPlace → Country (optional link)
- LegalForm → Country (required link, jurisdiction-specific)
- FeatureTypeEnum → Ready for country-conditional validation
---
## Next Steps (Future Work)
### 1. Country-Conditional Enum Validation
**Task**: Implement validation rules for country-specific feature types
**Approach**:
- Add `country_restriction` annotation to FeatureTypeEnum entries
- Create LinkML validation rule to check CustodianPlace.country matches restriction
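- Generate country-specific enum subsets for UI dropdowns

**Example Rule** (illustrative only: `value_must_match` is not a standard LinkML slot condition, so in practice this check would likely need a custom validator):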
```yaml
rules:
  - title: "Country-specific feature type validation"
    preconditions:
      slot_conditions:
        has_feature_type:
          range: FeaturePlace
    postconditions:
      slot_conditions:
        country:
          value_must_match: "{has_feature_type.country_restriction}"
```
### 2. Populate Country Instances
**Task**: Create Country instances for all countries in the dataset
**Data Source**: ISO 3166-1 official list (249 officially assigned codes, covering countries and territories)
**Implementation**:
```yaml
# schemas/20251121/data/countries.yaml
countries:
  - id: https://nde.nl/ontology/hc/country/NL
    alpha_2: "NL"
    alpha_3: "NLD"
  - id: https://nde.nl/ontology/hc/country/PE
    alpha_2: "PE"
    alpha_3: "PER"
  # ... 247 more entries
```
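
Every entry follows the same pattern, so the file can be generated rather than written by hand. Below is a minimal sketch. It assumes the `https://nde.nl/ontology/hc/country/{alpha_2}` URI scheme from the example above and a list of code pairs taken from the ISO 3166-1 list; only three pairs are shown. `countries_yaml` is a hypothetical helper name.

```python
from typing import Iterable, Tuple

BASE_URI = "https://nde.nl/ontology/hc/country/"


def countries_yaml(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (alpha_2, alpha_3) pairs in the countries.yaml layout shown above."""
    lines = ["countries:"]
    for alpha_2, alpha_3 in sorted(pairs):
        lines.append(f"  - id: {BASE_URI}{alpha_2}")
        lines.append(f'    alpha_2: "{alpha_2}"')
        lines.append(f'    alpha_3: "{alpha_3}"')
    return "\n".join(lines) + "\n"


print(countries_yaml([("NL", "NLD"), ("PE", "PER"), ("US", "USA")]))
```

The codes are quoted on purpose. YAML 1.1 loaders such as PyYAML read an unquoted `NO` (Norway) as the boolean `false`.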
### 3. Link LegalForm to ISO 20275 ELF Codes
**Task**: Populate LegalForm instances with ISO 20275 Entity Legal Form codes
**Data Source**: GLEIF ISO 20275 Code List (1,600+ legal forms across 150+ jurisdictions)
**Example**:
```yaml
# Netherlands legal forms
- id: https://nde.nl/ontology/hc/legal-form/nl-8888
  elf_code: "8888"
  country_code: {alpha_2: "NL", alpha_3: "NLD"}
  local_name: "Stichting"
  abbreviation: "Stg."
- id: https://nde.nl/ontology/hc/legal-form/nl-akd2
  elf_code: "AKD2"
  country_code: {alpha_2: "NL", alpha_3: "NLD"}
  local_name: "Besloten vennootschap"
  abbreviation: "B.V."
```
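
The identifiers above combine the lowercase alpha-2 code with the lowercase ELF code. Below is a minimal sketch of that convention. The convention is inferred from the two examples, not taken from a published rule, and `legal_form_id` is a hypothetical helper. It also checks that the ELF code is four uppercase alphanumeric characters, which is the ISO 20275 code format.

```python
import re

ELF_CODE = re.compile(r"[A-Z0-9]{4}")
BASE_URI = "https://nde.nl/ontology/hc/legal-form/"


def legal_form_id(alpha_2: str, elf_code: str) -> str:
    """Build a LegalForm URI such as .../legal-form/nl-8888 (hypothetical helper)."""
    if not re.fullmatch(r"[A-Z]{2}", alpha_2):
        raise ValueError(f"invalid alpha_2 code: {alpha_2!r}")
    if not ELF_CODE.fullmatch(elf_code):
        raise ValueError(f"invalid ELF code: {elf_code!r}")
    return f"{BASE_URI}{alpha_2.lower()}-{elf_code.lower()}"


print(legal_form_id("NL", "AKD2"))
```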
### 4. External Resolution Service Integration
**Task**: Provide helper functions to resolve country metadata via GeoNames API
**Implementation**:
```python
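from typing import Any, Dict

import requests

GEONAMES_URL = "http://api.geonames.org/countryInfoJSON"


def resolve_country_metadata(alpha_2: str, username: str) -> Dict[str, Any]:
    """Resolve country metadata from the GeoNames countryInfoJSON service."""
    params = {"country": alpha_2, "username": username}  # registered GeoNames account
    response = requests.get(GEONAMES_URL, params=params, timeout=10)
    response.raise_for_status()
    payload = response.json()
    results = payload.get("geonames", [])
    if not results:
        # GeoNames reports errors (e.g. an invalid username) in a "status" object
        raise LookupError(f"No GeoNames country info for {alpha_2!r}: {payload.get('status')}")
    info = results[0]
    return {
        "name_en": info["countryName"],
        "capital": info["capital"],
        # Strip region suffixes such as "nl-NL" -> "nl"
        "languages": [lang.split("-")[0] for lang in info["languages"].split(",") if lang],
        "continent": info["continent"],
    }


# Usage
country_metadata = resolve_country_metadata("NL", username="your_geonames_username")
# Returns something like: {
#     "name_en": "Netherlands",
#     "capital": "Amsterdam",
#     "languages": ["nl", "fy"],
#     "continent": "EU"
# }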
```
### 5. UI Dropdown Generation
**Task**: Generate country-filtered feature type dropdowns for data entry forms
**Use Case**: When a user selects "Netherlands" as the country, the dropdown should:
- Show universal feature types (MUSEUM, CHURCH, MANSION, etc.)
- Show Netherlands-specific types (BUITENPLAATS)
- Exclude Peru-specific types (CULTURAL_HERITAGE_OF_PERU)

**Implementation**:
```python
from typing import Dict, List, Optional

def get_valid_feature_types(country_alpha_2: str, restrictions: Dict[str, Optional[str]]) -> List[str]:
    """Get valid feature types for a given country.

    `restrictions` maps each FeatureTypeEnum key to its country_restriction
    annotation (None for universal types such as MUSEUM).
    """
    return [ft for ft, country in restrictions.items() if country in (None, country_alpha_2)]
```
---
## References
- **ISO 3166-1**: https://www.iso.org/iso-3166-country-codes.html
- **GeoNames API**: https://www.geonames.org/export/web-services.html
- **UN M49**: https://unstats.un.org/unsd/methodology/m49/
- **ISO 20275**: https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list
- **Schema.org addressCountry**: https://schema.org/addressCountry
- **Wikidata Q-numbers**: Country-specific heritage feature types
---
## Status
**Country Class Implementation: COMPLETE**
- [x] Duplicate keys fixed in FeatureTypeEnum.yaml
- [x] Country class created with minimal design
- [x] CustodianPlace integrated with country linking
- [x] LegalForm integrated with country linking
- [x] Example instance file created
- [x] Documentation complete

**Ready for**: Country-conditional enum validation and LegalForm population with ISO 20275 codes.