Migration Guide: UN/LOCODE to GeoNames for GHCID
Version: 1.0
Migration Date: 2025-11-05
Status: Pre-Migration Planning
Overview
This guide explains how to migrate existing GHCID identifiers from UN/LOCODE-based city codes to GeoNames-based city abbreviations.
Why migrate?
- UN/LOCODE covers only 10.5% of Dutch cities (50/475)
- GeoNames provides 100% coverage (475+ Dutch cities)
- Enables global expansion to 60+ countries
Impact:
- Some of the 152 existing GHCIDs will change (an estimated 10-30%)
- New GHCIDs can be generated for up to 212 previously uncovered institutions
- Overall GHCID coverage increases from 41.8% to >95%
Breaking Changes
1. City Code Source Change
Before (UN/LOCODE):
Amsterdam → AMS (from UN/LOCODE registry)
Rotterdam → RTM (from UN/LOCODE registry)
After (GeoNames):
Amsterdam → AMS (first 3 letters of city name)
Rotterdam → ROT (first 3 letters of city name)
Impact: A city code changes wherever its UN/LOCODE differs from the first three letters of the city name.
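A minimal sketch of the "first 3 letters" rule described above. The function name `abbreviate_city` is hypothetical, and stripping diacritics and skipping non-letter characters are assumptions; the guide itself only specifies the first three letters of the city name, uppercased.

```python
import unicodedata


def abbreviate_city(name: str, length: int = 3) -> str:
    """Return the first `length` letters of a city name, uppercased."""
    # Assumption: decompose accented characters so only plain ASCII letters remain.
    decomposed = unicodedata.normalize("NFKD", name)
    # Assumption: spaces, hyphens and apostrophes are skipped rather than counted.
    letters = [ch for ch in decomposed if ch.isascii() and ch.isalpha()]
    return "".join(letters[:length]).upper()


print(abbreviate_city("Amsterdam"))  # AMS
print(abbreviate_city("Rotterdam"))  # ROT
```

How the real implementation treats prefixes, spaces, or very short names is not stated in this guide, so those cases should be checked against the GHCID specification.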
2. GHCID String Changes
Example institution: Science Museum in Rotterdam
Old GHCID:
NL-ZH-RTM-M-SM
New GHCID:
NL-ZH-ROT-M-SM
Note: Rotterdam's UN/LOCODE is "RTM", but its GeoNames-based abbreviation is "ROT" (the first 3 letters of the city name).
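To illustrate the string change, here is a small sketch that swaps only the city component of a GHCID. It assumes a GHCID consists of five hyphen-separated components with the city code in third position, which is inferred from the example above rather than taken from the specification; `replace_city_code` is a hypothetical helper.

```python
def replace_city_code(ghcid: str, new_city_code: str) -> str:
    """Return the GHCID with its third component replaced by new_city_code."""
    # Assumption: country-region-city-type-name, split on the first four hyphens.
    parts = ghcid.split("-", 4)
    if len(parts) != 5:
        raise ValueError(f"Unexpected GHCID format: {ghcid!r}")
    parts[2] = new_city_code.upper()
    return "-".join(parts)


print(replace_city_code("NL-ZH-RTM-M-SM", "ROT"))  # NL-ZH-ROT-M-SM
```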
3. Numeric Hash Changes
Critical: The ghcid_numeric field (SHA256 hash) will change because it is computed from the GHCID string. The numeric values shown below are illustrative placeholders, not real hash output.
Old hash:
SHA256("NL-ZH-RTM-M-SM")[:8] → 12345678901234567890
New hash:
SHA256("NL-ZH-ROT-M-SM")[:8] → 98765432109876543210
Impact: Any systems referencing ghcid_numeric must update references.
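The dependency between the GHCID string and its numeric form can be sketched as follows. This assumes that `[:8]` means the first 8 bytes of the SHA-256 digest and that they are read as a big-endian unsigned integer; the actual byte order and truncation are defined by the project code, and `ghcid_to_numeric` is a hypothetical name.

```python
import hashlib


def ghcid_to_numeric(ghcid: str) -> int:
    """Derive a 64-bit integer from the SHA-256 digest of a GHCID string."""
    digest = hashlib.sha256(ghcid.encode("utf-8")).digest()
    # Assumption: first 8 bytes, big-endian, unsigned.
    return int.from_bytes(digest[:8], byteorder="big")


old_numeric = ghcid_to_numeric("NL-ZH-RTM-M-SM")
new_numeric = ghcid_to_numeric("NL-ZH-ROT-M-SM")
print(old_numeric != new_numeric)  # True: changing the city code changes the hash
```

Because the hash depends on every character of the string, a change to the city code alone is enough to produce an unrelated numeric value.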
Migration Strategy
Phase 1: Preparation (Pre-Migration)
1.1 Export Current GHCIDs
# scripts/export_current_ghcids.py
import csv

from glam_extractor.parsers.isil_registry import ISILRegistryParser

parser = ISILRegistryParser()
records = parser.parse_and_convert("data/ISIL-codes_2025-08-01.csv")

# Export current GHCIDs to CSV (csv.writer quotes names that contain commas)
with open("data/migration/ghcids_before_migration.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["isil_code", "institution_name", "ghcid_current", "ghcid_numeric"])
    for record in records:
        if record.ghcid_current:
            writer.writerow([
                record.identifiers[0].identifier_value,
                record.name,
                record.ghcid_current,
                record.ghcid_numeric,
            ])
Output: 152 records with current GHCIDs saved to CSV.
1.2 Identify Affected Records
# scripts/identify_ghcid_changes.py
def compare_locodes_vs_geonames():
    """Compare old LOCODE-based GHCIDs with new GeoNames-based GHCIDs."""
    old_records = load_csv("data/migration/ghcids_before_migration.csv")
    changes = []
    for record in old_records:
        old_ghcid = record['ghcid_current']
        new_ghcid = generate_new_ghcid(record)  # Using GeoNames
        if old_ghcid != new_ghcid:
            changes.append({
                'isil_code': record['isil_code'],
                'institution': record['institution_name'],
                'old_ghcid': old_ghcid,
                'new_ghcid': new_ghcid,
                'reason': 'City code changed (LOCODE → GeoNames)'
            })
    # Save report
    save_csv("data/migration/ghcid_changes.csv", changes)
    print(f"Found {len(changes)} GHCIDs that will change")
    return changes
Expected: 10-30% of existing GHCIDs will change.
1.3 Create Mapping Table
# data/migration/ghcid_mapping.csv
# Maps old GHCID → new GHCID for backward compatibility
old_ghcid,new_ghcid,old_numeric,new_numeric,change_reason
NL-ZH-RTM-M-SM,NL-ZH-ROT-M-SM,12345678901234567890,98765432109876543210,City code RTM→ROT
NL-ZH-HAG-A-NA,NL-ZH-DEN-A-NA,11111111111111111111,22222222222222222222,City code HAG→DEN
Purpose: Support lookups by old GHCID during transition period.
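A minimal sketch of reading this mapping table into an old-to-new dictionary. It assumes the CSV layout shown above, with `#` comment lines before the header; `load_mapping_from_lines` is a hypothetical helper and not necessarily how the project's own `load_mapping` works.

```python
import csv
import io


def load_mapping_from_lines(lines):
    """Return {old_ghcid: new_ghcid} from an iterable of CSV lines."""
    # Skip blank lines and '#' comment lines so DictReader sees the header first.
    data_lines = (
        line for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    reader = csv.DictReader(data_lines)
    return {row["old_ghcid"]: row["new_ghcid"] for row in reader}


# In-memory sample that mirrors the Rotterdam row of the table above.
sample = io.StringIO(
    "# data/migration/ghcid_mapping.csv\n"
    "old_ghcid,new_ghcid,old_numeric,new_numeric,change_reason\n"
    "NL-ZH-RTM-M-SM,NL-ZH-ROT-M-SM,12345678901234567890,"
    "98765432109876543210,City code RTM→ROT\n"
)
mapping = load_mapping_from_lines(sample)
print(mapping["NL-ZH-RTM-M-SM"])  # NL-ZH-ROT-M-SM
```

Passing an open file object works the same way, since the function only needs an iterable of lines.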
Phase 2: Implementation (Migration Day)
2.1 Build GeoNames Database
# Download GeoNames data
wget http://download.geonames.org/export/dump/NL.zip
unzip NL.zip
# Build SQLite database
python scripts/build_geonames_db.py \
--input NL.txt \
--output data/reference/geonames.db
# Validate completeness
python scripts/validate_geonames_db.py
# Expected: 475+ Dutch cities loaded
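The build script itself is not shown in this guide. The sketch below illustrates one way such a loader could work: it reads the tab-separated GeoNames dump (19 columns, no header row), keeps populated places (feature class `P`), and stores a three-letter abbreviation. The table name `cities`, its columns, and the function `load_populated_places` are assumptions for illustration, not the project's actual schema.

```python
import csv
import sqlite3

# Column order of the GeoNames country dump files (tab-separated, no header).
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def load_populated_places(lines, conn):
    """Insert feature-class 'P' rows into a (hypothetical) cities table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cities ("
        "geonameid INTEGER PRIMARY KEY, name TEXT, ascii_name TEXT, "
        "country_code TEXT, admin1_code TEXT, abbreviation TEXT)"
    )
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    count = 0
    for row in reader:
        if len(row) != len(GEONAMES_COLUMNS):
            continue  # skip malformed lines
        rec = dict(zip(GEONAMES_COLUMNS, row))
        if rec["feature_class"] != "P":
            continue  # keep populated places only
        conn.execute(
            "INSERT OR REPLACE INTO cities VALUES (?, ?, ?, ?, ?, ?)",
            (int(rec["geonameid"]), rec["name"], rec["asciiname"],
             rec["country_code"], rec["admin1_code"],
             rec["asciiname"][:3].upper()),
        )
        count += 1
    conn.commit()
    return count


def make_row(**fields):
    """Build one tab-separated line; unspecified columns stay empty."""
    return "\t".join(fields.get(col, "") for col in GEONAMES_COLUMNS)


# Illustrative rows only; the identifiers are made up for this example.
sample = [
    make_row(geonameid="1", name="Rotterdam", asciiname="Rotterdam",
             feature_class="P", country_code="NL"),
    make_row(geonameid="2", name="Provincie Utrecht",
             asciiname="Provincie Utrecht", feature_class="A",
             country_code="NL"),
]
conn = sqlite3.connect(":memory:")
loaded = load_populated_places(sample, conn)
print(loaded)  # 1 (the administrative row is skipped)
```

For the real dump, the same function could be fed an open `NL.txt` file and a connection to `data/reference/geonames.db`.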
2.2 Update Code
Update lookups.py:
from typing import Optional

# OLD (deprecated)
def get_city_locode(city: str, country: str = "NL") -> Optional[str]:
    return _NL_CITY_LOCODES.get("cities", {}).get(city)

# NEW (GeoNames-based)
from glam_extractor.geocoding.geonames_lookup import GeoNamesDB

_geonames_db = GeoNamesDB("data/reference/geonames.db")

def get_city_abbreviation(city: str, country: str = "NL") -> Optional[str]:
    """Get 3-letter city abbreviation from GeoNames."""
    result = _geonames_db.get_city_details(city, country)
    if result:
        return result['abbreviation']  # First 3 letters, uppercase
    return None
Update GHCID generation in isil_registry.py:
# OLD
city_locode = get_city_locode(record.plaats, "NL")
if not city_locode:
    return None  # Skip GHCID generation

# NEW
city_abbr = get_city_abbreviation(record.plaats, "NL")
if not city_abbr:
    print(f"Warning: No GeoNames match for city: {record.plaats}")
    return None
2.3 Regenerate All GHCIDs
# scripts/regenerate_ghcids_with_geonames.py
from datetime import datetime, timezone

from glam_extractor.parsers.isil_registry import ISILRegistryParser
# GHCIDHistoryEntry, generate_ghcid_components and export_to_jsonld are assumed
# to be imported from the project's own modules (paths not shown in this guide).

parser = ISILRegistryParser()
records = parser.parse_and_convert("data/ISIL-codes_2025-08-01.csv")
migration_date = datetime.now(timezone.utc)

for record in records:
    # Remember the old GHCID, if any (None for previously uncovered institutions)
    old_ghcid = record.ghcid_current
    old_numeric = record.ghcid_numeric

    # Generate new GHCID (using GeoNames)
    new_components = generate_ghcid_components(record)  # Now uses GeoNames
    if new_components is None:
        # Assumption: None means no GeoNames match; leave the record unchanged
        continue
    new_ghcid = new_components.to_string()
    new_numeric = new_components.to_numeric()

    if old_ghcid and old_ghcid != new_ghcid:
        # Add history entry for change
        history_entry = GHCIDHistoryEntry(
            ghcid=old_ghcid,
            ghcid_numeric=old_numeric,
            valid_from=record.ghcid_history[0].valid_from,  # Original date
            valid_to=migration_date,
            reason="Migrated from UN/LOCODE to GeoNames city abbreviation",
            institution_name=record.name,
            location_city=record.locations[0].city,
            location_country=record.locations[0].country
        )
        record.ghcid_history.append(history_entry)

    # Update current GHCID
    record.ghcid_current = new_ghcid
    record.ghcid_numeric = new_numeric
    # Keep ghcid_original unchanged (immutable)

# Save updated records
export_to_jsonld(records, "output/heritage_custodians_geonames.jsonld")
Phase 3: Validation (Post-Migration)
3.1 Verify Coverage Improvement
# scripts/validate_migration.py
def validate_migration():
    """Verify GHCID generation improved after GeoNames migration."""
    total = 364  # ISIL records in the registry

    # Before migration
    old_stats = load_csv("data/migration/ghcids_before_migration.csv")
    old_coverage = len(old_stats)  # 152 records

    # After migration
    new_records = parse_isil_registry()
    new_coverage = sum(1 for r in new_records if r.ghcid_current)

    print(f"Coverage before: {old_coverage}/{total} ({old_coverage/total*100:.1f}%)")
    print(f"Coverage after: {new_coverage}/{total} ({new_coverage/total*100:.1f}%)")
    print(f"Improvement: +{new_coverage - old_coverage} records")

    assert new_coverage > old_coverage, "Migration should improve coverage"
    # 345/364 is 94.8%, so at least 346 records are needed to exceed 95%
    assert new_coverage / total > 0.95, "Should reach >95% coverage"

Expected results:
- Before: 152/364 (41.8%)
- After: 346+/364 (>95%)
- Improvement: +194 records or more
3.2 Verify Mapping Table
def verify_ghcid_mapping():
    """Ensure all old GHCIDs map to new GHCIDs."""
    mapping = load_csv("data/migration/ghcid_mapping.csv")
    for row in mapping:
        old_ghcid = row['old_ghcid']
        new_ghcid = row['new_ghcid']
        # Verify new GHCID exists
        record = lookup_by_ghcid(new_ghcid)
        assert record, f"New GHCID not found: {new_ghcid}"
        # Verify history entry exists
        assert any(h.ghcid == old_ghcid for h in record.ghcid_history), \
            f"Old GHCID not in history: {old_ghcid}"
3.3 Test Suite
# Run full test suite
pytest tests/ -v
# Expected: All 150+ tests passing
# New tests for GeoNames integration should be added
Phase 4: Backward Compatibility
4.1 Support Old GHCID Lookups
# src/glam_extractor/identifiers/ghcid_lookup.py
from typing import Optional


class GHCIDLookup:
    """Lookup institutions by GHCID (supports old + new identifiers)."""

    def __init__(self, mapping_file: str = "data/migration/ghcid_mapping.csv"):
        self.mapping = load_mapping(mapping_file)

    def lookup(self, ghcid: str) -> Optional[HeritageCustodian]:
        """
        Lookup institution by GHCID.

        Supports:
        - Current GHCID (GeoNames-based)
        - Legacy GHCID (UN/LOCODE-based, via mapping table)
        """
        # Try current GHCID first
        record = self._lookup_by_current_ghcid(ghcid)
        if record:
            return record
        # Check if it's a legacy GHCID
        new_ghcid = self.mapping.get(ghcid)
        if new_ghcid:
            print(f"Info: Old GHCID {ghcid} → new GHCID {new_ghcid}")
            return self._lookup_by_current_ghcid(new_ghcid)
        return None  # Not found
Purpose: Existing systems can continue using old GHCIDs during the 6-month compatibility window (until 2026-05-05, see the timeline below).
4.2 Deprecation Timeline
| Date | Action |
|---|---|
| 2025-11-05 | Migration complete, mapping table created |
| 2025-11-05 - 2026-05-05 | Support both old + new GHCIDs (6 months) |
| 2026-02-05 | Send deprecation warnings for old GHCIDs (3 months notice) |
| 2026-05-05 | Remove old GHCID support, mapping table archived |
Communication: Notify users via:
- Documentation updates
- Log warnings when old GHCID used
- Email to API consumers (if applicable)
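One possible way to tie the timeline to the lookup path is sketched below: legacy GHCIDs resolve silently at first, emit a `DeprecationWarning` from 2026-02-05, and stop resolving on 2026-05-05. The function `resolve_ghcid` and its exact behavior are assumptions for illustration; the guide only fixes the dates.

```python
import warnings
from datetime import date
from typing import Dict, Optional

WARNINGS_START = date(2026, 2, 5)  # deprecation warnings begin (timeline above)
SUPPORT_ENDS = date(2026, 5, 5)    # old GHCID support removed (timeline above)


def resolve_ghcid(ghcid: str, mapping: Dict[str, str], today: date) -> Optional[str]:
    """Translate a legacy GHCID to its replacement, honoring the timeline."""
    new_ghcid = mapping.get(ghcid)
    if new_ghcid is None:
        return ghcid  # not a legacy identifier; pass through unchanged
    if today >= SUPPORT_ENDS:
        return None  # assumption: legacy identifiers no longer resolve
    if today >= WARNINGS_START:
        warnings.warn(
            f"GHCID {ghcid} is deprecated; use {new_ghcid} "
            f"(support ends {SUPPORT_ENDS.isoformat()})",
            DeprecationWarning,
            stacklevel=2,
        )
    return new_ghcid


legacy = {"NL-ZH-RTM-M-SM": "NL-ZH-ROT-M-SM"}
print(resolve_ghcid("NL-ZH-RTM-M-SM", legacy, date(2025, 12, 1)))  # NL-ZH-ROT-M-SM
```

Passing `today` explicitly keeps the function easy to test for each phase of the timeline.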
City Code Changes Reference
Common Changes
| City | UN/LOCODE (Old) | GeoNames (New) | Change? |
|---|---|---|---|
| Amsterdam | AMS | AMS | No ✅ |
| Rotterdam | RTM | ROT | Yes ⚠️ |
| Den Haag | HAG | DEN | Yes ⚠️ |
| Utrecht | UTC | UTR | Yes ⚠️ |
| Eindhoven | EIN | EIN | No ✅ |
| Groningen | GRQ | GRO | Yes ⚠️ |
| Tilburg | TIL | TIL | No ✅ |
| Almere | ALM | ALM | No ✅ |
| Breda | BRE | BRE | No ✅ |
| Nijmegen | NIM | NIJ | Yes ⚠️ |
Pattern: A city's code changes when its UN/LOCODE differs from the first 3 letters of the city name.
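The pattern can be checked directly against the table above. This sketch hard-codes the UN/LOCODE values from the table and compares each with the first three letters of the city name; the variable and function names are illustrative.

```python
# UN/LOCODE city codes as listed in the table above.
LOCODE_CITY_CODES = {
    "Amsterdam": "AMS",
    "Rotterdam": "RTM",
    "Den Haag": "HAG",
    "Utrecht": "UTC",
    "Eindhoven": "EIN",
    "Groningen": "GRQ",
    "Tilburg": "TIL",
    "Almere": "ALM",
    "Breda": "BRE",
    "Nijmegen": "NIM",
}


def first_three(city: str) -> str:
    """First three characters of the city name, uppercased."""
    return city[:3].upper()


changed = sorted(
    city for city, locode in LOCODE_CITY_CODES.items()
    if locode != first_three(city)
)
print(changed)
# ['Den Haag', 'Groningen', 'Nijmegen', 'Rotterdam', 'Utrecht']
```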
Newly Covered Cities
Cities not in UN/LOCODE but now in GeoNames:
| City | GeoNames Abbr | New GHCIDs Generated |
|---|---|---|
| Achtkarspelen | ACH | ~2 institutions |
| Almkerk | ALM | ~1 institution |
| Ameland | AME | ~1 institution |
| Bunschoten | BUN | ~3 institutions |
| ... | ... | ... |
Total: up to 212 additional institutions can now generate GHCIDs.
Rollback Plan
If migration causes critical issues:
1. Immediate Rollback (Emergency)
# Restore old code
git checkout tags/pre-geonames-migration
# Restore old data
cp data/migration/ghcids_before_migration.csv data/ghcids_current.csv
# Restart services
systemctl restart glam-extractor
Downtime: <5 minutes
2. Keep GeoNames, Revert GHCIDs
If GeoNames database is fine but GHCIDs need adjustment:
# Restore old GHCIDs from backup
restore_ghcids_from_backup("data/migration/ghcids_before_migration.csv")
# Keep GeoNames database for new institutions
# Only update new institutions, leave existing unchanged
Downtime: <1 hour
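A sketch of what the second rollback option could look like: GHCIDs are restored only for institutions present in the pre-migration backup, while newly covered institutions keep their GeoNames-based identifiers. Records are modeled here as plain dictionaries and `restore_from_backup` is a hypothetical stand-in; the project's own `restore_ghcids_from_backup` may work differently.

```python
import csv
import io


def restore_from_backup(backup_lines, records):
    """Restore pre-migration GHCIDs for records found in the backup CSV."""
    backup = {row["isil_code"]: row for row in csv.DictReader(backup_lines)}
    restored = 0
    for record in records:
        saved = backup.get(record["isil_code"])
        if saved is None:
            continue  # new institution: keep its GeoNames-based GHCID
        record["ghcid_current"] = saved["ghcid_current"]
        record["ghcid_numeric"] = int(saved["ghcid_numeric"])
        restored += 1
    return restored


# Made-up ISIL codes and records, for illustration only.
backup_csv = io.StringIO(
    "isil_code,institution_name,ghcid_current,ghcid_numeric\n"
    "NL-X1,Science Museum,NL-ZH-RTM-M-SM,12345678901234567890\n"
)
records = [
    {"isil_code": "NL-X1", "ghcid_current": "NL-ZH-ROT-M-SM", "ghcid_numeric": 1},
    {"isil_code": "NL-X2", "ghcid_current": "NL-FR-ACH-L-XX", "ghcid_numeric": 2},
]
print(restore_from_backup(backup_csv, records))  # 1
```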
Testing Checklist
Pre-Migration Testing:
- GeoNames database contains all 475 Dutch cities
- City abbreviation algorithm tested (Amsterdam → AMS)
- Province code mapping works (Amsterdam → NH)
- Mapping table created (old GHCID → new GHCID)
Migration Testing:
- All 150+ tests pass
- GHCID coverage increased from 41.8% to >95%
- No duplicate GHCIDs generated
- History entries correctly capture old GHCIDs
Post-Migration Testing:
- Old GHCID lookup works (via mapping table)
- New GHCID lookup works (direct)
- Export formats valid (JSON-LD, RDF, CSV)
- No data loss (all original ISIL records present)
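The "no duplicate GHCIDs" item in the checklist above lends itself to an automated check, since different city names can share the same first three letters (Almere and Almkerk both abbreviate to ALM in this guide). A minimal sketch, with `find_duplicate_ghcids` as a hypothetical helper and made-up identifiers:

```python
from collections import Counter


def find_duplicate_ghcids(ghcids):
    """Return the sorted GHCIDs that occur more than once (None is ignored)."""
    counts = Counter(g for g in ghcids if g)
    return sorted(g for g, n in counts.items() if n > 1)


# Made-up identifiers for illustration.
generated = ["NL-ZH-ROT-M-SM", "NL-NH-AMS-M-RM", "NL-ZH-ROT-M-SM", None]
print(find_duplicate_ghcids(generated))  # ['NL-ZH-ROT-M-SM']
```

An empty result means every generated GHCID is unique; any entries returned would need the collision-resolution rules referenced at the end of this guide.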
Communication Plan
Internal Team
Email template:
Subject: GHCID Migration to GeoNames - Action Required
Team,
On 2025-11-05, we're migrating GHCID city codes from UN/LOCODE to GeoNames.
BENEFITS:
- Coverage increases from 41.8% to >95%
- +212 institutions can now generate GHCIDs
- Enables global expansion to 60+ countries
BREAKING CHANGES:
- Some GHCIDs will change (10-30% of existing)
- Numeric hashes will change (computed from GHCID)
- Mapping table provided for backward compatibility
ACTIONS:
1. Review migration guide: docs/migration/ghcid_locode_to_geonames.md
2. Test with new GHCIDs in staging environment
3. Update any hardcoded GHCID references
4. Plan to migrate to new GHCIDs by 2026-05-05
Questions? Reply to this email or see documentation.
External Users (if applicable)
API changelog:
## Version 2.0.0 - 2025-11-05
### BREAKING CHANGES
- GHCID city codes now use GeoNames abbreviations instead of UN/LOCODE
- Some existing GHCIDs have changed format (see mapping table)
- `ghcid_numeric` field may have new values
### Upgrade Guide
- Download GHCID mapping table: https://example.org/ghcid_mapping.csv
- Update your local GHCID references
- Old GHCIDs supported until 2026-05-05
### New Features
- GHCID coverage increased to >95% for Dutch institutions
- Support for 475+ Dutch cities (previously 50)
- Global expansion enabled for 60+ countries
Success Criteria
Migration is successful if:
- ✅ GHCID coverage >95% (>345/364 ISIL records)
- ✅ All 150+ tests passing
- ✅ Mapping table covers all changed GHCIDs
- ✅ Zero data loss (all ISIL records present)
- ✅ History entries capture old GHCIDs
- ✅ Old GHCID lookup works (6-month compatibility)
- ✅ No duplicate GHCIDs generated
- ✅ GeoNames database <10MB (NL-only)
Post-Migration Tasks
- Monitor for issues (first 2 weeks)
  - Check error logs daily
  - Track old GHCID lookup usage
  - Collect user feedback
- Update documentation (within 1 week)
  - Mark UN/LOCODE approach as deprecated
  - Update all examples to use GeoNames
  - Add migration date to CHANGELOG
- Performance monitoring (first month)
  - Measure city lookup latency (<1ms target)
  - Check database query performance
  - Monitor storage usage
- Quarterly GeoNames updates (ongoing)
  - Download latest GeoNames dump
  - Rebuild database
  - Validate no regressions
Lessons Learned (Post-Migration)
To be filled in after the migration is complete:
What went well:
- TBD
What could be improved:
- TBD
Unexpected issues:
- TBD
Recommendations for future migrations:
- TBD
References
- GeoNames Integration Design: docs/plan/global_glam/08-geonames-integration.md
- GHCID Specification: docs/plan/global_glam/06-global-identifier-system.md
- Collision Resolution: docs/plan/global_glam/07-ghcid-collision-resolution.md
- GeoNames Official: https://www.geonames.org
Last Updated: 2025-11-05
Next Review: After migration complete