
ISO 20275 Migration Quick Reference Card

Version: 1.0.0
Date: 2025-11-21
For: Heritage custodian legal form data migration


🎯 Quick Start (30 seconds)

# Install
pip install pyyaml

# Migrate
python scripts/migrate_legal_form_to_iso20275.py \
    --input data.yaml --output migrated.yaml --country NL

# Review
cat migration_report.txt

🌍 Country Codes

| Code | Country | Guide |
|------|---------|-------|
| NL | Netherlands | elf_codes/netherlands/README.md |
| FR | France | elf_codes/france/README.md |
| DE | Germany | elf_codes/germany/README.md |
| GB | United Kingdom | elf_codes/uk/README.md |
| US | United States | elf_codes/usa/README.md |

📋 Common Mappings

Netherlands (NL)

STICHTING → V44D          # Confidence: 1.0
ASSOCIATION → 33MN        # Confidence: 0.9
GOVERNMENT_AGENCY → A0W7  # Confidence: 0.95

France (FR)

ASSOCIATION → BEWI        # Confidence: 1.0
GOVERNMENT_AGENCY → 5RDO  # Confidence: 1.0
STICHTING → 9T5S          # Confidence: 0.8 (Fondation)

Germany (DE)

STICHTING → V2YH          # Confidence: 1.0 (Stiftung)
ASSOCIATION → QZ3L        # Confidence: 1.0 (e.V.)
GOVERNMENT_AGENCY → SQKS  # Confidence: 1.0 (KdöR)

United Kingdom (GB)

NGO → 9HLU                # Confidence: 0.95 (Charity)
TRUST → FC0R              # Confidence: 1.0
GOVERNMENT_AGENCY → AVYY  # Confidence: 1.0

United States (US)

NGO → QQQ0                # Confidence: 0.95 (501(c)(3))
TRUST → 7TPC              # Confidence: 1.0
GOVERNMENT_AGENCY → W2ES  # Confidence: 1.0
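The mappings are keyed by country as well as by enum value, because the same generic value resolves to a different ELF code in each jurisdiction. A minimal sketch of that lookup as a plain dictionary (the names `LEGAL_FORM_MAPPINGS` and `map_legal_form` are hypothetical, not taken from the migration script):

```python
from typing import Optional, Tuple

# Hypothetical in-memory copy of the mapping table above:
# (country, generic enum value) -> (ISO 20275 ELF code, confidence)
LEGAL_FORM_MAPPINGS = {
    ("NL", "STICHTING"): ("V44D", 1.0),
    ("NL", "ASSOCIATION"): ("33MN", 0.9),
    ("NL", "GOVERNMENT_AGENCY"): ("A0W7", 0.95),
    ("FR", "ASSOCIATION"): ("BEWI", 1.0),
    ("FR", "GOVERNMENT_AGENCY"): ("5RDO", 1.0),
    ("FR", "STICHTING"): ("9T5S", 0.8),
    ("DE", "STICHTING"): ("V2YH", 1.0),
    ("DE", "ASSOCIATION"): ("QZ3L", 1.0),
    ("DE", "GOVERNMENT_AGENCY"): ("SQKS", 1.0),
    ("GB", "NGO"): ("9HLU", 0.95),
    ("GB", "TRUST"): ("FC0R", 1.0),
    ("GB", "GOVERNMENT_AGENCY"): ("AVYY", 1.0),
    ("US", "NGO"): ("QQQ0", 0.95),
    ("US", "TRUST"): ("7TPC", 1.0),
    ("US", "GOVERNMENT_AGENCY"): ("W2ES", 1.0),
}


def map_legal_form(country: str, enum_value: str) -> Optional[Tuple[str, float]]:
    """Return (elf_code, confidence), or None when no mapping exists for that country."""
    return LEGAL_FORM_MAPPINGS.get((country.upper(), enum_value.upper()))


print(map_legal_form("NL", "STICHTING"))  # ('V44D', 1.0)
print(map_legal_form("NL", "NGO"))        # None: no NL mapping listed for NGO
```

Keying on the (country, enum) pair is what lets STICHTING resolve to V44D in NL but V2YH in DE.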

🔍 Confidence Thresholds

| Confidence | Meaning | Action |
|------------|---------|--------|
| 1.0 | Perfect match | Auto-migrate |
| 0.9-0.99 | High confidence | Auto-migrate |
| 0.7-0.89 | Moderate | Auto-migrate (at or above the default threshold of 0.7) |
| 0.5-0.69 | Low confidence | Manual review |
| < 0.5 | Very uncertain | Manual review required |

Change threshold:

--confidence-threshold 0.8  # Only auto-migrate if ≥ 0.8
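The table reduces to a single comparison against the threshold. A sketch, assuming the threshold is inclusive (as the `≥ 0.8` comment above suggests); `decide_action` is a hypothetical name:

```python
DEFAULT_CONFIDENCE_THRESHOLD = 0.7  # default per the table above


def decide_action(confidence: float, threshold: float = DEFAULT_CONFIDENCE_THRESHOLD) -> str:
    """Auto-migrate when confidence >= threshold; otherwise route to manual review."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    return "auto_migrate" if confidence >= threshold else "manual_review"


print(decide_action(0.8))                 # auto_migrate
print(decide_action(0.8, threshold=0.9))  # manual_review
```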

🚦 Migration Status Codes

| Status | Meaning | Next Action |
|--------|---------|-------------|
| migrated | Successfully converted | None (complete) |
| unchanged | Already ISO 20275 | None (complete) |
| manual_review | Requires verification | Check migration report |
| error | Invalid data | Fix data, re-run |
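If you post-process the results, the four status strings fit naturally into an enum, with `manual_review` and `error` as the two states that still need a human. A hypothetical sketch (the class and function names are illustrative; the script's actual output format is not specified on this card):

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class MigrationStatus(Enum):
    MIGRATED = "migrated"            # converted; nothing left to do
    UNCHANGED = "unchanged"          # already ISO 20275; nothing left to do
    MANUAL_REVIEW = "manual_review"  # check the migration report
    ERROR = "error"                  # fix the data and re-run


NEEDS_ACTION = {MigrationStatus.MANUAL_REVIEW, MigrationStatus.ERROR}


def summarize(statuses: Iterable[MigrationStatus]) -> Tuple[Counter, int]:
    """Count records per status and return how many still need a human."""
    counts = Counter(statuses)
    return counts, sum(counts[s] for s in NEEDS_ACTION)


counts, pending = summarize([
    MigrationStatus("migrated"),
    MigrationStatus("migrated"),
    MigrationStatus("unchanged"),
    MigrationStatus("manual_review"),
])
print(counts[MigrationStatus.MIGRATED], pending)  # 2 1
```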

⚠️ Manual Review Triggers

The script flags records for manual review when:

  1. Low confidence (below the threshold, 0.7 by default): ambiguous mapping

    TRUST → V44D  # Confidence: 0.6 in NL context
    
  2. Unknown enum: Not in predefined mappings

    legal_form: PRIVATE_FOUNDATION  # Unknown value
    
  3. Country unknown: No locations[0].country field

    locations: []  # Cannot determine country-specific mapping
    
  4. Invalid existing code: Already ISO 20275 but inactive

    legal_form: ZZZZ  # Code exists but status = INAC
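The four triggers can be checked in order for each record. The sketch below is hypothetical (the function name, the trimmed `MAPPINGS`/`REGISTRY` tables, and the reason strings are illustrations, not the script's API); it returns the first trigger that applies, or `None` when the record can be migrated automatically:

```python
from typing import Dict, Optional, Tuple

# Hypothetical, trimmed inputs for illustration:
# (country, enum) -> (ELF code, confidence), and ELF code -> status.
MAPPINGS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("NL", "STICHTING"): ("V44D", 1.0),
    ("NL", "TRUST"): ("V44D", 0.6),
}
REGISTRY: Dict[str, str] = {"V44D": "ACTV", "ZZZZ": "INAC"}  # ZZZZ: placeholder


def review_reason(record: dict, threshold: float = 0.7) -> Optional[str]:
    """Return the first manual-review trigger that applies, or None if the record is fine."""
    legal_form = record.get("legal_form", "")
    # Trigger 4: the value is already an ELF code, but an inactive one.
    if legal_form in REGISTRY:
        return "inactive_code" if REGISTRY[legal_form] == "INAC" else None
    # Trigger 3: without a country there is no country-specific mapping.
    locations = record.get("locations") or []
    country = locations[0].get("country") if locations else None
    if not country:
        return "unknown_country"
    # Trigger 2: the enum value has no predefined mapping for this country.
    mapping = MAPPINGS.get((country, legal_form))
    if mapping is None:
        return "unknown_enum"
    # Trigger 1: a mapping exists but falls below the confidence threshold.
    return "low_confidence" if mapping[1] < threshold else None


print(review_reason({"legal_form": "TRUST", "locations": [{"country": "NL"}]}))  # low_confidence
print(review_reason({"legal_form": "STICHTING", "locations": []}))               # unknown_country
```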
    

📝 Example Migration

Before (generic enum):

id: https://w3id.org/heritage/org/rijksmuseum
legal_name: Stichting Rijksmuseum
legal_form: STICHTING
locations:
  - city: Amsterdam
    country: NL
provenance:
  data_source: CSV_REGISTRY
  notes: Extracted from Dutch ISIL registry

After (ISO 20275):

id: https://w3id.org/heritage/org/rijksmuseum
legal_name: Stichting Rijksmuseum
legal_form: V44D  # ← Migrated!
locations:
  - city: Amsterdam
    country: NL
provenance:
  data_source: CSV_REGISTRY
  notes: |
    Extracted from Dutch ISIL registry
    
    [MIGRATION 2025-11-21T14:30:00Z] legal_form migrated: 'STICHTING' → 'V44D' (ISO 20275). 
    Country: NL. Confidence: 1.0. Mapped to Stichting (Stichting)    
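The transformation touches only two fields: `legal_form` is replaced, and a timestamped note is appended to `provenance.notes` while the original notes are kept. A minimal sketch on plain dicts (YAML loading omitted; `apply_migration` and its parameters are hypothetical, and the note text only imitates the format shown above):

```python
from datetime import datetime, timezone


def apply_migration(record: dict, new_code: str, country: str,
                    confidence: float, form_name: str) -> dict:
    """Return a copy of record with legal_form replaced and a migration note appended."""
    old_code = record["legal_form"]
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    note = (
        f"[MIGRATION {timestamp}] legal_form migrated: '{old_code}' → '{new_code}' "
        f"(ISO 20275). Country: {country}. Confidence: {confidence}. Mapped to {form_name}"
    )
    migrated = dict(record)  # shallow copy; the input record is left untouched
    migrated["legal_form"] = new_code
    provenance = dict(record.get("provenance", {}))
    existing = provenance.get("notes", "")
    # Keep the original notes and append the migration note after a blank line.
    provenance["notes"] = f"{existing}\n\n{note}" if existing else note
    migrated["provenance"] = provenance
    return migrated


before = {
    "legal_name": "Stichting Rijksmuseum",
    "legal_form": "STICHTING",
    "locations": [{"city": "Amsterdam", "country": "NL"}],
    "provenance": {"data_source": "CSV_REGISTRY",
                   "notes": "Extracted from Dutch ISIL registry"},
}
after = apply_migration(before, "V44D", "NL", 1.0, "Stichting")
print(after["legal_form"])  # V44D
print(after["provenance"]["notes"])
```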

🛠️ Common Commands

Dry Run (Preview Changes)

python scripts/migrate_legal_form_to_iso20275.py \
    --input data.yaml --output /dev/null --dry-run

Single File

python scripts/migrate_legal_form_to_iso20275.py \
    --input data.yaml --output migrated.yaml --country NL

Batch Directory

python scripts/migrate_legal_form_to_iso20275.py \
    --input-dir data/ --output-dir migrated/

Report Only

python scripts/migrate_legal_form_to_iso20275.py \
    --input data.yaml --output migrated.yaml --report-only

Custom Confidence Threshold

python scripts/migrate_legal_form_to_iso20275.py \
    --input data.yaml --output migrated.yaml \
    --confidence-threshold 0.8
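How `--input-dir` and `--output-dir` pair files is not spelled out on this card. One plausible scheme, shown as a hypothetical sketch rather than the script's code, mirrors each `*.yaml`/`*.yml` file under the input directory to the same relative path under the output directory:

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def plan_batch(input_dir: Path, output_dir: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (source, destination) pairs, mirroring input_dir's layout under output_dir."""
    for pattern in ("*.yaml", "*.yml"):
        for src in sorted(input_dir.rglob(pattern)):
            yield src, output_dir / src.relative_to(input_dir)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "nl").mkdir(parents=True)
    (root / "data" / "nl" / "rijksmuseum.yaml").write_text("legal_form: STICHTING\n")
    (root / "data" / "README.md").write_text("not a record\n")
    pairs = [
        (src.relative_to(root).as_posix(), dst.relative_to(root).as_posix())
        for src, dst in plan_batch(root / "data", root / "migrated")
    ]
print(pairs)  # [('data/nl/rijksmuseum.yaml', 'migrated/nl/rijksmuseum.yaml')]
```

A writer would then call `dst.parent.mkdir(parents=True, exist_ok=True)` before saving each migrated file, so nested subdirectories are recreated.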

🧪 Testing

Run Unit Tests

pytest tests/test_legal_form_migration.py -v

Test with Example Data

python scripts/migrate_legal_form_to_iso20275.py \
    --input schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml \
    --output /tmp/test_migrated.yaml \
    --country NL --dry-run

🔧 Troubleshooting

"ELF codes file not found"

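Download the ISO 20275 ELF code list CSV (v1.5, 2023-09-28) from the GLEIF code-list page and save it as data/ontology/2023-09-28-elf-code-list-v1.5.csv:

https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list

This URL is the landing page, not the CSV itself, so running `wget -O` on it saves HTML under a .csv name. Use the CSV download link on that page instead.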
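Once the file is in place, it has to be turned into a code lookup. A sketch with the standard `csv` module; the column headers are an assumption about the GLEIF file's layout and should be checked against its header row, and `load_elf_registry` is a hypothetical name:

```python
import csv
import io
from typing import Dict, TextIO

# ASSUMPTION: header names as they appear in GLEIF's ELF code list CSV.
# Verify them against the first row of the file you downloaded.
CODE_COL = "ELF Code"
COUNTRY_COL = "Country Code (ISO 3166-1)"
NAME_COL = "Entity Legal Form name Local name"
STATUS_COL = "ELF Status ACTV/INAC"


def load_elf_registry(handle: TextIO) -> Dict[str, dict]:
    """Index ELF rows by code; if a code appears on several rows, keep the first."""
    registry: Dict[str, dict] = {}
    for row in csv.DictReader(handle):
        registry.setdefault(row[CODE_COL].strip(), {
            "country": row[COUNTRY_COL].strip(),
            "name": row[NAME_COL].strip(),
            "status": row[STATUS_COL].strip(),
        })
    return registry


# Toy in-memory sample; ZZZZ is the placeholder inactive code from the Manual Review Triggers list.
sample = io.StringIO(
    "ELF Code,Country Code (ISO 3166-1),Entity Legal Form name Local name,ELF Status ACTV/INAC\n"
    "V44D,NL,Stichting,ACTV\n"
    "ZZZZ,NL,Placeholder form,INAC\n"
)
registry = load_elf_registry(sample)
print(registry["V44D"])  # {'country': 'NL', 'name': 'Stichting', 'status': 'ACTV'}
```

For the real file, open it with `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` codec strips a leading byte-order mark if one is present, so it does not end up in the first header name.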

"Code 'XXXX' not found in registry"

  1. Check for typos (codes are case-sensitive)
  2. Verify code exists in ISO 20275 CSV
  3. Check if code is inactive (INAC status)
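These three checks can be run in one pass. A hypothetical helper (not part of the script), assuming the registry has been loaded as a code → status dictionary; it uses `difflib` from the standard library to suggest the nearest code for a likely typo:

```python
import difflib
from typing import Dict


def diagnose_code(code: str, registry: Dict[str, str]) -> str:
    """Explain why a legal_form code failed lookup. registry maps ELF code -> status."""
    if code in registry:
        return "inactive (INAC)" if registry[code] == "INAC" else "ok"
    # Codes are case-sensitive, so a lowercase entry will not match.
    if code.upper() in registry:
        return f"case mismatch: did you mean '{code.upper()}'?"
    # A single wrong character in a 4-character code gives a similarity ratio of 0.75.
    close = difflib.get_close_matches(code.upper(), registry.keys(), n=1, cutoff=0.75)
    if close:
        return f"not found: closest code is '{close[0]}'"
    return "not found in ISO 20275 registry"


registry = {"V44D": "ACTV", "33MN": "ACTV", "ZZZZ": "INAC"}
print(diagnose_code("v44d", registry))  # case mismatch: did you mean 'V44D'?
print(diagnose_code("V44B", registry))  # not found: closest code is 'V44D'
```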

"Low confidence mapping"

  1. Review migration report
  2. Consult country guide (elf_codes/{country}/README.md)
  3. Verify with legal documents
  4. Update manually if needed

📚 Full Documentation

  • Migration Guide: schemas/20251121/MIGRATION_GUIDE.md
  • Country Guides: schemas/20251121/elf_codes/{country}/README.md
  • Script Source: scripts/migrate_legal_form_to_iso20275.py
  • Unit Tests: tests/test_legal_form_migration.py

Validation Checklist

After migration, verify:

  • All legal_form values match pattern ^[A-Z0-9]{4}$
  • All codes exist in ISO 20275 registry
  • All codes have status ACTV (not INAC)
  • Migration metadata added to provenance.notes
  • Manual review items addressed
  • Schema validation passes (linkml-validate)
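The first four checks are mechanical and can be scripted; manual-review follow-up and `linkml-validate` stay separate steps. A sketch assuming the registry is a code → status dict and that migration notes start with `[MIGRATION` as in the example above (`validate_record` is a hypothetical name):

```python
import re
from typing import Dict, List


def validate_record(record: dict, registry: Dict[str, str],
                    require_migration_note: bool = True) -> List[str]:
    """Return checklist problems for one record; registry maps ELF code -> status."""
    problems: List[str] = []
    code = str(record.get("legal_form") or "")
    # fullmatch (not match with ^...$) so a trailing newline cannot slip through.
    if not re.fullmatch(r"[A-Z0-9]{4}", code):
        problems.append(f"legal_form {code!r} is not a 4-character ELF code")
    elif code not in registry:
        problems.append(f"legal_form {code!r} not in ISO 20275 registry")
    elif registry[code] != "ACTV":
        problems.append(f"legal_form {code!r} has status {registry[code]}, expected ACTV")
    notes = (record.get("provenance") or {}).get("notes") or ""
    # Records left 'unchanged' (already ISO 20275) may legitimately have no note.
    if require_migration_note and "[MIGRATION" not in notes:
        problems.append("no migration metadata in provenance.notes")
    return problems


registry = {"V44D": "ACTV", "ZZZZ": "INAC"}
print(validate_record({"legal_form": "STICHTING"}, registry))
```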

🎯 Critical Distinctions

Operational name vs Legal name vs Legal form:

# Three SEPARATE concepts!

# 1. Operational name (OrganizationName.standardized_name)
standardized_name: "Rijksmuseum"  # Emic, daily operations

# 2. Legal registered name (org:legalName)
legal_name: "Stichting Rijksmuseum"  # Legal documents, KvK registry

# 3. Legal form code (org:classification)
legal_form: "V44D"  # ISO 20275 code (NOT a name!)

These can differ significantly:

  • Getty Museum (operational) vs. J. Paul Getty Trust (legal)
  • British Museum (operational) vs. The Trustees of the British Museum (legal)
  • Rijksmuseum (operational) vs. Stichting Rijksmuseum (legal)
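One way to keep the three apart in code is to give each its own field and reject obvious mix-ups at construction time. A hypothetical sketch (the class name is illustrative; the field names follow the YAML above):

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OrganizationNames:
    """Keep the three concepts in separate fields; none is derived from another."""
    standardized_name: str  # operational name, e.g. "Rijksmuseum"
    legal_name: str         # registered legal name, e.g. "Stichting Rijksmuseum"
    legal_form: str         # ISO 20275 ELF code, e.g. "V44D" (a code, not a name)

    def __post_init__(self) -> None:
        # Catch the common mistake of putting a name or generic enum into legal_form.
        if not re.fullmatch(r"[A-Z0-9]{4}", self.legal_form):
            raise ValueError(f"legal_form must be a 4-character ELF code, got {self.legal_form!r}")


rijks = OrganizationNames("Rijksmuseum", "Stichting Rijksmuseum", "V44D")

try:
    OrganizationNames("Rijksmuseum", "Stichting Rijksmuseum", "Stichting")
    rejected = False
except ValueError:
    rejected = True  # a legal name in legal_form is refused
```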

Print this card: schemas/20251121/MIGRATION_QUICK_REFERENCE.md
Last Updated: 2025-11-21
Version: 1.0.0