glam/schemas/20251121/MIGRATION_GUIDE.md
2025-11-21 22:12:33 +01:00

Legal Form Migration Guide: Generic Enums → ISO 20275

Purpose: Migrate heritage institution data from generic legal form enumerations to ISO 20275 Entity Legal Forms (ELF) codes.

Status: Migration script ready for testing
Date: 2025-11-21
Version: 1.0.0


Quick Start

1. Install Dependencies

pip install pyyaml

2. Run Migration Script

# Single file (Dutch institutions)
python scripts/migrate_legal_form_to_iso20275.py \
    --input data/nde/institutions.yaml \
    --output data/nde/institutions_migrated.yaml \
    --country NL

# Entire directory (auto-detect country)
python scripts/migrate_legal_form_to_iso20275.py \
    --input-dir data/nde/ \
    --output-dir data/nde_migrated/

# Dry run (preview changes)
python scripts/migrate_legal_form_to_iso20275.py \
    --input data/nde/institutions.yaml \
    --output /dev/null \
    --dry-run

3. Review Migration Report

The script writes a detailed report to migration_report.txt by default (override with --report-path):

Migration Report
================
Total records processed: 1351
Successfully migrated: 1200
Unchanged (already ISO 20275): 50
Requiring manual review: 95
Errors: 6

Success rate: 88.8%
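The success rate in this sample report appears to be the share of processed records that were migrated automatically (1200 of 1351), with unchanged records left out of the numerator. A minimal sketch of that arithmetic, assuming this reading is correct; the function name is illustrative and not part of the script:

```python
def success_rate(migrated: int, total: int) -> float:
    """Percentage of processed records that were migrated automatically."""
    if total == 0:
        return 0.0
    return 100.0 * migrated / total


# Counts copied from the sample report above.
counts = {"migrated": 1200, "unchanged": 50, "manual_review": 95, "errors": 6}
total = sum(counts.values())  # 1351 records processed
print(f"Success rate: {success_rate(counts['migrated'], total):.1f}%")
```

The four categories add up to the reported total, which is a quick sanity check worth repeating on a real report.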

Migration Mappings

Netherlands (NL)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | V44D | Stichting | 1.0 |
| ASSOCIATION | 33MN | Vereniging met volledige rechtsbevoegdheid | 0.9 |
| NGO | 33MN | Vereniging | 0.7 |
| GOVERNMENT_AGENCY | A0W7 | Publiekrechtelijke rechtspersoon | 0.95 |
| LIMITED_COMPANY | 54M6 | Besloten vennootschap (BV) | 0.85 |
| COOPERATIVE | NFFH | Coöperatie | 1.0 |
| TRUST | V44D | Stichting | 0.6 |

France (FR)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | 9T5S | Fondation | 0.8 |
| ASSOCIATION | BEWI | Association déclarée | 1.0 |
| NGO | BEWI | Association | 0.9 |
| GOVERNMENT_AGENCY | 5RDO | Établissement public | 1.0 |
| LIMITED_COMPANY | KMPN | SARL | 0.9 |
| COOPERATIVE | 6HB6 | Société coopérative (SCOP) | 1.0 |

Germany (DE)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | V2YH | Stiftung | 1.0 |
| ASSOCIATION | QZ3L | Eingetragener Verein (e.V.) | 1.0 |
| NGO | QZ3L | e.V. | 0.9 |
| GOVERNMENT_AGENCY | SQKS | Körperschaft des öffentlichen Rechts | 1.0 |
| LIMITED_COMPANY | XLWA | GmbH | 0.9 |
| COOPERATIVE | XAEA | Eingetragene Genossenschaft (eG) | 1.0 |

United Kingdom (GB/UK)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | FC0R | Trust | 0.8 |
| ASSOCIATION | 9HLU | Charity | 0.9 |
| NGO | 9HLU | Charity | 0.95 |
| GOVERNMENT_AGENCY | AVYY | Public corporation | 1.0 |
| LIMITED_COMPANY | CBL2 | Private company limited by shares (Ltd) | 0.9 |
| COOPERATIVE | 83XL | Co-operative Society | 1.0 |
| TRUST | FC0R | Trust | 1.0 |

United States (US)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | QQQ0 | 501(c)(3) Nonprofit Organization | 0.8 |
| ASSOCIATION | QQQ0 | 501(c)(3) | 0.9 |
| NGO | QQQ0 | 501(c)(3) | 0.95 |
| GOVERNMENT_AGENCY | W2ES | Government Entity | 1.0 |
| LIMITED_COMPANY | CNQ3 | Business Corporation | 0.8 |
| COOPERATIVE | S63E | Cooperative | 1.0 |
| TRUST | 7TPC | Trust | 1.0 |
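One way to picture the mapping tables is as a lookup keyed by country and old enum value. The sketch below is illustrative only: the names `Mapping`, `MAPPINGS`, `COUNTRY_ALIASES` and `lookup` are hypothetical, the data is a small excerpt of the tables, and treating UK as an alias of GB is an assumption based on the "GB/UK" heading. The actual script may store its mappings differently.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    elf_code: str
    local_name: str
    confidence: float


# Excerpt of the mapping tables, keyed by (country, old enum value).
MAPPINGS = {
    ("NL", "STICHTING"): Mapping("V44D", "Stichting", 1.0),
    ("NL", "TRUST"): Mapping("V44D", "Stichting", 0.6),
    ("FR", "ASSOCIATION"): Mapping("BEWI", "Association déclarée", 1.0),
    ("DE", "LIMITED_COMPANY"): Mapping("XLWA", "GmbH", 0.9),
    ("GB", "TRUST"): Mapping("FC0R", "Trust", 1.0),
}

# Assumption: "UK" is accepted as an alias for the ISO 3166-1 code "GB".
COUNTRY_ALIASES = {"UK": "GB"}


def lookup(country: str, old_enum: str) -> Optional[Mapping]:
    """Return the mapping for a country/enum pair, or None if none is defined."""
    country = country.upper()
    country = COUNTRY_ALIASES.get(country, country)
    return MAPPINGS.get((country, old_enum.upper()))


print(lookup("NL", "STICHTING"))
print(lookup("NL", "PRIVATE_FOUNDATION"))  # no predefined mapping
```

Keying on the pair makes it explicit that the same generic enum (for example TRUST) resolves to different codes, with different confidence, depending on the country.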

Understanding Confidence Scores

Confidence Range: 0.0 (uncertain) to 1.0 (certain)

Thresholds:

  • 1.0: Exact one-to-one mapping (e.g., STICHTING → V44D in Netherlands)
  • 0.9-0.99: High confidence, common pattern (e.g., ASSOCIATION → BEWI in France)
  • 0.7-0.89: Moderate confidence, generally correct but verify (e.g., NGO → 33MN in Netherlands)
  • 0.5-0.69: Low confidence, requires manual review (e.g., TRUST → V44D in Netherlands)
  • < 0.5: Very uncertain, manual mapping required

Default threshold: 0.7 (automatically migrate only if confidence ≥ 0.7)

Change threshold:

python scripts/migrate_legal_form_to_iso20275.py \
    --confidence-threshold 0.8 \
    ...
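The bands and the threshold rule described above can be expressed in a few lines. This is a sketch of the documented rule, not the script's code; `confidence_band` and `should_auto_migrate` are hypothetical names.

```python
def confidence_band(score: float) -> str:
    """Classify a confidence score into the bands listed in this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be within 0.0-1.0, got {score}")
    if score == 1.0:
        return "exact"
    if score >= 0.9:
        return "high"
    if score >= 0.7:
        return "moderate"
    if score >= 0.5:
        return "low"
    return "very uncertain"


def should_auto_migrate(score: float, threshold: float = 0.7) -> bool:
    """A record is migrated automatically only if score >= threshold."""
    return score >= threshold


# NGO -> 33MN (NL) has confidence 0.7: migrated at the default threshold,
# but held back for review when the threshold is raised to 0.8.
print(confidence_band(0.7), should_auto_migrate(0.7), should_auto_migrate(0.7, 0.8))
```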

Manual Review Cases

The script flags records for manual review in these situations:

1. Low Confidence Mappings

Example: TRUST in Netherlands context

  • Generic enum TRUST could map to V44D (stichting) or FC0R (trust)
  • Confidence: 0.6 (below default threshold)
  • Action: Verify legal documents, check KvK registry

2. Unknown Enum Values

Example: PRIVATE_FOUNDATION (not in standard mappings)

  • No predefined mapping available
  • Action: Consult ISO 20275 CSV file, check country-specific guide

3. Country Unknown

Example: Record without locations field

  • Cannot determine country-specific mapping
  • Uses default fallback mapping
  • Action: Add country code, re-run migration

4. Ambiguous Cases

Example: NGO could be various legal forms

  • France: BEWI (association) vs 9T5S (fondation)
  • US: QQQ0 (501(c)(3)) vs 7TPC (trust)
  • Action: Check organizational charter, verify legal form
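The first three situations can be detected mechanically, which the following sketch illustrates. It is an assumption about how such a check could look, with a hypothetical function name and signature; the fourth situation (ambiguous cases) depends on human judgment and is not modeled here.

```python
from typing import Optional


def review_reason(
    country: Optional[str],
    mapping_confidence: Optional[float],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return why a record needs manual review, or None if it can be migrated.

    mapping_confidence is None when the old enum value has no predefined mapping.
    """
    if mapping_confidence is None:
        return "unknown enum value"
    if country is None:
        return "country unknown"
    if mapping_confidence < threshold:
        return "low confidence"
    return None


print(review_reason("NL", 0.6))   # TRUST in the Netherlands
print(review_reason(None, 0.9))   # record without a locations field
print(review_reason("NL", None))  # e.g. PRIVATE_FOUNDATION
print(review_reason("NL", 1.0))   # no review needed
```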

Provenance Tracking

The script automatically adds migration metadata to provenance.notes:

provenance:
  data_source: CSV_REGISTRY
  notes: |
    Extracted from Dutch ISIL registry
    
    [MIGRATION 2025-11-21T14:30:00Z] legal_form migrated: 'STICHTING' → 'V44D' (ISO 20275). 
    Country: NL. Confidence: 1.0. Mapped to Stichting (Stichting)    

Timestamp: ISO 8601 format with timezone
Format: [MIGRATION <timestamp>] legal_form migrated: '<old>' → '<new>' (ISO 20275). Country: <country>. Confidence: <score>. <notes>
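For anyone who needs to append a note by hand after a manual correction, the format above can be reproduced as follows. The function `migration_note` is a hypothetical helper written for this guide, not a function exported by the script.

```python
from datetime import datetime, timezone
from typing import Optional


def migration_note(
    old: str,
    new: str,
    country: str,
    confidence: float,
    notes: str,
    when: Optional[datetime] = None,
) -> str:
    """Build a provenance note in the documented [MIGRATION ...] format."""
    when = when or datetime.now(timezone.utc)
    # Normalise to UTC and render as ISO 8601 with a trailing "Z".
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"[MIGRATION {stamp}] legal_form migrated: '{old}' → '{new}' (ISO 20275). "
        f"Country: {country}. Confidence: {confidence}. {notes}"
    )


print(
    migration_note(
        "STICHTING", "V44D", "NL", 1.0, "Mapped to Stichting (Stichting)",
        when=datetime(2025, 11, 21, 14, 30, tzinfo=timezone.utc),
    )
)
```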


Validation After Migration

1. Schema Validation

# Install LinkML CLI
pip install linkml

# Validate migrated data
linkml-validate -s schemas/20251121/linkml/02_organization_observation_reconstruction.yaml \
    data/nde_migrated/institutions.yaml

2. Manual Spot Checks

Review 10-20 random records:

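# Extract a random sample of 20 records. Sampling must happen at record level:
# shuffling lines with shuf would produce invalid YAML. Assumes a top-level list of records.
python -c "import random, sys, yaml; recs = yaml.safe_load(open(sys.argv[1], encoding='utf-8')); yaml.safe_dump(random.sample(recs, min(20, len(recs))), sys.stdout, allow_unicode=True, sort_keys=False)" \
    data/nde_migrated/institutions.yaml > sample_review.yaml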

# Check:
# - legal_form matches expected ISO 20275 code
# - legal_form matches country (e.g., V44D for NL institutions)
# - Provenance notes document migration

3. Cross-Reference with ISO 20275 Registry

# Collect every four-character legal_form code in the migrated data (quoted or unquoted)
grep -oE 'legal_form: *"?[A-Z0-9]{4}\b' data/nde_migrated/institutions.yaml | grep -oE '[A-Z0-9]{4}$' | sort -u > used_codes.txt

# Compare with ISO 20275 CSV
cut -d',' -f1 data/ontology/2023-09-28-elf-code-list-v1.5.csv | sort -u > valid_codes.txt

comm -23 used_codes.txt valid_codes.txt  # Should be empty (all codes valid)
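The same cross-reference can be done in Python, which avoids depending on shell quoting and sort order. This is a sketch under two assumptions: the ELF code is in the first CSV column (as the `cut -f1` call implies) and the first CSV row is a header. All function names are hypothetical.

```python
import csv
import io
import re

# Matches legal_form, old_legal_form and new_legal_form values that are
# exactly four characters long; unmigrated enums such as STICHTING are skipped.
CODE_RE = re.compile(r'legal_form:\s*"?([A-Z0-9]{4})\b')


def used_codes(yaml_text: str) -> set:
    """Collect four-character codes that appear as legal_form values."""
    return set(CODE_RE.findall(yaml_text))


def valid_codes(csv_text: str) -> set:
    """Read the first column of the ELF code list, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # header row (assumed)
    return {row[0].strip() for row in reader if row}


def unknown_codes(yaml_text: str, csv_text: str) -> list:
    """Codes used in the data but absent from the registry; should be empty."""
    return sorted(used_codes(yaml_text) - valid_codes(csv_text))
```

In practice both texts would be read from the migrated YAML file and the ELF CSV file; an empty result corresponds to the empty `comm -23` output.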

Edge Cases and Solutions

Case 1: Legal Form Changed Over Time

Problem: Institution changed legal form over time

legal_form: STICHTING  # Founded as foundation
# ... but became government agency in 2010

Solution: Use ChangeEvent to track legal form changes

legal_form: A0W7  # Current form (government agency)
change_history:
  - change_type: LEGAL_CHANGE
    event_date: "2010-01-01"
    event_description: "Converted from private stichting to public entity"
    old_legal_form: V44D
    new_legal_form: A0W7
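With the change recorded this way, the legal form that applied on any given date can be derived from the history. The sketch below assumes records are plain dictionaries shaped like the YAML above and that event_date is an ISO 8601 date string; `legal_form_on` is a hypothetical helper.

```python
def legal_form_on(record: dict, date: str) -> str:
    """Return the ELF code that applied to the record on an ISO date (YYYY-MM-DD)."""
    events = sorted(
        (e for e in record.get("change_history", [])
         if e.get("change_type") == "LEGAL_CHANGE"),
        key=lambda e: e["event_date"],
    )
    if not events:
        return record["legal_form"]
    # Before the first recorded change, the form is that change's old value.
    form = events[0]["old_legal_form"]
    for event in events:
        # ISO dates compare correctly as strings.
        if event["event_date"] <= date:
            form = event["new_legal_form"]
    return form


record = {
    "legal_form": "A0W7",
    "change_history": [
        {
            "change_type": "LEGAL_CHANGE",
            "event_date": "2010-01-01",
            "old_legal_form": "V44D",
            "new_legal_form": "A0W7",
        }
    ],
}
print(legal_form_on(record, "2005-06-01"))  # before the conversion
print(legal_form_on(record, "2015-06-01"))  # after the conversion
```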

Case 2: Foreign Entities Operating in Country

Problem: French association operating branch in Netherlands

legal_form: ASSOCIATION
locations:
  - city: Amsterdam
    country: NL

Solution: Use parent country, not branch location

legal_form: BEWI  # French association (parent entity)
locations:
  - city: Paris
    country: FR
    is_headquarters: true
  - city: Amsterdam
    country: NL
    is_branch: true
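The rule "use the parent country, not the branch location" could be applied when choosing which country's mapping table to consult. The following is a sketch of one possible selection rule, assuming the is_headquarters and is_branch flags shown above; `mapping_country` is a hypothetical name and the script's own country detection may work differently.

```python
from typing import Optional


def mapping_country(record: dict) -> Optional[str]:
    """Pick the country whose mapping table applies to the record."""
    locations = record.get("locations") or []
    # 1. Prefer the location flagged as headquarters.
    for loc in locations:
        if loc.get("is_headquarters") and loc.get("country"):
            return loc["country"]
    # 2. Otherwise use the first location that is not flagged as a branch.
    for loc in locations:
        if not loc.get("is_branch") and loc.get("country"):
            return loc["country"]
    # 3. Only branches (or nothing) known: leave the decision to manual review.
    return None


record = {
    "legal_form": "ASSOCIATION",
    "locations": [
        {"city": "Amsterdam", "country": "NL", "is_branch": True},
        {"city": "Paris", "country": "FR", "is_headquarters": True},
    ],
}
print(mapping_country(record))
```

Note that a record listing only the Amsterdam location without any flags would still resolve to NL, which is why adding the headquarters entry before migrating matters.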

Case 3: Parent Organizations and Subsidiaries

Problem: Holding company with multiple subsidiaries

legal_name: Museum Group Holding
legal_form: ???  # Parent is BV, subsidiary is stichting

Solution: Create separate records, link via parent_organization

# Parent
- id: https://w3id.org/heritage/org/museum-holding
  legal_name: Museum Group Holding BV
  legal_form: 54M6  # Dutch BV
  
# Subsidiary
- id: https://w3id.org/heritage/org/museum-foundation
  legal_name: Stichting Museum X
  legal_form: V44D  # Dutch stichting
  parent_organization: https://w3id.org/heritage/org/museum-holding
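After splitting a holding structure into separate records, it can be worth checking that every parent_organization link points at a record that exists in the same dataset. A minimal sketch, assuming records are dictionaries with the id and parent_organization fields shown above; `dangling_parents` is a hypothetical helper.

```python
def dangling_parents(records: list) -> list:
    """Return ids of records whose parent_organization is not a known record id."""
    known_ids = {r["id"] for r in records}
    return [
        r["id"]
        for r in records
        if r.get("parent_organization") and r["parent_organization"] not in known_ids
    ]


records = [
    {
        "id": "https://w3id.org/heritage/org/museum-holding",
        "legal_name": "Museum Group Holding BV",
        "legal_form": "54M6",
    },
    {
        "id": "https://w3id.org/heritage/org/museum-foundation",
        "legal_name": "Stichting Museum X",
        "legal_form": "V44D",
        "parent_organization": "https://w3id.org/heritage/org/museum-holding",
    },
]
print(dangling_parents(records))      # both records present
print(dangling_parents(records[1:]))  # parent record missing
```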

Troubleshooting

Error: "ELF codes file not found"

Cause: Missing ISO 20275 CSV file

Solution:

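# Download from GLEIF (if this URL returns an HTML page instead of the CSV,
# download the file manually from that page and save it to the path below)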
wget -O data/ontology/2023-09-28-elf-code-list-v1.5.csv \
    https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list/download-the-code-list

# Or specify custom path
python scripts/migrate_legal_form_to_iso20275.py \
    --elf-codes /path/to/elf-codes.csv \
    ...

Error: "Code 'XXXX' not found in ISO 20275 registry"

Cause: Invalid or non-existent ELF code

Solutions:

  1. Check for typos (codes are case-sensitive)
  2. Verify code is in latest ISO 20275 version
  3. Check if code is inactive (ELF Status ACTV/INAC column)
  4. Use country-specific guide to find correct code

Warning: "Low confidence mapping"

This is not an error: the script has flagged the record for manual review instead of migrating it automatically.

Actions:

  1. Review migration report for flagged records
  2. Consult country-specific guide (elf_codes/{country}/README.md)
  3. Verify with legal documents (statutes, KvK registry)
  4. Update record manually if needed

Country-Specific Guides

For detailed legal form mappings:

  • Netherlands: /schemas/20251121/elf_codes/netherlands/README.md (30+ codes)
  • France: /schemas/20251121/elf_codes/france/README.md (240+ codes)
  • Germany: /schemas/20251121/elf_codes/germany/README.md (30+ codes)
  • United Kingdom: /schemas/20251121/elf_codes/uk/README.md (40+ codes)
  • United States: /schemas/20251121/elf_codes/usa/README.md (732+ codes)

Testing the Migration

Run unit tests:

pytest tests/test_legal_form_migration.py -v

Test coverage:

  • ELF code validation
  • Country-specific mappings
  • Confidence scoring
  • Manual review flagging
  • Provenance tracking
  • Edge case handling

Script Options Reference

usage: migrate_legal_form_to_iso20275.py [-h] [--input INPUT] [--output OUTPUT]
                                          [--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
                                          [--country COUNTRY] [--elf-codes ELF_CODES]
                                          [--confidence-threshold CONFIDENCE_THRESHOLD]
                                          [--dry-run] [--report-only]
                                          [--report-path REPORT_PATH]

options:
  --input INPUT              Input YAML file
  --output OUTPUT            Output YAML file
  --input-dir INPUT_DIR      Input directory (batch mode)
  --output-dir OUTPUT_DIR    Output directory (batch mode)
  --country COUNTRY          Default country code (ISO 3166-1, e.g., NL, FR, DE)
  --elf-codes ELF_CODES      Path to ISO 20275 CSV file
  --confidence-threshold     Minimum confidence for auto-migration (0.0-1.0, default: 0.7)
  --dry-run                  Preview changes without writing files
  --report-only              Generate report only, no file output
  --report-path              Path to save migration report (default: migration_report.txt)
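For readers who want to wrap or test the command line, the documented options map naturally onto an argparse definition. The sketch below mirrors only the flags, defaults and help texts listed above; it is a reconstruction for illustration, and the script's actual parser may differ (for example in validation or mutually exclusive groups).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the options documented in this guide."""
    parser = argparse.ArgumentParser(prog="migrate_legal_form_to_iso20275.py")
    parser.add_argument("--input", help="Input YAML file")
    parser.add_argument("--output", help="Output YAML file")
    parser.add_argument("--input-dir", help="Input directory (batch mode)")
    parser.add_argument("--output-dir", help="Output directory (batch mode)")
    parser.add_argument("--country", help="Default country code (ISO 3166-1, e.g., NL, FR, DE)")
    parser.add_argument("--elf-codes", help="Path to ISO 20275 CSV file")
    parser.add_argument(
        "--confidence-threshold",
        type=float,
        default=0.7,
        help="Minimum confidence for auto-migration (0.0-1.0, default: 0.7)",
    )
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview changes without writing files")
    parser.add_argument("--report-only", action="store_true",
                        help="Generate report only, no file output")
    parser.add_argument("--report-path", default="migration_report.txt",
                        help="Path to save migration report")
    return parser


args = build_parser().parse_args(["--input", "data/nde/institutions.yaml", "--dry-run"])
print(args.confidence_threshold, args.dry_run, args.report_path)
```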

Next Steps After Migration

  1. Validate migrated data

    • Run LinkML schema validation
    • Spot check 10-20 records manually
    • Cross-reference with ISO 20275 registry
  2. Regenerate RDF files

    • Convert migrated YAML to RDF
    • Validate triples with ontology alignment
    • Update RDF_GENERATION_SUMMARY.md
  3. Update documentation

    • Document migration statistics
    • Note any manual corrections made
    • Update schema change log
  4. Commit changes

    • Commit migrated data files
    • Commit migration report
    • Tag release with migration completion

Last Updated: 2025-11-21
Maintainer: GLAM Ontology Project
Version: 1.0.0