Legal Form Migration Guide: Generic Enums → ISO 20275
Purpose: Migrate heritage institution data from generic legal form enumerations to ISO 20275 Entity Legal Forms (ELF) codes.
Status: Migration script ready for testing
Date: 2025-11-21
Version: 1.0.0
Quick Start
1. Install Dependencies
pip install pyyaml
2. Run Migration Script
# Single file (Dutch institutions)
python scripts/migrate_legal_form_to_iso20275.py \
--input data/nde/institutions.yaml \
--output data/nde/institutions_migrated.yaml \
--country NL
# Entire directory (auto-detect country)
python scripts/migrate_legal_form_to_iso20275.py \
--input-dir data/nde/ \
--output-dir data/nde_migrated/
# Dry run (preview changes)
python scripts/migrate_legal_form_to_iso20275.py \
--input data/nde/institutions.yaml \
--output /dev/null \
--dry-run
3. Review Migration Report
The script writes a summary report to migration_report.txt (change the path with --report-path). Example output:
Migration Report
================
Total records processed: 1351
Successfully migrated: 1200
Unchanged (already ISO 20275): 50
Requiring manual review: 95
Errors: 6
Success rate: 88.8%
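The counts in the example report add up (1200 + 50 + 95 + 6 = 1351), and the success rate matches successfully migrated ÷ total processed (1200 / 1351 ≈ 88.8%). A minimal sketch of that arithmetic, assuming the script computes the rate this way; MigrationCounts and summarize_report are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class MigrationCounts:
    """Per-run counters mirroring the example report fields (hypothetical)."""
    migrated: int
    unchanged: int
    manual_review: int
    errors: int

    @property
    def total(self) -> int:
        # Assumption: every processed record lands in exactly one bucket.
        return self.migrated + self.unchanged + self.manual_review + self.errors

    @property
    def success_rate(self) -> float:
        # Assumption: success rate = successfully migrated / total processed.
        return self.migrated / self.total if self.total else 0.0


def summarize_report(counts: MigrationCounts) -> str:
    """Render the counts in the same layout as the example report."""
    return "\n".join([
        "Migration Report",
        "================",
        f"Total records processed: {counts.total}",
        f"Successfully migrated: {counts.migrated}",
        f"Unchanged (already ISO 20275): {counts.unchanged}",
        f"Requiring manual review: {counts.manual_review}",
        f"Errors: {counts.errors}",
        f"Success rate: {counts.success_rate:.1%}",
    ])


print(summarize_report(MigrationCounts(1200, 50, 95, 6)))
```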
Migration Mappings
Netherlands (NL)
| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | V44D | Stichting | 1.0 |
| ASSOCIATION | 33MN | Vereniging met volledige rechtsbevoegdheid | 0.9 |
| NGO | 33MN | Vereniging | 0.7 |
| GOVERNMENT_AGENCY | A0W7 | Publiekrechtelijke rechtspersoon | 0.95 |
| LIMITED_COMPANY | 54M6 | Besloten vennootschap (BV) | 0.85 |
| COOPERATIVE | NFFH | Coöperatie | 1.0 |
| TRUST | V44D | Stichting | 0.6 |
France (FR)
| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | 9T5S | Fondation | 0.8 |
| ASSOCIATION | BEWI | Association déclarée | 1.0 |
| NGO | BEWI | Association | 0.9 |
| GOVERNMENT_AGENCY | 5RDO | Établissement public | 1.0 |
| LIMITED_COMPANY | KMPN | SARL | 0.9 |
| COOPERATIVE | 6HB6 | Société coopérative (SCOP) | 1.0 |
Germany (DE)
| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | V2YH | Stiftung | 1.0 |
| ASSOCIATION | QZ3L | Eingetragener Verein (e.V.) | 1.0 |
| NGO | QZ3L | e.V. | 0.9 |
| GOVERNMENT_AGENCY | SQKS | Körperschaft des öffentlichen Rechts | 1.0 |
| LIMITED_COMPANY | XLWA | GmbH | 0.9 |
| COOPERATIVE | XAEA | Eingetragene Genossenschaft (eG) | 1.0 |
United Kingdom (GB/UK)
| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | FC0R | Trust | 0.8 |
| ASSOCIATION | 9HLU | Charity | 0.9 |
| NGO | 9HLU | Charity | 0.95 |
| GOVERNMENT_AGENCY | AVYY | Public corporation | 1.0 |
| LIMITED_COMPANY | CBL2 | Private company limited by shares (Ltd) | 0.9 |
| COOPERATIVE | 83XL | Co-operative Society | 1.0 |
| TRUST | FC0R | Trust | 1.0 |
United States (US)
| Old Enum | ISO 20275 Code | Local Name | Confidence |
|---|---|---|---|
| STICHTING | QQQ0 | 501(c)(3) Nonprofit Organization | 0.8 |
| ASSOCIATION | QQQ0 | 501(c)(3) | 0.9 |
| NGO | QQQ0 | 501(c)(3) | 0.95 |
| GOVERNMENT_AGENCY | W2ES | Government Entity | 1.0 |
| LIMITED_COMPANY | CNQ3 | Business Corporation | 0.8 |
| COOPERATIVE | S63E | Cooperative | 1.0 |
| TRUST | 7TPC | Trust | 1.0 |
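The tables above amount to a lookup keyed on (country, old enum). A minimal sketch of that lookup, reproducing only the Netherlands rows; ElfMapping, ELF_MAPPINGS, and map_legal_form are hypothetical names, not the migration script's actual API:

```python
from typing import NamedTuple, Optional


class ElfMapping(NamedTuple):
    code: str          # ISO 20275 ELF code
    local_name: str    # legal form name in the local language
    confidence: float  # 0.0 (uncertain) to 1.0 (certain)


# Netherlands rows copied from the table above; other countries follow the same shape.
ELF_MAPPINGS: dict[tuple[str, str], ElfMapping] = {
    ("NL", "STICHTING"): ElfMapping("V44D", "Stichting", 1.0),
    ("NL", "ASSOCIATION"): ElfMapping("33MN", "Vereniging met volledige rechtsbevoegdheid", 0.9),
    ("NL", "NGO"): ElfMapping("33MN", "Vereniging", 0.7),
    ("NL", "GOVERNMENT_AGENCY"): ElfMapping("A0W7", "Publiekrechtelijke rechtspersoon", 0.95),
    ("NL", "LIMITED_COMPANY"): ElfMapping("54M6", "Besloten vennootschap (BV)", 0.85),
    ("NL", "COOPERATIVE"): ElfMapping("NFFH", "Coöperatie", 1.0),
    ("NL", "TRUST"): ElfMapping("V44D", "Stichting", 0.6),
}


def map_legal_form(country: str, old_enum: str) -> Optional[ElfMapping]:
    """Return the mapping for a (country, enum) pair, or None if no row exists."""
    return ELF_MAPPINGS.get((country.upper(), old_enum.upper()))
```

Keeping the confidence next to the code lets the same lookup drive both auto-migration and manual-review flagging.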
Understanding Confidence Scores
Confidence Range: 0.0 (uncertain) to 1.0 (certain)
Thresholds:
- 1.0: Exact one-to-one mapping (e.g., STICHTING → V44D in Netherlands)
- 0.9-0.99: High confidence, common pattern (e.g., ASSOCIATION → 33MN in Netherlands)
- 0.7-0.89: Moderate confidence, generally correct but verify (e.g., NGO → 33MN in Netherlands)
- 0.5-0.69: Low confidence, requires manual review (e.g., TRUST → V44D in Netherlands)
- < 0.5: Very uncertain, manual mapping required
Default threshold: 0.7 (automatically migrate only if confidence ≥ 0.7)
Change threshold:
python scripts/migrate_legal_form_to_iso20275.py \
--confidence-threshold 0.8 \
...
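The bands and the threshold combine into a simple decision. A minimal sketch, assuming records at or above the threshold are migrated and everything else is flagged for review; the band labels follow the list above, and confidence_band and should_auto_migrate are hypothetical names:

```python
DEFAULT_THRESHOLD = 0.7  # documented default for --confidence-threshold


def confidence_band(score: float) -> str:
    """Map a confidence score to the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {score}")
    if score == 1.0:
        return "exact"
    if score >= 0.9:
        return "high"
    if score >= 0.7:
        return "moderate"
    if score >= 0.5:
        return "low"
    return "very uncertain"


def should_auto_migrate(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Auto-migrate only when confidence >= threshold; otherwise flag for review."""
    return score >= threshold
```

For example, with --confidence-threshold 0.8 the NL LIMITED_COMPANY → 54M6 mapping (0.85) still migrates, while NGO → 33MN (0.7) is flagged.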
Manual Review Cases
The script flags records for manual review in these situations:
1. Low Confidence Mappings
Example: TRUST in Netherlands context
- Generic enum TRUST could map to V44D (stichting) or FC0R (trust)
- Confidence: 0.6 (below default threshold)
- Action: Verify legal documents, check KvK registry
2. Unknown Enum Values
Example: PRIVATE_FOUNDATION (not in standard mappings)
- No predefined mapping available
- Action: Consult ISO 20275 CSV file, check country-specific guide
3. Country Unknown
Example: Record without locations field
- Cannot determine country-specific mapping
- Uses default fallback mapping
- Action: Add country code, re-run migration
4. Ambiguous Cases
Example: NGO could be various legal forms
- France: BEWI (association) vs 9T5S (fondation)
- US: QQQ0 (501(c)(3)) vs 7TPC (trust)
- Action: Check organizational charter, verify legal form
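The four situations above can be sketched as one check that returns the reasons a record needs review. This assumes records are dicts with legal_form and locations keys, a small (country, enum) → (code, confidence) lookup, and an explicit set of ambiguous pairs; MAPPINGS, AMBIGUOUS, and review_reasons are hypothetical:

```python
from typing import Optional

# Hypothetical subset of the mapping tables: (country, enum) -> (ELF code, confidence).
MAPPINGS = {
    ("NL", "STICHTING"): ("V44D", 1.0),
    ("NL", "TRUST"): ("V44D", 0.6),
    ("FR", "NGO"): ("BEWI", 0.9),
    ("US", "NGO"): ("QQQ0", 0.95),
}
# Hypothetical set of enums that can denote several legal forms in a country (case 4).
AMBIGUOUS = {("FR", "NGO"), ("US", "NGO")}


def record_country(record: dict) -> Optional[str]:
    """Country of the first location, or None when locations are missing (case 3)."""
    locations = record.get("locations") or []
    return locations[0].get("country") if locations else None


def review_reasons(record: dict, threshold: float = 0.7) -> list[str]:
    """List why a record needs manual review; an empty list means auto-migrate."""
    enum = record.get("legal_form")
    country = record_country(record)
    if country is None:
        # Case 3: per this guide the script then uses a default fallback mapping.
        return ["country unknown"]
    reasons = []
    mapping = MAPPINGS.get((country, enum))
    if mapping is None:
        reasons.append(f"unknown enum value {enum!r} for {country}")  # case 2
    elif mapping[1] < threshold:
        reasons.append(f"low confidence ({mapping[1]})")  # case 1
    if (country, enum) in AMBIGUOUS:
        reasons.append("ambiguous legal form")  # case 4
    return reasons
```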
Provenance Tracking
The script automatically adds migration metadata to provenance.notes:
provenance:
  data_source: CSV_REGISTRY
  notes: |
    Extracted from Dutch ISIL registry
    [MIGRATION 2025-11-21T14:30:00Z] legal_form migrated: 'STICHTING' → 'V44D' (ISO 20275).
    Country: NL. Confidence: 1.0. Mapped to Stichting (Stichting)
Timestamp: ISO 8601 format with timezone
Format: [MIGRATION <timestamp>] legal_form migrated: '<old>' → '<new>' (ISO 20275). Country: <country>. Confidence: <score>. <notes>
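A minimal sketch of building and appending a note in this format, assuming records are plain dicts and a UTC timestamp with a trailing Z as in the example above; migration_note and append_note are hypothetical helpers:

```python
from datetime import datetime, timezone
from typing import Optional


def migration_note(old: str, new: str, country: str, confidence: float,
                   notes: str, when: Optional[datetime] = None) -> str:
    """Build one provenance line in the documented [MIGRATION ...] format.

    Pass a timezone-aware datetime; a naive one would be treated as local time.
    """
    when = when or datetime.now(timezone.utc)
    # ISO 8601 in UTC with a trailing 'Z', e.g. 2025-11-21T14:30:00Z
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f"[MIGRATION {stamp}] legal_form migrated: '{old}' → '{new}' (ISO 20275). "
            f"Country: {country}. Confidence: {confidence}. {notes}")


def append_note(record: dict, note: str) -> None:
    """Append the note to provenance.notes, keeping any existing text."""
    provenance = record.setdefault("provenance", {})
    existing = provenance.get("notes", "")
    provenance["notes"] = f"{existing.rstrip()}\n{note}\n" if existing else f"{note}\n"
```

Appending rather than overwriting keeps the original extraction note (e.g. "Extracted from Dutch ISIL registry") intact.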
Validation After Migration
1. Schema Validation
# Install LinkML CLI
pip install linkml
# Validate migrated data
linkml-validate -s schemas/20251121/linkml/02_organization_observation_reconstruction.yaml \
data/nde_migrated/institutions.yaml
2. Manual Spot Checks
Review 10-20 random records:
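# Extract a random sample of 20 records (shuf would sample individual lines, not
# records; this assumes the file is a top-level YAML list of records)
python -c "import random, yaml; recs = yaml.safe_load(open('data/nde_migrated/institutions.yaml', encoding='utf-8')); yaml.safe_dump(random.sample(recs, min(20, len(recs))), open('sample_review.yaml', 'w', encoding='utf-8'), allow_unicode=True, sort_keys=False)"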
# Check:
# - legal_form matches expected ISO 20275 code
# - legal_form matches country (e.g., V44D for NL institutions)
# - Provenance notes document migration
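These checks can also be automated. A minimal sketch, assuming each record is a dict with legal_form, locations, and provenance.notes; CODE_COUNTRY is a hypothetical excerpt from the tables above standing in for the country column of the ELF CSV. Records left unchanged because they already used ISO 20275 would legitimately lack a migration note:

```python
import re

ELF_CODE = re.compile(r"[A-Z0-9]{4}")
# Hypothetical excerpt (from the tables above): ELF code -> country of the legal form.
CODE_COUNTRY = {"V44D": "NL", "33MN": "NL", "54M6": "NL", "BEWI": "FR", "QZ3L": "DE"}


def spot_check(record: dict) -> list[str]:
    """Return the checklist items a migrated record fails (empty list = passes)."""
    problems = []
    code = str(record.get("legal_form", ""))
    if not ELF_CODE.fullmatch(code):
        problems.append(f"legal_form {code!r} is not a 4-character ELF code")
    countries = {loc["country"] for loc in (record.get("locations") or []) if loc.get("country")}
    expected = CODE_COUNTRY.get(code)
    if expected and countries and expected not in countries:
        problems.append(f"{code} belongs to {expected}, record is in {sorted(countries)}")
    notes = (record.get("provenance") or {}).get("notes") or ""
    if "[MIGRATION " not in notes:
        problems.append("provenance notes do not document the migration")
    return problems
```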
3. Cross-Reference with ISO 20275 Registry
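# Collect every ELF code in legal_form fields, quoted or unquoted (also covers
# old_legal_form/new_legal_form); uses GNU grep syntax
grep -o 'legal_form: *"\?[A-Z0-9]\{4\}\b' data/nde_migrated/institutions.yaml | grep -o '[A-Z0-9]\{4\}$' | sort -u > used_codes.txt
# Compare with the ELF Code column (first column) of the ISO 20275 CSV, stripping CSV quotes
cut -d',' -f1 data/ontology/2023-09-28-elf-code-list-v1.5.csv | tr -d '"' | sort -u > valid_codes.txt
comm -23 used_codes.txt valid_codes.txt # Should be empty (all codes valid)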
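The shell check only catches unknown codes. A minimal Python sketch that also reports inactive codes, assuming the CSV headers are named "ELF Code" and "ELF Status ACTV/INAC" (the status column mentioned under Troubleshooting); load_elf_status and check_codes are hypothetical helpers:

```python
import csv
from typing import Iterable


def load_elf_status(csv_lines: Iterable[str]) -> dict[str, str]:
    """Map ELF code -> status (ACTV or INAC) from the ELF code list CSV.

    Assumes the header names 'ELF Code' and 'ELF Status ACTV/INAC'. A code listed
    on several rows (e.g. one per language) keeps the status of its last row.
    """
    reader = csv.DictReader(csv_lines)
    return {row["ELF Code"].strip(): row["ELF Status ACTV/INAC"].strip() for row in reader}


def check_codes(used: Iterable[str], status: dict[str, str]) -> dict[str, list[str]]:
    """Split the codes used in migrated data into unknown and inactive ones."""
    codes = sorted(set(used))
    return {
        "unknown": [c for c in codes if c not in status],
        "inactive": [c for c in codes if status.get(c) == "INAC"],
    }


# Usage (path from this guide; utf-8-sig drops a byte-order mark if present):
# with open("data/ontology/2023-09-28-elf-code-list-v1.5.csv", newline="", encoding="utf-8-sig") as f:
#     print(check_codes(["V44D", "33MN"], load_elf_status(f)))
```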
Edge Cases and Solutions
Case 1: Mixed Legal Forms in Single Record
Problem: Institution changed legal form over time
legal_form: STICHTING # Founded as foundation
# ... but became government agency in 2010
Solution: Use ChangeEvent to track legal form changes
legal_form: A0W7  # Current form (government agency)
change_history:
  - change_type: LEGAL_CHANGE
    event_date: "2010-01-01"
    event_description: "Converted from private stichting to public entity"
    old_legal_form: V44D
    new_legal_form: A0W7
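A minimal sketch of producing that structure, assuming records are plain dicts and the previous legal_form has already been migrated to an ELF code (V44D here); record_legal_change is a hypothetical helper, not part of the migration script:

```python
from typing import Optional


def record_legal_change(record: dict, new_form: str, event_date: str,
                        description: str) -> None:
    """Set the current legal_form and log the previous one as a LEGAL_CHANGE event."""
    old_form: Optional[str] = record.get("legal_form")
    record.setdefault("change_history", []).append({
        "change_type": "LEGAL_CHANGE",
        "event_date": event_date,  # ISO 8601 date string, e.g. "2010-01-01"
        "event_description": description,
        "old_legal_form": old_form,
        "new_legal_form": new_form,
    })
    record["legal_form"] = new_form
```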
Case 2: Foreign Entities Operating in Country
Problem: French association operating branch in Netherlands
legal_form: ASSOCIATION
locations:
  - city: Amsterdam
    country: NL
Solution: Use parent country, not branch location
legal_form: BEWI  # French association (parent entity)
locations:
  - city: Paris
    country: FR
    is_headquarters: true
  - city: Amsterdam
    country: NL
    is_branch: true
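The rule "use the parent country, not the branch location" can be sketched as a country resolver. This assumes the is_headquarters and is_branch flags shown above and a default such as the value passed via --country; legal_form_country is a hypothetical helper:

```python
from typing import Optional


def legal_form_country(record: dict, default: Optional[str] = None) -> Optional[str]:
    """Return the country whose ELF codes apply to the record's legal_form."""
    locations = record.get("locations") or []
    # 1. The headquarters country wins, even if branches are listed first.
    for loc in locations:
        if loc.get("is_headquarters") and loc.get("country"):
            return loc["country"]
    # 2. Otherwise use the first location that is not marked as a branch.
    for loc in locations:
        if not loc.get("is_branch") and loc.get("country"):
            return loc["country"]
    # 3. Fall back to a default (e.g. the value passed via --country).
    return default
```

Note that the "Problem" record above (Amsterdam only, no flags) still resolves to NL; the Paris headquarters must be added before the FR code can be chosen.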
Case 3: Multiple Legal Forms (Group Structure)
Problem: Holding company with multiple subsidiaries
legal_name: Museum Group Holding
legal_form: ??? # Parent is BV, subsidiary is stichting
Solution: Create separate records, link via parent_organization
# Parent
- id: https://w3id.org/heritage/org/museum-holding
  legal_name: Museum Group Holding BV
  legal_form: 54M6  # Dutch BV
# Subsidiary
- id: https://w3id.org/heritage/org/museum-foundation
  legal_name: Stichting Museum X
  legal_form: V44D  # Dutch stichting
  parent_organization: https://w3id.org/heritage/org/museum-holding
Troubleshooting
Error: "ELF codes file not found"
Cause: Missing ISO 20275 CSV file
Solution:
# Download from GLEIF (check that the saved file is the CSV itself; this URL may serve the HTML download page)
wget -O data/ontology/2023-09-28-elf-code-list-v1.5.csv \
https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list/download-the-code-list
# Or specify custom path
python scripts/migrate_legal_form_to_iso20275.py \
--elf-codes /path/to/elf-codes.csv \
...
Error: "Code 'XXXX' not found in ISO 20275 registry"
Cause: Invalid or non-existent ELF code
Solutions:
- Check for typos (codes are case-sensitive)
- Verify code is in latest ISO 20275 version
- Check if code is inactive (ELF Status ACTV/INAC column)
- Use country-specific guide to find correct code
Warning: "Low confidence mapping"
Not an error: the record has been flagged for manual review
Actions:
- Review the migration report for flagged records
- Consult the country-specific guide (elf_codes/{country}/README.md)
- Verify with legal documents (statutes, KvK registry)
- Update the record manually if needed
Country-Specific Guides
For detailed legal form mappings:
- Netherlands: /schemas/20251121/elf_codes/netherlands/README.md (30+ codes)
- France: /schemas/20251121/elf_codes/france/README.md (240+ codes)
- Germany: /schemas/20251121/elf_codes/germany/README.md (30+ codes)
- United Kingdom: /schemas/20251121/elf_codes/uk/README.md (40+ codes)
- United States: /schemas/20251121/elf_codes/usa/README.md (732+ codes)
Testing the Migration
Run unit tests:
pytest tests/test_legal_form_migration.py -v
Test coverage:
- ✅ ELF code validation
- ✅ Country-specific mappings
- ✅ Confidence scoring
- ✅ Manual review flagging
- ✅ Provenance tracking
- ✅ Edge case handling
Script Options Reference
usage: migrate_legal_form_to_iso20275.py [-h] [--input INPUT] [--output OUTPUT]
[--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
[--country COUNTRY] [--elf-codes ELF_CODES]
[--confidence-threshold CONFIDENCE_THRESHOLD]
[--dry-run] [--report-only]
[--report-path REPORT_PATH]
options:
--input INPUT Input YAML file
--output OUTPUT Output YAML file
--input-dir INPUT_DIR Input directory (batch mode)
--output-dir OUTPUT_DIR Output directory (batch mode)
--country COUNTRY Default country code (ISO 3166-1, e.g., NL, FR, DE)
--elf-codes ELF_CODES Path to ISO 20275 CSV file
--confidence-threshold Minimum confidence for auto-migration (0.0-1.0, default: 0.7)
--dry-run Preview changes without writing files
--report-only Generate report only, no file output
--report-path Path to save migration report (default: migration_report.txt)
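For readers extending the script, the documented options map directly onto argparse. This is a minimal sketch that mirrors the usage text above, not the script's actual source; the mode and threshold-range checks in parse_args are assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the documented command-line options (sketch)."""
    p = argparse.ArgumentParser(prog="migrate_legal_form_to_iso20275.py")
    p.add_argument("--input", help="Input YAML file")
    p.add_argument("--output", help="Output YAML file")
    p.add_argument("--input-dir", help="Input directory (batch mode)")
    p.add_argument("--output-dir", help="Output directory (batch mode)")
    p.add_argument("--country", help="Default country code (ISO 3166-1, e.g., NL, FR, DE)")
    p.add_argument("--elf-codes", help="Path to ISO 20275 CSV file")
    p.add_argument("--confidence-threshold", type=float, default=0.7,
                   help="Minimum confidence for auto-migration (0.0-1.0)")
    p.add_argument("--dry-run", action="store_true", help="Preview changes without writing files")
    p.add_argument("--report-only", action="store_true", help="Generate report only, no file output")
    p.add_argument("--report-path", default="migration_report.txt",
                   help="Path to save migration report")
    return p


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed constraints: exactly one of single-file or batch mode, threshold in [0, 1].
    if bool(args.input) == bool(args.input_dir):
        parser.error("use either --input/--output or --input-dir/--output-dir")
    if not 0.0 <= args.confidence_threshold <= 1.0:
        parser.error("--confidence-threshold must be between 0.0 and 1.0")
    return args
```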
Next Steps After Migration
- Validate migrated data
  - Run LinkML schema validation
  - Spot check 10-20 records manually
  - Cross-reference with ISO 20275 registry
- Regenerate RDF files
  - Convert migrated YAML to RDF
  - Validate triples with ontology alignment
  - Update RDF_GENERATION_SUMMARY.md
- Update documentation
  - Document migration statistics
  - Note any manual corrections made
  - Update schema change log
- Commit changes
  - Commit migrated data files
  - Commit migration report
  - Tag release with migration completion
References
- ISO 20275 Standard: https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list
- GLEIF Registry: https://www.gleif.org
- Schema Documentation: /schemas/20251121/linkml/02_organization_observation_reconstruction.yaml
- Country Guides: /schemas/20251121/elf_codes/{country}/README.md
- Migration Script: /scripts/migrate_legal_form_to_iso20275.py
- Unit Tests: /tests/test_legal_form_migration.py
Last Updated: 2025-11-21
Maintainer: GLAM Ontology Project
Version: 1.0.0