# Legal Form Migration Guide: Generic Enums → ISO 20275

**Purpose**: Migrate heritage institution data from generic legal form enumerations to ISO 20275 Entity Legal Forms (ELF) codes.

**Status**: Migration script ready for testing
**Date**: 2025-11-21
**Version**: 1.0.0

---

## Quick Start

### 1. Install Dependencies

```bash
pip install pyyaml
```

### 2. Run Migration Script

```bash
# Single file (Dutch institutions)
python scripts/migrate_legal_form_to_iso20275.py \
  --input data/nde/institutions.yaml \
  --output data/nde/institutions_migrated.yaml \
  --country NL

# Entire directory (auto-detect country)
python scripts/migrate_legal_form_to_iso20275.py \
  --input-dir data/nde/ \
  --output-dir data/nde_migrated/

# Dry run (preview changes)
python scripts/migrate_legal_form_to_iso20275.py \
  --input data/nde/institutions.yaml \
  --output /dev/null \
  --dry-run
```

### 3. Review Migration Report

The script generates a detailed report saved to `migration_report.txt`:

```
Migration Report
================
Total records processed: 1351
Successfully migrated: 1200
Unchanged (already ISO 20275): 50
Requiring manual review: 95
Errors: 6

Success rate: 88.8%
```

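
The figures in the sample report are consistent with a success rate computed as migrated records divided by total records (1200 / 1351). The sketch below reproduces that arithmetic. The `MigrationCounts` class and its field names are hypothetical and are not taken from the migration script; the formula is an assumption inferred from the sample numbers.

```python
from dataclasses import dataclass


@dataclass
class MigrationCounts:
    """Hypothetical container for the counters shown in the sample report."""
    migrated: int
    unchanged: int
    manual_review: int
    errors: int

    @property
    def total(self) -> int:
        # Every processed record falls into exactly one of the four buckets.
        return self.migrated + self.unchanged + self.manual_review + self.errors

    @property
    def success_rate(self) -> float:
        # Assumption: success rate = migrated / total, as a percentage.
        return 100.0 * self.migrated / self.total if self.total else 0.0


counts = MigrationCounts(migrated=1200, unchanged=50, manual_review=95, errors=6)
print(f"Total records processed: {counts.total}")
print(f"Success rate: {counts.success_rate:.1f}%")
```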
---

## Migration Mappings

### Netherlands (NL)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|----------|----------------|------------|------------|
| `STICHTING` | `V44D` | Stichting | 1.0 |
| `ASSOCIATION` | `33MN` | Vereniging met volledige rechtsbevoegdheid | 0.9 |
| `NGO` | `33MN` | Vereniging | 0.7 |
| `GOVERNMENT_AGENCY` | `A0W7` | Publiekrechtelijke rechtspersoon | 0.95 |
| `LIMITED_COMPANY` | `54M6` | Besloten vennootschap (BV) | 0.85 |
| `COOPERATIVE` | `NFFH` | Coöperatie | 1.0 |
| `TRUST` | `V44D` | Stichting | 0.6 |

### France (FR)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|----------|----------------|------------|------------|
| `STICHTING` | `9T5S` | Fondation | 0.8 |
| `ASSOCIATION` | `BEWI` | Association déclarée | 1.0 |
| `NGO` | `BEWI` | Association | 0.9 |
| `GOVERNMENT_AGENCY` | `5RDO` | Établissement public | 1.0 |
| `LIMITED_COMPANY` | `KMPN` | SARL | 0.9 |
| `COOPERATIVE` | `6HB6` | Société coopérative (SCOP) | 1.0 |

### Germany (DE)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|----------|----------------|------------|------------|
| `STICHTING` | `V2YH` | Stiftung | 1.0 |
| `ASSOCIATION` | `QZ3L` | Eingetragener Verein (e.V.) | 1.0 |
| `NGO` | `QZ3L` | e.V. | 0.9 |
| `GOVERNMENT_AGENCY` | `SQKS` | Körperschaft des öffentlichen Rechts | 1.0 |
| `LIMITED_COMPANY` | `XLWA` | GmbH | 0.9 |
| `COOPERATIVE` | `XAEA` | Eingetragene Genossenschaft (eG) | 1.0 |

### United Kingdom (GB/UK)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|----------|----------------|------------|------------|
| `STICHTING` | `FC0R` | Trust | 0.8 |
| `ASSOCIATION` | `9HLU` | Charity | 0.9 |
| `NGO` | `9HLU` | Charity | 0.95 |
| `GOVERNMENT_AGENCY` | `AVYY` | Public corporation | 1.0 |
| `LIMITED_COMPANY` | `CBL2` | Private company limited by shares (Ltd) | 0.9 |
| `COOPERATIVE` | `83XL` | Co-operative Society | 1.0 |
| `TRUST` | `FC0R` | Trust | 1.0 |

### United States (US)

| Old Enum | ISO 20275 Code | Local Name | Confidence |
|----------|----------------|------------|------------|
| `STICHTING` | `QQQ0` | 501(c)(3) Nonprofit Organization | 0.8 |
| `ASSOCIATION` | `QQQ0` | 501(c)(3) | 0.9 |
| `NGO` | `QQQ0` | 501(c)(3) | 0.95 |
| `GOVERNMENT_AGENCY` | `W2ES` | Government Entity | 1.0 |
| `LIMITED_COMPANY` | `CNQ3` | Business Corporation | 0.8 |
| `COOPERATIVE` | `S63E` | Cooperative | 1.0 |
| `TRUST` | `7TPC` | Trust | 1.0 |

---

## Understanding Confidence Scores

**Confidence Range**: 0.0 (uncertain) to 1.0 (certain)

**Thresholds**:
- **1.0**: Exact one-to-one mapping (e.g., STICHTING → V44D in Netherlands)
- **0.9-0.99**: High confidence, common pattern (e.g., NGO → BEWI in France)
- **0.7-0.89**: Moderate confidence, generally correct but verify (e.g., NGO → 33MN in Netherlands)
- **0.5-0.69**: Low confidence, requires manual review (e.g., TRUST → V44D in Netherlands)
- **< 0.5**: Very uncertain, manual mapping required

**Default threshold**: 0.7 (automatically migrate only if confidence ≥ 0.7)

To change the threshold:
```bash
python scripts/migrate_legal_form_to_iso20275.py \
  --confidence-threshold 0.8 \
  ...
```

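
The bands and the default threshold described above can be expressed as two small functions. This is a minimal sketch, not the script's actual code: the names `confidence_band` and `auto_migrate` and the band labels are hypothetical. The example scores are taken from the Netherlands mapping table.

```python
def confidence_band(score: float) -> str:
    """Return the band for a confidence score, following the thresholds listed in this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be within 0.0-1.0, got {score}")
    if score == 1.0:
        return "exact"
    if score >= 0.9:
        return "high"
    if score >= 0.7:
        return "moderate"
    if score >= 0.5:
        return "low"
    return "very uncertain"


def auto_migrate(score: float, threshold: float = 0.7) -> bool:
    """Migrate automatically only when the score meets the threshold."""
    return score >= threshold


# Scores taken from the Netherlands mapping table.
for enum_value, score in [("STICHTING", 1.0), ("GOVERNMENT_AGENCY", 0.95), ("NGO", 0.7), ("TRUST", 0.6)]:
    print(f"{enum_value}: band={confidence_band(score)}, auto_migrate={auto_migrate(score)}")
```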
---

## Manual Review Cases

The script flags records for manual review in these situations:

### 1. Low Confidence Mappings

**Example**: `TRUST` in Netherlands context
- Generic enum `TRUST` could map to `V44D` (stichting) or `FC0R` (trust)
- Confidence: 0.6 (below default threshold)
- **Action**: Verify legal documents, check KvK registry

### 2. Unknown Enum Values

**Example**: `PRIVATE_FOUNDATION` (not in standard mappings)
- No predefined mapping available
- **Action**: Consult ISO 20275 CSV file, check country-specific guide

### 3. Country Unknown

**Example**: Record without `locations` field
- Cannot determine country-specific mapping
- Uses default fallback mapping
- **Action**: Add country code, re-run migration

### 4. Ambiguous Cases

**Example**: `NGO` could be various legal forms
- France: `BEWI` (association) vs `9T5S` (fondation)
- US: `QQQ0` (501(c)(3)) vs `7TPC` (trust)
- **Action**: Check organizational charter, verify legal form

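
The first three situations above can be sketched as a single lookup that returns a review reason. This is an illustration under assumptions, not the script's implementation: `MAPPINGS`, `Mapping`, and `review_reason` are hypothetical names, the table holds only a few rows copied from the mapping tables in this guide, and the default fallback mapping and the ambiguous-case check are left out.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    code: str
    confidence: float


# A few rows copied from the mapping tables in this guide, keyed by (country, old enum).
MAPPINGS = {
    ("NL", "STICHTING"): Mapping("V44D", 1.0),
    ("NL", "TRUST"): Mapping("V44D", 0.6),
    ("FR", "NGO"): Mapping("BEWI", 0.9),
}


def review_reason(old_value: str, country: Optional[str], threshold: float = 0.7) -> Optional[str]:
    """Return why a record needs manual review, or None if it can be migrated automatically."""
    if country is None:
        return "country unknown"
    mapping = MAPPINGS.get((country, old_value))
    if mapping is None:
        return "unknown enum value"
    if mapping.confidence < threshold:
        return "low confidence"
    return None


print(review_reason("STICHTING", "NL"))           # no review needed
print(review_reason("TRUST", "NL"))               # below the 0.7 threshold
print(review_reason("PRIVATE_FOUNDATION", "NL"))  # not in the mapping table
print(review_reason("STICHTING", None))           # record without a country
```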
---

## Provenance Tracking

The script automatically adds migration metadata to `provenance.notes`:

```yaml
provenance:
  data_source: CSV_REGISTRY
  notes: |
    Extracted from Dutch ISIL registry

    [MIGRATION 2025-11-21T14:30:00Z] legal_form migrated: 'STICHTING' → 'V44D' (ISO 20275).
    Country: NL. Confidence: 1.0. Mapped to Stichting (Stichting)
```

- **Timestamp**: ISO 8601 format with timezone
- **Format**: `[MIGRATION <timestamp>] legal_form migrated: '<old>' → '<new>' (ISO 20275). Country: <country>. Confidence: <score>. <notes>`

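
A note in the format above could be assembled as follows. This is a sketch only: the function `migration_note` and its parameters are hypothetical, and the script may build the string differently. The sketch emits a single line, whereas the YAML example wraps the note over two lines.

```python
from datetime import datetime, timezone
from typing import Optional


def migration_note(old: str, new: str, country: str, confidence: float,
                   notes: str, when: Optional[datetime] = None) -> str:
    """Build one provenance note in the documented [MIGRATION ...] format."""
    when = when or datetime.now(timezone.utc)
    # ISO 8601 timestamp in UTC with a trailing "Z", as in the example above.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"[MIGRATION {stamp}] legal_form migrated: '{old}' → '{new}' (ISO 20275). "
        f"Country: {country}. Confidence: {confidence}. {notes}"
    )


note = migration_note(
    "STICHTING", "V44D", "NL", 1.0, "Mapped to Stichting (Stichting)",
    when=datetime(2025, 11, 21, 14, 30, tzinfo=timezone.utc),
)
print(note)
```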
---

## Validation After Migration

### 1. Schema Validation

```bash
# Install LinkML CLI
pip install linkml

# Validate migrated data
linkml-validate -s schemas/20251121/linkml/02_organization_observation_reconstruction.yaml \
  data/nde_migrated/institutions.yaml
```

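### 2. Manual Spot Checks

Review 10-20 random records:

```bash
# Extract a random sample of up to 20 records.
# Shuffling raw lines with `shuf` would split records and produce invalid YAML,
# so sample whole records instead (assumes the file is a top-level YAML list).
python -c "import random, yaml; recs = yaml.safe_load(open('data/nde_migrated/institutions.yaml')); print(yaml.safe_dump(random.sample(recs, min(20, len(recs))), allow_unicode=True, sort_keys=False))" > sample_review.yaml

# Check:
# - legal_form matches expected ISO 20275 code
# - legal_form matches country (e.g., V44D for NL institutions)
# - Provenance notes document migration
```
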
### 3. Cross-Reference with ISO 20275 Registry

```bash
# Collect the 4-character codes used in legal_form fields (quoted or unquoted)
sed -nE "s/^[[:space:]-]*legal_form:[[:space:]]*[\"']?([A-Z0-9]{4})[\"']?[[:space:]]*(#.*)?\$/\1/p" \
  data/nde_migrated/institutions.yaml | sort -u > used_codes.txt

# Compare with ISO 20275 CSV (strip quotes in case the CSV fields are quoted)
cut -d',' -f1 data/ontology/2023-09-28-elf-code-list-v1.5.csv | tr -d '"' | sort -u > valid_codes.txt

comm -23 used_codes.txt valid_codes.txt  # Should be empty (all codes valid)
```

---

## Edge Cases and Solutions

### Case 1: Mixed Legal Forms in Single Record

**Problem**: Institution changed legal form over time
```yaml
legal_form: STICHTING  # Founded as foundation
# ... but became government agency in 2010
```

**Solution**: Use `ChangeEvent` to track legal form changes
```yaml
legal_form: A0W7  # Current form (government agency)
change_history:
  - change_type: LEGAL_CHANGE
    event_date: "2010-01-01"
    event_description: "Converted from private stichting to public entity"
    old_legal_form: V44D
    new_legal_form: A0W7
```

### Case 2: Foreign Entities Operating in Country

**Problem**: French association operating branch in Netherlands
```yaml
legal_form: ASSOCIATION
locations:
  - city: Amsterdam
    country: NL
```

**Solution**: Use parent country, not branch location
```yaml
legal_form: BEWI  # French association (parent entity)
locations:
  - city: Paris
    country: FR
    is_headquarters: true
  - city: Amsterdam
    country: NL
    is_branch: true
```

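
The rule in Case 2 (map by the parent entity's country, not the branch location) can be sketched as a helper that prefers the location flagged `is_headquarters`. The function name `mapping_country` is hypothetical, and the fallback behaviour (use the country only when all locations agree, otherwise return `None` so the record goes to manual review) is an assumption rather than documented script behaviour.

```python
from typing import Optional


def mapping_country(record: dict) -> Optional[str]:
    """Pick the country whose legal-form mapping should apply to a record."""
    locations = record.get("locations") or []

    # Prefer the headquarters: the legal form belongs to the parent entity.
    for loc in locations:
        if loc.get("is_headquarters") and loc.get("country"):
            return loc["country"]

    # Assumed fallback: accept the country only if every location agrees.
    countries = {loc["country"] for loc in locations if loc.get("country")}
    if len(countries) == 1:
        return countries.pop()

    # No locations, or conflicting countries without a headquarters flag.
    return None


record = {
    "legal_form": "ASSOCIATION",
    "locations": [
        {"city": "Paris", "country": "FR", "is_headquarters": True},
        {"city": "Amsterdam", "country": "NL", "is_branch": True},
    ],
}
print(mapping_country(record))
```

With the record from the solution above, the helper returns `FR`, so `ASSOCIATION` would be looked up in the French table (`BEWI`) rather than the Dutch one.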
### Case 3: Multiple Legal Forms (Group Structure)

**Problem**: Holding company with multiple subsidiaries
```yaml
legal_name: Museum Group Holding
legal_form: ???  # Parent is BV, subsidiary is stichting
```

**Solution**: Create separate records, link via `parent_organization`
```yaml
# Parent
- id: https://w3id.org/heritage/org/museum-holding
  legal_name: Museum Group Holding BV
  legal_form: 54M6  # Dutch BV

# Subsidiary
- id: https://w3id.org/heritage/org/museum-foundation
  legal_name: Stichting Museum X
  legal_form: V44D  # Dutch stichting
  parent_organization: https://w3id.org/heritage/org/museum-holding
```

---

## Troubleshooting

### Error: "ELF codes file not found"

**Cause**: Missing ISO 20275 CSV file

**Solution**:
```bash
# Download from GLEIF
# Note: the URL below is the code-list download page. If the saved file turns out
# to be HTML rather than CSV, download the CSV linked from that page instead.
wget -O data/ontology/2023-09-28-elf-code-list-v1.5.csv \
  https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list/download-the-code-list

# Or specify custom path
python scripts/migrate_legal_form_to_iso20275.py \
  --elf-codes /path/to/elf-codes.csv \
  ...
```

### Error: "Code 'XXXX' not found in ISO 20275 registry"

**Cause**: Invalid or non-existent ELF code

**Solutions**:
1. Check for typos (codes are case-sensitive)
2. Verify code is in latest ISO 20275 version
3. Check if code is inactive (`ELF Status ACTV/INAC` column)
4. Use country-specific guide to find correct code

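
Checks 1 and 3 above can be scripted against the code list. The sketch below reads an in-memory sample so that it runs on its own. The `ELF Status ACTV/INAC` column name comes from this guide, the `ELF Code` column name is an assumption about the CSV header, the sample rows are made up for illustration (`ZZZZ` is a placeholder, not a real code), and the helper names are hypothetical.

```python
import csv
import io

# Made-up sample rows; the real file has many more columns and rows.
SAMPLE_CSV = """ELF Code,ELF Status ACTV/INAC
V44D,ACTV
ZZZZ,INAC
"""


def load_status(csv_file) -> dict:
    """Map each ELF code to its status. A code counts as active if any of its rows is active."""
    status = {}
    for row in csv.DictReader(csv_file):
        code = row["ELF Code"].strip()
        row_status = row["ELF Status ACTV/INAC"].strip()
        if status.get(code) != "ACTV":
            status[code] = row_status
    return status


def check_code(code: str, status_by_code: dict) -> str:
    """Classify a code as 'ok', 'inactive', or 'not found' (lookup is case-sensitive)."""
    if code not in status_by_code:
        return "not found"
    return "ok" if status_by_code[code] == "ACTV" else "inactive"


statuses = load_status(io.StringIO(SAMPLE_CSV))
for candidate in ["V44D", "v44d", "ZZZZ"]:
    print(candidate, "->", check_code(candidate, statuses))
```

To run the same check against the downloaded file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) to `load_status` instead of the in-memory sample.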
### Warning: "Low confidence mapping"

**Not an error**: the script has flagged the record for manual review.

**Actions**:
1. Review migration report for flagged records
2. Consult country-specific guide (`elf_codes/{country}/README.md`)
3. Verify with legal documents (statutes, KvK registry)
4. Update record manually if needed

---

## Country-Specific Guides

For detailed legal form mappings:

- **Netherlands**: `/schemas/20251121/elf_codes/netherlands/README.md` (30+ codes)
- **France**: `/schemas/20251121/elf_codes/france/README.md` (240+ codes)
- **Germany**: `/schemas/20251121/elf_codes/germany/README.md` (30+ codes)
- **United Kingdom**: `/schemas/20251121/elf_codes/uk/README.md` (40+ codes)
- **United States**: `/schemas/20251121/elf_codes/usa/README.md` (732+ codes)

---

## Testing the Migration

Run unit tests:

```bash
pytest tests/test_legal_form_migration.py -v
```

Test coverage:
- ✅ ELF code validation
- ✅ Country-specific mappings
- ✅ Confidence scoring
- ✅ Manual review flagging
- ✅ Provenance tracking
- ✅ Edge case handling

---

## Script Options Reference

```
usage: migrate_legal_form_to_iso20275.py [-h] [--input INPUT] [--output OUTPUT]
                                         [--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
                                         [--country COUNTRY] [--elf-codes ELF_CODES]
                                         [--confidence-threshold CONFIDENCE_THRESHOLD]
                                         [--dry-run] [--report-only]
                                         [--report-path REPORT_PATH]

options:
  --input INPUT            Input YAML file
  --output OUTPUT          Output YAML file
  --input-dir INPUT_DIR    Input directory (batch mode)
  --output-dir OUTPUT_DIR  Output directory (batch mode)
  --country COUNTRY        Default country code (ISO 3166-1, e.g., NL, FR, DE)
  --elf-codes ELF_CODES    Path to ISO 20275 CSV file
  --confidence-threshold   Minimum confidence for auto-migration (0.0-1.0, default: 0.7)
  --dry-run                Preview changes without writing files
  --report-only            Generate report only, no file output
  --report-path            Path to save migration report (default: migration_report.txt)
```

---

## Next Steps After Migration

1. **Validate migrated data**
   - Run LinkML schema validation
   - Spot check 10-20 records manually
   - Cross-reference with ISO 20275 registry

2. **Regenerate RDF files**
   - Convert migrated YAML to RDF
   - Validate triples with ontology alignment
   - Update RDF_GENERATION_SUMMARY.md

3. **Update documentation**
   - Document migration statistics
   - Note any manual corrections made
   - Update schema change log

4. **Commit changes**
   - Commit migrated data files
   - Commit migration report
   - Tag release with migration completion

---

## References

- **ISO 20275 Standard**: https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list
- **GLEIF Registry**: https://www.gleif.org
- **Schema Documentation**: `/schemas/20251121/linkml/02_organization_observation_reconstruction.yaml`
- **Country Guides**: `/schemas/20251121/elf_codes/{country}/README.md`
- **Migration Script**: `/scripts/migrate_legal_form_to_iso20275.py`
- **Unit Tests**: `/tests/test_legal_form_migration.py`

---

**Last Updated**: 2025-11-21
**Maintainer**: GLAM Ontology Project
**Version**: 1.0.0