glam/.opencode/rules/linkml-union-type-range-any-rule.md
2026-01-19 00:09:28 +01:00

Rule 59: LinkML Union Types Require range: Any

🚨 CRITICAL: When using any_of for union types in LinkML, you MUST also specify range: Any at the attribute level. Without it, the union type validation does NOT work.

The Problem

LinkML's any_of construct allows defining slots that accept multiple types (e.g., string OR integer). However, there's a critical implementation detail:

Without range: Any, the any_of constraint is silently ignored during validation.

As a result, data that should be valid (e.g., an integer value in a string/integer union field) is rejected, because only the slot's fallback range is enforced.

Correct Pattern

slots:
  identifier_value:
    range: Any  # ← REQUIRED for any_of to work
    any_of:
      - range: string
      - range: integer
    description: The identifier value (can be string or integer)

Incorrect Pattern (WILL FAIL)

slots:
  identifier_value:
    # Missing range: Any - validation will fail!
    any_of:
      - range: string
      - range: integer
    description: The identifier value (can be string or integer)

Common Use Cases

This pattern is required for:

Use Case              Types             Example Fields
Identifier values     string | integer  identifier_value, geonames_id, viaf_id
Social media IDs      string | array    youtube_channel_id, facebook_id, twitter_username
Flexible identifiers  object | array    identifiers (dict or list format)
Numeric strings       string | integer  postal_code, kvk_number

Real-World Examples from GLAM Schema

Example 1: OriginalEntryIdentifier.yaml

# Before (BROKEN):
attributes:
  identifier_value:
    any_of:
      - range: string
      - range: integer

# After (WORKING):
attributes:
  identifier_value:
    range: Any  # Added
    any_of:
      - range: string
      - range: integer

Example 2: WikidataSocialMedia.yaml

# Social media fields that can be single value or array
attributes:
  youtube_channel_id:
    range: Any  # Required for string|array union
    any_of:
      - range: string
      - range: string
        multivalued: true
    description: YouTube channel ID (single value or array)
    
  facebook_id:
    range: Any
    any_of:
      - range: string
      - range: string
        multivalued: true

Example 3: OriginalEntry.yaml (object|array union)

# identifiers field that accepts both dict and array formats
attributes:
  identifiers:
    range: Any  # Required for flexible typing
    description: >-
      Identifiers from original source. Accepts both dict format
      (e.g., {isil: "XX-123"}) and array format
      (e.g., [{scheme: "isil", value: "XX-123"}])      
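
Because range: Any lets both shapes through validation, code that consumes identifiers has to handle both. The following is a minimal sketch of one way to do that, assuming the dict format maps scheme names to values as in the description above. The function name normalize_identifiers and the choice of the array format as the canonical shape are illustrative assumptions, not part of the GLAM codebase.

```python
from typing import Dict, List, Union

IdentifierList = List[Dict[str, str]]


def normalize_identifiers(raw: Union[Dict[str, str], IdentifierList]) -> IdentifierList:
    """Convert either accepted shape into the array format (hypothetical helper)."""
    if isinstance(raw, dict):
        # Dict format: {scheme: value}
        return [{"scheme": scheme, "value": str(value)} for scheme, value in raw.items()]
    if isinstance(raw, list):
        # Array format: [{scheme: ..., value: ...}]
        return [{"scheme": item["scheme"], "value": str(item["value"])} for item in raw]
    raise TypeError(f"unsupported identifiers shape: {type(raw).__name__}")


# Both inputs produce the same normalized list.
print(normalize_identifiers({"isil": "XX-123"}))
print(normalize_identifiers([{"scheme": "isil", "value": "XX-123"}]))
```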

Example 4: OriginalEntryLocation.yaml

attributes:
  geonames_id:
    range: Any  # Required for string|integer
    any_of:
      - range: string
      - range: integer
    description: GeoNames ID (may be string or integer depending on source)

Validation Behavior

Schema Definition          Integer Data  String Data  Result
range: string              FAIL          PASS         Strict string only
range: integer             PASS          FAIL         Strict integer only
any_of without range: Any  FAIL          FAIL         Broken - nothing works
any_of with range: Any     PASS          PASS         Correct union behavior

Why This Happens

LinkML's validation engine processes range first to determine the basic type constraint. When range is not specified, the slot falls back to the default range (string), and the validator applies that constraint before checking any_of. Specifying range: Any tells the validator to defer type checking to the any_of constraints.
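
The ordering described above can be illustrated with a toy model. This is a sketch of the explanation only, not LinkML's actual implementation: the validate function, the PY_TYPES mapping, and the default_range parameter are hypothetical names introduced for illustration.

```python
from typing import Any

# Hypothetical mapping from range names to Python types (illustration only).
PY_TYPES = {"string": str, "integer": int}


def matches(range_name: str, value: Any) -> bool:
    """Return True if value fits the named range in this toy model."""
    if range_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, PY_TYPES[range_name])


def validate(slot: dict, value: Any, default_range: str = "string") -> bool:
    """Check the range first, then any_of, mirroring the order described above."""
    effective_range = slot.get("range", default_range)
    if effective_range != "Any" and not matches(effective_range, value):
        return False  # rejected before any_of is consulted
    branches = slot.get("any_of")
    if branches is None:
        return True
    return any(matches(branch["range"], value) for branch in branches)


union = [{"range": "string"}, {"range": "integer"}]
without_any = {"any_of": union}
with_any = {"range": "Any", "any_of": union}

print(validate(without_any, 42))  # False: the fallback string range rejects it
print(validate(with_any, 42))     # True: the check is deferred to any_of
print(validate(with_any, "42"))   # True
```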

Checklist for Union Types

When adding a field that accepts multiple types:

  • Define the any_of block with all acceptable ranges
  • Add range: Any at the same level as any_of
  • Test with sample data of each type
  • Document the accepted types in the description
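
The second checklist item can be checked mechanically. The sketch below walks an already-parsed schema (for example, the dict returned by a YAML loader) and reports every mapping that declares any_of ranges without range: Any. The function name find_unions_missing_any and the class names in the sample dict are assumptions made for illustration.

```python
from typing import Iterator


def find_unions_missing_any(node: object, path: str = "") -> Iterator[str]:
    """Yield dotted paths of mappings with any_of ranges but no range: Any."""
    if isinstance(node, dict):
        branches = node.get("any_of")
        has_range_union = isinstance(branches, list) and any(
            isinstance(branch, dict) and "range" in branch for branch in branches
        )
        if has_range_union and node.get("range") != "Any":
            yield path or "<root>"
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else str(key)
            yield from find_unions_missing_any(child, child_path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_unions_missing_any(child, f"{path}[{index}]")


# Sample parsed schema; class names are assumed from the file names in this rule.
schema = {
    "classes": {
        "OriginalEntryIdentifier": {
            "attributes": {
                "identifier_value": {
                    "any_of": [{"range": "string"}, {"range": "integer"}],
                },
            },
        },
        "OriginalEntryLocation": {
            "attributes": {
                "geonames_id": {
                    "range": "Any",
                    "any_of": [{"range": "string"}, {"range": "integer"}],
                },
            },
        },
    },
}

print(list(find_unions_missing_any(schema)))
# ['classes.OriginalEntryIdentifier.attributes.identifier_value']
```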

See Also

  • LinkML Documentation: Union Types
  • GLAM Validation: schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml
  • Validation command: linkml-validate -s <schema>.yaml <data>.yaml
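
To exercise a union field with one sample per accepted type, the validation command above can be generated for each data file. This is a small sketch: the helper name build_validate_command and the sample file names are hypothetical, and only the command form already shown in this rule is used.

```python
import shlex
from typing import List


def build_validate_command(schema_path: str, data_path: str) -> List[str]:
    """Build the argv for: linkml-validate -s <schema>.yaml <data>.yaml"""
    return ["linkml-validate", "-s", schema_path, data_path]


# Hypothetical sample files, one per accepted type of the union field.
samples = ["sample_string_id.yaml", "sample_integer_id.yaml"]
for data_file in samples:
    argv = build_validate_command("schema.yaml", data_file)
    # Print the command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```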

Migration Notes

Affected Files (Fixed January 2026):

  • OriginalEntryIdentifier.yaml - identifier_value
  • Identifier.yaml - identifier_value slot_usage
  • WikidataSocialMedia.yaml - youtube_channel_id, facebook_id, instagram_username, linkedin_company_id, twitter_username, facebook_page_id
  • YoutubeEnrichment.yaml - channel_id
  • OriginalEntryLocation.yaml - geonames_id
  • OriginalEntry.yaml - identifiers

Version: 1.0
Created: 2026-01-18
Author: AI Agent (OpenCode Claude)