# glam/schemas/20251121/linkml/modules/classes/CustodianName.yaml

id: https://nde.nl/ontology/hc/class/CustodianName
name: CustodianName
title: Custodian Name Class

imports:
  - linkml:types
  - ./Custodian
  - ./CustodianObservation
  - ./ReconstructionActivity
  - ./TimeSpan
  - ./ReconstructedEntity

classes:
  CustodianName:
    is_a: ReconstructedEntity
    class_uri: skos:Concept
    description: |
      Standardized emic (insider) name DERIVED FROM CustodianObservation(s).

      CRITICAL: CustodianName is NOT a subclass of CustodianObservation!
      - CustodianObservation = Evidence seen in sources (input)
      - CustodianName = Standardized interpretation (output)
      - Relationship: CustodianName prov:wasDerivedFrom CustodianObservation

      CustodianName represents the CANONICAL LABEL - the standardized form
      accepted by the custodian itself for public identification.

      IMPORTANT: CustodianName ≠ Legal Name
      - CustodianName = How custodian presents itself (emic, operational)
      - Legal Name = Formal registered name (in CustodianLegalStatus)
      - Example: "Rijksmuseum" (emic) vs "Stichting Rijksmuseum" (legal)

      ===========================================================================
      MANDATORY RULE: Legal Form Terms MUST Be Filtered
      ===========================================================================

      Legal form designations (Stichting, Foundation, Inc., Ltd., GmbH, etc.)
      MUST ALWAYS be removed from CustodianName, even when the custodian
      self-identifies with them. This is the ONE EXCEPTION to the emic principle.

      RATIONALE:
      1. Legal form is METADATA about the entity, not part of its identity
      2. Legal forms change (foundation→corporation) but identity persists
      3. Enables consistent cross-jurisdictional comparison
      4. Prevents duplicate entries ("X Foundation" vs "X")
      5. Aligns with ISO 20275 (Entity Legal Forms) principles

      EXAMPLES:
      - "Stichting Rijksmuseum" → CustodianName: "Rijksmuseum"
      - "Hidde Nijland Stichting" → CustodianName: "Hidde Nijland"
      - "The Getty Foundation" → CustodianName: "The Getty"
      - "British Museum Trust Ltd" → CustodianName: "British Museum"
      - "Fundação Biblioteca Nacional" → CustodianName: "Biblioteca Nacional"

      LEGAL FORM TERMS TO FILTER (partial list by jurisdiction):
      - Dutch: Stichting, Vereniging, Coöperatie, B.V., N.V., V.O.F.
      - English: Foundation, Trust, Inc., Ltd., LLC, Corp., Association
      - German: Stiftung, Verein, e.V., GmbH, AG
      - French: Fondation, Association, S.A., S.A.R.L.
      - Spanish: Fundación, Asociación, S.A., S.L.
      - Portuguese: Fundação, Associação, Ltda., S.A.
      - Italian: Fondazione, Associazione, S.p.A., S.r.l.

      See: .opencode/LEGAL_FORM_FILTERING_RULE.md for comprehensive global list
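
      ILLUSTRATIVE SKETCH (an assumption, not the project's implementation):
      one way to apply this rule is to drop legal form tokens from the start
      and end of a name. The helper names and the LEGAL_FORMS subset below are
      hypothetical; a real filter would load the full list referenced above.

      ```python
      # Hypothetical subset of the legal form terms listed above (periods removed).
      LEGAL_FORMS = {
          "stichting", "vereniging", "bv", "nv", "foundation", "trust", "inc",
          "ltd", "llc", "corp", "stiftung", "ev", "gmbh", "fondation",
          "fundación", "fundação", "fondazione",
      }

      def _is_legal_form(token: str) -> bool:
          # Compare case-insensitively; "Ltd.", "B.V." and "Inc.," all match.
          return token.replace(".", "").strip(",").casefold() in LEGAL_FORMS

      def strip_legal_form(name: str) -> str:
          """Remove legal form tokens from the start and end of a name."""
          tokens = name.split()
          while tokens and _is_legal_form(tokens[0]):
              tokens.pop(0)
          while tokens and _is_legal_form(tokens[-1]):
              tokens.pop()
          return " ".join(tokens)

      print(strip_legal_form("British Museum Trust Ltd"))  # British Museum
      ```

      Only leading and trailing tokens are removed, so a legal form word in the
      middle of a name is left alone. A name consisting solely of legal form
      terms comes back empty and would need manual review.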

      ===========================================================================
      MANDATORY RULE: Special Characters MUST Be Excluded from Abbreviations
      ===========================================================================

      When generating abbreviations for GHCID, special characters and symbols
      MUST be completely removed. Only alphabetic characters (A-Z) are permitted
      in the abbreviation component of the GHCID.

      RATIONALE:
      1. URL/URI safety - Special characters require encoding in URIs
      2. Filename safety - Characters like /, \, : are invalid in filenames on
         common platforms, and & needs escaping in shells
      3. Parsing consistency - Avoids delimiter conflicts in data pipelines
      4. Cross-system compatibility - Ensures interoperability with all systems
      5. Human readability - Clean identifiers are easier to communicate

      CHARACTERS TO REMOVE (exhaustive list):
      - Ampersand: & (e.g., "Records & Archives" → "RA", not "R&A")
      - Slash: / (e.g., "Art/Design Museum" → "ADM", not "A/DM")
      - Backslash: \
      - Plus: + (e.g., "Culture+" → "C")
      - At sign: @
      - Hash/Pound: #
      - Percent: %
      - Dollar: $
      - Asterisk: *
      - Parentheses: ( )
      - Brackets: [ ] { }
      - Pipe: |
      - Colon: :
      - Semicolon: ;
      - Quotation marks: " ' `
      - Comma: ,
      - Period: . (removed even inside abbreviations, e.g., "U.S." → "US")
      - Hyphen: - (skip, do not replace with letter)
      - Underscore: _
      - Equals: =
      - Question mark: ?
      - Exclamation: !
      - Tilde: ~
      - Caret: ^
      - Less/Greater than: < >

      EXAMPLES:
      - "Department of Records & Information Management" → "DRIM" (not "DR&IM")
      - "Art + Culture Center" → "ACC" (not "A+CC")
      - "Museum/Gallery Amsterdam" → "MGA" (not "M/GA")
      - "Heritage@Digital" → "HD" (not "H@D")
      - "Archives (Historical)" → "AH" (not "A(H)")

      See: .opencode/ABBREVIATION_SPECIAL_CHAR_RULE.md for complete documentation
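
      ILLUSTRATIVE SKETCH (an assumption, not the project's implementation):
      the examples above can be reproduced by treating every non-letter as a
      word separator and taking the first letter of each remaining word. The
      function name and the STOPWORDS set are hypothetical; the source only
      shows "of" being skipped. Hyphens are treated as separators here, which
      is one possible reading of the hyphen rule. Non-ASCII letters should be
      normalized to ASCII before this step.

      ```python
      import re

      # Assumption: short function words are skipped, as "of" is in the examples.
      STOPWORDS = {"of", "the", "and", "for", "in"}

      def abbreviate(name: str) -> str:
          """Build an A-Z-only abbreviation from the first letter of each word."""
          # Every run of characters other than ASCII letters becomes a separator.
          words = re.sub(r"[^A-Za-z]+", " ", name).split()
          return "".join(w[0].upper() for w in words if w.lower() not in STOPWORDS)

      print(abbreviate("Department of Records & Information Management"))  # DRIM
      print(abbreviate("Heritage@Digital"))  # HD
      ```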

      ===========================================================================
      MANDATORY RULE: Diacritics MUST Be Normalized to ASCII in Abbreviations
      ===========================================================================

      When generating abbreviations for GHCID, diacritics (accented characters)
      MUST be normalized to their ASCII base letter equivalents. Only ASCII
      uppercase letters (A-Z) are permitted in the abbreviation component.

      RATIONALE:
      1. URI/URL safety - Non-ASCII requires percent-encoding
      2. Cross-system compatibility - ASCII is universally supported
      3. Parsing consistency - No special character handling needed
      4. Human readability - Easier to type and communicate

      DIACRITICS TO NORMALIZE (examples by language):
      - Czech: Č→C, Ř→R, Š→S, Ž→Z, Ě→E, Ů→U
      - Polish: Ł→L, Ń→N, Ó→O, Ś→S, Ź→Z, Ż→Z, Ą→A, Ę→E
      - German: Ä→A, Ö→O, Ü→U, ß→SS
      - French: É→E, È→E, Ê→E, Ç→C, Ô→O
      - Spanish: Ñ→N, Á→A, É→E, Í→I, Ó→O, Ú→U
      - Nordic: Å→A, Ä→A, Ö→O, Ø→O, Æ→AE

      EXAMPLES:
      - "Vlastivědné muzeum" (Czech) → "VM" (the ě is not an initial, so the abbreviation is unaffected)
      - "Österreichische Nationalbibliothek" (German) → "ON"
      - "Bibliothèque nationale" (French) → "BN"

      REAL-WORLD EXAMPLE:
      - ❌ WRONG:  CZ-VY-TEL-L-VHSPAOČRZS (contains Č)
      - ✅ CORRECT: CZ-VY-TEL-L-VHSPAOCRZS (ASCII only)

      IMPLEMENTATION:
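      ```python
      import unicodedata
      # Ł, Ø, Æ, ß have no canonical decomposition, so map them explicitly first
      SPECIAL = str.maketrans({'Ł': 'L', 'ł': 'l', 'Ø': 'O', 'ø': 'o', 'Æ': 'AE', 'æ': 'ae', 'ß': 'SS'})
      normalized = unicodedata.normalize('NFD', text.translate(SPECIAL))  # text: input name
      ascii_text = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')  # drop combining marks
      ```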

      See: .opencode/ABBREVIATION_SPECIAL_CHAR_RULE.md for complete documentation

      A CustodianName can be generated by:
      1. ReconstructionActivity (formal entity resolution) - was_generated_by link
      2. Direct extraction (simple standardization) - no was_generated_by link
    exact_mappings:
      - skos:prefLabel
      - schema:name
      - foaf:name
    close_mappings:
      - rdfs:label
      - dcterms:title
      - org:legalName
      - tooi:officieleNaamInclSoort
      - rico:name
    related_mappings:
      - skos:altLabel
      - schema:alternateName
      - foaf:nick
      - gleif:hasOtherName
    slots:
      - emic_name
      - name_language
      - standardized_name
      - alternative_names
      - endorsement_source
      - name_authority
      - valid_from
      - valid_to
      - name_validity_period
      - supersedes
      - superseded_by
      - was_derived_from
      - was_generated_by
      - refers_to_custodian
    slot_usage:
      emic_name:
        slot_uri: skos:prefLabel
        description: |
          The observed name as the custodian refers to itself in source materials,
          preserving the custodian's own naming convention. This is descriptive
          data, not an identifier - the custodian is identified by its hc_id.
        range: string
        required: true
      name_language:
        slot_uri: dcterms:language
        description: |
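          The language or locale code of the emic name: a two-letter ISO 639-1
          language code, optionally followed by a two-letter region subtag in
          BCP 47 style. Longer BCP 47 tags (e.g. script subtags) do not match
          the pattern below.
          Examples: 'nl', 'en', 'pt-BR'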
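
          Illustrative sketch (the function name is hypothetical): the pattern
          declared for this slot can be checked in Python as follows. It shows
          which codes the pattern accepts and which it rejects.

          ```python
          import re

          # Same regular expression as the pattern declared for name_language.
          NAME_LANGUAGE = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")

          def is_valid_name_language(code: str) -> bool:
              """Return True if the code is 'xx' or 'xx-YY'."""
              return NAME_LANGUAGE.fullmatch(code) is not None

          print(is_valid_name_language("pt-BR"))    # True
          print(is_valid_name_language("zh-Hant"))  # False: script subtag not covered
          ```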
        range: string
        pattern: "^[a-z]{2}(-[A-Z]{2})?$"
      standardized_name:
        slot_uri: skos:prefLabel
        description: "The canonical emic name accepted by the custodian itself (REQUIRED)"
        range: string
        required: true
      alternative_names:
        slot_uri: skos:altLabel
        description: |
          Alternative names and label variants for this custodian name.

          SKOS: altLabel for alternative lexical labels.
          W3C Org: Recommended for trading names, colloquial names, abbreviations.

          Examples:
          - "BnF" (abbreviation for "Bibliothèque nationale de France")
          - "Rijks" (colloquial for "Rijksmuseum")
          - "National Library of France" (English translation)
          - Historical spellings and variants

          These are NOT the preferred/canonical name but are recognized variants
          that people use to refer to the same custodian.
        range: CustodianAppellation
        multivalued: true
        inlined_as_list: true
      endorsement_source:
        slot_uri: prov:hadPrimarySource
        description: "Source proving custodian acceptance of this name (REQUIRED)"
        range: uriorcurie
        required: true
      name_authority:
        slot_uri: prov:wasAttributedTo
        description: "Authority that authorized this name"
        range: string
      valid_from:
        slot_uri: schema:validFrom
        description: "Date when this name became official/valid"
        range: date
      valid_to:
        slot_uri: schema:validUntil
        description: "Date when this name ceased to be valid (null if current)"
        range: date
      name_validity_period:
        slot_uri: crm:P4_has_time-span
        description: |
          Temporal period during which this name was valid (with fuzzy boundaries).
          CIDOC-CRM: P4_has_time-span links to E52_Time-Span for uncertain validity periods.

          Use this when name validity dates are uncertain:
          - "Name adopted sometime in the 1920s"
          - "Name changed around 1950"
          - "Name used from approximately 1800 to 1850"

          For precise dates, use valid_from/valid_to instead.
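
          Illustrative sketch (an assumption, not part of the schema): the four
          TimeSpan boundaries (begin_of_the_begin, end_of_the_begin,
          begin_of_the_end, end_of_the_end) can be read as an outer window in
          which the name may have been valid and an inner window in which it
          certainly was. FuzzySpan and name_status_on are hypothetical names.

          ```python
          from dataclasses import dataclass
          from datetime import date

          @dataclass
          class FuzzySpan:
              # Hypothetical stand-in for TimeSpan with its four boundary dates.
              begin_of_the_begin: date
              end_of_the_begin: date
              begin_of_the_end: date
              end_of_the_end: date

          def name_status_on(span: FuzzySpan, day: date) -> str:
              """Classify whether a name was in use on a given day."""
              if day < span.begin_of_the_begin or day > span.end_of_the_end:
                  return "not valid"
              if span.end_of_the_begin <= day <= span.begin_of_the_end:
                  return "valid"
              # Inside the uncertain start or end interval.
              return "possibly valid"

          span = FuzzySpan(date(1920, 1, 1), date(1929, 12, 31),
                           date(1945, 1, 1), date(1955, 12, 31))
          print(name_status_on(span, date(1935, 6, 1)))  # valid
          print(name_status_on(span, date(1950, 1, 1)))  # possibly valid
          ```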
        range: TimeSpan
        examples:
          - value:
              begin_of_the_begin: "1920-01-01"
              end_of_the_begin: "1929-12-31"
              begin_of_the_end: "1945-01-01"
              end_of_the_end: "1955-12-31"
            description: "Name adopted sometime in the 1920s, changed around 1950"
      supersedes:
        slot_uri: dcterms:replaces
        description: "Previous CustodianName replaced by this one"
        range: CustodianName
      superseded_by:
        slot_uri: dcterms:isReplacedBy
        description: "Subsequent CustodianName that replaced this name"
        range: CustodianName
      was_derived_from:
        slot_uri: prov:wasDerivedFrom
        description: |
          CustodianObservation(s) from which this name was derived (REQUIRED).
          PROV-O: wasDerivedFrom establishes observation→name derivation.

          A name can be derived from multiple observations through consolidation:
          - "Rijks" (letterhead) + "Rijksmuseum Amsterdam" (ISIL) → "Rijksmuseum"

          This is NOT inheritance (is_a) but transformation (derived_from).
        range: CustodianObservation
        multivalued: true
        required: true
      was_generated_by:
        slot_uri: prov:wasGeneratedBy
        description: |
          ReconstructionActivity that generated this standardized name (optional).

          If present: Name created through formal entity resolution process
          If null: Name extracted directly without reconstruction activity

          PROV-O: wasGeneratedBy links Entity (CustodianName) to generating Activity.
        range: ReconstructionActivity
        required: false
        inverse: generates
      refers_to_custodian:
        slot_uri: dcterms:references
        description: |
          The Custodian hub that this name identifies (REQUIRED).

          Links the standardized name back to the hub it represents.
          The hub may also link back via skos:prefLabel if this is the preferred name.
        range: Custodian
        required: true