- Added `fix_dual_class_link.py` to remove dual class link references from specified YAML files.
- Created `fix_specific_ghosts.py` to apply specific replacements in YAML files based on defined mappings.
- Introduced `migrate_staff_count.py` to migrate staff count references to a new structure in specified YAML files.
- Developed `migrate_type_slots.py` to replace type-related slots with new identifiers across YAML files.
- Implemented `scan_ghost_references.py` to identify and report ghost references to archived slots and classes in YAML files.
- Added `verify_ontology_terms.py` to verify the presence of ontology terms in specified ontology files against schema definitions.
id: https://nde.nl/ontology/hc/class/CustodianName
name: CustodianName
title: Custodian Name Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
  schema: http://schema.org/
  foaf: http://xmlns.com/foaf/0.1/
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  dcterms: http://purl.org/dc/terms/
  org: http://www.w3.org/ns/org#
  tooi: https://identifier.overheid.nl/tooi/def/ont/
  rico: https://www.ica.org/standards/RiC/ontology#
  gleif: https://www.gleif.org/ontology/Base/
  crm: http://www.cidoc-crm.org/cidoc-crm/
  prov: http://www.w3.org/ns/prov#
imports:
  - linkml:types
  - ./Custodian
  - ./CustodianObservation
  - ./ReconstructionActivity
  - ./TimeSpan
  - ./ReconstructedEntity
  - ../slots/has_or_had_label
  - ../classes/Label
  - ../classes/LabelType
  - ../classes/LabelTypes
  - ../slots/name_language
  - ../slots/standardized_name
  - ../slots/endorsement_source
  - ../slots/name_authority
  - ../slots/temporal_extent
  - ../slots/name_validity_period
  - ../slots/supersede_name
  - ../slots/is_or_was_derived_from
  - ../slots/is_or_was_generated_by
  - ../slots/refers_to_custodian
  - ../slots/specificity_annotation
  - ../slots/has_or_had_score
  - ./SpecificityAnnotation
  - ./TemplateSpecificityScore
  - ./TemplateSpecificityType
  - ./TemplateSpecificityTypes
classes:
  CustodianName:
    is_a: ReconstructedEntity
    class_uri: skos:Concept
    description: "Standardized emic (insider) name DERIVED FROM CustodianObservation(s).\n\nCRITICAL: CustodianName is NOT a subclass of CustodianObservation!\n- CustodianObservation = Evidence seen in sources (input)\n- CustodianName = Standardized interpretation (output)\n- Relationship: CustodianName prov:wasDerivedFrom CustodianObservation\n\nCustodianName represents the CANONICAL LABEL - the standardized form\naccepted by the custodian itself for public identification.\n\nIMPORTANT: CustodianName \u2260 Legal Name\n- CustodianName = How the custodian presents itself (emic, operational)\n- Legal Name = Formal registered name (in CustodianLegalStatus)\n- Example: \"Rijksmuseum\" (emic) vs \"Stichting Rijksmuseum\" (legal)\n\n===========================================================================\nMANDATORY RULE: Legal Form Terms MUST Be Filtered\n===========================================================================\n\nLegal form designations (Stichting, Foundation, Inc., Ltd., GmbH,\
      \ etc.)\nMUST ALWAYS be removed from CustodianName, even when the custodian\nself-identifies with them. This is the ONE EXCEPTION to the emic principle.\n\nRATIONALE:\n1. Legal form is METADATA about the entity, not part of its identity\n2. Legal forms change (foundation\u2192corporation) but identity persists\n3. Enables consistent cross-jurisdictional comparison\n4. Prevents duplicate entries (\"X Foundation\" vs \"X\")\n5. Aligns with ISO 20275 (Entity Legal Forms, ELF) principles\n\nEXAMPLES:\n- \"Stichting Rijksmuseum\" \u2192 CustodianName: \"Rijksmuseum\"\n- \"Hidde Nijland Stichting\" \u2192 CustodianName: \"Hidde Nijland\"\n- \"The Getty Foundation\" \u2192 CustodianName: \"The Getty\"\n- \"British Museum Trust Ltd\" \u2192 CustodianName: \"British Museum\"\n- \"Funda\xE7\xE3o Biblioteca Nacional\" \u2192 CustodianName: \"Biblioteca Nacional\"\n\nLEGAL FORM TERMS TO FILTER (partial list by jurisdiction):\n- Dutch: Stichting, Vereniging, Co\xF6peratie, B.V., N.V., V.O.F.\n\
      - English: Foundation, Trust, Inc., Ltd., LLC, Corp., Association\n- German: Stiftung, Verein, e.V., GmbH, AG\n- French: Fondation, Association, S.A., S.A.R.L.\n- Spanish: Fundaci\xF3n, Asociaci\xF3n, S.A., S.L.\n- Portuguese: Funda\xE7\xE3o, Associa\xE7\xE3o, Ltda., S.A.\n- Italian: Fondazione, Associazione, S.p.A., S.r.l.\n\nSee: rules/LEGAL_FORM_FILTERING_RULE.md for the comprehensive global list\n\n===========================================================================\nMANDATORY RULE: Special Characters MUST Be Excluded from Abbreviations\n===========================================================================\n\nWhen generating abbreviations for GHCID, special characters and symbols\nMUST be completely removed. Only alphabetic characters (A-Z) are permitted\nin the has_or_had_abbreviation component of the GHCID.\n\nRATIONALE:\n1. URL/URI safety - Special characters require encoding in URIs\n2. Filename safety - Characters like /, \\ and : are invalid in filenames on common file systems\n3. Parsing\
      \ consistency - Avoids delimiter conflicts in data pipelines\n4. Cross-system compatibility - Ensures interoperability with all systems\n5. Human readability - Clean identifiers are easier to communicate\n\nCHARACTERS TO REMOVE (exhaustive list):\n- Ampersand: & (e.g., \"Records & Archives\" \u2192 \"RA\", not \"R&A\")\n- Slash: / (e.g., \"Art/Design Museum\" \u2192 \"ADM\", not \"A/DM\")\n- Backslash: \\\n- Plus: + (e.g., \"Culture+\" \u2192 \"C\")\n- At sign: @\n- Hash/Pound: #\n- Percent: %\n- Dollar: $\n- Asterisk: *\n- Parentheses: ( )\n- Brackets: [ ] { }\n- Pipe: |\n- Colon: :\n- Semicolon: ;\n- Quotation marks: \" ' `\n- Comma: ,\n- Period: . (also inside existing abbreviations, e.g., \"U.S.\" \u2192 \"US\")\n- Hyphen: - (drop it; do not replace it with a letter)\n- Underscore: _\n- Equals: =\n- Question mark: ?\n- Exclamation: !\n- Tilde: ~\n- Caret: ^\n- Less/Greater than: < >\n\nEXAMPLES:\n- \"Department of Records & Information Management\" \u2192 \"DRIM\" (not \"DR&IM\")\n\
      - \"Art + Culture Center\" \u2192 \"ACC\" (not \"A+CC\")\n- \"Museum/Gallery Amsterdam\" \u2192 \"MGA\" (not \"M/GA\")\n- \"Heritage@Digital\" \u2192 \"HD\" (not \"H@D\")\n- \"Archives (Historical)\" \u2192 \"AH\" (not \"A(H)\")\n\nSee: rules/ABBREVIATION_SPECIAL_CHAR_RULE.md for complete documentation\n\n===========================================================================\nMANDATORY RULE: Diacritics MUST Be Normalized to ASCII in Abbreviations\n===========================================================================\n\nWhen generating abbreviations for GHCID, diacritics (accented characters)\nMUST be normalized to their ASCII equivalents (usually the base letter). Only ASCII\nuppercase letters (A-Z) are permitted in the has_or_had_abbreviation component.\n\nRATIONALE:\n1. URI/URL safety - Non-ASCII requires percent-encoding\n2. Cross-system compatibility - ASCII is universally supported\n3. Parsing consistency - No special character handling needed\n4. Human readability - Easier to type\
      \ and communicate\n\nDIACRITICS TO NORMALIZE (examples by language):\n- Czech: \u010C\u2192C, \u0158\u2192R, \u0160\u2192S, \u017D\u2192Z, \u011A\u2192E, \u016E\u2192U\n- Polish: \u0141\u2192L, \u0143\u2192N, \xD3\u2192O, \u015A\u2192S, \u0179\u2192Z, \u017B\u2192Z, \u0104\u2192A, \u0118\u2192E\n- German: \xC4\u2192A, \xD6\u2192O, \xDC\u2192U, \xDF\u2192SS\n- French: \xC9\u2192E, \xC8\u2192E, \xCA\u2192E, \xC7\u2192C, \xD4\u2192O\n- Spanish: \xD1\u2192N, \xC1\u2192A, \xC9\u2192E, \xCD\u2192I, \xD3\u2192O, \xDA\u2192U\n- Nordic: \xC5\u2192A, \xC4\u2192A, \xD6\u2192O, \xD8\u2192O, \xC6\u2192AE\n\nEXAMPLES:\n- \"Vlastiv\u011Bdn\xE9 muzeum\" (Czech) \u2192 \"VM\"\n- \"\xD6sterreichische Nationalbibliothek\" (German) \u2192 \"ON\"\n- \"Biblioth\xE8que nationale\" (French) \u2192 \"BN\"\n\nREAL-WORLD EXAMPLE:\n- \u274C WRONG: CZ-VY-TEL-L-VHSPAO\u010CRZS (contains \u010C)\n- \u2705 CORRECT: CZ-VY-TEL-L-VHSPAOCRZS (ASCII only)\n\nIMPLEMENTATION:\n```python\n\
      import unicodedata\n\n# NFD leaves \xDF, \xD8, \xC6 and \u0141 undecomposed, so map them explicitly first\nSPECIAL = str.maketrans({'\xDF': 'ss', '\xD8': 'O', '\xF8': 'o', '\xC6': 'AE', '\xE6': 'ae', '\u0141': 'L', '\u0142': 'l'})\n\ndef to_ascii(text: str) -> str:\n    normalized = unicodedata.normalize('NFD', text.translate(SPECIAL))\n    return ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')\n```\n\nSee: rules/ABBREVIATION_SPECIAL_CHAR_RULE.md for complete documentation\n\nCan be generated by:\n1. ReconstructionActivity (formal entity resolution) - is_or_was_generated_by link\n2. Direct extraction (simple standardization) - no is_or_was_generated_by link\n"
    exact_mappings:
      - skos:prefLabel
      - schema:name
      - foaf:name
    close_mappings:
      - rdfs:label
      - dcterms:title
      - org:legalName
      - tooi:officieleNaamInclSoort
      - rico:name
    related_mappings:
      - skos:altLabel
      - schema:alternateName
      - foaf:nick
      - gleif:hasOtherName
    slots:
      - endorsement_source
      - has_or_had_label
      - name_authority
      - name_language
      - name_validity_period
      - refers_to_custodian
      - specificity_annotation
      - standardized_name
      - supersede_name
      - superseded_by_name
      - has_or_had_score
      - temporal_extent
      - is_or_was_derived_from
      - is_or_was_generated_by
    slot_usage:
      has_or_had_label:
        range: Label
        inlined: true
        multivalued: true
        description: 'The name(s) of the custodian.

          Includes:

          - Emic name (has_or_had_type: EmicLabel) - MIGRATED from emic_name

          - Alternative names (has_or_had_type: AlternativeName) - MIGRATED from has_or_had_alternative_name

          '
        examples:
          - value:
              has_or_had_label: Rijksmuseum
              has_or_had_type: EmicLabel
            description: Standardized emic name
          - value:
              has_or_had_label: State Museum
              has_or_had_type: AlternativeName
            description: Alternative name
      name_language:
        range: string
        pattern: ^[a-z]{2}(-[A-Z]{2})?$
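        # Illustrative values only, not taken from the source data. Assumption:
        # the pattern is meant for a lowercase two-letter language code
        # (ISO 639-1 style), optionally followed by an uppercase two-letter
        # region code (ISO 3166-1 alpha-2 style). Both values below match it.
        examples:
          - value: nl
            description: Dutch
          - value: en-GB
            description: English as used in the United Kingdom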
      standardized_name:
        range: string
        required: true
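        # Illustrative values drawn from the legal-form filtering examples in
        # the class description. Assumption: standardized_name holds the name
        # after legal-form terms have been removed.
        examples:
          - value: Rijksmuseum
            description: Standardized from "Stichting Rijksmuseum" with the legal form removed
          - value: British Museum
            description: Standardized from "British Museum Trust Ltd" with the legal forms removed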
      endorsement_source:
        range: uriorcurie
        required: true
      name_authority:
        range: string
      temporal_extent:
        description: 'Name validity period using CIDOC-CRM TimeSpan.

          MIGRATED from valid_from + valid_to per slot_fixes.yaml (Rule 53).

          NOTE: name_validity_period slot is ALSO available for backward compatibility, but new code should use temporal_extent for consistency with other classes.

          '
        range: TimeSpan
        inlined: true
        required: false
        examples:
          - value:
              begin_of_the_begin: '1920-01-01'
              end_of_the_end: '1950-12-31'
            description: Name valid from 1920 to 1950
      name_validity_period:
        range: TimeSpan
        examples:
          - value:
              begin_of_the_begin: '1920-01-01'
              end_of_the_begin: '1929-12-31'
              begin_of_the_end: '1945-01-01'
              end_of_the_end: '1955-12-31'
            description: Name adopted sometime in the 1920s, changed sometime between 1945 and 1955
      is_or_was_derived_from:
        range: CustodianObservation
        multivalued: true
        required: true
      is_or_was_generated_by:
        range: ReconstructionActivity
        required: false
      refers_to_custodian:
        range: Custodian
        required: true
    annotations:
      specificity_score: 0.1
      specificity_rationale: Generic utility class/slot created during migration
      custodian_types: "['*']"
      custodian_types_rationale: Universal utility concept