# - Add emic_name, name_language, standardized_name to CustodianName
# - Add scripts for enriching custodian emic names from Wikidata
# - Add YouTube and Google Maps enrichment scripts
# - Update DuckLake loader for new schema fields
id: https://nde.nl/ontology/hc/class/CustodianName
name: CustodianName
title: Custodian Name Class

imports:
  - linkml:types
  - ./Custodian
  - ./CustodianObservation
  - ./ReconstructionActivity
  - ./TimeSpan
  - ./ReconstructedEntity

classes:
  CustodianName:
    is_a: ReconstructedEntity
    class_uri: skos:Concept
    description: |
      Standardized emic (insider) name DERIVED FROM CustodianObservation(s).

      CRITICAL: CustodianName is NOT a subclass of CustodianObservation!
      - CustodianObservation = Evidence seen in sources (input)
      - CustodianName = Standardized interpretation (output)
      - Relationship: CustodianName prov:wasDerivedFrom CustodianObservation

      CustodianName represents the CANONICAL LABEL - the standardized form
      accepted by the custodian itself for public identification.

      IMPORTANT: CustodianName ≠ Legal Name
      - CustodianName = How custodian presents itself (emic, operational)
      - Legal Name = Formal registered name (in CustodianLegalStatus)
      - Example: "Rijksmuseum" (emic) vs "Stichting Rijksmuseum" (legal)

      ===========================================================================
      MANDATORY RULE: Legal Form Terms MUST Be Filtered
      ===========================================================================

      Legal form designations (Stichting, Foundation, Inc., Ltd., GmbH, etc.)
      MUST ALWAYS be removed from CustodianName, even when the custodian
      self-identifies with them. This is the ONE EXCEPTION to the emic principle.

      RATIONALE:
      1. Legal form is METADATA about the entity, not part of its identity
      2. Legal forms change (foundation→corporation) but identity persists
      3. Enables consistent cross-jurisdictional comparison
      4. Prevents duplicate entries ("X Foundation" vs "X")
      5. Aligns with ISO 20275 (Entity Legal Forms) principles

      EXAMPLES:
      - "Stichting Rijksmuseum" → CustodianName: "Rijksmuseum"
      - "Hidde Nijland Stichting" → CustodianName: "Hidde Nijland"
      - "The Getty Foundation" → CustodianName: "The Getty"
      - "British Museum Trust Ltd" → CustodianName: "British Museum"
      - "Fundação Biblioteca Nacional" → CustodianName: "Biblioteca Nacional"

      LEGAL FORM TERMS TO FILTER (partial list by jurisdiction):
      - Dutch: Stichting, Vereniging, Coöperatie, B.V., N.V., V.O.F.
      - English: Foundation, Trust, Inc., Ltd., LLC, Corp., Association
      - German: Stiftung, Verein, e.V., GmbH, AG
      - French: Fondation, Association, S.A., S.A.R.L.
      - Spanish: Fundación, Asociación, S.A., S.L.
      - Portuguese: Fundação, Associação, Ltda., S.A.
      - Italian: Fondazione, Associazione, S.p.A., S.r.l.

      See: .opencode/LEGAL_FORM_FILTERING_RULE.md for comprehensive global list
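
      ILLUSTRATION (sketch only, not the project's implementation):
      The filtering rule above could be approximated as follows. The function
      name strip_legal_form and the LEGAL_FORM_TERMS set are hypothetical, and
      the set holds only a few of the terms listed above.

      ```python
      # Hypothetical, partial term list drawn from the examples above.
      LEGAL_FORM_TERMS = {
          "stichting", "vereniging", "foundation", "trust", "inc", "ltd",
          "llc", "corp", "stiftung", "gmbh", "fondation", "fundación",
          "fundação", "fondazione",
      }

      def strip_legal_form(name: str) -> str:
          """Drop whole-word legal form terms and keep the other tokens in order."""
          kept = [
              token for token in name.split()
              # Ignore trailing punctuation so that "Inc." matches "inc".
              if token.strip(".,").lower() not in LEGAL_FORM_TERMS
          ]
          return " ".join(kept)

      print(strip_legal_form("Stichting Rijksmuseum"))
      print(strip_legal_form("British Museum Trust Ltd"))
      ```

      Whole-token matching is deliberately naive: it would also strip "Trust"
      from a name in which that word is part of the identity, so borderline
      cases still need manual review.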

      ===========================================================================
      MANDATORY RULE: Special Characters MUST Be Excluded from Abbreviations
      ===========================================================================

      When generating abbreviations for GHCID, special characters and symbols
      MUST be completely removed. Only alphabetic characters (A-Z) are permitted
      in the abbreviation component of the GHCID.

      RATIONALE:
      1. URL/URI safety - Special characters require encoding in URIs
      2. Filename safety - Characters like &, /, \, : are invalid in filenames
      3. Parsing consistency - Avoids delimiter conflicts in data pipelines
      4. Cross-system compatibility - Ensures interoperability with all systems
      5. Human readability - Clean identifiers are easier to communicate

      CHARACTERS TO REMOVE (exhaustive list):
      - Ampersand: & (e.g., "Records & Archives" → "RA", not "R&A")
      - Slash: / (e.g., "Art/Design Museum" → "ADM", not "A/DM")
      - Backslash: \
      - Plus: + (e.g., "Culture+" → "C")
      - At sign: @
      - Hash/Pound: #
      - Percent: %
      - Dollar: $
      - Asterisk: *
      - Parentheses: ( )
      - Brackets: [ ] { }
      - Pipe: |
      - Colon: :
      - Semicolon: ;
      - Quotation marks: " ' `
      - Comma: ,
      - Period: . (also inside dotted abbreviations, e.g., "U.S." → "US")
      - Hyphen: - (skip, do not replace with letter)
      - Underscore: _
      - Equals: =
      - Question mark: ?
      - Exclamation: !
      - Tilde: ~
      - Caret: ^
      - Less/Greater than: < >

      EXAMPLES:
      - "Department of Records & Information Management" → "DRIM" (not "DR&IM")
      - "Art + Culture Center" → "ACC" (not "A+CC")
      - "Museum/Gallery Amsterdam" → "MGA" (not "M/GA")
      - "Heritage@Digital" → "HD" (not "H@D")
      - "Archives (Historical)" → "AH" (not "A(H)")

      See: .opencode/ABBREVIATION_SPECIAL_CHAR_RULE.md for complete documentation
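
      ILLUSTRATION (sketch only, not the project's implementation):
      One way to reproduce the examples above. The function name abbreviate
      and the STOPWORDS set are hypothetical: the examples skip "of", but the
      full list of skipped words is not given here. The sketch also assumes
      that the input is already ASCII and that every removed character,
      hyphens and digits included, acts as a word break.

      ```python
      import re

      # Hypothetical stopword list; only "of" is confirmed by the examples above.
      STOPWORDS = {"of", "the", "and"}

      def abbreviate(name: str) -> str:
          """Build an A-Z abbreviation from the initial of each significant word."""
          # Anything that is not an ASCII letter or whitespace becomes a word break.
          cleaned = re.sub(r"[^A-Za-z\s]", " ", name)
          words = [w for w in cleaned.split() if w.lower() not in STOPWORDS]
          return "".join(w[0].upper() for w in words)

      print(abbreviate("Department of Records & Information Management"))
      print(abbreviate("Heritage@Digital"))
      ```

      Non-ASCII letters would be dropped by this regular expression, so
      diacritics need to be normalized before the abbreviation is built.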

      ===========================================================================
      MANDATORY RULE: Diacritics MUST Be Normalized to ASCII in Abbreviations
      ===========================================================================

      When generating abbreviations for GHCID, diacritics (accented characters)
      MUST be normalized to their ASCII base letter equivalents. Only ASCII
      uppercase letters (A-Z) are permitted in the abbreviation component.

      RATIONALE:
      1. URI/URL safety - Non-ASCII requires percent-encoding
      2. Cross-system compatibility - ASCII is universally supported
      3. Parsing consistency - No special character handling needed
      4. Human readability - Easier to type and communicate

      DIACRITICS TO NORMALIZE (examples by language):
      - Czech: Č→C, Ř→R, Š→S, Ž→Z, Ě→E, Ů→U
      - Polish: Ł→L, Ń→N, Ó→O, Ś→S, Ź→Z, Ż→Z, Ą→A, Ę→E
      - German: Ä→A, Ö→O, Ü→U, ß→SS
      - French: É→E, È→E, Ê→E, Ç→C, Ô→O
      - Spanish: Ñ→N, Á→A, É→E, Í→I, Ó→O, Ú→U
      - Nordic: Å→A, Ä→A, Ö→O, Ø→O, Æ→AE

      EXAMPLES:
      - "Vlastivědné muzeum" (Czech) → "VM" (diacritics occur only inside the word, so the initials are unaffected)
      - "Österreichische Nationalbibliothek" (German) → "ON"
      - "Bibliothèque nationale" (French) → "BN"

      REAL-WORLD EXAMPLE:
      - ❌ WRONG: CZ-VY-TEL-L-VHSPAOČRZS (contains Č)
      - ✅ CORRECT: CZ-VY-TEL-L-VHSPAOCRZS (ASCII only)

      IMPLEMENTATION:
      ```python
      import unicodedata
      # Ł, Ø, Æ and ß have no canonical decomposition, so map them explicitly first
      text = text.translate(str.maketrans({'Ł': 'L', 'ł': 'l', 'Ø': 'O', 'ø': 'o', 'Æ': 'AE', 'æ': 'ae', 'ß': 'SS'}))
      # NFD splits the remaining accented letters into base letter + combining mark
      normalized = unicodedata.normalize('NFD', text)
      ascii_text = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')
      ```

      See: .opencode/ABBREVIATION_SPECIAL_CHAR_RULE.md for complete documentation
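
      ILLUSTRATION (sketch only, not the project's implementation):
      A small check for the real-world example above. It assumes that the
      abbreviation is the last hyphen-separated component of the GHCID, which
      matches the example but is not specified in this section. The function
      name abbreviation_is_ascii is hypothetical.

      ```python
      import re

      def abbreviation_is_ascii(ghcid: str) -> bool:
          """Return True if the final GHCID component contains only A-Z."""
          # Assumption: the abbreviation is the last hyphen-separated component.
          abbreviation = ghcid.rsplit("-", 1)[-1]
          return re.fullmatch(r"[A-Z]+", abbreviation) is not None

      print(abbreviation_is_ascii("CZ-VY-TEL-L-VHSPAOČRZS"))
      print(abbreviation_is_ascii("CZ-VY-TEL-L-VHSPAOCRZS"))
      ```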

      Can be generated by:
      1. ReconstructionActivity (formal entity resolution) - was_generated_by link
      2. Direct extraction (simple standardization) - no was_generated_by link
    exact_mappings:
      - skos:prefLabel
      - schema:name
      - foaf:name
    close_mappings:
      - rdfs:label
      - dcterms:title
      - org:legalName
      - tooi:officieleNaamInclSoort
      - rico:name
    related_mappings:
      - skos:altLabel
      - schema:alternateName
      - foaf:nick
      - gleif:hasOtherName
    slots:
      - emic_name
      - name_language
      - standardized_name
      - alternative_names
      - endorsement_source
      - name_authority
      - valid_from
      - valid_to
      - name_validity_period
      - supersedes
      - superseded_by
      - was_derived_from
      - was_generated_by
      - refers_to_custodian
    slot_usage:
      emic_name:
        slot_uri: skos:prefLabel
        description: |
          The observed name as the custodian refers to itself in source materials,
          preserving the custodian's own naming convention. This is descriptive
          data, not an identifier - the custodian is identified by its hc_id.
        range: string
        required: true
      name_language:
        slot_uri: dcterms:language
        description: |
          The language or locale code (ISO 639-1 or BCP 47) of the emic name.
          Examples: 'nl', 'en', 'pt-BR'
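
          Illustration (sketch only): which codes the pattern declared for this
          slot accepts. The regular expression is copied from the slot's
          pattern; the helper name is_valid_name_language is hypothetical.

          ```python
          import re

          # Pattern copied from the name_language slot definition.
          NAME_LANGUAGE = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")

          def is_valid_name_language(code: str) -> bool:
              """Return True if the code is 'xx' or 'xx-XX'."""
              return NAME_LANGUAGE.fullmatch(code) is not None

          for code in ["nl", "pt-BR", "nld", "zh-Hans"]:
              print(code, is_valid_name_language(code))
          ```

          The pattern covers two-letter language codes with an optional
          two-letter region. Other valid BCP 47 tags, such as 'zh-Hans', do
          not match it.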
        range: string
        pattern: "^[a-z]{2}(-[A-Z]{2})?$"
      standardized_name:
        slot_uri: skos:prefLabel
        description: "The canonical emic name accepted by custodian itself (REQUIRED)"
        range: string
        required: true
      alternative_names:
        slot_uri: skos:altLabel
        description: |
          Alternative names and label variants for this custodian name.

          SKOS: altLabel for alternative lexical labels.
          W3C Org: Recommended for trading names, colloquial names, abbreviations.

          Examples:
          - "BnF" (abbreviation for "Bibliothèque nationale de France")
          - "Rijks" (colloquial for "Rijksmuseum")
          - "National Library of France" (English translation)
          - Historical spellings and variants

          These are NOT the preferred/canonical name but are recognized variants
          that people use to refer to the same custodian.
        range: CustodianAppellation
        multivalued: true
        inlined_as_list: true
      endorsement_source:
        slot_uri: prov:hadPrimarySource
        description: "Source proving custodian acceptance of this name (REQUIRED)"
        range: uriorcurie
        required: true
      name_authority:
        slot_uri: prov:wasAttributedTo
        description: "Authority that authorized this name"
        range: string
      valid_from:
        slot_uri: schema:validFrom
        description: "Date when this name became official/valid"
        range: date
      valid_to:
        slot_uri: schema:validUntil
        description: "Date when this name ceased to be valid (null if current)"
        range: date
      name_validity_period:
        slot_uri: crm:P4_has_time-span
        description: |
          Temporal period during which this name was valid (with fuzzy boundaries).
          CIDOC-CRM: P4_has_time-span links to E52_Time-Span for uncertain validity periods.

          Use this when name validity dates are uncertain:
          - "Name adopted sometime in the 1920s"
          - "Name changed around 1950"
          - "Name used from approximately 1800 to 1850"

          For precise dates, use valid_from/valid_to instead.
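
          Illustration (sketch only): how four fuzzy boundaries can be read.
          FuzzySpan is a hypothetical stand-in for TimeSpan that reuses the
          boundary names from this slot's example value. The ordering check
          and the "possibly" and "certainly" readings are assumptions, not
          rules stated in this schema.

          ```python
          from dataclasses import dataclass
          from datetime import date

          @dataclass
          class FuzzySpan:
              """Hypothetical stand-in for TimeSpan with four boundary dates."""
              begin_of_the_begin: date
              end_of_the_begin: date
              begin_of_the_end: date
              end_of_the_end: date

              def is_ordered(self) -> bool:
                  # Assumed constraint: the four boundaries never decrease.
                  bounds = [self.begin_of_the_begin, self.end_of_the_begin,
                            self.begin_of_the_end, self.end_of_the_end]
                  return bounds == sorted(bounds)

              def possibly_valid_on(self, day: date) -> bool:
                  # The name may have been in use anywhere between the outer bounds.
                  return self.begin_of_the_begin <= day <= self.end_of_the_end

              def certainly_valid_on(self, day: date) -> bool:
                  # The name was in use throughout the span between the inner bounds.
                  return self.end_of_the_begin <= day <= self.begin_of_the_end

          # "Name adopted sometime in the 1920s, changed around 1950"
          span = FuzzySpan(date(1920, 1, 1), date(1929, 12, 31),
                           date(1945, 1, 1), date(1955, 12, 31))
          print(span.is_ordered(), span.certainly_valid_on(date(1935, 1, 1)))
          ```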
        range: TimeSpan
        examples:
          - value:
              begin_of_the_begin: "1920-01-01"
              end_of_the_begin: "1929-12-31"
              begin_of_the_end: "1945-01-01"
              end_of_the_end: "1955-12-31"
            description: "Name adopted sometime in the 1920s, changed around 1950"
      supersedes:
        slot_uri: dcterms:replaces
        description: "Previous CustodianName replaced by this one"
        range: CustodianName
      superseded_by:
        slot_uri: dcterms:isReplacedBy
        description: "Subsequent CustodianName that replaced this name"
        range: CustodianName
      was_derived_from:
        slot_uri: prov:wasDerivedFrom
        description: |
          CustodianObservation(s) from which this name was derived (REQUIRED).
          PROV-O: wasDerivedFrom establishes observation→name derivation.

          A name can be derived from multiple observations through consolidation:
          - "Rijks" (letterhead) + "Rijksmuseum Amsterdam" (ISIL) → "Rijksmuseum"

          This is NOT inheritance (is_a) but transformation (derived_from).
        range: CustodianObservation
        multivalued: true
        required: true
      was_generated_by:
        slot_uri: prov:wasGeneratedBy
        description: |
          ReconstructionActivity that generated this standardized name (optional).

          If present: Name created through formal entity resolution process
          If null: Name extracted directly without reconstruction activity

          PROV-O: wasGeneratedBy links Entity (CustodianName) to generating Activity.
        range: ReconstructionActivity
        required: false
        inverse: generates
      refers_to_custodian:
        slot_uri: dcterms:references
        description: |
          The Custodian hub that this name identifies (REQUIRED).

          Links the standardized name back to the hub it represents.
          The hub may also link back via skos:prefLabel if this is the preferred name.
        range: Custodian
        required: true