id: https://w3id.org/heritage/custodian/provenance name: heritage-custodian-provenance title: Heritage Custodian Provenance Module description: >- Provenance tracking and organizational change event classes for heritage custodians. Implements W3C PROV-O patterns for data quality tracking and TOOI Wijzigingsgebeurtenis patterns for institutional lifecycle events. Tracks GHCID history and data lineage. license: https://creativecommons.org/publicdomain/zero/1.0/ version: 0.2.2 prefixes: linkml: https://w3id.org/linkml/ heritage: https://w3id.org/heritage/custodian/ prov: http://www.w3.org/ns/prov# tooi: https://identifier.overheid.nl/tooi/def/ont/ dcterms: http://purl.org/dc/terms/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# rdfs: http://www.w3.org/2000/01/rdf-schema# foaf: http://xmlns.com/foaf/0.1/ adms: http://www.w3.org/ns/adms# default_prefix: heritage default_range: string imports: - linkml:types - enums - core # ============================================================================= # PROVENANCE CLASSES # ============================================================================= classes: # PROV-O Proxy Classes # These define local equivalents of W3C PROV-O classes to avoid external dependencies # while maintaining RDF alignment via class_uri mappings ProvenanceEntity: description: >- Local proxy for prov:Entity. Represents a physical, digital, conceptual, or other kind of thing with fixed aspects. Used as mixin for classes that need PROV-O alignment (e.g., HeritageCustodian). class_uri: prov:Entity abstract: true notes: >- This is a mixin class providing PROV-O alignment. Do not instantiate directly. In RDF serialization, classes using this mixin will be typed as prov:Entity. ProvenanceActivity: description: >- Local proxy for prov:Activity. Represents something that occurs over a period of time and acts upon or with entities. Used as base for ChangeEvent. class_uri: prov:Activity abstract: true notes: >- This is an abstract base class providing PROV-O alignment. In RDF serialization, subclasses will be typed as prov:Activity. # TOOI Ontology Proxy Classes # Local proxy for Dutch government organizational ontology (TOOI) TOOIWijzigingsgebeurtenis: description: >- Local proxy for tooi:Wijzigingsgebeurtenis (Change Event in TOOI ontology). Represents organizational change events in Dutch government organizations. Used as mixin for ChangeEvent to align with TOOI patterns. class_uri: tooi:Wijzigingsgebeurtenis abstract: true notes: >- This is a mixin class providing TOOI ontology alignment for Dutch institutions. In RDF serialization, ChangeEvent will also be typed as tooi:Wijzigingsgebeurtenis. Provenance: description: Provenance information for data quality tracking slots: - data_source - data_tier - extraction_date - extraction_method - confidence_score - conversation_id - source_url - verified_date - verified_by - enrichment_history - provenance_notes EnrichmentHistoryEntry: description: >- Record of a data enrichment activity (e.g., Wikidata lookup, geocoding, identifier resolution). Tracks what was enriched, when, how, and with what quality metrics. slots: - enrichment_date - enrichment_method - enrichment_type - match_score - verified - enrichment_source - enrichment_notes slot_usage: enrichment_date: required: true description: When the enrichment was performed (ISO 8601 timestamp) enrichment_method: required: true description: >- Method used for enrichment (e.g., "Wikidata SPARQL fuzzy matching", "Nominatim geocoding API", "Manual identifier lookup") enrichment_type: required: true description: >- Type of enrichment performed (e.g., "wikidata_identifier", "geocoding", "viaf_identifier", "isil_code") match_score: required: false description: >- Fuzzy matching confidence score (0.0-1.0) for automated enrichments. Null for manual enrichments or non-matching operations. verified: required: true description: >- Whether the enrichment has been manually verified (true) or is automated/unverified (false) enrichment_source: required: false description: >- Source of the enrichment data (e.g., "https://www.wikidata.org", "https://nominatim.openstreetmap.org", "VIAF API") enrichment_notes: required: false description: >- Additional notes about the enrichment (e.g., "City verification passed", "False positive removed", "Matched alternative name") GHCIDHistoryEntry: description: >- Historical record tracking GHCID changes over time. Each entry represents a period when a specific GHCID was valid for this institution. slots: - ghcid - ghcid_numeric - valid_from - valid_to - reason - institution_name - location_city - location_country slot_usage: ghcid: required: true description: The GHCID string that was valid during this period ghcid_numeric: required: true description: Numeric hash of the GHCID (persistent identifier) valid_from: required: true description: When this GHCID became valid (ISO 8601 timestamp) valid_to: required: false description: When this GHCID was superseded (null = still current) reason: required: true description: Reason for this identifier or change (e.g., "Initial identifier", "Relocated to Amsterdam", "Name change") institution_name: required: true description: Institution name during this period location_city: required: true description: City location during this period location_country: required: true description: Country location during this period ChangeEvent: description: >- A significant organizational change event in an institution's lifecycle. Based on TOOI Wijzigingsgebeurtenis and W3C PROV-O Activity patterns. Tracks mergers, name changes, relocations, and other structural changes. is_a: ProvenanceActivity mixins: - TOOIWijzigingsgebeurtenis slots: - event_id - change_type - event_date - event_description - affected_organization - resulting_organization - related_organizations - source_documentation slot_usage: event_id: required: true identifier: true change_type: required: true event_date: required: true # ============================================================================= # SLOTS # ============================================================================= slots: # Provenance fields data_source: description: Source of this data record range: DataSourceEnum required: true data_tier: description: Data quality tier (authority level) range: DataTierEnum required: true extraction_date: description: Date the data was extracted or created range: datetime required: true extraction_method: description: Method used to extract data (NLP model, manual, API, etc.) range: string confidence_score: description: Confidence score for NLP-extracted data (0.0-1.0) range: float minimum_value: 0.0 maximum_value: 1.0 conversation_id: description: UUID of conversation if extracted from Claude conversation range: string source_url: description: URL of the source range: uri verified_date: description: Date the data was verified range: datetime verified_by: description: Person or system that verified the data range: string provenance_notes: description: >- Additional notes about data quality, extraction issues, conflicts resolved, or other provenance metadata. Use for documenting uncertainties, data conflicts, manual corrections, or contextual information about the data source. range: string required: false slot_uri: rdfs:comment aliases: - notes comments: - "Uses RDF Schema comment property for annotation alignment" - "For observations about data reliability, use confidence_score instead" - "For enrichment details, use enrichment_history instead" - "Accepts 'notes' as alias for backward compatibility with extracted data" # GHCID history fields ghcid: description: | Human-readable GHCID string. Format: {Country}-{Region}-{City}-{Type}-{Abbreviation}[-{name_suffix}] The optional name_suffix is used for collision resolution and consists of the institution's full official name in native language, converted to snake_case format (lowercase, underscores for spaces, no diacritics/punctuation). Examples: - NL-NH-AMS-M-RM (Rijksmuseum, no collision) - NL-NH-AMS-M-SM-stedelijk_museum_amsterdam (Stedelijk Museum, with collision suffix) - FR-IDF-PAR-M-MO-musee_dorsay (Musée d'Orsay, with collision suffix) range: string pattern: '^[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}-[A-Z]-[A-Z0-9]{1,10}(-[a-z0-9_]+)?$' valid_from: description: Timestamp when this record/identifier became valid range: datetime valid_to: description: Timestamp when this record/identifier was superseded (null = still current) range: datetime required: false reason: description: Reason for identifier assignment or change range: string institution_name: description: Institution name at a specific point in time range: string location_city: description: City location at a specific point in time range: string location_country: description: Country location at a specific point in time range: string # ChangeEvent slots event_id: description: Unique identifier for this change event range: uriorcurie identifier: true slot_uri: dcterms:identifier change_type: description: Type of organizational change range: ChangeTypeEnum required: true slot_uri: rdf:type event_date: description: Date when the change event occurred range: date required: true slot_uri: prov:atTime event_description: description: Textual description of the change event range: string slot_uri: dcterms:description affected_organization: description: >- The organization that was affected by this change event. For mergers/acquisitions, this is the organization being absorbed. range: HeritageCustodian slot_uri: prov:entity resulting_organization: description: >- The organization resulting from this change event. For mergers, this is the surviving/new organization. range: HeritageCustodian slot_uri: prov:generated related_organizations: description: >- Other organizations involved in this change event. For mergers, these are the other merging parties. range: HeritageCustodian multivalued: true slot_uri: prov:wasAssociatedWith source_documentation: description: URL or reference to documentation of this change event range: uri slot_uri: dcterms:source # EnrichmentHistoryEntry slots enrichment_history: description: >- Chronological log of data enrichment activities performed on this record (e.g., Wikidata lookups, geocoding, identifier resolution) range: EnrichmentHistoryEntry multivalued: true inlined_as_list: true enrichment_date: description: Timestamp when the enrichment was performed range: datetime required: true slot_uri: prov:atTime comments: - Maps to PROV-O timestamp for enrichment activity enrichment_method: description: >- Method used for enrichment (e.g., "Wikidata SPARQL fuzzy matching", "Nominatim geocoding API", "Manual identifier lookup") range: string required: true slot_uri: prov:hadPlan comments: - Maps to PROV-O plan/method used for enrichment activity enrichment_type: description: >- Type of enrichment performed. Common values: "wikidata_identifier", "geocoding", "viaf_identifier", "isil_code", "ghcid_generation", "false_positive_removal" range: EnrichmentTypeEnum required: true slot_uri: rdf:type comments: - Controlled vocabulary for enrichment activity types match_score: description: >- Fuzzy matching confidence score (0.0-1.0) for automated enrichments. Null for manual enrichments or non-matching operations. range: float minimum_value: 0.0 maximum_value: 1.0 required: false slot_uri: adms:confidence comments: - Maps to ADMS (Asset Description Metadata Schema) confidence score verified: description: >- Whether the enrichment has been manually verified (true) or is automated/unverified (false) range: boolean required: true slot_uri: adms:status comments: - Maps to ADMS verification status enrichment_source: description: >- Source of the enrichment data (e.g., "https://www.wikidata.org", "https://nominatim.openstreetmap.org", "VIAF API") range: uri required: false slot_uri: dcterms:source comments: - Uses Dublin Core Terms source property enrichment_notes: description: >- Additional notes about the enrichment (e.g., "City verification passed", "False positive removed", "Matched alternative name") range: string required: false slot_uri: dcterms:description comments: - Human-readable description of enrichment details