glam/schemas/20251121/linkml/modules/classes/SourceRecord.yaml
kempersc f30b1777f4 Enhance schema definitions and introduce new classes for DigitalPlatformV2
- Added detailed descriptions for slots: collecting_scope, collection_access, custody_history, education_level, membership_size, and publication_activity to improve clarity and usability.
- Removed the publication_date slot due to migration to a new structure.
- Updated slot fixes with migration notes and adjustments for various slots, ensuring alignment with new ontology standards.
- Introduced new classes for DigitalPlatformV2, including DigitalPlatformV2DataQualityNotes, DigitalPlatformV2DataSource, DigitalPlatformV2KeyContact, DigitalPlatformV2OrganizationProfile, DigitalPlatformV2OrganizationStatus, DigitalPlatformV2PrimaryPlatform, DigitalPlatformV2Provenance, DigitalPlatformV2ServiceDetails, and DigitalPlatformV2TransformationMetadata, each with comprehensive attributes and descriptions.
- Added classes for EnrichmentProvenance and EnrichmentProvenanceEntry to track provenance for enrichment sources, including detailed attributes for verification and source tracking.
- Created LogoClaim, LogoEnrichment, and LogoEnrichmentSummary classes to manage logo and favicon data extracted from web scraping, with attributes for claims and summary statistics.
- Archived the publication_date slot to maintain historical records.
2026-01-18 00:59:51 +01:00

# SourceRecord - Individual source record with claims
# Extracted from custodian_source.yaml per Rule 38 (modular schema files)
# Extraction date: 2026-01-08
id: https://nde.nl/ontology/hc/classes/SourceRecord
name: SourceRecord
title: SourceRecord
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  xsd: http://www.w3.org/2001/XMLSchema#
  pav: http://purl.org/pav/
  dcat: http://www.w3.org/ns/dcat#
imports:
  - linkml:types
  - ../enums/DataTierEnum
default_range: string
classes:
  SourceRecord:
    description: >-
      Individual source record with claims, representing a data extraction from a specific
      source (API, registry, web scrape, etc.). Contains metadata about the source type,
      data tier, fetch timestamp, and extracted claims. Used to track provenance of
      individual data points.
      Ontology mapping rationale:
      - class_uri is prov:Entity because this represents a discrete data entity with
      provenance (when fetched, from where, by what method)
      - close_mappings includes dcat:Distribution as this is similar to a specific
      manifestation/representation of data from a source
      - related_mappings includes pav:retrievedFrom conceptually (the source was retrieved)
      and prov:PrimarySource (the record may be from a primary source)
    class_uri: prov:Entity
    close_mappings:
      - dcat:Distribution
    related_mappings:
      - prov:PrimarySource
    attributes:
      source_type:
        range: string
        description: Type identifier (nde_csv_registry, google_maps_api, etc.)
      data_tier:
        range: DataTierEnum
        description: Quality tier of this source
      fetch_timestamp:
        range: datetime
        description: When data was fetched
      has_or_had_api_endpoint:
        range: uri
        description: API endpoint used
      api_endpoint:
        range: uri
        description: API endpoint used (alias for has_or_had_api_endpoint for backward compatibility)
      place_id:
        range: string
        description: Google Maps place ID
      data_url:
        range: uri
        description: Data source URL
      match_method:
        range: string
        description: Method used for matching
      claims_extracted:
        range: string
        multivalued: true
        inlined_as_list: true
        description: List of claim fields extracted
      entity_id:
        range: string
        description: Wikidata entity ID (Q-number)
      wikidata_id:
        range: string
        description: Wikidata entity ID (Q-number) - alternative key to entity_id
      source_url:
        range: uri
        description: Source URL for the data
      extraction_source:
        range: string
        multivalued: true
        inlined_as_list: true
        description: List of extraction source methods (e.g., archiveslab_llm_extraction)
      retrieved_at:
        range: datetime
        description: When data was retrieved (alias for fetch_timestamp)
      search_result:
        range: string
        description: Result of search operation (found, not_found, etc.)
      search_queries:
        range: string
        multivalued: true
        inlined_as_list: true
        description: Search queries attempted
      note:
        range: string
        description: Additional notes about this source record
      source_file:
        range: string
        description: Source file name
      research_date:
        range: string
        description: Date of research (YYYY-MM-DD format)
      url:
        range: uri
        description: URL of the source (website URL, etc.)
      data_extracted:
        range: string
        multivalued: true
        inlined_as_list: true
        description: List of data types/fields extracted from this source
      merge_note:
        range: string
        description: Note about merge operations involving this source record
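#
# Illustrative sketch only (not part of the schema): how a SourceRecord instance
# might look in a data file, using a few of the attributes defined above.
# Every value below is a hypothetical placeholder, not real data. data_tier is
# left out because the permitted values live in ../enums/DataTierEnum and are
# not shown in this file.
#
#   source_type: google_maps_api
#   fetch_timestamp: "2026-01-08T12:00:00Z"   # hypothetical; range is datetime
#   place_id: EXAMPLE_PLACE_ID                # hypothetical placeholder
#   match_method: name_match                  # hypothetical free-text value
#   claims_extracted:                         # multivalued, inlined as a list
#     - name
#     - address
#   search_result: found
#   note: Example record for illustration only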