- Removed obsolete slots: `has_or_had_custodian_observation`, `provider`, and `specificity_annotation`.
- Updated the `has_or_had_score` slot to use the `SpecificityScore` class and revised its description and examples.
- Added new slots: `end_seconds`, `end_time`, `has_archive_path`, `has_or_had_custodian_name`, `protocol_name`, and `protocol_version`.
- Introduced a script, `check_annotation_types.py`, to validate the presence and structure of `custodian_types` in YAML files.
- Added a script, `update_specificity.py`, to automate updating `SpecificityAnnotation` references to `SpecificityScore`.
48 lines
2 KiB
YAML
id: https://nde.nl/ontology/hc/classes/XPath
name: XPath
title: XPath
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  prov: http://www.w3.org/ns/prov#
  schema: http://schema.org/
  xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
default_range: string
classes:
  XPath:
    description: 'An XPath expression used to locate a specific element within an
      HTML or XML document.

      **CRITICAL PROVENANCE FIELD**: XPath expressions provide the essential link
      between extracted data values and their original source location in archived
      documents. Without an XPath, a claim extracted from a webpage is unverifiable.

      **FORMAT**: Standard XPath 1.0 expressions **EXAMPLE**: `/html[1]/body[1]/div[6]/div[1]/table[3]/tbody[1]/tr[1]/td[1]/p[6]`

      **USAGE CONTEXT**: Used with `has_or_had_provenance_path` slot to link provenance
      records to specific locations in source documents.'
    class_uri: prov:Location
    close_mappings:
    - schema:xpath
    related_mappings:
    - prov:atLocation
    annotations:
      custodian_types: '["*"]'
      custodian_types_rationale: XPath provenance is relevant for any custodian type
        where web content is extracted and archived.
      custodian_types_primary: '*'
      specificity_score: 0.7
      specificity_rationale: High specificity - only relevant for web-extracted data
        with HTML archival.
    examples:
    - value: "XPath:\n expression: \"/html[1]/body[1]/div[6]/div[1]/table[3]/tbody[1]/tr[1]/td[1]/p[6]\"\
        \n matched_text: \"Historische Vereniging Nijeveen\"\n match_score: 1.0\n\
        \ source_document: \"web/0021/historischeverenigingnijeveen.nl/rendered.html\"\
        \n"
      description: XPath extraction pointing to an institution name in archived HTML.
    - value: "XPath:\n expression: \"//meta[@property='og:title']/@content\"\n matched_text:\
        \ \"Amsterdam Museum - Official Website\"\n match_score: 0.95\n"
      description: XPath to OpenGraph metadata in a webpage header.
    slots: []
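
The commit message mentions `check_annotation_types.py`, which validates the presence and structure of `custodian_types`; that script is not shown here. The following is only a minimal sketch of what such a check might look like, assuming `custodian_types` is stored as a JSON-encoded list of strings (as in the `XPath` class above) and that a class's `annotations` block has already been loaded into a Python dict. The function name `check_custodian_types` and the error messages are hypothetical.

```python
import json


def check_custodian_types(annotations):
    """Return a list of problems found in a class's `annotations` mapping.

    Hypothetical helper: assumes `custodian_types` is a JSON-encoded list
    of strings, e.g. '["*"]'.
    """
    raw = annotations.get("custodian_types")
    if raw is None:
        return ["custodian_types is missing"]

    # The annotation is a string in the YAML file, so decode it first.
    try:
        parsed = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return [f"custodian_types is not valid JSON: {raw!r}"]

    problems = []
    if not isinstance(parsed, list) or not parsed:
        problems.append("custodian_types must be a non-empty list")
    elif not all(isinstance(item, str) for item in parsed):
        problems.append("custodian_types entries must be strings")
    return problems


# Annotation values copied from the XPath class definition.
xpath_annotations = {
    "custodian_types": '["*"]',
    "custodian_types_primary": "*",
    "specificity_score": 0.7,
}
print(check_custodian_types(xpath_annotations))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a file in a single pass.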