glam/schemas/20251121/linkml/modules/classes/ExtractionMetadata.yaml
kempersc 92c79067cd Refactor time-related classes and descriptions for clarity and consistency
- Updated titles and descriptions in TimeSlot, TimeSpan, TimeSpanType, and TimespanBlock for improved readability and understanding.
- Enhanced multilingual support with refined alt_descriptions and structured_aliases across various classes.
- Changed mapping types from broad_mappings to exact_mappings in WebClaimsBlock, WebCollection, WebPage, WebPlatform, WebSource, WorkExperience, and various YouTube-related classes for better alignment with schema definitions.
- Improved comments and modeling notes in VariantTypes to clarify usage and examples.
- General cleanup of unnecessary comments and formatting adjustments for consistency across YAML files.
2026-02-16 13:49:40 +01:00


id: https://nde.nl/ontology/hc/class/ExtractionMetadata
name: ExtractionMetadata
title: Extraction Metadata Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
default_prefix: hc
imports:
  - linkml:types
  - ../enums/ProfileExtractionMethodEnum
  - ../metadata
  - ../slots/has_source
  - ../slots/identified_by
  - ../slots/retrieved_at
  - ../slots/retrieved_by
  - ../slots/has_method
  - ../slots/has_url
  - ../slots/has_expense
  - ../slots/has_provenance
  - ../slots/has_score
classes:
  ExtractionMetadata:
    class_uri: prov:Activity
    description: Provenance metadata describing how and when extraction was performed.
    close_mappings:
      - schema:Action
      - dct:ProvenanceStatement
    slots:
      - has_source
      - identified_by
      - retrieved_at
      - retrieved_by
      - has_method
      - has_url
      - has_expense
      - has_provenance
      - has_score
    slot_usage:
      retrieved_at:
        range: datetime
        required: true
      has_method:
        range: ProfileExtractionMethodEnum
        required: true
      has_url:
        range: uri
      has_expense:
        range: float
        minimum_value: 0.0
    see_also:
      - https://www.linkedin.com/in/...
    notes:
      - |
        Preserved from prior description (commit ee5e8e5a):
        "Provenance metadata for data extraction activities.\n\nRecords how, when, and by what agent data was extracted from \nexternal sources (LinkedIn, web scraping, APIs).\n\n**PROV-O Alignment**:\n- ExtractionMetadata IS a prov:Activity (the extraction process)\n- The extracted data IS the prov:Entity (output of the activity)\n- retrieved_by IS the prov:Agent (software/AI that performed extraction)\n- has_source/has_url IS prov:used (input to the activity)\n\n**Use Cases**:\n- LinkedIn profile extractions via Exa API\n- Web scraping provenance\n- Staff list parsing provenance\n- Connection network extraction\n\n**Example JSON Structure**:\n```json\n{\n \"extraction_metadata\": {\n \"has_source\": \"/path/to/source.json\",\n \"identified_by\": \"org_staff_0001_name\",\n \"retrieval_timestamp\": \"2025-12-12T22:00:00Z\",\n \"has_method\": \"exa_crawling_exa\",\n \"retrieved_by\": \"claude-opus-4.5\",\n \"has_url\": \"https://www.linkedin.com/in/...\"\
    annotations:
      specificity_score: 0.5
      specificity_rationale: Provenance activity record for extraction pipelines and auditability.
      custodian_types: '["*"]'
    broad_mappings:
      - prov:Activity
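
# Illustrative instance (a sketch only, not part of the schema definition).
# It shows how a record might look under the slot_usage constraints above:
# retrieved_at and has_method are required, has_url is a URI, and has_expense
# is a non-negative float.
#
# Assumptions, stated loudly:
# - Values are carried over from the preserved JSON example in `notes`, with
#   the timestamp placed under `retrieved_at` (the slot this class declares)
#   rather than the older `retrieval_timestamp` key.
# - Whether `exa_crawling_exa` is a permissible value of
#   ProfileExtractionMethodEnum is not confirmed here.
# - The has_expense value is hypothetical; the example in `notes` has none.
# - has_source, identified_by, and retrieved_by are assumed to be plain
#   strings, since this class does not override their ranges.
# - has_provenance and has_score are omitted because their ranges are defined
#   in the imported slot modules, not in this file.
#
# has_source: /path/to/source.json
# identified_by: org_staff_0001_name
# retrieved_at: "2025-12-12T22:00:00Z"   # required, datetime
# has_method: exa_crawling_exa           # required, enum value (unverified)
# retrieved_by: claude-opus-4.5
# has_url: https://www.linkedin.com/in/...
# has_expense: 0.0                        # hypothetical; must be >= 0.0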