id: https://nde.nl/ontology/hc/class/ExtractionMetadata
name: ExtractionMetadata
title: Extraction Metadata Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
default_prefix: hc
imports:
  - linkml:types
  - ../enums/ProfileExtractionMethodEnum
  - ../metadata
  - ../slots/has_source
  - ../slots/identified_by
  - ../slots/retrieved_at
  - ../slots/retrieved_by
  - ../slots/has_method
  - ../slots/has_url
  - ../slots/has_expense
  - ../slots/has_provenance
  - ../slots/has_score
classes:
  ExtractionMetadata:
    class_uri: prov:Activity
    description: >-
      Provenance record documenting the method, timing, agent, and source involved in harvesting data from external systems.
    alt_descriptions:
      nl: >-
        Herkomstrecord dat de methode, timing, agent en bron documenteert die betrokken zijn bij het ophalen van gegevens uit externe systemen.
      de: >-
        Herkunftsdatensatz, der Methode, Zeitpunkt, Akteur und Quelle dokumentiert, die an der Extraktion von Daten aus externen Systemen beteiligt sind.
      fr: >-
        Enregistrement de provenance documentant la méthode, le timing, l'agent et la source impliqués dans l'extraction de données de systèmes externes.
      es: >-
        Registro de procedencia que documenta el método, tiempo, agente y fuente involucrados en la extracción de datos de sistemas externos.
      ar: >-
        سجل المنشأ الذي يوثق الطريقة والتوقيت والوكيل والمصدر المشاركين في استخراج البيانات من الأنظمة الخارجية.
      id: >-
        Catatan asal-usul yang mendokumentasikan metode, waktu, agen, dan sumber yang terlibat dalam ekstraksi data dari sistem eksternal.
      zh: >-
        记录从外部系统提取数据所涉及的方法、时间、代理和来源的来源记录。
    structured_aliases:
      nl:
        - extractiemetadata
        - ophaalprovenance
      de:
        - Extraktionsmetadaten
        - Abrufherkunft
      fr:
        - métadonnées d'extraction
        - provenance d'extraction
      es:
        - metadatos de extracción
        - procedencia de extracción
      ar:
        - بيانات الاستخراج الوصفية
        - منشأ الاستخراج
      id:
        - metadata ekstraksi
        - asal-usul ekstraksi
      zh:
        - 提取元数据
        - 提取来源
    keywords:
      - extraction metadata
      - provenance
      - data harvesting
      - retrieval record
      - ingestion
    broad_mappings:
      - prov:Activity
    close_mappings:
      - schema:Action
      - dct:ProvenanceStatement
    slots:
      - has_source
      - identified_by
      - retrieved_at
      - retrieved_by
      - has_method
      - has_url
      - has_expense
      - has_provenance
      - has_score
    slot_usage:
      retrieved_at:
        range: datetime
        required: true
      has_method:
        range: ProfileExtractionMethodEnum
        required: true
      has_url:
        range: uri
      has_expense:
        range: float
        minimum_value: 0.0
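    # Illustrative only: a minimal sketch of an instance that would satisfy the
    # slot_usage constraints of this class (retrieved_at and has_method required,
    # has_url a URI, has_expense a non-negative float). Values are taken from the
    # preserved example in this class's notes where available. The has_expense
    # figure is hypothetical, and "exa_crawling_exa" is assumed to be a
    # permissible value of ProfileExtractionMethodEnum, which is defined in a
    # separate file and not shown here.
    #
    #   has_source: /path/to/source.json
    #   identified_by: org_staff_0001_name
    #   retrieved_at: "2025-12-12T22:00:00Z"
    #   has_method: exa_crawling_exa
    #   retrieved_by: claude-opus-4.5
    #   has_url: https://www.linkedin.com/in/...
    #   has_expense: 0.0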
    see_also:
      - https://www.linkedin.com/in/...\
    notes:
      - |
        Preserved from prior description (commit ee5e8e5a):

        "Provenance metadata for data extraction activities.\n\nRecords how, when, and by what agent data was extracted from \nexternal sources (LinkedIn, web scraping, APIs).\n\n**PROV-O Alignment**:\n- ExtractionMetadata IS a prov:Activity (the extraction process)\n- The extracted data IS the prov:Entity (output of the activity)\n- retrieved_by IS the prov:Agent (software/AI that performed extraction)\n- has_source/has_url IS prov:used (input to the activity)\n\n**Use Cases**:\n- LinkedIn profile extractions via Exa API\n- Web scraping provenance\n- Staff list parsing provenance\n- Connection network extraction\n\n**Example JSON Structure**:\n```json\n{\n \"extraction_metadata\": {\n \"has_source\": \"/path/to/source.json\",\n \"identified_by\": \"org_staff_0001_name\",\n \"retrieval_timestamp\": \"2025-12-12T22:00:00Z\",\n \"has_method\": \"exa_crawling_exa\",\n \"retrieved_by\": \"claude-opus-4.5\",\n \"has_url\": \"https://www.linkedin.com/in/...\"\
    annotations:
      specificity_score: 0.5
      specificity_rationale: Provenance activity record for extraction pipelines and auditability.
      custodian_types: '["*"]'