id: https://nde.nl/ontology/hc/class/ExtractionMetadata
name: ExtractionMetadata
title: Extraction Metadata Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  prov: http://www.w3.org/ns/prov#
  dct: http://purl.org/dc/terms/
default_prefix: hc
imports:
  - linkml:types
  - ../enums/ProfileExtractionMethodEnum
  - ../metadata
  - ../slots/has_source
  - ../slots/identified_by
  - ../slots/retrieved_at
  - ../slots/retrieved_by
  - ../slots/has_method
  - ../slots/has_url
  - ../slots/has_expense
  - ../slots/has_provenance
  - ../slots/has_score
classes:
  ExtractionMetadata:
    class_uri: prov:Activity
    description: Provenance metadata describing how and when extraction was performed.
    exact_mappings:
      - prov:Activity
    close_mappings:
      - schema:Action
      - dct:ProvenanceStatement
    slots:
      - has_source
      - identified_by
      - retrieved_at
      - retrieved_by
      - has_method
      - has_url
      - has_expense
      - has_provenance
      - has_score
    slot_usage:
      retrieved_at:
        range: datetime
        required: true
      has_method:
        range: ProfileExtractionMethodEnum
        required: true
      has_url:
        range: uri
      has_expense:
        range: float
        minimum_value: 0.0
    see_also:
      - https://www.linkedin.com/in/...
    notes:
      - |
        Preserved from prior description (commit ee5e8e5a):
        "Provenance metadata for data extraction activities.\n\nRecords how, when, and by what agent data was extracted from \nexternal sources (LinkedIn, web scraping, APIs).\n\n**PROV-O Alignment**:\n- ExtractionMetadata IS a prov:Activity (the extraction process)\n- The extracted data IS the prov:Entity (output of the activity)\n- retrieved_by IS the prov:Agent (software/AI that performed extraction)\n- has_source/has_url IS prov:used (input to the activity)\n\n**Use Cases**:\n- LinkedIn profile extractions via Exa API\n- Web scraping provenance\n- Staff list parsing provenance\n- Connection network extraction\n\n**Example JSON Structure**:\n```json\n{\n \"extraction_metadata\": {\n \"has_source\": \"/path/to/source.json\",\n \"identified_by\": \"org_staff_0001_name\",\n \"retrieval_timestamp\": \"2025-12-12T22:00:00Z\",\n \"has_method\": \"exa_crawling_exa\",\n \"retrieved_by\": \"claude-opus-4.5\",\n \"has_url\": \"https://www.linkedin.com/in/...\"\
    annotations:
      specificity_score: 0.5
      specificity_rationale: Provenance activity record for extraction pipelines and auditability.
      custodian_types: '["*"]'