id: https://nde.nl/ontology/hc/class/WebObservation
name: WebObservation
title: WebObservation Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  pav: http://purl.org/pav/
  foaf: http://xmlns.com/foaf/0.1/
  xsd: http://www.w3.org/2001/XMLSchema#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  skos: http://www.w3.org/2004/02/skos/core#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  org: http://www.w3.org/ns/org#
imports:
  - linkml:types
  - ./WebClaim
  - ../slots/is_or_was_archived_at
  - ../slots/extraction_note
  - ../slots/source_url
  - ../slots/retrieved_on
  - ../slots/content_hash
  - ../slots/warrants_or_warranted
  - ../slots/content_changed
  - ../slots/content_type
  - ../slots/has_or_had_method
  - ./CacheValidation
  - ./ETag
  - ../slots/has_or_had_status
  - ../slots/last_modified
  - ../slots/observation_id
  - ../slots/observed_entity
  - ../slots/page_title
  - ../slots/previous_observation
  - ../slots/retrieval_method
  - ../slots/retrieved_by
  - ../slots/specificity_annotation
  - ../slots/has_or_had_score
  - ./SpecificityAnnotation
  - ./TemplateSpecificityScore
  - ./TemplateSpecificityType
  - ./TemplateSpecificityTypes
default_prefix: hc
classes:
  WebObservation:
    class_uri: prov:Activity
    description: |
      A provenance record documenting the retrieval and observation of web content.
      Tracks when, where, and how web-based information was obtained.

      **PURPOSE**:

      WebObservation provides transparent provenance for web-extracted data in the
      heritage custodian ontology. When information about funding calls, institutions,
      or other entities is extracted from web sources, a WebObservation record
      documents:

      - **What**: The source URL and content
      - **When**: Timestamp of retrieval
      - **Who/What**: Agent performing retrieval
      - **How**: Method of extraction
      - **Quality**: Confidence scores and notes

      **PROVENANCE CHAIN**:

      ```
      WebObservation (Activity)
          │
          ├── prov:used ──→ SourceDocument (web page as Entity)
          │                 │
          │                 └── source_uri: https://example.org/call
          │
          ├── prov:generated ──→ CallForApplication (extracted Entity)
          │
          ├── pav:retrievedFrom ──→ URI of source
          ├── pav:retrievedOn ──→ datetime
          └── pav:retrievedBy ──→ agent identifier
      ```

      **PROV-O ALIGNMENT**:

      WebObservation is modelled as a `prov:Activity`:
      - Activities are "something that occurs over a period of time and acts upon
        or with entities"
      - The retrieval of a web page is an activity that uses a SourceDocument
        (the live web page) and generates extracted data

      Key PROV-O properties:
      - `prov:used` - The web page accessed
      - `prov:generated` - The extracted data entity
      - `prov:wasAssociatedWith` - The retrieval agent
      - `prov:startedAtTime` / `prov:endedAtTime` - When the activity occurred

      **PAV ALIGNMENT**:

      PAV (Provenance, Authoring and Versioning) provides more specific properties:
      - `pav:retrievedFrom` - Source URL
      - `pav:retrievedOn` - Retrieval timestamp
      - `pav:retrievedBy` - Retrieval agent
      - `pav:sourceAccessedAt` - When source was consulted

      **CHANGE DETECTION**:

      WebObservation supports tracking changes over time:
      - Link to `previous_observation` for the same URL
      - `content_changed` flag for quick change detection
      - `content_hash` for integrity verification
      - Compare `last_modified` and the ETag (recorded via `has_or_had_method`)
        across observations

      **ARCHIVAL INTEGRATION**:

      For long-term preservation, link to archived copies:
      - `is_or_was_archived_at` can point to Wayback Machine, Archive.today, etc.
      - Ensures cited web content remains accessible

      **EXAMPLES**:

      1. **EU Funding Portal Observation**
         - source_url: https://ec.europa.eu/.../topic-details/horizon-cl2-2025-heritage-01
         - retrieved_on: 2025-11-29T10:30:00Z
         - retrieved_by: "glam-harvester/1.0"
         - extraction_confidence: 0.95

      2. **Heritage Organisation Website**
         - source_url: https://www.heritagefund.org.uk/funding/medium-grants
         - retrieved_on: 2025-11-28T14:00:00Z
         - content_type: text/html
         - page_title: "Medium grants - Heritage Fund"

      3. **Wikidata SPARQL Query**
         - source_url: https://query.wikidata.org/sparql?query=...
         - retrieval_method: SPARQL API
         - content_type: application/sparql-results+json
         - observed_entity: [Q131381572, Q1375245, ...]
    exact_mappings:
      - prov:Activity
    close_mappings:
      - pav:Retrieval
      - schema:Action
    related_mappings:
      - prov:Entity
      - pav:sourceAccessedAt
      - dcterms:source
    slots:
      - is_or_was_archived_at
      - warrants_or_warranted
      - content_changed
      - content_hash
      - content_type
      - has_or_had_method
      - extraction_note
      - has_or_had_status
      - last_modified
      - observation_id
      - observed_entity
      - page_title
      - previous_observation
      - retrieval_method
      - retrieved_by
      - retrieved_on
      - source_url
      - specificity_annotation
      - has_or_had_score
    slot_usage:
      has_or_had_method:
        range: CacheValidation
        description: Cache validation method (e.g. ETag). MIGRATED from etag per slot_fixes.yaml (Rule 53).
      has_or_had_status:  # was: http_status_code - migrated per Rule 53/56 (2026-01-28)
        range: HTTPStatusCode
        description: |
          HTTP response status code (e.g. 200, 404).
          MIGRATED from http_status_code.
          Uses HTTPStatusCode class which inherits from Status.
        examples:
          - value:
              has_or_had_value: "200"
              has_or_had_label: "OK"
            description: Standard success response
    comments:
      - WebObservation is a prov:Activity documenting web content retrieval
      - Integrates PROV-O for provenance and PAV for retrieval-specific properties
      - Supports change detection via content_hash, previous_observation, content_changed
      - Links to archived copies via is_or_was_archived_at for long-term citation
      - observed_entity links each observation to the extracted data (prov:generated)
    see_also:
      - https://www.w3.org/TR/prov-o/
      - http://purl.org/pav/
      - https://www.w3.org/TR/prov-dm/
      - https://web.archive.org/
    examples:
      - value:
          observation_id: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage
          source_url: https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2025-heritage-01
          retrieved_on: '2025-11-29T10:30:00Z'
          retrieved_by: claude-assistant
          retrieval_method: exa-search
          has_or_had_status:
            has_or_had_value: "200"
          content_type: text/html
          page_title: Horizon Europe - Cultural heritage, cultural and creative industries
          extraction_confidence: 0.92
          extraction_note: Extracted via Exa AI search. Call details structured and
            well-formatted. Budget and deadline clearly stated. Eligibility criteria
            parsed from HTML sections.
          observed_entity:
            - https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01
          is_or_was_archived_at: https://web.archive.org/web/20251129103000/https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2025-heritage-01
        description: Web observation of Horizon Europe CL2 2025 heritage call
      - value:
          observation_id: https://nde.nl/ontology/hc/observation/web/2025-11-28/nlhf-medium-grants
          source_url: https://www.heritagefund.org.uk/funding/medium-grants
          retrieved_on: '2025-11-28T14:00:00Z'
          retrieved_by: glam-harvester/1.0
          retrieval_method: playwright-scraper
          has_or_had_status:
            has_or_had_value: "200"
          content_type: text/html
          page_title: Medium grants | The National Lottery Heritage Fund
          content_hash: sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456
          last_modified: '2025-11-15T09:00:00Z'
          extraction_confidence: 0.88
          extraction_note: Extracted via Playwright scraper. Dynamic content fully
            rendered. Grant range and eligibility parsed from page sections.
          observed_entity:
            - https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4
          previous_observation: https://nde.nl/ontology/hc/observation/web/2025-10-15/nlhf-medium-grants
          content_changed: true
        description: Web observation of National Lottery Heritage Fund grants page
      - value:
          observation_id: https://nde.nl/ontology/hc/observation/web/2025-11-29/wikidata-echoes
          source_url: https://query.wikidata.org/sparql
          retrieved_on: '2025-11-29T09:00:00Z'
          retrieved_by: wikidata-mcp-server
          retrieval_method: sparql-api
          has_or_had_status:
            has_or_had_value: "200"
          content_type: application/sparql-results+json
          extraction_confidence: 1.0
          extraction_note: SPARQL query for ECHOES/ECCCH Q-number (Q131381572).
            Structured API response with high confidence.
          observed_entity:
            - http://www.wikidata.org/entity/Q131381572
        description: SPARQL query observation for Wikidata entity
    annotations:
      specificity_score: 0.1
      specificity_rationale: Generic utility class/slot created during migration
      custodian_types:
        - '*'
      custodian_types_rationale: Universal utility concept