glam/schemas/20251121/linkml/modules/classes/WebObservation.yaml
kempersc fcd1c21c63 Add aliases and enhance slot definitions across various modules
- Added new aliases for existing slots to improve clarity and usability, including:
  - has_deadline: has_embargo_end_date
  - has_extent: has_extent_text
  - has_fonds: has_fond
  - has_laboratory: conservation_lab
  - has_language: has_iso_code639_1, has_iso_code639_3
  - has_legal_basis: legal_basis
  - has_light_exposure: max_light_lux
  - has_measurement_unit: has_unit
  - has_note: has_custodian_observation
  - has_occupation: occupation
  - has_operating_hours: has_operating_hours
  - has_position: position
  - has_quantity: has_artwork_count, link_count
  - has_roadmap: review_date
  - has_skill: skill
  - has_speaker: speaker_label
  - has_specification: specification_url
  - has_statement: rights_statement_url, rights_statement
  - has_type: custodian_only
  - has_user_category: serves_visitors_only
  - hold_record_set: record_count
  - identified_by: has_index_number
  - in_period: has_period
  - in_place: has_place
  - in_series: has_series
  - measure: has_measurement
  - measured_on: measurement_date
  - organized_by: has_organizer
  - originate_from: has_origin
  - part_of: suborganization_of
  - published_on: has_publication_date
  - receive_investment: has_investment
  - related_to: connection_heritage_type
  - require: preservation_requirement
  - safeguarded_by: current_keeper, record_holder_note
  - state: states_or_stated
  - take_comission: takes_or_took_comission
  - take_place_at: takes_or_took_place_at
  - transmit_through: transmits_or_transmitted_through
  - warrant: warrants_or_warranted

- Introduced a new slot definition for evaluated_through to capture evaluation methodologies and review statuses.
2026-02-14 14:41:49 +01:00


id: https://nde.nl/ontology/hc/class/WebObservation
name: WebObservation
title: WebObservation Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  pav: http://purl.org/pav/
  foaf: http://xmlns.com/foaf/0.1/
  xsd: http://www.w3.org/2001/XMLSchema#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  skos: http://www.w3.org/2004/02/skos/core#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  org: http://www.w3.org/ns/org#
imports:
- linkml:types
- ../slots/changed_through
- ../slots/encoded_as
- ../slots/has_content
- ../slots/has_method
- ../slots/has_note
- ../slots/has_score
- ../slots/has_status
- ../slots/archived_at
- ../slots/updated_at
- ../slots/identified_by
- ../slots/observe
- ../slots/has_title
- ../slots/preceded_by
- ../slots/retrieved_through
- ../slots/retrieved_by
- ../slots/retrieved_at
- ../slots/has_url
- ../slots/warrant
default_prefix: hc
classes:
  WebObservation:
    class_uri: prov:Activity
    description: |
      A provenance record documenting the retrieval and observation of web content.
      Tracks when, where, and how web-based information was obtained.

      **PURPOSE**:

      WebObservation provides transparent provenance for web-extracted data in the
      heritage custodian ontology. When information about funding calls, institutions,
      or other entities is extracted from web sources, a WebObservation record
      documents:

      - **What**: The source URL and content
      - **When**: Timestamp of retrieval
      - **Who/What**: Agent performing retrieval
      - **How**: Method of extraction
      - **Quality**: Confidence scores and notes

      **PROVENANCE CHAIN**:

      ```
      WebObservation (Activity)
          │
          ├── prov:used ──→ SourceDocument (web page as Entity)
          │                   │
          │                   └── source_uri: https://example.org/call
          │
          ├── prov:generated ──→ CallForApplication (extracted Entity)
          │
          ├── pav:retrievedFrom ──→ URI of source
          ├── pav:retrievedOn ──→ datetime
          └── pav:retrievedBy ──→ agent identifier
      ```

      **PROV-O ALIGNMENT**:

      WebObservation is modelled as a `prov:Activity`:
      - Activities are "something that occurs over a period of time and acts upon
        or with entities"
      - The retrieval of a web page is an activity that uses a SourceDocument
        (the live web page) and generates extracted data

      Key PROV-O properties:
      - `prov:used` - The web page accessed
      - `prov:generated` - The extracted data entity
      - `prov:wasAssociatedWith` - The retrieval agent
      - `prov:startedAtTime` / `prov:endedAtTime` - When the activity occurred
        (`prov:atTime` applies only to instantaneous events, not activities)

      **PAV ALIGNMENT**:

      PAV (Provenance, Authoring and Versioning) provides more specific properties:
      - `pav:retrievedFrom` - Source URL
      - `pav:retrievedOn` - Retrieval timestamp
      - `pav:retrievedBy` - Retrieval agent
      - `pav:sourceAccessedAt` - Source that was consulted but not retrieved
        (the access date is recorded with `pav:sourceAccessedOn`)

      **CHANGE DETECTION**:

      WebObservation supports tracking changes over time:
      - Link to `previous_observation` for the same URL
      - `content_changed` flag for quick change detection
      - `content_hash` for integrity verification
      - Compare `last_modified` and `etag` across observations

      **ARCHIVAL INTEGRATION**:

      For long-term preservation, link to archived copies:
      - `archived_at` can point to Wayback Machine, Archive.today, etc.
      - Ensures cited web content remains accessible

      **EXAMPLES**:

      1. **EU Funding Portal Observation**
         - source_url: https://ec.europa.eu/.../topic-details/horizon-cl2-2025-heritage-01
         - retrieved_on: 2025-11-29T10:30:00Z
         - retrieved_by: "glam-harvester/1.0"
         - extraction_confidence: 0.95

      2. **Heritage Organisation Website**
         - source_url: https://www.heritagefund.org.uk/funding/medium-grants
         - retrieved_on: 2025-11-28T14:00:00Z
         - content_type: text/html
         - page_title: "Medium grants - Heritage Fund"

      3. **Wikidata SPARQL Query**
         - source_url: https://query.wikidata.org/sparql?query=...
         - retrieval_method: SPARQL API
         - content_type: application/sparql-results+json
         - observed_entities: [Q131381572, Q1375245, ...]
    exact_mappings:
    - prov:Activity
    close_mappings:
    - pav:retrievedFrom
    - schema:Action
    related_mappings:
    - prov:Entity
    - pav:sourceAccessedAt
    - dcterms:source
    slots:
    - archived_at
    - warrant
    - changed_through
    - encoded_as
    - has_content
    - has_method
    - has_note
    - has_status
    - updated_at
    - identified_by
    - observe
    - has_title
    - preceded_by
    - retrieved_through
    - retrieved_by
    - retrieved_at
    - has_url
    - has_score
    slot_usage:
      has_method:
        # range: string
      has_status: # was: http_status_code - migrated per Rule 53/56 (2026-01-28)
        range: HTTPStatusCode
        examples:
        - value:
            has_value: "200"
            has_label: "OK"
    comments:
    - WebObservation is a prov:Activity documenting web content retrieval
    - Integrates PROV-O for provenance and PAV for retrieval-specific properties
    - Supports change detection via content_hash, previous_observation, content_changed
    - Links to archived copies via archived_at for long-term citation
    - observed_entities links observation to extracted data (prov:generated)
    see_also:
    - https://www.w3.org/TR/prov-o/
    - http://purl.org/pav/
    - https://www.w3.org/TR/prov-dm/
    - https://web.archive.org/
    examples:
    - value:
        observation_id: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage
        source_url: https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2025-heritage-01
        retrieved_on: '2025-11-29T10:30:00Z'
        retrieved_by: claude-assistant
        retrieval_method: exa-search
        has_status:
          has_value: "200"
        content_type: text/html
        page_title: Horizon Europe - Cultural heritage, cultural and creative industries
        has_score:
          has_score: 0.92
        extraction_notes: Extracted via Exa AI search. Call details structured and well-formatted. Budget and deadline clearly stated. Eligibility criteria parsed from HTML sections.
        observe:
        - https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01
        archived_at: https://web.archive.org/web/20251129103000/https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2025-heritage-01
    - value:
        observation_id: https://nde.nl/ontology/hc/observation/web/2025-11-28/nlhf-medium-grants
        source_url: https://www.heritagefund.org.uk/funding/medium-grants
        retrieved_on: '2025-11-28T14:00:00Z'
        retrieved_by: glam-harvester/1.0
        retrieval_method: playwright-scraper
        has_status:
          has_value: "200"
        content_type: text/html
        page_title: Medium grants | The National Lottery Heritage Fund
        content_hash: sha256:a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456
        last_modified: '2025-11-15T09:00:00Z'
        has_score:
          has_score: 0.88
        extraction_notes: Extracted via Playwright scraper. Dynamic content fully rendered. Grant range and eligibility parsed from page sections.
        observe:
        - https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4
        previous_observation: https://nde.nl/ontology/hc/observation/web/2025-10-15/nlhf-medium-grants
        content_changed: true
    - value:
        observation_id: https://nde.nl/ontology/hc/observation/web/2025-11-29/wikidata-echoes
        source_url: https://query.wikidata.org/sparql
        retrieved_on: '2025-11-29T09:00:00Z'
        retrieved_by: wikidata-mcp-server
        retrieval_method: sparql-api
        has_status:
          has_value: "200"
        content_type: application/sparql-results+json
        has_score:
          has_score: 1.0
        extraction_notes: SPARQL query for ECHOES/ECCCH Q-number (Q131381572). Structured API response with high confidence.
        observe:
        - http://www.wikidata.org/entity/Q131381572
    annotations:
      specificity_score: 0.1
      specificity_rationale: Generic utility class/slot created during migration
      custodian_types: "['*']"
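
# Illustrative sketch only, not part of the schema: one observation written with
# the slots this class declares instead of the legacy keys used in the examples
# above. It is kept as comments so the schema file itself is unaffected.
#
# Assumptions: the legacy-to-current mapping (source_url -> has_url,
# retrieved_on -> retrieved_at, retrieval_method -> retrieved_through,
# previous_observation -> preceded_by) is inferred from the slot names alone,
# and the real slot ranges may require nested objects rather than plain values.
#
# has_url: https://www.heritagefund.org.uk/funding/medium-grants
# retrieved_at: '2025-11-28T14:00:00Z'
# retrieved_by: glam-harvester/1.0
# retrieved_through: playwright-scraper
# has_status:
#   has_value: "200"
#   has_label: "OK"
# has_score:
#   has_score: 0.88
# observe:
# - https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4
# preceded_by: https://nde.nl/ontology/hc/observation/web/2025-10-15/nlhf-medium-grants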