glam/schemas/20251121/linkml/modules/classes/OAIPMHEndpoint.yaml
2025-12-17 10:11:56 +01:00

# OAI-PMH Endpoint Class
# Represents OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) endpoints
#
# OAI-PMH is the primary standard for metadata harvesting in heritage institutions.
# This class extends DataServiceEndpoint with OAI-PMH-specific attributes.
#
# Reference: http://www.openarchives.org/OAI/openarchivesprotocol.html
id: https://nde.nl/ontology/hc/class/OAIPMHEndpoint
name: oai_pmh_endpoint
title: OAIPMHEndpoint Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  dcat: http://www.w3.org/ns/dcat#
  dcterms: http://purl.org/dc/terms/
  schema: http://schema.org/
  xsd: http://www.w3.org/2001/XMLSchema#
imports:
  - linkml:types
  - ../metadata
  - ./DataServiceEndpoint
classes:
  OAIPMHEndpoint:
    is_a: DataServiceEndpoint
    class_uri: hc:OAIPMHEndpoint
    description: |
      OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) endpoint.

      **Purpose:**
      Models OAI-PMH endpoints for automated metadata harvesting from heritage repositories.
      OAI-PMH is the dominant protocol for metadata aggregation in the GLAM sector.

      **Protocol Overview:**
      OAI-PMH defines six verbs:
      1. **Identify** - Repository information
      2. **ListMetadataFormats** - Available metadata formats
      3. **ListSets** - Collection/set hierarchy
      4. **ListIdentifiers** - Record identifiers
      5. **ListRecords** - Full records with metadata
      6. **GetRecord** - Single record by identifier

      **Example - Nationaal Archief OAI-PMH:**
      ```yaml
      oai_pmh_endpoint:
        endpoint_name: "Nationaal Archief OAI-PMH"
        base_url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh"
        protocol_version: "2.0"
        repository_name: "Nationaal Archief"
        admin_email: "helpdesk@nationaalarchief.nl"
        earliest_datestamp: "2010-01-01"
        deleted_record_policy: "NO"  # quoted: YAML 1.1 parsers read a bare NO as boolean false
        granularity: YYYY_MM_DD
        metadata_prefixes:
          - oai_dc
          - ese
          - edm
        sets:
          - name: "Fotografische documenten"
            spec: "foto"
          - name: "Kaarten en tekeningen"
            spec: "kaarten"
      ```
      **Harvesting Workflow:**
      1. Call Identify to get repository info
      2. Call ListMetadataFormats to discover available formats
      3. Call ListSets to understand collection structure
      4. Call ListRecords with a metadataPrefix (optionally narrowed by set, from, and until),
         then repeat the request with each returned resumptionToken until the list is complete
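
      **Example - Request Sequence (illustrative):**
      A sketch of the requests the workflow above might issue against the example base URL.
      The `harvest_requests` key is hypothetical and not part of this schema; `oai_dc` and
      `foto` are taken from the example above, and the token placeholder stands for whatever
      value the previous response returned.

      ```yaml
      harvest_requests:  # hypothetical key, for illustration only
        - verb: Identify
          url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=Identify"
        - verb: ListMetadataFormats
          url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=ListMetadataFormats"
        - verb: ListSets
          url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=ListSets"
        - verb: ListRecords  # first page: metadataPrefix is required, set is optional
          url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=ListRecords&metadataPrefix=oai_dc&set=foto"
        - verb: ListRecords  # follow-up page: resumptionToken is the only argument besides verb
          url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=ListRecords&resumptionToken=<token-from-previous-response>"
      ```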

      **See Also:**
      - OAI-PMH Specification: http://www.openarchives.org/OAI/openarchivesprotocol.html
      - Europeana Harvesting: https://pro.europeana.eu/page/harvesting-and-downloads
    attributes:
      base_url:
        slot_uri: dcat:endpointURL
        description: |
          Base URL for OAI-PMH requests.
          All OAI-PMH verbs are appended as query parameters to this URL.
          Example:
          - Base URL: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh"
          - Identify request: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=Identify"
        range: uri
        required: true
      protocol_version:
        slot_uri: schema:version
        description: |
          OAI-PMH protocol version supported.
          Current standard is version 2.0 (since 2002).
          Earlier version 1.x is deprecated but may exist in legacy systems.
          Values:
          - "2.0" (current standard)
          - "1.1" (legacy)
          - "1.0" (legacy)
        range: string
        pattern: "^[12]\\.[0-9]+$"
      repository_name:
        slot_uri: schema:name
        description: |
          Human-readable name of the repository.
          From OAI-PMH Identify response: <repositoryName>
          Example: "Nationaal Archief"
        range: string
      admin_email:
        slot_uri: schema:email
        description: |
          Email address of repository administrator.
          From OAI-PMH Identify response: <adminEmail>
          May be multiple emails.
          Example: "helpdesk@nationaalarchief.nl"
        range: string
        multivalued: true
      earliest_datestamp:
        slot_uri: dcterms:temporal
        description: |
          Earliest datestamp available in the repository.
          From OAI-PMH Identify response: <earliestDatestamp>
          Guaranteed lower limit: no record in the repository has a datestamp earlier than this value.
          Format: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (depending on granularity)
          Example: "2010-01-01"
        range: string
      deleted_record_policy:
        slot_uri: schema:additionalProperty
        description: |
          Policy for reporting deleted records.
          From OAI-PMH Identify response: <deletedRecord>
          Values:
          - NO: Repository does not maintain deleted records
          - PERSISTENT: Deleted records are kept with "deleted" status
          - TRANSIENT: Deleted records are kept for some time
        range: OAIDeletedRecordPolicyEnum
      granularity:
        slot_uri: dcterms:accrualPeriodicity
        description: |
          Datestamp granularity supported by the repository.
          From OAI-PMH Identify response: <granularity>
          Values:
          - YYYY_MM_DD: Day-level precision (YYYY-MM-DD)
          - YYYY_MM_DD_THH_MM_SS_Z: Second-level precision (YYYY-MM-DDThh:mm:ssZ)
          Determines format for `from` and `until` parameters in selective harvesting.
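          Example (illustrative): how the two values shape `from` and `until`.
          The `example_from` and `example_until` keys are hypothetical and not slots of this
          schema. Both arguments in one request are expected to use the same granularity,
          and the day-level form is accepted by every repository.
          ```yaml
          - granularity: YYYY_MM_DD
            example_from: "2025-12-01"             # hypothetical key
            example_until: "2025-12-14"            # hypothetical key
          - granularity: YYYY_MM_DD_THH_MM_SS_Z
            example_from: "2025-12-01T00:00:00Z"   # UTC, second precision
            example_until: "2025-12-14T10:30:00Z"
          ```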
        range: OAIGranularityEnum
      metadata_prefixes:
        slot_uri: dcterms:format
        description: |
          Metadata formats (prefixes) supported by this repository.
          From OAI-PMH ListMetadataFormats response.
          Common prefixes:
          - **oai_dc**: Dublin Core (required by OAI-PMH specification)
          - **ese**: Europeana Semantic Elements
          - **edm**: Europeana Data Model
          - **mods**: MODS (Metadata Object Description Schema)
          - **marc21**: MARC 21
          - **ead**: Encoded Archival Description
          - **lido**: LIDO (museum objects)
          - **dc**: Dublin Core (variant)
          - **qdc**: Qualified Dublin Core
          Example: ["oai_dc", "ese", "edm"]
        range: string
        multivalued: true
        required: true
      compression:
        slot_uri: schema:encodingFormat
        description: |
          Compression methods supported.
          From OAI-PMH Identify response: <compression>
          Examples: ["gzip", "deflate"]
        range: string
        multivalued: true
      sets:
        slot_uri: dcat:theme
        description: |
          Sets (collections) available for selective harvesting.
          From OAI-PMH ListSets response.
          Structured as list of set specifications with names.
          Example:
          ```yaml
          sets:
            - spec: "foto"
              name: "Fotografische documenten"
            - spec: "kaarten"
              name: "Kaarten en tekeningen"
          ```
        range: OAIPMHSet
        multivalued: true
        inlined_as_list: true
      sample_identifier:
        slot_uri: dcterms:identifier
        description: |
          Example identifier format used by this repository.
          From the optional <oai-identifier> description container in the
          OAI-PMH Identify response: <sampleIdentifier>
          Helps understand the identifier scheme.
          Example: "oai:nationaalarchief.nl:2.04.87.01"
        range: string
      description:
        slot_uri: dcterms:description
        description: |
          Repository description from OAI-PMH Identify response.
          May contain structured XML (oai-identifier, eprints, friends, etc.)
          or free-text description.
        range: string
      supports_resumption_token:
        slot_uri: schema:additionalProperty
        description: |
          Whether the repository uses resumption tokens to page through large result sets.
          The OAI-PMH spec defines resumptionToken as the flow-control mechanism for
          ListRecords/ListIdentifiers/ListSets; a repository issues one whenever it
          returns an incomplete list.
          Most compliant repositories support this.
        range: boolean
      batch_size:
        slot_uri: schema:maxValue
        description: |
          Typical number of records per response (before resumption token).
          Not part of OAI-PMH spec but useful for harvesting optimization.
          Example: 100
        range: integer
      total_records:
        slot_uri: schema:numberOfItems
        description: |
          Total number of records in the repository (approximate).
          From the optional completeListSize attribute of the <resumptionToken>
          element in list responses.
          Example: 1500000
        range: integer
      last_harvested:
        slot_uri: schema:dateModified
        description: |
          Date when this endpoint was last successfully harvested.
          Useful for tracking incremental harvesting.
          ISO 8601 format.
          Example: "2025-12-01T10:30:00Z"
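          Example (illustrative): a sketch of how the stored value might seed the next
          incremental request. The `next_request` key is hypothetical and not part of this
          schema; the base URL and `oai_dc` prefix are taken from the class-level example.
          ```yaml
          last_harvested: "2025-12-01T10:30:00Z"
          granularity: YYYY_MM_DD
          # Hypothetical follow-up request; `from` is truncated to day precision
          # because the repository advertises day-level granularity.
          next_request: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=ListRecords&metadataPrefix=oai_dc&from=2025-12-01"
          ```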
        range: datetime
    slot_usage:
      protocol:
        description: Protocol is always OAI-PMH for this endpoint type. Value should be OAI_PMH.
      response_formats:
        description: |
          For OAI-PMH, always ["application/xml"].
          OAI-PMH responses are always XML. Note that the OAI-PMH specification
          requires servers to return them with Content-Type text/xml.
        ifabsent: "string(application/xml)"
    comments:
      - "Primary protocol for metadata harvesting in heritage sector"
      - "All compliant repositories MUST support oai_dc (Dublin Core) format"
      - "Use incremental harvesting with from/until parameters for efficiency"
    see_also:
      - "http://www.openarchives.org/OAI/openarchivesprotocol.html"
      - "https://www.openarchives.org/OAI/2.0/guidelines.htm"
  OAIPMHSet:
    class_uri: hc:OAIPMHSet
    description: |
      Represents an OAI-PMH Set (collection) available for selective harvesting.
      Sets provide hierarchical organization of records, allowing harvesters
      to request records from specific collections.
      Set hierarchies use colon separator: "category:subcategory:item"
    attributes:
      spec:
        slot_uri: dcterms:identifier
        description: |
          Set specification (setSpec).
          Machine-readable identifier used in OAI-PMH requests.
          Hierarchy indicated by colons (e.g., "photo:portraits:19th-century").
          Example: "foto"
        range: string
        required: true
      name:
        slot_uri: schema:name
        description: |
          Human-readable set name (setName).
          Example: "Fotografische documenten"
        range: string
        required: true
      description:
        slot_uri: dcterms:description
        description: |
          Optional description of the set contents.
        range: string
      parent_spec:
        slot_uri: schema:isPartOf
        description: |
          Parent set specification for hierarchical sets.
          If spec is "photo:portraits", parent_spec would be "photo".
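          Example (illustrative): a three-level hierarchy in which each parent_spec is
          the spec with its last colon-separated segment removed. The set names are
          hypothetical; the specs reuse the "photo:portraits:19th-century" example
          from the `spec` description.
          ```yaml
          sets:
            - spec: "photo"
              name: "Photographs"              # hypothetical name; top level, no parent_spec
            - spec: "photo:portraits"
              name: "Portraits"                # hypothetical name
              parent_spec: "photo"
            - spec: "photo:portraits:19th-century"
              name: "19th-century portraits"   # hypothetical name
              parent_spec: "photo:portraits"
          ```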
        range: string
      record_count:
        slot_uri: schema:numberOfItems
        description: |
          Approximate number of records in this set.
          Not part of OAI-PMH spec but useful if available.
        range: integer
enums:
  OAIDeletedRecordPolicyEnum:
    description: |
      OAI-PMH deleted record support policy.
      Determines how the repository handles records that have been deleted.
    permissible_values:
      "NO":  # quoted: YAML 1.1 parsers read a bare NO key as boolean false
        description: |
          Repository does not maintain information about deletions.
          No "deleted" status will ever be returned.
          Harvesters cannot detect deleted records.
      TRANSIENT:
        description: |
          Repository maintains deleted records for an unspecified period.
          Harvesters may see "deleted" status, but it's not guaranteed
          to persist indefinitely.
      PERSISTENT:
        description: |
          Repository persistently maintains deleted records.
          Once a record is assigned "deleted" status, it remains
          queryable with that status.
          Recommended for reliable incremental harvesting.
  OAIGranularityEnum:
    description: |
      OAI-PMH datestamp granularity.
      Determines the precision of date/time values used in
      selective harvesting (from/until parameters).
    permissible_values:
      YYYY_MM_DD:
        description: |
          Day-level granularity.
          Format: YYYY-MM-DD
          Example: "2025-12-14"
      YYYY_MM_DD_THH_MM_SS_Z:
        description: |
          Second-level granularity.
          Format: YYYY-MM-DDThh:mm:ssZ
          Example: "2025-12-14T10:30:00Z"
          Note: Must use UTC (Z suffix).