id: https://nde.nl/ontology/hc/class/OAIPMHEndpoint
name: oai_pmh_endpoint
title: OAIPMHEndpoint Class
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  dcat: http://www.w3.org/ns/dcat#
  dcterms: http://purl.org/dc/terms/
  schema: http://schema.org/
  xsd: http://www.w3.org/2001/XMLSchema#
imports:
  - linkml:types
  - ../metadata
  - ./DataServiceEndpoint
  - ../slots/protocol
  - ../slots/response_format
  - ../slots/specificity_annotation
  - ../slots/template_specificity
  - ./SpecificityAnnotation
  - ./TemplateSpecificityScores
classes:
  OAIPMHEndpoint:
    is_a: DataServiceEndpoint
    class_uri: hc:OAIPMHEndpoint
    description: |
      OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) endpoint.

      **Purpose:**

      Models OAI-PMH endpoints for automated metadata harvesting from heritage repositories.
      OAI-PMH is the dominant protocol for metadata aggregation in the GLAM sector.

      **Protocol Overview:**

      OAI-PMH defines six verbs:
      1. **Identify** - Repository information
      2. **ListMetadataFormats** - Available metadata formats
      3. **ListSets** - Collection/set hierarchy
      4. **ListIdentifiers** - Record identifiers
      5. **ListRecords** - Full records with metadata
      6. **GetRecord** - Single record by identifier

      **Example - Nationaal Archief OAI-PMH:**

      ```yaml
      oai_pmh_endpoint:
        endpoint_name: "Nationaal Archief OAI-PMH"
        base_url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh"
        protocol_version: "2.0"
        repository_name: "Nationaal Archief"
        admin_email: "helpdesk@nationaalarchief.nl"
        earliest_datestamp: "2010-01-01"
        deleted_record_policy: "NO"
        granularity: YYYY_MM_DD
        metadata_prefixes:
          - oai_dc
          - ese
          - edm
        sets:
          - name: "Fotografische documenten"
            spec: "foto"
          - name: "Kaarten en tekeningen"
            spec: "kaarten"
      ```

      **Harvesting Workflow:**

      1. Call Identify to get repository information (granularity, earliest datestamp, deletion policy).
      2. Call ListMetadataFormats to discover the available formats.
      3. Call ListSets to understand the collection structure.
      4. Call ListRecords with a metadataPrefix (optionally with `set`, `from`, and `until`). While the response contains a non-empty resumptionToken, repeat the request with only that token. For incremental harvesting, pass the datestamp of the previous harvest as `from`.
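
      The following minimal sketch shows how the harvest-tracking attributes of this class relate to the workflow above. It assumes a repository with day-level granularity that issues resumption tokens. All values are illustrative.

      ```yaml
      oai_pmh_endpoint:
        base_url: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh"
        granularity: YYYY_MM_DD
        metadata_prefixes:
          - oai_dc
        supports_resumption_token: true  # assumed for this sketch
        batch_size: 100                  # records per page before a resumptionToken
        # The next incremental ListRecords run would pass from=2025-12-01.
        # Granularity is day-level, so the time of day is dropped.
        last_harvested: "2025-12-01T10:30:00Z"
      ```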

      **See Also:**

      - OAI-PMH Specification: http://www.openarchives.org/OAI/openarchivesprotocol.html
      - Europeana Harvesting: https://pro.europeana.eu/page/harvesting-and-downloads
    attributes:
      base_url:
        slot_uri: dcat:endpointURL
        description: |
          Base URL for OAI-PMH requests.

          All OAI-PMH verbs and their arguments are appended to this URL as query parameters.

          Example:
          - Base URL: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh"
          - Identify request: "https://www.nationaalarchief.nl/onderzoeken/oai-pmh?verb=Identify"
        range: uri
        required: true
      protocol_version:
        slot_uri: schema:version
        description: |
          OAI-PMH protocol version supported.

          The current standard is version 2.0 (since 2002).
          Earlier 1.x versions are deprecated but may still exist in legacy systems.

          Values:
          - "2.0" (current standard)
          - "1.1" (legacy)
          - "1.0" (legacy)
        range: string
        pattern: ^[12]\.[0-9]+$
      repository_name:
        slot_uri: schema:name
        description: |
          Human-readable name of the repository.

          From OAI-PMH Identify response: <repositoryName>

          Example: "Nationaal Archief"
        range: string
      admin_email:
        slot_uri: schema:email
        description: |
          Email address of the repository administrator.

          From OAI-PMH Identify response: <adminEmail>

          The Identify response may contain more than one <adminEmail> element.

          Example: "helpdesk@nationaalarchief.nl"
        range: string
        multivalued: true
      earliest_datestamp:
        slot_uri: dcterms:temporal
        description: |
          Earliest datestamp used in the repository.

          From OAI-PMH Identify response: <earliestDatestamp>

          This value is the guaranteed lower limit of all datestamps recording changes, modifications, or deletions in the repository.
          It concerns record datestamps, not dates in the described content.

          Format: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (depending on granularity)

          Example: "2010-01-01"
        range: string
      deleted_record_policy:
        slot_uri: schema:additionalProperty
        description: |
          Policy for reporting deleted records.

          From OAI-PMH Identify response: <deletedRecord>

          Values:
          - NO: Repository does not maintain information about deletions
          - PERSISTENT: Information about deletions is kept with no time limit ("deleted" status)
          - TRANSIENT: Deletions may be reported, but not necessarily persistently or consistently
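
          When writing instance data, quote `NO`. YAML 1.1 parsers such as PyYAML read an unquoted `NO` as the boolean `false`, not as this enum value. A minimal illustrative fragment:

          ```yaml
          # Quoted so the value stays the string "NO" instead of boolean false
          deleted_record_policy: "NO"
          ```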
        range: OAIDeletedRecordPolicyEnum
      granularity:
        slot_uri: dcterms:accrualPeriodicity
        description: |
          Datestamp granularity supported by the repository.

          From OAI-PMH Identify response: <granularity>

          Values:
          - YYYY_MM_DD: Day-level precision (YYYY-MM-DD)
          - YYYY_MM_DD_THH_MM_SS_Z: Second-level precision (YYYY-MM-DDThh:mm:ssZ)

          Determines the format of the `from` and `until` parameters in selective harvesting.
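
          A minimal illustrative fragment for a hypothetical repository with second-level granularity. Here `earliest_datestamp` uses the matching full UTC format:

          ```yaml
          granularity: YYYY_MM_DD_THH_MM_SS_Z
          earliest_datestamp: "2010-01-01T00:00:00Z"  # hypothetical value
          ```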
        range: OAIGranularityEnum
      metadata_prefixes:
        slot_uri: dcterms:format
        description: |
          Metadata formats (prefixes) supported by this repository.

          From OAI-PMH ListMetadataFormats response.

          Common prefixes:
          - **oai_dc**: Dublin Core (required by OAI-PMH specification)
          - **ese**: Europeana Semantic Elements
          - **edm**: Europeana Data Model
          - **mods**: MODS (Metadata Object Description Schema)
          - **marc21**: MARC 21
          - **ead**: Encoded Archival Description
          - **lido**: LIDO (museum objects)
          - **dc**: Dublin Core (variant)
          - **qdc**: Qualified Dublin Core

          Example: ["oai_dc", "ese", "edm"]
        range: string
        multivalued: true
        required: true
      compression:
        slot_uri: schema:encodingFormat
        description: |
          Compression encodings supported by the repository.

          From OAI-PMH Identify response: <compression>

          Examples: ["gzip", "deflate"]
        range: string
        multivalued: true
      sets:
        slot_uri: dcat:theme
        description: |
          Sets (collections) available for selective harvesting.

          From OAI-PMH ListSets response.

          Structured as a list of set specifications with names.

          Example:
          ```yaml
          sets:
            - spec: "foto"
              name: "Fotografische documenten"
            - spec: "kaarten"
              name: "Kaarten en tekeningen"
          ```
        range: OAIPMHSet
        multivalued: true
        inlined_as_list: true
      sample_identifier:
        slot_uri: dcterms:identifier
        description: |
          Example identifier illustrating the identifier scheme used by this repository.

          From the optional oai-identifier container in the <description> of the OAI-PMH Identify response: <sampleIdentifier>

          Example: "oai:nationaalarchief.nl:2.04.87.01"
        range: string
      description:
        slot_uri: dcterms:description
        description: |
          Repository description from the OAI-PMH Identify response (<description>).

          May contain structured XML (oai-identifier, eprints, friends, etc.)
          or a free-text description.
        range: string
      supports_resumption_token:
        slot_uri: schema:additionalProperty
        description: |
          Whether the repository uses resumption tokens to return large result sets in batches.

          OAI-PMH defines resumption tokens as the flow-control mechanism for ListRecords, ListIdentifiers, and ListSets.
          A repository may return an incomplete list together with a resumptionToken. The harvester then reissues the request with that token.
          Most repositories that serve large result sets use this mechanism.
        range: boolean
      batch_size:
        slot_uri: schema:maxValue
        description: |
          Typical number of records per list response (before a resumptionToken is issued).

          Not part of the OAI-PMH specification, but useful for harvesting optimization.

          Example: 100
        range: integer
      total_records:
        slot_uri: schema:numberOfItems
        description: |
          Total number of records in the repository (approximate).

          From the optional completeListSize attribute of the resumptionToken element in a ListIdentifiers or ListRecords response to an unfiltered request.

          Example: 1500000
        range: integer
      last_harvested:
        slot_uri: schema:dateModified
        description: |
          Date and time when this endpoint was last successfully harvested.

          Useful for tracking incremental harvesting.

          ISO 8601 format.

          Example: "2025-12-01T10:30:00Z"
        range: datetime
    slot_usage:
      protocol:
        description: Protocol is always OAI-PMH for this endpoint type. Value should be OAI_PMH.
      response_format:
        description: |
          OAI-PMH responses are always XML, so the value is always ["application/xml"].
        ifabsent: string(application/xml)
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
    comments:
      - Primary protocol for metadata harvesting in the heritage sector
      - All compliant repositories MUST support the oai_dc (Dublin Core) format
      - Use incremental harvesting with from/until parameters for efficiency
    see_also:
      - http://www.openarchives.org/OAI/openarchivesprotocol.html
      - https://www.openarchives.org/OAI/2.0/guidelines.htm
    slots:
      - specificity_annotation
      - template_specificity
  OAIPMHSet:
    class_uri: hc:OAIPMHSet
    description: |
      Represents an OAI-PMH Set (collection) available for selective harvesting.

      Sets group records, optionally in a hierarchy, so that harvesters
      can request records from specific collections.

      Set hierarchies use a colon separator: "category:subcategory:item"
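
      A minimal instance sketch of a two-level hierarchy. The child set
      "foto:portretten" is hypothetical. It only illustrates how `spec`
      and `parent_spec` relate.

      ```yaml
      sets:
        - spec: "foto"
          name: "Fotografische documenten"
        - spec: "foto:portretten"   # hypothetical child set
          name: "Portretten"        # hypothetical name
          parent_spec: "foto"
      ```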
    attributes:
      spec:
        slot_uri: dcterms:identifier
        description: |
          Set specification (setSpec).

          Machine-readable identifier used in OAI-PMH requests.

          Hierarchy is indicated by colons (e.g., "photo:portraits:19th-century").

          Example: "foto"
        range: string
        required: true
      name:
        slot_uri: schema:name
        description: |
          Human-readable set name (setName).

          Example: "Fotografische documenten"
        range: string
        required: true
      description:
        slot_uri: dcterms:description
        description: |
          Optional description of the set contents.
        range: string
      parent_spec:
        slot_uri: schema:isPartOf
        description: |
          Parent set specification for hierarchical sets.

          If spec is "photo:portraits", parent_spec would be "photo".
        range: string
      record_count:
        slot_uri: schema:numberOfItems
        description: |
          Approximate number of records in this set.

          Not part of the OAI-PMH specification, but useful if available.
        range: integer
    slots:
      - specificity_annotation
      - template_specificity
    slot_usage:
      specificity_annotation:
        range: SpecificityAnnotation
        inlined: true
      template_specificity:
        range: TemplateSpecificityScores
        inlined: true
enums:
  OAIDeletedRecordPolicyEnum:
    description: |
      OAI-PMH deleted record support policy.

      Determines how the repository handles records that have been deleted.
    permissible_values:
      "NO":
        description: |
          Repository does not maintain information about deletions.

          No "deleted" status will ever be returned.
          Harvesters cannot detect deleted records.
      TRANSIENT:
        description: |
          Repository maintains deleted records for an unspecified period.

          Harvesters may see "deleted" status, but it is not guaranteed
          to persist indefinitely.
      PERSISTENT:
        description: |
          Repository persistently maintains deleted records.

          Once a record is assigned "deleted" status, it remains
          queryable with that status.

          Recommended for reliable incremental harvesting.
  OAIGranularityEnum:
    description: |
      OAI-PMH datestamp granularity.

      Determines the precision of date/time values used in
      selective harvesting (from/until parameters).
    permissible_values:
      YYYY_MM_DD:
        description: |
          Day-level granularity.

          Format: YYYY-MM-DD

          Example: "2025-12-14"
      YYYY_MM_DD_THH_MM_SS_Z:
        description: |
          Second-level granularity.

          Format: YYYY-MM-DDThh:mm:ssZ

          Example: "2025-12-14T10:30:00Z"

          Note: Must use UTC (Z suffix).