Fix LinkML URI conflicts and generate RDF outputs

- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/

Files: 1,151 changed (includes prior CustodianType migration)
kempersc 2026-01-07 12:32:59 +01:00
parent 6c6810fa43
commit 98c42bf272
1132 changed files with 47267 additions and 18837 deletions

@ -0,0 +1,143 @@
# RiC-O RecordSetType Alignment Rules
## RiC-O 1.1 Structure (Actual)
### rico:RecordSetType CLASS
**Location**: RiC-O_1-1.rdf lines 29199-29252
```turtle
rico:RecordSetType a owl:Class ;
rdfs:subClassOf rico:Type ;
rdfs:comment "A broad categorization of the type of Record Set."@en .
```
This is a **class** meant to be instantiated with specific record set type concepts.
### RiC-O Provided Instances (Named Individuals)
RiC-O 1.1 provides **four named individuals** in the `recordSetTypes` vocabulary:
| Individual | URI | Description |
|------------|-----|-------------|
| **Fonds** | `rico-rst:Fonds` | Organic whole of records from one creator |
| **Series** | `rico-rst:Series` | Documents arranged by filing system |
| **File** | `rico-rst:File` | Unit of documents grouped together |
| **Collection** | `rico-rst:Collection` | Artificial assemblage without provenance |
**Key**: These are **instances** of BOTH `rico:RecordSetType` AND `skos:Concept`:
```turtle
rico-rst:Fonds a rico:RecordSetType, skos:Concept ;
skos:inScheme rico-rst: ;
skos:topConceptOf rico-rst: ;
skos:definition "The whole of the records... organically created..."@en .
```
### Full URIs
- **Namespace**: `https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#`
- **Prefix**: `rico-rst:` (commonly used)
- `rico-rst:Fonds` = `https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds`
## Our Approach: Classes as RecordSetType Subclasses
We create **LinkML classes** with `class_uri: rico:RecordSetType`. These classes are themselves record set type definitions that can be instantiated.
### Why Classes Instead of Instances?
1. **Extensibility**: Classes allow slots and inheritance patterns
2. **LinkML idiom**: LinkML works with class definitions
3. **Validation**: Classes enable property constraints
4. **Documentation**: Rich documentation in class definitions
### Correct Mapping Predicates
Since `rico-rst:Fonds`, `rico-rst:Series`, `rico-rst:Collection`, `rico-rst:File` are **individuals** (not classes), we cannot use them in `broad_mappings` (which implies class hierarchy).
**Instead, use**:
| Predicate | Use For |
|-----------|---------|
| `related_mappings` | Conceptual relationship to RiC-O individuals |
| `see_also` | Reference to related RiC-O concepts |
| Custom annotation | `rico_organizational_principle` with value `fonds`, `series`, `collection`, `file` |
### Correct Pattern
```yaml
UniversityAdministrativeFonds:
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
description: |
A rico:RecordSetType for university administrative records organized as a fonds.
**RiC-O Alignment**:
This class is a specialized rico:RecordSetType. Records classified with this
type follow the fonds organizational principle as defined by rico-rst:Fonds
(respect des fonds / provenance-based organization).
# CORRECT: Use related_mappings for conceptual relationship to individual
related_mappings:
- https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds
# CORRECT: Use see_also for reference
see_also:
- rico:RecordSetType
- https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds
annotations:
# CORRECT: Document organizational principle as annotation
rico_organizational_principle: fonds
rico_organizational_principle_uri: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds
rico_note: >-
This RecordSetType classifies record sets following the fonds principle.
The rico-rst:Fonds individual defines the standard archival concept of fonds.
```
### INCORRECT Pattern (Do Not Use)
```yaml
# WRONG - rico:Fonds is NOT a class, cannot use in broad_mappings
broad_mappings:
- rico:Fonds # ❌ This is an individual, not a class!
# WRONG - Using a prefix that is not declared in the schema
broad_mappings:
- rico-rst:Fonds # ❌ Fails to expand unless the rico-rst prefix is declared
```
## Prefixes to Include
When referencing the RiC-O recordSetTypes vocabulary:
```yaml
prefixes:
rico: https://www.ica.org/standards/RiC/ontology#
rico-rst: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#
```
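The incorrect patterns above come down to CURIE expansion: a prefixed name only resolves to a full URI when its prefix is declared. The sketch below illustrates the mechanism; `expand_curie` is an illustrative helper mirroring how a prefix map is consulted, not a LinkML API:

```python
# Illustrative CURIE expansion: a prefixed name resolves only if its
# prefix is declared, mirroring the YAML prefixes block above.
PREFIXES = {
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "rico-rst": "https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#",
}

def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand prefix:local to a full URI; raise if the prefix is undeclared."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"Prefix '{prefix}' is not declared in the schema")
    return prefixes[prefix] + local

print(expand_curie("rico-rst:Fonds", PREFIXES))
# Without the rico-rst entry in PREFIXES, the same call raises KeyError.
```

This is why `rico-rst:Fonds` in a mapping silently fails or mis-expands when the `rico-rst:` prefix is omitted from the schema.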
## Summary
| RiC-O Concept | Type | Use In |
|---------------|------|--------|
| `rico:RecordSetType` | CLASS | `class_uri`, `exact_mappings` |
| `rico-rst:Fonds` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
| `rico-rst:Series` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
| `rico-rst:Collection` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
| `rico-rst:File` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
## Files to Update
All `*RecordSetTypes.yaml` files need correction:
- `AcademicArchiveRecordSetTypes.yaml`
- `MunicipalArchiveRecordSetTypes.yaml`
- `ChurchArchiveRecordSetTypes.yaml`
- `CompanyArchiveRecordSetTypes.yaml`
- `RegionalArchiveRecordSetTypes.yaml`
- (and any future files)
---
**Created**: 2026-01-05
**Agent**: opencode-claude-sonnet-4

@ -0,0 +1,317 @@
# Rule 38: Slot Centralization and Semantic URI Requirements
🚨 **CRITICAL**: All LinkML slots MUST be centralized in `schemas/20251121/linkml/modules/slots/` and MUST have semantically sound `slot_uri` predicates from base ontologies.
---
## 1. Slot Centralization is Mandatory
**Location**: All slot definitions MUST be in `schemas/20251121/linkml/modules/slots/`
**File Naming**: `{slot_name}.yaml` (snake_case)
**Import Pattern**: Classes import slots via relative imports:
```yaml
# In modules/classes/Collection.yaml
imports:
- ../slots/collection_name
- ../slots/collection_type_ref
- ../slots/parent_collection
```
### Why Centralization?
1. **UML Visualization**: The frontend's schema service loads slots from `modules/slots/` to determine aggregation edges. Inline slots in class files are NOT properly parsed for visualization.
2. **Reusability**: Slots can be used by multiple classes without duplication.
3. **Semantic Consistency**: Single source of truth for slot semantics prevents drift.
4. **Maintainability**: Changes to slot semantics propagate automatically to all classes.
### Anti-Pattern: Inline Slot Definitions
```yaml
# ❌ WRONG - Slots defined inline in class file
classes:
Collection:
slots:
- collection_name
- parent_collection
slots: # ← This section in a class file is WRONG
collection_name:
range: string
```
```yaml
# ✅ CORRECT - Slots imported from centralized files
# In modules/classes/Collection.yaml
imports:
- ../slots/collection_name
- ../slots/parent_collection
classes:
Collection:
slots:
- collection_name
- parent_collection
```
---
## 2. Every Slot MUST Have `slot_uri`
**`slot_uri`** provides the semantic meaning of the slot in a linked data context. It maps your slot to a predicate from an established ontology.
### Required Slot File Structure
```yaml
# Global slot definition for {slot_name}
# Used by: {list of classes}
id: https://nde.nl/ontology/hc/slot/{slot_name}
name: {slot_name}
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
# Add ontology prefixes as needed
rico: https://www.ica.org/standards/RiC/ontology#
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
slots:
{slot_name}:
slot_uri: {ontology_prefix}:{predicate} # ← REQUIRED
description: |
Description of the slot's semantic meaning.
{OntologyName}: {predicate} - "{definition from ontology}"
range: {ClassName or primitive}
required: true/false
multivalued: true/false
# Optional mappings for additional semantic relationships
exact_mappings:
- schema:alternatePredicate
close_mappings:
- dct:relatedPredicate
examples:
- value: {example}
description: {explanation}
```
### Ontology Sources for `slot_uri`
Consult these base ontology files in `/data/ontology/`:
| Ontology | File | Namespace | Use Cases |
|----------|------|-----------|-----------|
| **RiC-O** | `RiC-O_1-1.rdf` | `rico:` | Archival records, record sets, custody |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | `crm:` | Cultural heritage objects, events |
| **Schema.org** | `schemaorg.owl` | `schema:` | Web semantics, general properties |
| **SKOS** | `skos.rdf` | `skos:` | Labels, concepts, mappings |
| **Dublin Core** | `dublin_core_elements.rdf` | `dcterms:` | Metadata properties |
| **PROV-O** | `prov-o.ttl` | `prov:` | Provenance tracking |
| **PAV** | `pav.rdf` | `pav:` | Provenance, authoring, versioning |
| **TOOI** | `tooiont.ttl` | `tooi:` | Dutch government organizations |
| **CPOV** | `core-public-organisation-ap.ttl` | `cpov:` | EU public sector |
| **ORG** | `org.rdf` | `org:` | Organizations, units, roles |
| **FOAF** | `foaf.ttl` | `foaf:` | People, agents, social network |
| **GLEIF** | `gleif_base.ttl` | `gleif_base:` | Legal entities |
### Example: Correct Slot with `slot_uri`
```yaml
# modules/slots/preferred_label.yaml
id: https://nde.nl/ontology/hc/slot/preferred_label
name: preferred_label_slot
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
schema: http://schema.org/
rdfs: http://www.w3.org/2000/01/rdf-schema#
slots:
preferred_label:
slot_uri: skos:prefLabel # ← REQUIRED
description: |
The primary display name for this entity.
SKOS: prefLabel - "A preferred lexical label for a resource."
This is the CANONICAL name - the standardized label accepted by the
entity itself for public representation.
range: string
required: false
exact_mappings:
- schema:name
- rdfs:label
examples:
- value: "Rijksmuseum"
description: Primary display name for the Rijksmuseum
```
---
## 3. Mappings Can Apply to Both Classes AND Slots
LinkML provides SKOS-based mapping predicates that work on **both classes and slots**:
| Mapping Type | Predicate | Use Case |
|--------------|-----------|----------|
| `exact_mappings` | `skos:exactMatch` | Identical meaning |
| `close_mappings` | `skos:closeMatch` | Very similar meaning |
| `related_mappings` | `skos:relatedMatch` | Semantically related |
| `narrow_mappings` | `skos:narrowMatch` | More specific |
| `broad_mappings` | `skos:broadMatch` | More general |
### When to Use Mappings vs. slot_uri
| Scenario | Use |
|----------|-----|
| **Primary semantic identity** | `slot_uri` (exactly one) |
| **Equivalent predicates in other ontologies** | `exact_mappings` (multiple allowed) |
| **Similar but not identical predicates** | `close_mappings` |
| **Related predicates with different scope** | `narrow_mappings` / `broad_mappings` |
### Example: Slot with Multiple Mappings
```yaml
slots:
website:
slot_uri: gleif_base:hasWebsite # Primary predicate
range: uri
description: |
Official website URL of the organization or entity.
gleif_base:hasWebsite - "A website associated with something"
exact_mappings:
- schema:url # Identical meaning in Schema.org
close_mappings:
- foaf:homepage # Similar but specifically "main" page
```
### Example: Class with Multiple Mappings
```yaml
classes:
Collection:
class_uri: rico:RecordSet # Primary class
exact_mappings:
- crm:E78_Curated_Holding # CIDOC-CRM equivalent
close_mappings:
- bf:Collection # BIBFRAME close match
narrow_mappings:
- edm:ProvidedCHO # Europeana (narrower - cultural heritage objects)
```
---
## 4. Workflow for Creating a New Slot
### Step 1: Search Base Ontologies
Before creating a slot, search for existing predicates:
```bash
# Search for relevant predicates
rg "website|homepage|url" /data/ontology/*.ttl /data/ontology/*.rdf /data/ontology/*.owl
# Check specific ontology
rg "rdfs:label|rdfs:comment" /data/ontology/schemaorg.owl | grep -i "name"
```
### Step 2: Document Ontology Alignment
In the slot file, document WHY you chose that predicate:
```yaml
slots:
source_url:
slot_uri: pav:retrievedFrom
description: |
URL of the web page from which data was retrieved.
pav:retrievedFrom - "The URI from which the resource was retrieved."
Chosen over:
- schema:url (too generic - refers to the entity's URL, not source)
- dct:source (refers to intellectual source, not retrieval location)
- prov:wasDerivedFrom (refers to entity derivation, not retrieval)
```
### Step 3: Create Centralized Slot File
```bash
# Create new slot file
touch schemas/20251121/linkml/modules/slots/new_slot_name.yaml
```
### Step 4: Update Manifest
Run the manifest regeneration script, or add the new slot file to the manifest manually:
```bash
cd schemas/20251121/linkml
python3 scripts/regenerate_manifest.py
```
### Step 5: Import in Class Files
Add the import to classes that use this slot.
---
## 5. Validation Checklist
Before committing slot changes:
- [ ] Slot file is in `modules/slots/`
- [ ] Slot has `slot_uri` pointing to an established ontology predicate
- [ ] Predicate is from `data/ontology/` files or standard vocabularies
- [ ] Description includes ontology definition
- [ ] Rationale documented if multiple predicates were considered
- [ ] `exact_mappings`/`close_mappings` added for equivalent predicates
- [ ] Manifest updated to include new slot file
- [ ] Classes using the slot have been updated with import
- [ ] Frontend slot files synced: `frontend/public/schemas/20251121/linkml/modules/slots/`
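Several of these checks can be automated. A minimal sketch for the `slot_uri` check: load each slot file with a YAML parser (e.g. PyYAML's `yaml.safe_load`, assumed available in the project environment) and inspect the parsed `slots` mapping:

```python
# Flag slot definitions that lack a slot_uri (checklist item 2).
# `slot_defs` is the parsed value of a slot file's top-level "slots" key,
# e.g. yaml.safe_load(text)["slots"] when using PyYAML.

def slots_missing_uri(slot_defs: dict) -> list[str]:
    """Return the names of slots whose definition has no slot_uri."""
    return [
        name
        for name, spec in (slot_defs or {}).items()
        if not (spec or {}).get("slot_uri")
    ]

# Tiny demo mirroring the preferred_label example above;
# mystery_field is a hypothetical offender.
parsed = {
    "preferred_label": {"slot_uri": "skos:prefLabel", "range": "string"},
    "mystery_field": {"range": "string"},  # no slot_uri -> flagged
}
print(slots_missing_uri(parsed))
```

Running this over every file in `modules/slots/` before committing catches missing predicates mechanically.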
---
## 6. Common Slot URI Mappings
| Slot Concept | Recommended `slot_uri` | Alternative Mappings |
|--------------|------------------------|---------------------|
| Preferred name | `skos:prefLabel` | `schema:name`, `rdfs:label` |
| Alternative names | `skos:altLabel` | `schema:alternateName` |
| Description | `dcterms:description` | `schema:description`, `rdfs:comment` |
| Identifier | `dcterms:identifier` | `schema:identifier` |
| Website URL | `gleif_base:hasWebsite` | `schema:url`, `foaf:homepage` |
| Source URL | `pav:retrievedFrom` | `prov:wasDerivedFrom` |
| Created date | `dcterms:created` | `schema:dateCreated`, `prov:generatedAtTime` |
| Modified date | `dcterms:modified` | `schema:dateModified` |
| Language | `schema:inLanguage` | `dcterms:language` |
| Part of | `dcterms:isPartOf` | `rico:isOrWasPartOf`, `schema:isPartOf` |
| Has part | `dcterms:hasPart` | `rico:hasOrHadPart`, `schema:hasPart` |
| Location | `schema:location` | `locn:address`, `crm:P53_has_former_or_current_location` |
| Start date | `schema:startDate` | `prov:startedAtTime`, `rico:hasBeginningDate` |
| End date | `schema:endDate` | `prov:endedAtTime`, `rico:hasEndDate` |
---
## See Also
- [LinkML slot_uri documentation](https://linkml.io/linkml-model/latest/docs/slot_uri/)
- [LinkML mappings documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [LinkML URIs and Mappings guide](https://linkml.io/linkml/schemas/uris-and-mappings.html)
- Rule 1: Ontology Files Are Your Primary Reference
- Rule 0: LinkML Schemas Are the Single Source of Truth
---
**Version**: 1.0.0
**Created**: 2026-01-06
**Author**: OpenCODE

@ -23,7 +23,7 @@ This is NOT a simple data extraction project. This is an **ontology engineering
## 🚨 CRITICAL RULES FOR ALL AGENTS
This section summarizes 37 critical rules. Each rule has complete documentation in `.opencode/` files.
This section summarizes 38 critical rules. Each rule has complete documentation in `.opencode/` files.
### Rule 0: LinkML Schemas Are the Single Source of Truth
@ -734,6 +734,42 @@ classes:
---
### Rule 38: Slot Centralization and Semantic URI Requirements
🚨 **CRITICAL**: All LinkML slots MUST be centralized in `schemas/20251121/linkml/modules/slots/` and MUST have semantically sound `slot_uri` predicates from base ontologies.
**Key Requirements**:
1. **Centralization**: All slots MUST be defined in `modules/slots/`, never inline in class files
2. **slot_uri**: Every slot MUST have a `slot_uri` from base ontologies (`data/ontology/`)
3. **Mappings**: Use `exact_mappings`, `close_mappings`, `related_mappings`, `narrow_mappings`, `broad_mappings` for additional semantic relationships
**Why This Matters**:
- **Frontend UML visualization** depends on centralized slots for edge rendering
- **Semantic URIs** enable linked data interoperability and RDF serialization
- **Mapping annotations** connect to SKOS-based vocabulary alignment standards
**Common slot_uri Sources**:
| Ontology | Prefix | Example Predicates |
|----------|--------|-------------------|
| SKOS | `skos:` | `prefLabel`, `altLabel`, `definition`, `note` |
| Schema.org | `schema:` | `name`, `description`, `url`, `dateCreated` |
| Dublin Core | `dcterms:` | `identifier`, `title`, `creator`, `date` |
| PROV-O | `prov:` | `wasGeneratedBy`, `wasAttributedTo`, `atTime` |
| RiC-O | `rico:` | `hasRecordSetType`, `isOrWasPartOf` |
| CIDOC-CRM | `crm:` | `P1_is_identified_by`, `P2_has_type` |
**Workflow for New Slots**:
1. Search `data/ontology/` for existing predicate
2. Create file in `modules/slots/` with `slot_uri`
3. Add mappings to related predicates in other ontologies
4. Update `manifest.json` with new slot file
**See**: `.opencode/rules/slot-centralization-and-semantic-uri-rule.md` for complete documentation
---
## Appendix: Full Rule Content (No .opencode Equivalent)
The following rules have no separate .opencode file and are preserved in full:

@ -1 +1 @@
{"root":["../../src/app.tsx","../../src/main.tsx","../../src/vite-env.d.ts","../../src/components/changepassworddialog.tsx","../../src/components/sparqlexplorer.tsx","../../src/components/ontology/datamapviewer.tsx","../../src/components/ontology/linkmlschemaviewer.tsx","../../src/components/ontology/ontologyviewer.tsx","../../src/config/api.ts","../../src/context/authcontext.tsx","../../src/lib/semantic-cache.ts","../../src/lib/linkml/custodian-data-mappings.ts","../../src/lib/linkml/schema-loader.ts","../../src/lib/ontology/ontology-loader.ts","../../src/pages/browsepage.tsx","../../src/pages/chatpage.tsx","../../src/pages/loginpage.tsx","../../src/pages/mappage.tsx","../../src/pages/ontologypage.tsx","../../src/pages/rulespage.tsx","../../src/pages/statspage.tsx","../../src/services/authapi.ts"],"version":"5.9.3"}
{"root":["../../src/app.tsx","../../src/main.tsx","../../src/vite-env.d.ts","../../src/components/changepassworddialog.tsx","../../src/components/debugpanel.tsx","../../src/components/sparqlexplorer.tsx","../../src/components/ontology/datamapviewer.tsx","../../src/components/ontology/linkmlschemaviewer.tsx","../../src/components/ontology/ontologyviewer.tsx","../../src/config/api.ts","../../src/context/authcontext.tsx","../../src/lib/semantic-cache.ts","../../src/lib/linkml/custodian-data-mappings.ts","../../src/lib/linkml/schema-loader.ts","../../src/lib/ontology/ontology-loader.ts","../../src/pages/browsepage.tsx","../../src/pages/chatpage.tsx","../../src/pages/loginpage.tsx","../../src/pages/mappage.tsx","../../src/pages/ontologypage.tsx","../../src/pages/rulespage.tsx","../../src/pages/statspage.tsx","../../src/services/authapi.ts"],"version":"5.9.3"}

apps/archief-assistent/node_modules/@types/d3 generated vendored Symbolic link
@ -0,0 +1 @@
../../../../node_modules/.pnpm/@types+d3@7.4.3/node_modules/@types/d3

apps/archief-assistent/node_modules/d3 generated vendored Symbolic link
@ -0,0 +1 @@
../../../node_modules/.pnpm/d3@7.9.0/node_modules/d3

apps/archief-assistent/node_modules/lucide-react generated vendored Symbolic link
@ -0,0 +1 @@
../../../node_modules/.pnpm/lucide-react@0.511.0_react@19.2.3/node_modules/lucide-react

@ -28,9 +28,12 @@
"react-markdown": "^10.1.0",
"react-router-dom": "^7.9.6",
"rehype-raw": "^7.0.0",
"remark-gfm": "^4.0.1"
"remark-gfm": "^4.0.1",
"lucide-react": "^0.511.0",
"d3": "^7.9.0"
},
"devDependencies": {
"@types/d3": "^7.4.3",
"@types/js-yaml": "^4.0.9",
"@types/node": "^24.10.1",
"@types/react": "^19.2.5",

@ -0,0 +1,675 @@
/**
* DebugPanel.css
*
* Styles for the enhanced debug panel with tabs for:
* - Raw Results (JSON view)
* - Knowledge Graph (D3 visualization)
* - Embeddings (PCA projection)
*/
/* Main container */
.debug-panel {
background: var(--color-surface, #1e1e1e);
border: 1px solid var(--color-border, #333);
border-radius: 8px;
margin-top: 12px;
overflow: hidden;
font-size: 13px;
}
/* Tab navigation */
.debug-panel__tabs {
display: flex;
gap: 0;
border-bottom: 1px solid var(--color-border, #333);
background: var(--color-surface-elevated, #252525);
}
.debug-panel__tab {
display: flex;
align-items: center;
gap: 6px;
padding: 8px 14px;
border: none;
background: transparent;
color: var(--color-text-secondary, #888);
cursor: pointer;
transition: all 0.15s ease;
font-size: 12px;
font-weight: 500;
border-bottom: 2px solid transparent;
margin-bottom: -1px;
}
.debug-panel__tab:hover {
color: var(--color-text-primary, #fff);
background: rgba(255, 255, 255, 0.05);
}
.debug-panel__tab--active {
color: var(--color-accent, #3b82f6);
border-bottom-color: var(--color-accent, #3b82f6);
background: rgba(59, 130, 246, 0.08);
}
.debug-panel__tab svg {
flex-shrink: 0;
}
/* Tab content */
.debug-panel__content {
max-height: 350px;
overflow-y: auto;
}
/* ============================================
Raw Results Tab Styles
============================================ */
.debug-panel__raw-results {
padding: 12px;
}
/* Toolbar */
.debug-panel__toolbar {
display: flex;
gap: 10px;
align-items: center;
margin-bottom: 12px;
padding-bottom: 10px;
border-bottom: 1px solid var(--color-border, #333);
}
.debug-panel__search {
flex: 1;
display: flex;
align-items: center;
gap: 6px;
background: var(--color-surface-elevated, #252525);
border: 1px solid var(--color-border, #333);
border-radius: 6px;
padding: 6px 10px;
}
.debug-panel__search svg {
color: var(--color-text-secondary, #888);
flex-shrink: 0;
}
.debug-panel__search-input {
flex: 1;
border: none;
background: transparent;
color: var(--color-text-primary, #fff);
font-size: 12px;
outline: none;
}
.debug-panel__search-input::placeholder {
color: var(--color-text-secondary, #666);
}
.debug-panel__search-clear {
display: flex;
align-items: center;
justify-content: center;
width: 16px;
height: 16px;
border: none;
background: var(--color-border, #444);
color: var(--color-text-secondary, #888);
border-radius: 50%;
cursor: pointer;
transition: all 0.15s ease;
}
.debug-panel__search-clear:hover {
background: var(--color-text-secondary, #666);
color: var(--color-text-primary, #fff);
}
/* Copy button */
.debug-panel__copy-btn {
display: flex;
align-items: center;
gap: 6px;
padding: 6px 12px;
border: 1px solid var(--color-border, #444);
background: var(--color-surface-elevated, #252525);
color: var(--color-text-secondary, #aaa);
border-radius: 6px;
cursor: pointer;
font-size: 12px;
transition: all 0.15s ease;
}
.debug-panel__copy-btn:hover {
background: rgba(255, 255, 255, 0.08);
border-color: var(--color-text-secondary, #666);
color: var(--color-text-primary, #fff);
}
.debug-panel__copy-btn--copied {
border-color: #10b981;
color: #10b981;
background: rgba(16, 185, 129, 0.1);
}
/* Collapsible sections */
.debug-panel__section {
margin-bottom: 10px;
border: 1px solid var(--color-border, #333);
border-radius: 6px;
overflow: hidden;
}
.debug-panel__section-header {
display: flex;
justify-content: space-between;
align-items: center;
width: 100%;
padding: 10px 12px;
border: none;
background: var(--color-surface-elevated, #252525);
color: var(--color-text-primary, #fff);
cursor: pointer;
transition: background 0.15s ease;
}
.debug-panel__section-header:hover {
background: rgba(255, 255, 255, 0.05);
}
.debug-panel__section-title {
display: flex;
align-items: center;
gap: 8px;
font-size: 12px;
font-weight: 500;
}
.debug-panel__section-count {
color: var(--color-text-secondary, #888);
font-weight: 400;
font-size: 11px;
}
/* JSON display */
.debug-panel__json {
margin: 0;
padding: 12px;
background: var(--color-surface, #1a1a1a);
font-family: 'JetBrains Mono', 'Fira Code', 'Consolas', monospace;
font-size: 11px;
line-height: 1.5;
color: #93c5fd;
white-space: pre-wrap;
word-break: break-word;
overflow-x: auto;
max-height: 200px;
overflow-y: auto;
}
/* Show all toggle */
.debug-panel__show-all {
display: block;
width: 100%;
padding: 8px;
border: 1px dashed var(--color-border, #444);
background: transparent;
color: var(--color-text-secondary, #888);
border-radius: 6px;
cursor: pointer;
font-size: 12px;
transition: all 0.15s ease;
}
.debug-panel__show-all:hover {
border-color: var(--color-accent, #3b82f6);
color: var(--color-accent, #3b82f6);
}
/* Empty state */
.debug-panel__empty {
text-align: center;
padding: 40px 20px;
color: var(--color-text-secondary, #666);
font-size: 13px;
}
/* ============================================
Knowledge Graph Tab Styles
============================================ */
.debug-panel__graph {
position: relative;
padding: 12px;
}
.debug-panel__graph-stats {
display: flex;
gap: 16px;
margin-bottom: 10px;
font-size: 11px;
color: var(--color-text-secondary, #888);
}
.debug-panel__graph-svg {
width: 100%;
height: 250px;
background: var(--color-surface, #1a1a1a);
border-radius: 6px;
border: 1px solid var(--color-border, #333);
}
.debug-panel__node-info {
position: absolute;
bottom: 60px;
left: 20px;
display: flex;
flex-direction: column;
gap: 4px;
background: rgba(30, 30, 30, 0.95);
border: 1px solid var(--color-border, #444);
border-radius: 6px;
padding: 10px 14px;
font-size: 12px;
max-width: 200px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
}
.debug-panel__node-info strong {
color: var(--color-text-primary, #fff);
font-weight: 600;
}
.debug-panel__node-type {
color: var(--color-accent, #3b82f6);
font-size: 11px;
text-transform: capitalize;
}
.debug-panel__node-score {
color: var(--color-text-secondary, #888);
font-size: 10px;
}
.debug-panel__graph-hint {
text-align: center;
margin-top: 8px;
font-size: 11px;
color: var(--color-text-secondary, #666);
}
/* ============================================
Embeddings Tab Styles
============================================ */
.debug-panel__embeddings {
position: relative;
padding: 12px;
}
.debug-panel__embeddings-stats {
display: flex;
gap: 16px;
margin-bottom: 10px;
font-size: 11px;
color: var(--color-text-secondary, #888);
}
.debug-panel__embeddings-canvas {
width: 100%;
height: 200px;
border-radius: 6px;
border: 1px solid var(--color-border, #333);
cursor: crosshair;
}
.debug-panel__point-info {
position: absolute;
bottom: 60px;
left: 20px;
display: flex;
flex-direction: column;
gap: 4px;
background: rgba(30, 30, 30, 0.95);
border: 1px solid var(--color-border, #444);
border-radius: 6px;
padding: 10px 14px;
font-size: 12px;
max-width: 200px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
pointer-events: none;
}
.debug-panel__point-info strong {
color: var(--color-text-primary, #fff);
font-weight: 600;
}
.debug-panel__point-type {
color: #8b5cf6;
font-size: 11px;
text-transform: capitalize;
}
.debug-panel__point-score {
color: var(--color-text-secondary, #888);
font-size: 10px;
}
/* ============================================
Graph Controls (Zoom, Cluster, Export)
============================================ */
.debug-panel__graph-controls {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 10px;
padding-bottom: 8px;
border-bottom: 1px solid var(--color-border, #333);
}
.debug-panel__graph-buttons {
display: flex;
gap: 4px;
align-items: center;
}
.debug-panel__icon-btn {
display: flex;
align-items: center;
justify-content: center;
gap: 4px;
padding: 6px 8px;
border: 1px solid var(--color-border, #444);
background: var(--color-surface-elevated, #252525);
color: var(--color-text-secondary, #888);
border-radius: 4px;
cursor: pointer;
font-size: 11px;
transition: all 0.15s ease;
}
.debug-panel__icon-btn:hover {
background: rgba(255, 255, 255, 0.08);
border-color: var(--color-text-secondary, #666);
color: var(--color-text-primary, #fff);
}
.debug-panel__icon-btn--active {
background: rgba(59, 130, 246, 0.15);
border-color: var(--color-accent, #3b82f6);
color: var(--color-accent, #3b82f6);
}
.debug-panel__export-group {
display: flex;
gap: 2px;
margin-left: 8px;
padding-left: 8px;
border-left: 1px solid var(--color-border, #333);
}
.debug-panel__zoom-level {
font-family: 'JetBrains Mono', 'Fira Code', monospace;
font-size: 10px;
color: var(--color-text-secondary, #666);
min-width: 40px;
text-align: right;
}
/* Node close button */
.debug-panel__node-close {
position: absolute;
top: 4px;
right: 4px;
display: flex;
align-items: center;
justify-content: center;
width: 18px;
height: 18px;
border: none;
background: rgba(255, 255, 255, 0.1);
color: var(--color-text-secondary, #888);
border-radius: 50%;
cursor: pointer;
transition: all 0.15s ease;
}
.debug-panel__node-close:hover {
background: rgba(255, 255, 255, 0.2);
color: var(--color-text-primary, #fff);
}
.debug-panel__node-id {
color: var(--color-text-secondary, #666);
font-size: 9px;
font-family: 'JetBrains Mono', 'Fira Code', monospace;
word-break: break-all;
}
/* Graph legend */
.debug-panel__graph-legend {
position: absolute;
top: 58px;
right: 20px;
display: flex;
flex-wrap: wrap;
gap: 8px;
background: rgba(30, 30, 30, 0.9);
border: 1px solid var(--color-border, #333);
border-radius: 4px;
padding: 6px 10px;
max-width: 180px;
}
.debug-panel__legend-item {
display: flex;
align-items: center;
gap: 4px;
font-size: 9px;
color: var(--color-text-secondary, #888);
text-transform: capitalize;
}
.debug-panel__legend-dot {
width: 8px;
height: 8px;
border-radius: 50%;
flex-shrink: 0;
}
/* ============================================
Embeddings Controls
============================================ */
.debug-panel__embeddings-controls {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 10px;
padding-bottom: 8px;
border-bottom: 1px solid var(--color-border, #333);
}
.debug-panel__embeddings-buttons {
display: flex;
gap: 4px;
align-items: center;
}
.debug-panel__embeddings-legend {
display: flex;
flex-wrap: wrap;
gap: 10px;
margin-top: 10px;
padding-top: 8px;
border-top: 1px solid var(--color-border, #333);
}
/* ============================================
Virtual Scrolling / Load More
============================================ */
.debug-panel__load-more {
text-align: center;
padding: 12px;
color: var(--color-text-secondary, #666);
font-size: 11px;
background: var(--color-surface-elevated, #252525);
border-radius: 4px;
margin-top: 8px;
}
/* ============================================
Timeline Tab Styles
============================================ */
.debug-panel__timeline {
position: relative;
padding: 12px;
}
.debug-panel__timeline-stats {
display: flex;
gap: 16px;
margin-bottom: 10px;
font-size: 11px;
color: var(--color-text-secondary, #888);
}
.debug-panel__timeline-svg {
width: 100%;
height: 200px;
background: var(--color-surface, #1a1a1a);
border-radius: 6px;
border: 1px solid var(--color-border, #333);
}
.debug-panel__timeline-axis text {
fill: var(--color-text-secondary, #888);
font-size: 10px;
}
.debug-panel__timeline-axis line,
.debug-panel__timeline-axis path {
stroke: var(--color-border, #444);
}
/* Event info popup */
.debug-panel__event-info {
position: absolute;
bottom: 60px;
left: 20px;
display: flex;
flex-direction: column;
gap: 4px;
background: rgba(30, 30, 30, 0.95);
border: 1px solid var(--color-border, #444);
border-radius: 6px;
padding: 10px 14px;
font-size: 12px;
max-width: 220px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
}
.debug-panel__event-info strong {
color: var(--color-text-primary, #fff);
font-weight: 600;
padding-right: 20px;
}
.debug-panel__event-close {
position: absolute;
top: 4px;
right: 4px;
display: flex;
align-items: center;
justify-content: center;
width: 18px;
height: 18px;
border: none;
background: rgba(255, 255, 255, 0.1);
color: var(--color-text-secondary, #888);
border-radius: 50%;
cursor: pointer;
transition: all 0.15s ease;
}
.debug-panel__event-close:hover {
background: rgba(255, 255, 255, 0.2);
color: var(--color-text-primary, #fff);
}
.debug-panel__event-date {
color: var(--color-accent, #3b82f6);
font-size: 11px;
}
.debug-panel__event-type {
color: #f59e0b;
font-size: 11px;
text-transform: capitalize;
}
.debug-panel__event-desc {
color: var(--color-text-secondary, #888);
font-size: 11px;
margin: 4px 0 0 0;
line-height: 1.4;
}
/* ============================================
Scrollbar Styling
============================================ */
.debug-panel__content::-webkit-scrollbar,
.debug-panel__json::-webkit-scrollbar {
width: 6px;
}
.debug-panel__content::-webkit-scrollbar-track,
.debug-panel__json::-webkit-scrollbar-track {
background: transparent;
}
.debug-panel__content::-webkit-scrollbar-thumb,
.debug-panel__json::-webkit-scrollbar-thumb {
background: var(--color-border, #444);
border-radius: 3px;
}
.debug-panel__content::-webkit-scrollbar-thumb:hover,
.debug-panel__json::-webkit-scrollbar-thumb:hover {
background: var(--color-text-secondary, #666);
}
/* ============================================
Responsive adjustments
============================================ */
@media (max-width: 768px) {
.debug-panel__tabs {
overflow-x: auto;
}
.debug-panel__tab {
padding: 8px 10px;
font-size: 11px;
}
.debug-panel__tab span {
display: none;
}
.debug-panel__toolbar {
flex-wrap: wrap;
}
.debug-panel__copy-btn span {
display: none;
}
}

File diff suppressed because it is too large


@@ -50,6 +50,11 @@ import type { CachedResponse, CacheStats } from '../lib/semantic-cache'
import { SPARQLExplorer } from '../components/SPARQLExplorer'
import type { SPARQLResult } from '../components/SPARQLExplorer'
// Import Debug Panel component
import { DebugPanel } from '../components/DebugPanel'
import type { DebugPanelTab } from '../components/DebugPanel'
import { Code } from 'lucide-react'
// NA Color palette
const naColors = {
primary: '#007bc7',
@@ -367,6 +372,11 @@ function ChatPage() {
})
const [cacheStats, setCacheStats] = useState<CacheStats | null>(null)
// Debug Panel state
const [showDebugPanel, setShowDebugPanel] = useState(false)
const [debugPanelTab, setDebugPanelTab] = useState<DebugPanelTab>('raw')
const [debugResults, setDebugResults] = useState<Record<string, unknown>[]>([])
// Derive provider from selected model
const selectedModelInfo = LLM_MODELS.find(m => m.id === selectedModel) || LLM_MODELS[0]
const llmProvider = selectedModelInfo.provider
@@ -692,7 +702,8 @@ function ChatPage() {
}))
// Parse institutions from retrieved_results (metadata is nested)
const institutions: Institution[] = ((data.retrieved_results || []) as Record<string, unknown>[]).map((r: Record<string, unknown>) => {
const retrievedResults = (data.retrieved_results || []) as Record<string, unknown>[]
const institutions: Institution[] = retrievedResults.map((r: Record<string, unknown>) => {
const metadata = (r.metadata || {}) as Record<string, unknown>
const scores = (r.scores || {}) as Record<string, number>
return {
@@ -705,6 +716,9 @@ function ChatPage() {
score: scores.combined as number | undefined,
}
})
// Store retrieved results for Debug Panel
setDebugResults(retrievedResults)
// ========================================
// STORE IN CACHE (after successful response)
// ========================================
@@ -1105,6 +1119,25 @@ function ChatPage() {
</Container>
</Box>
{/* Debug Panel - collapsible section showing RAG results */}
<Collapse in={showDebugPanel && debugResults.length > 0}>
<Box sx={{
borderTop: '1px solid #e0e0e0',
maxHeight: '400px',
overflow: 'auto',
}}>
<Container maxWidth="md" sx={{ py: 2 }}>
<DebugPanel
results={debugResults}
activeTab={debugPanelTab}
onTabChange={setDebugPanelTab}
t={(key) => key}
language="nl"
/>
</Container>
</Box>
</Collapse>
{/* Input Area */}
<Box
component="form"
@@ -1148,6 +1181,21 @@ function ChatPage() {
<DeleteSweepIcon sx={{ fontSize: 16 }} />
</IconButton>
</Tooltip>
{/* Debug Panel Toggle */}
<Tooltip title={showDebugPanel ? "Debug paneel verbergen" : "Debug paneel tonen"} placement="top">
<IconButton
size="small"
onClick={() => setShowDebugPanel(!showDebugPanel)}
sx={{
width: 24,
height: 24,
color: showDebugPanel ? naColors.primary : 'text.secondary',
'&:hover': { color: naColors.primary },
}}
>
<Code size={16} />
</IconButton>
</Tooltip>
</Box>
)}


@@ -3,6 +3,28 @@ Heritage RAG Backend
Multi-source retrieval-augmented generation system for heritage custodian data.
Combines Qdrant vector search, Oxigraph SPARQL, TypeDB, and PostGIS.
New modules (v1.1.0):
- temporal_resolver: Temporal conflict resolution for historical facts
- semantic_router: Signal-based query routing (no LLM)
- event_retriever: Hypergraph-based event retrieval
"""
__version__ = "1.0.0"
__version__ = "1.1.0"
# Lazy imports to avoid circular dependencies
def get_temporal_resolver():
from .temporal_resolver import get_temporal_resolver
return get_temporal_resolver()
def get_signal_extractor():
from .semantic_router import get_signal_extractor
return get_signal_extractor()
def get_decision_router():
from .semantic_router import get_decision_router
return get_decision_router()
def create_event_retriever(*args, **kwargs):
from .event_retriever import create_event_retriever
return create_event_retriever(*args, **kwargs)
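The accessors above defer submodule imports until first call, so the package can load without triggering circular imports. A minimal, self-contained sketch of the same pattern, using `math` as a stand-in for the real submodules:

```python
import importlib


def make_lazy(module_name: str, attr: str):
    """Return an accessor that imports module_name only when invoked."""
    def accessor():
        # Import happens here, at call time, not at package load time
        return getattr(importlib.import_module(module_name), attr)
    return accessor


get_sqrt = make_lazy("math", "sqrt")  # nothing imported yet
print(get_sqrt()(9.0))  # 3.0
```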


@@ -32,6 +32,20 @@ import httpx
from dspy import Example, Prediction, History
from dspy.streaming import StatusMessage, StreamListener, StatusMessageProvider
# Semantic routing (Signal-Decision pattern) for fast LLM-free query classification
from .semantic_router import (
QuerySignals,
RouteConfig,
get_signal_extractor,
get_decision_router,
)
# Temporal intent extraction for detailed temporal constraint detection
from .temporal_intent import (
TemporalConstraint,
get_temporal_extractor,
)
logger = logging.getLogger(__name__)
@@ -1670,14 +1684,34 @@ class HeritageQueryRouter(dspy.Module):
If provided, routing uses this LM instead of the global default.
Recommended: Use a fast model like glm-4.5-flash or gpt-4o-mini
for routing while keeping quality models for answer generation.
signal_threshold: Confidence threshold (0.0-1.0) for signal-based routing.
When semantic signal extraction confidence >= this threshold,
skip LLM classification and use signal-based routing (faster).
Set to 1.0 to always use LLM classification.
Default: 0.8 (skip LLM when signals are clear).
"""
def __init__(self, use_schema_aware: Optional[bool] = None, fast_lm: Optional[dspy.LM] = None):
def __init__(
self,
use_schema_aware: Optional[bool] = None,
fast_lm: Optional[dspy.LM] = None,
signal_threshold: float = 0.8
):
super().__init__()
# Store fast LM for routing (None means use global default)
self.fast_lm = fast_lm
# Signal-Decision pattern: fast LLM-free routing for high-confidence queries
self.signal_extractor = get_signal_extractor()
self.decision_router = get_decision_router()
self.signal_threshold = signal_threshold
# Temporal intent extraction for detailed constraint detection
self.temporal_extractor = get_temporal_extractor()
logger.info(f"HeritageQueryRouter signal threshold: {signal_threshold}")
# Determine whether to use schema-aware signature
if use_schema_aware is None:
use_schema_aware = SCHEMA_LOADER_AVAILABLE
@@ -1710,6 +1744,9 @@ class HeritageQueryRouter(dspy.Module):
def forward(self, question: str, language: str = "nl", history: History | None = None) -> Prediction:
"""Classify query and determine routing.
Uses Signal-Decision pattern: fast signal extraction first, then LLM only
when signals are ambiguous (confidence < signal_threshold).
Args:
question: User's current question
language: Language code (nl, en, etc.)
@@ -1726,10 +1763,81 @@ class HeritageQueryRouter(dspy.Module):
- target_role_category: Staff role category (when entity_type='person')
- target_staff_role: Specific staff role (when entity_type='person')
- target_custodian_type: Custodian type (when entity_type='institution')
- signal_based: Whether routing was signal-based (True) or LLM-based (False)
- route_config: RouteConfig from semantic router (when signal_based=True)
- temporal_constraint: TemporalConstraint with type, dates, and recommended
SPARQL template (when intent='temporal' and signal_based=True)
"""
if history is None:
history = History(messages=[])
# ===== SIGNAL EXTRACTION (Fast, no LLM) =====
signals = self.signal_extractor.extract_signals(question)
logger.debug(
f"Signal extraction: entity_type={signals.entity_type}, intent={signals.intent}, "
f"confidence={signals.confidence:.2f}, temporal={signals.has_temporal_constraint}"
)
# ===== HIGH-CONFIDENCE SIGNALS: Skip LLM =====
if signals.confidence >= self.signal_threshold:
logger.info(
f"Signal-based routing (confidence={signals.confidence:.2f} >= {self.signal_threshold}): "
f"entity_type={signals.entity_type}, intent={signals.intent}"
)
# Get route configuration from semantic decision router
route_config = self.decision_router.route(signals)
# Map signal intent to source mapping
recommended_sources = self.source_mapping.get(
signals.intent, ["qdrant", "sparql"]
)
# ===== TEMPORAL CONSTRAINT EXTRACTION =====
# When temporal signals detected, extract detailed constraints for SPARQL template selection
temporal_constraint: TemporalConstraint | None = None
if signals.has_temporal_constraint:
temporal_constraint = self.temporal_extractor.extract(question)
logger.debug(
f"Temporal constraint: type={temporal_constraint.constraint_type}, "
f"template={temporal_constraint.recommended_template}, "
f"dates={temporal_constraint.date_start}/{temporal_constraint.date_end}"
)
# Build prediction from signals (no LLM call)
prediction = Prediction(
intent=signals.intent,
entities=signals.institution_mentions + signals.person_mentions,
sources=recommended_sources,
reasoning=f"Signal-based routing (confidence={signals.confidence:.2f})",
resolved_question=question, # No reference resolution without LLM
entity_type=signals.entity_type,
target_role_category='UNKNOWN', # Requires LLM for detailed classification
target_staff_role='UNKNOWN',
target_custodian_type='UNKNOWN',
target_custodian_slug=None,
# Signal-based routing metadata
signal_based=True,
signals=signals,
route_config=route_config,
# Temporal constraint for SPARQL template selection
temporal_constraint=temporal_constraint,
)
# For person queries, try to extract institution slug
if signals.entity_type == 'person' and signals.institution_mentions:
target_custodian_slug = extract_institution_slug_from_query(question)
if target_custodian_slug:
prediction.target_custodian_slug = target_custodian_slug
logger.info(f"Signal-based: extracted institution slug '{target_custodian_slug}'")
return prediction
# ===== LOW-CONFIDENCE SIGNALS: Use LLM =====
logger.info(
f"LLM-based routing (confidence={signals.confidence:.2f} < {self.signal_threshold})"
)
# Use fast LM for routing if configured, otherwise use global default
if self.fast_lm:
with dspy.settings.context(lm=self.fast_lm):
@@ -1804,6 +1912,12 @@ class HeritageQueryRouter(dspy.Module):
target_custodian_type=target_custodian_type,
# Institution filter for person queries
target_custodian_slug=target_custodian_slug,
# LLM-based routing metadata
signal_based=False,
signals=signals, # Include signals even for LLM routing (for debugging)
route_config=None,
# Temporal constraint (extracted on demand for LLM-based routing if temporal intent)
temporal_constraint=None, # Could extract here too if result.intent == 'temporal'
)
return prediction
@@ -4215,15 +4329,26 @@ class HeritageRAGPipeline(dspy.Module):
result_ghcid = uri.split('/hc/')[-1]
break
if result_ghcid:
# Build nested metadata structure for frontend consistency
sparql_city = sparql_result.get('city', '')
sparql_address = sparql_result.get('address', '')
sparql_website = sparql_result.get('website', '')
sparql_only_result = {
'ghcid': result_ghcid,
'name': sparql_result.get('name', result_ghcid.split('/')[-1].replace('-', ' ').title()),
'type': 'institution',
'source': 'sparql',
# Nested metadata for frontend Knowledge Graph
'metadata': {
'city': sparql_city,
'address': sparql_address,
'website': sparql_website,
},
# Also keep flat fields for backward compatibility
'city': sparql_city,
'address': sparql_address,
'website': sparql_website,
}
for field in ['address', 'website', 'city']:
if field in sparql_result:
sparql_only_result[field] = sparql_result[field]
inst_results.append(type('SPARQLResult', (), sparql_only_result)())
if inst_results:
@@ -4247,10 +4372,24 @@ class HeritageRAGPipeline(dspy.Module):
city = getattr(inst, 'city', '')
lat = getattr(inst, 'latitude', None)
lon = getattr(inst, 'longitude', None)
# Build dict for frontend
ghcid = getattr(inst, 'ghcid', None)
address = getattr(inst, 'address', '')
website = getattr(inst, 'website', '')
# Build dict with nested metadata for frontend Knowledge Graph
retrieved_results.append({
"type": "institution",
"ghcid": ghcid,
"name": name,
# Nested metadata for frontend consistency
"metadata": {
"institution_type": inst_type,
"city": city,
"address": address,
"website": website,
"latitude": lat,
"longitude": lon,
},
# Also keep flat fields for backward compatibility
"institution_type": inst_type,
"city": city,
"latitude": lat,
@@ -4757,15 +4896,26 @@ class HeritageRAGPipeline(dspy.Module):
result_ghcid = uri.split('/hc/')[-1]
break
if result_ghcid and result_ghcid not in existing_ghcids:
# Build nested metadata structure for frontend consistency
sparql_city = sparql_result.get('city', '')
sparql_address = sparql_result.get('address', '')
sparql_website = sparql_result.get('website', '')
sparql_only_result = {
'ghcid': result_ghcid,
'name': sparql_result.get('name', result_ghcid.split('/')[-1].replace('-', ' ').title()),
'type': 'institution',
'source': 'sparql',
# Nested metadata for frontend Knowledge Graph
'metadata': {
'city': sparql_city,
'address': sparql_address,
'website': sparql_website,
},
# Also keep flat fields for backward compatibility
'city': sparql_city,
'address': sparql_address,
'website': sparql_website,
}
for field in ['address', 'website', 'city']:
if field in sparql_result:
sparql_only_result[field] = sparql_result[field]
filtered_results.append(type('SPARQLResult', (), sparql_only_result)())
sparql_only_count += 1
@@ -4784,15 +4934,26 @@ class HeritageRAGPipeline(dspy.Module):
result_ghcid = uri.split('/hc/')[-1]
break
if result_ghcid:
# Build nested metadata structure for frontend consistency
sparql_city = sparql_result.get('city', '')
sparql_address = sparql_result.get('address', '')
sparql_website = sparql_result.get('website', '')
sparql_only_result = {
'ghcid': result_ghcid,
'name': sparql_result.get('name', result_ghcid.split('/')[-1].replace('-', ' ').title()),
'type': 'institution',
'source': 'sparql',
# Nested metadata for frontend Knowledge Graph
'metadata': {
'city': sparql_city,
'address': sparql_address,
'website': sparql_website,
},
# Also keep flat fields for backward compatibility
'city': sparql_city,
'address': sparql_address,
'website': sparql_website,
}
for field in ['address', 'website', 'city']:
if field in sparql_result:
sparql_only_result[field] = sparql_result[field]
inst_results.append(type('SPARQLResult', (), sparql_only_result)())
if inst_results:
@@ -4813,11 +4974,30 @@ class HeritageRAGPipeline(dspy.Module):
name = getattr(inst, 'name', 'Unknown')
inst_type = getattr(inst, 'type', '')
city = getattr(inst, 'city', '')
ghcid = getattr(inst, 'ghcid', None)
address = getattr(inst, 'address', '')
website = getattr(inst, 'website', '')
lat = getattr(inst, 'latitude', None)
lon = getattr(inst, 'longitude', None)
# Build dict with nested metadata for frontend Knowledge Graph
retrieved_results.append({
"type": "institution",
"ghcid": ghcid,
"name": name,
# Nested metadata for frontend consistency
"metadata": {
"institution_type": inst_type,
"city": city,
"address": address,
"website": website,
"latitude": lat,
"longitude": lon,
},
# Also keep flat fields for backward compatibility
"institution_type": inst_type,
"city": city,
"latitude": lat,
"longitude": lon,
})
entry = f"- {name}"
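The nested-plus-flat result dict is built the same way in four places in this commit. A hypothetical helper (not part of the committed code) shows the shape it produces, with `city`/`website` values chosen for illustration:

```python
def build_institution_result(ghcid: str, name: str, **fields) -> dict:
    """Build the result dict with fields nested under 'metadata' for the
    frontend Knowledge Graph, and duplicated flat for older consumers."""
    result = {
        "type": "institution",
        "ghcid": ghcid,
        "name": name,
        # Nested metadata for frontend consistency
        "metadata": dict(fields),
    }
    # Flat duplicates for backward compatibility
    result.update(fields)
    return result


r = build_institution_result(
    "nationaal-archief", "Nationaal Archief",
    city="Den Haag", website="example.org",
)
print(r["metadata"]["city"], r["city"])  # Den Haag Den Haag
```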


@@ -0,0 +1,393 @@
"""
Heritage Event Retrieval using Hypergraph Patterns
Retrieves organizational change events (mergers, foundings, etc.) using
multi-factor scoring: entity overlap + semantic similarity + temporal relevance.
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
"""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Callable, Any
import logging
import numpy as np
logger = logging.getLogger(__name__)
@dataclass
class HeritageEvent:
"""Hyperedge representing a heritage organizational event."""
event_id: str
event_type: str
event_date: datetime
participants: dict[str, str] # role -> GHCID
description: str
affected_collections: list[str] = field(default_factory=list)
resulting_entities: list[str] = field(default_factory=list)
confidence: float = 1.0
embedding: Optional[list[float]] = None
class EventRetriever:
"""
Retrieve heritage events using hypergraph patterns.
Uses multi-factor scoring:
- Entity overlap (entities mentioned in query match event participants)
- Semantic similarity (query embedding vs event description)
- Temporal relevance (how close event date is to query date)
- Graph connectivity (how connected the event is in the knowledge graph)
"""
def __init__(
self,
oxigraph_query_fn: Callable[[str], list[dict]],
qdrant_search_fn: Callable[[str, int], list[dict]],
embed_fn: Callable[[str], list[float]]
):
"""
Args:
oxigraph_query_fn: Function to execute SPARQL queries
qdrant_search_fn: Function to search Qdrant events collection
embed_fn: Function to embed text
"""
self.sparql = oxigraph_query_fn
self.vector_search = qdrant_search_fn
self.embed = embed_fn
def retrieve(
self,
query: str,
        query_entities: Optional[list[str]] = None,
        query_time: Optional[datetime] = None,
        event_type: Optional[str] = None,
        limit: int = 10,
        weights: Optional[dict] = None
) -> list[tuple[HeritageEvent, float]]:
"""
Retrieve events using multi-factor scoring.
Args:
query: Natural language query
query_entities: GHCIDs mentioned in query
query_time: Temporal constraint
event_type: Filter by event type (MERGER, FOUNDING, CLOSURE, etc.)
limit: Max results
weights: Scoring weights for each factor
Returns:
List of (event, score) tuples ordered by relevance
"""
if weights is None:
weights = {
"entity": 0.3,
"semantic": 0.4,
"temporal": 0.2,
"graph": 0.1
}
# Phase 1: Candidate generation
candidates = {}
# Entity-based candidates from SPARQL
if query_entities:
sparql_candidates = self._get_entity_candidates(query_entities, event_type)
candidates.update(sparql_candidates)
# Semantic candidates from Qdrant
vector_candidates = self._get_semantic_candidates(query, limit * 2)
candidates.update(vector_candidates)
if not candidates:
logger.info(f"No event candidates found for query: {query}")
return []
# Phase 2: Score all candidates
scored = []
for event_id, event in candidates.items():
score = self._score_event(
event, query, query_entities, query_time, weights
)
scored.append((event, score))
# Sort and return top-k
scored.sort(key=lambda x: x[1], reverse=True)
return scored[:limit]
def retrieve_by_type(
self,
event_type: str,
        start_date: Optional[datetime] = None,
        end_date: Optional[datetime] = None,
limit: int = 50
) -> list[HeritageEvent]:
"""
Retrieve events of a specific type within a date range.
Simpler retrieval for structured queries (no scoring).
"""
date_filter = ""
if start_date:
date_filter += f'FILTER(?date >= "{start_date.isoformat()}"^^xsd:date) '
if end_date:
date_filter += f'FILTER(?date <= "{end_date.isoformat()}"^^xsd:date) '
sparql = f"""
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?event ?eventType ?date ?description WHERE {{
?event a hc:OrganizationalChangeEvent ;
hc:eventType ?eventType ;
hc:eventDate ?date .
OPTIONAL {{ ?event schema:description ?description }}
FILTER(?eventType = "{event_type}")
{date_filter}
}}
ORDER BY ?date
LIMIT {limit}
"""
results = self.sparql(sparql)
events = []
for row in results:
event = HeritageEvent(
event_id=row.get("event", ""),
event_type=row.get("eventType", event_type),
event_date=datetime.fromisoformat(row["date"]) if row.get("date") else datetime.now(),
participants={},
description=row.get("description", "")
)
events.append(event)
return events
def _get_entity_candidates(
self,
ghcids: list[str],
        event_type: Optional[str] = None
) -> dict[str, HeritageEvent]:
"""Get events involving specified entities via SPARQL."""
        # Restrict to events whose participant URIs end with a requested GHCID
        ghcid_values = " ".join(f'"{g}"' for g in ghcids)
        event_type_filter = f'FILTER(?eventType = "{event_type}")' if event_type else ""
        sparql = f"""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
        PREFIX schema: <http://schema.org/>
        SELECT DISTINCT ?event ?eventType ?date ?description ?participant ?role WHERE {{
            ?event a hc:OrganizationalChangeEvent ;
                   hc:eventType ?eventType ;
                   hc:eventDate ?date .
            OPTIONAL {{ ?event schema:description ?description }}
            # Get participants
            ?event ?role ?participant .
            FILTER(STRSTARTS(STR(?role), "http://www.cidoc-crm.org/cidoc-crm/P") ||
                   STRSTARTS(STR(?role), "https://nde.nl/ontology/hc/"))
            VALUES ?ghcid {{ {ghcid_values} }}
            FILTER(STRENDS(STR(?participant), ?ghcid))
            {event_type_filter}
        }}
        """
results = self.sparql(sparql)
return self._results_to_events(results)
def _get_semantic_candidates(
self,
query: str,
limit: int
) -> dict[str, HeritageEvent]:
"""Get events via semantic similarity."""
try:
results = self.vector_search(query, limit)
except Exception as e:
logger.warning(f"Vector search failed: {e}")
return {}
events = {}
        for r in results:
            if not isinstance(r, dict):
                continue
            payload = r.get("payload", {})
            event_id = r.get("id", str(id(r)))
try:
event_date = datetime.fromisoformat(
payload.get("event_date", datetime.now().isoformat())
)
except (ValueError, TypeError):
event_date = datetime.now()
event = HeritageEvent(
event_id=event_id,
event_type=payload.get("event_type", "UNKNOWN"),
event_date=event_date,
participants=payload.get("participants", {}),
description=payload.get("description", ""),
confidence=r.get("score", 0.5)
)
events[event.event_id] = event
return events
def _score_event(
self,
event: HeritageEvent,
query: str,
query_entities: list[str],
query_time: datetime,
weights: dict
) -> float:
"""Compute multi-factor relevance score."""
scores = {}
# Entity overlap
if query_entities:
event_entities = set(event.participants.values())
overlap = len(event_entities.intersection(set(query_entities)))
scores["entity"] = overlap / max(len(query_entities), 1)
else:
scores["entity"] = 0.5 # Neutral
# Semantic similarity
try:
query_emb = self.embed(query)
if event.embedding:
scores["semantic"] = self._cosine_similarity(query_emb, event.embedding)
elif event.description:
desc_emb = self.embed(event.description)
scores["semantic"] = self._cosine_similarity(query_emb, desc_emb)
else:
scores["semantic"] = 0.5
except Exception as e:
logger.warning(f"Embedding failed: {e}")
scores["semantic"] = 0.5
# Temporal relevance
if query_time and event.event_date:
days_diff = abs((query_time - event.event_date).days)
scores["temporal"] = 1.0 / (1.0 + days_diff / 365.0)
else:
scores["temporal"] = 0.5 # Neutral
# Graph connectivity (placeholder - would use SPARQL for full implementation)
scores["graph"] = 0.5
# Weighted sum
final_score = sum(weights.get(k, 0) * scores.get(k, 0.5) for k in weights)
return final_score
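With the default weights, the factors above combine into a single relevance value. A worked sketch for an event 30 days from the query date, with entity overlap of 1.0 and an assumed semantic similarity of 0.8:

```python
# Default weights from EventRetriever.retrieve()
weights = {"entity": 0.3, "semantic": 0.4, "temporal": 0.2, "graph": 0.1}
scores = {
    "entity": 1.0,                           # all query entities participate
    "semantic": 0.8,                         # assumed cosine similarity
    "temporal": 1.0 / (1.0 + 30 / 365.0),    # event 30 days from query time
    "graph": 0.5,                            # neutral placeholder
}
final = sum(weights[k] * scores[k] for k in weights)
print(round(final, 3))  # 0.855
```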
def _cosine_similarity(self, a: list[float], b: list[float]) -> float:
"""Compute cosine similarity between two vectors."""
a_np = np.array(a)
b_np = np.array(b)
norm_product = np.linalg.norm(a_np) * np.linalg.norm(b_np)
if norm_product == 0:
return 0.0
return float(np.dot(a_np, b_np) / norm_product)
def _results_to_events(self, results: list[dict]) -> dict[str, HeritageEvent]:
"""Convert SPARQL results to HeritageEvent objects."""
events = {}
# Group by event ID
by_event: dict[str, dict[str, Any]] = {}
for row in results:
event_id = row.get("event")
if not event_id:
continue
if event_id not in by_event:
by_event[event_id] = {
"event_type": row.get("eventType", "UNKNOWN"),
"date": row.get("date"),
"description": row.get("description", ""),
"participants": {}
}
role = row.get("role", "")
if "/" in role:
role = role.split("/")[-1] # Extract role from URI
participant = row.get("participant")
if role and participant:
by_event[event_id]["participants"][role] = participant
# Convert to HeritageEvent objects
for event_id, data in by_event.items():
try:
event_date = datetime.fromisoformat(data["date"]) if data["date"] else datetime.now()
except (ValueError, TypeError):
event_date = datetime.now()
events[event_id] = HeritageEvent(
event_id=event_id,
event_type=data["event_type"],
event_date=event_date,
participants=data["participants"],
description=data["description"]
)
return events
# Factory function for creating EventRetriever with default dependencies
def create_event_retriever(
oxigraph_endpoint: str = "http://localhost:7878/query",
qdrant_collection: str = "heritage_events"
) -> EventRetriever:
"""
Create EventRetriever with standard GLAM dependencies.
This is a convenience factory that wires up the retriever with
default Oxigraph and Qdrant connections.
"""
# Import here to avoid circular dependencies
import requests
def sparql_query(query: str) -> list[dict]:
"""Execute SPARQL query against Oxigraph."""
response = requests.post(
oxigraph_endpoint,
data=query,
headers={
"Content-Type": "application/sparql-query",
"Accept": "application/json"
},
timeout=30
)
response.raise_for_status()
data = response.json()
# Convert bindings to simple dict format
results = []
for binding in data.get("results", {}).get("bindings", []):
row = {}
for key, val in binding.items():
row[key] = val.get("value")
results.append(row)
return results
def qdrant_search(query: str, limit: int) -> list[dict]:
"""Search Qdrant events collection."""
# Placeholder - would use actual Qdrant client
logger.warning("Qdrant search not implemented - using empty results")
return []
def embed(text: str) -> list[float]:
"""Embed text using default embedding model."""
# Placeholder - would use actual embedding model
logger.warning("Embedding not implemented - using random vector")
return list(np.random.randn(384))
return EventRetriever(
oxigraph_query_fn=sparql_query,
qdrant_search_fn=qdrant_search,
embed_fn=embed
)
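The `sparql_query` closure above flattens SPARQL 1.1 JSON result bindings into plain dicts before handing them to the retriever. A self-contained sketch of that flattening step, with a made-up binding payload:

```python
# Example SPARQL 1.1 JSON results payload (illustrative values)
data = {"results": {"bindings": [
    {"event": {"type": "uri", "value": "https://nde.nl/e/1"},
     "date": {"type": "literal", "value": "1999-01-01"}},
]}}

# Same flattening as sparql_query(): keep only the 'value' of each binding
rows = []
for binding in data.get("results", {}).get("bindings", []):
    rows.append({key: val.get("value") for key, val in binding.items()})

print(rows)  # [{'event': 'https://nde.nl/e/1', 'date': '1999-01-01'}]
```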


@@ -0,0 +1,372 @@
"""
Semantic Routing for Heritage RAG
Implements Signal-Decision architecture for fast, accurate query routing.
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
Key concepts:
- Signal extraction (no LLM) for fast query analysis
- Decision routing based on extracted signals
- Falls back to LLM classification for low-confidence cases
"""
from dataclasses import dataclass, field
from typing import Literal, Optional
import re
import logging
logger = logging.getLogger(__name__)
@dataclass
class QuerySignals:
"""Semantic signals extracted from query."""
# Primary classification
# Using str instead of Literal for runtime flexibility
entity_type: str # "person", "institution", "collection", "event", "mixed"
intent: str # "geographic", "statistical", "relational", "temporal", etc.
# Extracted entities
institution_mentions: list[str] = field(default_factory=list)
person_mentions: list[str] = field(default_factory=list)
location_mentions: list[str] = field(default_factory=list)
# Query characteristics
language: str = "nl"
has_temporal_constraint: bool = False
has_geographic_constraint: bool = False
requires_aggregation: bool = False
# Confidence
confidence: float = 0.85
@dataclass
class RouteConfig:
"""Configuration for query routing."""
primary_backend: str
secondary_backend: Optional[str] = None
qdrant_collection: Optional[str] = None
use_temporal_templates: bool = False
qdrant_filters: dict = field(default_factory=dict)
sparql_variant: Optional[str] = None
class SemanticSignalExtractor:
"""
Extract semantic signals from queries without LLM calls.
Uses:
- Keyword patterns for entity type detection
- Embedding similarity for intent classification
- Regex for entity extraction
"""
# Entity type indicators
PERSON_INDICATORS = [
"wie", "who", "curator", "archivist", "archivaris", "bibliothecaris",
"directeur", "director", "medewerker", "staff", "employee",
"werkt", "works", "persoon", "person", "hoofd", "manager"
]
INSTITUTION_INDICATORS = [
"museum", "musea", "archief", "archieven", "bibliotheek", "bibliotheken",
"galerie", "gallery", "instelling", "institution", "organisatie"
]
AGGREGATION_INDICATORS = [
"hoeveel", "how many", "count", "aantal", "total", "totaal",
"per", "verdeling", "distribution", "gemiddelde", "average"
]
# NOTE: Short words like "in" removed - too many false positives
# "in" matches "interessant", "instituut", etc.
GEOGRAPHIC_INDICATORS = [
"nabij", "near", "waar", "where", "locatie", "location",
"provincie", "province", "stad", "city", "regio", "region"
]
# NOTE: Short words like "na" removed - too many false positives
# "na" matches "nationaal", "naam", etc.
# Use word boundary matching for remaining short indicators
TEMPORAL_INDICATORS = [
"wanneer", "when", "voor", "before", "tussen", "between",
"oudste", "oldest", "nieuwste", "newest",
"opgericht", "founded", "gesloten", "closed", "fusie", "merger",
"geschiedenis", "history", "tijdlijn", "timeline"
]
# Short indicators that require word boundary matching
TEMPORAL_INDICATORS_SHORT = ["na", "after"] # Require \b matching
GEOGRAPHIC_INDICATORS_SHORT = ["in"] # Require \b matching
# Year pattern for temporal detection
YEAR_PATTERN = re.compile(r'\b(1[0-9]{3}|20[0-2][0-9])\b') # 1000-2029
# Known Dutch cities and provinces for location extraction
KNOWN_LOCATIONS = [
"Amsterdam", "Rotterdam", "Den Haag", "Utrecht", "Groningen",
"Noord-Holland", "Zuid-Holland", "Noord-Brabant", "Limburg",
"Gelderland", "Friesland", "Overijssel", "Drenthe", "Zeeland",
"Flevoland", "Haarlem", "Leiden", "Maastricht", "Eindhoven",
"Arnhem", "Nijmegen", "Enschede", "Tilburg", "Breda", "Delft"
]
def __init__(self):
self._intent_embeddings = None
self._model = None
# Precompile word boundary patterns for short indicators
self._temporal_short_patterns = [
re.compile(rf'\b{ind}\b', re.IGNORECASE)
for ind in self.TEMPORAL_INDICATORS_SHORT
]
self._geographic_short_patterns = [
re.compile(rf'\b{ind}\b', re.IGNORECASE)
for ind in self.GEOGRAPHIC_INDICATORS_SHORT
]
def _has_word_boundary_match(self, query: str, patterns: list) -> bool:
"""Check if any pattern matches with word boundaries."""
return any(p.search(query) for p in patterns)
def extract_signals(self, query: str) -> QuerySignals:
"""
Extract all semantic signals from query.
Fast operation - no LLM calls.
"""
query_lower = query.lower()
# Entity type detection
entity_type = self._detect_entity_type(query_lower)
# Intent classification
intent = self._classify_intent(query, query_lower)
# Entity extraction
institutions = self._extract_institutions(query)
persons = self._extract_persons(query)
locations = self._extract_locations(query)
# Constraint detection (with word boundary matching for short indicators)
has_temporal = (
any(ind in query_lower for ind in self.TEMPORAL_INDICATORS) or
self._has_word_boundary_match(query, self._temporal_short_patterns) or
bool(self.YEAR_PATTERN.search(query)) # Year mention implies temporal
)
has_geographic = (
any(ind in query_lower for ind in self.GEOGRAPHIC_INDICATORS) or
self._has_word_boundary_match(query, self._geographic_short_patterns) or
bool(locations)
)
requires_aggregation = any(ind in query_lower for ind in self.AGGREGATION_INDICATORS)
# Language detection
language = self._detect_language(query)
# Confidence based on signal clarity
confidence = self._compute_confidence(entity_type, intent, query_lower)
return QuerySignals(
entity_type=entity_type,
intent=intent,
institution_mentions=institutions,
person_mentions=persons,
location_mentions=locations,
language=language,
has_temporal_constraint=has_temporal,
has_geographic_constraint=has_geographic,
requires_aggregation=requires_aggregation,
confidence=confidence
)
def _detect_entity_type(self, query_lower: str) -> str:
"""Detect primary entity type in query."""
person_score = sum(1 for p in self.PERSON_INDICATORS if p in query_lower)
institution_score = sum(1 for p in self.INSTITUTION_INDICATORS if p in query_lower)
if person_score > 0 and institution_score > 0:
return "mixed"
elif person_score > institution_score:
return "person"
elif institution_score > 0:
return "institution"
else:
return "institution" # Default
def _classify_intent(self, query: str, query_lower: str) -> str:
"""Classify query intent."""
# Quick rule-based classification
if any(ind in query_lower for ind in self.AGGREGATION_INDICATORS):
return "statistical"
# Temporal: check long indicators, short indicators with word boundary, AND year patterns
if (any(ind in query_lower for ind in self.TEMPORAL_INDICATORS) or
self._has_word_boundary_match(query, self._temporal_short_patterns) or
bool(self.YEAR_PATTERN.search(query))): # Year implies temporal intent
return "temporal"
if "vergelijk" in query_lower or "compare" in query_lower:
return "comparative"
if any(ind in query_lower for ind in ["wat is", "what is", "tell me about", "vertel"]):
return "entity_lookup"
# Geographic: check both long indicators and short with word boundary
if (any(ind in query_lower for ind in self.GEOGRAPHIC_INDICATORS) or
self._has_word_boundary_match(query, self._geographic_short_patterns)):
return "geographic"
# Default: open-ended exploration (question openers like
# "welke"/"which" carry no additional routing signal)
return "exploration"
def _extract_institutions(self, query: str) -> list[str]:
"""Extract institution mentions from query."""
# Known institution patterns
patterns = [
r"(?:het\s+)?(\w+\s+(?:Museum|Archief|Bibliotheek|Galerie))",
r"(Rijksmuseum|Nationaal Archief|KB|Koninklijke Bibliotheek)",
r"(Noord-Hollands Archief|Stadsarchief Amsterdam|Gemeentearchief)",
r"(\w+archief|\w+museum|\w+bibliotheek)",
]
mentions = []
for pattern in patterns:
for match in re.finditer(pattern, query, re.IGNORECASE):
mentions.append(match.group(1))
return list(set(mentions))
def _extract_persons(self, query: str) -> list[str]:
"""Extract person mentions from query."""
# Basic person name pattern (capitalized words with optional tussenvoegsel)
pattern = r"\b([A-Z][a-z]+\s+(?:van\s+(?:de\s+)?|de\s+)?[A-Z][a-z]+)\b"
matches = re.findall(pattern, query)
return matches
def _extract_locations(self, query: str) -> list[str]:
"""Extract location mentions from query."""
mentions = []
query_lower = query.lower()
for loc in self.KNOWN_LOCATIONS:
if loc.lower() in query_lower:
mentions.append(loc)
return mentions
def _detect_language(self, query: str) -> str:
"""Detect query language via function-word tokens.
Token matching avoids substring hits like 'de' inside 'Amsterdam'
or 'the' inside 'theater'.
"""
dutch_indicators = {"welke", "hoeveel", "waar", "wanneer", "wie", "het", "de", "zijn", "er"}
english_indicators = {"which", "how", "many", "where", "when", "who", "the", "are", "there"}
tokens = set(re.findall(r"[a-zà-ÿ]+", query.lower()))
dutch_score = len(tokens & dutch_indicators)
english_score = len(tokens & english_indicators)
return "nl" if dutch_score >= english_score else "en"
def _compute_confidence(self, entity_type: str, intent: str, query_lower: str) -> float:
"""Compute confidence in signal extraction."""
confidence = 0.7 # Base
# Boost for clear entity type
if entity_type != "mixed":
confidence += 0.1
# Boost for clear intent indicators
if any(ind in query_lower for ind in self.AGGREGATION_INDICATORS + self.TEMPORAL_INDICATORS):
confidence += 0.1
# Boost for clear question structure
if query_lower.startswith(("welke", "which", "hoeveel", "how many", "waar", "where")):
confidence += 0.05
return min(confidence, 0.95)
class SemanticDecisionRouter:
"""
Route queries to backends based on signals.
"""
def route(self, signals: QuerySignals) -> RouteConfig:
"""
Determine routing based on signals.
"""
# Person queries → Qdrant persons collection
if signals.entity_type == "person":
config = RouteConfig(
primary_backend="qdrant",
secondary_backend="sparql",
qdrant_collection="heritage_persons",
)
# Add institution filter if mentioned
if signals.institution_mentions:
config.qdrant_filters["custodian_slug"] = self._to_slug(
signals.institution_mentions[0]
)
return config
# Statistical queries → DuckLake
if signals.requires_aggregation:
return RouteConfig(
primary_backend="ducklake",
secondary_backend="sparql",
)
# Temporal queries → Temporal SPARQL templates
if signals.has_temporal_constraint:
return RouteConfig(
primary_backend="sparql",
secondary_backend="qdrant",
use_temporal_templates=True,
qdrant_collection="heritage_custodians",
)
# Geographic queries → SPARQL with location filter
if signals.has_geographic_constraint:
return RouteConfig(
primary_backend="sparql",
secondary_backend="qdrant",
qdrant_collection="heritage_custodians",
)
# Default: hybrid SPARQL + Qdrant
return RouteConfig(
primary_backend="qdrant",
secondary_backend="sparql",
qdrant_collection="heritage_custodians",
)
def _to_slug(self, institution_name: str) -> str:
"""Convert institution name to slug format."""
import unicodedata
normalized = unicodedata.normalize('NFD', institution_name)
ascii_name = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')
slug = ascii_name.lower()
slug = re.sub(r"[''`\",.:;!?()[\]{}]", '', slug)
slug = re.sub(r'[\s_]+', '-', slug)
slug = re.sub(r'-+', '-', slug).strip('-')
return slug
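The same normalization pipeline (NFD-decompose, drop combining marks, strip punctuation, hyphenate, collapse) as a standalone sketch:

```python
import re
import unicodedata

def to_slug(name: str) -> str:
    # Decompose accented characters, then drop the combining marks (category 'Mn')
    decomposed = unicodedata.normalize('NFD', name)
    ascii_name = ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
    slug = ascii_name.lower()
    slug = re.sub(r"['`\",.:;!?()\[\]{}]", '', slug)   # strip punctuation
    slug = re.sub(r'[\s_]+', '-', slug)                # spaces/underscores -> hyphens
    return re.sub(r'-+', '-', slug).strip('-')         # collapse repeats, trim ends

assert to_slug("Musée d'Orsay") == "musee-dorsay"
assert to_slug("Noord-Hollands Archief") == "noord-hollands-archief"
```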
# Singleton instances
_signal_extractor: Optional[SemanticSignalExtractor] = None
_decision_router: Optional[SemanticDecisionRouter] = None
def get_signal_extractor() -> SemanticSignalExtractor:
"""Get or create singleton signal extractor instance."""
global _signal_extractor
if _signal_extractor is None:
_signal_extractor = SemanticSignalExtractor()
return _signal_extractor
def get_decision_router() -> SemanticDecisionRouter:
"""Get or create singleton decision router instance."""
global _decision_router
if _decision_router is None:
_decision_router = SemanticDecisionRouter()
return _decision_router
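Distilled, the routing priority above is a single decision cascade. A minimal sketch (the string labels are illustrative, not the real `RouteConfig` objects):

```python
def route_label(entity_type: str, requires_aggregation: bool,
                has_temporal: bool, has_geographic: bool) -> str:
    # Priority order mirrors SemanticDecisionRouter.route:
    # person > statistical > temporal > geographic > hybrid default
    if entity_type == "person":
        return "qdrant:heritage_persons"
    if requires_aggregation:
        return "ducklake"
    if has_temporal:
        return "sparql+temporal_templates"
    if has_geographic:
        return "sparql"
    return "qdrant:heritage_custodians"

# Person signal wins even when other constraints are present
assert route_label("person", True, True, True) == "qdrant:heritage_persons"
assert route_label("institution", False, True, False) == "sparql+temporal_templates"
```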


@@ -0,0 +1,311 @@
"""
Temporal Query Intent Extraction for Heritage RAG
Extracts temporal constraints from natural language queries to enable
temporal SPARQL template selection and conflict resolution.
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
"""
import dspy
from dataclasses import dataclass, field
from typing import Optional, Literal
from datetime import datetime
import re
import logging
logger = logging.getLogger(__name__)
@dataclass
class TemporalConstraint:
"""Extracted temporal constraint from a query."""
constraint_type: Literal[
"point_in_time", # "in 1990", "on January 1, 2000"
"before", # "before 2000", "vóór de fusie"
"after", # "after 1995", "na de renovatie"
"between", # "between 1990 and 2000"
"oldest", # "oldest museum", "oudste archief"
"newest", # "newest library", "nieuwste bibliotheek"
"founding", # "when was X founded", "opgericht"
"closure", # "when did X close", "gesloten"
"change_event", # "merger", "split", "relocation"
"timeline", # "history of", "geschiedenis van"
"none" # No temporal constraint detected
]
# Extracted dates (ISO format or year)
date_start: Optional[str] = None
date_end: Optional[str] = None
# For relative references
reference_event: Optional[str] = None # e.g., "de fusie", "the merger"
# Confidence
confidence: float = 0.8
# Recommended SPARQL template
recommended_template: Optional[str] = None
class TemporalConstraintExtractor:
"""
Fast extraction of temporal constraints without LLM.
Uses pattern matching for common temporal expressions.
Falls back to LLM for complex/ambiguous cases.
"""
# Year patterns
YEAR_PATTERN = re.compile(r'\b(1[0-9]{3}|20[0-2][0-9])\b') # 1000-2029
DATE_PATTERN = re.compile(
r'\b(\d{1,2}[-/]\d{1,2}[-/]\d{2,4}|\d{4}[-/]\d{2}[-/]\d{2})\b'
)
# Dutch temporal keywords
BEFORE_KEYWORDS_NL = ["voor", "vóór", "voordat", "eerder dan"]
AFTER_KEYWORDS_NL = ["na", "nadat", "later dan", "sinds"]
BETWEEN_KEYWORDS_NL = ["tussen", "van", "tot"]
OLDEST_KEYWORDS_NL = ["oudste", "eerste", "oorspronkelijke"]
NEWEST_KEYWORDS_NL = ["nieuwste", "laatste", "meest recente"]
# English temporal keywords
BEFORE_KEYWORDS_EN = ["before", "prior to", "earlier than"]
AFTER_KEYWORDS_EN = ["after", "following", "since", "later than"]
BETWEEN_KEYWORDS_EN = ["between", "from", "to"]
OLDEST_KEYWORDS_EN = ["oldest", "first", "original", "earliest"]
NEWEST_KEYWORDS_EN = ["newest", "latest", "most recent"]
# Event keywords
FOUNDING_KEYWORDS = ["opgericht", "gesticht", "founded", "established", "created"]
CLOSURE_KEYWORDS = ["gesloten", "opgeheven", "closed", "dissolved", "terminated"]
MERGER_KEYWORDS = ["fusie", "samenvoeging", "merger", "merged", "combined"]
TIMELINE_KEYWORDS = [
"geschiedenis", "tijdlijn", "history", "timeline", "evolution",
"door de jaren", "over time", "changes"
]
# Template mapping
TEMPLATE_MAP = {
"point_in_time": "point_in_time_state",
"before": "point_in_time_state",
"after": "point_in_time_state",
"between": "events_in_period",
"oldest": "find_by_founding",
"newest": "find_by_founding",
"founding": "institution_timeline",
"closure": "institution_timeline",
"change_event": "events_in_period",
"timeline": "institution_timeline",
}
def extract(self, query: str) -> TemporalConstraint:
"""
Extract temporal constraint from query.
Fast operation using pattern matching.
"""
query_lower = query.lower()
# 1. Check for timeline/history queries
if any(kw in query_lower for kw in self.TIMELINE_KEYWORDS):
return TemporalConstraint(
constraint_type="timeline",
confidence=0.9,
recommended_template="institution_timeline"
)
# 2. Check for superlatives (oldest/newest)
if any(kw in query_lower for kw in self.OLDEST_KEYWORDS_NL + self.OLDEST_KEYWORDS_EN):
return TemporalConstraint(
constraint_type="oldest",
confidence=0.9,
recommended_template="find_by_founding"
)
if any(kw in query_lower for kw in self.NEWEST_KEYWORDS_NL + self.NEWEST_KEYWORDS_EN):
return TemporalConstraint(
constraint_type="newest",
confidence=0.9,
recommended_template="find_by_founding"
)
# 3. Check for change event keywords
if any(kw in query_lower for kw in self.MERGER_KEYWORDS):
return TemporalConstraint(
constraint_type="change_event",
reference_event="merger",
confidence=0.85,
recommended_template="events_in_period"
)
if any(kw in query_lower for kw in self.FOUNDING_KEYWORDS):
return TemporalConstraint(
constraint_type="founding",
confidence=0.85,
recommended_template="institution_timeline"
)
if any(kw in query_lower for kw in self.CLOSURE_KEYWORDS):
return TemporalConstraint(
constraint_type="closure",
confidence=0.85,
recommended_template="institution_timeline"
)
# 4. Extract years from query
years = self.YEAR_PATTERN.findall(query)
if len(years) >= 2:
# "between 1990 and 2000"
years_sorted = sorted([int(y) for y in years])
return TemporalConstraint(
constraint_type="between",
date_start=f"{years_sorted[0]}-01-01",
date_end=f"{years_sorted[-1]}-12-31",
confidence=0.85,
recommended_template="events_in_period"
)
if len(years) == 1:
year = years[0]
# Check for before/after indicators with word boundary
before_match = any(
re.search(rf'\b{re.escape(kw)}\b', query_lower)
for kw in self.BEFORE_KEYWORDS_NL + self.BEFORE_KEYWORDS_EN
)
after_match = any(
re.search(rf'\b{re.escape(kw)}\b', query_lower)
for kw in self.AFTER_KEYWORDS_NL + self.AFTER_KEYWORDS_EN
)
if before_match:
return TemporalConstraint(
constraint_type="before",
date_end=f"{year}-01-01",
confidence=0.85,
recommended_template="point_in_time_state"
)
if after_match:
return TemporalConstraint(
constraint_type="after",
date_start=f"{year}-12-31",
confidence=0.85,
recommended_template="point_in_time_state"
)
# Default: point in time
return TemporalConstraint(
constraint_type="point_in_time",
date_start=f"{year}-01-01",
date_end=f"{year}-12-31",
confidence=0.8,
recommended_template="point_in_time_state"
)
# 5. No clear temporal constraint
return TemporalConstraint(
constraint_type="none",
confidence=0.7
)
def get_template_for_constraint(
self,
constraint: TemporalConstraint
) -> Optional[str]:
"""Get recommended SPARQL template ID for temporal constraint."""
return self.TEMPLATE_MAP.get(constraint.constraint_type)
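The two-year "between" branch reduces to sorting the matched years and widening to full calendar years; a minimal standalone sketch of that logic:

```python
import re

YEAR_PATTERN = re.compile(r'\b(1[0-9]{3}|20[0-2][0-9])\b')  # matches 1000-2029

def year_range(query: str):
    """Return (start, end) ISO dates when two or more years appear, else None."""
    years = sorted(int(y) for y in YEAR_PATTERN.findall(query))
    if len(years) >= 2:
        return f"{years[0]}-01-01", f"{years[-1]}-12-31"
    return None

assert year_range("archieven tussen 1990 en 2005") == ("1990-01-01", "2005-12-31")
assert year_range("musea in Amsterdam") is None
```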
# DSPy Signature for complex temporal extraction
class TemporalQueryIntent(dspy.Signature):
"""
Extract temporal constraints from a heritage institution query.
Use this for complex queries where pattern matching fails.
"""
query: str = dspy.InputField(desc="Natural language query about heritage institutions")
language: str = dspy.InputField(desc="Query language: 'nl' or 'en'", default="nl")
constraint_type: str = dspy.OutputField(
desc="Type of temporal constraint: point_in_time, before, after, between, "
"oldest, newest, founding, closure, change_event, timeline, none"
)
date_start: str = dspy.OutputField(
desc="Start date in ISO format (YYYY-MM-DD) or empty string if not applicable"
)
date_end: str = dspy.OutputField(
desc="End date in ISO format (YYYY-MM-DD) or empty string if not applicable"
)
reference_event: str = dspy.OutputField(
desc="Referenced event (e.g., 'fusie', 'merger') or empty string"
)
confidence: float = dspy.OutputField(
desc="Confidence score 0.0-1.0"
)
class TemporalIntentExtractorModule(dspy.Module):
"""
DSPy module for temporal intent extraction.
Uses fast pattern matching first, falls back to LLM for complex cases.
"""
def __init__(self, confidence_threshold: float = 0.75):
super().__init__()
self.fast_extractor = TemporalConstraintExtractor()
self.llm_extractor = dspy.ChainOfThought(TemporalQueryIntent)
self.confidence_threshold = confidence_threshold
def forward(self, query: str, language: str = "nl") -> TemporalConstraint:
"""
Extract temporal constraint from query.
Args:
query: Natural language query
language: Query language ('nl' or 'en')
Returns:
TemporalConstraint with extracted information
"""
# Try fast extraction first
constraint = self.fast_extractor.extract(query)
# If confidence is high enough, use fast result
if constraint.confidence >= self.confidence_threshold:
logger.debug(f"Fast temporal extraction: {constraint.constraint_type} (conf={constraint.confidence})")
return constraint
# Fall back to LLM for low confidence cases
logger.debug(f"LLM temporal extraction (fast conf={constraint.confidence})")
try:
result = self.llm_extractor(query=query, language=language)
return TemporalConstraint(
constraint_type=result.constraint_type or "none",
date_start=result.date_start if result.date_start else None,
date_end=result.date_end if result.date_end else None,
reference_event=result.reference_event if result.reference_event else None,
confidence=float(result.confidence) if result.confidence else 0.7,
recommended_template=self.fast_extractor.TEMPLATE_MAP.get(result.constraint_type)
)
except Exception as e:
logger.warning(f"LLM temporal extraction failed: {e}")
# Return fast extraction result as fallback
return constraint
# Singleton instance
_temporal_extractor: Optional[TemporalConstraintExtractor] = None
def get_temporal_extractor() -> TemporalConstraintExtractor:
"""Get or create singleton temporal extractor instance."""
global _temporal_extractor
if _temporal_extractor is None:
_temporal_extractor = TemporalConstraintExtractor()
return _temporal_extractor


@@ -0,0 +1,258 @@
"""
Temporal Conflict Resolution for Heritage Data
Handles cases where multiple facts exist for the same property at overlapping times.
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
Strategies:
1. Temporal ordering: Use fact valid at query time
2. Recency: Prefer more recent sources
3. Authority: Prefer authoritative sources (Tier 1)
4. Confidence: Use higher confidence facts
"""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import logging
logger = logging.getLogger(__name__)
@dataclass
class TemporalFact:
"""A fact with temporal validity."""
property: str
value: str
valid_from: datetime
valid_to: Optional[datetime]
source: str
confidence: float = 1.0
ghcid: Optional[str] = None
@dataclass
class ConflictResolution:
"""Result of conflict resolution."""
property: str
authoritative_value: str
valid_for_date: datetime
conflict_type: str
explanation: str
alternative_values: list[TemporalFact] = field(default_factory=list)
class TemporalConflictResolver:
"""
Resolve conflicts between temporal facts.
Uses a multi-factor scoring system:
- Source authority (Tier 1-4)
- Confidence scores
- Temporal recency
"""
SOURCE_AUTHORITY = {
"TIER_1_AUTHORITATIVE": 1.0,
"TIER_2_VERIFIED": 0.8,
"TIER_3_CROWD_SOURCED": 0.6,
"TIER_4_INFERRED": 0.4,
}
def resolve_conflicts(
self,
ghcid: str,
facts: list[TemporalFact],
query_date: Optional[datetime] = None
) -> list[ConflictResolution]:
"""
Resolve all conflicts in a set of facts.
Args:
ghcid: Institution identifier
facts: All facts about the institution
query_date: Point in time for resolution (default: now)
Returns:
List of conflict resolutions with authoritative values
"""
if query_date is None:
query_date = datetime.now()
# Group facts by property
by_property: dict[str, list[TemporalFact]] = {}
for fact in facts:
by_property.setdefault(fact.property, []).append(fact)
resolutions = []
for prop, prop_facts in by_property.items():
# Find facts valid at query_date
valid_facts = [
f for f in prop_facts
if f.valid_from <= query_date and
(f.valid_to is None or f.valid_to > query_date)
]
if len(valid_facts) <= 1:
# No conflict
continue
# Multiple valid facts - resolve conflict
resolution = self._resolve_property_conflict(
prop, valid_facts, query_date
)
resolutions.append(resolution)
return resolutions
def get_authoritative_value(
self,
ghcid: str,
property: str,
facts: list[TemporalFact],
query_date: Optional[datetime] = None
) -> Optional[str]:
"""
Get the authoritative value for a single property.
Convenience method for single-property lookups.
"""
if query_date is None:
query_date = datetime.now()
# Filter facts for this property
prop_facts = [f for f in facts if f.property == property]
if not prop_facts:
return None
# Find facts valid at query_date
valid_facts = [
f for f in prop_facts
if f.valid_from <= query_date and
(f.valid_to is None or f.valid_to > query_date)
]
if not valid_facts:
return None
if len(valid_facts) == 1:
return valid_facts[0].value
# Resolve conflict
resolution = self._resolve_property_conflict(property, valid_facts, query_date)
return resolution.authoritative_value
def _resolve_property_conflict(
self,
property: str,
facts: list[TemporalFact],
query_date: datetime
) -> ConflictResolution:
"""
Resolve conflict for a single property.
"""
# Score each fact
scored = []
for fact in facts:
score = self._compute_authority_score(fact)
scored.append((fact, score))
# Sort by score (descending)
scored.sort(key=lambda x: x[1], reverse=True)
winner = scored[0][0]
alternatives = [f for f, _ in scored[1:]]
# Determine conflict type
if all(f.value == winner.value for f in facts):
conflict_type = "redundant" # Same value from multiple sources
elif self._is_name_change(facts):
conflict_type = "name_change"
elif self._is_location_change(facts, property):
conflict_type = "location_change"
else:
conflict_type = "data_inconsistency"
explanation = self._generate_explanation(
property, winner, alternatives, conflict_type, query_date
)
return ConflictResolution(
property=property,
authoritative_value=winner.value,
valid_for_date=query_date,
conflict_type=conflict_type,
explanation=explanation,
alternative_values=alternatives
)
def _compute_authority_score(self, fact: TemporalFact) -> float:
"""Compute authority score for a fact."""
# Base authority from source tier
authority = self.SOURCE_AUTHORITY.get(fact.source, 0.5)
# Boost for confidence
authority *= fact.confidence
# Recency bonus (facts with recent valid_from get slight boost)
days_old = (datetime.now() - fact.valid_from).days
recency_factor = 1.0 / (1.0 + days_old / 365.0) # Decay over years
authority *= (0.8 + 0.2 * recency_factor)
return authority
def _is_name_change(self, facts: list[TemporalFact]) -> bool:
"""Check if conflict represents a name change."""
# Name changes typically have non-overlapping validity
facts_sorted = sorted(facts, key=lambda f: f.valid_from)
for i in range(len(facts_sorted) - 1):
if facts_sorted[i].valid_to == facts_sorted[i+1].valid_from:
return True
return False
def _is_location_change(self, facts: list[TemporalFact], property: str) -> bool:
"""Check if conflict represents a location change."""
return property in ["city", "address", "location", "settlementName", "subregionCode"]
def _generate_explanation(
self,
property: str,
winner: TemporalFact,
alternatives: list[TemporalFact],
conflict_type: str,
query_date: datetime
) -> str:
"""Generate human-readable explanation of resolution."""
if conflict_type == "name_change":
return (
f"The institution name changed over time. "
f"At {query_date.strftime('%Y-%m-%d')}, the authoritative name was '{winner.value}'. "
f"Previous names: {', '.join(f.value for f in alternatives)}."
)
elif conflict_type == "location_change":
return (
f"The institution relocated. "
f"At {query_date.strftime('%Y-%m-%d')}, it was located at '{winner.value}'."
)
elif conflict_type == "redundant":
return f"Multiple sources confirm: {winner.value}"
else:
return (
f"Data conflict for {property}. "
f"Using '{winner.value}' from {winner.source} (confidence: {winner.confidence:.2f}). "
f"Alternative values exist in other sources."
)
# Singleton instance
_resolver: Optional[TemporalConflictResolver] = None
def get_temporal_resolver() -> TemporalConflictResolver:
"""Get or create singleton resolver instance."""
global _resolver
if _resolver is None:
_resolver = TemporalConflictResolver()
return _resolver
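The scoring formula in `_compute_authority_score` can be checked in isolation. A standalone sketch (only two of the four tiers shown), demonstrating that source tier dominates even when a lower-tier fact is more recent:

```python
from datetime import datetime

SOURCE_AUTHORITY = {"TIER_1_AUTHORITATIVE": 1.0, "TIER_3_CROWD_SOURCED": 0.6}

def authority_score(source: str, confidence: float, valid_from: datetime) -> float:
    base = SOURCE_AUTHORITY.get(source, 0.5) * confidence
    days_old = (datetime.now() - valid_from).days
    recency = 1.0 / (1.0 + days_old / 365.0)   # decays over years
    return base * (0.8 + 0.2 * recency)        # recency only adjusts within [0.8, 1.0]

tier1 = authority_score("TIER_1_AUTHORITATIVE", 0.95, datetime(2010, 1, 1))
tier3 = authority_score("TIER_3_CROWD_SOURCED", 0.9, datetime(2023, 1, 1))
assert tier1 > tier3  # worst-case tier-1 score (0.95*0.8) beats best-case tier-3 (0.54*1.0)
```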


@@ -0,0 +1,493 @@
"""
Tests for Semantic Routing (Signal-Decision Pattern)
Tests the SemanticSignalExtractor and SemanticDecisionRouter classes
which enable fast LLM-free query routing for high-confidence queries.
"""
import pytest
from .semantic_router import (
QuerySignals,
RouteConfig,
SemanticSignalExtractor,
SemanticDecisionRouter,
get_signal_extractor,
get_decision_router,
)
class TestSemanticSignalExtractor:
"""Tests for SemanticSignalExtractor class."""
@pytest.fixture
def extractor(self):
return SemanticSignalExtractor()
# ===== Entity Type Detection =====
def test_detect_person_query(self, extractor):
"""Person indicators should detect person entity type."""
# Query with clear person indicator and no institution indicator
signals = extractor.extract_signals("Wie werkt daar als medewerker?")
assert signals.entity_type == "person"
def test_detect_person_query_with_institution_is_mixed(self, extractor):
"""Person query mentioning institution should be mixed."""
signals = extractor.extract_signals("Wie is de archivaris bij het Noord-Hollands Archief?")
# "archief" is an institution indicator, so this is mixed
assert signals.entity_type == "mixed"
def test_detect_person_query_with_organisatie_is_mixed(self, extractor):
"""Person query with 'organisatie' should be mixed."""
signals = extractor.extract_signals("Wie is de directeur van deze organisatie?")
# "organisatie" is an institution indicator
assert signals.entity_type == "mixed"
def test_detect_institution_query(self, extractor):
"""Institution indicators should detect institution entity type."""
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
assert signals.entity_type == "institution"
def test_detect_mixed_query(self, extractor):
"""Mixed indicators should detect mixed entity type."""
signals = extractor.extract_signals("Welke curatoren werken bij musea in Utrecht?")
assert signals.entity_type == "mixed"
def test_default_to_institution(self, extractor):
"""Ambiguous queries should default to institution."""
signals = extractor.extract_signals("Vertel me over cultureel erfgoed")
assert signals.entity_type == "institution"
# ===== Intent Classification =====
def test_statistical_intent(self, extractor):
"""Aggregation indicators should classify as statistical."""
signals = extractor.extract_signals("Hoeveel archieven zijn er in Nederland?")
assert signals.intent == "statistical"
assert signals.requires_aggregation is True
def test_temporal_intent(self, extractor):
"""Temporal indicators should classify as temporal."""
signals = extractor.extract_signals("Wanneer is het Rijksmuseum opgericht?")
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
def test_temporal_intent_with_oldest(self, extractor):
"""Oldest/newest queries should be temporal."""
signals = extractor.extract_signals("Wat is het oudste museum in Nederland?")
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
def test_geographic_intent(self, extractor):
"""Geographic indicators should classify as geographic."""
# "waar" (where) is a geographic indicator
signals = extractor.extract_signals("Waar staat dit museum?")
assert signals.intent == "geographic"
assert signals.has_geographic_constraint is True
def test_geographic_intent_with_location(self, extractor):
"""Location mentions should trigger geographic constraint."""
signals = extractor.extract_signals("Vertel me over musea in Amsterdam")
assert signals.has_geographic_constraint is True
def test_temporal_indicator_substring_fixed(self, extractor):
"""Verify fix: substring matching no longer causes false positives.
'nationaal' contains 'na' but should NOT trigger temporal (uses word boundaries).
This tests that the fix for substring matching is working.
"""
signals = extractor.extract_signals("In welke stad ligt het Nationaal Archief?")
# After fix: should NOT be temporal (no word-boundary match for "na")
# "In" at start is a word boundary match for geographic indicator
assert signals.intent == "geographic"
assert signals.has_temporal_constraint is False
def test_entity_lookup_intent(self, extractor):
"""Entity lookup indicators should classify correctly."""
signals = extractor.extract_signals("Wat is het Rijksmuseum?")
assert signals.intent == "entity_lookup"
def test_comparative_intent(self, extractor):
"""Comparative queries should be classified correctly."""
signals = extractor.extract_signals("Vergelijk het Rijksmuseum met het Van Gogh Museum")
assert signals.intent == "comparative"
def test_exploration_default_intent(self, extractor):
"""Default to exploration for open questions without clear indicators."""
# Query without geographic, temporal, or aggregation indicators
# Note: "in" is a geographic indicator, so avoid words containing it
signals = extractor.extract_signals("Welke schilderijen vallen op?")
assert signals.intent == "exploration"
def test_geographic_indicator_substring_fixed(self, extractor):
"""Verify fix: 'in' no longer matches inside words.
'interessant' contains 'in' but should NOT trigger geographic.
This tests that the word boundary fix is working.
"""
signals = extractor.extract_signals("Welke schilderijen zijn interessant?")
# After fix: should be exploration, not geographic
assert signals.intent == "exploration"
assert signals.has_geographic_constraint is False
def test_word_boundary_in_works_correctly(self, extractor):
"""Verify 'in' as standalone word DOES trigger geographic."""
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
# "in" as standalone word should trigger geographic
assert signals.intent == "geographic"
assert signals.has_geographic_constraint is True
def test_word_boundary_na_works_correctly(self, extractor):
"""Verify 'na' as standalone word DOES trigger temporal."""
# Dutch: "Na de fusie..." = "After the merger..."
signals = extractor.extract_signals("Wat gebeurde er na de fusie met het archief?")
# "na" as standalone word should trigger temporal
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
# ===== Entity Extraction =====
def test_extract_institution_mention(self, extractor):
"""Should extract institution names from query."""
signals = extractor.extract_signals("Vertel me over het Noord-Hollands Archief")
assert len(signals.institution_mentions) >= 1
# Should find "Noord-Hollands Archief" or similar
def test_extract_location_mention(self, extractor):
"""Should extract known Dutch locations."""
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
assert "Amsterdam" in signals.location_mentions
assert signals.has_geographic_constraint is True
def test_extract_multiple_locations(self, extractor):
"""Should extract multiple locations."""
signals = extractor.extract_signals("Archieven in Utrecht en Haarlem")
assert "Utrecht" in signals.location_mentions
assert "Haarlem" in signals.location_mentions
# ===== Language Detection =====
def test_detect_dutch_language(self, extractor):
"""Dutch queries should be detected."""
signals = extractor.extract_signals("Hoeveel musea zijn er in Nederland?")
assert signals.language == "nl"
def test_detect_english_language(self, extractor):
"""English queries should be detected."""
signals = extractor.extract_signals("How many museums are there in Amsterdam?")
assert signals.language == "en"
# ===== Confidence Scoring =====
def test_high_confidence_clear_query(self, extractor):
"""Clear queries should have high confidence."""
signals = extractor.extract_signals("Hoeveel archieven zijn er in Noord-Holland?")
assert signals.confidence >= 0.8
def test_moderate_confidence_ambiguous_query(self, extractor):
"""Ambiguous queries should have moderate confidence."""
signals = extractor.extract_signals("erfgoed informatie")
assert signals.confidence < 0.9
def test_confidence_capped_at_095(self, extractor):
"""Confidence should not exceed 0.95."""
signals = extractor.extract_signals("Hoeveel musea zijn er in Amsterdam?")
assert signals.confidence <= 0.95
class TestSemanticDecisionRouter:
"""Tests for SemanticDecisionRouter class."""
@pytest.fixture
def router(self):
return SemanticDecisionRouter()
def test_person_query_routes_to_qdrant_persons(self, router):
"""Person queries should route to heritage_persons collection."""
signals = QuerySignals(
entity_type="person",
intent="entity_lookup",
institution_mentions=["Noord-Hollands Archief"],
)
config = router.route(signals)
assert config.primary_backend == "qdrant"
assert config.qdrant_collection == "heritage_persons"
def test_person_query_with_institution_filter(self, router):
"""Person queries with institution should add filter."""
signals = QuerySignals(
entity_type="person",
intent="entity_lookup",
institution_mentions=["Noord-Hollands Archief"],
)
config = router.route(signals)
assert "custodian_slug" in config.qdrant_filters
assert "noord-hollands-archief" in config.qdrant_filters["custodian_slug"]
def test_statistical_query_routes_to_ducklake(self, router):
"""Statistical queries should route to DuckLake."""
signals = QuerySignals(
entity_type="institution",
intent="statistical",
requires_aggregation=True,
)
config = router.route(signals)
assert config.primary_backend == "ducklake"
def test_temporal_query_uses_temporal_templates(self, router):
"""Temporal queries should enable temporal templates."""
signals = QuerySignals(
entity_type="institution",
intent="temporal",
has_temporal_constraint=True,
)
config = router.route(signals)
assert config.primary_backend == "sparql"
assert config.use_temporal_templates is True
def test_geographic_query_routes_to_sparql(self, router):
"""Geographic queries should route to SPARQL."""
signals = QuerySignals(
entity_type="institution",
intent="geographic",
has_geographic_constraint=True,
location_mentions=["Amsterdam"],
)
config = router.route(signals)
assert config.primary_backend == "sparql"
def test_default_hybrid_routing(self, router):
"""Default queries should use hybrid routing."""
signals = QuerySignals(
entity_type="institution",
intent="exploration",
)
config = router.route(signals)
assert config.primary_backend == "qdrant"
assert config.secondary_backend == "sparql"
class TestSlugGeneration:
"""Tests for institution slug generation."""
@pytest.fixture
def router(self):
return SemanticDecisionRouter()
def test_simple_slug(self, router):
"""Simple names should convert to lowercase hyphenated slug."""
slug = router._to_slug("Rijksmuseum")
assert slug == "rijksmuseum"
def test_slug_with_spaces(self, router):
"""Spaces should be converted to hyphens."""
slug = router._to_slug("Noord-Hollands Archief")
assert slug == "noord-hollands-archief"
def test_slug_with_article(self, router):
"""Dutch articles should be preserved in slug."""
slug = router._to_slug("Het Utrechts Archief")
assert slug == "het-utrechts-archief"
def test_slug_with_diacritics(self, router):
"""Diacritics should be removed."""
slug = router._to_slug("Musée d'Orsay")
assert slug == "musee-dorsay"
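The slug rules these tests pin down (lowercase, spaces to hyphens, diacritics stripped, apostrophes dropped) can be sketched as below. The router's actual `_to_slug` implementation is not part of this diff, so this standalone function is an assumption-based illustration of the tested behaviour:

```python
import re
import unicodedata

def to_slug(name: str) -> str:
    """Illustrative slug rule: lowercase, ASCII-fold, hyphenate whitespace."""
    # Decompose accented characters, then drop the combining marks (é -> e)
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    ascii_name = ascii_name.replace("'", "")  # d'Orsay -> dOrsay
    return re.sub(r"\s+", "-", ascii_name.strip().lower())
```

With this rule, `to_slug("Het Utrechts Archief")` yields `het-utrechts-archief`, matching the article-preservation test above.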
class TestSingletonInstances:
"""Tests for singleton pattern."""
def test_signal_extractor_singleton(self):
"""get_signal_extractor should return same instance."""
ext1 = get_signal_extractor()
ext2 = get_signal_extractor()
assert ext1 is ext2
def test_decision_router_singleton(self):
"""get_decision_router should return same instance."""
router1 = get_decision_router()
router2 = get_decision_router()
assert router1 is router2
class TestIntegration:
"""Integration tests for full signal-decision flow."""
def test_full_person_query_flow(self):
"""Test complete flow for person query."""
extractor = get_signal_extractor()
router = get_decision_router()
# Query with clear person indicator but also institution mention (mixed)
signals = extractor.extract_signals(
"Wie is de archivaris bij het Noord-Hollands Archief?"
)
config = router.route(signals)
# Mixed entity type because both person and institution indicators present
assert signals.entity_type == "mixed"
# Mixed queries route via default (qdrant hybrid)
assert config.primary_backend in ["qdrant", "sparql"]
def test_full_pure_person_query_flow(self):
"""Test complete flow for pure person query (no institution mention)."""
extractor = get_signal_extractor()
router = get_decision_router()
signals = extractor.extract_signals("Wie werkt daar als medewerker?")
config = router.route(signals)
assert signals.entity_type == "person"
assert config.primary_backend == "qdrant"
assert config.qdrant_collection == "heritage_persons"
def test_full_statistical_query_flow(self):
"""Test complete flow for statistical query."""
extractor = get_signal_extractor()
router = get_decision_router()
signals = extractor.extract_signals(
"Hoeveel musea zijn er per provincie in Nederland?"
)
config = router.route(signals)
assert signals.intent == "statistical"
assert signals.requires_aggregation is True
assert config.primary_backend == "ducklake"
def test_full_temporal_query_flow(self):
"""Test complete flow for temporal query."""
extractor = get_signal_extractor()
router = get_decision_router()
signals = extractor.extract_signals(
"Wat is het oudste archief in Noord-Holland?"
)
config = router.route(signals)
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
assert config.use_temporal_templates is True
def test_high_confidence_skip_llm_threshold(self):
"""Verify high-confidence queries meet skip threshold."""
extractor = get_signal_extractor()
# These queries should have confidence >= 0.8
# Need clear indicators without ambiguity
high_confidence_queries = [
"Hoeveel archieven zijn er in Nederland?", # clear aggregation
"Wanneer is het Nationaal Archief opgericht?", # clear temporal
"Welke musea zijn er in Amsterdam?", # clear geographic + institution
]
for query in high_confidence_queries:
signals = extractor.extract_signals(query)
assert signals.confidence >= 0.8, (
f"Query '{query}' has confidence {signals.confidence}, expected >= 0.8"
)
def test_moderate_confidence_for_mixed_queries(self):
"""Mixed entity type queries should have lower confidence."""
extractor = get_signal_extractor()
# Mixed queries are more ambiguous
signals = extractor.extract_signals("Wie is de directeur van het Rijksmuseum?")
# Mixed entity type (person + institution) reduces confidence
assert signals.entity_type == "mixed"
assert signals.confidence < 0.9 # Not as high as clear queries
class TestYearPatternDetection:
"""Tests for year-based temporal detection.
Year mentions (1000-2029) should trigger temporal intent,
even when combined with geographic indicators like 'in'.
"""
@pytest.fixture
def extractor(self):
return SemanticSignalExtractor()
def test_year_triggers_temporal_intent(self, extractor):
"""A year mention should classify as temporal intent."""
signals = extractor.extract_signals("Wat was de status van het Rijksmuseum in 1990?")
# Year 1990 should trigger temporal, not "in" triggering geographic
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
def test_year_1850_triggers_temporal(self, extractor):
"""Historical year should trigger temporal."""
signals = extractor.extract_signals("Welke musea bestonden in 1850?")
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
def test_year_2020_with_aggregation_is_statistical(self, extractor):
"""Aggregation query with year should be statistical with temporal constraint.
'Hoeveel' (how many) triggers aggregation statistical intent.
Year 2020 triggers temporal constraint.
Result: statistical intent WITH temporal filter applied.
"""
signals = extractor.extract_signals("Hoeveel archieven waren er in 2020?")
# "Hoeveel" overrides to statistical, but temporal constraint is detected
assert signals.intent == "statistical"
assert signals.requires_aggregation is True
assert signals.has_temporal_constraint is True # Year still detected!
def test_year_2020_pure_temporal(self, extractor):
"""Recent year without aggregation should be temporal."""
signals = extractor.extract_signals("Welke archieven bestonden in 2020?")
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
def test_geographic_without_year_stays_geographic(self, extractor):
"""Geographic query without year should stay geographic."""
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
assert signals.intent == "geographic"
assert signals.has_temporal_constraint is False
def test_year_overrides_geographic_in(self, extractor):
"""Year should make query temporal even with 'in' for location."""
signals = extractor.extract_signals("Welke musea waren er in Amsterdam in 1900?")
# Year 1900 should override the geographic "in Amsterdam"
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
# Geographic constraint should still be detected
assert signals.has_geographic_constraint is True
def test_year_in_english_query(self, extractor):
"""Year detection should work in English queries too."""
signals = extractor.extract_signals("What museums existed in 1920?")
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
def test_year_range_boundary_1000(self, extractor):
"""Year 1000 should be detected."""
signals = extractor.extract_signals("Bestond dit klooster al in 1000?")
assert signals.has_temporal_constraint is True
def test_year_range_boundary_2029(self, extractor):
"""Year 2029 should be detected (future planning)."""
signals = extractor.extract_signals("Wat zijn de plannen voor 2029?")
assert signals.has_temporal_constraint is True
def test_non_year_number_ignored(self, extractor):
"""Numbers that aren't years should not trigger temporal."""
signals = extractor.extract_signals("Hoeveel van de 500 musea hebben een website?")
# 500 is not a valid year (outside 1000-2029)
# This is a statistical query
assert signals.intent == "statistical"
# has_temporal_constraint is not asserted here: 500 falls outside the
# 1000-2029 year range, so no year-based constraint should be derived
def test_year_combined_with_temporal_keyword(self, extractor):
"""Year + temporal keyword should be high confidence temporal."""
signals = extractor.extract_signals("Wanneer in 1945 werd het museum gesloten?")
assert signals.intent == "temporal"
assert signals.has_temporal_constraint is True
# Combined signals should give high confidence
assert signals.confidence >= 0.8
# Run with: pytest backend/rag/test_semantic_routing.py -v
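A minimal sketch of the 1000-2029 year detection the `TestYearPatternDetection` cases exercise. The real `SemanticSignalExtractor` internals are not shown in this diff; the regex and function name below are assumptions reconstructed from the test expectations (word-bounded match, range 1000-2029):

```python
import re

# Word-bounded match for 1000-1999 and 2000-2029 (range assumed from the tests)
YEAR_RE = re.compile(r"\b(1\d{3}|20[0-2]\d)\b")

def has_year(query: str) -> bool:
    """True if the query mentions a plausible year between 1000 and 2029."""
    return bool(YEAR_RE.search(query))
```

The `\b` boundaries are what keep `500` (in "de 500 musea") from being read as a year, while `1990` or `2029` still match.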


@@ -0,0 +1,527 @@
"""
Tests for Temporal Intent Extraction Module
Tests the TemporalConstraintExtractor and TemporalIntentExtractorModule classes
which enable fast LLM-free extraction of temporal constraints from queries.
"""
import pytest
from .temporal_intent import (
TemporalConstraint,
TemporalConstraintExtractor,
TemporalIntentExtractorModule,
get_temporal_extractor,
)
class TestTemporalConstraintExtractor:
"""Tests for TemporalConstraintExtractor class."""
@pytest.fixture
def extractor(self):
return TemporalConstraintExtractor()
# ===== Timeline/History Queries =====
def test_timeline_dutch_geschiedenis(self, extractor):
"""Dutch 'geschiedenis' should trigger timeline constraint."""
constraint = extractor.extract("Wat is de geschiedenis van het Rijksmuseum?")
assert constraint.constraint_type == "timeline"
assert constraint.recommended_template == "institution_timeline"
assert constraint.confidence >= 0.9
def test_timeline_english_history(self, extractor):
"""English 'history' should trigger timeline constraint."""
constraint = extractor.extract("Tell me the history of the British Museum")
assert constraint.constraint_type == "timeline"
assert constraint.recommended_template == "institution_timeline"
def test_timeline_tijdlijn(self, extractor):
"""Dutch 'tijdlijn' should trigger timeline constraint."""
constraint = extractor.extract("Geef me een tijdlijn van het Noord-Hollands Archief")
assert constraint.constraint_type == "timeline"
def test_timeline_evolution(self, extractor):
"""English 'evolution' should trigger timeline constraint."""
constraint = extractor.extract("What was the evolution of this archive?")
assert constraint.constraint_type == "timeline"
# ===== Superlative Queries (Oldest/Newest) =====
def test_oldest_dutch_oudste(self, extractor):
"""Dutch 'oudste' should trigger oldest constraint."""
constraint = extractor.extract("Wat is het oudste museum in Nederland?")
assert constraint.constraint_type == "oldest"
assert constraint.recommended_template == "find_by_founding"
assert constraint.confidence >= 0.9
def test_oldest_english(self, extractor):
"""English 'oldest' should trigger oldest constraint."""
constraint = extractor.extract("What is the oldest library in Amsterdam?")
assert constraint.constraint_type == "oldest"
def test_oldest_eerste(self, extractor):
"""Dutch 'eerste' (first) should trigger oldest constraint."""
constraint = extractor.extract("Welke was de eerste openbare bibliotheek?")
assert constraint.constraint_type == "oldest"
def test_oldest_earliest(self, extractor):
"""English 'earliest' should trigger oldest constraint."""
constraint = extractor.extract("What is the earliest archive in the region?")
assert constraint.constraint_type == "oldest"
def test_newest_dutch_nieuwste(self, extractor):
"""Dutch 'nieuwste' should trigger newest constraint."""
constraint = extractor.extract("Wat is het nieuwste museum?")
assert constraint.constraint_type == "newest"
assert constraint.recommended_template == "find_by_founding"
def test_newest_english_latest(self, extractor):
"""English 'latest' should trigger newest constraint."""
constraint = extractor.extract("What is the latest museum to open?")
assert constraint.constraint_type == "newest"
def test_newest_most_recent(self, extractor):
"""English 'most recent' should trigger newest constraint."""
constraint = extractor.extract("What is the most recent archive established?")
assert constraint.constraint_type == "newest"
# ===== Change Event Keywords =====
def test_merger_dutch_fusie(self, extractor):
"""Dutch 'fusie' should trigger change_event constraint."""
constraint = extractor.extract("Wanneer was de fusie van het archief?")
assert constraint.constraint_type == "change_event"
assert constraint.reference_event == "merger"
assert constraint.recommended_template == "events_in_period"
def test_merger_english(self, extractor):
"""English 'merger' should trigger change_event constraint."""
constraint = extractor.extract("When did the merger happen?")
assert constraint.constraint_type == "change_event"
assert constraint.reference_event == "merger"
def test_merger_merged(self, extractor):
"""English 'merged' should trigger change_event constraint."""
constraint = extractor.extract("Which archives merged in 2001?")
assert constraint.constraint_type == "change_event"
def test_founding_dutch_opgericht(self, extractor):
"""Dutch 'opgericht' should trigger founding constraint."""
constraint = extractor.extract("Wanneer is het Rijksmuseum opgericht?")
assert constraint.constraint_type == "founding"
assert constraint.recommended_template == "institution_timeline"
def test_founding_english_founded(self, extractor):
"""English 'founded' should trigger founding constraint."""
constraint = extractor.extract("When was the library founded?")
assert constraint.constraint_type == "founding"
def test_founding_established(self, extractor):
"""English 'established' should trigger founding constraint."""
constraint = extractor.extract("When was this archive established?")
assert constraint.constraint_type == "founding"
def test_closure_dutch_gesloten(self, extractor):
"""Dutch 'gesloten' should trigger closure constraint."""
constraint = extractor.extract("Wanneer is het museum gesloten?")
assert constraint.constraint_type == "closure"
assert constraint.recommended_template == "institution_timeline"
def test_closure_english_closed(self, extractor):
"""English 'closed' should trigger closure constraint."""
# Note: "close" (verb form) vs "closed" (past participle)
# The extractor only has "closed" in CLOSURE_KEYWORDS
constraint = extractor.extract("When was the archive closed?")
assert constraint.constraint_type == "closure"
def test_closure_dissolved(self, extractor):
"""English 'dissolved' should trigger closure constraint."""
constraint = extractor.extract("When was the organization dissolved?")
assert constraint.constraint_type == "closure"
# ===== Year Extraction =====
def test_single_year_point_in_time(self, extractor):
"""Single year should trigger point_in_time constraint."""
constraint = extractor.extract("Wat was de status van het museum in 1990?")
assert constraint.constraint_type == "point_in_time"
assert constraint.date_start == "1990-01-01"
assert constraint.date_end == "1990-12-31"
assert constraint.recommended_template == "point_in_time_state"
def test_two_years_between(self, extractor):
"""Two years should trigger between constraint."""
constraint = extractor.extract("Welke veranderingen waren er tussen 1990 en 2000?")
assert constraint.constraint_type == "between"
assert constraint.date_start == "1990-01-01"
assert constraint.date_end == "2000-12-31"
assert constraint.recommended_template == "events_in_period"
def test_three_years_uses_first_and_last(self, extractor):
"""Three years should use first and last for range."""
constraint = extractor.extract("Musea in 1950, 1975 en 2000")
assert constraint.constraint_type == "between"
assert constraint.date_start == "1950-01-01"
assert constraint.date_end == "2000-12-31"
def test_year_with_before_dutch(self, extractor):
"""Year with Dutch 'voor' should trigger before constraint."""
constraint = extractor.extract("Welke archieven bestonden voor 1950?")
assert constraint.constraint_type == "before"
assert constraint.date_end == "1950-01-01"
assert constraint.recommended_template == "point_in_time_state"
def test_year_with_before_english(self, extractor):
"""Year with English 'before' should trigger before constraint."""
constraint = extractor.extract("Which museums existed before 1900?")
assert constraint.constraint_type == "before"
assert constraint.date_end == "1900-01-01"
def test_year_with_after_dutch(self, extractor):
"""Year with Dutch 'na' should trigger after constraint.
Note: More specific keywords (like 'opgericht') take precedence.
We use a neutral query without founding/closure keywords.
"""
constraint = extractor.extract("Welke veranderingen waren er na 1980?")
assert constraint.constraint_type == "after"
assert constraint.date_start == "1980-12-31"
assert constraint.recommended_template == "point_in_time_state"
def test_year_with_after_english(self, extractor):
"""Year with English 'after' should trigger after constraint."""
constraint = extractor.extract("What happened after 2010?")
assert constraint.constraint_type == "after"
assert constraint.date_start == "2010-12-31"
def test_year_with_since(self, extractor):
"""'Since' should trigger after constraint."""
constraint = extractor.extract("Museums opened since 2000")
assert constraint.constraint_type == "after"
assert constraint.date_start == "2000-12-31"
# ===== Year Extraction Edge Cases =====
def test_year_1800s(self, extractor):
"""Should extract years from 1800s."""
constraint = extractor.extract("Archieven uit 1856")
assert constraint.constraint_type == "point_in_time"
assert "1856" in constraint.date_start
def test_year_2020s(self, extractor):
"""Should extract years from 2020s."""
constraint = extractor.extract("Nieuwe musea in 2023")
assert constraint.constraint_type == "point_in_time"
assert "2023" in constraint.date_start
def test_ignore_numbers_that_are_not_years(self, extractor):
"""Should not extract non-year numbers as years."""
# Numbers like 500 or 50 should not be treated as years
constraint = extractor.extract("Het museum heeft 500 werken in de collectie")
assert constraint.constraint_type == "none"
# ===== No Temporal Constraint =====
def test_no_constraint_simple_query(self, extractor):
"""Query without temporal indicators should return none."""
constraint = extractor.extract("Welke musea zijn er in Amsterdam?")
assert constraint.constraint_type == "none"
assert constraint.recommended_template is None
def test_no_constraint_descriptive_query(self, extractor):
"""Descriptive query should return none."""
constraint = extractor.extract("Vertel me over de collectie van het Rijksmuseum")
assert constraint.constraint_type == "none"
# ===== Word Boundary Matching =====
def test_na_in_nationaal_not_matched(self, extractor):
"""'na' inside 'nationaal' should NOT trigger after constraint."""
constraint = extractor.extract("Nationaal Archief in Den Haag")
# 'nationaal' contains 'na', but not at a word boundary
assert constraint.constraint_type == "none"
def test_na_as_word_is_matched(self, extractor):
"""'na' as standalone word SHOULD trigger after constraint."""
constraint = extractor.extract("Na de renovatie in 1995 werd het museum heropend")
assert constraint.constraint_type == "after"
assert "1995" in constraint.date_start
def test_voor_in_voorwerpen_not_matched(self, extractor):
"""'voor' inside 'voorwerpen' should NOT trigger before."""
constraint = extractor.extract("De collectie bevat voorwerpen uit de 18e eeuw")
# No explicit year, so should be none
assert constraint.constraint_type == "none"
def test_voor_as_word_is_matched(self, extractor):
"""'voor' as standalone word SHOULD trigger before constraint."""
constraint = extractor.extract("Archieven van voor 1900")
assert constraint.constraint_type == "before"
assert "1900" in constraint.date_end
# ===== Template Mapping =====
def test_template_mapping_point_in_time(self, extractor):
"""point_in_time should map to point_in_time_state template."""
constraint = extractor.extract("Status in 1990")
template = extractor.get_template_for_constraint(constraint)
assert template == "point_in_time_state"
def test_template_mapping_between(self, extractor):
"""between should map to events_in_period template."""
constraint = extractor.extract("Veranderingen tussen 1990 en 2000")
template = extractor.get_template_for_constraint(constraint)
assert template == "events_in_period"
def test_template_mapping_oldest(self, extractor):
"""oldest should map to find_by_founding template."""
constraint = extractor.extract("Het oudste museum")
template = extractor.get_template_for_constraint(constraint)
assert template == "find_by_founding"
def test_template_mapping_timeline(self, extractor):
"""timeline should map to institution_timeline template."""
constraint = extractor.extract("Geschiedenis van het archief")
template = extractor.get_template_for_constraint(constraint)
assert template == "institution_timeline"
def test_template_mapping_none(self, extractor):
"""none constraint should return None template."""
constraint = extractor.extract("Welke musea zijn er?")
template = extractor.get_template_for_constraint(constraint)
assert template is None
# ===== Confidence Scoring =====
def test_high_confidence_timeline(self, extractor):
"""Timeline queries should have high confidence."""
constraint = extractor.extract("Geschiedenis van het Rijksmuseum")
assert constraint.confidence >= 0.9
def test_high_confidence_superlative(self, extractor):
"""Superlative queries should have high confidence."""
constraint = extractor.extract("Het oudste archief")
assert constraint.confidence >= 0.9
def test_moderate_confidence_year_only(self, extractor):
"""Year-only queries should have moderate confidence."""
constraint = extractor.extract("Musea in 1990")
assert 0.7 <= constraint.confidence <= 0.9
def test_lower_confidence_no_constraint(self, extractor):
"""No-constraint queries should have lower confidence."""
constraint = extractor.extract("Algemene informatie over erfgoed")
assert constraint.confidence <= 0.75
class TestTemporalConstraintDataclass:
"""Tests for TemporalConstraint dataclass."""
def test_default_values(self):
"""Test default values of TemporalConstraint."""
constraint = TemporalConstraint(constraint_type="none")
assert constraint.date_start is None
assert constraint.date_end is None
assert constraint.reference_event is None
assert constraint.confidence == 0.8
assert constraint.recommended_template is None
def test_full_constraint(self):
"""Test TemporalConstraint with all fields."""
constraint = TemporalConstraint(
constraint_type="between",
date_start="1990-01-01",
date_end="2000-12-31",
reference_event=None,
confidence=0.95,
recommended_template="events_in_period"
)
assert constraint.constraint_type == "between"
assert constraint.date_start == "1990-01-01"
assert constraint.date_end == "2000-12-31"
assert constraint.confidence == 0.95
assert constraint.recommended_template == "events_in_period"
class TestTemporalIntentExtractorModule:
"""Tests for the DSPy module (without actual LLM calls)."""
def test_module_initialization(self):
"""Test module initializes correctly."""
module = TemporalIntentExtractorModule(confidence_threshold=0.75)
assert module.confidence_threshold == 0.75
assert module.fast_extractor is not None
def test_high_confidence_uses_fast_extraction(self):
"""High confidence queries should use fast extraction, not LLM."""
module = TemporalIntentExtractorModule(confidence_threshold=0.75)
# This query has high confidence (timeline keyword)
constraint = module.forward("Geschiedenis van het Rijksmuseum")
# Should use fast extraction result
assert constraint.constraint_type == "timeline"
assert constraint.confidence >= 0.75
class TestSingletonInstance:
"""Tests for singleton pattern."""
def test_get_temporal_extractor_singleton(self):
"""get_temporal_extractor should return same instance."""
ext1 = get_temporal_extractor()
ext2 = get_temporal_extractor()
assert ext1 is ext2
def test_singleton_is_temporal_constraint_extractor(self):
"""Singleton should be TemporalConstraintExtractor instance."""
ext = get_temporal_extractor()
assert isinstance(ext, TemporalConstraintExtractor)
class TestIntegration:
"""Integration tests for full temporal extraction flow."""
def test_dutch_point_in_time_full_flow(self):
"""Test complete flow for Dutch point-in-time query."""
extractor = get_temporal_extractor()
constraint = extractor.extract(
"Wat was de status van het Rijksmuseum in 1990?"
)
assert constraint.constraint_type == "point_in_time"
assert constraint.date_start == "1990-01-01"
assert constraint.date_end == "1990-12-31"
assert constraint.recommended_template == "point_in_time_state"
def test_english_timeline_full_flow(self):
"""Test complete flow for English timeline query."""
extractor = get_temporal_extractor()
constraint = extractor.extract(
"What is the history of the British Museum?"
)
assert constraint.constraint_type == "timeline"
assert constraint.recommended_template == "institution_timeline"
def test_date_range_full_flow(self):
"""Test complete flow for date range query."""
extractor = get_temporal_extractor()
constraint = extractor.extract(
"Welke fusies vonden plaats tussen 1990 en 2010?"
)
# The "fusie" (merger) keyword is checked before year-range extraction,
# so this resolves to change_event rather than between
assert constraint.constraint_type == "change_event"
assert constraint.reference_event == "merger"
def test_superlative_with_location(self):
"""Test superlative query with location."""
extractor = get_temporal_extractor()
constraint = extractor.extract(
"Wat is het oudste archief in Noord-Holland?"
)
assert constraint.constraint_type == "oldest"
assert constraint.recommended_template == "find_by_founding"
def test_complex_query_multiple_indicators(self):
"""Test query with multiple temporal indicators."""
extractor = get_temporal_extractor()
# "geschiedenis" (timeline) and "oudste" (oldest) - timeline wins (checked first)
constraint = extractor.extract(
"Vertel me de geschiedenis van de oudste bibliotheek"
)
assert constraint.constraint_type == "timeline"
def test_query_templates_for_sparql(self):
"""Test that all temporal constraints map to valid templates."""
extractor = get_temporal_extractor()
test_cases = [
("Geschiedenis van het archief", "institution_timeline"),
("Het oudste museum", "find_by_founding"),
("Het nieuwste archief", "find_by_founding"),
("Status in 1990", "point_in_time_state"),
("Voor 1950", "point_in_time_state"), # Year + before
("Na 2000", "point_in_time_state"), # Year + after
("Fusies in de regio", "events_in_period"),
("Wanneer opgericht", "institution_timeline"),
("Wanneer gesloten", "institution_timeline"),
]
for query, expected_template in test_cases:
constraint = extractor.extract(query)
# Some queries may not extract years, check if template matches expectation
if constraint.constraint_type != "none":
assert constraint.recommended_template == expected_template, (
f"Query '{query}' expected template '{expected_template}', "
f"got '{constraint.recommended_template}' "
f"(constraint_type: {constraint.constraint_type})"
)
class TestRealWorldQueries:
"""Tests with real-world heritage queries."""
@pytest.fixture
def extractor(self):
return get_temporal_extractor()
def test_noord_hollands_archief_history(self, extractor):
"""Real query about Noord-Hollands Archief history."""
constraint = extractor.extract(
"Wat is de geschiedenis van het Noord-Hollands Archief sinds de fusie in 2001?"
)
# "geschiedenis" (timeline) is checked before merger/year
assert constraint.constraint_type == "timeline"
def test_museum_founding_date(self, extractor):
"""Real query about museum founding."""
constraint = extractor.extract(
"Wanneer is het Rijksmuseum in Amsterdam opgericht?"
)
assert constraint.constraint_type == "founding"
def test_archives_before_ww2(self, extractor):
"""Query about archives before WWII."""
constraint = extractor.extract(
"Welke gemeentearchieven bestonden voor 1940?"
)
assert constraint.constraint_type == "before"
assert "1940" in constraint.date_end
def test_oldest_university_library(self, extractor):
"""Query about oldest university library."""
constraint = extractor.extract(
"Wat is de oudste universiteitsbibliotheek van Nederland?"
)
assert constraint.constraint_type == "oldest"
def test_museum_closures_pandemic(self, extractor):
"""Query about closures during pandemic."""
constraint = extractor.extract(
"Welke musea zijn gesloten tijdens de pandemie in 2020?"
)
# "gesloten" (closure) keyword
assert constraint.constraint_type == "closure"
def test_digital_archives_recent(self, extractor):
"""Query about recent digital archives."""
constraint = extractor.extract(
"Welke digitale archieven zijn na 2015 gelanceerd?"
)
assert constraint.constraint_type == "after"
assert "2015" in constraint.date_start
# Run with: pytest backend/rag/test_temporal_intent.py -v
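The word-boundary tests above ('na' must match as a standalone word but not inside 'nationaal') imply keyword matching along the following lines. The keyword list and function name are illustrative, not the extractor's actual constants:

```python
import re

# Illustrative subset of after/since keywords (Dutch and English)
AFTER_KEYWORDS = ["na", "after", "since", "sinds"]

def mentions_after(query: str) -> bool:
    """Word-bounded keyword check, so 'na' never fires inside 'nationaal'."""
    q = query.lower()
    return any(re.search(rf"\b{re.escape(kw)}\b", q) for kw in AFTER_KEYWORDS)
```

Plain `substring in query` checks would fail `test_na_in_nationaal_not_matched`; the `\b` anchors are the load-bearing detail.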


@@ -1038,6 +1038,400 @@ templates:
- question: "Which museums spend less than 1000 on innovation?"
slots: {budget_category: "innovation", amount: 1000, comparison: "<", institution_type: "M"}
# ---------------------------------------------------------------------------
# TEMPORAL QUERY TEMPLATES
# ---------------------------------------------------------------------------
# These templates handle time-based queries about heritage institutions:
# - Historical state at point in time
# - Institution timelines and history
# - Organizational change events (mergers, closures, foundings)
# - Finding oldest/newest institutions
#
# Reference: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
# Template: Point-in-time institution state
point_in_time_state:
id: "point_in_time_state"
description: "Get institution state at a specific point in time"
intent: ["temporal", "entity_lookup"]
question_patterns:
# Dutch
- "Wat was de status van {institution_name} in {year}?"
- "Hoe zag {institution_name} eruit in {year}?"
- "Bestond {institution_name} al in {year}?"
- "Wie beheerde {institution_name} in {year}?"
- "{institution_name} in {year}"
# English
- "What was the status of {institution_name} in {year}?"
- "How was {institution_name} structured in {year}?"
- "Did {institution_name} exist in {year}?"
- "State of {institution_name} before {event}?"
- "{institution_name} in {year}"
slots:
institution_name:
type: string
required: false
description: "Institution name for lookup (alternative to ghcid)"
ghcid:
type: string
required: false
description: "Global Heritage Custodian Identifier"
query_date:
type: date
required: true
description: "Point in time to query (ISO format or year)"
sparql_template: |
{{ prefixes }}
SELECT ?ghcid ?name ?type ?city ?validFrom ?validTo WHERE {
?s a crm:E39_Actor ;
hc:ghcid ?ghcid ;
skos:prefLabel ?name ;
hc:institutionType ?type .
OPTIONAL { ?s hc:validFrom ?validFrom }
OPTIONAL { ?s schema:addressLocality ?city }
OPTIONAL { ?s hc:validTo ?validTo }
{% if ghcid %}
FILTER(?ghcid = "{{ ghcid }}")
{% elif institution_name %}
FILTER(CONTAINS(LCASE(?name), "{{ institution_name | lower }}"))
{% endif %}
# Temporal filter: valid at query_date
FILTER(!BOUND(?validFrom) || ?validFrom <= "{{ query_date }}"^^xsd:date)
FILTER(!BOUND(?validTo) || ?validTo > "{{ query_date }}"^^xsd:date)
}
ORDER BY ?validFrom
{% if limit %}LIMIT {{ limit }}{% else %}LIMIT 10{% endif %}
examples:
- question: "Wat was de status van Rijksmuseum in 1990?"
slots: {institution_name: "Rijksmuseum", query_date: "1990-01-01"}
- question: "How was Noord-Hollands Archief structured in 1995?"
slots: {institution_name: "Noord-Hollands Archief", query_date: "1995-01-01"}
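The `sparql_template` fields use Jinja2 placeholder syntax. As a sketch of how a template like `point_in_time_state` might be filled from extracted slots (the renderer below is hypothetical; only the `{% if %}`/`{{ … }}` syntax comes from the templates themselves, and `jinja2` is a third-party dependency):

```python
from jinja2 import Template  # pip install jinja2

# Trimmed-down copy of the FILTER branch used by point_in_time_state
SNIPPET = """\
SELECT ?ghcid ?name WHERE {
  ?s hc:ghcid ?ghcid ; skos:prefLabel ?name .
  {% if ghcid %}
  FILTER(?ghcid = "{{ ghcid }}")
  {% elif institution_name %}
  FILTER(CONTAINS(LCASE(?name), "{{ institution_name | lower }}"))
  {% endif %}
}"""

def render_template(slots: dict) -> str:
    """Fill slots; an unset optional slot simply drops its branch."""
    return Template(SNIPPET).render(**slots)
```

Rendering with `{"institution_name": "Rijksmuseum"}` produces the case-insensitive `CONTAINS` filter, while supplying `ghcid` takes the exact-match branch instead.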
# Template: Institution timeline/history
institution_timeline:
id: "institution_timeline"
description: "Get complete history and timeline of changes for an institution"
intent: ["temporal", "entity_lookup"]
question_patterns:
# Dutch
- "Geschiedenis van {institution_name}"
- "Wat is de geschiedenis van {institution_name}?"
- "Tijdlijn van {institution_name}"
- "Wat is er gebeurd met {institution_name}?"
- "Vertel me over de geschiedenis van {institution_name}"
- "Hoe is {institution_name} veranderd door de jaren?"
# English
- "History of {institution_name}"
- "Timeline of {institution_name}"
- "Timeline of changes for {institution_name}"
- "What happened to {institution_name}?"
- "Tell me about the history of {institution_name}"
- "How has {institution_name} changed over the years?"
slots:
institution_name:
type: string
required: false
ghcid:
type: string
required: false
sparql_template: |
{{ prefixes }}
SELECT ?ghcid ?name ?validFrom ?validTo ?changeType ?changeReason ?description WHERE {
?entry hc:ghcid ?ghcid ;
skos:prefLabel ?name .
OPTIONAL { ?entry hc:validFrom ?validFrom }
OPTIONAL { ?entry hc:validTo ?validTo }
OPTIONAL { ?entry hc:changeType ?changeType }
OPTIONAL { ?entry hc:changeReason ?changeReason }
OPTIONAL { ?entry schema:description ?description }
{% if ghcid %}
FILTER(?ghcid = "{{ ghcid }}")
{% elif institution_name %}
FILTER(CONTAINS(LCASE(?name), "{{ institution_name | lower }}"))
{% endif %}
}
ORDER BY ?validFrom
examples:
- question: "Geschiedenis van het Rijksmuseum"
slots: {institution_name: "Rijksmuseum"}
- question: "What happened to Noord-Hollands Archief?"
slots: {institution_name: "Noord-Hollands Archief"}
# Template: Organizational change events in time period
events_in_period:
id: "events_in_period"
description: "Find organizational change events in a time period"
intent: ["temporal", "statistical"]
question_patterns:
# Dutch
- "Welke fusies waren er tussen {start_year} en {end_year}?"
- "Welke {event_type_nl} waren er in {year}?"
- "Welke instellingen zijn gesloten in {year}?"
- "Welke archieven zijn gefuseerd na {year}?"
- "Nieuwe musea sinds {year}"
- "Sluitingen in {year}"
- "Fusies tussen {start_year} en {end_year}"
# English
- "Mergers between {start_year} and {end_year}"
- "What {event_type_en} happened in {year}?"
- "What institutions closed in {year}?"
- "Archives founded before {year}"
- "New museums since {year}"
- "Closures in {year}"
slots:
start_date:
type: date
required: true
description: "Start of time period (ISO format or year)"
end_date:
type: date
required: false
description: "End of time period (defaults to now)"
event_type:
type: string
required: false
valid_values: ["MERGER", "FOUNDING", "CLOSURE", "RELOCATION", "NAME_CHANGE", "SPLIT", "ACQUISITION"]
description: "Type of organizational change event"
institution_type:
type: institution_type
required: false
sparql_template: |
{{ prefixes }}
SELECT ?event ?eventType ?date ?actor1 ?actor1Name ?actor2 ?actor2Name ?description WHERE {
?event a hc:OrganizationalChangeEvent ;
hc:eventType ?eventType ;
hc:eventDate ?date .
OPTIONAL {
?event hc:affectedActor ?actor1 .
?actor1 skos:prefLabel ?actor1Name .
}
OPTIONAL {
?event hc:resultingActor ?actor2 .
?actor2 skos:prefLabel ?actor2Name .
}
OPTIONAL { ?event schema:description ?description }
FILTER(?date >= "{{ start_date }}"^^xsd:date)
{% if end_date %}
FILTER(?date <= "{{ end_date }}"^^xsd:date)
{% endif %}
{% if event_type %}
FILTER(?eventType = "{{ event_type }}")
{% endif %}
{% if institution_type %}
?actor1 hc:institutionType "{{ institution_type }}" .
{% endif %}
}
ORDER BY ?date
{% if limit %}LIMIT {{ limit }}{% else %}LIMIT 50{% endif %}
examples:
- question: "Welke fusies waren er tussen 2000 en 2010?"
slots: {start_date: "2000-01-01", end_date: "2010-12-31", event_type: "MERGER"}
- question: "What museums closed in 2020?"
slots: {start_date: "2020-01-01", end_date: "2020-12-31", event_type: "CLOSURE", institution_type: "M"}
- question: "Archives founded before 1900"
slots: {start_date: "1800-01-01", end_date: "1899-12-31", event_type: "FOUNDING", institution_type: "A"}
# Template: Find oldest/newest institutions
find_by_founding:
id: "find_by_founding"
description: "Find oldest or newest (most recently founded) institutions"
intent: ["temporal", "exploration"]
question_patterns:
# Dutch
- "Oudste {institution_type_nl} in {location}"
- "Oudste {institution_type_nl} van Nederland"
- "Nieuwste {institution_type_nl} in {location}"
- "Welk {institution_type_nl} is het oudste in {location}?"
- "Welk {institution_type_nl} is het nieuwste?"
- "Eerst opgerichte {institution_type_nl}"
- "Laatst opgerichte {institution_type_nl}"
- "{institution_type_nl} opgericht na {year}"
- "{institution_type_nl} opgericht voor {year}"
# English
- "Oldest {institution_type_en} in {location}"
- "Oldest {institution_type_en} in the Netherlands"
- "Newest {institution_type_en} opened after {year}"
- "Most recently founded {institution_type_en}"
- "Which {institution_type_en} is the oldest in {location}?"
- "First established {institution_type_en}"
- "{institution_type_en} founded after {year}"
- "{institution_type_en} founded before {year}"
slots:
institution_type:
type: institution_type
required: true
order:
type: string
required: false
valid_values: ["ASC", "DESC"]
default: "ASC"
description: "ASC for oldest first, DESC for newest first"
location:
type: city
required: false
description: "City or region to filter by"
country:
type: country
required: false
default: "NL"
founding_after:
type: date
required: false
founding_before:
type: date
required: false
sparql_template: |
{{ prefixes }}
SELECT ?institution ?name ?foundingDate ?city ?country WHERE {
?institution a hcc:Custodian ;
hc:institutionType "{{ institution_type }}" ;
schema:name ?name .
OPTIONAL { ?institution schema:foundingDate ?foundingDate }
OPTIONAL { ?institution hc:settlementName ?city }
OPTIONAL { ?institution hc:countryCode ?country }
# Must have founding date for ordering
FILTER(BOUND(?foundingDate))
{% if location %}
FILTER(CONTAINS(LCASE(?city), "{{ location | lower }}"))
{% endif %}
{% if country %}
FILTER(?country = "{{ country }}")
{% endif %}
{% if founding_after %}
FILTER(?foundingDate >= "{{ founding_after }}"^^xsd:date)
{% endif %}
{% if founding_before %}
FILTER(?foundingDate <= "{{ founding_before }}"^^xsd:date)
{% endif %}
}
ORDER BY {{ order | default('ASC') }}(?foundingDate)
{% if limit %}LIMIT {{ limit }}{% else %}LIMIT 10{% endif %}
examples:
- question: "Oudste musea in Amsterdam"
slots: {institution_type: "M", location: "Amsterdam", order: "ASC"}
- question: "Newest libraries in the Netherlands"
slots: {institution_type: "L", country: "NL", order: "DESC"}
- question: "Archives founded after 2000"
slots: {institution_type: "A", founding_after: "2000-01-01", order: "ASC"}
# Template: Institutions by founding decade
institutions_by_founding_decade:
id: "institutions_by_founding_decade"
description: "Count or list institutions by founding decade"
intent: ["temporal", "statistical"]
question_patterns:
# Dutch
- "Hoeveel {institution_type_nl} zijn opgericht per decennium?"
- "{institution_type_nl} opgericht in de jaren {decade}"
- "Welke {institution_type_nl} zijn in de 19e eeuw opgericht?"
- "Verdeling van oprichtingsjaren voor {institution_type_nl}"
# English
- "How many {institution_type_en} were founded per decade?"
- "{institution_type_en} founded in the {decade}s"
- "Which {institution_type_en} were founded in the 19th century?"
- "Distribution of founding years for {institution_type_en}"
slots:
institution_type:
type: institution_type
required: true
decade:
type: integer
required: false
description: "Decade start year (e.g., 1990 for 1990s)"
century:
type: integer
required: false
description: "Century (e.g., 19 for 19th century)"
country:
type: country
required: false
sparql_template: |
{{ prefixes }}
SELECT ?decade (COUNT(?institution) AS ?count) WHERE {
?institution a hcc:Custodian ;
hc:institutionType "{{ institution_type }}" ;
schema:foundingDate ?foundingDate .
{% if country %}
?institution hc:countryCode "{{ country }}" .
{% endif %}
BIND(YEAR(?foundingDate) AS ?year)
BIND(FLOOR(?year / 10) * 10 AS ?decade)
{% if decade %}
FILTER(?decade = {{ decade }})
{% endif %}
{% if century %}
FILTER(?year >= {{ (century - 1) * 100 }} && ?year < {{ century * 100 }})
{% endif %}
}
GROUP BY ?decade
ORDER BY ?decade
# Alternative: list institutions in specific decade
sparql_template_list: |
{{ prefixes }}
SELECT ?institution ?name ?foundingDate WHERE {
?institution a hcc:Custodian ;
hc:institutionType "{{ institution_type }}" ;
schema:name ?name ;
schema:foundingDate ?foundingDate .
{% if country %}
?institution hc:countryCode "{{ country }}" .
{% endif %}
BIND(YEAR(?foundingDate) AS ?year)
{% if decade %}
FILTER(?year >= {{ decade }} && ?year < {{ decade + 10 }})
{% endif %}
{% if century %}
FILTER(?year >= {{ (century - 1) * 100 }} && ?year < {{ century * 100 }})
{% endif %}
}
ORDER BY ?foundingDate
{% if limit %}LIMIT {{ limit }}{% endif %}
examples:
- question: "Hoeveel musea zijn opgericht per decennium?"
slots: {institution_type: "M"}
- question: "Archives founded in the 1990s"
slots: {institution_type: "A", decade: 1990}
- question: "Libraries founded in the 19th century"
slots: {institution_type: "L", century: 19}
# =============================================================================
# FOLLOW-UP PATTERNS (Conversation Context Resolution)
# =============================================================================

# GraphRAG Pattern Comparison Matrix
**Purpose**: Quick reference comparing our current implementation against external patterns.
## Comparison Matrix
| Capability | Our Current State | Microsoft GraphRAG | ROGRAG | Zep | HyperGraphRAG | LightRAG |
|------------|-------------------|-------------------|--------|-----|---------------|----------|
| **Vector Search** | Qdrant | Azure Cognitive | Faiss | Custom | Sentence-BERT | Faiss |
| **Knowledge Graph** | Oxigraph (RDF) + TypeDB | LanceDB | TuGraph | Neo4j | Custom hypergraph | Neo4j |
| **LLM Orchestration** | DSPy | Azure OpenAI | Qwen | OpenAI | GPT-4o | Various |
| **Community Detection** | Not implemented | Leiden algorithm | None | Dynamic clustering | None | Louvain |
| **Temporal Modeling** | GHCID history | Not built-in | None | Bitemporal (T, T') | None | None |
| **Multi-hop Retrieval** | SPARQL traversal | Graph expansion | Logic form | BFS | Hyperedge walk | Graph paths |
| **Verification Layer** | Not implemented | Claim extraction | Argument checking | None | None | None |
| **N-ary Relations** | CIDOC-CRM events | Binary only | Binary only | Binary only | Hyperedges | Binary only |
| **Cost Optimization** | Semantic caching | Community summaries | Minimal graph | Caching | None | Simple graph |
## Gap Analysis
### What We Have (Strengths)
| Feature | Description | Files |
|---------|-------------|-------|
| Template SPARQL | 65% precision vs 10% LLM-only | `template_sparql.py` |
| Semantic caching | Redis-backed, reduces LLM calls | `semantic_cache.py` |
| Cost tracking | Token/latency monitoring | `cost_tracker.py` |
| Ontology grounding | LinkML schema validation | `schema_loader.py` |
| Temporal tracking | GHCID history with valid_from/to | LinkML schema |
| Multi-hop SPARQL | Graph traversal via SPARQL | `dspy_heritage_rag.py` |
| Entity extraction | Heritage-specific NER | DSPy signatures |
### What We're Missing (Gaps)
| Gap | Priority | Implementation Effort | Benefit |
|-----|----------|----------------------|---------|
| Retrieval verification | High | Low (DSPy signature) | Reduces hallucination |
| Community summaries | High | Medium (Leiden + indexing) | Enables global questions |
| Dual-level extraction | High | Low (DSPy signature) | Better entity+relation matching |
| Graph context enrichment | Medium | Low (extend retrieval) | Fixes weak embeddings |
| Exploration suggestions | Medium | Medium (session memory) | Improves user experience |
| Hypergraph memory | Low | High (new architecture) | Multi-step reasoning |
## Implementation Priority
```
Priority 1 (This Sprint)
├── Retrieval Verification Layer
│ └── ArgumentVerifier DSPy signature
├── Dual-Level Entity Extraction
│ └── Extend HeritageEntityExtractor
└── Temporal SPARQL Templates
└── Point-in-time query mode
Priority 2 (Next Sprint)
├── Community Detection Pipeline
│ └── Leiden algorithm on institution graph
├── Community Summary Indexing
│ └── Store in Qdrant with embeddings
└── Global Search Mode
└── Search summaries for holistic queries
Priority 3 (Backlog)
├── Session Memory Evolution
│ └── HGMEM-style working memory
├── CIDOC-CRM Event Hyperedges
│ └── Rich custody transfer modeling
└── Exploration Suggestions
└── Suggest related queries
```
## Quick Reference: Pattern Mapping
| External Pattern | Our Implementation Approach |
|-----------------|----------------------------|
| GraphRAG communities | Pre-compute Leiden clusters in Oxigraph, store summaries in Qdrant |
| ROGRAG dual-level | DSPy signature: entities (low) + relations (high) |
| ROGRAG verification | DSPy signature: ArgumentVerifier before generation |
| Zep bitemporal | Already have via GHCID history (extend SPARQL templates) |
| HyperGraphRAG hyperedges | CIDOC-CRM events (crm:E10_Transfer_of_Custody) |
| LightRAG simple graph | We use a more complete graph, but can adopt "star graph sufficiency" thinking |
## Files to Modify
| File | Changes |
|------|---------|
| `dspy_heritage_rag.py` | Add ArgumentVerifier, DualLevelExtractor, global_search mode |
| `template_sparql.py` | Add temporal query templates |
| `session_manager.py` | Add working memory evolution |
| **New**: `community_indexer.py` | Leiden detection, summary generation |
| **New**: `exploration_suggester.py` | Pattern-based query suggestions |

# Implementation Guide: GraphRAG Patterns for GLAM
**Purpose**: Concrete implementation patterns for integrating external GraphRAG techniques into our TypeDB-Oxigraph-DSPy stack.
---
## Pattern A: Retrieval Verification Layer
### Rationale
From ROGRAG research: argument checking (verifying the retrieved context before generation) outperforms result checking (verifying the answer afterwards), at 75% vs 72% accuracy.
### Implementation
Add to `dspy_heritage_rag.py`:
```python
# =============================================================================
# RETRIEVAL VERIFICATION (ROGRAG Pattern)
# =============================================================================
class ArgumentVerifier(dspy.Signature):
"""
Verify if retrieved context can answer the query before generation.
Prevents hallucination from insufficient context.
Based on ROGRAG (arxiv:2503.06474) finding that argument checking
outperforms result checking (75% vs 72% accuracy).
You are a verification assistant for heritage institution queries.
Given a user query and retrieved context, determine if the context
contains sufficient information to answer the query accurately.
Be strict:
- If key entities (institutions, cities, dates) are mentioned in the query
but not found in the context, return can_answer=False
- If the query asks for counts but context doesn't provide them, return False
- If the query asks about relationships but context only has entity lists, return False
Examples of INSUFFICIENT context:
- Query: "How many archives are in Haarlem?" / Context: mentions Haarlem archives but no count
- Query: "When was Rijksmuseum founded?" / Context: describes Rijksmuseum but no founding date
Examples of SUFFICIENT context:
- Query: "What archives are in Haarlem?" / Context: lists 3 specific archives in Haarlem
- Query: "Tell me about the Rijksmuseum" / Context: contains name, location, type, description
"""
query: str = dspy.InputField(desc="User's original question")
context: str = dspy.InputField(desc="Retrieved information from KG and vector search")
can_answer: bool = dspy.OutputField(
desc="True if context contains sufficient information to answer accurately"
)
missing_info: str = dspy.OutputField(
desc="What specific information is missing (empty if can_answer=True)"
)
confidence: float = dspy.OutputField(
desc="Confidence score 0-1 that context is sufficient"
)
suggested_refinement: str = dspy.OutputField(
desc="Suggested query refinement if context is insufficient (empty if can_answer=True)"
)
class VerifiedHeritageRAG(dspy.Module):
"""
RAG pipeline with verification layer before answer generation.
"""
def __init__(self, max_verification_retries: int = 2):
super().__init__()
self.max_retries = max_verification_retries
self.verifier = dspy.ChainOfThought(ArgumentVerifier)
self.retriever = HeritageRetriever() # Existing retriever
self.generator = dspy.ChainOfThought(HeritageAnswerSignature) # Existing generator
def forward(
self,
query: str,
conversation_history: Optional[list[dict]] = None
) -> dspy.Prediction:
"""
Retrieve, verify, then generate - with retry on insufficient context.
"""
context = ""
verification_attempts = []
for attempt in range(self.max_retries + 1):
# Expand search if this is a retry
expand_search = attempt > 0
# Retrieve context
retrieval_result = self.retriever(
query=query,
expand=expand_search,
previous_context=context
)
context = retrieval_result.context
# Verify sufficiency
verification = self.verifier(query=query, context=context)
verification_attempts.append({
"attempt": attempt,
"can_answer": verification.can_answer,
"confidence": verification.confidence,
"missing": verification.missing_info
})
if verification.can_answer and verification.confidence >= 0.7:
break
# Log retry
logger.info(
f"Verification attempt {attempt + 1}/{self.max_retries + 1}: "
f"Insufficient context. Missing: {verification.missing_info}"
)
# Generate answer (with caveat if low confidence)
if not verification.can_answer:
context = f"[NOTE: Limited information available]\n\n{context}"
answer = self.generator(query=query, context=context)
return dspy.Prediction(
answer=answer.response,
context=context,
verification=verification_attempts[-1],
retries=len(verification_attempts) - 1
)
```
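To make the verify-then-retry control flow concrete, here is a minimal, dependency-free sketch. The stub `retrieve` and `verify` functions (and the sample institution facts) are purely illustrative stand-ins for the DSPy retriever and `ArgumentVerifier`, not part of the codebase:

```python
# Sketch of the retry loop in VerifiedHeritageRAG.forward, with stubbed components.
from dataclasses import dataclass

@dataclass
class Verification:
    can_answer: bool
    confidence: float
    missing_info: str = ""

def retrieve(query: str, expand: bool) -> str:
    # Stub: a wider search returns richer context on retry.
    base = "Noord-Hollands Archief is an archive in Haarlem."
    if expand:
        return base + " Founded in 2005 by merger."
    return base

def verify(query: str, context: str) -> Verification:
    # Stub: require a founding date for "when"-style questions.
    if "when" in query.lower() and "founded" not in context.lower():
        return Verification(False, 0.3, "founding date")
    return Verification(True, 0.9)

def answer_with_verification(query: str, max_retries: int = 2) -> tuple[str, int]:
    context, retries = "", 0
    for attempt in range(max_retries + 1):
        context = retrieve(query, expand=attempt > 0)
        v = verify(query, context)
        if v.can_answer and v.confidence >= 0.7:
            retries = attempt
            break
    else:
        # Retries exhausted: generate anyway, but flag the limitation.
        context = f"[NOTE: Limited information available]\n\n{context}"
        retries = max_retries
    return context, retries

ctx, n = answer_with_verification("When was Noord-Hollands Archief founded?")
print(n)  # one expanded retry was needed before the context sufficed
```

The key design point mirrors the full implementation: expansion only happens after a failed verification, so well-served queries pay no extra retrieval cost.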
### Integration Point
In `dspy_heritage_rag.py`, modify `HeritageRAGModule.forward()` to use verification:
```python
# Before (current):
# answer = self.generate_answer(query, context)
# After (with verification):
verification = self.verifier(query=query, context=context)
if not verification.can_answer and verification.confidence < 0.7:
# Expand search and retry
context = self._expand_retrieval(query, context, verification.missing_info)
verification = self.verifier(query=query, context=context)
answer = self.generate_answer(query, context)
```
---
## Pattern B: Dual-Level Entity Extraction
### Rationale
From ROGRAG: Separating low-level (entities) from high-level (relations) enables:
- Low-level: Fuzzy string matching for names, places, IDs
- High-level: Semantic similarity for concepts, relationships
### Implementation
Add to `dspy_heritage_rag.py`:
```python
# =============================================================================
# DUAL-LEVEL EXTRACTION (ROGRAG Pattern)
# =============================================================================
class DualLevelEntityExtractor(dspy.Signature):
"""
Extract both entity-level and relation-level keywords from heritage queries.
Based on ROGRAG (arxiv:2503.06474) dual-level retrieval method.
Low-level: Named entities for fuzzy graph matching
High-level: Relation descriptions for semantic vector matching
You are a heritage query analyzer. Extract two types of information:
LOW-LEVEL (Entities):
- Institution names: Rijksmuseum, Nationaal Archief, etc.
- Place names: Amsterdam, Limburg, Noord-Holland
- Person names: Staff, directors, curators
- Identifiers: GHCID, ISIL codes (NL-XXXX)
- Dates: Years, date ranges
HIGH-LEVEL (Relations/Concepts):
- Collection types: "digitized collections", "medieval manuscripts"
- Institution attributes: "oldest", "largest", "founded before 1900"
- Relationship phrases: "collaborated with", "merged into", "part of"
- Activities: "preserves", "exhibits", "researches"
Examples:
Query: "Which archives in Haarlem have digitized medieval manuscripts?"
Entities: ["Haarlem", "archives"]
Relations: ["digitized collections", "medieval manuscripts"]
Strategy: entity_first (narrow by location, then filter by collection type)
Query: "What museums were founded before 1850 in the Netherlands?"
Entities: ["Netherlands", "museums", "1850"]
Relations: ["founded before", "historical institution"]
Strategy: relation_first (semantic search for founding dates, then verify entities)
Query: "Tell me about the Rijksmuseum"
Entities: ["Rijksmuseum"]
Relations: ["general information", "institution overview"]
Strategy: entity_first (direct lookup)
"""
query: str = dspy.InputField(desc="User's heritage question")
entities: list[str] = dspy.OutputField(
desc="Low-level: Named entities (institutions, places, people, dates, IDs)"
)
relations: list[str] = dspy.OutputField(
desc="High-level: Relation/concept phrases for semantic matching"
)
search_strategy: Literal["entity_first", "relation_first", "parallel"] = dspy.OutputField(
desc="Recommended search strategy based on query structure"
)
entity_types: list[str] = dspy.OutputField(
desc="Types of entities found: institution, place, person, date, identifier"
)
class DualLevelRetriever(dspy.Module):
"""
Combines entity-level graph search with relation-level semantic search.
"""
def __init__(self, qdrant_client, oxigraph_endpoint: str):
super().__init__()
self.extractor = dspy.ChainOfThought(DualLevelEntityExtractor)
self.qdrant = qdrant_client
self.oxigraph = oxigraph_endpoint
def match_entities_in_graph(self, entities: list[str]) -> set[str]:
"""
Fuzzy match entities against Oxigraph nodes.
Returns matching GHCIDs.
"""
ghcids = set()
for entity in entities:
# Use FILTER with CONTAINS for fuzzy matching
sparql = f"""
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?ghcid WHERE {{
?s hc:ghcid ?ghcid .
{{
?s skos:prefLabel ?name .
FILTER(CONTAINS(LCASE(?name), LCASE("{entity}")))
}} UNION {{
?s schema:addressLocality ?city .
FILTER(CONTAINS(LCASE(?city), LCASE("{entity}")))
}} UNION {{
?s hc:ghcid ?ghcid .
FILTER(CONTAINS(?ghcid, "{entity.upper()}"))
}}
}}
LIMIT 50
"""
results = self._execute_sparql(sparql)
ghcids.update(r["ghcid"] for r in results)
return ghcids
def match_relations_semantically(
self,
relations: list[str],
ghcid_filter: Optional[set[str]] = None
) -> list[dict]:
"""
Semantic search for relation descriptions in vector store.
Optionally filter by GHCID set from entity matching.
"""
# Combine relation phrases into search query
relation_query = " ".join(relations)
# Build filter
qdrant_filter = None
if ghcid_filter:
qdrant_filter = models.Filter(
must=[
models.FieldCondition(
key="ghcid",
match=models.MatchAny(any=list(ghcid_filter))
)
]
)
# Vector search
results = self.qdrant.search(
collection_name="heritage_chunks",
query_vector=self._embed(relation_query),
query_filter=qdrant_filter,
limit=20
)
return [
{
"ghcid": r.payload.get("ghcid"),
"text": r.payload.get("text"),
"score": r.score
}
for r in results
]
def forward(self, query: str) -> dspy.Prediction:
"""
Dual-level retrieval: entities narrow search, relations refine results.
"""
# Extract dual levels
extraction = self.extractor(query=query)
if extraction.search_strategy == "entity_first":
# Step 1: Entity matching in graph
ghcid_set = self.match_entities_in_graph(extraction.entities)
# Step 2: Relation matching with GHCID filter
results = self.match_relations_semantically(
extraction.relations,
ghcid_filter=ghcid_set if ghcid_set else None
)
elif extraction.search_strategy == "relation_first":
# Step 1: Broad relation matching
results = self.match_relations_semantically(extraction.relations)
# Step 2: Filter by entity matching
result_ghcids = {r["ghcid"] for r in results if r.get("ghcid")}
entity_ghcids = self.match_entities_in_graph(extraction.entities)
# Prioritize intersection
intersection = result_ghcids & entity_ghcids
if intersection:
results = [r for r in results if r.get("ghcid") in intersection]
else: # parallel
# Run both in parallel, merge results
ghcid_set = self.match_entities_in_graph(extraction.entities)
semantic_results = self.match_relations_semantically(extraction.relations)
# Score boost for results matching both
for r in semantic_results:
if r.get("ghcid") in ghcid_set:
r["score"] *= 1.5 # Boost intersection
results = sorted(semantic_results, key=lambda x: -x["score"])
return dspy.Prediction(
results=results,
entities=extraction.entities,
relations=extraction.relations,
strategy=extraction.search_strategy,
ghcid_set=list(ghcid_set) if 'ghcid_set' in locals() else []
)
```
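The "parallel" branch's merge step is worth isolating: results confirmed by both retrieval levels get a multiplicative boost before re-ranking. A minimal sketch (GHCIDs and scores are illustrative only):

```python
# Sketch of the parallel-strategy merge: boost the entity/semantic intersection, re-rank.
def merge_parallel(semantic_results: list[dict], entity_ghcids: set[str],
                   boost: float = 1.5) -> list[dict]:
    merged = []
    for r in semantic_results:
        r = dict(r)  # copy so the caller's result list is not mutated
        if r.get("ghcid") in entity_ghcids:
            r["score"] *= boost  # matched by both retrieval levels
        merged.append(r)
    return sorted(merged, key=lambda x: -x["score"])

semantic = [
    {"ghcid": "NL-HlmNHA", "score": 0.60},  # also found by entity matching
    {"ghcid": "NL-AsdRM",  "score": 0.80},  # semantic-only match
]
ranked = merge_parallel(semantic, entity_ghcids={"NL-HlmNHA"})
print([r["ghcid"] for r in ranked])  # boosted intersection outranks the higher raw score
```

With a 1.5x boost, a 0.60 intersection hit overtakes a 0.80 semantic-only hit; tuning the boost trades entity precision against semantic recall.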
---
## Pattern C: Community Detection and Summaries
### Rationale
From Microsoft GraphRAG: Community summaries enable answering holistic questions like "What are the main archival themes in the Netherlands?"
### Implementation
Create new file `backend/rag/community_indexer.py`:
```python
"""
Community Detection and Summary Indexing for Global Search
Based on Microsoft GraphRAG (arxiv:2404.16130) community hierarchy pattern.
Uses Leiden algorithm for community detection on institution graph.
"""
import json
import logging
from dataclasses import dataclass
from typing import Optional
import dspy
import igraph as ig
import leidenalg
from qdrant_client import QdrantClient, models
logger = logging.getLogger(__name__)
@dataclass
class Community:
"""A community of related heritage institutions."""
community_id: str
ghcids: list[str]
summary: str
institution_count: int
dominant_type: str # Most common institution type
dominant_region: str # Most common region
themes: list[str] # Extracted themes
class CommunitySummarizer(dspy.Signature):
    """
    Generate a summary for a community of heritage institutions.
You are a heritage domain expert. Given a list of institutions in a community,
generate a concise summary describing:
1. What types of institutions are in this community
2. Geographic concentration (if any)
3. Common themes or specializations
4. Notable relationships between institutions
Keep the summary to 2-3 sentences. Focus on what makes this community distinctive.
"""
institutions: str = dspy.InputField(desc="JSON list of institution metadata")
summary: str = dspy.OutputField(desc="2-3 sentence community summary")
themes: list[str] = dspy.OutputField(desc="Key themes (3-5 keywords)")
notable_features: str = dspy.OutputField(desc="What makes this community distinctive")
class CommunityIndexer:
"""
Builds and indexes institution communities for global search.
Usage:
indexer = CommunityIndexer(oxigraph_url, qdrant_client)
indexer.build_communities()
indexer.index_summaries()
"""
def __init__(
self,
oxigraph_endpoint: str,
qdrant_client: QdrantClient,
collection_name: str = "heritage_communities"
):
self.oxigraph = oxigraph_endpoint
self.qdrant = qdrant_client
self.collection_name = collection_name
self.summarizer = dspy.ChainOfThought(CommunitySummarizer)
def build_institution_graph(self) -> ig.Graph:
"""
Query Oxigraph for institution relationships.
Build igraph for community detection.
"""
# Get all institutions with their properties
sparql = """
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
SELECT ?ghcid ?name ?type ?city ?region WHERE {
?s hc:ghcid ?ghcid ;
skos:prefLabel ?name ;
hc:institutionType ?type .
OPTIONAL { ?s schema:addressLocality ?city }
OPTIONAL { ?s hc:regionCode ?region }
}
"""
institutions = self._execute_sparql(sparql)
# Build graph: nodes are institutions, edges connect those sharing:
# - Same city
# - Same region
# - Same type
# - Part-of relationships
g = ig.Graph()
ghcid_to_idx = {}
# Add nodes
for inst in institutions:
idx = g.add_vertex(
ghcid=inst["ghcid"],
name=inst.get("name", ""),
type=inst.get("type", ""),
city=inst.get("city", ""),
region=inst.get("region", "")
)
ghcid_to_idx[inst["ghcid"]] = idx.index
# Add edges based on shared properties
for i, inst1 in enumerate(institutions):
for j, inst2 in enumerate(institutions[i+1:], i+1):
weight = 0
# Same city: strong connection
if inst1.get("city") and inst1["city"] == inst2.get("city"):
weight += 2
# Same region: medium connection
if inst1.get("region") and inst1["region"] == inst2.get("region"):
weight += 1
# Same type: weak connection
if inst1.get("type") and inst1["type"] == inst2.get("type"):
weight += 0.5
if weight > 0:
g.add_edge(
ghcid_to_idx[inst1["ghcid"]],
ghcid_to_idx[inst2["ghcid"]],
weight=weight
)
return g
def detect_communities(self, graph: ig.Graph) -> dict[str, list[str]]:
"""
Apply Leiden algorithm for community detection.
Returns mapping: community_id -> [ghcid_list]
"""
# Leiden with modularity optimization
partition = leidenalg.find_partition(
graph,
leidenalg.ModularityVertexPartition,
weights="weight"
)
communities = {}
for comm_idx, members in enumerate(partition):
comm_id = f"comm_{comm_idx:04d}"
ghcids = [graph.vs[idx]["ghcid"] for idx in members]
communities[comm_id] = ghcids
logger.info(f"Detected {len(communities)} communities")
return communities
def generate_community_summary(
self,
community_id: str,
ghcids: list[str]
) -> Community:
"""
Generate LLM summary for a community.
"""
# Fetch metadata for all institutions
institutions = self._fetch_institutions(ghcids)
# Generate summary
result = self.summarizer(
institutions=json.dumps(institutions, indent=2)
)
# Determine dominant type and region
types = [i.get("type", "") for i in institutions]
regions = [i.get("region", "") for i in institutions]
dominant_type = max(set(types), key=types.count) if types else ""
dominant_region = max(set(regions), key=regions.count) if regions else ""
return Community(
community_id=community_id,
ghcids=ghcids,
summary=result.summary,
institution_count=len(ghcids),
dominant_type=dominant_type,
dominant_region=dominant_region,
themes=result.themes
)
def index_summaries(self, communities: list[Community]) -> None:
"""
Store community summaries in Qdrant for global search.
"""
# Create collection if not exists
self.qdrant.recreate_collection(
collection_name=self.collection_name,
vectors_config=models.VectorParams(
size=384, # MiniLM embedding size
distance=models.Distance.COSINE
)
)
# Index each community
points = []
for comm in communities:
embedding = self._embed(comm.summary)
points.append(models.PointStruct(
id=int(comm.community_id.split("_")[-1]),  # stable numeric ID; hash() varies per process
vector=embedding,
payload={
"community_id": comm.community_id,
"summary": comm.summary,
"ghcids": comm.ghcids,
"institution_count": comm.institution_count,
"dominant_type": comm.dominant_type,
"dominant_region": comm.dominant_region,
"themes": comm.themes
}
))
self.qdrant.upsert(
collection_name=self.collection_name,
points=points
)
logger.info(f"Indexed {len(points)} community summaries")
def global_search(self, query: str, limit: int = 5) -> list[dict]:
"""
Search community summaries for holistic questions.
"""
embedding = self._embed(query)
results = self.qdrant.search(
collection_name=self.collection_name,
query_vector=embedding,
limit=limit
)
return [
{
"community_id": r.payload["community_id"],
"summary": r.payload["summary"],
"themes": r.payload["themes"],
"institution_count": r.payload["institution_count"],
"score": r.score
}
for r in results
]
def build_and_index(self) -> int:
"""
Full pipeline: build graph, detect communities, generate summaries, index.
Returns number of communities indexed.
"""
logger.info("Building institution graph...")
graph = self.build_institution_graph()
logger.info("Detecting communities...")
community_map = self.detect_communities(graph)
logger.info("Generating community summaries...")
communities = []
for comm_id, ghcids in community_map.items():
if len(ghcids) >= 3: # Only summarize communities with 3+ members
comm = self.generate_community_summary(comm_id, ghcids)
communities.append(comm)
logger.info(f"Indexing {len(communities)} community summaries...")
self.index_summaries(communities)
return len(communities)
```
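The edge weights in `build_institution_graph` (shared city +2, shared region +1, shared type +0.5) are what ultimately shape the Leiden partition, so they deserve a concrete check. A minimal sketch with toy records (the institutions and codes are illustrative):

```python
# Sketch of the pairwise edge-weighting rule used when building the institution graph.
def edge_weight(a: dict, b: dict) -> float:
    w = 0.0
    if a.get("city") and a["city"] == b.get("city"):
        w += 2      # same city: strong connection
    if a.get("region") and a["region"] == b.get("region"):
        w += 1      # same region: medium connection
    if a.get("type") and a["type"] == b.get("type"):
        w += 0.5    # same type: weak connection
    return w

rijks = {"city": "Amsterdam", "region": "NH", "type": "M"}
stedelijk = {"city": "Amsterdam", "region": "NH", "type": "M"}
nha = {"city": "Haarlem", "region": "NH", "type": "A"}

print(edge_weight(rijks, stedelijk))  # 3.5: same city, region, and type
print(edge_weight(rijks, nha))        # 1.0: same region only
```

Because city dominates the weighting, Leiden will tend to produce geographically coherent communities first, with type acting as a tie-breaker; inverting the weights would instead cluster by institution type.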
---
## Pattern D: Temporal Query Templates
### Rationale
From Zep: Bitemporal modeling enables point-in-time queries and provenance tracking.
### Implementation
Add to `template_sparql.py`:
```python
# =============================================================================
# TEMPORAL QUERY TEMPLATES (Zep Pattern)
# =============================================================================
TEMPORAL_QUERY_TEMPLATES = {
"point_in_time_state": TemplateDefinition(
id="temporal_pit",
name="Point-in-Time Institution State",
description="Get institution state at a specific point in time",
intent_patterns=["what was", "in [year]", "before", "after", "at that time"],
sparql_template="""
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ghcid ?name ?type ?city ?validFrom ?validTo WHERE {
?s hc:ghcid ?ghcid ;
skos:prefLabel ?name ;
hc:institutionType ?type ;
hc:validFrom ?validFrom .
OPTIONAL { ?s schema:addressLocality ?city }
OPTIONAL { ?s hc:validTo ?validTo }
# Temporal filter: valid at query date
FILTER(?validFrom <= "{{ query_date }}"^^xsd:date)
FILTER(!BOUND(?validTo) || ?validTo > "{{ query_date }}"^^xsd:date)
{% if ghcid_filter %}
FILTER(STRSTARTS(?ghcid, "{{ ghcid_filter }}"))
{% endif %}
}
ORDER BY ?ghcid
""",
slots=[
SlotDefinition(type=SlotType.STRING, name="query_date", required=True),
SlotDefinition(type=SlotType.STRING, name="ghcid_filter", required=False)
]
),
"institution_history": TemplateDefinition(
id="temporal_history",
name="Institution Change History",
description="Get full history of changes for an institution",
intent_patterns=["history of", "changes to", "evolution of", "timeline"],
sparql_template="""
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?ghcid ?name ?validFrom ?validTo ?changeType ?description WHERE {
?entry hc:ghcid "{{ ghcid }}" ;
skos:prefLabel ?name ;
hc:validFrom ?validFrom .
OPTIONAL { ?entry hc:validTo ?validTo }
OPTIONAL { ?entry hc:changeType ?changeType }
OPTIONAL { ?entry hc:changeDescription ?description }
}
ORDER BY ?validFrom
""",
slots=[
SlotDefinition(type=SlotType.STRING, name="ghcid", required=True)
]
),
"institutions_founded_before": TemplateDefinition(
id="temporal_founded_before",
name="Institutions Founded Before Date",
description="Find institutions founded before a specific date",
intent_patterns=["founded before", "established before", "older than", "before [year]"],
sparql_template="""
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ghcid ?name ?type ?city ?foundingDate WHERE {
?s hc:ghcid ?ghcid ;
skos:prefLabel ?name ;
hc:institutionType ?type ;
schema:foundingDate ?foundingDate .
OPTIONAL { ?s schema:addressLocality ?city }
FILTER(?foundingDate < "{{ cutoff_date }}"^^xsd:date)
{% if institution_type %}
FILTER(?type = "{{ institution_type }}")
{% endif %}
}
ORDER BY ?foundingDate
LIMIT {{ limit | default(50) }}
""",
slots=[
SlotDefinition(type=SlotType.STRING, name="cutoff_date", required=True),
SlotDefinition(type=SlotType.INSTITUTION_TYPE, name="institution_type", required=False),
SlotDefinition(type=SlotType.INTEGER, name="limit", required=False, default="50")
]
),
"merger_history": TemplateDefinition(
id="temporal_mergers",
name="Institution Merger History",
description="Find institutions that merged or were absorbed",
intent_patterns=["merged", "merger", "combined", "absorbed", "joined"],
sparql_template="""
PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?event ?eventDate ?description
?sourceGhcid ?sourceName
?targetGhcid ?targetName WHERE {
?event a hc:MergerEvent ;
hc:eventDate ?eventDate ;
hc:description ?description .
OPTIONAL {
?event hc:sourceInstitution ?source .
?source hc:ghcid ?sourceGhcid ;
skos:prefLabel ?sourceName .
}
OPTIONAL {
?event hc:resultingInstitution ?target .
?target hc:ghcid ?targetGhcid ;
skos:prefLabel ?targetName .
}
{% if region_filter %}
FILTER(STRSTARTS(?sourceGhcid, "{{ region_filter }}") ||
STRSTARTS(?targetGhcid, "{{ region_filter }}"))
{% endif %}
}
ORDER BY ?eventDate
""",
slots=[
SlotDefinition(type=SlotType.STRING, name="region_filter", required=False)
]
)
}
```
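The `sparql_template` strings above are Jinja2 sources: `{{ slot }}` substitution plus `{% if %}` guards and filters like `| default(50)`. As an illustration of the substitution step only, here is a stdlib stand-in that handles the plain `{{ name }}` case; real rendering should go through a Jinja2 `Environment`, which the conditional blocks and filters require:

```python
import re

def fill_slots(template: str, slots: dict) -> str:
    # Substitute {{ name }} placeholders; raises KeyError on a missing slot.
    # Deliberately ignores {% ... %} tags and filters such as "| default(50)",
    # which a Jinja2 Environment would evaluate.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(slots[m.group(1)]), template)

line = fill_slots('FILTER(?validFrom <= "{{ query_date }}"^^xsd:date)',
                  {"query_date": "1900-01-01"})
```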
---
## Integration Checklist
### Immediate Actions
- [ ] Add `ArgumentVerifier` signature to `dspy_heritage_rag.py`
- [ ] Add `DualLevelEntityExtractor` signature
- [ ] Integrate verification into retrieval pipeline
- [ ] Add temporal query templates to `template_sparql.py`
### Short-Term Actions
- [ ] Create `backend/rag/community_indexer.py`
- [ ] Add Leiden algorithm dependency: `pip install leidenalg python-igraph`
- [ ] Create Qdrant collection for community summaries
- [ ] Add global search mode to RAG pipeline
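`leidenalg` partitions an `igraph` graph by optimizing a quality function such as modularity. As a dependency-free sketch of the shape of output `detect_communities` would produce, here is the degenerate case, connected components, with the real Leiden call noted in a comment (all names here are assumptions):

```python
from collections import defaultdict

def components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group nodes into connected components -- the crudest 'community map'.
    The real pipeline would instead do roughly:
        import leidenalg
        part = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
    which can split a single component into several dense communities."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: set[str] = set()
    result: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative DFS over the undirected adjacency
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        result.append(comp)
    return result
```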
### Testing
```bash
# Test verification layer
python -c "
from backend.rag.dspy_heritage_rag import ArgumentVerifier
import dspy
dspy.configure(lm=...)
verifier = dspy.ChainOfThought(ArgumentVerifier)
result = verifier(
query='How many archives are in Haarlem?',
context='Haarlem has several heritage institutions including archives.'
)
print(f'Can answer: {result.can_answer}')
print(f'Missing: {result.missing_info}')
"
# Test dual-level extraction
python -c "
from backend.rag.dspy_heritage_rag import DualLevelEntityExtractor
import dspy
dspy.configure(lm=...)
extractor = dspy.ChainOfThought(DualLevelEntityExtractor)
result = extractor(query='Which archives in Haarlem have digitized medieval manuscripts?')
print(f'Entities: {result.entities}')
print(f'Relations: {result.relations}')
print(f'Strategy: {result.search_strategy}')
"
```

File diff suppressed because it is too large


@@ -164,10 +164,10 @@ imports:
- modules/slots/managing_unit
- modules/slots/managed_collections
# Enums (12 files - added CustodianPrimaryTypeEnum + EncompassingBodyTypeEnum)
# Enums (11 files - CustodianPrimaryTypeEnum ARCHIVED per Rule 9: Enum-to-Class Promotion)
# See: schemas/20251121/linkml/archive/enums/CustodianPrimaryTypeEnum.yaml.archived_20260105
- modules/enums/AgentTypeEnum
- modules/enums/AppellationTypeEnum
- modules/enums/CustodianPrimaryTypeEnum
- modules/enums/EncompassingBodyTypeEnum
- modules/enums/EntityTypeEnum
- modules/enums/LegalStatusEnum
@@ -423,6 +423,102 @@ imports:
- modules/classes/WebArchive
- modules/classes/WomensArchives
# Archive RecordSetTypes - concrete subclasses of rico:RecordSetType (v0.9.12)
# These define the types of record sets held by each archive type
# Updated: all 92 archive types now have RecordSetTypes files
- modules/classes/AcademicArchiveRecordSetTypes
- modules/classes/AdvertisingRadioArchiveRecordSetTypes
- modules/classes/AnimalSoundArchiveRecordSetTypes
- modules/classes/ArchitecturalArchiveRecordSetTypes
- modules/classes/ArchiveOfInternationalOrganizationRecordSetTypes
- modules/classes/ArchivesForBuildingRecordsRecordSetTypes
- modules/classes/ArchivesRegionalesRecordSetTypes
- modules/classes/ArtArchiveRecordSetTypes
- modules/classes/AudiovisualArchiveRecordSetTypes
- modules/classes/BankArchiveRecordSetTypes
- modules/classes/CantonalArchiveRecordSetTypes
- modules/classes/CathedralArchiveRecordSetTypes
- modules/classes/ChurchArchiveRecordSetTypes
- modules/classes/ChurchArchiveSwedenRecordSetTypes
- modules/classes/ClimateArchiveRecordSetTypes
- modules/classes/CollectingArchivesRecordSetTypes
- modules/classes/ComarcalArchiveRecordSetTypes
- modules/classes/CommunityArchiveRecordSetTypes
- modules/classes/CompanyArchiveRecordSetTypes
- modules/classes/CurrentArchiveRecordSetTypes
- modules/classes/CustodianArchiveRecordSetTypes
- modules/classes/DarkArchiveRecordSetTypes
- modules/classes/DepartmentalArchivesRecordSetTypes
- modules/classes/DepositArchiveRecordSetTypes
- modules/classes/DigitalArchiveRecordSetTypes
- modules/classes/DimArchivesRecordSetTypes
- modules/classes/DiocesanArchiveRecordSetTypes
- modules/classes/DistrictArchiveGermanyRecordSetTypes
- modules/classes/DistritalArchiveRecordSetTypes
- modules/classes/EconomicArchiveRecordSetTypes
- modules/classes/FilmArchiveRecordSetTypes
- modules/classes/FoundationArchiveRecordSetTypes
- modules/classes/FreeArchiveRecordSetTypes
- modules/classes/FrenchPrivateArchivesRecordSetTypes
- modules/classes/GovernmentArchiveRecordSetTypes
- modules/classes/HistoricalArchiveRecordSetTypes
- modules/classes/HospitalArchiveRecordSetTypes
- modules/classes/HouseArchiveRecordSetTypes
- modules/classes/IconographicArchivesRecordSetTypes
- modules/classes/InstitutionalArchiveRecordSetTypes
- modules/classes/JointArchivesRecordSetTypes
- modules/classes/LGBTArchiveRecordSetTypes
- modules/classes/LightArchivesRecordSetTypes
- modules/classes/LiteraryArchiveRecordSetTypes
- modules/classes/LocalGovernmentArchiveRecordSetTypes
- modules/classes/LocalHistoryArchiveRecordSetTypes
- modules/classes/MailingListArchiveRecordSetTypes
- modules/classes/MediaArchiveRecordSetTypes
- modules/classes/MilitaryArchiveRecordSetTypes
- modules/classes/MonasteryArchiveRecordSetTypes
- modules/classes/MunicipalArchiveRecordSetTypes
- modules/classes/MuseumArchiveRecordSetTypes
- modules/classes/MusicArchiveRecordSetTypes
- modules/classes/NationalArchivesRecordSetTypes
- modules/classes/NewspaperClippingsArchiveRecordSetTypes
- modules/classes/NobilityArchiveRecordSetTypes
- modules/classes/NotarialArchiveRecordSetTypes
- modules/classes/OnlineNewsArchiveRecordSetTypes
- modules/classes/ParishArchiveRecordSetTypes
- modules/classes/ParliamentaryArchivesRecordSetTypes
- modules/classes/PartyArchiveRecordSetTypes
- modules/classes/PerformingArtsArchiveRecordSetTypes
- modules/classes/PhotoArchiveRecordSetTypes
- modules/classes/PoliticalArchiveRecordSetTypes
- modules/classes/PostcustodialArchiveRecordSetTypes
- modules/classes/PressArchiveRecordSetTypes
- modules/classes/ProvincialArchiveRecordSetTypes
- modules/classes/ProvincialHistoricalArchiveRecordSetTypes
- modules/classes/PublicArchiveRecordSetTypes
- modules/classes/PublicArchivesInFranceRecordSetTypes
- modules/classes/RadioArchiveRecordSetTypes
- modules/classes/RegionalArchiveRecordSetTypes
- modules/classes/RegionalArchivesInIcelandRecordSetTypes
- modules/classes/RegionalEconomicArchiveRecordSetTypes
- modules/classes/RegionalStateArchivesRecordSetTypes
- modules/classes/ReligiousArchiveRecordSetTypes
- modules/classes/SchoolArchiveRecordSetTypes
- modules/classes/ScientificArchiveRecordSetTypes
- modules/classes/SectorOfArchivesInSwedenRecordSetTypes
- modules/classes/SecurityArchivesRecordSetTypes
- modules/classes/SoundArchiveRecordSetTypes
- modules/classes/SpecializedArchiveRecordSetTypes
- modules/classes/SpecializedArchivesCzechiaRecordSetTypes
- modules/classes/StateArchivesRecordSetTypes
- modules/classes/StateArchivesSectionRecordSetTypes
- modules/classes/StateDistrictArchiveRecordSetTypes
- modules/classes/StateRegionalArchiveCzechiaRecordSetTypes
- modules/classes/TelevisionArchiveRecordSetTypes
- modules/classes/TradeUnionArchiveRecordSetTypes
- modules/classes/UniversityArchiveRecordSetTypes
- modules/classes/WebArchiveRecordSetTypes
- modules/classes/WomensArchivesRecordSetTypes
# New slots for registration info
- modules/slots/country
- modules/slots/description
@@ -468,6 +564,9 @@ imports:
- modules/slots/is_legal_status_of
- modules/slots/has_derived_observation
- modules/slots/offers_donation_schemes
# Rico:isOrWasHolderOf relationship slot (links custodians to record set types)
- modules/slots/holds_record_set_types
comments:
- "HYPER-MODULAR STRUCTURE: Direct imports of all component files"
@@ -476,7 +575,7 @@ comments:
- "Namespace structure: https://nde.nl/ontology/hc/{class|enum|slot}/[Name]"
- "Total components: 44 classes + 12 enums + 102 slots = 158 definition files"
- "Legal entity classes (5): LegalEntityType, LegalForm, LegalName, RegistrationInfo (4 classes within), total 8 classes"
- "Type classification: CustodianType (base) + specialized subclasses (ArchiveOrganizationType, MuseumType, LibraryType, GalleryType, ResearchOrganizationType, OfficialInstitutionType, BioCustodianType, EducationProviderType) + CustodianPrimaryTypeEnum (19 types)"
- "Type classification: CustodianType (base) + 19 specialized subclasses (ArchiveOrganizationType, MuseumType, LibraryType, GalleryType, ResearchOrganizationType, OfficialInstitutionType, BioCustodianType, EducationProviderType, HeritageSocietyType, FeatureCustodianType, IntangibleHeritageGroupType, PersonalCollectionType, HolySacredSiteType, DigitalPlatformType, NonProfitType, TasteScentHeritageType, CommercialOrganizationType, MixedCustodianType, UnspecifiedType)"
- "Specialized types: ArchiveOrganizationType (144 Wikidata), MuseumType (187), LibraryType (60), GalleryType (78), ResearchOrganizationType (44), OfficialInstitutionType (50+), BioCustodianType (1,393 Wikidata), EducationProviderType (60+ Wikidata) with domain-specific slots"
- "Collection aspect: CustodianCollection with 10 collection-specific slots (added managing_unit in v0.7.0, managed_by_cms in v0.8.9)"
- "Organizational aspect: OrganizationalStructure with 7 unit-specific slots (staff_members, managed_collections)"


@@ -0,0 +1,922 @@
id: https://nde.nl/ontology/hc/enum/ArchiveTypeEnum
name: ArchiveTypeEnum
title: Archive Type Classification
description: 'Types of archives extracted from Wikidata hyponyms of Q166118 (archive).
Generated: 2025-12-01T16:01:19Z
Total values: 144'
enums:
ArchiveTypeEnum:
permissible_values:
ACADEMIC_ARCHIVE:
description: archive of a higher education institution
meaning: wikidata:Q27032435
comments:
- Hochschularchiv (de)
- archivo académico (es)
- archives académiques (fr)
ADVERTISING_RADIO_ARCHIVE:
description: sound archive with advertising radio productions
meaning: wikidata:Q60658673
comments:
- Werbefunkarchiv (de)
- Archives radiophoniques publicitaires (fr)
- Archivio radio pubblicitaria (it)
ANIMAL_SOUND_ARCHIVE:
description: collection of animal sound recordings
meaning: wikidata:Q18574935
comments:
- Tierstimmenarchiv (de)
- Archives de voix d'animaux (fr)
- Archivio vocale degli animali (it)
ARCHITECTURAL_ARCHIVE:
description: archive that safeguards architectural heritage
meaning: wikidata:Q121409581
comments:
- Architekturarchiv (de)
- archives architecturales (fr)
- architectonisch archief (nl)
ARCHIVAL_LIBRARY:
description: library of an archive
meaning: wikidata:Q25504402
comments:
- Archivbibliothek (de)
- biblioteca de archivo (es)
- bibliothèque liée à une institution conservant des archives (fr)
ARCHIVAL_REPOSITORY:
description: digital repository for archival purposes
meaning: wikidata:Q66656823
comments:
- Archivierungsstelle (de)
- repositorio (es)
ARCHIVE:
description: agency or institution responsible for the preservation and communication of records
selected for permanent preservation
meaning: wikidata:Q166118
comments:
- Archiv (de)
- archivo (es)
- archives (fr)
ARCHIVE_ASSOCIATION:
description: Booster, history and heritage societies for archival institutions
meaning: wikidata:Q130427366
comments:
- Archivverein (de)
- Association des amis des archives (fr)
ARCHIVE_NETWORK:
description: consortium among archives for co-operation
meaning: wikidata:Q96636857
comments:
- Archivverbund (de)
- rete di archivi (it)
ARCHIVE_OF_AN_INTERNATIONAL_ORGANIZATION:
description: archive of an inter-governmental organization or of an international umbrella organization
meaning: wikidata:Q27031014
comments:
- Archiv einer internationalen Organisation (de)
- archives d'une organisation internationale (fr)
ARCHIVES_FOR_BUILDING_RECORDS:
description: Public archives for building records or construction documents
meaning: wikidata:Q136027937
comments:
- Bauaktenarchiv (de)
ARCHIVES_RÉGIONALES:
description: archives régionales (Q2860567)
meaning: wikidata:Q2860567
comments:
- Regionsarchiv (Frankreich) (de)
- archives régionales (fr)
ART_ARCHIVE:
description: specialized archive
meaning: wikidata:Q27032254
comments:
- Kunstarchiv (de)
- archivo de arte (es)
- archives artistiques (fr)
ASSOCIATION_ARCHIVE:
description: association archive (Q27030820)
meaning: wikidata:Q27030820
comments:
- Verbandsarchiv (de)
- archivo de asociación (es)
- archives associatives (fr)
AUDIOVISUAL_ARCHIVE:
description: archive that contains audio-visual materials
meaning: wikidata:Q27030766
comments:
- audio-visuelles Archiv (de)
- archivo audiovisual (es)
- archive audiovisuelle (fr)
BANK_ARCHIVE:
description: bank archive (Q52718263)
meaning: wikidata:Q52718263
comments:
- Bankarchiv (de)
- archivo bancario (es)
- archives bancaires (fr)
BILDSTELLE:
description: German institutions that build and manage collections of visual media for teaching
and research
meaning: wikidata:Q861125
comments:
- Bildstelle (de)
BRANCH:
description: local subdivision of an organization
meaning: wikidata:Q232846
comments:
- Zweigniederlassung (de)
- branche (fr)
BRANCH_OFFICE:
description: outlet of an organization or company that, unlike a subsidiary, does not constitute a separate legal entity, while being physically separated from the organization's main office
meaning: wikidata:Q1880737
comments:
- Filiale (de)
- sucursal (es)
- succursale (fr)
CANTONAL_ARCHIVE:
description: state archives of one of the cantons of Switzerland
meaning: wikidata:Q2860410
comments:
- Kantonsarchiv (de)
- archivo cantonal (es)
- archives cantonales (fr)
CAST_COLLECTION:
description: art-historical or archeological collection, usually for education, where copies,
usually of gypsum, of art works are collected and shown
meaning: wikidata:Q29380643
comments:
- Abgusssammlung (de)
- Afgietsel verzameling (nl)
CATHEDRAL_ARCHIVE:
description: cathedral archive (Q132201761)
meaning: wikidata:Q132201761
comments:
- archivo catedralicio (es)
CHURCH_ARCHIVE:
description: archive for church books about a parish
meaning: wikidata:Q64166606
comments:
- Kirchenarchiv (Schweden) (de)
- archives paroissiales (fr)
- kerkarchief (nl)
CHURCH_ARCHIVE_1:
description: archive kept by a church or ecclesiastical organisation
meaning: wikidata:Q2877653
comments:
- Kirchenarchiv (de)
- archivo eclesiástico (es)
- archives ecclésiastiques (fr)
CINEMATHEQUE:
description: organisation responsible for preserving and restoring cinematographic heritage
meaning: wikidata:Q1352795
comments:
- Kinemathek (de)
- filmoteca (es)
- cinémathèque (fr)
CLIMATE_ARCHIVE:
description: archive that provides information about the climatic past
meaning: wikidata:Q1676725
comments:
- Klimaarchiv (de)
CLOSED_SPACE:
description: an abstract space with borders
meaning: wikidata:Q78642244
comments:
- geschlossener Raum (de)
- espacio cerrado (es)
- spazio chiuso (it)
COLLECTING_ARCHIVES:
description: archive that collects materials from multiple sources
meaning: wikidata:Q117246276
COMARCAL_ARCHIVE:
description: comarcal archive (Q21086734)
meaning: wikidata:Q21086734
comments:
- Bezirksarchiv (Katalonien) (de)
- archivo comarcal (es)
COMMUNITY_ARCHIVE:
description: archive created by individuals and community groups who desire to document their
cultural heritage
meaning: wikidata:Q25105971
comments:
- Gemeinschaftsarchiv (de)
- archivo comunitario (es)
- archives communautaires (fr)
COMPANY_ARCHIVES:
description: organizational entity that keeps or archives fonds of a company
meaning: wikidata:Q10605195
comments:
- Unternehmensarchiv (de)
- archivo empresarial (es)
- archives d'entreprise (fr)
CONSERVATÓRIA:
description: Conservatória (Q9854379)
meaning: wikidata:Q9854379
COUNTY_RECORD_OFFICE:
description: local authority repository
meaning: wikidata:Q5177943
comments:
- archivio pubblico territoriale (it)
COURT_RECORDS:
description: court records (Q11906844)
meaning: wikidata:Q11906844
comments:
- Justizarchiv (de)
- archivo judicial (es)
- archives judiciaires (fr)
CULTURAL_INSTITUTION:
description: organization that works for the preservation or promotion of culture
meaning: wikidata:Q3152824
comments:
- kulturelle Organisation (de)
- institución cultural (es)
- institution culturelle (fr)
CURRENT_ARCHIVE:
description: type of archive
meaning: wikidata:Q3621648
comments:
- archivo corriente (es)
- archive courante (fr)
- archivio corrente (it)
DARK_ARCHIVE:
description: collection of materials preserved for future use but with no current access
meaning: wikidata:Q112796578
comments:
- Dark Archive (de)
DEPARTMENT:
description: office within an organization
meaning: wikidata:Q2366457
comments:
- Abteilung (de)
- departamento (es)
- service (fr)
DEPARTMENTAL_ARCHIVES:
description: departmental archives in France
meaning: wikidata:Q2860456
comments:
- Département-Archiv (de)
- archivos departamentales (es)
- archives départementales (fr)
DEPOSIT_ARCHIVE:
description: part of an archive
meaning: wikidata:Q244904
comments:
- Zwischenarchiv (de)
- archivo de depósito (es)
- archive intermédiaire (fr)
DIGITAL_ARCHIVE:
description: information system whose aim is to collect different digital resources and to make
them available to a defined group of users
meaning: wikidata:Q1224984
comments:
- digitales Archiv (de)
- archivo digital (es)
- archives numériques (fr)
DIM_ARCHIVES:
description: archive with only limited access
meaning: wikidata:Q112796779
comments:
- Dim Archive (de)
DIOCESAN_ARCHIVE:
description: archive of a bishopric
meaning: wikidata:Q11906839
comments:
- Bischöfliches Archiv (de)
- archivo diocesano (es)
- archives diocésaines (fr)
DISTRICT_ARCHIVE_GERMANY:
description: Archive type in Germany
meaning: wikidata:Q130757255
comments:
- Kreisarchiv (de)
DISTRITAL_ARCHIVE:
description: distrital archives in Portugal
meaning: wikidata:Q10296259
comments:
- Bezirksarchiv (Portugal) (de)
DIVISION:
description: distinct and large part of an organization
meaning: wikidata:Q334453
comments:
- Abteilung (de)
- división (es)
- division (fr)
DOCUMENTATION_CENTRE:
description: organisation that deals with documentation
meaning: wikidata:Q2945282
comments:
- Dokumentationszentrum (de)
- centro de documentación (es)
- centre de documentation (fr)
ECONOMIC_ARCHIVE:
description: archive documenting the economic history of a country, region etc.
meaning: wikidata:Q27032167
comments:
- Wirtschaftsarchiv (de)
- archivo económico (es)
- archives économiques (fr)
FILM_ARCHIVE:
description: archive that safeguards film heritage
meaning: wikidata:Q726929
comments:
- Filmarchiv (de)
- archivo fílmico (es)
- archives cinématographiques (fr)
FOUNDATION_ARCHIVE:
description: foundation archive (Q27030827)
meaning: wikidata:Q27030827
comments:
- Stiftungsarchiv (de)
- archivo de fundación (es)
FREE_ARCHIVE:
description: Archive that preserves documents on the history of a social movement
meaning: wikidata:Q635801
comments:
- freies Archiv (de)
- archivio libero (it)
FRENCH_PRIVATE_ARCHIVES:
description: non-public archives in France
meaning: wikidata:Q2860565
comments:
- Privatarchiv (Frankreich) (de)
- archives privées en France (fr)
FYLKESARKIV:
description: fylkesarkiv (Q15119463)
meaning: wikidata:Q15119463
FÖREMÅLSARKIV:
description: Föremålsarkiv (Q10501208)
meaning: wikidata:Q10501208
GLAM:
description: acronym for "galleries, libraries, archives, and museums" that refers to cultural
institutions that have access to knowledge as their mission
meaning: wikidata:Q1030034
comments:
- GLAM (de)
- GLAM (es)
- GLAM (fr)
GOVERNMENT_ARCHIVE:
description: official archive of a government
meaning: wikidata:Q119712417
comments:
- Staatsarchiv (de)
- archivos gubernamentales (es)
- archives gouvernementales (fr)
HISTORICAL_ARCHIVE:
description: historical archive (Q3621673)
meaning: wikidata:Q3621673
comments:
- Historisches Archiv (de)
- archivo histórico (es)
- archive historique (fr)
HOSPITAL_ARCHIVE:
description: hospital archive (Q17301917)
meaning: wikidata:Q17301917
comments:
- Krankenhausarchiv (de)
- archivo hospitalario (es)
- archives hospitalières (fr)
HOUSE_ARCHIVE:
description: archive containing documents and letters that concern a family
meaning: wikidata:Q4344572
comments:
- Familienarchiv (de)
- archivo familiar (es)
- archives familiales (fr)
ICONOGRAPHIC_ARCHIVES:
description: archives containing predominantly pictorial materials
meaning: wikidata:Q117810712
INSTITUTION:
description: structure or mechanism of social order and cooperation governing the behaviour of
a set of individuals within a given community
meaning: wikidata:Q178706
comments:
- Institution (de)
- institución (es)
- institution sociale (fr)
INSTITUTIONAL_ARCHIVE:
description: repository that holds records created or received by its parent institution
meaning: wikidata:Q124762372
comments:
- Institutionsarchiv (de)
- archivo institucional (es)
INSTITUTIONAL_REPOSITORY:
description: archive of publications by an institution's staff
meaning: wikidata:Q1065413
comments:
- Instituts-Repository (de)
- repositorio institucional (es)
- dépôt institutionnel (fr)
JOINT_ARCHIVES:
description: archive containing records of two or more entities
meaning: wikidata:Q117442301
comments:
- Gemeinsames Archiv (de)
KUSTODIE:
description: Archives and administration of art collections in higher educational institutions
meaning: wikidata:Q58482422
comments:
- Kustodie (de)
LANDSARKIV:
description: Landsarkiv (Q16324008)
meaning: wikidata:Q16324008
comments:
- Landesarchiv (de)
LGBT_ARCHIVE:
description: archive related to LGBT topics
meaning: wikidata:Q61710689
comments:
- LGBT-Archiv (de)
- archivo LGBT (es)
- archives LGBT (fr)
LIGHT_ARCHIVES:
description: repository whose holdings are broadly accessible
meaning: wikidata:Q112815447
comments:
- Light Archive (de)
LITERARY_ARCHIVE:
description: archive for literary works
meaning: wikidata:Q28607652
comments:
- Literaturarchiv (de)
- archivo literario (es)
- archives littéraires (fr)
LOCAL_GOVERNMENT_ARCHIVE:
description: archive of records belonging to a local government
meaning: wikidata:Q118281267
comments:
- Kommunalarchiv (de)
LOCAL_HERITAGE_INSTITUTION_IN_SWEDEN:
description: a Swedish type of local history and cultural heritage museums
meaning: wikidata:Q10520688
comments:
- Heimatmuseen in Schweden (de)
- Hembygdsgård (nl)
LOCAL_HISTORY_ARCHIVE:
description: archive dealing with local history
meaning: wikidata:Q12324798
comments:
- Lokalarchiv (de)
- archivo de historia local (es)
- archives d'histoire locale (fr)
LOCATION_LIBRARY:
description: collection of visual and reference information about locations or places that might be used for filming or photography
meaning: wikidata:Q6664811
comments:
- biblioteca de localizaciones (es)
MAILING_LIST_ARCHIVE:
description: mailing list archive (Q104018626)
meaning: wikidata:Q104018626
comments:
- Archiv der Mailingliste (de)
- archive de la liste de diffusion (fr)
- archief van mailinglijst (nl)
MEDIA_ARCHIVE:
description: media archive (Q116809817)
meaning: wikidata:Q116809817
comments:
- Medienarchiv (de)
- archives de médias (fr)
- media-achief (nl)
MEDIENZENTRUM:
description: Medienzentrum (Q1284615)
meaning: wikidata:Q1284615
comments:
- Medienzentrum (de)
MEMORY_INSTITUTION:
description: institution which has curatorial care over a collection and whose mission it is to
preserve the collection for future generations
meaning: wikidata:Q1497649
comments:
- Gedächtnisinstitution (de)
- institución del patrimonio (es)
- institution patrimoniale (fr)
MILITARY_ARCHIVE:
description: archive for documents regarding military topics
meaning: wikidata:Q1934883
comments:
- Militärarchiv (de)
- archivo militar (es)
- archive militaire (fr)
MONASTERY_ARCHIVE:
description: archive of a monastery
meaning: wikidata:Q27030561
comments:
- Klosterarchiv (de)
- archivo monástico (es)
MUNICIPAL_ARCHIVE:
description: accumulation of historical records of a town or city
meaning: wikidata:Q604177
comments:
- Stadt- oder Gemeindearchiv (de)
- archivo municipal (es)
- archives communales (fr)
MUSEUM_ARCHIVE:
description: archive established by a museum to collect, organize, preserve, and provide access
to its organizational records
meaning: wikidata:Q53566456
comments:
- Museumsarchiv (de)
- archivo de museo (es)
- museumarchief (nl)
MUSIC_ARCHIVE:
description: archive of musical recordings and documents
meaning: wikidata:Q53759838
comments:
- Musikarchiv (de)
- archivo musical (es)
- archives musicales (fr)
NACHLASS:
description: collection of manuscripts, notes, correspondence, and so on left behind when a scholar
or an artist dies
meaning: wikidata:Q3827332
comments:
- Nachlass (de)
- Nachlass (es)
- archives formées du legs (fr)
NATIONAL_ARCHIVES:
description: archives of a country
meaning: wikidata:Q2122214
comments:
- Nationalarchiv (de)
- archivo nacional (es)
- archives nationales (fr)
NATIONAL_TREASURE:
description: treasure or artifact that is regarded as emblematic of a nation's cultural heritage, identity or significance
meaning: wikidata:Q60606520
NATIONAL_TREASURE_OF_FRANCE:
description: designation for entities of cultural significance in France
meaning: wikidata:Q2986426
comments:
- trésor national (fr)
NEWSPAPER_CLIPPINGS_ARCHIVE:
description: archive of press clippings, organized by topics
meaning: wikidata:Q65651503
comments:
- Zeitungsausschnittsarchiv (de)
- archivo de recortes de periódicos (es)
- tijdschriftenknipselarchief (nl)
NOBILITY_ARCHIVE:
description: collection of historical documents and information about members of the nobility
meaning: wikidata:Q355358
comments:
- Adelsarchiv (de)
- archivo nobiliario (es)
- archive de noblesse (fr)
NOTARIAL_ARCHIVE:
description: type of archive housing notarial records
meaning: wikidata:Q8203685
comments:
- Notariatsarchiv (de)
- archivo notarial (es)
- archives notariales (fr)
ONLINE_NEWS_ARCHIVE:
description: archive of newspapers, magazines and other periodicals that can be consulted online
meaning: wikidata:Q2001867
comments:
- Zeitungsbank (de)
- archivo de periódicos (es)
- archives de journaux (fr)
ORGANIZATION:
description: social entity established to meet needs or pursue goals
meaning: wikidata:Q43229
comments:
- Organisation (de)
- organización (es)
- organisation (fr)
ORGANIZATIONAL_SUBDIVISION:
description: organization that is a part of a larger organization
meaning: wikidata:Q9261468
comments:
- Untereinheit (de)
- subdivisión organizacional (es)
- sous-division organisationnelle (fr)
PARENT_ORGANIZATIONUNIT:
description: organization that has a subsidiary unit, e.g. for companies, which owns enough voting
stock in another firm to control management and operations
meaning: wikidata:Q1956113
comments:
- Mutterunternehmen (de)
- organización matriz (es)
- société mère (fr)
PARISH_ARCHIVE:
description: parish archive (Q34544468)
meaning: wikidata:Q34544468
comments:
- Pfarrarchiv (de)
- archivo parroquial (es)
- archivio parrocchiale (it)
PARLIAMENTARY_ARCHIVES:
description: political archives
meaning: wikidata:Q53251146
comments:
- Parlamentsarchiv (de)
- archivo parlamentario (es)
- archives parlementaires (fr)
PARTY_ARCHIVE:
description: subclass of political archive
meaning: wikidata:Q53252161
comments:
- Parteiarchiv (de)
- archivo de partido político (es)
PERFORMING_ARTS_ARCHIVE:
description: performing arts archive (Q27030945)
meaning: wikidata:Q27030945
comments:
- Archiv für darstellende Kunst (de)
- archives des arts de la scène (fr)
PERSON_OR_ORGANIZATION:
description: class of agents
meaning: wikidata:Q106559804
comments:
- Person oder Organisation (de)
- persona u organización (es)
- personne ou organisation (fr)
PERSONAL_LIBRARY:
description: the private library collection of an individual
meaning: wikidata:Q106402388
comments:
- Autorenbibliothek (de)
- biblioteca de autor (es)
- bibliothèque personnelle (fr)
PERSONENSTANDSARCHIV:
description: Personenstandsarchiv (Q2072394)
meaning: wikidata:Q2072394
comments:
- Personenstandsarchiv (de)
PHOTO_ARCHIVE:
description: physical image collection
meaning: wikidata:Q27032363
comments:
- Fotoarchiv (de)
- archivo fotográfico (es)
- archive photographique (fr)
PHOTOGRAPH_COLLECTION:
description: photograph collection (Q130486108)
meaning: wikidata:Q130486108
comments:
- Fotosammlung (de)
- colección de fotografías (es)
- collection de photographies (fr)
POLITICAL_ARCHIVE:
description: political archive (Q27030921)
meaning: wikidata:Q27030921
comments:
- Politikarchiv (de)
- archivo político (es)
POSTCUSTODIAL_ARCHIVE:
description: postcustodial archive (Q124223197)
meaning: wikidata:Q124223197
PRESS_ARCHIVE:
description: collection of press, newspaper materials and content
meaning: wikidata:Q56650887
comments:
- Pressearchiv (de)
- archivo periodístico (es)
- archives de presse (fr)
PRINT_ROOM:
description: collection of prints, and sometimes drawings, watercolours and photographs
meaning: wikidata:Q445396
comments:
- Kupferstichkabinett (de)
- gabinete de estampas (es)
- cabinet des estampes (fr)
PROVINCIAL_ARCHIVE:
description: provincial archive (Q5403345)
meaning: wikidata:Q5403345
comments:
- Provinzarchiv (de)
PROVINCIAL_HISTORICAL_ARCHIVE:
description: type of local archive
meaning: wikidata:Q21087388
comments:
- Historisches Provinzarchiv (Katalonien) (de)
- archivo histórico provincial (es)
PUBLIC_ARCHIVE:
description: repository for official documents
meaning: wikidata:Q27031009
comments:
- Öffentliches Archiv (de)
- archivo público (es)
- archives publiques (fr)
PUBLIC_ARCHIVES_IN_FRANCE:
description: Type of archives in France
meaning: wikidata:Q2421452
comments:
- Öffentliches Archiv (de)
- archives publiques en France (fr)
PUBLIC_SPACE:
description: places for public use
meaning: wikidata:Q294440
comments:
- öffentlicher Raum (de)
- espacio público (es)
- espace public (fr)
RADIO_ARCHIVE:
description: radio archive (Q109326271)
meaning: wikidata:Q109326271
comments:
- Radioarchiv (de)
- archivo radiofónico (es)
- archives radiophoniques (fr)
REGIONAL_ARCHIVE:
description: archive with a regional scope
meaning: wikidata:Q27032392
comments:
- Regionalarchiv (de)
- archivo regional (es)
- archives régionales (fr)
REGIONAL_ARCHIVES_IN_ICELAND:
description: regional archives in Iceland (Q16428785)
meaning: wikidata:Q16428785
comments:
- Regionalarchiv (Island) (de)
REGIONAL_ECONOMIC_ARCHIVE:
description: archive documenting the economic history of a region
meaning: wikidata:Q2138319
comments:
- regionales Wirtschaftsarchiv (de)
- archivo económico regional (es)
REGIONAL_HISTORIC_CENTER:
description: designation for regional archival institutions in the Netherlands
meaning: wikidata:Q1882512
comments:
- Regionalhistorisches Zentrum (de)
- centre régional historique (fr)
- Regionaal Historisch Centrum (nl)
REGIONAL_STATE_ARCHIVES:
description: regional state archives in Sweden
meaning: wikidata:Q8727648
comments:
- Provinzarchiv (de)
- archivo regional (es)
- archives régionales (fr)
RELIGIOUS_ARCHIVE:
description: accumulation of records of a religious denomination or society
meaning: wikidata:Q85545753
comments:
- Religionsarchiv (de)
- archivo religioso (es)
SCHOOL_ARCHIVE:
description: school archive (Q27030883)
meaning: wikidata:Q27030883
comments:
- Schularchiv (de)
- archivo escolar (es)
- archives scolaires (fr)
SCIENTIFIC_ARCHIVE:
description: archive created for academic purposes
meaning: wikidata:Q27032095
comments:
- Forschungsarchiv (de)
- archives scientifiques (fr)
SCIENTIFIC_TECHNIC_AND_INDUSTRIAL_CULTURE_CENTER:
description: popular science place in France
meaning: wikidata:Q2945276
comments:
- centre de culture scientifique, technique et industrielle (fr)
- centro di cultura scientifica, tecnica e industriale (it)
- wetenschappelijk, technisch en industrieel cultuurcentrum (nl)
SECTOR_OF_ARCHIVES_IN_SWEDEN:
description: sector of archives
meaning: wikidata:Q84171278
comments:
- Archivwesen in Schweden (de)
SECURITY_ARCHIVES:
description: type of archives in Czechia
meaning: wikidata:Q101475797
SOCIAL_SPACE:
description: physical or virtual space, such as a social center or online social media, where people gather and interact
meaning: wikidata:Q4430275
comments:
- sozialer Raum (de)
- espacio social (es)
- espace social (fr)
SOUND_ARCHIVE:
description: collection of sounds
meaning: wikidata:Q2230431
comments:
- Schallarchiv (de)
- fonoteca (es)
- phonothèque (fr)
SPECIAL_COLLECTION:
description: library or library unit that houses materials requiring specialized security and
user services, or whose relationship to one another (period, subject, etc.) is to be preserved
meaning: wikidata:Q4431094
comments:
- Spezialsammlung (de)
- colección especial (es)
- fonds spéciaux (fr)
SPECIALIZED_ARCHIVE:
description: archive specialized in a specific field
meaning: wikidata:Q27030941
comments:
- Facharchiv (de)
- archivo especial (es)
- archives spécialisées (fr)
SPECIALIZED_ARCHIVES:
description: type of archives in Czechia
meaning: wikidata:Q101470010
comments:
- archivo especializado (es)
- archives spécialisées (fr)
STATE_ARCHIVES:
description: archive of a state
meaning: wikidata:Q52341833
comments:
- Staatsarchiv (de)
- archivo estatal (es)
- archives de l'État (fr)
STATE_ARCHIVES_SECTION:
description: section of a national archive in Italy
meaning: wikidata:Q44796387
comments:
- Staatsarchiv-Abteilung (de)
- sezione di archivio di Stato (it)
- sectie staatsarchief (nl)
STATE_DISTRICT_ARCHIVE:
description: Archive type in the Czech Republic
meaning: wikidata:Q53131316
comments:
- Bezirksarchiv (Tschechien) (de)
STATE_REGIONAL_ARCHIVE_CZECHIA:
description: state regional archive (Czechia) (Q53130134)
meaning: wikidata:Q53130134
SUBSIDIARY_ORGANIZATION:
description: entity or organization administered by a larger entity or organization
meaning: wikidata:Q62079110
comments:
- Tochterorganisation (de)
- entidad subsidiaria (es)
- entité subsidiaire (fr)
TELEVISION_ARCHIVE:
description: a collection of television programs, recordings, and broadcasts
meaning: wikidata:Q109326243
comments:
- Fernseharchiv (de)
- archivo de televisión (es)
- archives télévisuelles (fr)
TENTATIVE_WORLD_HERITAGE_SITE:
description: site included on a UNESCO World Heritage tentative list
meaning: wikidata:Q1459900
comments:
- Tentativliste (de)
- lista indicativa del Patrimonio de la Humanidad (es)
- liste indicative du patrimoine mondial (fr)
TRADE_UNION_ARCHIVE:
description: archive formed by the documentation of the labor organisations
meaning: wikidata:Q66604802
comments:
- Gewerkschaftsarchiv (de)
UNIVERSITY_ARCHIVE:
description: collection of historical records of a college or university
meaning: wikidata:Q2496264
comments:
- Universitätsarchiv (de)
- archivo universitario (es)
- archives universitaires (fr)
VEREINSARCHIV:
description: archive of a club or association (Q130758889)
meaning: wikidata:Q130758889
comments:
- Vereinsarchiv (de)
VERLAGSARCHIV:
description: archive of a publishing house (Q130759004)
meaning: wikidata:Q130759004
comments:
- Verlagsarchiv (de)
VERWALTUNGSARCHIV:
description: administrative archive; subclass of archives
meaning: wikidata:Q2519292
comments:
- Verwaltungsarchiv (de)
VIRTUAL_MAP_LIBRARY:
description: type of library for virtual maps or cartographic products
meaning: wikidata:Q5995078
comments:
- Virtuelle Kartenbibliothek (de)
- Mapoteca virtual (es)
WEB_ARCHIVE:
description: publication type, collection of preserved web pages
meaning: wikidata:Q30047053
comments:
- Webarchiv (de)
- archivo web (es)
- archive du Web (fr)
WOMENS_ARCHIVES:
description: archives of documents and records written by and about women
meaning: wikidata:Q130217628
comments:
- Frauenarchiv (de)
WORLD_HERITAGE_SITE:
description: place of significance listed by UNESCO
meaning: wikidata:Q9259
comments:
- UNESCO-Welterbe (de)
- Patrimonio de la Humanidad (es)
- patrimoine mondial (fr)


@ -0,0 +1,204 @@
id: https://nde.nl/ontology/hc/enum/CustodianPrimaryTypeEnum
name: CustodianPrimaryTypeEnum
title: GLAMORCUBESFIXPHDNT Primary Type Categories
description: |
Top-level classification of heritage custodian types using the
GLAMORCUBESFIXPHDNT taxonomy (19 categories).
**Mnemonic**: GLAMORCUBESFIXPHDNT
- **G**alleries
- **L**ibraries
- **A**rchives
- **M**useums
- **O**fficial institutions
- **R**esearch centers
- **C**orporations (commercial)
- **U**nknown/unspecified
- **B**otanical gardens/zoos (bio custodians)
- **E**ducation providers
- **S**ocieties (heritage/collecting societies)
- **F**eatures (geographic features AS custodians)
- **I**ntangible heritage groups
- mi**X**ed (multiple types)
- **P**ersonal collections
- **H**oly/sacred sites
- **D**igital platforms
- **N**GOs (non-profit organizations)
- **T**aste/smell heritage
Each category has specialized subclasses with Wikidata-derived enum values.
enums:
CustodianPrimaryTypeEnum:
permissible_values:
GALLERY:
description: "Art gallery or exhibition space (Q118554787, Q1007870)"
meaning: wikidata:Q118554787
comments:
- "Visual arts organizations"
- "Exhibition spaces (may or may not hold permanent collections)"
- "Kunsthallen, art galleries, visual arts centers"
LIBRARY:
description: "Library - institution preserving and providing access to books and documents (Q7075)"
meaning: wikidata:Q7075
comments:
- "Public libraries, academic libraries, national libraries"
- "Special libraries, digital libraries"
- "Includes bibliotheken, bibliotecas, bibliothèques"
ARCHIVE:
description: "Archive - institution preserving historical documents and records (Q166118)"
meaning: wikidata:Q166118
comments:
- "National archives, city archives, corporate archives"
- "Government archives, religious archives"
- "Includes archieven, archivos, archives"
MUSEUM:
description: "Museum - institution preserving and exhibiting cultural or scientific collections (Q33506)"
meaning: wikidata:Q33506
comments:
- "Art museums, history museums, natural history museums"
- "Science museums, ethnographic museums, local museums"
- "Includes musea, museos, musées, museums"
OFFICIAL_INSTITUTION:
description: "Government heritage agency, platform, or official cultural institution (Q895526)"
meaning: wikidata:Q895526
comments:
- "Provincial heritage services"
- "Heritage aggregation platforms"
- "Government cultural agencies"
- "TOOI: tooi:Overheidsorganisatie (Dutch government)"
- "CPOV: cpov:PublicOrganisation (EU public sector)"
RESEARCH_CENTER:
description: "Research organization or documentation center (Q136410232)"
meaning: wikidata:Q136410232
comments:
- "Research institutes with heritage collections"
- "Documentation centers"
- "University research units"
- "Policy institutes with archives"
COMMERCIAL:
description: "Corporation or business with heritage collections (Q21980538)"
meaning: wikidata:Q21980538
comments:
- "Company archives"
- "Corporate museums"
- "Brand heritage centers"
- "ROV: rov:RegisteredOrganization (if legally registered)"
UNSPECIFIED:
description: "Institution type cannot be determined (data quality flag)"
comments:
- "NOT a real institution type - indicates missing/ambiguous data"
- "Should be resolved during data curation"
- "NOT mapped to Wikidata"
BIO_CUSTODIAN:
description: "Botanical garden, zoo, aquarium, or living collections (Q473972, Q23790, Q43501)"
meaning: wikidata:Q473972
comments:
- "Botanical gardens (Q473972)"
- "Zoological gardens (Q23790)"
- "Arboreta (Q43501)"
- "Herbaria (Q2982911)"
- "Aquariums (Q4915239)"
EDUCATION_PROVIDER:
description: "Educational institution with heritage collections (Q5341295)"
meaning: wikidata:Q5341295
comments:
- "Universities with archives or collections"
- "Schools with historical materials"
- "Training centers preserving educational heritage"
- "Schema.org: schema:EducationalOrganization, schema:CollegeOrUniversity"
HERITAGE_SOCIETY:
description: "Historical society, heritage society, or collecting society (Q5774403, Q10549978)"
meaning: wikidata:Q5774403
comments:
- "Historical societies (Q5774403)"
- "Heritage societies / heemkundige kring (Q10549978)"
- "Philatelic societies (Q955824)"
- "Numismatic clubs"
- "Ephemera collectors"
FEATURE_CUSTODIAN:
description: "Geographic feature that IS the heritage custodian (special case)"
comments:
- "SPECIAL: Also links to FeaturePlace (dual aspect)"
- "Used when custodian IS a geofeature (e.g., historic mansion as museum)"
- "Examples: Q1802963 (mansion), Q44539 (temple), Q16560 (palace)"
- "Requires BOTH custodian_type AND custodian_place.place_type"
INTANGIBLE_HERITAGE_GROUP:
description: "Organization preserving intangible cultural heritage (Q105815710)"
meaning: wikidata:Q105815710
comments:
- "Traditional performance groups"
- "Oral history societies"
- "Folklore organizations"
- "Indigenous cultural practice groups"
- "UNESCO intangible cultural heritage"
MIXED:
description: "Institution with multiple simultaneous type classifications"
comments:
- "GHCID uses 'X' code"
- "actual_types slot documents all applicable types"
- "Example: Combined museum/archive/library facility"
PERSONAL_COLLECTION:
description: "Private personal collection managed by individual collector (Q134886297)"
meaning: wikidata:Q134886297
comments:
- "Individual collectors"
- "Family archives"
- "Private art collections (non-commercial)"
- "Distinguished from commercial galleries"
HOLY_SACRED_SITE:
description: "Religious site with heritage collections (Q4588528)"
meaning: wikidata:Q4588528
comments:
- "Church archives (parish records, baptismal registers)"
- "Monastery libraries (manuscript collections)"
- "Cathedral treasuries (liturgical objects, religious art)"
- "Temple museums (Buddhist artifacts)"
- "Mosque libraries (Islamic manuscripts)"
- "Synagogue archives (Jewish community records)"
- "Schema.org: schema:PlaceOfWorship"
DIGITAL_PLATFORM:
description: "Born-digital heritage platform or online repository (Q28017710)"
meaning: wikidata:Q28017710
comments:
- "Online archives (Internet Archive)"
- "Digital libraries (HathiTrust)"
- "Heritage aggregators (Europeana, DPLA)"
- "Virtual museums"
- "Schema.org: schema:WebSite, schema:SoftwareApplication"
NON_PROFIT:
description: "Non-governmental heritage organization (Q163740)"
meaning: wikidata:Q163740
comments:
- "Heritage preservation NGOs"
- "Cultural advocacy organizations"
- "Conservation societies managing heritage sites"
- "Schema.org: schema:NGO"
TASTE_SCENT_HERITAGE:
description: "Organization preserving culinary or olfactory heritage"
comments:
- "Historic restaurants preserving culinary traditions"
- "Parfumeries with historic formulation archives"
- "Distilleries maintaining traditional production methods"
- "Culinary heritage museums"
- "Potential Wikidata: Q11707 (restaurant), Q185329 (perfumery), Q131734 (distillery)"
- "NEW CATEGORY - not yet formally recognized in Wikidata"
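The GLAMORCUBESFIXPHDNT mnemonic above can be spelled out programmatically. The following is a hypothetical helper, not part of the schema: the single-letter codes and enum names are copied from the file above, and the dict is purely illustrative of how legacy GHCID letter codes map onto the 19 `CustodianPrimaryTypeEnum` values.

```python
# Hypothetical mapping of legacy GHCID single-letter codes to the 19
# CustodianPrimaryTypeEnum values listed above. Names are copied from
# the enum file; the dict itself is not part of the schema.
GHCID_CODES = {
    "G": "GALLERY",
    "L": "LIBRARY",
    "A": "ARCHIVE",
    "M": "MUSEUM",
    "O": "OFFICIAL_INSTITUTION",
    "R": "RESEARCH_CENTER",
    "C": "COMMERCIAL",
    "U": "UNSPECIFIED",
    "B": "BIO_CUSTODIAN",
    "E": "EDUCATION_PROVIDER",
    "S": "HERITAGE_SOCIETY",
    "F": "FEATURE_CUSTODIAN",
    "I": "INTANGIBLE_HERITAGE_GROUP",
    "X": "MIXED",
    "P": "PERSONAL_COLLECTION",
    "H": "HOLY_SACRED_SITE",
    "D": "DIGITAL_PLATFORM",
    "N": "NON_PROFIT",
    "T": "TASTE_SCENT_HERITAGE",
}

# The concatenated codes spell the mnemonic (dicts preserve insertion order).
assert "".join(GHCID_CODES) == "GLAMORCUBESFIXPHDNT"
```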


@ -13,6 +13,7 @@ When an enum is converted to a class hierarchy, the original enum file is:
| File | Archived Date | Replaced By | Rationale |
|------|--------------|-------------|-----------|
| `ArchiveTypeEnum.yaml.archived_20250105` | 2025-01-05 | 96 archive class files (e.g., `AcademicArchive.yaml`, `MunicipalArchive.yaml`) | 144 enum values replaced by class hierarchy with dual-class pattern (custodian type + rico:RecordSetType), rich ontology mappings (Schema.org, RiC-O, CIDOC-CRM, Wikidata), and multilingual labels. Enum contained non-archive types (BRANCH, DEPARTMENT, ORGANIZATION) that didn't belong. |
| `StaffRoleTypeEnum.yaml.archived_20251206` | 2025-12-06 | `StaffRole.yaml`, `StaffRoles.yaml` | Enum promoted to class hierarchy to capture formal title vs de facto work distinction, enable rich properties (role_category, common_variants, typical_domains) |
## See Also


@ -1,6 +1,6 @@
# Enum Instances Index
# Generated: 2025-11-30
# Updated: 2025-12-06 (Session 4 - WebPortalTypeEnum migrated to class hierarchy)
# Updated: 2026-01-05 (Session - CustodianPrimaryTypeEnum migrated to CustodianType class hierarchy)
#
# This file provides a manifest of all enum instance files
# for programmatic loading by the frontend, RDF generators, and UML tools.
@ -12,29 +12,31 @@ description: |
Each enum value is represented as a rich instance with extended metadata,
ontology mappings, Wikidata links, and documentation for enrichment.
version: "1.8.0"
version: "1.9.0"
generated: "2025-11-30T00:00:00Z"
last_updated: "2025-12-06T00:00:00Z"
last_updated: "2026-01-05T00:00:00Z"
# Statistics
statistics:
total_enums: 29 # Was 30, SocialMediaPlatformTypeEnum migrated to class hierarchy
completed_instances: 24 # Was 25, one more enum migrated to class hierarchy
total_values_elaborated: 623+ # Was 650+, minus 27 SocialMediaPlatformTypeEnum values
total_values_estimated: 650+
total_enums: 28 # Was 29, CustodianPrimaryTypeEnum migrated to CustodianType class hierarchy
completed_instances: 23 # Was 24, one more enum migrated to class hierarchy
total_values_elaborated: 604+ # Was 623+, minus 19 CustodianPrimaryTypeEnum values
total_values_estimated: 630+
with_wikidata_mapping: 93%
with_ontology_mapping: 96%
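The statistics update above follows from the migration arithmetic. A minimal sanity check, with the before/after values copied from the index (this snippet is illustrative and not part of the manifest):

```python
# Consistency check for the index statistics after migrating the
# 19-value CustodianPrimaryTypeEnum out of the enum-instance manifest.
prev = {"total_enums": 29, "completed_instances": 24, "values_elaborated": 623}
migrated_values = 19  # CustodianPrimaryTypeEnum permissible values

new = {
    "total_enums": prev["total_enums"] - 1,
    "completed_instances": prev["completed_instances"] - 1,
    "values_elaborated": prev["values_elaborated"] - migrated_values,
}
assert new == {"total_enums": 28, "completed_instances": 23, "values_elaborated": 604}
```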
# Completed Enum Instance Files
completed:
# === Original 8 (Session 1) ===
- id: 1
name: Custodian Type Classification
file: custodian_primary_type.yaml
enum: CustodianPrimaryTypeEnum
count: 19
status: completed
description: "GLAMORCUBESFIXPHDNT taxonomy - top-level heritage custodian categories"
# ID 1 (CustodianPrimaryTypeEnum) - MIGRATED to CustodianType class hierarchy
# See: modules/classes/CustodianType.yaml and 19 specialized subclasses
# Archived: archive/enums/CustodianPrimaryTypeEnum.yaml.archived_20260105
# Instance archived: instances/enums/archive/custodian_primary_type.yaml.archived_20260105
# Migration date: 2026-01-05
# Rationale: Enum promoted to class hierarchy per Rule 9 (Enum-to-Class Promotion).
# CustodianType subclasses support rich properties (wikidata_entity,
# custodian_type_broader, etc.) and inheritance. Single Source of Truth principle.
- id: 2
name: Organizational Change Events

File diff suppressed because it is too large


@ -19,19 +19,7 @@ classes:
AcademicArchive:
is_a: ArchiveOrganizationType
class_uri: schema:ArchiveOrganization
description: |
Archive of a higher education institution (university, college, polytechnic).
**Dual-Class Pattern**:
This class represents the CUSTODIAN type (the archive organization).
For the collection type, see `AcademicArchiveRecordSetType` which maps to `rico:RecordSetType`.
**Holdings** (linked via rico:isOrWasHolderOf):
Academic archives typically hold records classified under these RecordSetTypes:
- UniversityAdministrativeFonds - Governance, committee, policy records
- StudentRecordSeries - Enrollment, transcripts, graduation records
- FacultyPaperCollection - Personal papers of faculty members
- CampusDocumentationCollection - Photos, publications, ephemera
description: Archive of a higher education institution (university, college, polytechnic).
slots:
- custodian_types
- custodian_types_rationale
@ -73,15 +61,7 @@ classes:
- campus life documentation
slot_usage:
holds_record_set_types:
description: |
Links this custodian type to the record set types it typically holds.
Uses RiC-O property rico:isOrWasHolderOf to express custodial relationship.
**Academic Archive Holdings**:
- UniversityAdministrativeFonds - Governance, committee, policy records
- StudentRecordSeries - Enrollment, transcripts, graduation records
- FacultyPaperCollection - Personal papers of faculty members
- CampusDocumentationCollection - Photos, publications, ephemera
description: Record set types typically held by academic archives.
equals_expression: |
["hc:UniversityAdministrativeFonds", "hc:StudentRecordSeries", "hc:FacultyPaperCollection", "hc:CampusDocumentationCollection"]
wikidata_entity:
@ -95,9 +75,9 @@ classes:
Typically 'university', 'college', or 'institutional'.
Reflects the educational institution's administrative scope.
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: AcademicArchive is an archival institution - maps to ARCHIVE
equals_string: AcademicArchive is an archival institution - maps to ArchiveOrganizationType
(A)
wikidata_alignment:
range: WikidataAlignment
@ -140,13 +120,7 @@ classes:
- wd:Q1065413
- AcademicArchiveRecordSetType
AcademicArchiveRecordSetType:
description: "A rico:RecordSetType for classifying collections of academic and\
\ higher \neducation institutional records within heritage institutions.\n\n\
**Dual-Class Pattern**:\nThis class represents the COLLECTION type (rico:RecordSetType).\n\
For the custodian organization type, see `AcademicArchive`.\n\n**Scope**:\n\
Used to classify record sets that contain academic institutional materials:\n\
- University administrative fonds\n- Student record series\n- Faculty paper\
\ collections\n- Campus documentation collections\n"
description: A rico:RecordSetType for classifying collections of academic and higher education institutional records.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
@ -166,13 +140,8 @@ classes:
Structured scope definitions for AcademicArchiveRecordSetType.
Formally documents what types of record sets are classified under this type.
comments:
- "**Subclasses (concrete RecordSetTypes)**:\n\nThis abstract type has four concrete\
\ subclasses defined in \nAcademicArchiveRecordSetTypes.yaml:\n\n1. UniversityAdministrativeFonds\
\ - Governance, committee, policy records\n2. StudentRecordSeries - Enrollment,\
\ transcripts, graduation records\n3. FacultyPaperCollection - Personal papers\
\ of faculty members\n4. CampusDocumentationCollection - Photos, publications,\
\ ephemera\n\nEach subclass maps to rico:RecordSetType with appropriate broad_mappings\
\ \nto RiC-O organizational concepts (rico:Fonds, rico:Series, rico:Collection).\n"
- Collection type class for academic/higher education record sets
- Part of dual-class pattern with AcademicArchive (custodian type)
structured_aliases:
- literal_form: Hochschularchivbestand
in_language: de
@ -186,7 +155,7 @@ classes:
wikidata_equivalent:
equals_string: Q27032435
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: AcademicArchiveRecordSetType classifies collections held by
ARCHIVE (A) type custodians
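The hunks above replace legacy single-letter codes in `custodian_types` `equals_expression` values with class CURIEs (e.g. `'["A"]'` becomes `'["hc:ArchiveOrganizationType"]'`). A minimal sketch of that rewrite, assuming the expression is a JSON list embedded in a YAML string; the mapping covers only the letters seen in these diffs and is illustrative, not the actual migration script:

```python
import json

# Legacy code -> class CURIE, limited to letters appearing in this commit.
CODE_TO_CURIE = {
    "A": "hc:ArchiveOrganizationType",
    "L": "hc:LibraryType",
    "M": "hc:MuseumType",
    "S": "hc:HeritageSocietyType",
}

def migrate_equals_expression(expr: str) -> str:
    """Rewrite a JSON-list equals_expression from letter codes to CURIEs."""
    codes = json.loads(expr)
    return json.dumps([CODE_TO_CURIE.get(c, c) for c in codes])

print(migrate_equals_expression('["A", "L", "M"]'))
# -> ["hc:ArchiveOrganizationType", "hc:LibraryType", "hc:MuseumType"]
```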


@ -72,7 +72,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: University administrative fonds are held by ARCHIVE (A) type
custodians
@ -97,7 +97,7 @@ classes:
"strategic planning", "accreditation records"]'
scope_excludes:
equals_string: '["student records", "faculty papers", "research data"]'
StudentRecordSeries:
AcademicStudentRecordSeries:
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for student records organized as archival series.\n\
@ -155,7 +155,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: Student record series are held by ARCHIVE (A) type custodians
specificity_annotation:
@ -165,7 +165,7 @@ classes:
range: TemplateSpecificityScores
inlined: true
rico_record_set_type:
equals_string: StudentRecordSeries
equals_string: AcademicStudentRecordSeries
rico_organizational_principle:
equals_string: series
rico_organizational_principle_uri:
@ -245,7 +245,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "L"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType"]'
custodian_types_rationale:
equals_string: Faculty papers may be held by ARCHIVE (A) or LIBRARY (L) special
collections
@ -334,7 +334,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "L", "M"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType", "hc:MuseumType"]'
custodian_types_rationale:
equals_string: Campus documentation may be held by ARCHIVE (A), LIBRARY (L),
or MUSEUM (M) depending on material type
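The rename of `StudentRecordSeries` to `AcademicStudentRecordSeries` above (and `MediaProductionFonds` to `AudiovisualProductionFonds` later in this commit) also touches string-valued metadata such as `rico_record_set_type` `equals_string`. A hedged sketch of keeping those strings in sync with the renamed classes; the mapping lists only the renames visible in this diff:

```python
# Class renames in this commit (old name -> new name), applied to
# string metadata that mirrors the class name. Illustrative only.
RENAMES = {
    "StudentRecordSeries": "AcademicStudentRecordSeries",
    "MediaProductionFonds": "AudiovisualProductionFonds",
}

def sync_record_set_type(value: str) -> str:
    """Return the post-rename class name for an equals_string value."""
    return RENAMES.get(value, value)

assert sync_record_set_type("StudentRecordSeries") == "AcademicStudentRecordSeries"
assert sync_record_set_type("TreatyCollection") == "TreatyCollection"  # unchanged
```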


@ -87,7 +87,7 @@ classes:
wikidata_equivalent:
equals_string: Q60658673
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: AdvertisingRadioArchive is an archival institution - maps to
ARCHIVE (A)


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: RadioAdvertisementCollection records are held by ARCHIVE (A)
type custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CampaignDocumentationSeries records are held by ARCHIVE (A)
type custodians


@ -100,9 +100,9 @@ classes:
wikidata_equivalent:
equals_string: Q18574935
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: AnimalSoundArchive is an archival institution - maps to ARCHIVE
equals_string: AnimalSoundArchive is an archival institution - maps to ArchiveOrganizationType
(A)
wikidata_alignment:
range: WikidataAlignment


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: BioacousticRecordingCollection records are held by ARCHIVE
(A) type custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: FieldRecordingSeries records are held by ARCHIVE (A) type custodians
specificity_annotation:


@ -49,10 +49,10 @@ classes:
Typically includes: architectural drawings, blueprints, building plans,
models, photographs, specifications, correspondence, competition entries.
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchitecturalArchive is a specialized archive type for architectural
documentation - maps to ARCHIVE type (A)
documentation - maps to ArchiveOrganizationType type (A)
specificity_annotation:
range: SpecificityAnnotation
inlined: true


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchitecturalDrawingCollection records are held by ARCHIVE
(A) type custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchitectPapersCollection records are held by ARCHIVE (A) type
custodians
@ -123,7 +123,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: BuildingProjectFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:


@ -48,7 +48,7 @@ classes:
All ArchivalLibrary instances MUST be linked to a parent archive.
required: true
custodian_types:
equals_expression: '["A", "L"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType"]'
custodian_types_rationale:
equals_string: Archival library is an OrganizationBranch combining archive
(A) and library (L) functions.


@ -54,7 +54,7 @@ classes:
Advocacy, public programming, and engagement activities.
Key focus for archive associations as support organizations.
custodian_types:
equals_expression: '["A", "S"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HeritageSocietyType"]'
custodian_types_rationale:
equals_string: Archive association combines archive (A) and society/association
(S).


@ -47,8 +47,6 @@ classes:
slots:
- custodian_types
- custodian_types_rationale
- encompassing_body_link
- member_archives
- specificity_annotation
- template_specificity
slot_usage:
@ -65,9 +63,9 @@ classes:
minimum_cardinality: 1
maximum_cardinality: 1
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchiveNetwork is an archival institution - maps to ARCHIVE
equals_string: ArchiveNetwork is an archival institution - maps to ArchiveOrganizationType
(A)
specificity_annotation:
range: SpecificityAnnotation


@ -61,7 +61,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchiveOfInternationalOrganizationRecordSetType classifies
collections held by ARCHIVE (A) type custodians


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: InternationalOrgFonds records are held by ARCHIVE (A) type
custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: TreatyCollection records are held by ARCHIVE (A) type custodians
specificity_annotation:
@ -122,7 +122,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ConferenceRecordSeries records are held by ARCHIVE (A) type
custodians


@ -97,7 +97,7 @@ classes:
range: ArchiveOrganizationType
required: false
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchiveOrganizationType is specific to archives - institutions
preserving original records and historical documents


@ -86,7 +86,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchivesForBuildingRecordsRecordSetType classifies collections
held by ARCHIVE (A) type custodians


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: BuildingPermitSeries records are held by ARCHIVE (A) type custodians
specificity_annotation:
@ -79,7 +79,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ConstructionDocumentCollection records are held by ARCHIVE
(A) type custodians


@ -80,7 +80,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArchivesRegionalesRecordSetType classifies collections held
by ARCHIVE (A) type custodians


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: RegionalAdministrationFonds records are held by ARCHIVE (A)
type custodians


@ -87,7 +87,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArtArchiveRecordSetType classifies collections held by ARCHIVE
(A) type custodians


@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ArtistPapersCollection records are held by ARCHIVE (A) type
custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: GalleryRecordsFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:
@ -122,7 +122,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ExhibitionDocumentationCollection records are held by ARCHIVE
(A) type custodians


@ -87,7 +87,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: AudiovisualArchiveRecordSetType classifies collections held
by ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: AudiovisualRecordingCollection records are held by ARCHIVE
(A) type custodians
@ -58,7 +58,7 @@ classes:
rico_has_or_had_holder_note:
equals_string: This RecordSetType is typically held by AudiovisualArchive
custodians. Inverse of rico:isOrWasHolderOf.
MediaProductionFonds:
AudiovisualProductionFonds:
is_a: AudiovisualArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for Media production records.\n\n**RiC-O Alignment**:\n\
@ -80,9 +80,9 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: MediaProductionFonds records are held by ARCHIVE (A) type custodians
equals_string: AudiovisualProductionFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:
range: SpecificityAnnotation
inlined: true
@ -90,7 +90,7 @@ classes:
range: TemplateSpecificityScores
inlined: true
rico_record_set_type:
equals_string: MediaProductionFonds
equals_string: AudiovisualProductionFonds
rico_organizational_principle:
equals_string: fonds
rico_organizational_principle_uri:

View file

@ -57,28 +57,10 @@ classes:
- Deutsche Bank Historical Archive
- Rothschild Archive (London)
- Archives historiques de la Société Générale
**Dual-Class Pattern**:
This class represents the CUSTODIAN type (the archive organization).
For the collection type, see `BankRecordSetType` (rico:RecordSetType).
**Ontological Alignment**:
- **SKOS**: skos:Concept with skos:broader Q166118 (archive)
- **Schema.org**: schema:ArchiveOrganization
- **RiC-O**: rico:CorporateBody (as agent)
**Multilingual Labels**:
- de: Bankarchiv
- es: archivo bancario
- fr: archives bancaires
slot_usage: null
BankArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by BankArchive custodians.
**Dual-Class Pattern**:
This class represents the COLLECTION type (rico:RecordSetType).
For the custodian organization type, see `BankArchive`.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
@ -93,7 +75,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: BankArchiveRecordSetType classifies collections held by ARCHIVE
(A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: BankingRecordsFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:
@ -79,7 +79,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: FinancialTransactionSeries records are held by ARCHIVE (A)
type custodians
@ -122,7 +122,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CustomerAccountSeries records are held by ARCHIVE (A) type
custodians

View file

@ -60,9 +60,9 @@ classes:
- de: Bildstelle
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: Bildstelle is an archival institution - maps to ARCHIVE (A)
equals_string: Bildstelle is an archival institution - maps to ArchiveOrganizationType (A)
specificity_annotation:
range: SpecificityAnnotation
inlined: true

View file

@ -416,7 +416,7 @@ classes:
range: string
required: false
custodian_types:
equals_expression: '["B"]'
equals_expression: '["hc:BioCustodianType"]'
custodian_types_rationale:
equals_string: BioCustodianType is specific to botanical gardens, zoos, aquariums
- institutions with living collections

View file

@ -487,7 +487,7 @@ classes:
- value: MW123456
- value: MN987654
custodian_types:
equals_expression: '["B", "M", "R"]'
equals_expression: '["hc:BioCustodianType", "hc:MuseumType", "hc:ResearchOrganizationType"]'
custodian_types_rationale:
equals_string: |
BiologicalObject is primarily relevant to:

View file

@ -98,7 +98,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CantonalArchiveRecordSetType classifies collections held by
ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CantonalGovernmentFonds records are held by ARCHIVE (A) type
custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CantonalLegislationCollection records are held by ARCHIVE (A)
type custodians

View file

@ -75,7 +75,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["M"]'
equals_expression: '["hc:MuseumType"]'
custodian_types_rationale:
equals_string: Cast collection is a museum collection type (M).
specificity_annotation:

View file

@ -86,7 +86,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CathedralArchiveRecordSetType classifies collections held by
ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ChapterRecordsFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:
@ -79,7 +79,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: LiturgicalDocumentCollection records are held by ARCHIVE (A)
type custodians
@ -122,7 +122,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: FabricRecordsSeries records are held by ARCHIVE (A) type custodians
specificity_annotation:

View file

@ -105,37 +105,3 @@ classes:
- PastoralCorrespondenceCollection
- ChurchPropertyFonds
- CongregationalLifeCollection
ChurchArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by ChurchArchive custodians.
**Dual-Class Pattern**:
This class represents the COLLECTION type (rico:RecordSetType).
For the custodian organization type, see `ChurchArchive`.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
- custodian_types
- custodian_types_rationale
- dual_class_link
- specificity_annotation
- template_specificity
- type_scope
see_also:
- ChurchArchive
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
custodian_types_rationale:
equals_string: ChurchArchiveRecordSetType classifies collections held by ARCHIVE
(A) type custodians
dual_class_link:
range: DualClassLink
inlined: true
specificity_annotation:
range: SpecificityAnnotation
inlined: true
template_specificity:
range: TemplateSpecificityScores
inlined: true

View file

@ -49,7 +49,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "H"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
custodian_types_rationale:
equals_string: Church archive record set types are held by ARCHIVE (A) or
HOLY_SITES (H) type custodians
@ -121,7 +121,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "H"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
custodian_types_rationale:
equals_string: Church governance fonds are held by ARCHIVE (A) or HOLY_SITES
(H) type custodians
@ -199,12 +199,12 @@ classes:
- wd:Q185583
close_mappings:
- skos:Concept
- CivilRegistrySeries
see_also:
- ChurchArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Series
- ParishArchive
- CivilRegistrySeries
annotations:
genealogy_note: Primary source for genealogical research, especially pre-civil
registration periods. Many digitized and indexed by organizations like FamilySearch, Alle
@ -219,7 +219,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "H"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
custodian_types_rationale:
equals_string: Parish register series are held by ARCHIVE (A) or HOLY_SITES
(H), often transferred to regional archives
@ -294,11 +294,11 @@ classes:
- wd:Q22075301
close_mappings:
- skos:Concept
- FacultyPaperCollection
see_also:
- ChurchArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Fonds
- FacultyPaperCollection
slots:
- custodian_types
- custodian_types_rationale
@ -306,7 +306,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "H", "L"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType", "hc:LibraryType"]'
custodian_types_rationale:
equals_string: Pastoral correspondence may be held by ARCHIVE (A), HOLY_SITES
(H), or LIBRARY (L) special collections
@ -396,7 +396,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "H"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
custodian_types_rationale:
equals_string: Church property fonds are held by ARCHIVE (A) or HOLY_SITES
(H) type custodians
@ -477,11 +477,11 @@ classes:
close_mappings:
- skos:Concept
- schema:Collection
- CampusDocumentationCollection
see_also:
- ChurchArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Collection
- CampusDocumentationCollection
annotations:
collection_nature_note: Often includes artificial/assembled collections. Materials
reflect the lived religious experience of the community beyond formal administration.
@ -492,7 +492,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "H", "S"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType", "hc:HeritageSocietyType"]'
custodian_types_rationale:
equals_string: Congregational life collections may be held by ARCHIVE (A),
HOLY_SITES (H), or COLLECTING_SOCIETY (S)
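The `CivilRegistrySeries`, `FacultyPaperCollection`, and `CampusDocumentationCollection` hunks above move bare class names out of `close_mappings` (which should hold prefixed CURIEs such as `skos:Concept`) into `see_also`. A sketch of that cleanup rule, assuming the simple heuristic that a valid mapping entry contains a `:` prefix (the function name is hypothetical):

```python
def split_mappings(close_mappings: list, see_also: list) -> tuple:
    """Keep CURIEs (containing ':') in close_mappings; move bare names to see_also."""
    kept, moved = [], []
    for entry in close_mappings:
        (kept if ":" in entry else moved).append(entry)
    # Append moved names, skipping any already present in see_also.
    return kept, see_also + [m for m in moved if m not in see_also]

cm = ["skos:Concept", "CivilRegistrySeries"]
sa = ["ChurchArchiveRecordSetType", "rico:RecordSetType", "rico-rst:Series", "ParishArchive"]
new_cm, new_sa = split_mappings(cm, sa)
# new_cm keeps only "skos:Concept"; "CivilRegistrySeries" lands at the end of new_sa
```

This matches the resulting order in the diff, where the moved class name appears as the last `see_also` entry.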

View file

@ -92,7 +92,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ChurchArchiveSwedenRecordSetType classifies collections held
by ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: SwedishParishRecordSeries records are held by ARCHIVE (A) type
custodians
@ -58,12 +58,13 @@ classes:
rico_has_or_had_holder_note:
equals_string: This RecordSetType is typically held by ChurchArchiveSweden
custodians. Inverse of rico:isOrWasHolderOf.
ChurchPropertyFonds:
SwedishChurchPropertyFonds:
is_a: ChurchArchiveSwedenRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for Church property records.\n\n**RiC-O Alignment**:\n\
description: "A rico:RecordSetType for Swedish Church property records.\n\n**RiC-O Alignment**:\n\
This class is a specialized rico:RecordSetType following the fonds \norganizational\
\ principle as defined by rico-rst:Fonds.\n"
\ principle as defined by rico-rst:Fonds.\n\n**Note**: This is a Swedish-specific\
\ variant. For the general church property fonds type, see ChurchPropertyFonds.\n"
exact_mappings:
- rico:RecordSetType
related_mappings:
@ -73,6 +74,7 @@ classes:
see_also:
- ChurchArchiveSwedenRecordSetType
- rico:RecordSetType
- ChurchPropertyFonds
slots:
- custodian_types
- custodian_types_rationale
@ -80,9 +82,9 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ChurchPropertyFonds records are held by ARCHIVE (A) type custodians
equals_string: SwedishChurchPropertyFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:
range: SpecificityAnnotation
inlined: true
@ -90,7 +92,7 @@ classes:
range: TemplateSpecificityScores
inlined: true
rico_record_set_type:
equals_string: ChurchPropertyFonds
equals_string: SwedishChurchPropertyFonds
rico_organizational_principle:
equals_string: fonds
rico_organizational_principle_uri:

View file

@ -64,9 +64,9 @@ classes:
- fr: cinémathèque
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: Cinematheque is an archival institution - maps to ARCHIVE (A)
equals_string: Cinematheque is an archival institution - maps to ArchiveOrganizationType (A)
specificity_annotation:
range: SpecificityAnnotation
inlined: true

View file

@ -90,7 +90,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ClimateArchiveRecordSetType classifies collections held by
ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ClimateDataCollection records are held by ARCHIVE (A) type
custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: MeteorologicalObservationSeries records are held by ARCHIVE
(A) type custodians

View file

@ -69,9 +69,9 @@ classes:
- **RiC-O**: rico:CorporateBody (as agent)
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CollectingArchives is an archival institution - maps to ARCHIVE
equals_string: CollectingArchives is an archival institution - maps to ArchiveOrganizationType
(A)
specificity_annotation:
range: SpecificityAnnotation
@ -100,7 +100,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CollectingArchivesRecordSetType classifies collections held
by ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CollectedMaterialsFonds records are held by ARCHIVE (A) type
custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DonatedPapersCollection records are held by ARCHIVE (A) type
custodians

View file

@ -24,16 +24,33 @@ imports:
- ./FindingAid
- ./ExhibitedObject
- ./CurationActivity
- ../slots/access_policy_ref
- ../slots/acquisition_date
- ../slots/acquisition_method
- ../slots/acquisition_source
- ../slots/arrangement
- ../slots/class_metadata_slots
- ../slots/collection_description
- ../slots/collection_id
- ../slots/collection_name
- ../slots/collection_type_ref
- ../slots/curation_activities
- ../slots/custodial_history
- ../slots/digital_surrogate_url
- ../slots/digitization_status
- ../slots/extent
- ../slots/extent_items
- ../slots/finding_aids
- ../slots/items
- ../slots/parent_collection
- ../slots/part_of_custodian_collection
- ../slots/provenance_statement
- ../slots/rico_record_set_type
- ../slots/sub_collections
- ../slots/subject_areas
- ../slots/temporal_coverage
- ../slots/valid_from
- ../slots/valid_to
- ../slots/collection_name
- ../slots/collection_description
- ../slots/extent
- ../slots/temporal_coverage
- ../slots/digitization_status
- ../slots/acquisition_method
- ../slots/acquisition_date
- ../slots/class_metadata_slots
classes:
Collection:
class_uri: rico:RecordSet
@ -616,7 +633,7 @@ classes:
Date when this collection ended at current custodian (if transferred/deaccessioned).
range: date
custodian_types:
equals_expression: '["G", "L", "A", "M", "B", "H"]'
equals_expression: '["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType", "hc:MuseumType", "hc:BioCustodianType", "hc:HolySacredSiteType"]'
custodian_types_rationale:
equals_string: 'Collection is relevant to institutions that hold catalogued
collections: Galleries, Libraries, Archives, Museums, Botanical gardens/zoos,
@ -681,65 +698,9 @@ classes:
digitization_status: PARTIAL
part_of_custodian_collection: https://nde.nl/ontology/hc/custodian-collection/nationaal-archief
description: VOC archival fonds at Nationaal Archief
slots:
collection_id:
description: Unique identifier for this collection
range: uriorcurie
collection_type_ref:
description: Classification from CollectionType hierarchy
range: CollectionType
rico_record_set_type:
description: RiC-O RecordSetType vocabulary mapping
range: uriorcurie
extent_items:
description: Numeric item count
range: integer
subject_areas:
description: Thematic subjects
range: string
multivalued: true
provenance_statement:
description: Provenance narrative
range: string
custodial_history:
description: Chain of custody
range: string
multivalued: true
acquisition_source:
description: From whom collection was acquired
range: string
access_policy_ref:
description: Access policy governing collection
range: AccessPolicy
arrangement:
description: Intellectual arrangement system
range: string
finding_aids:
description: Finding aids describing this collection
range: FindingAid
multivalued: true
slot_uri: rico:isDescribedBy
digital_surrogate_url:
description: URL to digital surrogate
range: uri
multivalued: true
parent_collection:
description: Parent collection (hierarchical)
range: Collection
sub_collections:
description: Child collections (hierarchical)
range: Collection
multivalued: true
items:
description: Individual ExhibitedObject items within this collection
range: ExhibitedObject
multivalued: true
slot_uri: rico:hasOrHadConstituent
curation_activities:
description: Ongoing curation activities performed on this collection
range: CurationActivity
multivalued: true
slot_uri: crm:P147i_was_curated_by
part_of_custodian_collection:
description: Link to abstract CustodianCollection
range: CustodianCollection
# NOTE: All slots are defined in centralized modules/slots/ files
# Slots used by this class: collection_id, collection_type_ref, rico_record_set_type,
# extent_items, subject_areas, provenance_statement, custodial_history, acquisition_source,
# access_policy_ref, arrangement, finding_aids, digital_surrogate_url, parent_collection,
# sub_collections, items, curation_activities, part_of_custodian_collection
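The imports hunk at the top of this file drops duplicated slot imports (`collection_name`, `collection_description`, `extent`, `temporal_coverage`, and others) and alphabetizes the `../slots/` entries. A minimal sketch of that normalization, assuming PyYAML is available and that class imports precede slot imports as in the result above:

```python
import yaml

def tidy_imports(schema_text: str) -> list:
    """Deduplicate and sort ../slots/ imports in a LinkML schema header."""
    doc = yaml.safe_load(schema_text)
    imports = doc.get("imports", [])
    classes = [i for i in imports if not i.startswith("../slots/")]
    # A set drops duplicates; sorted() yields the alphabetized slot list.
    slots = sorted({i for i in imports if i.startswith("../slots/")})
    return classes + slots

sample = """
imports:
- ./FindingAid
- ../slots/extent
- ../slots/collection_name
- ../slots/extent
"""
print(tidy_imports(sample))
# → ['./FindingAid', '../slots/collection_name', '../slots/extent']
```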

View file

@ -89,7 +89,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ComarcalArchiveRecordSetType classifies collections held by
ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ComarcalAdministrationFonds records are held by ARCHIVE (A)
type custodians
@ -58,7 +58,7 @@ classes:
rico_has_or_had_holder_note:
equals_string: This RecordSetType is typically held by ComarcalArchive custodians.
Inverse of rico:isOrWasHolderOf.
LocalHistoryCollection:
ComarcalHistoryCollection:
is_a: ComarcalArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for Regional historical documentation.\n\n\
@ -80,9 +80,9 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: LocalHistoryCollection records are held by ARCHIVE (A) type
equals_string: ComarcalHistoryCollection records are held by ARCHIVE (A) type
custodians
specificity_annotation:
range: SpecificityAnnotation
@ -91,7 +91,7 @@ classes:
range: TemplateSpecificityScores
inlined: true
rico_record_set_type:
equals_string: LocalHistoryCollection
equals_string: ComarcalHistoryCollection
rico_organizational_principle:
equals_string: collection
rico_organizational_principle_uri:

View file

@ -368,10 +368,10 @@ classes:
- value: Corporate events, Weddings, Conference space
description: Company museum activities
custodian_types:
equals_expression: '["C"]'
equals_expression: '["hc:CommercialOrganizationType"]'
custodian_types_rationale:
equals_string: CommercialOrganizationType represents for-profit commercial
heritage custodians (corporate archives, company museums) - maps to CORPORATION
heritage custodians (corporate archives, company museums) - maps to CommercialOrganizationType
type (C)
specificity_annotation:
range: SpecificityAnnotation

View file

@ -96,7 +96,7 @@ classes:
- rico:RecordSetType
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CommunityArchiveRecordSetType classifies collections held by
ARCHIVE (A) type custodians

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CommunityOrganizationFonds records are held by ARCHIVE (A)
type custodians
@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: OralHistoryCollection records are held by ARCHIVE (A) type
custodians
@ -123,7 +123,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: LocalEventDocumentation records are held by ARCHIVE (A) type
custodians

View file

@ -51,7 +51,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "C"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType"]'
custodian_types_rationale:
equals_string: Company archive record set types are held by ARCHIVE (A) or
CORPORATION (C) type custodians
@ -114,12 +114,12 @@ classes:
- wd:Q1643722
close_mappings:
- skos:Concept
- CouncilGovernanceFonds
see_also:
- CompanyArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Fonds
- CompanyArchives
- CouncilGovernanceFonds
slots:
- custodian_types
- custodian_types_rationale
@ -127,7 +127,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "C"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType"]'
custodian_types_rationale:
equals_string: Corporate governance fonds are held by ARCHIVE (A) or CORPORATION
(C) type custodians
@ -223,7 +223,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "C", "R"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType", "hc:ResearchOrganizationType"]'
custodian_types_rationale:
equals_string: Product development collections may be held by ARCHIVE (A),
CORPORATION (C), or RESEARCH_CENTER (R)
@ -318,7 +318,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "C", "M"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType", "hc:MuseumType"]'
custodian_types_rationale:
equals_string: Marketing archives may be held by ARCHIVE (A), CORPORATION
(C), or MUSEUM (M) for design/advertising collections
@ -397,11 +397,11 @@ classes:
- wd:Q185583
close_mappings:
- skos:Concept
- StudentRecordSeries
see_also:
- CompanyArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Series
- StudentRecordSeries
slots:
- custodian_types
- custodian_types_rationale
@ -409,7 +409,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "C"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType"]'
custodian_types_rationale:
equals_string: Personnel records series are held by ARCHIVE (A) or CORPORATION
(C) type custodians
@ -504,7 +504,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A", "C", "L"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType", "hc:LibraryType"]'
custodian_types_rationale:
equals_string: Corporate publications may be held by ARCHIVE (A), CORPORATION
(C), or LIBRARY (L)

View file

@ -15,8 +15,12 @@ imports:
- ./Department
- ./OrganizationBranch
- ./CompanyArchiveRecordSetTypes
- ../slots/archive_branches
- ../slots/archive_department_of
- ../slots/holds_record_set_types
- ../slots/parent_corporation
- ../slots/type_scope
- ../slots/wikidata_entity
classes:
CompanyArchives:

View file

@ -414,7 +414,7 @@ classes:
- value: Treatment coincided with preparation for 1995 exhibition
- value: Discovery of Vermeer's signature during cleaning
custodian_types:
equals_expression: '["G", "M", "A", "L", "R", "H", "B"]'
equals_expression: '["hc:GalleryType", "hc:MuseumType", "hc:ArchiveOrganizationType", "hc:LibraryType", "hc:ResearchOrganizationType", "hc:HolySacredSiteType", "hc:BioCustodianType"]'
custodian_types_rationale:
equals_string: |
ConservationRecord is relevant to all custodian types managing physical collections:

View file

@ -81,9 +81,9 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CountyRecordOffice is an archival institution - maps to ARCHIVE
equals_string: CountyRecordOffice is an archival institution - maps to ArchiveOrganizationType
(A)
specificity_annotation:
range: SpecificityAnnotation

View file

@ -68,9 +68,9 @@ classes:
- commercial
description: General court archive covering main jurisdictions
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CourtRecords is an archival institution - maps to ARCHIVE (A)
equals_string: CourtRecords is an archival institution - maps to ArchiveOrganizationType (A)
specificity_annotation:
range: SpecificityAnnotation
inlined: true

View file

@ -88,7 +88,7 @@ classes:
range: string
multivalued: true
custodian_types:
equals_expression: '["X"]'
equals_expression: '["hc:MixedCustodianType"]'
custodian_types_rationale:
equals_string: CulturalInstitution is a broad type encompassing multiple heritage
categories (G, L, A, M, etc.). Maps to MIXED (X) as it spans categories.

View file

@ -525,7 +525,7 @@ classes:
- value: condition-assessment
description: SPECTRUM Condition Checking
custodian_types:
equals_expression: '["G", "L", "A", "M", "R", "H", "B"]'
equals_expression: '["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType", "hc:MuseumType", "hc:ResearchOrganizationType", "hc:HolySacredSiteType", "hc:BioCustodianType"]'
custodian_types_rationale:
equals_string: |
CurationActivity is relevant to ALL heritage custodian types that manage collections:

View file

@ -99,10 +99,10 @@ classes:
multivalued: true
required: false
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: Current Archive is an archive for active/current records -
maps to ARCHIVE (A)
maps to ArchiveOrganizationType (A)
specificity_annotation:
range: SpecificityAnnotation
inlined: true
@ -141,20 +141,7 @@ classes:
creating_organization: Ministry of Finance
retention_schedule: Finance Records Schedule 2023
description: Current archive for ministry records
slots:
creating_organization:
description: Organization creating the records
range: string
transfer_policy:
description: Policy for transferring to permanent archive
range: string
has_narrower_instance:
slot_uri: skos:narrowerTransitive
description: |
Links archive TYPE to specific CustodianArchive INSTANCES.
SKOS narrowerTransitive for type-to-instance relationship.
range: CustodianArchive
multivalued: true
CurrentArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by CurrentArchive custodians.
@ -170,23 +157,29 @@ slots:
- CurrentArchive
- rico:RecordSetType
annotations:
custodian_types: '["A"]'
custodian_types: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale: CurrentArchiveRecordSetType classifies collections
held by ARCHIVE (A) type custodians
held by ArchiveOrganizationType custodians
linked_custodian_type: CurrentArchive
dual_class_pattern: collection_type
specificity_score: 0.7
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.675099Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity:
archive_search: 0.2
museum_search: 0.75
library_search: 0.75
collection_discovery: 0.75
person_research: 0.75
location_browse: 0.75
identifier_lookup: 0.75
organizational_change: 0.75
digital_platform: 0.75
general_heritage: 0.75
template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
slots:
creating_organization:
description: Organization creating the records
range: string
transfer_policy:
description: Policy for transferring to permanent archive
range: string
has_narrower_instance:
slot_uri: skos:narrowerTransitive
description: |
Links archive TYPE to specific CustodianArchive INSTANCES.
SKOS narrowerTransitive for type-to-instance relationship.
range: CustodianArchive
multivalued: true
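The `CurrentArchiveRecordSetType` hunk above replaces an inline YAML mapping for `template_specificity` with a JSON string inside `annotations`. A consumer reading the generated schema would need to decode that string; a sketch under the assumption that the annotation may appear either as a JSON string or as a plain mapping (the helper name is illustrative):

```python
import json

def template_scores(annotations: dict) -> dict:
    """Recover per-template specificity scores from an annotations block."""
    raw = annotations.get("template_specificity", "{}")
    # Annotations serialized by this commit are JSON strings; older data
    # may still carry a YAML mapping, so accept both shapes.
    return json.loads(raw) if isinstance(raw, str) else dict(raw)

ann = {"template_specificity": '{"archive_search": 0.2, "museum_search": 0.75}'}
scores = template_scores(ann)
# scores["archive_search"] == 0.2
```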

View file

@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: ActiveRecordsFonds records are held by ARCHIVE (A) type custodians
specificity_annotation:

View file

@ -560,7 +560,7 @@ classes:
- value: wikidata:Q3621648
description: Current archive / active records
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: Custodian archive is an archive type (A).
specificity_annotation:
@ -619,6 +619,33 @@ classes:
schedule.
refers_to_custodian: https://nde.nl/ontology/hc/nl-na
description: Government records in active processing (9 years after accession)
CustodianArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by CustodianArchive custodians.
**Dual-Class Pattern**:
This class represents the COLLECTION type (rico:RecordSetType).
For the custodian organization type, see `CustodianArchive`.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
- type_scope
see_also:
- CustodianArchive
- rico:RecordSetType
annotations:
custodian_types: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale: CustodianArchiveRecordSetType classifies collections
held by ArchiveOrganizationType custodians
linked_custodian_type: CustodianArchive
dual_class_pattern: collection_type
specificity_score: 0.7
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.676176Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
slots:
archive_name:
description: Name/title for operational archive accession
@@ -677,38 +704,3 @@ slots:
\ for instance-to-type relationship.\nValues: CurrentArchive (Q3621648), DepositArchive\
\ (Q244904), \nHistoricalArchive (Q3621673).\n"
range: uriorcurie
CustodianArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by CustodianArchive custodians.
**Dual-Class Pattern**:
This class represents the COLLECTION type (rico:RecordSetType).
For the custodian organization type, see `CustodianArchive`.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
- type_scope
see_also:
- CustodianArchive
- rico:RecordSetType
annotations:
custodian_types: '["A"]'
custodian_types_rationale: CustodianArchiveRecordSetType classifies collections
held by ARCHIVE (A) type custodians
linked_custodian_type: CustodianArchive
dual_class_pattern: collection_type
specificity_score: 0.7
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.676176Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity:
archive_search: 0.2
museum_search: 0.75
library_search: 0.75
collection_discovery: 0.75
person_research: 0.75
location_browse: 0.75
identifier_lookup: 0.75
organizational_change: 0.75
digital_platform: 0.75
general_heritage: 0.75

View file

@@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: CustodialRecordsFonds records are held by ARCHIVE (A) type
custodians

View file

@@ -112,8 +112,7 @@ classes:
description: Confidence in observation accuracy
range: ConfidenceMeasure
custodian_types:
equals_expression: '["G", "L", "A", "M", "O", "R", "C", "U", "B", "E", "S",
"F", "I", "X", "P", "H", "D", "N", "T"]'
equals_expression: '["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType", "hc:MuseumType", "hc:OfficialInstitutionType", "hc:ResearchOrganizationType", "hc:CommercialOrganizationType", "hc:UnspecifiedType", "hc:BioCustodianType", "hc:EducationProviderType", "hc:HeritageSocietyType", "hc:FeatureCustodianType", "hc:IntangibleHeritageGroupType", "hc:MixedCustodianType", "hc:PersonalCollectionType", "hc:HolySacredSiteType", "hc:DigitalPlatformType", "hc:NonProfitType", "hc:TasteScentHeritageType"]'
custodian_types_rationale:
equals_string: CustodianObservation is universal - source-based evidence can
apply to any heritage custodian type

View file
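Editorial note: the CustodianObservation hunk above replaces the 19 single-letter custodian codes with `hc:` CURIEs, listed in the same order in the old and new `equals_expression` values. For migrating other data that still carries the letter codes, the old-to-new mapping can be reconstructed by zipping the two lists; a sketch under the assumption that the positional correspondence shown in the diff is authoritative (the names `LEGACY`, `CURIES`, and `CODE_TO_CURIE` are illustrative):

```python
# Legacy single-letter custodian codes (old equals_expression) and their
# hc: CURIE replacements (new equals_expression), in the order shown above.
LEGACY = ["G", "L", "A", "M", "O", "R", "C", "U", "B", "E", "S",
          "F", "I", "X", "P", "H", "D", "N", "T"]
CURIES = ["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType",
          "hc:MuseumType", "hc:OfficialInstitutionType", "hc:ResearchOrganizationType",
          "hc:CommercialOrganizationType", "hc:UnspecifiedType", "hc:BioCustodianType",
          "hc:EducationProviderType", "hc:HeritageSocietyType", "hc:FeatureCustodianType",
          "hc:IntangibleHeritageGroupType", "hc:MixedCustodianType",
          "hc:PersonalCollectionType", "hc:HolySacredSiteType", "hc:DigitalPlatformType",
          "hc:NonProfitType", "hc:TasteScentHeritageType"]

# Positional zip: each letter maps to the CURIE at the same index.
CODE_TO_CURIE = dict(zip(LEGACY, CURIES))

print(CODE_TO_CURIE["A"])  # hc:ArchiveOrganizationType
```

A table like this also makes the remaining stale rationale strings ("ARCHIVE (A)") mechanically checkable against the new CURIE values.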

@@ -8,11 +8,10 @@ imports:
- ../slots/created
- ../slots/modified
- ../slots/class_metadata_slots
- ../slots/wikidata_entity
slots:
type_id:
range: uriorcurie
wikidata_entity:
range: string
type_label:
range: string
type_description:

View file

@@ -104,7 +104,7 @@ classes:
range: AccessPolicy
required: true
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DarkArchive is a type of archive with restricted access - maps
to ARCHIVE type (A)
@@ -168,17 +168,6 @@ classes:
access_level: CLOSED
restriction_reason: Donor restriction - 50 year embargo
description: Embargoed materials dark archive
slots:
access_trigger_events:
description: Events that trigger access
range: string
multivalued: true
preservation_purpose:
description: Purpose for dark archive
range: string
refers_to_access_policy:
description: Link to access policy
range: AccessPolicy
DarkArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by DarkArchive custodians.
@@ -194,7 +183,7 @@ slots:
- DarkArchive
- rico:RecordSetType
annotations:
custodian_types: '["A"]'
custodian_types: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale: DarkArchiveRecordSetType classifies collections held
by ARCHIVE (A) type custodians
linked_custodian_type: DarkArchive
@@ -203,14 +192,17 @@ slots:
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.676643Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity:
archive_search: 0.2
museum_search: 0.75
library_search: 0.75
collection_discovery: 0.75
person_research: 0.75
location_browse: 0.75
identifier_lookup: 0.75
organizational_change: 0.75
digital_platform: 0.75
general_heritage: 0.75
template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
slots:
access_trigger_events:
description: Events that trigger access
range: string
multivalued: true
preservation_purpose:
description: Purpose for dark archive
range: string
refers_to_access_policy:
description: Link to access policy
range: AccessPolicy

View file

@@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: PreservationCopyCollection records are held by ARCHIVE (A)
type custodians
@@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DigitalPreservationFonds records are held by ARCHIVE (A) type
custodians

View file

@@ -139,9 +139,9 @@ classes:
minimum_cardinality: 1
maximum_cardinality: 1
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DepartmentalArchives is an archival institution - maps to ARCHIVE
equals_string: DepartmentalArchives is an archival institution - maps to ArchiveOrganizationType
(A)
specificity_annotation:
range: SpecificityAnnotation
@@ -206,9 +206,9 @@ classes:
wikidata_equivalent:
equals_string: Q2860456
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DepartmentalArchives is an archival institution - maps to ARCHIVE
equals_string: DepartmentalArchives is an archival institution - maps to ArchiveOrganizationType
(A)
wikidata_alignment:
range: WikidataAlignment

View file

@@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DepartmentAdministrationFonds records are held by ARCHIVE (A)
type custodians
@@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: PrefectureSeries records are held by ARCHIVE (A) type custodians
specificity_annotation:

View file

@@ -117,9 +117,9 @@ classes:
- permanent archive transfer
- depositor return
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DepositArchive is an archival institution - maps to ARCHIVE
equals_string: DepositArchive is an archival institution - maps to ArchiveOrganizationType
(A)
specificity_annotation:
range: SpecificityAnnotation
@@ -172,6 +172,33 @@ classes:
- secure destruction
- transfer to national archives
description: Federal records center deposit archive
DepositArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by DepositArchive custodians.
**Dual-Class Pattern**:
This class represents the COLLECTION type (rico:RecordSetType).
For the custodian organization type, see `DepositArchive`.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
- type_scope
see_also:
- DepositArchive
- rico:RecordSetType
annotations:
custodian_types: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale: DepositArchiveRecordSetType classifies collections
held by ArchiveOrganizationType custodians
linked_custodian_type: DepositArchive
dual_class_pattern: collection_type
specificity_score: 0.7
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.677478Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
slots:
operates_storage_types:
description: Storage types operated by deposit archive
@@ -188,38 +215,3 @@ slots:
description: Disposition services provided
range: string
multivalued: true
DepositArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by DepositArchive custodians.
**Dual-Class Pattern**:
This class represents the COLLECTION type (rico:RecordSetType).
For the custodian organization type, see `DepositArchive`.
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
- type_scope
see_also:
- DepositArchive
- rico:RecordSetType
annotations:
custodian_types: '["A"]'
custodian_types_rationale: DepositArchiveRecordSetType classifies collections
held by ARCHIVE (A) type custodians
linked_custodian_type: DepositArchive
dual_class_pattern: collection_type
specificity_score: 0.7
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.677478Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity:
archive_search: 0.2
museum_search: 0.75
library_search: 0.75
collection_discovery: 0.75
person_research: 0.75
location_browse: 0.75
identifier_lookup: 0.75
organizational_change: 0.75
digital_platform: 0.75
general_heritage: 0.75

View file

@@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DepositedRecordsFonds records are held by ARCHIVE (A) type
custodians

View file

@@ -167,10 +167,10 @@ classes:
- JPEG2000
- XML
custodian_types:
equals_expression: '["A", "D"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:DigitalPlatformType"]'
custodian_types_rationale:
equals_string: DigitalArchive bridges archive and digital platform types -
maps to ARCHIVE (A) and DIGITAL_PLATFORM (D)
maps to ArchiveOrganizationType (A) and DIGITAL_PLATFORM (D)
specificity_annotation:
range: SpecificityAnnotation
inlined: true
@@ -220,21 +220,6 @@ classes:
- JPEG2000
- WARC
description: Government digital archive with mixed content
slots:
operates_platform_types:
description: Digital platform types operated
range: DigitalPlatformType
multivalued: true
content_origin:
description: Origin of content (born_digital, digitized, mixed)
range: string
access_interface_url:
description: URL of access interface
range: uri
supported_formats:
description: Supported file formats
range: string
multivalued: true
DigitalArchiveRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by DigitalArchive custodians.
@@ -250,23 +235,30 @@ slots:
- DigitalArchive
- rico:RecordSetType
annotations:
custodian_types: '["A"]'
custodian_types: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale: DigitalArchiveRecordSetType classifies collections
held by ARCHIVE (A) type custodians
held by ArchiveOrganizationType custodians
linked_custodian_type: DigitalArchive
dual_class_pattern: collection_type
specificity_score: 0.7
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.677967Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity:
archive_search: 0.2
museum_search: 0.75
library_search: 0.75
collection_discovery: 0.75
person_research: 0.75
location_browse: 0.75
identifier_lookup: 0.75
organizational_change: 0.75
digital_platform: 0.75
general_heritage: 0.75
template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
slots:
operates_platform_types:
description: Digital platform types operated
range: DigitalPlatformType
multivalued: true
content_origin:
description: Origin of content (born_digital, digitized, mixed)
range: string
access_interface_url:
description: URL of access interface
range: uri
supported_formats:
description: Supported file formats
range: string
multivalued: true

View file

@@ -37,7 +37,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DigitalObjectCollection records are held by ARCHIVE (A) type
custodians
@@ -80,7 +80,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: DigitizedCollection records are held by ARCHIVE (A) type custodians
specificity_annotation:
@@ -122,7 +122,7 @@ classes:
- template_specificity
slot_usage:
custodian_types:
equals_expression: '["A"]'
equals_expression: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale:
equals_string: WebArchiveCollection records are held by ARCHIVE (A) type custodians
specificity_annotation:

View file

@@ -158,7 +158,7 @@ classes:
examples:
- value: 2-3 business days
custodian_types:
equals_expression: '["A", "D"]'
equals_expression: '["hc:ArchiveOrganizationType", "hc:DigitalPlatformType"]'
custodian_types_rationale:
equals_string: Digital archives combine archive (A) and digital platform (D).
specificity_annotation:
@@ -209,20 +209,6 @@ classes:
access_application_url: https://archive.example.org/apply
typical_approval_time: 5-10 business days
description: Dim archive with researcher access only
slots:
default_access_policy:
description: Default access policy for dim archive
range: AccessPolicy
restriction_categories:
description: Categories of restrictions applied
range: string
multivalued: true
access_application_url:
description: URL for access application
range: uri
typical_approval_time:
description: Typical time for approval
range: string
DimArchivesRecordSetType:
description: |
A rico:RecordSetType for classifying collections held by DimArchives custodians.
@@ -238,7 +224,7 @@ slots:
- DimArchives
- rico:RecordSetType
annotations:
custodian_types: '["A"]'
custodian_types: '["hc:ArchiveOrganizationType"]'
custodian_types_rationale: DimArchivesRecordSetType classifies collections held
by ARCHIVE (A) type custodians
linked_custodian_type: DimArchives
@@ -247,14 +233,20 @@ slots:
specificity_rationale: Type taxonomy class.
specificity_annotation_timestamp: '2026-01-06T00:26:29.678263Z'
specificity_annotation_agent: opencode-claude-sonnet-4
template_specificity:
archive_search: 0.2
museum_search: 0.75
library_search: 0.75
collection_discovery: 0.75
person_research: 0.75
location_browse: 0.75
identifier_lookup: 0.75
organizational_change: 0.75
digital_platform: 0.75
general_heritage: 0.75
template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
slots:
default_access_policy:
description: Default access policy for dim archive
range: AccessPolicy
restriction_categories:
description: Categories of restrictions applied
range: string
multivalued: true
access_application_url:
description: URL for access application
range: uri
typical_approval_time:
description: Typical time for approval
range: string

Some files were not shown because too many files have changed in this diff.