Fix LinkML URI conflicts and generate RDF outputs
- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/

Files: 1,151 changed (includes prior CustodianType migration)
This commit is contained in:
parent 6c6810fa43
commit 98c42bf272

1132 changed files with 47267 additions and 18837 deletions
143  .opencode/RICO_RECORDSETTYPE_ALIGNMENT.md  Normal file

@@ -0,0 +1,143 @@
# RiC-O RecordSetType Alignment Rules

## RiC-O 1.1 Structure (Actual)

### rico:RecordSetType CLASS

**Location**: `RiC-O_1-1.rdf`, lines 29199-29252

```turtle
rico:RecordSetType a owl:Class ;
    rdfs:subClassOf rico:Type ;
    rdfs:comment "A broad categorization of the type of Record Set."@en .
```

This is a **class** meant to be instantiated with specific record set type concepts.

### RiC-O Provided Instances (Named Individuals)

RiC-O 1.1 provides **four named individuals** in the `recordSetTypes` vocabulary:

| Individual | URI | Description |
|------------|-----|-------------|
| **Fonds** | `rico-rst:Fonds` | Organic whole of records from one creator |
| **Series** | `rico-rst:Series` | Documents arranged by filing system |
| **File** | `rico-rst:File` | Unit of documents grouped together |
| **Collection** | `rico-rst:Collection` | Artificial assemblage without provenance |

**Key**: These are **instances** of BOTH `rico:RecordSetType` AND `skos:Concept`:

```turtle
rico-rst:Fonds a rico:RecordSetType, skos:Concept ;
    skos:inScheme rico-rst: ;
    skos:topConceptOf rico-rst: ;
    skos:definition "The whole of the records... organically created..."@en .
```

### Full URIs

- **Namespace**: `https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#`
- **Prefix**: `rico-rst:` (commonly used)
- `rico-rst:Fonds` = `https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds`

## Our Approach: Classes as RecordSetType Subclasses

We create **LinkML classes** with `class_uri: rico:RecordSetType`. These classes are themselves record set type definitions that can be instantiated.

### Why Classes Instead of Instances?

1. **Extensibility**: Classes allow slots and inheritance patterns
2. **LinkML idiom**: LinkML works with class definitions
3. **Validation**: Classes enable property constraints
4. **Documentation**: Rich documentation in class definitions

### Correct Mapping Predicates

Since `rico-rst:Fonds`, `rico-rst:Series`, `rico-rst:Collection`, and `rico-rst:File` are **individuals** (not classes), we cannot use them in `broad_mappings` (which implies a class hierarchy).

**Instead, use**:

| Predicate | Use For |
|-----------|---------|
| `related_mappings` | Conceptual relationship to RiC-O individuals |
| `see_also` | Reference to related RiC-O concepts |
| Custom annotation | `rico_organizational_principle` with value `fonds`, `series`, `collection`, or `file` |

### Correct Pattern

```yaml
UniversityAdministrativeFonds:
  is_a: AcademicArchiveRecordSetType
  class_uri: rico:RecordSetType
  description: |
    A rico:RecordSetType for university administrative records organized as a fonds.

    **RiC-O Alignment**:
    This class is a specialized rico:RecordSetType. Records classified with this
    type follow the fonds organizational principle as defined by rico-rst:Fonds
    (respect des fonds / provenance-based organization).

  # CORRECT: Use related_mappings for conceptual relationship to individual
  related_mappings:
    - https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds

  # CORRECT: Use see_also for reference
  see_also:
    - rico:RecordSetType
    - https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds

  annotations:
    # CORRECT: Document organizational principle as annotation
    rico_organizational_principle: fonds
    rico_organizational_principle_uri: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds
    rico_note: >-
      This RecordSetType classifies record sets following the fonds principle.
      The rico-rst:Fonds individual defines the standard archival concept of fonds.
```

### INCORRECT Pattern (Do Not Use)

```yaml
# WRONG - rico:Fonds is NOT a class, so it cannot be used in broad_mappings
broad_mappings:
  - rico:Fonds  # ❌ This is an individual, not a class!

# WRONG - Using a shorthand prefix that doesn't resolve
broad_mappings:
  - rico-rst:Fonds  # ❌ Prefix not defined in LinkML
```
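Whether a mapping target is legal can also be checked mechanically. A minimal sketch of the idea in plain Python, with the RiC-O 1.1 typing facts entered by hand (a real check would parse the RDF file; the helper names here are hypothetical, not part of the project):

```python
# Typing facts from RiC-O 1.1, entered by hand for illustration.
RDF_TYPES = {
    "rico:RecordSetType": {"owl:Class"},
    "rico-rst:Fonds": {"rico:RecordSetType", "skos:Concept"},
    "rico-rst:Series": {"rico:RecordSetType", "skos:Concept"},
    "rico-rst:Collection": {"rico:RecordSetType", "skos:Concept"},
    "rico-rst:File": {"rico:RecordSetType", "skos:Concept"},
}

def is_owl_class(term: str) -> bool:
    """True if the term is declared as an owl:Class."""
    return "owl:Class" in RDF_TYPES.get(term, set())

def check_broad_mappings(targets):
    """broad_mappings implies a class hierarchy, so every target must be a class.

    Returns the offending targets (individuals) that must be moved to
    related_mappings / see_also instead.
    """
    return [t for t in targets if not is_owl_class(t)]

# check_broad_mappings(["rico-rst:Fonds"]) flags the individual;
# check_broad_mappings(["rico:RecordSetType"]) passes cleanly.
```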
## Prefixes to Include

When referencing the RiC-O recordSetTypes vocabulary:

```yaml
prefixes:
  rico: https://www.ica.org/standards/RiC/ontology#
  rico-rst: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#
```

## Summary

| RiC-O Concept | Type | Use In |
|---------------|------|--------|
| `rico:RecordSetType` | CLASS | `class_uri`, `exact_mappings` |
| `rico-rst:Fonds` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
| `rico-rst:Series` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
| `rico-rst:Collection` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |
| `rico-rst:File` | INDIVIDUAL | `related_mappings`, `see_also`, annotation |

## Files to Update

All `*RecordSetTypes.yaml` files need correction:

- `AcademicArchiveRecordSetTypes.yaml`
- `MunicipalArchiveRecordSetTypes.yaml`
- `ChurchArchiveRecordSetTypes.yaml`
- `CompanyArchiveRecordSetTypes.yaml`
- `RegionalArchiveRecordSetTypes.yaml`
- (and any future files)

---

**Created**: 2026-01-05
**Agent**: opencode-claude-sonnet-4
317  .opencode/rules/slot-centralization-and-semantic-uri-rule.md  Normal file

@@ -0,0 +1,317 @@
# Rule 38: Slot Centralization and Semantic URI Requirements

🚨 **CRITICAL**: All LinkML slots MUST be centralized in `schemas/20251121/linkml/modules/slots/` and MUST have semantically sound `slot_uri` predicates from base ontologies.

---

## 1. Slot Centralization Is Mandatory

**Location**: All slot definitions MUST be in `schemas/20251121/linkml/modules/slots/`

**File Naming**: `{slot_name}.yaml` (snake_case)

**Import Pattern**: Classes import slots via relative imports:

```yaml
# In modules/classes/Collection.yaml
imports:
  - ../slots/collection_name
  - ../slots/collection_type_ref
  - ../slots/parent_collection
```

### Why Centralization?

1. **UML Visualization**: The frontend's schema service loads slots from `modules/slots/` to determine aggregation edges. Inline slots in class files are NOT properly parsed for visualization.

2. **Reusability**: Slots can be used by multiple classes without duplication.

3. **Semantic Consistency**: A single source of truth for slot semantics prevents drift.

4. **Maintainability**: Changes to slot semantics propagate automatically to all classes.

### Anti-Pattern: Inline Slot Definitions

```yaml
# ❌ WRONG - Slots defined inline in class file
classes:
  Collection:
    slots:
      - collection_name
      - parent_collection

slots:  # ← This section in a class file is WRONG
  collection_name:
    range: string
```

```yaml
# ✅ CORRECT - Slots imported from centralized files
# In modules/classes/Collection.yaml
imports:
  - ../slots/collection_name
  - ../slots/parent_collection

classes:
  Collection:
    slots:
      - collection_name
      - parent_collection
```

---

## 2. Every Slot MUST Have `slot_uri`

**`slot_uri`** provides the semantic meaning of the slot in a linked data context. It maps your slot to a predicate from an established ontology.

### Required Slot File Structure

```yaml
# Global slot definition for {slot_name}
# Used by: {list of classes}

id: https://nde.nl/ontology/hc/slot/{slot_name}
name: {slot_name}

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  # Add ontology prefixes as needed
  rico: https://www.ica.org/standards/RiC/ontology#
  schema: http://schema.org/
  skos: http://www.w3.org/2004/02/skos/core#

slots:
  {slot_name}:
    slot_uri: {ontology_prefix}:{predicate}  # ← REQUIRED
    description: |
      Description of the slot's semantic meaning.

      {OntologyName}: {predicate} - "{definition from ontology}"
    range: {ClassName or primitive}
    required: true/false
    multivalued: true/false
    # Optional mappings for additional semantic relationships
    exact_mappings:
      - schema:alternatePredicate
    close_mappings:
      - dct:relatedPredicate
    examples:
      - value: {example}
        description: {explanation}
```

### Ontology Sources for `slot_uri`

Consult these base ontology files in `/data/ontology/`:

| Ontology | File | Namespace | Use Cases |
|----------|------|-----------|-----------|
| **RiC-O** | `RiC-O_1-1.rdf` | `rico:` | Archival records, record sets, custody |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | `crm:` | Cultural heritage objects, events |
| **Schema.org** | `schemaorg.owl` | `schema:` | Web semantics, general properties |
| **SKOS** | `skos.rdf` | `skos:` | Labels, concepts, mappings |
| **Dublin Core** | `dublin_core_elements.rdf` | `dcterms:` | Metadata properties |
| **PROV-O** | `prov-o.ttl` | `prov:` | Provenance tracking |
| **PAV** | `pav.rdf` | `pav:` | Provenance, authoring, versioning |
| **TOOI** | `tooiont.ttl` | `tooi:` | Dutch government organizations |
| **CPOV** | `core-public-organisation-ap.ttl` | `cpov:` | EU public sector |
| **ORG** | `org.rdf` | `org:` | Organizations, units, roles |
| **FOAF** | `foaf.ttl` | `foaf:` | People, agents, social networks |
| **GLEIF** | `gleif_base.ttl` | `gleif_base:` | Legal entities |

### Example: Correct Slot with `slot_uri`

```yaml
# modules/slots/preferred_label.yaml
id: https://nde.nl/ontology/hc/slot/preferred_label
name: preferred_label_slot

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
  schema: http://schema.org/
  rdfs: http://www.w3.org/2000/01/rdf-schema#

slots:
  preferred_label:
    slot_uri: skos:prefLabel  # ← REQUIRED
    description: |
      The primary display name for this entity.

      SKOS: prefLabel - "A preferred lexical label for a resource."

      This is the CANONICAL name - the standardized label accepted by the
      entity itself for public representation.
    range: string
    required: false
    exact_mappings:
      - schema:name
      - rdfs:label
    examples:
      - value: "Rijksmuseum"
        description: Primary display name for the Rijksmuseum
```
---

## 3. Mappings Can Apply to Both Classes AND Slots

LinkML provides SKOS-based mapping predicates that work on **both classes and slots**:

| Mapping Type | Predicate | Use Case |
|--------------|-----------|----------|
| `exact_mappings` | `skos:exactMatch` | Identical meaning |
| `close_mappings` | `skos:closeMatch` | Very similar meaning |
| `related_mappings` | `skos:relatedMatch` | Semantically related |
| `narrow_mappings` | `skos:narrowMatch` | More specific |
| `broad_mappings` | `skos:broadMatch` | More general |

### When to Use Mappings vs. `slot_uri`

| Scenario | Use |
|----------|-----|
| **Primary semantic identity** | `slot_uri` (exactly one) |
| **Equivalent predicates in other ontologies** | `exact_mappings` (multiple allowed) |
| **Similar but not identical predicates** | `close_mappings` |
| **Related predicates with different scope** | `narrow_mappings` / `broad_mappings` |

### Example: Slot with Multiple Mappings

```yaml
slots:
  website:
    slot_uri: gleif_base:hasWebsite  # Primary predicate
    range: uri
    description: |
      Official website URL of the organization or entity.

      gleif_base:hasWebsite - "A website associated with something"
    exact_mappings:
      - schema:url  # Identical meaning in Schema.org
    close_mappings:
      - foaf:homepage  # Similar, but specifically the "main" page
```

### Example: Class with Multiple Mappings

```yaml
classes:
  Collection:
    class_uri: rico:RecordSet  # Primary class
    exact_mappings:
      - crm:E78_Curated_Holding  # CIDOC-CRM equivalent
    close_mappings:
      - bf:Collection  # BIBFRAME close match
    narrow_mappings:
      - edm:ProvidedCHO  # Europeana (narrower - cultural heritage objects)
```

---

## 4. Workflow for Creating a New Slot

### Step 1: Search Base Ontologies

Before creating a slot, search for existing predicates:

```bash
# Search for relevant predicates
rg "website|homepage|url" /data/ontology/*.ttl /data/ontology/*.rdf /data/ontology/*.owl

# Check a specific ontology
rg "rdfs:label|rdfs:comment" /data/ontology/schemaorg.owl | grep -i "name"
```

### Step 2: Document Ontology Alignment

In the slot file, document WHY you chose that predicate:

```yaml
slots:
  source_url:
    slot_uri: pav:retrievedFrom
    description: |
      URL of the web page from which data was retrieved.

      pav:retrievedFrom - "The URI from which the resource was retrieved."

      Chosen over:
      - schema:url (too generic - refers to the entity's URL, not the source)
      - dct:source (refers to the intellectual source, not the retrieval location)
      - prov:wasDerivedFrom (refers to entity derivation, not retrieval)
```

### Step 3: Create Centralized Slot File

```bash
# Create a new slot file
touch schemas/20251121/linkml/modules/slots/new_slot_name.yaml
```

### Step 4: Update Manifest

Run the manifest regeneration script, or add the file to the manifest manually:

```bash
cd schemas/20251121/linkml
python3 scripts/regenerate_manifest.py
```

### Step 5: Import in Class Files

Add the import to every class that uses this slot.

---

## 5. Validation Checklist

Before committing slot changes:

- [ ] Slot file is in `modules/slots/`
- [ ] Slot has a `slot_uri` pointing to an established ontology predicate
- [ ] Predicate is from `data/ontology/` files or standard vocabularies
- [ ] Description includes the ontology definition
- [ ] Rationale documented if multiple predicates were considered
- [ ] `exact_mappings`/`close_mappings` added for equivalent predicates
- [ ] Manifest updated to include the new slot file
- [ ] Classes using the slot have been updated with the import
- [ ] Frontend slot files synced: `frontend/public/schemas/20251121/linkml/modules/slots/`
---

## 6. Common Slot URI Mappings

| Slot Concept | Recommended `slot_uri` | Alternative Mappings |
|--------------|------------------------|----------------------|
| Preferred name | `skos:prefLabel` | `schema:name`, `rdfs:label` |
| Alternative names | `skos:altLabel` | `schema:alternateName` |
| Description | `dcterms:description` | `schema:description`, `rdfs:comment` |
| Identifier | `dcterms:identifier` | `schema:identifier` |
| Website URL | `gleif_base:hasWebsite` | `schema:url`, `foaf:homepage` |
| Source URL | `pav:retrievedFrom` | `prov:wasDerivedFrom` |
| Created date | `dcterms:created` | `schema:dateCreated`, `prov:generatedAtTime` |
| Modified date | `dcterms:modified` | `schema:dateModified` |
| Language | `schema:inLanguage` | `dcterms:language` |
| Part of | `dcterms:isPartOf` | `rico:isOrWasPartOf`, `schema:isPartOf` |
| Has part | `dcterms:hasPart` | `rico:hasOrHadPart`, `schema:hasPart` |
| Location | `schema:location` | `locn:address`, `crm:P53_has_former_or_current_location` |
| Start date | `schema:startDate` | `prov:startedAtTime`, `rico:hasBeginningDate` |
| End date | `schema:endDate` | `prov:endedAtTime`, `rico:hasEndDate` |
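These `slot_uri` choices are what ultimately surface in the generated JSON-LD context. A rough, hand-built illustration of that relationship (the actual context is produced by LinkML's generator, not by this code; the prefix IRIs shown are the standard SKOS and Dublin Core namespaces):

```python
import json

# Prefix declarations and slot -> predicate choices, as in the slot files.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
}

SLOT_URIS = {
    "preferred_label": "skos:prefLabel",
    "identifier": "dcterms:identifier",
}

def build_context(prefixes, slot_uris):
    """Combine prefix declarations with slot -> predicate mappings."""
    context = dict(prefixes)
    context.update({slot: {"@id": curie} for slot, curie in slot_uris.items()})
    return {"@context": context}

print(json.dumps(build_context(PREFIXES, SLOT_URIS), indent=2))
```

Each slot keys an `@id` entry pointing at its CURIE, so consumers expanding the JSON-LD resolve `preferred_label` to the full `skos:prefLabel` IRI.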
---

## See Also

- [LinkML slot_uri documentation](https://linkml.io/linkml-model/latest/docs/slot_uri/)
- [LinkML mappings documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [LinkML URIs and Mappings guide](https://linkml.io/linkml/schemas/uris-and-mappings.html)
- Rule 1: Ontology Files Are Your Primary Reference
- Rule 0: LinkML Schemas Are the Single Source of Truth

---

**Version**: 1.0.0
**Created**: 2026-01-06
**Author**: OpenCODE
38  AGENTS.md

@@ -23,7 +23,7 @@ This is NOT a simple data extraction project. This is an **ontology engineering
## 🚨 CRITICAL RULES FOR ALL AGENTS

-This section summarizes 37 critical rules. Each rule has complete documentation in `.opencode/` files.
+This section summarizes 38 critical rules. Each rule has complete documentation in `.opencode/` files.

### Rule 0: LinkML Schemas Are the Single Source of Truth

@@ -734,6 +734,42 @@ classes:

---

### Rule 38: Slot Centralization and Semantic URI Requirements

🚨 **CRITICAL**: All LinkML slots MUST be centralized in `schemas/20251121/linkml/modules/slots/` and MUST have semantically sound `slot_uri` predicates from base ontologies.

**Key Requirements**:

1. **Centralization**: All slots MUST be defined in `modules/slots/`, never inline in class files
2. **slot_uri**: Every slot MUST have a `slot_uri` from base ontologies (`data/ontology/`)
3. **Mappings**: Use `exact_mappings`, `close_mappings`, `related_mappings`, `narrow_mappings`, and `broad_mappings` for additional semantic relationships

**Why This Matters**:

- **Frontend UML visualization** depends on centralized slots for edge rendering
- **Semantic URIs** enable linked data interoperability and RDF serialization
- **Mapping annotations** connect to SKOS-based vocabulary alignment standards

**Common slot_uri Sources**:

| Ontology | Prefix | Example Predicates |
|----------|--------|-------------------|
| SKOS | `skos:` | `prefLabel`, `altLabel`, `definition`, `note` |
| Schema.org | `schema:` | `name`, `description`, `url`, `dateCreated` |
| Dublin Core | `dcterms:` | `identifier`, `title`, `creator`, `date` |
| PROV-O | `prov:` | `wasGeneratedBy`, `wasAttributedTo`, `atTime` |
| RiC-O | `rico:` | `hasRecordSetType`, `isOrWasPartOf` |
| CIDOC-CRM | `crm:` | `P1_is_identified_by`, `P2_has_type` |

**Workflow for New Slots**:

1. Search `data/ontology/` for an existing predicate
2. Create a file in `modules/slots/` with `slot_uri`
3. Add mappings to related predicates in other ontologies
4. Update `manifest.json` with the new slot file

**See**: `.opencode/rules/slot-centralization-and-semantic-uri-rule.md` for complete documentation

---

## Appendix: Full Rule Content (No .opencode Equivalent)

The following rules have no separate .opencode file and are preserved in full:
2  apps/archief-assistent/node_modules/.tmp/tsconfig.app.tsbuildinfo  (generated, vendored)

@@ -1 +1 @@
-{"root":["../../src/app.tsx","../../src/main.tsx","../../src/vite-env.d.ts","../../src/components/changepassworddialog.tsx","../../src/components/sparqlexplorer.tsx","../../src/components/ontology/datamapviewer.tsx","../../src/components/ontology/linkmlschemaviewer.tsx","../../src/components/ontology/ontologyviewer.tsx","../../src/config/api.ts","../../src/context/authcontext.tsx","../../src/lib/semantic-cache.ts","../../src/lib/linkml/custodian-data-mappings.ts","../../src/lib/linkml/schema-loader.ts","../../src/lib/ontology/ontology-loader.ts","../../src/pages/browsepage.tsx","../../src/pages/chatpage.tsx","../../src/pages/loginpage.tsx","../../src/pages/mappage.tsx","../../src/pages/ontologypage.tsx","../../src/pages/rulespage.tsx","../../src/pages/statspage.tsx","../../src/services/authapi.ts"],"version":"5.9.3"}
+{"root":["../../src/app.tsx","../../src/main.tsx","../../src/vite-env.d.ts","../../src/components/changepassworddialog.tsx","../../src/components/debugpanel.tsx","../../src/components/sparqlexplorer.tsx","../../src/components/ontology/datamapviewer.tsx","../../src/components/ontology/linkmlschemaviewer.tsx","../../src/components/ontology/ontologyviewer.tsx","../../src/config/api.ts","../../src/context/authcontext.tsx","../../src/lib/semantic-cache.ts","../../src/lib/linkml/custodian-data-mappings.ts","../../src/lib/linkml/schema-loader.ts","../../src/lib/ontology/ontology-loader.ts","../../src/pages/browsepage.tsx","../../src/pages/chatpage.tsx","../../src/pages/loginpage.tsx","../../src/pages/mappage.tsx","../../src/pages/ontologypage.tsx","../../src/pages/rulespage.tsx","../../src/pages/statspage.tsx","../../src/services/authapi.ts"],"version":"5.9.3"}
1  apps/archief-assistent/node_modules/@types/d3  (generated, vendored, symbolic link)

@@ -0,0 +1 @@
+../../../../node_modules/.pnpm/@types+d3@7.4.3/node_modules/@types/d3

1  apps/archief-assistent/node_modules/d3  (generated, vendored, symbolic link)

@@ -0,0 +1 @@
+../../../node_modules/.pnpm/d3@7.9.0/node_modules/d3

1  apps/archief-assistent/node_modules/lucide-react  (generated, vendored, symbolic link)

@@ -0,0 +1 @@
+../../../node_modules/.pnpm/lucide-react@0.511.0_react@19.2.3/node_modules/lucide-react
@@ -28,9 +28,12 @@
     "react-markdown": "^10.1.0",
     "react-router-dom": "^7.9.6",
     "rehype-raw": "^7.0.0",
-    "remark-gfm": "^4.0.1"
+    "remark-gfm": "^4.0.1",
+    "lucide-react": "^0.511.0",
+    "d3": "^7.9.0"
   },
   "devDependencies": {
     "@types/d3": "^7.4.3",
     "@types/js-yaml": "^4.0.9",
     "@types/node": "^24.10.1",
     "@types/react": "^19.2.5",
675  apps/archief-assistent/src/components/DebugPanel.css  Normal file

@@ -0,0 +1,675 @@
/**
 * DebugPanel.css
 *
 * Styles for the enhanced debug panel with tabs for:
 * - Raw Results (JSON view)
 * - Knowledge Graph (D3 visualization)
 * - Embeddings (PCA projection)
 */

/* Main container */
.debug-panel {
  background: var(--color-surface, #1e1e1e);
  border: 1px solid var(--color-border, #333);
  border-radius: 8px;
  margin-top: 12px;
  overflow: hidden;
  font-size: 13px;
}

/* Tab navigation */
.debug-panel__tabs {
  display: flex;
  gap: 0;
  border-bottom: 1px solid var(--color-border, #333);
  background: var(--color-surface-elevated, #252525);
}

.debug-panel__tab {
  display: flex;
  align-items: center;
  gap: 6px;
  padding: 8px 14px;
  border: none;
  background: transparent;
  color: var(--color-text-secondary, #888);
  cursor: pointer;
  transition: all 0.15s ease;
  font-size: 12px;
  font-weight: 500;
  border-bottom: 2px solid transparent;
  margin-bottom: -1px;
}

.debug-panel__tab:hover {
  color: var(--color-text-primary, #fff);
  background: rgba(255, 255, 255, 0.05);
}

.debug-panel__tab--active {
  color: var(--color-accent, #3b82f6);
  border-bottom-color: var(--color-accent, #3b82f6);
  background: rgba(59, 130, 246, 0.08);
}

.debug-panel__tab svg {
  flex-shrink: 0;
}

/* Tab content */
.debug-panel__content {
  max-height: 350px;
  overflow-y: auto;
}

/* ============================================
   Raw Results Tab Styles
   ============================================ */

.debug-panel__raw-results {
  padding: 12px;
}

/* Toolbar */
.debug-panel__toolbar {
  display: flex;
  gap: 10px;
  align-items: center;
  margin-bottom: 12px;
  padding-bottom: 10px;
  border-bottom: 1px solid var(--color-border, #333);
}

.debug-panel__search {
  flex: 1;
  display: flex;
  align-items: center;
  gap: 6px;
  background: var(--color-surface-elevated, #252525);
  border: 1px solid var(--color-border, #333);
  border-radius: 6px;
  padding: 6px 10px;
}

.debug-panel__search svg {
  color: var(--color-text-secondary, #888);
  flex-shrink: 0;
}

.debug-panel__search-input {
  flex: 1;
  border: none;
  background: transparent;
  color: var(--color-text-primary, #fff);
  font-size: 12px;
  outline: none;
}

.debug-panel__search-input::placeholder {
  color: var(--color-text-secondary, #666);
}

.debug-panel__search-clear {
  display: flex;
  align-items: center;
  justify-content: center;
  width: 16px;
  height: 16px;
  border: none;
  background: var(--color-border, #444);
  color: var(--color-text-secondary, #888);
  border-radius: 50%;
  cursor: pointer;
  transition: all 0.15s ease;
}

.debug-panel__search-clear:hover {
  background: var(--color-text-secondary, #666);
  color: var(--color-text-primary, #fff);
}

/* Copy button */
.debug-panel__copy-btn {
  display: flex;
  align-items: center;
  gap: 6px;
  padding: 6px 12px;
  border: 1px solid var(--color-border, #444);
  background: var(--color-surface-elevated, #252525);
  color: var(--color-text-secondary, #aaa);
  border-radius: 6px;
  cursor: pointer;
  font-size: 12px;
  transition: all 0.15s ease;
}

.debug-panel__copy-btn:hover {
  background: rgba(255, 255, 255, 0.08);
  border-color: var(--color-text-secondary, #666);
  color: var(--color-text-primary, #fff);
}

.debug-panel__copy-btn--copied {
  border-color: #10b981;
  color: #10b981;
  background: rgba(16, 185, 129, 0.1);
}

/* Collapsible sections */
.debug-panel__section {
  margin-bottom: 10px;
  border: 1px solid var(--color-border, #333);
  border-radius: 6px;
  overflow: hidden;
}

.debug-panel__section-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  width: 100%;
  padding: 10px 12px;
  border: none;
  background: var(--color-surface-elevated, #252525);
  color: var(--color-text-primary, #fff);
  cursor: pointer;
  transition: background 0.15s ease;
}

.debug-panel__section-header:hover {
  background: rgba(255, 255, 255, 0.05);
}

.debug-panel__section-title {
  display: flex;
  align-items: center;
  gap: 8px;
  font-size: 12px;
  font-weight: 500;
}

.debug-panel__section-count {
  color: var(--color-text-secondary, #888);
  font-weight: 400;
  font-size: 11px;
}

/* JSON display */
.debug-panel__json {
  margin: 0;
  padding: 12px;
  background: var(--color-surface, #1a1a1a);
  font-family: 'JetBrains Mono', 'Fira Code', 'Consolas', monospace;
  font-size: 11px;
  line-height: 1.5;
  color: #93c5fd;
  white-space: pre-wrap;
  word-break: break-word;
  overflow-x: auto;
  max-height: 200px;
  overflow-y: auto;
}

/* Show all toggle */
.debug-panel__show-all {
  display: block;
  width: 100%;
  padding: 8px;
  border: 1px dashed var(--color-border, #444);
  background: transparent;
  color: var(--color-text-secondary, #888);
  border-radius: 6px;
  cursor: pointer;
  font-size: 12px;
  transition: all 0.15s ease;
}

.debug-panel__show-all:hover {
  border-color: var(--color-accent, #3b82f6);
  color: var(--color-accent, #3b82f6);
}

/* Empty state */
.debug-panel__empty {
  text-align: center;
  padding: 40px 20px;
  color: var(--color-text-secondary, #666);
  font-size: 13px;
}

/* ============================================
   Knowledge Graph Tab Styles
   ============================================ */

.debug-panel__graph {
  position: relative;
  padding: 12px;
}

.debug-panel__graph-stats {
  display: flex;
  gap: 16px;
  margin-bottom: 10px;
  font-size: 11px;
  color: var(--color-text-secondary, #888);
}

.debug-panel__graph-svg {
  width: 100%;
  height: 250px;
  background: var(--color-surface, #1a1a1a);
  border-radius: 6px;
  border: 1px solid var(--color-border, #333);
}

.debug-panel__node-info {
  position: absolute;
  bottom: 60px;
  left: 20px;
  display: flex;
  flex-direction: column;
  gap: 4px;
  background: rgba(30, 30, 30, 0.95);
  border: 1px solid var(--color-border, #444);
  border-radius: 6px;
  padding: 10px 14px;
  font-size: 12px;
  max-width: 200px;
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
}

.debug-panel__node-info strong {
  color: var(--color-text-primary, #fff);
  font-weight: 600;
}

.debug-panel__node-type {
  color: var(--color-accent, #3b82f6);
  font-size: 11px;
  text-transform: capitalize;
}

.debug-panel__node-score {
  color: var(--color-text-secondary, #888);
  font-size: 10px;
}

.debug-panel__graph-hint {
  text-align: center;
  margin-top: 8px;
  font-size: 11px;
  color: var(--color-text-secondary, #666);
}

/* ============================================
   Embeddings Tab Styles
   ============================================ */

.debug-panel__embeddings {
  position: relative;
  padding: 12px;
}

.debug-panel__embeddings-stats {
  display: flex;
  gap: 16px;
  margin-bottom: 10px;
  font-size: 11px;
  color: var(--color-text-secondary, #888);
}

.debug-panel__embeddings-canvas {
  width: 100%;
  height: 200px;
  border-radius: 6px;
  border: 1px solid var(--color-border, #333);
  cursor: crosshair;
}

.debug-panel__point-info {
  position: absolute;
  bottom: 60px;
  left: 20px;
  display: flex;
  flex-direction: column;
  gap: 4px;
  background: rgba(30, 30, 30, 0.95);
  border: 1px solid var(--color-border, #444);
  border-radius: 6px;
  padding: 10px 14px;
  font-size: 12px;
  max-width: 200px;
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
  pointer-events: none;
}

.debug-panel__point-info strong {
  color: var(--color-text-primary, #fff);
font-weight: 600;
|
||||
}
|
||||
|
||||
.debug-panel__point-type {
|
||||
color: #8b5cf6;
|
||||
font-size: 11px;
|
||||
text-transform: capitalize;
|
||||
}
|
||||
|
||||
.debug-panel__point-score {
|
||||
color: var(--color-text-secondary, #888);
|
||||
font-size: 10px;
|
||||
}
|
||||
|
||||
/* ============================================
|
||||
Graph Controls (Zoom, Cluster, Export)
|
||||
============================================ */
|
||||
|
||||
.debug-panel__graph-controls {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 10px;
|
||||
padding-bottom: 8px;
|
||||
border-bottom: 1px solid var(--color-border, #333);
|
||||
}
|
||||
|
||||
.debug-panel__graph-buttons {
|
||||
display: flex;
|
||||
gap: 4px;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.debug-panel__icon-btn {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
gap: 4px;
|
||||
padding: 6px 8px;
|
||||
border: 1px solid var(--color-border, #444);
|
||||
background: var(--color-surface-elevated, #252525);
|
||||
color: var(--color-text-secondary, #888);
|
||||
border-radius: 4px;
|
||||
cursor: pointer;
|
||||
font-size: 11px;
|
||||
transition: all 0.15s ease;
|
||||
}
|
||||
|
||||
.debug-panel__icon-btn:hover {
|
||||
background: rgba(255, 255, 255, 0.08);
|
||||
border-color: var(--color-text-secondary, #666);
|
||||
color: var(--color-text-primary, #fff);
|
||||
}
|
||||
|
||||
.debug-panel__icon-btn--active {
|
||||
background: rgba(59, 130, 246, 0.15);
|
||||
border-color: var(--color-accent, #3b82f6);
|
||||
color: var(--color-accent, #3b82f6);
|
||||
}
|
||||
|
||||
.debug-panel__export-group {
|
||||
display: flex;
|
||||
gap: 2px;
|
||||
margin-left: 8px;
|
||||
padding-left: 8px;
|
||||
border-left: 1px solid var(--color-border, #333);
|
||||
}
|
||||
|
||||
.debug-panel__zoom-level {
|
||||
font-family: 'JetBrains Mono', 'Fira Code', monospace;
|
||||
font-size: 10px;
|
||||
color: var(--color-text-secondary, #666);
|
||||
min-width: 40px;
|
||||
text-align: right;
|
||||
}
|
||||
|
||||
/* Node close button */
|
||||
.debug-panel__node-close {
|
||||
position: absolute;
|
||||
top: 4px;
|
||||
right: 4px;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
width: 18px;
|
||||
height: 18px;
|
||||
border: none;
|
||||
background: rgba(255, 255, 255, 0.1);
|
||||
color: var(--color-text-secondary, #888);
|
||||
border-radius: 50%;
|
||||
cursor: pointer;
|
||||
transition: all 0.15s ease;
|
||||
}
|
||||
|
||||
.debug-panel__node-close:hover {
|
||||
background: rgba(255, 255, 255, 0.2);
|
||||
color: var(--color-text-primary, #fff);
|
||||
}
|
||||
|
||||
.debug-panel__node-id {
|
||||
color: var(--color-text-secondary, #666);
|
||||
font-size: 9px;
|
||||
font-family: 'JetBrains Mono', 'Fira Code', monospace;
|
||||
word-break: break-all;
|
||||
}
|
||||
|
||||
/* Graph legend */
|
||||
.debug-panel__graph-legend {
|
||||
position: absolute;
|
||||
top: 58px;
|
||||
right: 20px;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 8px;
|
||||
background: rgba(30, 30, 30, 0.9);
|
||||
border: 1px solid var(--color-border, #333);
|
||||
border-radius: 4px;
|
||||
padding: 6px 10px;
|
||||
max-width: 180px;
|
||||
}
|
||||
|
||||
.debug-panel__legend-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 4px;
|
||||
font-size: 9px;
|
||||
color: var(--color-text-secondary, #888);
|
||||
text-transform: capitalize;
|
||||
}
|
||||
|
||||
.debug-panel__legend-dot {
|
||||
width: 8px;
|
||||
height: 8px;
|
||||
border-radius: 50%;
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
/* ============================================
|
||||
Embeddings Controls
|
||||
============================================ */
|
||||
|
||||
.debug-panel__embeddings-controls {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 10px;
|
||||
padding-bottom: 8px;
|
||||
border-bottom: 1px solid var(--color-border, #333);
|
||||
}
|
||||
|
||||
.debug-panel__embeddings-buttons {
|
||||
display: flex;
|
||||
gap: 4px;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.debug-panel__embeddings-legend {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 10px;
|
||||
margin-top: 10px;
|
||||
padding-top: 8px;
|
||||
border-top: 1px solid var(--color-border, #333);
|
||||
}
|
||||
|
||||
/* ============================================
|
||||
Virtual Scrolling / Load More
|
||||
============================================ */
|
||||
|
||||
.debug-panel__load-more {
|
||||
text-align: center;
|
||||
padding: 12px;
|
||||
color: var(--color-text-secondary, #666);
|
||||
font-size: 11px;
|
||||
background: var(--color-surface-elevated, #252525);
|
||||
border-radius: 4px;
|
||||
margin-top: 8px;
|
||||
}
|
||||
|
||||
/* ============================================
|
||||
Timeline Tab Styles
|
||||
============================================ */
|
||||
|
||||
.debug-panel__timeline {
|
||||
position: relative;
|
||||
padding: 12px;
|
||||
}
|
||||
|
||||
.debug-panel__timeline-stats {
|
||||
display: flex;
|
||||
gap: 16px;
|
||||
margin-bottom: 10px;
|
||||
font-size: 11px;
|
||||
color: var(--color-text-secondary, #888);
|
||||
}
|
||||
|
||||
.debug-panel__timeline-svg {
|
||||
width: 100%;
|
||||
height: 200px;
|
||||
background: var(--color-surface, #1a1a1a);
|
||||
border-radius: 6px;
|
||||
border: 1px solid var(--color-border, #333);
|
||||
}
|
||||
|
||||
.debug-panel__timeline-axis text {
|
||||
fill: var(--color-text-secondary, #888);
|
||||
font-size: 10px;
|
||||
}
|
||||
|
||||
.debug-panel__timeline-axis line,
|
||||
.debug-panel__timeline-axis path {
|
||||
stroke: var(--color-border, #444);
|
||||
}
|
||||
|
||||
/* Event info popup */
|
||||
.debug-panel__event-info {
|
||||
position: absolute;
|
||||
bottom: 60px;
|
||||
left: 20px;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 4px;
|
||||
background: rgba(30, 30, 30, 0.95);
|
||||
border: 1px solid var(--color-border, #444);
|
||||
border-radius: 6px;
|
||||
padding: 10px 14px;
|
||||
font-size: 12px;
|
||||
max-width: 220px;
|
||||
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
|
||||
}
|
||||
|
||||
.debug-panel__event-info strong {
|
||||
color: var(--color-text-primary, #fff);
|
||||
font-weight: 600;
|
||||
padding-right: 20px;
|
||||
}
|
||||
|
||||
.debug-panel__event-close {
|
||||
position: absolute;
|
||||
top: 4px;
|
||||
right: 4px;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
width: 18px;
|
||||
height: 18px;
|
||||
border: none;
|
||||
background: rgba(255, 255, 255, 0.1);
|
||||
color: var(--color-text-secondary, #888);
|
||||
border-radius: 50%;
|
||||
cursor: pointer;
|
||||
transition: all 0.15s ease;
|
||||
}
|
||||
|
||||
.debug-panel__event-close:hover {
|
||||
background: rgba(255, 255, 255, 0.2);
|
||||
color: var(--color-text-primary, #fff);
|
||||
}
|
||||
|
||||
.debug-panel__event-date {
|
||||
color: var(--color-accent, #3b82f6);
|
||||
font-size: 11px;
|
||||
}
|
||||
|
||||
.debug-panel__event-type {
|
||||
color: #f59e0b;
|
||||
font-size: 11px;
|
||||
text-transform: capitalize;
|
||||
}
|
||||
|
||||
.debug-panel__event-desc {
|
||||
color: var(--color-text-secondary, #888);
|
||||
font-size: 11px;
|
||||
margin: 4px 0 0 0;
|
||||
line-height: 1.4;
|
||||
}
|
||||
|
||||
/* ============================================
|
||||
Scrollbar Styling
|
||||
============================================ */
|
||||
|
||||
.debug-panel__content::-webkit-scrollbar,
|
||||
.debug-panel__json::-webkit-scrollbar {
|
||||
width: 6px;
|
||||
}
|
||||
|
||||
.debug-panel__content::-webkit-scrollbar-track,
|
||||
.debug-panel__json::-webkit-scrollbar-track {
|
||||
background: transparent;
|
||||
}
|
||||
|
||||
.debug-panel__content::-webkit-scrollbar-thumb,
|
||||
.debug-panel__json::-webkit-scrollbar-thumb {
|
||||
background: var(--color-border, #444);
|
||||
border-radius: 3px;
|
||||
}
|
||||
|
||||
.debug-panel__content::-webkit-scrollbar-thumb:hover,
|
||||
.debug-panel__json::-webkit-scrollbar-thumb:hover {
|
||||
background: var(--color-text-secondary, #666);
|
||||
}
|
||||
|
||||
/* ============================================
|
||||
Responsive adjustments
|
||||
============================================ */
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.debug-panel__tabs {
|
||||
overflow-x: auto;
|
||||
}
|
||||
|
||||
.debug-panel__tab {
|
||||
padding: 8px 10px;
|
||||
font-size: 11px;
|
||||
}
|
||||
|
||||
.debug-panel__tab span {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.debug-panel__toolbar {
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.debug-panel__copy-btn span {
|
||||
display: none;
|
||||
}
|
||||
}
|
||||
1455  apps/archief-assistent/src/components/DebugPanel.tsx  (Normal file)
File diff suppressed because it is too large
@@ -50,6 +50,11 @@ import type { CachedResponse, CacheStats } from '../lib/semantic-cache'
import { SPARQLExplorer } from '../components/SPARQLExplorer'
import type { SPARQLResult } from '../components/SPARQLExplorer'

// Import Debug Panel component
import { DebugPanel } from '../components/DebugPanel'
import type { DebugPanelTab } from '../components/DebugPanel'
import { Code } from 'lucide-react'

// NA Color palette
const naColors = {
  primary: '#007bc7',
@@ -367,6 +372,11 @@ function ChatPage() {
  })
  const [cacheStats, setCacheStats] = useState<CacheStats | null>(null)

  // Debug Panel state
  const [showDebugPanel, setShowDebugPanel] = useState(false)
  const [debugPanelTab, setDebugPanelTab] = useState<DebugPanelTab>('raw')
  const [debugResults, setDebugResults] = useState<Record<string, unknown>[]>([])

  // Derive provider from selected model
  const selectedModelInfo = LLM_MODELS.find(m => m.id === selectedModel) || LLM_MODELS[0]
  const llmProvider = selectedModelInfo.provider
@@ -692,7 +702,8 @@ function ChatPage() {
  }))

  // Parse institutions from retrieved_results (metadata is nested)
- const institutions: Institution[] = ((data.retrieved_results || []) as Record<string, unknown>[]).map((r: Record<string, unknown>) => {
+ const retrievedResults = (data.retrieved_results || []) as Record<string, unknown>[]
+ const institutions: Institution[] = retrievedResults.map((r: Record<string, unknown>) => {
    const metadata = (r.metadata || {}) as Record<string, unknown>
    const scores = (r.scores || {}) as Record<string, number>
    return {
@@ -705,6 +716,9 @@ function ChatPage() {
      score: scores.combined as number | undefined,
    }
  })

  // Store retrieved results for Debug Panel
  setDebugResults(retrievedResults)

  // ========================================
  // STORE IN CACHE (after successful response)
  // ========================================
@@ -1105,6 +1119,25 @@ function ChatPage() {
        </Container>
      </Box>

      {/* Debug Panel - collapsible section showing RAG results */}
      <Collapse in={showDebugPanel && debugResults.length > 0}>
        <Box sx={{
          borderTop: '1px solid #e0e0e0',
          maxHeight: '400px',
          overflow: 'auto',
        }}>
          <Container maxWidth="md" sx={{ py: 2 }}>
            <DebugPanel
              results={debugResults}
              activeTab={debugPanelTab}
              onTabChange={setDebugPanelTab}
              t={(key) => key}
              language="nl"
            />
          </Container>
        </Box>
      </Collapse>

      {/* Input Area */}
      <Box
        component="form"
@@ -1148,6 +1181,21 @@ function ChatPage() {
            <DeleteSweepIcon sx={{ fontSize: 16 }} />
          </IconButton>
        </Tooltip>
        {/* Debug Panel Toggle */}
        <Tooltip title={showDebugPanel ? "Debug paneel verbergen" : "Debug paneel tonen"} placement="top">
          <IconButton
            size="small"
            onClick={() => setShowDebugPanel(!showDebugPanel)}
            sx={{
              width: 24,
              height: 24,
              color: showDebugPanel ? naColors.primary : 'text.secondary',
              '&:hover': { color: naColors.primary },
            }}
          >
            <Code size={16} />
          </IconButton>
        </Tooltip>
      </Box>
    )}
@@ -3,6 +3,28 @@ Heritage RAG Backend

Multi-source retrieval-augmented generation system for heritage custodian data.
Combines Qdrant vector search, Oxigraph SPARQL, TypeDB, and PostGIS.

New modules (v1.1.0):
- temporal_resolver: Temporal conflict resolution for historical facts
- semantic_router: Signal-based query routing (no LLM)
- event_retriever: Hypergraph-based event retrieval
"""

-__version__ = "1.0.0"
+__version__ = "1.1.0"

# Lazy imports to avoid circular dependencies
def get_temporal_resolver():
    from .temporal_resolver import get_temporal_resolver
    return get_temporal_resolver()

def get_signal_extractor():
    from .semantic_router import get_signal_extractor
    return get_signal_extractor()

def get_decision_router():
    from .semantic_router import get_decision_router
    return get_decision_router()

def create_event_retriever(*args, **kwargs):
    from .event_retriever import create_event_retriever
    return create_event_retriever(*args, **kwargs)
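The lazy accessors in this `__init__.py` defer the submodule import to call time, so sibling modules can import the package without triggering an import cycle. A minimal self-contained sketch of the same idea (the stand-in class here is illustrative, not the package's real API):

```python
# Sketch of the lazy-import accessor pattern: the expensive/cyclic import
# happens on first call rather than at package import time.
from functools import lru_cache

@lru_cache(maxsize=1)
def _load_resolver_class():
    # In the real package this would be `from .temporal_resolver import ...`;
    # a stand-in class keeps the sketch runnable on its own.
    class TemporalResolver:
        def resolve(self, fact):
            return fact
    return TemporalResolver

def get_temporal_resolver():
    """Return a resolver, importing its module lazily on first use."""
    return _load_resolver_class()()

print(type(get_temporal_resolver()).__name__)  # TemporalResolver
```

The `lru_cache` ensures the (simulated) module body runs once, mirroring Python's own one-time module initialization.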
@@ -32,6 +32,20 @@ import httpx
from dspy import Example, Prediction, History
from dspy.streaming import StatusMessage, StreamListener, StatusMessageProvider

# Semantic routing (Signal-Decision pattern) for fast LLM-free query classification
from .semantic_router import (
    QuerySignals,
    RouteConfig,
    get_signal_extractor,
    get_decision_router,
)

# Temporal intent extraction for detailed temporal constraint detection
from .temporal_intent import (
    TemporalConstraint,
    get_temporal_extractor,
)

logger = logging.getLogger(__name__)
@@ -1670,14 +1684,34 @@ class HeritageQueryRouter(dspy.Module):
            If provided, routing uses this LM instead of the global default.
            Recommended: Use a fast model like glm-4.5-flash or gpt-4o-mini
            for routing while keeping quality models for answer generation.
        signal_threshold: Confidence threshold (0.0-1.0) for signal-based routing.
            When semantic signal extraction confidence >= this threshold,
            skip LLM classification and use signal-based routing (faster).
            Set to 1.0 to always use LLM classification.
            Default: 0.8 (skip LLM when signals are clear).
    """

-    def __init__(self, use_schema_aware: Optional[bool] = None, fast_lm: Optional[dspy.LM] = None):
+    def __init__(
+        self,
+        use_schema_aware: Optional[bool] = None,
+        fast_lm: Optional[dspy.LM] = None,
+        signal_threshold: float = 0.8
+    ):
        super().__init__()

        # Store fast LM for routing (None means use global default)
        self.fast_lm = fast_lm

        # Signal-Decision pattern: fast LLM-free routing for high-confidence queries
        self.signal_extractor = get_signal_extractor()
        self.decision_router = get_decision_router()
        self.signal_threshold = signal_threshold

        # Temporal intent extraction for detailed constraint detection
        self.temporal_extractor = get_temporal_extractor()

        logger.info(f"HeritageQueryRouter signal threshold: {signal_threshold}")

        # Determine whether to use schema-aware signature
        if use_schema_aware is None:
            use_schema_aware = SCHEMA_LOADER_AVAILABLE
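The confidence gate configured above can be sketched as follows. The extractor below is a toy stand-in for the real signal extractor, but the threshold logic mirrors the docstring: signals at or above `signal_threshold` route directly with no LLM call, anything below falls back to LLM classification.

```python
# Minimal sketch of Signal-Decision routing with a confidence threshold.
# The keyword heuristic is illustrative only; the real extractor is richer.
from dataclasses import dataclass

@dataclass
class Signals:
    intent: str
    confidence: float

def extract_signals(question: str) -> Signals:
    # Stand-in: a temporal keyword yields a high-confidence signal.
    if "when" in question.lower():
        return Signals(intent="temporal", confidence=0.9)
    return Signals(intent="general", confidence=0.4)

def route(question: str, threshold: float = 0.8) -> tuple[str, bool]:
    """Return (intent, signal_based)."""
    s = extract_signals(question)
    if s.confidence >= threshold:
        return s.intent, True           # fast path: no LLM call
    return "llm:" + s.intent, False     # ambiguous: defer to LLM classifier

print(route("When was the archive founded?"))   # ('temporal', True)
print(route("Tell me something interesting"))   # ('llm:general', False)
```

Setting `threshold=1.0` forces the LLM path for every query, matching the "always use LLM classification" escape hatch described above.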
@@ -1710,6 +1744,9 @@ class HeritageQueryRouter(dspy.Module):
    def forward(self, question: str, language: str = "nl", history: History | None = None) -> Prediction:
        """Classify query and determine routing.

        Uses Signal-Decision pattern: fast signal extraction first, then LLM only
        when signals are ambiguous (confidence < signal_threshold).

        Args:
            question: User's current question
            language: Language code (nl, en, etc.)
@@ -1726,10 +1763,81 @@ class HeritageQueryRouter(dspy.Module):
            - target_role_category: Staff role category (when entity_type='person')
            - target_staff_role: Specific staff role (when entity_type='person')
            - target_custodian_type: Custodian type (when entity_type='institution')
            - signal_based: Whether routing was signal-based (True) or LLM-based (False)
            - route_config: RouteConfig from semantic router (when signal_based=True)
            - temporal_constraint: TemporalConstraint with type, dates, and recommended
              SPARQL template (when intent='temporal' and signal_based=True)
        """
        if history is None:
            history = History(messages=[])

        # ===== SIGNAL EXTRACTION (Fast, no LLM) =====
        signals = self.signal_extractor.extract_signals(question)
        logger.debug(
            f"Signal extraction: entity_type={signals.entity_type}, intent={signals.intent}, "
            f"confidence={signals.confidence:.2f}, temporal={signals.has_temporal_constraint}"
        )

        # ===== HIGH-CONFIDENCE SIGNALS: Skip LLM =====
        if signals.confidence >= self.signal_threshold:
            logger.info(
                f"Signal-based routing (confidence={signals.confidence:.2f} >= {self.signal_threshold}): "
                f"entity_type={signals.entity_type}, intent={signals.intent}"
            )

            # Get route configuration from semantic decision router
            route_config = self.decision_router.route(signals)

            # Map signal intent to source mapping
            recommended_sources = self.source_mapping.get(
                signals.intent, ["qdrant", "sparql"]
            )

            # ===== TEMPORAL CONSTRAINT EXTRACTION =====
            # When temporal signals detected, extract detailed constraints for SPARQL template selection
            temporal_constraint: TemporalConstraint | None = None
            if signals.has_temporal_constraint:
                temporal_constraint = self.temporal_extractor.extract(question)
                logger.debug(
                    f"Temporal constraint: type={temporal_constraint.constraint_type}, "
                    f"template={temporal_constraint.recommended_template}, "
                    f"dates={temporal_constraint.date_start}/{temporal_constraint.date_end}"
                )

            # Build prediction from signals (no LLM call)
            prediction = Prediction(
                intent=signals.intent,
                entities=signals.institution_mentions + signals.person_mentions,
                sources=recommended_sources,
                reasoning=f"Signal-based routing (confidence={signals.confidence:.2f})",
                resolved_question=question,  # No reference resolution without LLM
                entity_type=signals.entity_type,
                target_role_category='UNKNOWN',  # Requires LLM for detailed classification
                target_staff_role='UNKNOWN',
                target_custodian_type='UNKNOWN',
                target_custodian_slug=None,
                # Signal-based routing metadata
                signal_based=True,
                signals=signals,
                route_config=route_config,
                # Temporal constraint for SPARQL template selection
                temporal_constraint=temporal_constraint,
            )

            # For person queries, try to extract institution slug
            if signals.entity_type == 'person' and signals.institution_mentions:
                target_custodian_slug = extract_institution_slug_from_query(question)
                if target_custodian_slug:
                    prediction.target_custodian_slug = target_custodian_slug
                    logger.info(f"Signal-based: extracted institution slug '{target_custodian_slug}'")

            return prediction

        # ===== LOW-CONFIDENCE SIGNALS: Use LLM =====
        logger.info(
            f"LLM-based routing (confidence={signals.confidence:.2f} < {self.signal_threshold})"
        )

        # Use fast LM for routing if configured, otherwise use global default
        if self.fast_lm:
            with dspy.settings.context(lm=self.fast_lm):
@@ -1804,6 +1912,12 @@ class HeritageQueryRouter(dspy.Module):
            target_custodian_type=target_custodian_type,
            # Institution filter for person queries
            target_custodian_slug=target_custodian_slug,
            # LLM-based routing metadata
            signal_based=False,
            signals=signals,  # Include signals even for LLM routing (for debugging)
            route_config=None,
            # Temporal constraint (extracted on demand for LLM-based routing if temporal intent)
            temporal_constraint=None,  # Could extract here too if result.intent == 'temporal'
        )

        return prediction
@@ -4215,15 +4329,26 @@ class HeritageRAGPipeline(dspy.Module):
                        result_ghcid = uri.split('/hc/')[-1]
                        break
                if result_ghcid:
                    # Build nested metadata structure for frontend consistency
                    sparql_city = sparql_result.get('city', '')
                    sparql_address = sparql_result.get('address', '')
                    sparql_website = sparql_result.get('website', '')
                    sparql_only_result = {
                        'ghcid': result_ghcid,
                        'name': sparql_result.get('name', result_ghcid.split('/')[-1].replace('-', ' ').title()),
                        'type': 'institution',
                        'source': 'sparql',
                        # Nested metadata for frontend Knowledge Graph
                        'metadata': {
                            'city': sparql_city,
                            'address': sparql_address,
                            'website': sparql_website,
                        },
                        # Also keep flat fields for backward compatibility
                        'city': sparql_city,
                        'address': sparql_address,
                        'website': sparql_website,
                    }
                    for field in ['address', 'website', 'city']:
                        if field in sparql_result:
                            sparql_only_result[field] = sparql_result[field]
                    inst_results.append(type('SPARQLResult', (), sparql_only_result)())

        if inst_results:
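The same nested-plus-flat shape recurs in several hunks of this commit: a `metadata` dict for the frontend Knowledge Graph, duplicated as flat fields for older consumers. A hypothetical helper that produces it in one place (field names come from the diff; the function itself is not part of the codebase):

```python
# Hypothetical factoring of the repeated result-building blocks above:
# one nested `metadata` copy plus flat duplicates for backward compatibility.
def build_institution_result(ghcid, name, source="sparql", **meta):
    fields = {k: meta.get(k, "") for k in ("city", "address", "website")}
    return {
        "ghcid": ghcid,
        "name": name,
        "type": "institution",
        "source": source,
        "metadata": dict(fields),  # nested copy for the frontend graph
        **fields,                  # flat fields for older consumers
    }

r = build_institution_result("hc/nationaal-archief", "Nationaal Archief",
                             city="Den Haag")
print(r["metadata"]["city"], r["city"])  # Den Haag Den Haag
```

Keeping both copies in sync through a single constructor avoids the drift risk that the repeated inline dicts carry.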
@@ -4247,10 +4372,24 @@ class HeritageRAGPipeline(dspy.Module):
                city = getattr(inst, 'city', '')
                lat = getattr(inst, 'latitude', None)
                lon = getattr(inst, 'longitude', None)
                # Build dict for frontend
                ghcid = getattr(inst, 'ghcid', None)
                address = getattr(inst, 'address', '')
                website = getattr(inst, 'website', '')
                # Build dict with nested metadata for frontend Knowledge Graph
                retrieved_results.append({
                    "type": "institution",
                    "ghcid": ghcid,
                    "name": name,
                    # Nested metadata for frontend consistency
                    "metadata": {
                        "institution_type": inst_type,
                        "city": city,
                        "address": address,
                        "website": website,
                        "latitude": lat,
                        "longitude": lon,
                    },
                    # Also keep flat fields for backward compatibility
                    "institution_type": inst_type,
                    "city": city,
                    "latitude": lat,
@@ -4757,15 +4896,26 @@ class HeritageRAGPipeline(dspy.Module):
                        result_ghcid = uri.split('/hc/')[-1]
                        break
                if result_ghcid and result_ghcid not in existing_ghcids:
                    # Build nested metadata structure for frontend consistency
                    sparql_city = sparql_result.get('city', '')
                    sparql_address = sparql_result.get('address', '')
                    sparql_website = sparql_result.get('website', '')
                    sparql_only_result = {
                        'ghcid': result_ghcid,
                        'name': sparql_result.get('name', result_ghcid.split('/')[-1].replace('-', ' ').title()),
                        'type': 'institution',
                        'source': 'sparql',
                        # Nested metadata for frontend Knowledge Graph
                        'metadata': {
                            'city': sparql_city,
                            'address': sparql_address,
                            'website': sparql_website,
                        },
                        # Also keep flat fields for backward compatibility
                        'city': sparql_city,
                        'address': sparql_address,
                        'website': sparql_website,
                    }
                    for field in ['address', 'website', 'city']:
                        if field in sparql_result:
                            sparql_only_result[field] = sparql_result[field]
                    filtered_results.append(type('SPARQLResult', (), sparql_only_result)())
                    sparql_only_count += 1
@@ -4784,15 +4934,26 @@ class HeritageRAGPipeline(dspy.Module):
                        result_ghcid = uri.split('/hc/')[-1]
                        break
                if result_ghcid:
                    # Build nested metadata structure for frontend consistency
                    sparql_city = sparql_result.get('city', '')
                    sparql_address = sparql_result.get('address', '')
                    sparql_website = sparql_result.get('website', '')
                    sparql_only_result = {
                        'ghcid': result_ghcid,
                        'name': sparql_result.get('name', result_ghcid.split('/')[-1].replace('-', ' ').title()),
                        'type': 'institution',
                        'source': 'sparql',
                        # Nested metadata for frontend Knowledge Graph
                        'metadata': {
                            'city': sparql_city,
                            'address': sparql_address,
                            'website': sparql_website,
                        },
                        # Also keep flat fields for backward compatibility
                        'city': sparql_city,
                        'address': sparql_address,
                        'website': sparql_website,
                    }
                    for field in ['address', 'website', 'city']:
                        if field in sparql_result:
                            sparql_only_result[field] = sparql_result[field]
                    inst_results.append(type('SPARQLResult', (), sparql_only_result)())

        if inst_results:
@@ -4813,11 +4974,30 @@ class HeritageRAGPipeline(dspy.Module):
                name = getattr(inst, 'name', 'Unknown')
                inst_type = getattr(inst, 'type', '')
                city = getattr(inst, 'city', '')
                ghcid = getattr(inst, 'ghcid', None)
                address = getattr(inst, 'address', '')
                website = getattr(inst, 'website', '')
                lat = getattr(inst, 'latitude', None)
                lon = getattr(inst, 'longitude', None)
                # Build dict with nested metadata for frontend Knowledge Graph
                retrieved_results.append({
                    "type": "institution",
                    "ghcid": ghcid,
                    "name": name,
                    # Nested metadata for frontend consistency
                    "metadata": {
                        "institution_type": inst_type,
                        "city": city,
                        "address": address,
                        "website": website,
                        "latitude": lat,
                        "longitude": lon,
                    },
                    # Also keep flat fields for backward compatibility
                    "institution_type": inst_type,
                    "city": city,
                    "latitude": lat,
                    "longitude": lon,
                })

                entry = f"- {name}"
393  backend/rag/event_retriever.py  (Normal file)
@ -0,0 +1,393 @@
|
|||
"""
|
||||
Heritage Event Retrieval using Hypergraph Patterns
|
||||
|
||||
Retrieves organizational change events (mergers, foundings, etc.) using
|
||||
multi-factor scoring: entity overlap + semantic similarity + temporal relevance.
|
||||
|
||||
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from typing import Optional, Callable, Any
|
||||
import logging
|
||||
|
||||
import numpy as np
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class HeritageEvent:
|
||||
"""Hyperedge representing a heritage organizational event."""
|
||||
event_id: str
|
||||
event_type: str
|
||||
event_date: datetime
|
||||
participants: dict[str, str] # role -> GHCID
|
||||
description: str
|
||||
affected_collections: list[str] = field(default_factory=list)
|
||||
resulting_entities: list[str] = field(default_factory=list)
|
||||
confidence: float = 1.0
|
||||
embedding: Optional[list[float]] = None
|
||||
|
||||
|
||||
class EventRetriever:
|
||||
"""
|
||||
Retrieve heritage events using hypergraph patterns.
|
||||
|
||||
Uses multi-factor scoring:
|
||||
    - Entity overlap (entities mentioned in query match event participants)
    - Semantic similarity (query embedding vs event description)
    - Temporal relevance (how close event date is to query date)
    - Graph connectivity (how connected the event is in the knowledge graph)
    """

    def __init__(
        self,
        oxigraph_query_fn: Callable[[str], list[dict]],
        qdrant_search_fn: Callable[[str, int], list[dict]],
        embed_fn: Callable[[str], list[float]]
    ):
        """
        Args:
            oxigraph_query_fn: Function to execute SPARQL queries
            qdrant_search_fn: Function to search the Qdrant events collection
            embed_fn: Function to embed text
        """
        self.sparql = oxigraph_query_fn
        self.vector_search = qdrant_search_fn
        self.embed = embed_fn

    def retrieve(
        self,
        query: str,
        query_entities: list[str] = None,
        query_time: datetime = None,
        event_type: str = None,
        limit: int = 10,
        weights: dict = None
    ) -> list[tuple[HeritageEvent, float]]:
        """
        Retrieve events using multi-factor scoring.

        Args:
            query: Natural language query
            query_entities: GHCIDs mentioned in the query
            query_time: Temporal constraint
            event_type: Filter by event type (MERGER, FOUNDING, CLOSURE, etc.)
            limit: Max results
            weights: Scoring weights for each factor

        Returns:
            List of (event, score) tuples ordered by relevance
        """
        if weights is None:
            weights = {
                "entity": 0.3,
                "semantic": 0.4,
                "temporal": 0.2,
                "graph": 0.1
            }

        # Phase 1: candidate generation
        candidates = {}

        # Entity-based candidates from SPARQL
        if query_entities:
            sparql_candidates = self._get_entity_candidates(query_entities, event_type)
            candidates.update(sparql_candidates)

        # Semantic candidates from Qdrant
        vector_candidates = self._get_semantic_candidates(query, limit * 2)
        candidates.update(vector_candidates)

        if not candidates:
            logger.info(f"No event candidates found for query: {query}")
            return []

        # Phase 2: score all candidates
        scored = []
        for event_id, event in candidates.items():
            score = self._score_event(
                event, query, query_entities, query_time, weights
            )
            scored.append((event, score))

        # Sort and return top-k
        scored.sort(key=lambda x: x[1], reverse=True)
        return scored[:limit]
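The weighted combination used for ranking reduces to a single expression; a minimal sketch with made-up factor scores and the default weights:

```python
# Per-factor scores for one hypothetical candidate event.
scores = {"entity": 1.0, "semantic": 0.72, "temporal": 0.5, "graph": 0.5}
# Default weights from retrieve().
weights = {"entity": 0.3, "semantic": 0.4, "temporal": 0.2, "graph": 0.1}

# Weighted sum; a missing factor falls back to the neutral 0.5.
final = sum(weights.get(k, 0) * scores.get(k, 0.5) for k in weights)
print(round(final, 3))  # 0.738
```
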
    def retrieve_by_type(
        self,
        event_type: str,
        start_date: datetime = None,
        end_date: datetime = None,
        limit: int = 50
    ) -> list[HeritageEvent]:
        """
        Retrieve events of a specific type within a date range.

        Simpler retrieval for structured queries (no scoring).
        """
        date_filter = ""
        if start_date:
            date_filter += f'FILTER(?date >= "{start_date.isoformat()}"^^xsd:date) '
        if end_date:
            date_filter += f'FILTER(?date <= "{end_date.isoformat()}"^^xsd:date) '

        sparql = f"""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
        PREFIX schema: <http://schema.org/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        SELECT ?event ?eventType ?date ?description WHERE {{
            ?event a hc:OrganizationalChangeEvent ;
                hc:eventType ?eventType ;
                hc:eventDate ?date .
            OPTIONAL {{ ?event schema:description ?description }}

            FILTER(?eventType = "{event_type}")
            {date_filter}
        }}
        ORDER BY ?date
        LIMIT {limit}
        """

        results = self.sparql(sparql)
        events = []

        for row in results:
            event = HeritageEvent(
                event_id=row.get("event", ""),
                event_type=row.get("eventType", event_type),
                event_date=datetime.fromisoformat(row["date"]) if row.get("date") else datetime.now(),
                participants={},
                description=row.get("description", "")
            )
            events.append(event)

        return events
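One subtlety in the FILTER construction above: `datetime.isoformat()` emits a full dateTime string, so the literal compared against `^^xsd:date` carries a time component. A hedged standalone sketch using `.date().isoformat()` so the literal is a plain date (`build_date_filter` is an illustrative name, not part of the class):

```python
from datetime import datetime

def build_date_filter(start_date=None, end_date=None):
    # Same shape as retrieve_by_type's filter, but with .date() so the
    # literal is a plain YYYY-MM-DD value matching the xsd:date datatype.
    date_filter = ""
    if start_date:
        date_filter += f'FILTER(?date >= "{start_date.date().isoformat()}"^^xsd:date) '
    if end_date:
        date_filter += f'FILTER(?date <= "{end_date.date().isoformat()}"^^xsd:date) '
    return date_filter

print(build_date_filter(datetime(1990, 1, 1), datetime(2000, 12, 31)))
```
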
    def _get_entity_candidates(
        self,
        ghcids: list[str],
        event_type: str = None
    ) -> dict[str, HeritageEvent]:
        """Get events involving the specified entities via SPARQL."""
        ghcid_filter = ", ".join(f'"{g}"' for g in ghcids)
        event_type_filter = f'FILTER(?eventType = "{event_type}")' if event_type else ""

        sparql = f"""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
        PREFIX schema: <http://schema.org/>

        SELECT DISTINCT ?event ?eventType ?date ?description ?participant ?role WHERE {{
            ?event a hc:OrganizationalChangeEvent ;
                hc:eventType ?eventType ;
                hc:eventDate ?date .
            OPTIONAL {{ ?event schema:description ?description }}

            # Get participants
            ?event ?role ?participant .
            FILTER(STRSTARTS(STR(?role), "http://www.cidoc-crm.org/cidoc-crm/P") ||
                   STRSTARTS(STR(?role), "https://nde.nl/ontology/hc/"))

            # Restrict to events involving the requested GHCIDs
            # (previously ghcid_filter was built but never applied;
            # assumes participant values match the GHCID strings)
            FILTER(STR(?participant) IN ({ghcid_filter}))

            {event_type_filter}
        }}
        """

        results = self.sparql(sparql)
        return self._results_to_events(results)
    def _get_semantic_candidates(
        self,
        query: str,
        limit: int
    ) -> dict[str, HeritageEvent]:
        """Get events via semantic similarity."""
        try:
            results = self.vector_search(query, limit)
        except Exception as e:
            logger.warning(f"Vector search failed: {e}")
            return {}

        events = {}
        for r in results:
            payload = r.get("payload", {}) if isinstance(r, dict) else {}
            event_id = r.get("id", str(id(r)))

            try:
                event_date = datetime.fromisoformat(
                    payload.get("event_date", datetime.now().isoformat())
                )
            except (ValueError, TypeError):
                event_date = datetime.now()

            event = HeritageEvent(
                event_id=event_id,
                event_type=payload.get("event_type", "UNKNOWN"),
                event_date=event_date,
                participants=payload.get("participants", {}),
                description=payload.get("description", ""),
                confidence=r.get("score", 0.5)
            )
            events[event.event_id] = event

        return events
    def _score_event(
        self,
        event: HeritageEvent,
        query: str,
        query_entities: list[str],
        query_time: datetime,
        weights: dict
    ) -> float:
        """Compute a multi-factor relevance score."""
        scores = {}

        # Entity overlap
        if query_entities:
            event_entities = set(event.participants.values())
            overlap = len(event_entities.intersection(set(query_entities)))
            scores["entity"] = overlap / max(len(query_entities), 1)
        else:
            scores["entity"] = 0.5  # Neutral

        # Semantic similarity
        try:
            query_emb = self.embed(query)
            if event.embedding:
                scores["semantic"] = self._cosine_similarity(query_emb, event.embedding)
            elif event.description:
                desc_emb = self.embed(event.description)
                scores["semantic"] = self._cosine_similarity(query_emb, desc_emb)
            else:
                scores["semantic"] = 0.5
        except Exception as e:
            logger.warning(f"Embedding failed: {e}")
            scores["semantic"] = 0.5

        # Temporal relevance
        if query_time and event.event_date:
            days_diff = abs((query_time - event.event_date).days)
            scores["temporal"] = 1.0 / (1.0 + days_diff / 365.0)
        else:
            scores["temporal"] = 0.5  # Neutral

        # Graph connectivity (placeholder - a full implementation would use SPARQL)
        scores["graph"] = 0.5

        # Weighted sum
        final_score = sum(weights.get(k, 0) * scores.get(k, 0.5) for k in weights)
        return final_score

    def _cosine_similarity(self, a: list[float], b: list[float]) -> float:
        """Compute cosine similarity between two vectors."""
        a_np = np.array(a)
        b_np = np.array(b)
        norm_product = np.linalg.norm(a_np) * np.linalg.norm(b_np)
        if norm_product == 0:
            return 0.0
        return float(np.dot(a_np, b_np) / norm_product)
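The zero-norm guard in `_cosine_similarity` matters for empty or all-zero embeddings; a standalone sketch of the same computation:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product over the product of norms; 0.0 when either vector is zero.
    a_np, b_np = np.array(a), np.array(b)
    norm_product = np.linalg.norm(a_np) * np.linalg.norm(b_np)
    if norm_product == 0:
        return 0.0
    return float(np.dot(a_np, b_np) / norm_product)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 0.0], [0.0, 0.0]))  # 0.0 (guarded zero vector)
```
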
    def _results_to_events(self, results: list[dict]) -> dict[str, HeritageEvent]:
        """Convert SPARQL results to HeritageEvent objects."""
        events = {}

        # Group by event ID
        by_event: dict[str, dict[str, Any]] = {}
        for row in results:
            event_id = row.get("event")
            if not event_id:
                continue

            if event_id not in by_event:
                by_event[event_id] = {
                    "event_type": row.get("eventType", "UNKNOWN"),
                    "date": row.get("date"),
                    "description": row.get("description", ""),
                    "participants": {}
                }

            role = row.get("role", "")
            if "/" in role:
                role = role.split("/")[-1]  # Extract the role name from the URI
            participant = row.get("participant")
            if role and participant:
                by_event[event_id]["participants"][role] = participant

        # Convert to HeritageEvent objects
        for event_id, data in by_event.items():
            try:
                event_date = datetime.fromisoformat(data["date"]) if data["date"] else datetime.now()
            except (ValueError, TypeError):
                event_date = datetime.now()

            events[event_id] = HeritageEvent(
                event_id=event_id,
                event_type=data["event_type"],
                event_date=event_date,
                participants=data["participants"],
                description=data["description"]
            )

        return events

# Factory function for creating EventRetriever with default dependencies
def create_event_retriever(
    oxigraph_endpoint: str = "http://localhost:7878/query",
    qdrant_collection: str = "heritage_events"
) -> EventRetriever:
    """
    Create an EventRetriever with standard GLAM dependencies.

    This is a convenience factory that wires up the retriever with
    default Oxigraph and Qdrant connections.
    """
    # Import here to avoid circular dependencies
    import requests

    def sparql_query(query: str) -> list[dict]:
        """Execute a SPARQL query against Oxigraph."""
        response = requests.post(
            oxigraph_endpoint,
            data=query,
            headers={
                "Content-Type": "application/sparql-query",
                "Accept": "application/json"
            },
            timeout=30
        )
        response.raise_for_status()
        data = response.json()

        # Convert bindings to a simple dict format
        results = []
        for binding in data.get("results", {}).get("bindings", []):
            row = {}
            for key, val in binding.items():
                row[key] = val.get("value")
            results.append(row)
        return results

    def qdrant_search(query: str, limit: int) -> list[dict]:
        """Search the Qdrant events collection."""
        # Placeholder - would use the actual Qdrant client
        logger.warning("Qdrant search not implemented - using empty results")
        return []

    def embed(text: str) -> list[float]:
        """Embed text using the default embedding model."""
        # Placeholder - would use the actual embedding model
        logger.warning("Embedding not implemented - using random vector")
        return list(np.random.randn(384))

    return EventRetriever(
        oxigraph_query_fn=sparql_query,
        qdrant_search_fn=qdrant_search,
        embed_fn=embed
    )
372 backend/rag/semantic_router.py (new file)
@@ -0,0 +1,372 @@
"""
Semantic Routing for Heritage RAG

Implements a Signal-Decision architecture for fast, accurate query routing.
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md

Key concepts:
- Signal extraction (no LLM) for fast query analysis
- Decision routing based on the extracted signals
- Falls back to LLM classification for low-confidence cases
"""

from dataclasses import dataclass, field
from typing import Literal, Optional
import re
import logging

logger = logging.getLogger(__name__)


@dataclass
class QuerySignals:
    """Semantic signals extracted from a query."""
    # Primary classification
    # Using str instead of Literal for runtime flexibility
    entity_type: str  # "person", "institution", "collection", "event", "mixed"
    intent: str  # "geographic", "statistical", "relational", "temporal", etc.

    # Extracted entities
    institution_mentions: list[str] = field(default_factory=list)
    person_mentions: list[str] = field(default_factory=list)
    location_mentions: list[str] = field(default_factory=list)

    # Query characteristics
    language: str = "nl"
    has_temporal_constraint: bool = False
    has_geographic_constraint: bool = False
    requires_aggregation: bool = False

    # Confidence
    confidence: float = 0.85


@dataclass
class RouteConfig:
    """Configuration for query routing."""
    primary_backend: str
    secondary_backend: Optional[str] = None
    qdrant_collection: Optional[str] = None
    use_temporal_templates: bool = False
    qdrant_filters: dict = field(default_factory=dict)
    sparql_variant: Optional[str] = None

class SemanticSignalExtractor:
    """
    Extract semantic signals from queries without LLM calls.

    Uses:
    - Keyword patterns for entity type detection
    - Embedding similarity for intent classification
    - Regex for entity extraction
    """

    # Entity type indicators
    PERSON_INDICATORS = [
        "wie", "who", "curator", "archivist", "archivaris", "bibliothecaris",
        "directeur", "director", "medewerker", "staff", "employee",
        "werkt", "works", "persoon", "person", "hoofd", "manager"
    ]

    INSTITUTION_INDICATORS = [
        "museum", "musea", "archief", "archieven", "bibliotheek", "bibliotheken",
        "galerie", "gallery", "instelling", "institution", "organisatie"
    ]

    AGGREGATION_INDICATORS = [
        "hoeveel", "how many", "count", "aantal", "total", "totaal",
        "per", "verdeling", "distribution", "gemiddelde", "average"
    ]

    # NOTE: Short words like "in" removed - too many false positives
    # ("in" matches "interessant", "instituut", etc.)
    GEOGRAPHIC_INDICATORS = [
        "nabij", "near", "waar", "where", "locatie", "location",
        "provincie", "province", "stad", "city", "regio", "region"
    ]

    # NOTE: Short words like "na" removed - too many false positives
    # ("na" matches "nationaal", "naam", etc.)
    # Use word-boundary matching for the remaining short indicators.
    TEMPORAL_INDICATORS = [
        "wanneer", "when", "voor", "before", "tussen", "between",
        "oudste", "oldest", "nieuwste", "newest",
        "opgericht", "founded", "gesloten", "closed", "fusie", "merger",
        "geschiedenis", "history", "tijdlijn", "timeline"
    ]

    # Short indicators that require word-boundary matching
    TEMPORAL_INDICATORS_SHORT = ["na", "after"]  # Require \b matching
    GEOGRAPHIC_INDICATORS_SHORT = ["in"]  # Require \b matching

    # Year pattern for temporal detection
    YEAR_PATTERN = re.compile(r'\b(1[0-9]{3}|20[0-2][0-9])\b')  # 1000-2029

    # Known Dutch cities and provinces for location extraction
    KNOWN_LOCATIONS = [
        "Amsterdam", "Rotterdam", "Den Haag", "Utrecht", "Groningen",
        "Noord-Holland", "Zuid-Holland", "Noord-Brabant", "Limburg",
        "Gelderland", "Friesland", "Overijssel", "Drenthe", "Zeeland",
        "Flevoland", "Haarlem", "Leiden", "Maastricht", "Eindhoven",
        "Arnhem", "Nijmegen", "Enschede", "Tilburg", "Breda", "Delft"
    ]
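The false-positive problem called out in the NOTE comments is easy to reproduce: a plain substring check fires inside longer Dutch words, while a `\b`-anchored pattern does not:

```python
import re

# Plain substring check: "na" fires inside "nationaal" - a false positive.
assert "na" in "nationaal archief"

# Word-boundary pattern: no hit inside "nationaal", a real hit in "na de fusie".
pattern = re.compile(r"\bna\b", re.IGNORECASE)
assert pattern.search("nationaal archief") is None
assert pattern.search("musea na de fusie") is not None
```
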
    def __init__(self):
        self._intent_embeddings = None
        self._model = None
        # Precompile word-boundary patterns for the short indicators
        self._temporal_short_patterns = [
            re.compile(rf'\b{ind}\b', re.IGNORECASE)
            for ind in self.TEMPORAL_INDICATORS_SHORT
        ]
        self._geographic_short_patterns = [
            re.compile(rf'\b{ind}\b', re.IGNORECASE)
            for ind in self.GEOGRAPHIC_INDICATORS_SHORT
        ]

    def _has_word_boundary_match(self, query: str, patterns: list) -> bool:
        """Check whether any pattern matches with word boundaries."""
        return any(p.search(query) for p in patterns)
    def extract_signals(self, query: str) -> QuerySignals:
        """
        Extract all semantic signals from the query.

        Fast operation - no LLM calls.
        """
        query_lower = query.lower()

        # Entity type detection
        entity_type = self._detect_entity_type(query_lower)

        # Intent classification
        intent = self._classify_intent(query, query_lower)

        # Entity extraction
        institutions = self._extract_institutions(query)
        persons = self._extract_persons(query)
        locations = self._extract_locations(query)

        # Constraint detection (with word-boundary matching for short indicators)
        has_temporal = (
            any(ind in query_lower for ind in self.TEMPORAL_INDICATORS) or
            self._has_word_boundary_match(query, self._temporal_short_patterns) or
            bool(self.YEAR_PATTERN.search(query))  # A year mention implies temporal
        )
        has_geographic = (
            any(ind in query_lower for ind in self.GEOGRAPHIC_INDICATORS) or
            self._has_word_boundary_match(query, self._geographic_short_patterns) or
            bool(locations)
        )
        requires_aggregation = any(ind in query_lower for ind in self.AGGREGATION_INDICATORS)

        # Language detection
        language = self._detect_language(query)

        # Confidence based on signal clarity
        confidence = self._compute_confidence(entity_type, intent, query_lower)

        return QuerySignals(
            entity_type=entity_type,
            intent=intent,
            institution_mentions=institutions,
            person_mentions=persons,
            location_mentions=locations,
            language=language,
            has_temporal_constraint=has_temporal,
            has_geographic_constraint=has_geographic,
            requires_aggregation=requires_aggregation,
            confidence=confidence
        )
    def _detect_entity_type(self, query_lower: str) -> str:
        """Detect the primary entity type in the query."""
        person_score = sum(1 for p in self.PERSON_INDICATORS if p in query_lower)
        institution_score = sum(1 for p in self.INSTITUTION_INDICATORS if p in query_lower)

        if person_score > 0 and institution_score > 0:
            return "mixed"
        elif person_score > institution_score:
            return "person"
        elif institution_score > 0:
            return "institution"
        else:
            return "institution"  # Default

    def _classify_intent(self, query: str, query_lower: str) -> str:
        """Classify query intent."""
        # Quick rule-based classification
        if any(ind in query_lower for ind in self.AGGREGATION_INDICATORS):
            return "statistical"
        # Temporal: check long indicators, short indicators with word boundaries, AND year patterns
        if (any(ind in query_lower for ind in self.TEMPORAL_INDICATORS) or
                self._has_word_boundary_match(query, self._temporal_short_patterns) or
                bool(self.YEAR_PATTERN.search(query))):  # A year implies temporal intent
            return "temporal"
        if "vergelijk" in query_lower or "compare" in query_lower:
            return "comparative"
        if any(ind in query_lower for ind in ["wat is", "what is", "tell me about", "vertel"]):
            return "entity_lookup"
        # Geographic: check long indicators and short ones with word boundaries
        if (any(ind in query_lower for ind in self.GEOGRAPHIC_INDICATORS) or
                self._has_word_boundary_match(query, self._geographic_short_patterns)):
            return "geographic"

        # Default based on question type
        if query_lower.startswith(("welke", "which", "wat", "what")):
            return "exploration"

        return "exploration"
    def _extract_institutions(self, query: str) -> list[str]:
        """Extract institution mentions from the query."""
        # Known institution patterns
        patterns = [
            r"(?:het\s+)?(\w+\s+(?:Museum|Archief|Bibliotheek|Galerie))",
            r"(Rijksmuseum|Nationaal Archief|KB|Koninklijke Bibliotheek)",
            r"(Noord-Hollands Archief|Stadsarchief Amsterdam|Gemeentearchief)",
            r"(\w+archief|\w+museum|\w+bibliotheek)",
        ]

        mentions = []
        for pattern in patterns:
            for match in re.finditer(pattern, query, re.IGNORECASE):
                mentions.append(match.group(1))

        return list(set(mentions))

    def _extract_persons(self, query: str) -> list[str]:
        """Extract person mentions from the query."""
        # Basic person-name pattern (capitalized words with an optional tussenvoegsel)
        pattern = r"\b([A-Z][a-z]+\s+(?:van\s+(?:de\s+)?|de\s+)?[A-Z][a-z]+)\b"
        matches = re.findall(pattern, query)
        return matches

    def _extract_locations(self, query: str) -> list[str]:
        """Extract location mentions from the query."""
        mentions = []
        query_lower = query.lower()
        for loc in self.KNOWN_LOCATIONS:
            if loc.lower() in query_lower:
                mentions.append(loc)

        return mentions
    def _detect_language(self, query: str) -> str:
        """Detect the query language."""
        dutch_indicators = ["welke", "hoeveel", "waar", "wanneer", "wie", "het", "de", "zijn", "er"]
        english_indicators = ["which", "how many", "where", "when", "who", "the", "are", "there"]

        query_lower = query.lower()
        dutch_score = sum(1 for w in dutch_indicators if w in query_lower)
        english_score = sum(1 for w in english_indicators if w in query_lower)

        return "nl" if dutch_score >= english_score else "en"

    def _compute_confidence(self, entity_type: str, intent: str, query_lower: str) -> float:
        """Compute confidence in the signal extraction."""
        confidence = 0.7  # Base

        # Boost for a clear entity type
        if entity_type != "mixed":
            confidence += 0.1

        # Boost for clear intent indicators
        if any(ind in query_lower for ind in self.AGGREGATION_INDICATORS + self.TEMPORAL_INDICATORS):
            confidence += 0.1

        # Boost for a clear question structure
        if query_lower.startswith(("welke", "which", "hoeveel", "how many", "waar", "where")):
            confidence += 0.05

        return min(confidence, 0.95)
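The indicator-count heuristic behind `_detect_language` can be sketched standalone (shortened indicator lists; the same substring caveat applies, e.g. "de" also matches inside longer words):

```python
DUTCH = ["welke", "hoeveel", "waar", "wanneer", "wie", "zijn"]
ENGLISH = ["which", "how many", "where", "when", "who", "are"]

def detect_language(query: str) -> str:
    # Count indicator hits per language; ties default to Dutch ("nl").
    q = query.lower()
    dutch_score = sum(1 for w in DUTCH if w in q)
    english_score = sum(1 for w in ENGLISH if w in q)
    return "nl" if dutch_score >= english_score else "en"

print(detect_language("Welke musea zijn er in Amsterdam?"))  # nl
print(detect_language("Which museums are in Amsterdam?"))    # en
```
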
class SemanticDecisionRouter:
    """
    Route queries to backends based on extracted signals.
    """

    def route(self, signals: QuerySignals) -> RouteConfig:
        """
        Determine routing based on signals.
        """
        # Person queries → Qdrant persons collection
        if signals.entity_type == "person":
            config = RouteConfig(
                primary_backend="qdrant",
                secondary_backend="sparql",
                qdrant_collection="heritage_persons",
            )

            # Add an institution filter if one is mentioned
            if signals.institution_mentions:
                config.qdrant_filters["custodian_slug"] = self._to_slug(
                    signals.institution_mentions[0]
                )

            return config

        # Statistical queries → DuckLake
        if signals.requires_aggregation:
            return RouteConfig(
                primary_backend="ducklake",
                secondary_backend="sparql",
            )

        # Temporal queries → temporal SPARQL templates
        if signals.has_temporal_constraint:
            return RouteConfig(
                primary_backend="sparql",
                secondary_backend="qdrant",
                use_temporal_templates=True,
                qdrant_collection="heritage_custodians",
            )

        # Geographic queries → SPARQL with a location filter
        if signals.has_geographic_constraint:
            return RouteConfig(
                primary_backend="sparql",
                secondary_backend="qdrant",
                qdrant_collection="heritage_custodians",
            )

        # Default: hybrid SPARQL + Qdrant
        return RouteConfig(
            primary_backend="qdrant",
            secondary_backend="sparql",
            qdrant_collection="heritage_custodians",
        )

    def _to_slug(self, institution_name: str) -> str:
        """Convert an institution name to slug format."""
        import unicodedata
        normalized = unicodedata.normalize('NFD', institution_name)
        ascii_name = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')
        slug = ascii_name.lower()
        slug = re.sub(r"[''`\",.:;!?()[\]{}]", '', slug)
        slug = re.sub(r'[\s_]+', '-', slug)
        slug = re.sub(r'-+', '-', slug).strip('-')
        return slug
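`_to_slug` normalizes names in four steps (strip accents, lowercase, drop punctuation, hyphenate). A standalone sketch with a slightly simplified punctuation class:

```python
import re
import unicodedata

def to_slug(name: str) -> str:
    # 1. Decompose accented characters and drop the combining marks.
    normalized = unicodedata.normalize("NFD", name)
    ascii_name = "".join(c for c in normalized if unicodedata.category(c) != "Mn")
    # 2. Lowercase; 3. drop punctuation (simplified class); 4. hyphenate.
    slug = ascii_name.lower()
    slug = re.sub(r"['\",.:;!?()\[\]{}]", "", slug)
    slug = re.sub(r"[\s_]+", "-", slug)
    slug = re.sub(r"-+", "-", slug).strip("-")
    return slug

print(to_slug("Noord-Hollands Archief"))  # noord-hollands-archief
print(to_slug("Musée Curtius"))           # musee-curtius
```
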

# Singleton instances
_signal_extractor: Optional[SemanticSignalExtractor] = None
_decision_router: Optional[SemanticDecisionRouter] = None


def get_signal_extractor() -> SemanticSignalExtractor:
    """Get or create the singleton signal extractor instance."""
    global _signal_extractor
    if _signal_extractor is None:
        _signal_extractor = SemanticSignalExtractor()
    return _signal_extractor


def get_decision_router() -> SemanticDecisionRouter:
    """Get or create the singleton decision router instance."""
    global _decision_router
    if _decision_router is None:
        _decision_router = SemanticDecisionRouter()
    return _decision_router
311 backend/rag/temporal_intent.py (new file)
@@ -0,0 +1,311 @@
"""
Temporal Query Intent Extraction for Heritage RAG

Extracts temporal constraints from natural language queries to enable
temporal SPARQL template selection and conflict resolution.

Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
"""

import dspy
from dataclasses import dataclass, field
from typing import Optional, Literal
from datetime import datetime
import re
import logging

logger = logging.getLogger(__name__)


@dataclass
class TemporalConstraint:
    """Extracted temporal constraint from a query."""
    constraint_type: Literal[
        "point_in_time",   # "in 1990", "on January 1, 2000"
        "before",          # "before 2000", "vóór de fusie"
        "after",           # "after 1995", "na de renovatie"
        "between",         # "between 1990 and 2000"
        "oldest",          # "oldest museum", "oudste archief"
        "newest",          # "newest library", "nieuwste bibliotheek"
        "founding",        # "when was X founded", "opgericht"
        "closure",         # "when did X close", "gesloten"
        "change_event",    # "merger", "split", "relocation"
        "timeline",        # "history of", "geschiedenis van"
        "none"             # No temporal constraint detected
    ]

    # Extracted dates (ISO format or year)
    date_start: Optional[str] = None
    date_end: Optional[str] = None

    # For relative references
    reference_event: Optional[str] = None  # e.g., "de fusie", "the merger"

    # Confidence
    confidence: float = 0.8

    # Recommended SPARQL template
    recommended_template: Optional[str] = None

class TemporalConstraintExtractor:
    """
    Fast extraction of temporal constraints without an LLM.

    Uses pattern matching for common temporal expressions.
    Falls back to an LLM for complex/ambiguous cases.
    """

    # Year patterns
    YEAR_PATTERN = re.compile(r'\b(1[0-9]{3}|20[0-2][0-9])\b')  # 1000-2029
    DATE_PATTERN = re.compile(
        r'\b(\d{1,2}[-/]\d{1,2}[-/]\d{2,4}|\d{4}[-/]\d{2}[-/]\d{2})\b'
    )

    # Dutch temporal keywords
    BEFORE_KEYWORDS_NL = ["voor", "vóór", "voordat", "eerder dan"]
    AFTER_KEYWORDS_NL = ["na", "nadat", "later dan", "sinds"]
    BETWEEN_KEYWORDS_NL = ["tussen", "van", "tot"]
    OLDEST_KEYWORDS_NL = ["oudste", "eerste", "oorspronkelijke"]
    NEWEST_KEYWORDS_NL = ["nieuwste", "laatste", "meest recente"]

    # English temporal keywords
    BEFORE_KEYWORDS_EN = ["before", "prior to", "earlier than"]
    AFTER_KEYWORDS_EN = ["after", "following", "since", "later than"]
    BETWEEN_KEYWORDS_EN = ["between", "from", "to"]
    OLDEST_KEYWORDS_EN = ["oldest", "first", "original", "earliest"]
    NEWEST_KEYWORDS_EN = ["newest", "latest", "most recent"]

    # Event keywords
    FOUNDING_KEYWORDS = ["opgericht", "gesticht", "founded", "established", "created"]
    CLOSURE_KEYWORDS = ["gesloten", "opgeheven", "closed", "dissolved", "terminated"]
    MERGER_KEYWORDS = ["fusie", "samenvoeging", "merger", "merged", "combined"]
    TIMELINE_KEYWORDS = [
        "geschiedenis", "tijdlijn", "history", "timeline", "evolution",
        "door de jaren", "over time", "changes"
    ]

    # Template mapping
    TEMPLATE_MAP = {
        "point_in_time": "point_in_time_state",
        "before": "point_in_time_state",
        "after": "point_in_time_state",
        "between": "events_in_period",
        "oldest": "find_by_founding",
        "newest": "find_by_founding",
        "founding": "institution_timeline",
        "closure": "institution_timeline",
        "change_event": "events_in_period",
        "timeline": "institution_timeline",
    }
    def extract(self, query: str) -> TemporalConstraint:
        """
        Extract a temporal constraint from the query.

        Fast operation using pattern matching.
        """
        query_lower = query.lower()

        # 1. Check for timeline/history queries
        if any(kw in query_lower for kw in self.TIMELINE_KEYWORDS):
            return TemporalConstraint(
                constraint_type="timeline",
                confidence=0.9,
                recommended_template="institution_timeline"
            )

        # 2. Check for superlatives (oldest/newest)
        if any(kw in query_lower for kw in self.OLDEST_KEYWORDS_NL + self.OLDEST_KEYWORDS_EN):
            return TemporalConstraint(
                constraint_type="oldest",
                confidence=0.9,
                recommended_template="find_by_founding"
            )

        if any(kw in query_lower for kw in self.NEWEST_KEYWORDS_NL + self.NEWEST_KEYWORDS_EN):
            return TemporalConstraint(
                constraint_type="newest",
                confidence=0.9,
                recommended_template="find_by_founding"
            )

        # 3. Check for change-event keywords
        if any(kw in query_lower for kw in self.MERGER_KEYWORDS):
            return TemporalConstraint(
                constraint_type="change_event",
                reference_event="merger",
                confidence=0.85,
                recommended_template="events_in_period"
            )

        if any(kw in query_lower for kw in self.FOUNDING_KEYWORDS):
            return TemporalConstraint(
                constraint_type="founding",
                confidence=0.85,
                recommended_template="institution_timeline"
            )

        if any(kw in query_lower for kw in self.CLOSURE_KEYWORDS):
            return TemporalConstraint(
                constraint_type="closure",
                confidence=0.85,
                recommended_template="institution_timeline"
            )

        # 4. Extract years from the query
        years = self.YEAR_PATTERN.findall(query)

        if len(years) >= 2:
            # "between 1990 and 2000"
            years_sorted = sorted([int(y) for y in years])
            return TemporalConstraint(
                constraint_type="between",
                date_start=f"{years_sorted[0]}-01-01",
                date_end=f"{years_sorted[-1]}-12-31",
                confidence=0.85,
                recommended_template="events_in_period"
            )

        if len(years) == 1:
            year = years[0]

            # Check for before/after indicators with word boundaries
            before_match = any(
                re.search(rf'\b{kw}\b', query_lower)
                for kw in self.BEFORE_KEYWORDS_NL + self.BEFORE_KEYWORDS_EN
            )
            after_match = any(
                re.search(rf'\b{kw}\b', query_lower)
                for kw in self.AFTER_KEYWORDS_NL + self.AFTER_KEYWORDS_EN
            )

            if before_match:
                return TemporalConstraint(
                    constraint_type="before",
                    date_end=f"{year}-01-01",
                    confidence=0.85,
                    recommended_template="point_in_time_state"
                )

            if after_match:
                return TemporalConstraint(
                    constraint_type="after",
                    date_start=f"{year}-12-31",
                    confidence=0.85,
                    recommended_template="point_in_time_state"
                )

            # Default: point in time
            return TemporalConstraint(
                constraint_type="point_in_time",
                date_start=f"{year}-01-01",
                date_end=f"{year}-12-31",
                confidence=0.8,
                recommended_template="point_in_time_state"
            )

        # 5. No clear temporal constraint
        return TemporalConstraint(
            constraint_type="none",
            confidence=0.7
        )
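The two-year branch of `extract` can be exercised in isolation; a sketch of just the year extraction and range construction (`year_range` is an illustrative name):

```python
import re

YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-2][0-9])\b")  # 1000-2029

def year_range(query: str):
    # Two or more year mentions -> a "between" range spanning min..max.
    years = sorted(int(y) for y in YEAR_PATTERN.findall(query))
    if len(years) >= 2:
        return f"{years[0]}-01-01", f"{years[-1]}-12-31"
    return None

print(year_range("archieven gefuseerd tussen 1990 en 2000"))  # ('1990-01-01', '2000-12-31')
print(year_range("opgericht in 1885"))                        # None
```
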
    def get_template_for_constraint(
        self,
        constraint: TemporalConstraint
    ) -> Optional[str]:
        """Get the recommended SPARQL template ID for a temporal constraint."""
        return self.TEMPLATE_MAP.get(constraint.constraint_type)

# DSPy Signature for complex temporal extraction
class TemporalQueryIntent(dspy.Signature):
    """
    Extract temporal constraints from a heritage institution query.

    Use this for complex queries where pattern matching fails.
    """
    query: str = dspy.InputField(desc="Natural language query about heritage institutions")
    language: str = dspy.InputField(desc="Query language: 'nl' or 'en'", default="nl")

    constraint_type: str = dspy.OutputField(
        desc="Type of temporal constraint: point_in_time, before, after, between, "
             "oldest, newest, founding, closure, change_event, timeline, none"
    )
    date_start: str = dspy.OutputField(
        desc="Start date in ISO format (YYYY-MM-DD) or empty string if not applicable"
    )
    date_end: str = dspy.OutputField(
        desc="End date in ISO format (YYYY-MM-DD) or empty string if not applicable"
    )
    reference_event: str = dspy.OutputField(
        desc="Referenced event (e.g., 'fusie', 'merger') or empty string"
    )
    confidence: float = dspy.OutputField(
        desc="Confidence score 0.0-1.0"
    )

class TemporalIntentExtractorModule(dspy.Module):
|
||||
"""
|
||||
DSPy module for temporal intent extraction.
|
||||
|
||||
Uses fast pattern matching first, falls back to LLM for complex cases.
|
||||
"""
|
||||
|
||||
def __init__(self, confidence_threshold: float = 0.75):
|
||||
super().__init__()
|
||||
self.fast_extractor = TemporalConstraintExtractor()
|
||||
self.llm_extractor = dspy.ChainOfThought(TemporalQueryIntent)
|
||||
self.confidence_threshold = confidence_threshold
|
||||
|
||||
def forward(self, query: str, language: str = "nl") -> TemporalConstraint:
|
||||
"""
|
||||
Extract temporal constraint from query.
|
||||
|
||||
Args:
|
||||
query: Natural language query
|
||||
language: Query language ('nl' or 'en')
|
||||
|
||||
Returns:
|
||||
TemporalConstraint with extracted information
|
||||
"""
|
||||
# Try fast extraction first
|
||||
constraint = self.fast_extractor.extract(query)
|
||||
|
||||
# If confidence is high enough, use fast result
|
||||
if constraint.confidence >= self.confidence_threshold:
|
||||
logger.debug(f"Fast temporal extraction: {constraint.constraint_type} (conf={constraint.confidence})")
|
||||
return constraint
|
||||
|
||||
# Fall back to LLM for low confidence cases
|
||||
logger.debug(f"LLM temporal extraction (fast conf={constraint.confidence})")
|
||||
|
||||
try:
|
||||
result = self.llm_extractor(query=query, language=language)
|
||||
|
||||
return TemporalConstraint(
|
||||
constraint_type=result.constraint_type or "none",
|
||||
date_start=result.date_start if result.date_start else None,
|
||||
date_end=result.date_end if result.date_end else None,
|
||||
reference_event=result.reference_event if result.reference_event else None,
|
||||
confidence=float(result.confidence) if result.confidence else 0.7,
|
||||
recommended_template=self.fast_extractor.TEMPLATE_MAP.get(result.constraint_type)
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"LLM temporal extraction failed: {e}")
|
||||
# Return fast extraction result as fallback
|
||||
return constraint
|
||||
|
||||
|
||||
# Singleton instance
|
||||
_temporal_extractor: Optional[TemporalConstraintExtractor] = None
|
||||
|
||||
|
||||
def get_temporal_extractor() -> TemporalConstraintExtractor:
|
||||
"""Get or create singleton temporal extractor instance."""
|
||||
global _temporal_extractor
|
||||
if _temporal_extractor is None:
|
||||
_temporal_extractor = TemporalConstraintExtractor()
|
||||
return _temporal_extractor
|
||||
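The year-window mapping above (a lone year with "voor"/"before" closes the range at January 1, with "na"/"after" opens it after December 31, and two years become a between-range) can be sketched in isolation. This is a simplified, self-contained illustration, not the module's code: the `BEFORE_KW`/`AFTER_KW` lists and the year regex are stand-ins for the real `BEFORE_KEYWORDS_*`/`AFTER_KEYWORDS_*` constants and year detection.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Constraint:
    constraint_type: str
    date_start: Optional[str] = None
    date_end: Optional[str] = None

# Simplified stand-ins for the module's keyword lists (illustrative only)
BEFORE_KW = ["voor", "before"]
AFTER_KW = ["na", "after"]

def extract(query: str) -> Constraint:
    q = query.lower()
    # Years 1000-2029, matched on word boundaries
    years = re.findall(r"\b(1\d{3}|20[0-2]\d)\b", q)
    if len(years) >= 2:
        ys = sorted(int(y) for y in years)
        return Constraint("between", f"{ys[0]}-01-01", f"{ys[-1]}-12-31")
    if len(years) == 1:
        year = years[0]
        if any(re.search(rf"\b{kw}\b", q) for kw in BEFORE_KW):
            return Constraint("before", date_end=f"{year}-01-01")
        if any(re.search(rf"\b{kw}\b", q) for kw in AFTER_KW):
            return Constraint("after", date_start=f"{year}-12-31")
        return Constraint("point_in_time", f"{year}-01-01", f"{year}-12-31")
    return Constraint("none")
```

The word-boundary `\b` anchors mirror the fix in the real extractor: "na" must stand alone, so a word like "nationaal" does not trigger the after-branch.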
backend/rag/temporal_resolver.py (new file, 258 lines)
@@ -0,0 +1,258 @@
"""
|
||||
Temporal Conflict Resolution for Heritage Data
|
||||
|
||||
Handles cases where multiple facts exist for the same property at overlapping times.
|
||||
Based on: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
|
||||
|
||||
Strategies:
|
||||
1. Temporal ordering: Use fact valid at query time
|
||||
2. Recency: Prefer more recent sources
|
||||
3. Authority: Prefer authoritative sources (Tier 1)
|
||||
4. Confidence: Use higher confidence facts
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from typing import Optional
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class TemporalFact:
|
||||
"""A fact with temporal validity."""
|
||||
property: str
|
||||
value: str
|
||||
valid_from: datetime
|
||||
valid_to: Optional[datetime]
|
||||
source: str
|
||||
confidence: float = 1.0
|
||||
ghcid: Optional[str] = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class ConflictResolution:
|
||||
"""Result of conflict resolution."""
|
||||
property: str
|
||||
authoritative_value: str
|
||||
valid_for_date: datetime
|
||||
conflict_type: str
|
||||
explanation: str
|
||||
alternative_values: list[TemporalFact] = field(default_factory=list)
|
||||
|
||||
|
||||
class TemporalConflictResolver:
|
||||
"""
|
||||
Resolve conflicts between temporal facts.
|
||||
|
||||
Uses a multi-factor scoring system:
|
||||
- Source authority (Tier 1-4)
|
||||
- Confidence scores
|
||||
- Temporal recency
|
||||
"""
|
||||
|
||||
SOURCE_AUTHORITY = {
|
||||
"TIER_1_AUTHORITATIVE": 1.0,
|
||||
"TIER_2_VERIFIED": 0.8,
|
||||
"TIER_3_CROWD_SOURCED": 0.6,
|
||||
"TIER_4_INFERRED": 0.4,
|
||||
}
|
||||
|
||||
def resolve_conflicts(
|
||||
self,
|
||||
ghcid: str,
|
||||
facts: list[TemporalFact],
|
||||
query_date: Optional[datetime] = None
|
||||
) -> list[ConflictResolution]:
|
||||
"""
|
||||
Resolve all conflicts in a set of facts.
|
||||
|
||||
Args:
|
||||
ghcid: Institution identifier
|
||||
facts: All facts about the institution
|
||||
query_date: Point in time for resolution (default: now)
|
||||
|
||||
Returns:
|
||||
List of conflict resolutions with authoritative values
|
||||
"""
|
||||
if query_date is None:
|
||||
query_date = datetime.now()
|
||||
|
||||
# Group facts by property
|
||||
by_property: dict[str, list[TemporalFact]] = {}
|
||||
for fact in facts:
|
||||
by_property.setdefault(fact.property, []).append(fact)
|
||||
|
||||
resolutions = []
|
||||
|
||||
for prop, prop_facts in by_property.items():
|
||||
# Find facts valid at query_date
|
||||
valid_facts = [
|
||||
f for f in prop_facts
|
||||
if f.valid_from <= query_date and
|
||||
(f.valid_to is None or f.valid_to > query_date)
|
||||
]
|
||||
|
||||
if len(valid_facts) <= 1:
|
||||
# No conflict
|
||||
continue
|
||||
|
||||
# Multiple valid facts - resolve conflict
|
||||
resolution = self._resolve_property_conflict(
|
||||
prop, valid_facts, query_date
|
||||
)
|
||||
resolutions.append(resolution)
|
||||
|
||||
return resolutions
|
||||
|
||||
def get_authoritative_value(
|
||||
self,
|
||||
ghcid: str,
|
||||
property: str,
|
||||
facts: list[TemporalFact],
|
||||
query_date: Optional[datetime] = None
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
Get the authoritative value for a single property.
|
||||
|
||||
Convenience method for single-property lookups.
|
||||
"""
|
||||
if query_date is None:
|
||||
query_date = datetime.now()
|
||||
|
||||
# Filter facts for this property
|
||||
prop_facts = [f for f in facts if f.property == property]
|
||||
|
||||
if not prop_facts:
|
||||
return None
|
||||
|
||||
# Find facts valid at query_date
|
||||
valid_facts = [
|
||||
f for f in prop_facts
|
||||
if f.valid_from <= query_date and
|
||||
(f.valid_to is None or f.valid_to > query_date)
|
||||
]
|
||||
|
||||
if not valid_facts:
|
||||
return None
|
||||
|
||||
if len(valid_facts) == 1:
|
||||
return valid_facts[0].value
|
||||
|
||||
# Resolve conflict
|
||||
resolution = self._resolve_property_conflict(property, valid_facts, query_date)
|
||||
return resolution.authoritative_value
|
||||
|
||||
def _resolve_property_conflict(
|
||||
self,
|
||||
property: str,
|
||||
facts: list[TemporalFact],
|
||||
query_date: datetime
|
||||
) -> ConflictResolution:
|
||||
"""
|
||||
Resolve conflict for a single property.
|
||||
"""
|
||||
# Score each fact
|
||||
scored = []
|
||||
for fact in facts:
|
||||
score = self._compute_authority_score(fact)
|
||||
scored.append((fact, score))
|
||||
|
||||
# Sort by score (descending)
|
||||
scored.sort(key=lambda x: x[1], reverse=True)
|
||||
|
||||
winner = scored[0][0]
|
||||
alternatives = [f for f, s in scored[1:]]
|
||||
|
||||
# Determine conflict type
|
||||
if all(f.value == winner.value for f in facts):
|
||||
conflict_type = "redundant" # Same value from multiple sources
|
||||
elif self._is_name_change(facts):
|
||||
conflict_type = "name_change"
|
||||
elif self._is_location_change(facts, property):
|
||||
conflict_type = "location_change"
|
||||
else:
|
||||
conflict_type = "data_inconsistency"
|
||||
|
||||
explanation = self._generate_explanation(
|
||||
property, winner, alternatives, conflict_type, query_date
|
||||
)
|
||||
|
||||
return ConflictResolution(
|
||||
property=property,
|
||||
authoritative_value=winner.value,
|
||||
valid_for_date=query_date,
|
||||
conflict_type=conflict_type,
|
||||
explanation=explanation,
|
||||
alternative_values=alternatives
|
||||
)
|
||||
|
||||
def _compute_authority_score(self, fact: TemporalFact) -> float:
|
||||
"""Compute authority score for a fact."""
|
||||
# Base authority from source tier
|
||||
authority = self.SOURCE_AUTHORITY.get(fact.source, 0.5)
|
||||
|
||||
# Boost for confidence
|
||||
authority *= fact.confidence
|
||||
|
||||
# Recency bonus (facts with recent valid_from get slight boost)
|
||||
days_old = (datetime.now() - fact.valid_from).days
|
||||
recency_factor = 1.0 / (1.0 + days_old / 365.0) # Decay over years
|
||||
authority *= (0.8 + 0.2 * recency_factor)
|
||||
|
||||
return authority
|
||||
|
||||
def _is_name_change(self, facts: list[TemporalFact]) -> bool:
|
||||
"""Check if conflict represents a name change."""
|
||||
# Name changes typically have non-overlapping validity
|
||||
facts_sorted = sorted(facts, key=lambda f: f.valid_from)
|
||||
for i in range(len(facts_sorted) - 1):
|
||||
if facts_sorted[i].valid_to == facts_sorted[i+1].valid_from:
|
||||
return True
|
||||
return False
|
||||
|
||||
def _is_location_change(self, facts: list[TemporalFact], property: str) -> bool:
|
||||
"""Check if conflict represents a location change."""
|
||||
return property in ["city", "address", "location", "settlementName", "subregionCode"]
|
||||
|
||||
def _generate_explanation(
|
||||
self,
|
||||
property: str,
|
||||
winner: TemporalFact,
|
||||
alternatives: list[TemporalFact],
|
||||
conflict_type: str,
|
||||
query_date: datetime
|
||||
) -> str:
|
||||
"""Generate human-readable explanation of resolution."""
|
||||
if conflict_type == "name_change":
|
||||
return (
|
||||
f"The institution name changed over time. "
|
||||
f"At {query_date.strftime('%Y-%m-%d')}, the authoritative name was '{winner.value}'. "
|
||||
f"Previous names: {', '.join(f.value for f in alternatives)}."
|
||||
)
|
||||
elif conflict_type == "location_change":
|
||||
return (
|
||||
f"The institution relocated. "
|
||||
f"At {query_date.strftime('%Y-%m-%d')}, it was located at '{winner.value}'."
|
||||
)
|
||||
elif conflict_type == "redundant":
|
||||
return f"Multiple sources confirm: {winner.value}"
|
||||
else:
|
||||
return (
|
||||
f"Data conflict for {property}. "
|
||||
f"Using '{winner.value}' from {winner.source} (confidence: {winner.confidence:.2f}). "
|
||||
f"Alternative values exist in other sources."
|
||||
)
|
||||
|
||||
|
||||
# Singleton instance
|
||||
_resolver: Optional[TemporalConflictResolver] = None
|
||||
|
||||
|
||||
def get_temporal_resolver() -> TemporalConflictResolver:
|
||||
"""Get or create singleton resolver instance."""
|
||||
global _resolver
|
||||
if _resolver is None:
|
||||
_resolver = TemporalConflictResolver()
|
||||
return _resolver
|
||||
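The multi-factor scoring in `_compute_authority_score` combines source tier, per-fact confidence, and recency multiplicatively. A standalone sketch of that formula (tier constants copied from the class; `now` is passed in explicitly here to keep the example deterministic, where the module uses `datetime.now()`):

```python
from datetime import datetime

SOURCE_AUTHORITY = {
    "TIER_1_AUTHORITATIVE": 1.0,
    "TIER_2_VERIFIED": 0.8,
    "TIER_3_CROWD_SOURCED": 0.6,
    "TIER_4_INFERRED": 0.4,
}

def authority_score(source: str, confidence: float,
                    valid_from: datetime, now: datetime) -> float:
    # Base authority from source tier, scaled by per-fact confidence
    score = SOURCE_AUTHORITY.get(source, 0.5) * confidence
    # Recency decays over years; the boost is bounded to the range [0.8, 1.0]
    days_old = (now - valid_from).days
    recency = 1.0 / (1.0 + days_old / 365.0)
    return score * (0.8 + 0.2 * recency)
```

Because the recency factor is bounded, a Tier 1 fact always outranks a Tier 3 fact of equal confidence regardless of age; recency only nudges near-ties within the same tier.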
backend/rag/test_semantic_routing.py (new file, 493 lines)
@@ -0,0 +1,493 @@
"""
|
||||
Tests for Semantic Routing (Signal-Decision Pattern)
|
||||
|
||||
Tests the SemanticSignalExtractor and SemanticDecisionRouter classes
|
||||
which enable fast LLM-free query routing for high-confidence queries.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from .semantic_router import (
|
||||
QuerySignals,
|
||||
RouteConfig,
|
||||
SemanticSignalExtractor,
|
||||
SemanticDecisionRouter,
|
||||
get_signal_extractor,
|
||||
get_decision_router,
|
||||
)
|
||||
|
||||
|
||||
class TestSemanticSignalExtractor:
|
||||
"""Tests for SemanticSignalExtractor class."""
|
||||
|
||||
@pytest.fixture
|
||||
def extractor(self):
|
||||
return SemanticSignalExtractor()
|
||||
|
||||
# ===== Entity Type Detection =====
|
||||
|
||||
def test_detect_person_query(self, extractor):
|
||||
"""Person indicators should detect person entity type."""
|
||||
# Query with clear person indicator and no institution indicator
|
||||
signals = extractor.extract_signals("Wie werkt daar als medewerker?")
|
||||
assert signals.entity_type == "person"
|
||||
|
||||
def test_detect_person_query_with_institution_is_mixed(self, extractor):
|
||||
"""Person query mentioning institution should be mixed."""
|
||||
signals = extractor.extract_signals("Wie is de archivaris bij het Noord-Hollands Archief?")
|
||||
# "archief" is an institution indicator, so this is mixed
|
||||
assert signals.entity_type == "mixed"
|
||||
|
||||
def test_detect_person_query_with_organisatie_is_mixed(self, extractor):
|
||||
"""Person query with 'organisatie' should be mixed."""
|
||||
signals = extractor.extract_signals("Wie is de directeur van deze organisatie?")
|
||||
# "organisatie" is an institution indicator
|
||||
assert signals.entity_type == "mixed"
|
||||
|
||||
def test_detect_institution_query(self, extractor):
|
||||
"""Institution indicators should detect institution entity type."""
|
||||
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
|
||||
assert signals.entity_type == "institution"
|
||||
|
||||
def test_detect_mixed_query(self, extractor):
|
||||
"""Mixed indicators should detect mixed entity type."""
|
||||
signals = extractor.extract_signals("Welke curatoren werken bij musea in Utrecht?")
|
||||
assert signals.entity_type == "mixed"
|
||||
|
||||
def test_default_to_institution(self, extractor):
|
||||
"""Ambiguous queries should default to institution."""
|
||||
signals = extractor.extract_signals("Vertel me over cultureel erfgoed")
|
||||
assert signals.entity_type == "institution"
|
||||
|
||||
# ===== Intent Classification =====
|
||||
|
||||
def test_statistical_intent(self, extractor):
|
||||
"""Aggregation indicators should classify as statistical."""
|
||||
signals = extractor.extract_signals("Hoeveel archieven zijn er in Nederland?")
|
||||
assert signals.intent == "statistical"
|
||||
assert signals.requires_aggregation is True
|
||||
|
||||
def test_temporal_intent(self, extractor):
|
||||
"""Temporal indicators should classify as temporal."""
|
||||
signals = extractor.extract_signals("Wanneer is het Rijksmuseum opgericht?")
|
||||
assert signals.intent == "temporal"
|
||||
assert signals.has_temporal_constraint is True
|
||||
|
||||
def test_temporal_intent_with_oldest(self, extractor):
|
||||
"""Oldest/newest queries should be temporal."""
|
||||
signals = extractor.extract_signals("Wat is het oudste museum in Nederland?")
|
||||
assert signals.intent == "temporal"
|
||||
assert signals.has_temporal_constraint is True
|
||||
|
||||
def test_geographic_intent(self, extractor):
|
||||
"""Geographic indicators should classify as geographic."""
|
||||
# "waar" (where) is a geographic indicator
|
||||
signals = extractor.extract_signals("Waar staat dit museum?")
|
||||
assert signals.intent == "geographic"
|
||||
assert signals.has_geographic_constraint is True
|
||||
|
||||
def test_geographic_intent_with_location(self, extractor):
|
||||
"""Location mentions should trigger geographic constraint."""
|
||||
signals = extractor.extract_signals("Vertel me over musea in Amsterdam")
|
||||
assert signals.has_geographic_constraint is True
|
||||
|
||||
def test_temporal_indicator_substring_fixed(self, extractor):
|
||||
"""Verify fix: substring matching no longer causes false positives.
|
||||
|
||||
'nationaal' contains 'na' but should NOT trigger temporal (uses word boundaries).
|
||||
This tests that the fix for substring matching is working.
|
||||
"""
|
||||
signals = extractor.extract_signals("In welke stad ligt het Nationaal Archief?")
|
||||
# After fix: should NOT be temporal (no word-boundary match for "na")
|
||||
# "In" at start is a word boundary match for geographic indicator
|
||||
assert signals.intent == "geographic"
|
||||
assert signals.has_temporal_constraint is False
|
||||
|
||||
def test_entity_lookup_intent(self, extractor):
|
||||
"""Entity lookup indicators should classify correctly."""
|
||||
signals = extractor.extract_signals("Wat is het Rijksmuseum?")
|
||||
assert signals.intent == "entity_lookup"
|
||||
|
||||
def test_comparative_intent(self, extractor):
|
||||
"""Comparative queries should be classified correctly."""
|
||||
signals = extractor.extract_signals("Vergelijk het Rijksmuseum met het Van Gogh Museum")
|
||||
assert signals.intent == "comparative"
|
||||
|
||||
def test_exploration_default_intent(self, extractor):
|
||||
"""Default to exploration for open questions without clear indicators."""
|
||||
# Query without geographic, temporal, or aggregation indicators
|
||||
# Note: "in" is a geographic indicator, so avoid words containing it
|
||||
signals = extractor.extract_signals("Welke schilderijen vallen op?")
|
||||
assert signals.intent == "exploration"
|
||||
|
||||
def test_geographic_indicator_substring_fixed(self, extractor):
|
||||
"""Verify fix: 'in' no longer matches inside words.
|
||||
|
||||
'interessant' contains 'in' but should NOT trigger geographic.
|
||||
This tests that the word boundary fix is working.
|
||||
"""
|
||||
signals = extractor.extract_signals("Welke schilderijen zijn interessant?")
|
||||
# After fix: should be exploration, not geographic
|
||||
assert signals.intent == "exploration"
|
||||
assert signals.has_geographic_constraint is False
|
||||
|
||||
def test_word_boundary_in_works_correctly(self, extractor):
|
||||
"""Verify 'in' as standalone word DOES trigger geographic."""
|
||||
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
|
||||
# "in" as standalone word should trigger geographic
|
||||
assert signals.intent == "geographic"
|
||||
assert signals.has_geographic_constraint is True
|
||||
|
||||
def test_word_boundary_na_works_correctly(self, extractor):
|
||||
"""Verify 'na' as standalone word DOES trigger temporal."""
|
||||
# Dutch: "Na de fusie..." = "After the merger..."
|
||||
signals = extractor.extract_signals("Wat gebeurde er na de fusie met het archief?")
|
||||
# "na" as standalone word should trigger temporal
|
||||
assert signals.intent == "temporal"
|
||||
assert signals.has_temporal_constraint is True
|
||||
|
||||
# ===== Entity Extraction =====
|
||||
|
||||
def test_extract_institution_mention(self, extractor):
|
||||
"""Should extract institution names from query."""
|
||||
signals = extractor.extract_signals("Vertel me over het Noord-Hollands Archief")
|
||||
assert len(signals.institution_mentions) >= 1
|
||||
# Should find "Noord-Hollands Archief" or similar
|
||||
|
||||
def test_extract_location_mention(self, extractor):
|
||||
"""Should extract known Dutch locations."""
|
||||
signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
|
||||
assert "Amsterdam" in signals.location_mentions
|
||||
assert signals.has_geographic_constraint is True
|
||||
|
||||
def test_extract_multiple_locations(self, extractor):
|
||||
"""Should extract multiple locations."""
|
||||
signals = extractor.extract_signals("Archieven in Utrecht en Haarlem")
|
||||
assert "Utrecht" in signals.location_mentions
|
||||
assert "Haarlem" in signals.location_mentions
|
||||
|
||||
# ===== Language Detection =====
|
||||
|
||||
def test_detect_dutch_language(self, extractor):
|
||||
"""Dutch queries should be detected."""
|
||||
signals = extractor.extract_signals("Hoeveel musea zijn er in Nederland?")
|
||||
assert signals.language == "nl"
|
||||
|
||||
def test_detect_english_language(self, extractor):
|
||||
"""English queries should be detected."""
|
||||
signals = extractor.extract_signals("How many museums are there in Amsterdam?")
|
||||
assert signals.language == "en"
|
||||
|
||||
# ===== Confidence Scoring =====
|
||||
|
||||
def test_high_confidence_clear_query(self, extractor):
|
||||
"""Clear queries should have high confidence."""
|
||||
signals = extractor.extract_signals("Hoeveel archieven zijn er in Noord-Holland?")
|
||||
assert signals.confidence >= 0.8
|
||||
|
||||
def test_moderate_confidence_ambiguous_query(self, extractor):
|
||||
"""Ambiguous queries should have moderate confidence."""
|
||||
signals = extractor.extract_signals("erfgoed informatie")
|
||||
assert signals.confidence < 0.9
|
||||
|
||||
def test_confidence_capped_at_095(self, extractor):
|
||||
"""Confidence should not exceed 0.95."""
|
||||
signals = extractor.extract_signals("Hoeveel musea zijn er in Amsterdam?")
|
||||
assert signals.confidence <= 0.95
|
||||
|
||||
|
||||
class TestSemanticDecisionRouter:
|
||||
"""Tests for SemanticDecisionRouter class."""
|
||||
|
||||
@pytest.fixture
|
||||
def router(self):
|
||||
return SemanticDecisionRouter()
|
||||
|
||||
def test_person_query_routes_to_qdrant_persons(self, router):
|
||||
"""Person queries should route to heritage_persons collection."""
|
||||
signals = QuerySignals(
|
||||
entity_type="person",
|
||||
intent="entity_lookup",
|
||||
institution_mentions=["Noord-Hollands Archief"],
|
||||
)
|
||||
config = router.route(signals)
|
||||
assert config.primary_backend == "qdrant"
|
||||
assert config.qdrant_collection == "heritage_persons"
|
||||
|
||||
def test_person_query_with_institution_filter(self, router):
|
||||
"""Person queries with institution should add filter."""
|
||||
signals = QuerySignals(
|
||||
entity_type="person",
|
||||
intent="entity_lookup",
|
||||
institution_mentions=["Noord-Hollands Archief"],
|
||||
)
|
||||
config = router.route(signals)
|
||||
assert "custodian_slug" in config.qdrant_filters
|
||||
assert "noord-hollands-archief" in config.qdrant_filters["custodian_slug"]
|
||||
|
||||
def test_statistical_query_routes_to_ducklake(self, router):
|
||||
"""Statistical queries should route to DuckLake."""
|
||||
signals = QuerySignals(
|
||||
entity_type="institution",
|
||||
intent="statistical",
|
||||
requires_aggregation=True,
|
||||
)
|
||||
config = router.route(signals)
|
||||
assert config.primary_backend == "ducklake"
|
||||
|
||||
def test_temporal_query_uses_temporal_templates(self, router):
|
||||
"""Temporal queries should enable temporal templates."""
|
||||
signals = QuerySignals(
|
||||
entity_type="institution",
|
||||
intent="temporal",
|
||||
has_temporal_constraint=True,
|
||||
)
|
||||
config = router.route(signals)
|
||||
assert config.primary_backend == "sparql"
|
||||
assert config.use_temporal_templates is True
|
||||
|
||||
def test_geographic_query_routes_to_sparql(self, router):
|
||||
"""Geographic queries should route to SPARQL."""
|
||||
signals = QuerySignals(
|
||||
entity_type="institution",
|
||||
intent="geographic",
|
||||
has_geographic_constraint=True,
|
||||
location_mentions=["Amsterdam"],
|
||||
)
|
||||
config = router.route(signals)
|
||||
assert config.primary_backend == "sparql"
|
||||
|
||||
def test_default_hybrid_routing(self, router):
|
||||
"""Default queries should use hybrid routing."""
|
||||
signals = QuerySignals(
|
||||
entity_type="institution",
|
||||
intent="exploration",
|
||||
)
|
||||
config = router.route(signals)
|
||||
assert config.primary_backend == "qdrant"
|
||||
assert config.secondary_backend == "sparql"
|
||||
|
||||
|
||||
class TestSlugGeneration:
|
||||
"""Tests for institution slug generation."""
|
||||
|
||||
@pytest.fixture
|
||||
def router(self):
|
||||
return SemanticDecisionRouter()
|
||||
|
||||
def test_simple_slug(self, router):
|
||||
"""Simple names should convert to lowercase hyphenated slug."""
|
||||
slug = router._to_slug("Rijksmuseum")
|
||||
assert slug == "rijksmuseum"
|
||||
|
||||
def test_slug_with_spaces(self, router):
|
||||
"""Spaces should be converted to hyphens."""
|
||||
slug = router._to_slug("Noord-Hollands Archief")
|
||||
assert slug == "noord-hollands-archief"
|
||||
|
||||
def test_slug_with_article(self, router):
|
||||
"""Dutch articles should be preserved in slug."""
|
||||
slug = router._to_slug("Het Utrechts Archief")
|
||||
assert slug == "het-utrechts-archief"
|
||||
|
||||
def test_slug_with_diacritics(self, router):
|
||||
"""Diacritics should be removed."""
|
||||
slug = router._to_slug("Musée d'Orsay")
|
||||
assert slug == "musee-dorsay"
|
||||
|
||||
|
||||
class TestSingletonInstances:
|
||||
"""Tests for singleton pattern."""
|
||||
|
||||
def test_signal_extractor_singleton(self):
|
||||
"""get_signal_extractor should return same instance."""
|
||||
ext1 = get_signal_extractor()
|
||||
ext2 = get_signal_extractor()
|
||||
assert ext1 is ext2
|
||||
|
||||
def test_decision_router_singleton(self):
|
||||
"""get_decision_router should return same instance."""
|
||||
router1 = get_decision_router()
|
||||
router2 = get_decision_router()
|
||||
assert router1 is router2
|
||||
|
||||
|
||||
class TestIntegration:
|
||||
"""Integration tests for full signal-decision flow."""
|
||||
|
||||
def test_full_person_query_flow(self):
|
||||
"""Test complete flow for person query."""
|
||||
extractor = get_signal_extractor()
|
||||
router = get_decision_router()
|
||||
|
||||
# Query with clear person indicator but also institution mention (mixed)
|
||||
signals = extractor.extract_signals(
|
||||
"Wie is de archivaris bij het Noord-Hollands Archief?"
|
||||
)
|
||||
config = router.route(signals)
|
||||
|
||||
# Mixed entity type because both person and institution indicators present
|
||||
assert signals.entity_type == "mixed"
|
||||
# Mixed queries route via default (qdrant hybrid)
|
||||
assert config.primary_backend in ["qdrant", "sparql"]
|
||||
|
||||
def test_full_pure_person_query_flow(self):
|
||||
"""Test complete flow for pure person query (no institution mention)."""
|
||||
extractor = get_signal_extractor()
|
||||
router = get_decision_router()
|
||||
|
||||
signals = extractor.extract_signals("Wie werkt daar als medewerker?")
|
||||
config = router.route(signals)
|
||||
|
||||
assert signals.entity_type == "person"
|
||||
assert config.primary_backend == "qdrant"
|
||||
assert config.qdrant_collection == "heritage_persons"
|
||||
|
||||
def test_full_statistical_query_flow(self):
|
||||
"""Test complete flow for statistical query."""
|
||||
extractor = get_signal_extractor()
|
||||
router = get_decision_router()
|
||||
|
||||
signals = extractor.extract_signals(
|
||||
"Hoeveel musea zijn er per provincie in Nederland?"
|
||||
)
|
||||
config = router.route(signals)
|
||||
|
||||
assert signals.intent == "statistical"
|
||||
assert signals.requires_aggregation is True
|
||||
assert config.primary_backend == "ducklake"
|
||||
|
||||
def test_full_temporal_query_flow(self):
|
||||
"""Test complete flow for temporal query."""
|
||||
extractor = get_signal_extractor()
|
||||
router = get_decision_router()
|
||||
|
||||
signals = extractor.extract_signals(
|
||||
"Wat is het oudste archief in Noord-Holland?"
|
||||
)
|
||||
config = router.route(signals)
|
||||
|
||||
assert signals.intent == "temporal"
|
||||
assert signals.has_temporal_constraint is True
|
||||
assert config.use_temporal_templates is True
|
||||
|
||||
def test_high_confidence_skip_llm_threshold(self):
|
||||
"""Verify high-confidence queries meet skip threshold."""
|
||||
extractor = get_signal_extractor()
|
||||
|
||||
# These queries should have confidence >= 0.8
|
||||
# Need clear indicators without ambiguity
|
||||
high_confidence_queries = [
|
||||
"Hoeveel archieven zijn er in Nederland?", # clear aggregation
|
||||
"Wanneer is het Nationaal Archief opgericht?", # clear temporal
|
||||
"Welke musea zijn er in Amsterdam?", # clear geographic + institution
|
||||
]
|
||||
|
||||
for query in high_confidence_queries:
|
||||
signals = extractor.extract_signals(query)
|
||||
assert signals.confidence >= 0.8, (
|
||||
f"Query '{query}' has confidence {signals.confidence}, expected >= 0.8"
|
||||
)
|
||||
|
||||
def test_moderate_confidence_for_mixed_queries(self):
|
||||
"""Mixed entity type queries should have lower confidence."""
|
||||
extractor = get_signal_extractor()
|
||||
|
||||
# Mixed queries are more ambiguous
|
||||
signals = extractor.extract_signals("Wie is de directeur van het Rijksmuseum?")
|
||||
# Mixed entity type (person + institution) reduces confidence
|
||||
assert signals.entity_type == "mixed"
|
||||
assert signals.confidence < 0.9 # Not as high as clear queries
|
||||
|
||||
|
||||
class TestYearPatternDetection:
|
||||
"""Tests for year-based temporal detection.
|
||||
|
||||
Year mentions (1000-2029) should trigger temporal intent,
|
||||
even when combined with geographic indicators like 'in'.
|
||||
"""
|
||||
|
||||
@pytest.fixture
|
||||
def extractor(self):
|
||||
return SemanticSignalExtractor()
|
||||
|
||||
def test_year_triggers_temporal_intent(self, extractor):
|
||||
"""A year mention should classify as temporal intent."""
|
||||
signals = extractor.extract_signals("Wat was de status van het Rijksmuseum in 1990?")
|
||||
# Year 1990 should trigger temporal, not "in" triggering geographic
|
||||
assert signals.intent == "temporal"
|
||||
assert signals.has_temporal_constraint is True
|
||||
|
||||
def test_year_1850_triggers_temporal(self, extractor):
|
||||
"""Historical year should trigger temporal."""
|
||||
        signals = extractor.extract_signals("Welke musea bestonden in 1850?")
        assert signals.intent == "temporal"
        assert signals.has_temporal_constraint is True

    def test_year_2020_with_aggregation_is_statistical(self, extractor):
        """Aggregation query with year should be statistical with temporal constraint.

        'Hoeveel' (how many) triggers aggregation → statistical intent.
        Year 2020 triggers temporal constraint.
        Result: statistical intent WITH temporal filter applied.
        """
        signals = extractor.extract_signals("Hoeveel archieven waren er in 2020?")
        # "Hoeveel" overrides to statistical, but temporal constraint is detected
        assert signals.intent == "statistical"
        assert signals.requires_aggregation is True
        assert signals.has_temporal_constraint is True  # Year still detected!

    def test_year_2020_pure_temporal(self, extractor):
        """Recent year without aggregation should be temporal."""
        signals = extractor.extract_signals("Welke archieven bestonden in 2020?")
        assert signals.intent == "temporal"
        assert signals.has_temporal_constraint is True

    def test_geographic_without_year_stays_geographic(self, extractor):
        """Geographic query without year should stay geographic."""
        signals = extractor.extract_signals("Welke musea zijn er in Amsterdam?")
        assert signals.intent == "geographic"
        assert signals.has_temporal_constraint is False

    def test_year_overrides_geographic_in(self, extractor):
        """Year should make query temporal even with 'in' for location."""
        signals = extractor.extract_signals("Welke musea waren er in Amsterdam in 1900?")
        # Year 1900 should override the geographic "in Amsterdam"
        assert signals.intent == "temporal"
        assert signals.has_temporal_constraint is True
        # Geographic constraint should still be detected
        assert signals.has_geographic_constraint is True

    def test_year_in_english_query(self, extractor):
        """Year detection should work in English queries too."""
        signals = extractor.extract_signals("What museums existed in 1920?")
        assert signals.intent == "temporal"
        assert signals.has_temporal_constraint is True

    def test_year_range_boundary_1000(self, extractor):
        """Year 1000 should be detected."""
        signals = extractor.extract_signals("Bestond dit klooster al in 1000?")
        assert signals.has_temporal_constraint is True

    def test_year_range_boundary_2029(self, extractor):
        """Year 2029 should be detected (future planning)."""
        signals = extractor.extract_signals("Wat zijn de plannen voor 2029?")
        assert signals.has_temporal_constraint is True

    def test_non_year_number_ignored(self, extractor):
        """Numbers that aren't years should not trigger temporal."""
        signals = extractor.extract_signals("Hoeveel van de 500 musea hebben een website?")
        # 500 is not a valid year (outside 1000-2029)
        # This is a statistical query
        assert signals.intent == "statistical"
        # has_temporal_constraint could be False (no year) but check intent

    def test_year_combined_with_temporal_keyword(self, extractor):
        """Year + temporal keyword should be high confidence temporal."""
        signals = extractor.extract_signals("Wanneer in 1945 werd het museum gesloten?")
        assert signals.intent == "temporal"
        assert signals.has_temporal_constraint is True
        # Combined signals should give high confidence
        assert signals.confidence >= 0.8


# Run with: pytest backend/rag/test_semantic_routing.py -v
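The year-boundary tests above (years 1000–2029 detected, other numbers ignored) can be sketched with a single regex pass. This is a hypothetical re-implementation for illustration only; `detect_years` and `YEAR_RE` are made-up names, not the module's actual API:

```python
import re

# Hypothetical sketch of the year detection the tests above exercise:
# match standalone 4-digit numbers, then keep only plausible years.
YEAR_RE = re.compile(r"\b(\d{4})\b")

def detect_years(query: str) -> list[int]:
    """Return candidate years (1000-2029) found in the query."""
    return [int(y) for y in YEAR_RE.findall(query) if 1000 <= int(y) <= 2029]

print(detect_years("Welke musea bestonden in 1850?"))                # [1850]
print(detect_years("Hoeveel van de 500 musea hebben een website?"))  # []
print(detect_years("Wat zijn de plannen voor 2029?"))                # [2029]
```

A three-digit count like "500" never matches the 4-digit pattern, which is why the non-year test passes without an explicit range check for it.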
527	backend/rag/test_temporal_intent.py	Normal file

@@ -0,0 +1,527 @@
"""
Tests for Temporal Intent Extraction Module

Tests the TemporalConstraintExtractor and TemporalIntentExtractorModule classes
which enable fast LLM-free extraction of temporal constraints from queries.
"""

import pytest
from .temporal_intent import (
    TemporalConstraint,
    TemporalConstraintExtractor,
    TemporalIntentExtractorModule,
    get_temporal_extractor,
)


class TestTemporalConstraintExtractor:
    """Tests for TemporalConstraintExtractor class."""

    @pytest.fixture
    def extractor(self):
        return TemporalConstraintExtractor()

    # ===== Timeline/History Queries =====

    def test_timeline_dutch_geschiedenis(self, extractor):
        """Dutch 'geschiedenis' should trigger timeline constraint."""
        constraint = extractor.extract("Wat is de geschiedenis van het Rijksmuseum?")
        assert constraint.constraint_type == "timeline"
        assert constraint.recommended_template == "institution_timeline"
        assert constraint.confidence >= 0.9

    def test_timeline_english_history(self, extractor):
        """English 'history' should trigger timeline constraint."""
        constraint = extractor.extract("Tell me the history of the British Museum")
        assert constraint.constraint_type == "timeline"
        assert constraint.recommended_template == "institution_timeline"

    def test_timeline_tijdlijn(self, extractor):
        """Dutch 'tijdlijn' should trigger timeline constraint."""
        constraint = extractor.extract("Geef me een tijdlijn van het Noord-Hollands Archief")
        assert constraint.constraint_type == "timeline"

    def test_timeline_evolution(self, extractor):
        """English 'evolution' should trigger timeline constraint."""
        constraint = extractor.extract("What was the evolution of this archive?")
        assert constraint.constraint_type == "timeline"

    # ===== Superlative Queries (Oldest/Newest) =====

    def test_oldest_dutch_oudste(self, extractor):
        """Dutch 'oudste' should trigger oldest constraint."""
        constraint = extractor.extract("Wat is het oudste museum in Nederland?")
        assert constraint.constraint_type == "oldest"
        assert constraint.recommended_template == "find_by_founding"
        assert constraint.confidence >= 0.9

    def test_oldest_english(self, extractor):
        """English 'oldest' should trigger oldest constraint."""
        constraint = extractor.extract("What is the oldest library in Amsterdam?")
        assert constraint.constraint_type == "oldest"

    def test_oldest_eerste(self, extractor):
        """Dutch 'eerste' (first) should trigger oldest constraint."""
        constraint = extractor.extract("Welke was de eerste openbare bibliotheek?")
        assert constraint.constraint_type == "oldest"

    def test_oldest_earliest(self, extractor):
        """English 'earliest' should trigger oldest constraint."""
        constraint = extractor.extract("What is the earliest archive in the region?")
        assert constraint.constraint_type == "oldest"

    def test_newest_dutch_nieuwste(self, extractor):
        """Dutch 'nieuwste' should trigger newest constraint."""
        constraint = extractor.extract("Wat is het nieuwste museum?")
        assert constraint.constraint_type == "newest"
        assert constraint.recommended_template == "find_by_founding"

    def test_newest_english_latest(self, extractor):
        """English 'latest' should trigger newest constraint."""
        constraint = extractor.extract("What is the latest museum to open?")
        assert constraint.constraint_type == "newest"

    def test_newest_most_recent(self, extractor):
        """English 'most recent' should trigger newest constraint."""
        constraint = extractor.extract("What is the most recent archive established?")
        assert constraint.constraint_type == "newest"

    # ===== Change Event Keywords =====

    def test_merger_dutch_fusie(self, extractor):
        """Dutch 'fusie' should trigger change_event constraint."""
        constraint = extractor.extract("Wanneer was de fusie van het archief?")
        assert constraint.constraint_type == "change_event"
        assert constraint.reference_event == "merger"
        assert constraint.recommended_template == "events_in_period"

    def test_merger_english(self, extractor):
        """English 'merger' should trigger change_event constraint."""
        constraint = extractor.extract("When did the merger happen?")
        assert constraint.constraint_type == "change_event"
        assert constraint.reference_event == "merger"

    def test_merger_merged(self, extractor):
        """English 'merged' should trigger change_event constraint."""
        constraint = extractor.extract("Which archives merged in 2001?")
        assert constraint.constraint_type == "change_event"

    def test_founding_dutch_opgericht(self, extractor):
        """Dutch 'opgericht' should trigger founding constraint."""
        constraint = extractor.extract("Wanneer is het Rijksmuseum opgericht?")
        assert constraint.constraint_type == "founding"
        assert constraint.recommended_template == "institution_timeline"

    def test_founding_english_founded(self, extractor):
        """English 'founded' should trigger founding constraint."""
        constraint = extractor.extract("When was the library founded?")
        assert constraint.constraint_type == "founding"

    def test_founding_established(self, extractor):
        """English 'established' should trigger founding constraint."""
        constraint = extractor.extract("When was this archive established?")
        assert constraint.constraint_type == "founding"

    def test_closure_dutch_gesloten(self, extractor):
        """Dutch 'gesloten' should trigger closure constraint."""
        constraint = extractor.extract("Wanneer is het museum gesloten?")
        assert constraint.constraint_type == "closure"
        assert constraint.recommended_template == "institution_timeline"

    def test_closure_english_closed(self, extractor):
        """English 'closed' should trigger closure constraint."""
        # Note: "close" (verb form) vs "closed" (past participle)
        # The extractor only has "closed" in CLOSURE_KEYWORDS
        constraint = extractor.extract("When was the archive closed?")
        assert constraint.constraint_type == "closure"

    def test_closure_dissolved(self, extractor):
        """English 'dissolved' should trigger closure constraint."""
        constraint = extractor.extract("When was the organization dissolved?")
        assert constraint.constraint_type == "closure"

    # ===== Year Extraction =====

    def test_single_year_point_in_time(self, extractor):
        """Single year should trigger point_in_time constraint."""
        constraint = extractor.extract("Wat was de status van het museum in 1990?")
        assert constraint.constraint_type == "point_in_time"
        assert constraint.date_start == "1990-01-01"
        assert constraint.date_end == "1990-12-31"
        assert constraint.recommended_template == "point_in_time_state"

    def test_two_years_between(self, extractor):
        """Two years should trigger between constraint."""
        constraint = extractor.extract("Welke veranderingen waren er tussen 1990 en 2000?")
        assert constraint.constraint_type == "between"
        assert constraint.date_start == "1990-01-01"
        assert constraint.date_end == "2000-12-31"
        assert constraint.recommended_template == "events_in_period"

    def test_three_years_uses_first_and_last(self, extractor):
        """Three years should use first and last for range."""
        constraint = extractor.extract("Musea in 1950, 1975 en 2000")
        assert constraint.constraint_type == "between"
        assert constraint.date_start == "1950-01-01"
        assert constraint.date_end == "2000-12-31"

    def test_year_with_before_dutch(self, extractor):
        """Year with Dutch 'voor' should trigger before constraint."""
        constraint = extractor.extract("Welke archieven bestonden voor 1950?")
        assert constraint.constraint_type == "before"
        assert constraint.date_end == "1950-01-01"
        assert constraint.recommended_template == "point_in_time_state"

    def test_year_with_before_english(self, extractor):
        """Year with English 'before' should trigger before constraint."""
        constraint = extractor.extract("Which museums existed before 1900?")
        assert constraint.constraint_type == "before"
        assert constraint.date_end == "1900-01-01"

    def test_year_with_after_dutch(self, extractor):
        """Year with Dutch 'na' should trigger after constraint.

        Note: More specific keywords (like 'opgericht') take precedence.
        We use a neutral query without founding/closure keywords.
        """
        constraint = extractor.extract("Welke veranderingen waren er na 1980?")
        assert constraint.constraint_type == "after"
        assert constraint.date_start == "1980-12-31"
        assert constraint.recommended_template == "point_in_time_state"

    def test_year_with_after_english(self, extractor):
        """Year with English 'after' should trigger after constraint."""
        constraint = extractor.extract("What happened after 2010?")
        assert constraint.constraint_type == "after"
        assert constraint.date_start == "2010-12-31"

    def test_year_with_since(self, extractor):
        """'Since' should trigger after constraint."""
        constraint = extractor.extract("Museums opened since 2000")
        assert constraint.constraint_type == "after"
        assert constraint.date_start == "2000-12-31"

    # ===== Year Extraction Edge Cases =====

    def test_year_1800s(self, extractor):
        """Should extract years from 1800s."""
        constraint = extractor.extract("Archieven uit 1856")
        assert constraint.constraint_type == "point_in_time"
        assert "1856" in constraint.date_start

    def test_year_2020s(self, extractor):
        """Should extract years from 2020s."""
        constraint = extractor.extract("Nieuwe musea in 2023")
        assert constraint.constraint_type == "point_in_time"
        assert "2023" in constraint.date_start

    def test_ignore_numbers_that_are_not_years(self, extractor):
        """Should not extract non-year numbers as years."""
        # Numbers like 500 or 50 should not be treated as years
        constraint = extractor.extract("Het museum heeft 500 werken in de collectie")
        assert constraint.constraint_type == "none"

    # ===== No Temporal Constraint =====

    def test_no_constraint_simple_query(self, extractor):
        """Query without temporal indicators should return none."""
        constraint = extractor.extract("Welke musea zijn er in Amsterdam?")
        assert constraint.constraint_type == "none"
        assert constraint.recommended_template is None

    def test_no_constraint_descriptive_query(self, extractor):
        """Descriptive query should return none."""
        constraint = extractor.extract("Vertel me over de collectie van het Rijksmuseum")
        assert constraint.constraint_type == "none"

    # ===== Word Boundary Matching =====

    def test_na_in_nationaal_not_matched(self, extractor):
        """'na' inside 'nationaal' should NOT trigger after constraint."""
        constraint = extractor.extract("Nationaal Archief in Den Haag")
        # 'nationaal' contains 'na' but it's not a word boundary
        assert constraint.constraint_type == "none"

    def test_na_as_word_is_matched(self, extractor):
        """'na' as standalone word SHOULD trigger after constraint."""
        constraint = extractor.extract("Na de renovatie in 1995 werd het museum heropend")
        assert constraint.constraint_type == "after"
        assert "1995" in constraint.date_start

    def test_voor_in_voorwerpen_not_matched(self, extractor):
        """'voor' inside 'voorwerpen' should NOT trigger before."""
        constraint = extractor.extract("De collectie bevat voorwerpen uit de 18e eeuw")
        # No explicit year, so should be none
        assert constraint.constraint_type == "none"

    def test_voor_as_word_is_matched(self, extractor):
        """'voor' as standalone word SHOULD trigger before constraint."""
        constraint = extractor.extract("Archieven van voor 1900")
        assert constraint.constraint_type == "before"
        assert "1900" in constraint.date_end

    # ===== Template Mapping =====

    def test_template_mapping_point_in_time(self, extractor):
        """point_in_time should map to point_in_time_state template."""
        constraint = extractor.extract("Status in 1990")
        template = extractor.get_template_for_constraint(constraint)
        assert template == "point_in_time_state"

    def test_template_mapping_between(self, extractor):
        """between should map to events_in_period template."""
        constraint = extractor.extract("Veranderingen tussen 1990 en 2000")
        template = extractor.get_template_for_constraint(constraint)
        assert template == "events_in_period"

    def test_template_mapping_oldest(self, extractor):
        """oldest should map to find_by_founding template."""
        constraint = extractor.extract("Het oudste museum")
        template = extractor.get_template_for_constraint(constraint)
        assert template == "find_by_founding"

    def test_template_mapping_timeline(self, extractor):
        """timeline should map to institution_timeline template."""
        constraint = extractor.extract("Geschiedenis van het archief")
        template = extractor.get_template_for_constraint(constraint)
        assert template == "institution_timeline"

    def test_template_mapping_none(self, extractor):
        """none constraint should return None template."""
        constraint = extractor.extract("Welke musea zijn er?")
        template = extractor.get_template_for_constraint(constraint)
        assert template is None

    # ===== Confidence Scoring =====

    def test_high_confidence_timeline(self, extractor):
        """Timeline queries should have high confidence."""
        constraint = extractor.extract("Geschiedenis van het Rijksmuseum")
        assert constraint.confidence >= 0.9

    def test_high_confidence_superlative(self, extractor):
        """Superlative queries should have high confidence."""
        constraint = extractor.extract("Het oudste archief")
        assert constraint.confidence >= 0.9

    def test_moderate_confidence_year_only(self, extractor):
        """Year-only queries should have moderate confidence."""
        constraint = extractor.extract("Musea in 1990")
        assert 0.7 <= constraint.confidence <= 0.9

    def test_lower_confidence_no_constraint(self, extractor):
        """No-constraint queries should have lower confidence."""
        constraint = extractor.extract("Algemene informatie over erfgoed")
        assert constraint.confidence <= 0.75


class TestTemporalConstraintDataclass:
    """Tests for TemporalConstraint dataclass."""

    def test_default_values(self):
        """Test default values of TemporalConstraint."""
        constraint = TemporalConstraint(constraint_type="none")
        assert constraint.date_start is None
        assert constraint.date_end is None
        assert constraint.reference_event is None
        assert constraint.confidence == 0.8
        assert constraint.recommended_template is None

    def test_full_constraint(self):
        """Test TemporalConstraint with all fields."""
        constraint = TemporalConstraint(
            constraint_type="between",
            date_start="1990-01-01",
            date_end="2000-12-31",
            reference_event=None,
            confidence=0.95,
            recommended_template="events_in_period",
        )
        assert constraint.constraint_type == "between"
        assert constraint.date_start == "1990-01-01"
        assert constraint.date_end == "2000-12-31"
        assert constraint.confidence == 0.95
        assert constraint.recommended_template == "events_in_period"


class TestTemporalIntentExtractorModule:
    """Tests for the DSPy module (without actual LLM calls)."""

    def test_module_initialization(self):
        """Test module initializes correctly."""
        module = TemporalIntentExtractorModule(confidence_threshold=0.75)
        assert module.confidence_threshold == 0.75
        assert module.fast_extractor is not None

    def test_high_confidence_uses_fast_extraction(self):
        """High confidence queries should use fast extraction, not LLM."""
        module = TemporalIntentExtractorModule(confidence_threshold=0.75)

        # This query has high confidence (timeline keyword)
        constraint = module.forward("Geschiedenis van het Rijksmuseum")

        # Should use fast extraction result
        assert constraint.constraint_type == "timeline"
        assert constraint.confidence >= 0.75


class TestSingletonInstance:
    """Tests for singleton pattern."""

    def test_get_temporal_extractor_singleton(self):
        """get_temporal_extractor should return same instance."""
        ext1 = get_temporal_extractor()
        ext2 = get_temporal_extractor()
        assert ext1 is ext2

    def test_singleton_is_temporal_constraint_extractor(self):
        """Singleton should be TemporalConstraintExtractor instance."""
        ext = get_temporal_extractor()
        assert isinstance(ext, TemporalConstraintExtractor)


class TestIntegration:
    """Integration tests for full temporal extraction flow."""

    def test_dutch_point_in_time_full_flow(self):
        """Test complete flow for Dutch point-in-time query."""
        extractor = get_temporal_extractor()

        constraint = extractor.extract(
            "Wat was de status van het Rijksmuseum in 1990?"
        )

        assert constraint.constraint_type == "point_in_time"
        assert constraint.date_start == "1990-01-01"
        assert constraint.date_end == "1990-12-31"
        assert constraint.recommended_template == "point_in_time_state"

    def test_english_timeline_full_flow(self):
        """Test complete flow for English timeline query."""
        extractor = get_temporal_extractor()

        constraint = extractor.extract(
            "What is the history of the British Museum?"
        )

        assert constraint.constraint_type == "timeline"
        assert constraint.recommended_template == "institution_timeline"

    def test_date_range_full_flow(self):
        """Test complete flow for date range query."""
        extractor = get_temporal_extractor()

        constraint = extractor.extract(
            "Welke fusies vonden plaats tussen 1990 en 2010?"
        )

        # Should detect "fusie" (merger) keyword first
        # But since there are two years, it should be change_event or between
        # Merger keywords take precedence
        assert constraint.constraint_type == "change_event"
        assert constraint.reference_event == "merger"

    def test_superlative_with_location(self):
        """Test superlative query with location."""
        extractor = get_temporal_extractor()

        constraint = extractor.extract(
            "Wat is het oudste archief in Noord-Holland?"
        )

        assert constraint.constraint_type == "oldest"
        assert constraint.recommended_template == "find_by_founding"

    def test_complex_query_multiple_indicators(self):
        """Test query with multiple temporal indicators."""
        extractor = get_temporal_extractor()

        # "geschiedenis" (timeline) and "oudste" (oldest) - timeline wins (checked first)
        constraint = extractor.extract(
            "Vertel me de geschiedenis van de oudste bibliotheek"
        )

        assert constraint.constraint_type == "timeline"

    def test_query_templates_for_sparql(self):
        """Test that all temporal constraints map to valid templates."""
        extractor = get_temporal_extractor()

        test_cases = [
            ("Geschiedenis van het archief", "institution_timeline"),
            ("Het oudste museum", "find_by_founding"),
            ("Het nieuwste archief", "find_by_founding"),
            ("Status in 1990", "point_in_time_state"),
            ("Voor 1950", "point_in_time_state"),  # Year + before
            ("Na 2000", "point_in_time_state"),  # Year + after
            ("Fusies in de regio", "events_in_period"),
            ("Wanneer opgericht", "institution_timeline"),
            ("Wanneer gesloten", "institution_timeline"),
        ]

        for query, expected_template in test_cases:
            constraint = extractor.extract(query)
            # Some queries may not extract years, check if template matches expectation
            if constraint.constraint_type != "none":
                assert constraint.recommended_template == expected_template, (
                    f"Query '{query}' expected template '{expected_template}', "
                    f"got '{constraint.recommended_template}' "
                    f"(constraint_type: {constraint.constraint_type})"
                )


class TestRealWorldQueries:
    """Tests with real-world heritage queries."""

    @pytest.fixture
    def extractor(self):
        return get_temporal_extractor()

    def test_noord_hollands_archief_history(self, extractor):
        """Real query about Noord-Hollands Archief history."""
        constraint = extractor.extract(
            "Wat is de geschiedenis van het Noord-Hollands Archief sinds de fusie in 2001?"
        )
        # "geschiedenis" (timeline) is checked before merger/year
        assert constraint.constraint_type == "timeline"

    def test_museum_founding_date(self, extractor):
        """Real query about museum founding."""
        constraint = extractor.extract(
            "Wanneer is het Rijksmuseum in Amsterdam opgericht?"
        )
        assert constraint.constraint_type == "founding"

    def test_archives_before_ww2(self, extractor):
        """Query about archives before WWII."""
        constraint = extractor.extract(
            "Welke gemeentearchieven bestonden voor 1940?"
        )
        assert constraint.constraint_type == "before"
        assert "1940" in constraint.date_end

    def test_oldest_university_library(self, extractor):
        """Query about oldest university library."""
        constraint = extractor.extract(
            "Wat is de oudste universiteitsbibliotheek van Nederland?"
        )
        assert constraint.constraint_type == "oldest"

    def test_museum_closures_pandemic(self, extractor):
        """Query about closures during pandemic."""
        constraint = extractor.extract(
            "Welke musea zijn gesloten tijdens de pandemie in 2020?"
        )
        # "gesloten" (closure) keyword
        assert constraint.constraint_type == "closure"

    def test_digital_archives_recent(self, extractor):
        """Query about recent digital archives."""
        constraint = extractor.extract(
            "Welke digitale archieven zijn na 2015 gelanceerd?"
        )
        assert constraint.constraint_type == "after"
        assert "2015" in constraint.date_start


# Run with: pytest backend/rag/test_temporal_intent.py -v
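The word-boundary tests above (`na` inside `nationaal`, `voor` inside `voorwerpen`) can be sketched with `\b`-anchored patterns. This is a hypothetical illustration of the technique being tested; `has_after_keyword` and `has_before_keyword` are invented names, not the extractor's real API:

```python
import re

# Hypothetical sketch of word-boundary keyword matching: short Dutch
# prepositions like 'na' (after) and 'voor' (before) must only match
# as whole words, never inside longer words.
AFTER_RE = re.compile(r"\bna\b", re.IGNORECASE)
BEFORE_RE = re.compile(r"\bvoor\b", re.IGNORECASE)

def has_after_keyword(query: str) -> bool:
    return AFTER_RE.search(query) is not None

def has_before_keyword(query: str) -> bool:
    return BEFORE_RE.search(query) is not None

print(has_after_keyword("Nationaal Archief in Den Haag"))    # False
print(has_after_keyword("Na de renovatie in 1995"))          # True
print(has_before_keyword("De collectie bevat voorwerpen"))   # False
print(has_before_keyword("Archieven van voor 1900"))         # True
```

Without the `\b` anchors, a naive substring check would misfire on nearly every Dutch heritage query that mentions "Nationaal" or "voorwerpen".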
@@ -1038,6 +1038,400 @@ templates:
      - question: "Which museums spend less than 1000 on innovation?"
        slots: {budget_category: "innovation", amount: 1000, comparison: "<", institution_type: "M"}
|
||||
# ---------------------------------------------------------------------------
|
||||
# TEMPORAL QUERY TEMPLATES
|
||||
# ---------------------------------------------------------------------------
|
||||
# These templates handle time-based queries about heritage institutions:
|
||||
# - Historical state at point in time
|
||||
# - Institution timelines and history
|
||||
# - Organizational change events (mergers, closures, foundings)
|
||||
# - Finding oldest/newest institutions
|
||||
#
|
||||
# Reference: docs/plan/external_design_patterns/04_temporal_semantic_hypergraph.md
|
||||
|
||||
# Template: Point-in-time institution state
|
||||
point_in_time_state:
|
||||
id: "point_in_time_state"
|
||||
description: "Get institution state at a specific point in time"
|
||||
intent: ["temporal", "entity_lookup"]
|
||||
|
||||
question_patterns:
|
||||
# Dutch
|
||||
- "Wat was de status van {institution_name} in {year}?"
|
||||
- "Hoe zag {institution_name} eruit in {year}?"
|
||||
- "Bestond {institution_name} al in {year}?"
|
||||
- "Wie beheerde {institution_name} in {year}?"
|
||||
- "{institution_name} in {year}"
|
||||
# English
|
||||
- "What was the status of {institution_name} in {year}?"
|
||||
- "How was {institution_name} structured in {year}?"
|
||||
- "Did {institution_name} exist in {year}?"
|
||||
- "State of {institution_name} before {event}?"
|
||||
- "{institution_name} in {year}"
|
||||
|
||||
slots:
|
||||
institution_name:
|
||||
type: string
|
||||
required: false
|
||||
description: "Institution name for lookup (alternative to ghcid)"
|
||||
ghcid:
|
||||
type: string
|
||||
required: false
|
||||
description: "Global Heritage Custodian Identifier"
|
||||
query_date:
|
||||
type: date
|
||||
required: true
|
||||
description: "Point in time to query (ISO format or year)"
|
||||
|
||||
sparql_template: |
|
||||
{{ prefixes }}
|
||||
|
||||
SELECT ?ghcid ?name ?type ?city ?validFrom ?validTo WHERE {
|
||||
?s a crm:E39_Actor ;
|
||||
hc:ghcid ?ghcid ;
|
||||
skos:prefLabel ?name ;
|
||||
hc:institutionType ?type .
|
||||
OPTIONAL { ?s hc:validFrom ?validFrom }
|
||||
OPTIONAL { ?s schema:addressLocality ?city }
|
||||
OPTIONAL { ?s hc:validTo ?validTo }
|
||||
|
||||
{% if ghcid %}
|
||||
FILTER(?ghcid = "{{ ghcid }}")
|
||||
{% elif institution_name %}
|
||||
FILTER(CONTAINS(LCASE(?name), "{{ institution_name | lower }}"))
|
||||
{% endif %}
|
||||
|
||||
# Temporal filter: valid at query_date
|
||||
FILTER(!BOUND(?validFrom) || ?validFrom <= "{{ query_date }}"^^xsd:date)
|
||||
FILTER(!BOUND(?validTo) || ?validTo > "{{ query_date }}"^^xsd:date)
|
||||
}
|
||||
ORDER BY ?validFrom
|
||||
{% if limit %}LIMIT {{ limit }}{% else %}LIMIT 10{% endif %}
|
||||
|
||||
examples:
|
||||
- question: "Wat was de status van Rijksmuseum in 1990?"
|
||||
slots: {institution_name: "Rijksmuseum", query_date: "1990-01-01"}
|
||||
- question: "How was Noord-Hollands Archief structured in 1995?"
|
||||
slots: {institution_name: "Noord-Hollands Archief", query_date: "1995-01-01"}
|
||||
|
||||
# Template: Institution timeline/history
|
||||
institution_timeline:
|
||||
id: "institution_timeline"
|
||||
description: "Get complete history and timeline of changes for an institution"
|
||||
intent: ["temporal", "entity_lookup"]
|
||||
|
||||
question_patterns:
|
||||
# Dutch
|
||||
- "Geschiedenis van {institution_name}"
|
||||
- "Wat is de geschiedenis van {institution_name}?"
|
||||
- "Tijdlijn van {institution_name}"
|
||||
- "Wat is er gebeurd met {institution_name}?"
|
||||
- "Vertel me over de geschiedenis van {institution_name}"
|
||||
- "Hoe is {institution_name} veranderd door de jaren?"
|
||||
# English
|
||||
- "History of {institution_name}"
|
||||
- "Timeline of {institution_name}"
|
||||
- "Timeline of changes for {institution_name}"
|
||||
- "What happened to {institution_name}?"
|
||||
- "Tell me about the history of {institution_name}"
|
||||
- "How has {institution_name} changed over the years?"
|
||||
|
||||
slots:
|
||||
institution_name:
|
||||
type: string
|
||||
required: false
|
||||
ghcid:
|
||||
type: string
|
||||
required: false
|
||||
|
||||
sparql_template: |
|
||||
{{ prefixes }}
|
||||
|
||||
SELECT ?ghcid ?name ?validFrom ?validTo ?changeType ?changeReason ?description WHERE {
|
||||
?entry hc:ghcid ?ghcid ;
|
||||
skos:prefLabel ?name .
|
||||
OPTIONAL { ?entry hc:validFrom ?validFrom }
|
||||
OPTIONAL { ?entry hc:validTo ?validTo }
|
||||
OPTIONAL { ?entry hc:changeType ?changeType }
|
||||
OPTIONAL { ?entry hc:changeReason ?changeReason }
|
||||
OPTIONAL { ?entry schema:description ?description }
|
||||
|
||||
{% if ghcid %}
|
||||
FILTER(?ghcid = "{{ ghcid }}")
|
||||
{% elif institution_name %}
|
||||
FILTER(CONTAINS(LCASE(?name), "{{ institution_name | lower }}"))
|
||||
{% endif %}
|
||||
}
|
||||
ORDER BY ?validFrom
|
||||
|
||||
examples:
|
||||
- question: "Geschiedenis van het Rijksmuseum"
|
||||
slots: {institution_name: "Rijksmuseum"}
|
||||
- question: "What happened to Noord-Hollands Archief?"
|
||||
slots: {institution_name: "Noord-Hollands Archief"}
|
||||
|
||||
# Template: Organizational change events in time period
events_in_period:
  id: "events_in_period"
  description: "Find organizational change events in a time period"
  intent: ["temporal", "statistical"]

  question_patterns:
    # Dutch
    - "Welke fusies waren er tussen {start_year} en {end_year}?"
    - "Welke {event_type_nl} waren er in {year}?"
    - "Welke instellingen zijn gesloten in {year}?"
    - "Welke archieven zijn gefuseerd na {year}?"
    - "Nieuwe musea sinds {year}"
    - "Sluitingen in {year}"
    - "Fusies tussen {start_year} en {end_year}"
    # English
    - "Mergers between {start_year} and {end_year}"
    - "What {event_type_en} happened in {year}?"
    - "What institutions closed in {year}?"
    - "Archives founded before {year}"
    - "New museums since {year}"
    - "Closures in {year}"

  slots:
    start_date:
      type: date
      required: true
      description: "Start of time period (ISO format or year)"
    end_date:
      type: date
      required: false
      description: "End of time period (defaults to now)"
    event_type:
      type: string
      required: false
      valid_values: ["MERGER", "FOUNDING", "CLOSURE", "RELOCATION", "NAME_CHANGE", "SPLIT", "ACQUISITION"]
      description: "Type of organizational change event"
    institution_type:
      type: institution_type
      required: false

  sparql_template: |
    {{ prefixes }}

    SELECT ?event ?eventType ?date ?actor1 ?actor1Name ?actor2 ?actor2Name ?description WHERE {
      ?event a hc:OrganizationalChangeEvent ;
             hc:eventType ?eventType ;
             hc:eventDate ?date .
      OPTIONAL {
        ?event hc:affectedActor ?actor1 .
        ?actor1 skos:prefLabel ?actor1Name .
      }
      OPTIONAL {
        ?event hc:resultingActor ?actor2 .
        ?actor2 skos:prefLabel ?actor2Name .
      }
      OPTIONAL { ?event schema:description ?description }

      FILTER(?date >= "{{ start_date }}"^^xsd:date)
      {% if end_date %}
      FILTER(?date <= "{{ end_date }}"^^xsd:date)
      {% endif %}
      {% if event_type %}
      FILTER(?eventType = "{{ event_type }}")
      {% endif %}
      {% if institution_type %}
      ?actor1 hc:institutionType "{{ institution_type }}" .
      {% endif %}
    }
    ORDER BY ?date
    {% if limit %}LIMIT {{ limit }}{% else %}LIMIT 50{% endif %}

  examples:
    - question: "Welke fusies waren er tussen 2000 en 2010?"
      slots: {start_date: "2000-01-01", end_date: "2010-12-31", event_type: "MERGER"}
    - question: "What museums closed in 2020?"
      slots: {start_date: "2020-01-01", end_date: "2020-12-31", event_type: "CLOSURE", institution_type: "M"}
    - question: "Archives founded before 1900"
      slots: {start_date: "1800-01-01", end_date: "1899-12-31", event_type: "FOUNDING", institution_type: "A"}

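The `{{ }}` placeholders and `{% if %}` guards above are Jinja-style: an optional slot contributes its FILTER clause only when the slot is filled. A plain-Python sketch of that rendering logic (simplified for illustration, not our actual template engine):

```python
from typing import Optional

def render_events_query(start_date: str,
                        end_date: Optional[str] = None,
                        event_type: Optional[str] = None) -> str:
    # Required slots always emit their clause; optional slots only when set,
    # mirroring the {% if %} guards in the template.
    filters = [f'FILTER(?date >= "{start_date}"^^xsd:date)']
    if end_date:
        filters.append(f'FILTER(?date <= "{end_date}"^^xsd:date)')
    if event_type:
        filters.append(f'FILTER(?eventType = "{event_type}")')
    clauses = "\n  ".join(filters)
    return (
        "SELECT ?event ?date WHERE {\n"
        "  ?event a hc:OrganizationalChangeEvent ;\n"
        "         hc:eventType ?eventType ;\n"
        "         hc:eventDate ?date .\n"
        f"  {clauses}\n"
        "}\n"
        "ORDER BY ?date"
    )

query = render_events_query("2000-01-01", event_type="MERGER")
```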
# Template: Find oldest/newest institutions
find_by_founding:
  id: "find_by_founding"
  description: "Find oldest or newest (most recently founded) institutions"
  intent: ["temporal", "exploration"]

  question_patterns:
    # Dutch
    - "Oudste {institution_type_nl} in {location}"
    - "Oudste {institution_type_nl} van Nederland"
    - "Nieuwste {institution_type_nl} in {location}"
    - "Welk {institution_type_nl} is het oudste in {location}?"
    - "Welk {institution_type_nl} is het nieuwste?"
    - "Eerst opgerichte {institution_type_nl}"
    - "Laatst opgerichte {institution_type_nl}"
    - "{institution_type_nl} opgericht na {year}"
    - "{institution_type_nl} opgericht voor {year}"
    # English
    - "Oldest {institution_type_en} in {location}"
    - "Oldest {institution_type_en} in the Netherlands"
    - "Newest {institution_type_en} opened after {year}"
    - "Most recently founded {institution_type_en}"
    - "Which {institution_type_en} is the oldest in {location}?"
    - "First established {institution_type_en}"
    - "{institution_type_en} founded after {year}"
    - "{institution_type_en} founded before {year}"

  slots:
    institution_type:
      type: institution_type
      required: true
    order:
      type: string
      required: false
      valid_values: ["ASC", "DESC"]
      default: "ASC"
      description: "ASC for oldest first, DESC for newest first"
    location:
      type: city
      required: false
      description: "City or region to filter by"
    country:
      type: country
      required: false
      default: "NL"
    founding_after:
      type: date
      required: false
    founding_before:
      type: date
      required: false

  sparql_template: |
    {{ prefixes }}

    SELECT ?institution ?name ?foundingDate ?city ?country WHERE {
      ?institution a hcc:Custodian ;
                   hc:institutionType "{{ institution_type }}" ;
                   schema:name ?name .
      OPTIONAL { ?institution schema:foundingDate ?foundingDate }
      OPTIONAL { ?institution hc:settlementName ?city }
      OPTIONAL { ?institution hc:countryCode ?country }

      # Must have founding date for ordering
      FILTER(BOUND(?foundingDate))

      {% if location %}
      FILTER(CONTAINS(LCASE(?city), "{{ location | lower }}"))
      {% endif %}
      {% if country %}
      FILTER(?country = "{{ country }}")
      {% endif %}
      {% if founding_after %}
      FILTER(?foundingDate >= "{{ founding_after }}"^^xsd:date)
      {% endif %}
      {% if founding_before %}
      FILTER(?foundingDate <= "{{ founding_before }}"^^xsd:date)
      {% endif %}
    }
    ORDER BY {{ order | default('ASC') }}(?foundingDate)
    {% if limit %}LIMIT {{ limit }}{% else %}LIMIT 10{% endif %}

  examples:
    - question: "Oudste musea in Amsterdam"
      slots: {institution_type: "M", location: "Amsterdam", order: "ASC"}
    - question: "Newest libraries in the Netherlands"
      slots: {institution_type: "L", country: "NL", order: "DESC"}
    - question: "Archives founded after 2000"
      slots: {institution_type: "A", founding_after: "2000-01-01", order: "ASC"}

# Template: Institutions by founding decade
institutions_by_founding_decade:
  id: "institutions_by_founding_decade"
  description: "Count or list institutions by founding decade"
  intent: ["temporal", "statistical"]

  question_patterns:
    # Dutch
    - "Hoeveel {institution_type_nl} zijn opgericht per decennium?"
    - "{institution_type_nl} opgericht in de jaren {decade}"
    - "Welke {institution_type_nl} zijn in de 19e eeuw opgericht?"
    - "Verdeling van oprichtingsjaren voor {institution_type_nl}"
    # English
    - "How many {institution_type_en} were founded per decade?"
    - "{institution_type_en} founded in the {decade}s"
    - "Which {institution_type_en} were founded in the 19th century?"
    - "Distribution of founding years for {institution_type_en}"

  slots:
    institution_type:
      type: institution_type
      required: true
    decade:
      type: integer
      required: false
      description: "Decade start year (e.g., 1990 for 1990s)"
    century:
      type: integer
      required: false
      description: "Century (e.g., 19 for 19th century)"
    country:
      type: country
      required: false

  sparql_template: |
    {{ prefixes }}

    SELECT ?decade (COUNT(?institution) AS ?count) WHERE {
      ?institution a hcc:Custodian ;
                   hc:institutionType "{{ institution_type }}" ;
                   schema:foundingDate ?foundingDate .
      {% if country %}
      ?institution hc:countryCode "{{ country }}" .
      {% endif %}

      BIND(YEAR(?foundingDate) AS ?year)
      BIND(FLOOR(?year / 10) * 10 AS ?decade)

      {% if decade %}
      FILTER(?decade = {{ decade }})
      {% endif %}
      {% if century %}
      FILTER(?year >= {{ (century - 1) * 100 }} && ?year < {{ century * 100 }})
      {% endif %}
    }
    GROUP BY ?decade
    ORDER BY ?decade

  # Alternative: list institutions in specific decade
  sparql_template_list: |
    {{ prefixes }}

    SELECT ?institution ?name ?foundingDate WHERE {
      ?institution a hcc:Custodian ;
                   hc:institutionType "{{ institution_type }}" ;
                   schema:name ?name ;
                   schema:foundingDate ?foundingDate .
      {% if country %}
      ?institution hc:countryCode "{{ country }}" .
      {% endif %}

      BIND(YEAR(?foundingDate) AS ?year)

      {% if decade %}
      FILTER(?year >= {{ decade }} && ?year < {{ decade + 10 }})
      {% endif %}
      {% if century %}
      FILTER(?year >= {{ (century - 1) * 100 }} && ?year < {{ century * 100 }})
      {% endif %}
    }
    ORDER BY ?foundingDate
    {% if limit %}LIMIT {{ limit }}{% endif %}

  examples:
    - question: "Hoeveel musea zijn opgericht per decennium?"
      slots: {institution_type: "M"}
    - question: "Archives founded in the 1990s"
      slots: {institution_type: "A", decade: 1990}
    - question: "Libraries founded in the 19th century"
      slots: {institution_type: "L", century: 19}
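The decade and century filters above reduce to simple integer arithmetic; a small sketch of the same binning logic outside SPARQL:

```python
def decade_of(year: int) -> int:
    # Mirrors BIND(FLOOR(?year / 10) * 10 AS ?decade) in the template.
    return (year // 10) * 10

def century_bounds(century: int) -> tuple[int, int]:
    # Half-open [start, end) year range used by the century FILTER:
    # ?year >= (century - 1) * 100 && ?year < century * 100
    return (century - 1) * 100, century * 100

lo, hi = century_bounds(19)  # the 19th century spans 1800-1899
```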

# =============================================================================
# FOLLOW-UP PATTERNS (Conversation Context Resolution)
# =============================================================================
docs/plan/external_design_patterns/01_graphrag_design_patterns.md (new file, 3360 lines)
File diff suppressed because it is too large

docs/plan/external_design_patterns/02_comparison_matrix.md (new file, 91 lines)
@@ -0,0 +1,91 @@
# GraphRAG Pattern Comparison Matrix

**Purpose**: Quick reference comparing our current implementation against external patterns.

## Comparison Matrix

| Capability | Our Current State | Microsoft GraphRAG | ROGRAG | Zep | HyperGraphRAG | LightRAG |
|------------|-------------------|--------------------|--------|-----|---------------|----------|
| **Vector Search** | Qdrant | Azure Cognitive | Faiss | Custom | Sentence-BERT | Faiss |
| **Knowledge Graph** | Oxigraph (RDF) + TypeDB | LanceDB | TuGraph | Neo4j | Custom hypergraph | Neo4j |
| **LLM Orchestration** | DSPy | Azure OpenAI | Qwen | OpenAI | GPT-4o | Various |
| **Community Detection** | Not implemented | Leiden algorithm | None | Dynamic clustering | None | Louvain |
| **Temporal Modeling** | GHCID history | Not built-in | None | Bitemporal (T, T') | None | None |
| **Multi-hop Retrieval** | SPARQL traversal | Graph expansion | Logic form | BFS | Hyperedge walk | Graph paths |
| **Verification Layer** | Not implemented | Claim extraction | Argument checking | None | None | None |
| **N-ary Relations** | CIDOC-CRM events | Binary only | Binary only | Binary only | Hyperedges | Binary only |
| **Cost Optimization** | Semantic caching | Community summaries | Minimal graph | Caching | None | Simple graph |

## Gap Analysis

### What We Have (Strengths)

| Feature | Description | Files |
|---------|-------------|-------|
| Template SPARQL | 65% precision vs 10% LLM-only | `template_sparql.py` |
| Semantic caching | Redis-backed, reduces LLM calls | `semantic_cache.py` |
| Cost tracking | Token/latency monitoring | `cost_tracker.py` |
| Ontology grounding | LinkML schema validation | `schema_loader.py` |
| Temporal tracking | GHCID history with valid_from/to | LinkML schema |
| Multi-hop SPARQL | Graph traversal via SPARQL | `dspy_heritage_rag.py` |
| Entity extraction | Heritage-specific NER | DSPy signatures |

### What We're Missing (Gaps)

| Gap | Priority | Implementation Effort | Benefit |
|-----|----------|-----------------------|---------|
| Retrieval verification | High | Low (DSPy signature) | Reduces hallucination |
| Community summaries | High | Medium (Leiden + indexing) | Enables global questions |
| Dual-level extraction | High | Low (DSPy signature) | Better entity+relation matching |
| Graph context enrichment | Medium | Low (extend retrieval) | Fixes weak embeddings |
| Exploration suggestions | Medium | Medium (session memory) | Improves user experience |
| Hypergraph memory | Low | High (new architecture) | Multi-step reasoning |

## Implementation Priority

```
Priority 1 (This Sprint)
├── Retrieval Verification Layer
│   └── ArgumentVerifier DSPy signature
├── Dual-Level Entity Extraction
│   └── Extend HeritageEntityExtractor
└── Temporal SPARQL Templates
    └── Point-in-time query mode

Priority 2 (Next Sprint)
├── Community Detection Pipeline
│   └── Leiden algorithm on institution graph
├── Community Summary Indexing
│   └── Store in Qdrant with embeddings
└── Global Search Mode
    └── Search summaries for holistic queries

Priority 3 (Backlog)
├── Session Memory Evolution
│   └── HGMEM-style working memory
├── CIDOC-CRM Event Hyperedges
│   └── Rich custody transfer modeling
└── Exploration Suggestions
    └── Suggest related queries
```

## Quick Reference: Pattern Mapping

| External Pattern | Our Implementation Approach |
|------------------|-----------------------------|
| GraphRAG communities | Pre-compute Leiden clusters in Oxigraph, store summaries in Qdrant |
| ROGRAG dual-level | DSPy signature: entities (low) + relations (high) |
| ROGRAG verification | DSPy signature: ArgumentVerifier before generation |
| Zep bitemporal | Already have via GHCID history (extend SPARQL templates) |
| HyperGraphRAG hyperedges | CIDOC-CRM events (crm:E10_Transfer_of_Custody) |
| LightRAG simple graph | We use a more complete graph, but can adopt "star graph sufficiency" thinking |

## Files to Modify

| File | Changes |
|------|---------|
| `dspy_heritage_rag.py` | Add ArgumentVerifier, DualLevelExtractor, global_search mode |
| `template_sparql.py` | Add temporal query templates |
| `session_manager.py` | Add working memory evolution |
| **New**: `community_indexer.py` | Leiden detection, summary generation |
| **New**: `exploration_suggester.py` | Pattern-based query suggestions |
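The "Zep bitemporal" mapping above builds on the GHCID validity intervals already in the schema. The check a point-in-time SPARQL template would encode can be sketched as follows (field names `valid_from`/`valid_to` follow the LinkML slots; an open `valid_to` means the entry is still current):

```python
from datetime import date
from typing import Optional

def valid_at(valid_from: date, valid_to: Optional[date], at: date) -> bool:
    # An entry holds at time `at` if it started on or before `at`
    # and has not yet ended (open-ended, or ending on/after `at`).
    return valid_from <= at and (valid_to is None or valid_to >= at)

history = [
    (date(1900, 1, 1), date(1995, 6, 30)),  # superseded entry
    (date(1995, 7, 1), None),               # current entry
]
snapshot = [e for e in history if valid_at(e[0], e[1], date(2000, 1, 1))]
```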
docs/plan/external_design_patterns/03_implementation_guide.md (new file, 855 lines)
@@ -0,0 +1,855 @@
# Implementation Guide: GraphRAG Patterns for GLAM

**Purpose**: Concrete implementation patterns for integrating external GraphRAG techniques into our TypeDB-Oxigraph-DSPy stack.

---

## Pattern A: Retrieval Verification Layer

### Rationale
From ROGRAG research: argument checking (verifying the retrieved context before generation) outperforms result checking (verifying the answer after generation), at 75% vs 72% accuracy.

### Implementation

Add to `dspy_heritage_rag.py`:
```python
# =============================================================================
# RETRIEVAL VERIFICATION (ROGRAG Pattern)
# =============================================================================

class ArgumentVerifier(dspy.Signature):
    """
    Verify if retrieved context can answer the query before generation.
    Prevents hallucination from insufficient context.

    Based on ROGRAG (arxiv:2503.06474) finding that argument checking
    outperforms result checking (75% vs 72% accuracy).
    """
    __doc__ = """
    You are a verification assistant for heritage institution queries.

    Given a user query and retrieved context, determine if the context
    contains sufficient information to answer the query accurately.

    Be strict:
    - If key entities (institutions, cities, dates) are mentioned in the query
      but not found in the context, return can_answer=False
    - If the query asks for counts but context doesn't provide them, return False
    - If the query asks about relationships but context only has entity lists, return False

    Examples of INSUFFICIENT context:
    - Query: "How many archives are in Haarlem?" / Context: mentions Haarlem archives but no count
    - Query: "When was Rijksmuseum founded?" / Context: describes Rijksmuseum but no founding date

    Examples of SUFFICIENT context:
    - Query: "What archives are in Haarlem?" / Context: lists 3 specific archives in Haarlem
    - Query: "Tell me about the Rijksmuseum" / Context: contains name, location, type, description
    """

    query: str = dspy.InputField(desc="User's original question")
    context: str = dspy.InputField(desc="Retrieved information from KG and vector search")

    can_answer: bool = dspy.OutputField(
        desc="True if context contains sufficient information to answer accurately"
    )
    missing_info: str = dspy.OutputField(
        desc="What specific information is missing (empty if can_answer=True)"
    )
    confidence: float = dspy.OutputField(
        desc="Confidence score 0-1 that context is sufficient"
    )
    suggested_refinement: str = dspy.OutputField(
        desc="Suggested query refinement if context is insufficient (empty if can_answer=True)"
    )


class VerifiedHeritageRAG(dspy.Module):
    """
    RAG pipeline with verification layer before answer generation.
    """

    def __init__(self, max_verification_retries: int = 2):
        super().__init__()
        self.max_retries = max_verification_retries
        self.verifier = dspy.ChainOfThought(ArgumentVerifier)
        self.retriever = HeritageRetriever()  # Existing retriever
        self.generator = dspy.ChainOfThought(HeritageAnswerSignature)  # Existing generator

    def forward(
        self,
        query: str,
        conversation_history: Optional[list[dict]] = None
    ) -> dspy.Prediction:
        """
        Retrieve, verify, then generate - with retry on insufficient context.
        """
        context = ""
        verification_attempts = []

        for attempt in range(self.max_retries + 1):
            # Expand search if this is a retry
            expand_search = attempt > 0

            # Retrieve context
            retrieval_result = self.retriever(
                query=query,
                expand=expand_search,
                previous_context=context
            )
            context = retrieval_result.context

            # Verify sufficiency
            verification = self.verifier(query=query, context=context)
            verification_attempts.append({
                "attempt": attempt,
                "can_answer": verification.can_answer,
                "confidence": verification.confidence,
                "missing": verification.missing_info
            })

            if verification.can_answer and verification.confidence >= 0.7:
                break

            # Log retry
            logger.info(
                f"Verification attempt {attempt + 1}/{self.max_retries + 1}: "
                f"Insufficient context. Missing: {verification.missing_info}"
            )

        # Generate answer (with caveat if low confidence)
        if not verification.can_answer:
            context = f"[NOTE: Limited information available]\n\n{context}"

        answer = self.generator(query=query, context=context)

        return dspy.Prediction(
            answer=answer.response,
            context=context,
            verification=verification_attempts[-1],
            retries=len(verification_attempts) - 1
        )
```

### Integration Point

In `dspy_heritage_rag.py`, modify `HeritageRAGModule.forward()` to use verification:

```python
# Before (current):
# answer = self.generate_answer(query, context)

# After (with verification):
verification = self.verifier(query=query, context=context)
if not verification.can_answer and verification.confidence < 0.7:
    # Expand search and retry
    context = self._expand_retrieval(query, context, verification.missing_info)
    verification = self.verifier(query=query, context=context)

answer = self.generate_answer(query, context)
```

---

## Pattern B: Dual-Level Entity Extraction

### Rationale
From ROGRAG: separating low-level (entities) from high-level (relations) enables:
- Low-level: fuzzy string matching for names, places, IDs
- High-level: semantic similarity for concepts, relationships

### Implementation

Add to `dspy_heritage_rag.py`:
```python
# =============================================================================
# DUAL-LEVEL EXTRACTION (ROGRAG Pattern)
# =============================================================================

class DualLevelEntityExtractor(dspy.Signature):
    """
    Extract both entity-level and relation-level keywords from heritage queries.

    Based on ROGRAG (arxiv:2503.06474) dual-level retrieval method.
    Low-level: Named entities for fuzzy graph matching
    High-level: Relation descriptions for semantic vector matching
    """
    __doc__ = """
    You are a heritage query analyzer. Extract two types of information:

    LOW-LEVEL (Entities):
    - Institution names: Rijksmuseum, Nationaal Archief, etc.
    - Place names: Amsterdam, Limburg, Noord-Holland
    - Person names: Staff, directors, curators
    - Identifiers: GHCID, ISIL codes (NL-XXXX)
    - Dates: Years, date ranges

    HIGH-LEVEL (Relations/Concepts):
    - Collection types: "digitized collections", "medieval manuscripts"
    - Institution attributes: "oldest", "largest", "founded before 1900"
    - Relationship phrases: "collaborated with", "merged into", "part of"
    - Activities: "preserves", "exhibits", "researches"

    Examples:

    Query: "Which archives in Haarlem have digitized medieval manuscripts?"
    Entities: ["Haarlem", "archives"]
    Relations: ["digitized collections", "medieval manuscripts"]
    Strategy: entity_first (narrow by location, then filter by collection type)

    Query: "What museums were founded before 1850 in the Netherlands?"
    Entities: ["Netherlands", "museums", "1850"]
    Relations: ["founded before", "historical institution"]
    Strategy: relation_first (semantic search for founding dates, then verify entities)

    Query: "Tell me about the Rijksmuseum"
    Entities: ["Rijksmuseum"]
    Relations: ["general information", "institution overview"]
    Strategy: entity_first (direct lookup)
    """

    query: str = dspy.InputField(desc="User's heritage question")

    entities: list[str] = dspy.OutputField(
        desc="Low-level: Named entities (institutions, places, people, dates, IDs)"
    )
    relations: list[str] = dspy.OutputField(
        desc="High-level: Relation/concept phrases for semantic matching"
    )
    search_strategy: Literal["entity_first", "relation_first", "parallel"] = dspy.OutputField(
        desc="Recommended search strategy based on query structure"
    )
    entity_types: list[str] = dspy.OutputField(
        desc="Types of entities found: institution, place, person, date, identifier"
    )


class DualLevelRetriever(dspy.Module):
    """
    Combines entity-level graph search with relation-level semantic search.
    """

    def __init__(self, qdrant_client, oxigraph_endpoint: str):
        super().__init__()
        self.extractor = dspy.ChainOfThought(DualLevelEntityExtractor)
        self.qdrant = qdrant_client
        self.oxigraph = oxigraph_endpoint

    def match_entities_in_graph(self, entities: list[str]) -> set[str]:
        """
        Fuzzy match entities against Oxigraph nodes.
        Returns matching GHCIDs.
        """
        ghcids = set()

        for entity in entities:
            # Use FILTER with CONTAINS for fuzzy matching
            sparql = f"""
            PREFIX hc: <https://nde.nl/ontology/hc/>
            PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
            PREFIX schema: <http://schema.org/>

            SELECT DISTINCT ?ghcid WHERE {{
                ?s hc:ghcid ?ghcid .
                {{
                    ?s skos:prefLabel ?name .
                    FILTER(CONTAINS(LCASE(?name), LCASE("{entity}")))
                }} UNION {{
                    ?s schema:addressLocality ?city .
                    FILTER(CONTAINS(LCASE(?city), LCASE("{entity}")))
                }} UNION {{
                    ?s hc:ghcid ?ghcid .
                    FILTER(CONTAINS(?ghcid, "{entity.upper()}"))
                }}
            }}
            LIMIT 50
            """
            results = self._execute_sparql(sparql)
            ghcids.update(r["ghcid"] for r in results)

        return ghcids

    def match_relations_semantically(
        self,
        relations: list[str],
        ghcid_filter: Optional[set[str]] = None
    ) -> list[dict]:
        """
        Semantic search for relation descriptions in vector store.
        Optionally filter by GHCID set from entity matching.
        """
        # Combine relation phrases into search query
        relation_query = " ".join(relations)

        # Build filter
        qdrant_filter = None
        if ghcid_filter:
            qdrant_filter = models.Filter(
                must=[
                    models.FieldCondition(
                        key="ghcid",
                        match=models.MatchAny(any=list(ghcid_filter))
                    )
                ]
            )

        # Vector search
        results = self.qdrant.search(
            collection_name="heritage_chunks",
            query_vector=self._embed(relation_query),
            query_filter=qdrant_filter,
            limit=20
        )

        return [
            {
                "ghcid": r.payload.get("ghcid"),
                "text": r.payload.get("text"),
                "score": r.score
            }
            for r in results
        ]

    def forward(self, query: str) -> dspy.Prediction:
        """
        Dual-level retrieval: entities narrow search, relations refine results.
        """
        # Extract dual levels
        extraction = self.extractor(query=query)
        ghcid_set: set[str] = set()

        if extraction.search_strategy == "entity_first":
            # Step 1: Entity matching in graph
            ghcid_set = self.match_entities_in_graph(extraction.entities)

            # Step 2: Relation matching with GHCID filter
            results = self.match_relations_semantically(
                extraction.relations,
                ghcid_filter=ghcid_set if ghcid_set else None
            )

        elif extraction.search_strategy == "relation_first":
            # Step 1: Broad relation matching
            results = self.match_relations_semantically(extraction.relations)

            # Step 2: Filter by entity matching
            result_ghcids = {r["ghcid"] for r in results if r.get("ghcid")}
            entity_ghcids = self.match_entities_in_graph(extraction.entities)

            # Prioritize intersection
            intersection = result_ghcids & entity_ghcids
            if intersection:
                results = [r for r in results if r.get("ghcid") in intersection]

        else:  # parallel
            # Run both in parallel, merge results
            ghcid_set = self.match_entities_in_graph(extraction.entities)
            semantic_results = self.match_relations_semantically(extraction.relations)

            # Score boost for results matching both
            for r in semantic_results:
                if r.get("ghcid") in ghcid_set:
                    r["score"] *= 1.5  # Boost intersection

            results = sorted(semantic_results, key=lambda x: -x["score"])

        return dspy.Prediction(
            results=results,
            entities=extraction.entities,
            relations=extraction.relations,
            strategy=extraction.search_strategy,
            ghcid_set=list(ghcid_set)
        )
```

---

## Pattern C: Community Detection and Summaries

### Rationale
From Microsoft GraphRAG: community summaries enable answering holistic questions like "What are the main archival themes in the Netherlands?"

### Implementation

Create new file `backend/rag/community_indexer.py`:
```python
"""
Community Detection and Summary Indexing for Global Search

Based on Microsoft GraphRAG (arxiv:2404.16130) community hierarchy pattern.
Uses Leiden algorithm for community detection on institution graph.
"""

import json
import logging
from dataclasses import dataclass
from typing import Optional

import dspy
import igraph as ig
import leidenalg
from qdrant_client import QdrantClient, models

logger = logging.getLogger(__name__)


@dataclass
class Community:
    """A community of related heritage institutions."""
    community_id: str
    ghcids: list[str]
    summary: str
    institution_count: int
    dominant_type: str    # Most common institution type
    dominant_region: str  # Most common region
    themes: list[str]     # Extracted themes


class CommunitySummarizer(dspy.Signature):
    """Generate a summary for a community of heritage institutions."""
    __doc__ = """
    You are a heritage domain expert. Given a list of institutions in a community,
    generate a concise summary describing:
    1. What types of institutions are in this community
    2. Geographic concentration (if any)
    3. Common themes or specializations
    4. Notable relationships between institutions

    Keep the summary to 2-3 sentences. Focus on what makes this community distinctive.
    """

    institutions: str = dspy.InputField(desc="JSON list of institution metadata")

    summary: str = dspy.OutputField(desc="2-3 sentence community summary")
    themes: list[str] = dspy.OutputField(desc="Key themes (3-5 keywords)")
    notable_features: str = dspy.OutputField(desc="What makes this community distinctive")


class CommunityIndexer:
    """
    Builds and indexes institution communities for global search.

    Usage:
        indexer = CommunityIndexer(oxigraph_url, qdrant_client)
        indexer.build_communities()
        indexer.index_summaries()
    """

    def __init__(
        self,
        oxigraph_endpoint: str,
        qdrant_client: QdrantClient,
        collection_name: str = "heritage_communities"
    ):
        self.oxigraph = oxigraph_endpoint
        self.qdrant = qdrant_client
        self.collection_name = collection_name
        self.summarizer = dspy.ChainOfThought(CommunitySummarizer)

    def build_institution_graph(self) -> ig.Graph:
        """
        Query Oxigraph for institution relationships.
        Build igraph for community detection.
        """
        # Get all institutions with their properties
        sparql = """
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX schema: <http://schema.org/>

        SELECT ?ghcid ?name ?type ?city ?region WHERE {
            ?s hc:ghcid ?ghcid ;
               skos:prefLabel ?name ;
               hc:institutionType ?type .
            OPTIONAL { ?s schema:addressLocality ?city }
            OPTIONAL { ?s hc:regionCode ?region }
        }
        """
        institutions = self._execute_sparql(sparql)

        # Build graph: nodes are institutions, edges connect those sharing:
        # - Same city
        # - Same region
        # - Same type
        # - Part-of relationships

        g = ig.Graph()
        ghcid_to_idx = {}

        # Add nodes
        for inst in institutions:
            idx = g.add_vertex(
                ghcid=inst["ghcid"],
                name=inst.get("name", ""),
                type=inst.get("type", ""),
                city=inst.get("city", ""),
                region=inst.get("region", "")
            )
            ghcid_to_idx[inst["ghcid"]] = idx.index

        # Add edges based on shared properties
        for i, inst1 in enumerate(institutions):
            for j, inst2 in enumerate(institutions[i+1:], i+1):
                weight = 0

                # Same city: strong connection
                if inst1.get("city") and inst1["city"] == inst2.get("city"):
                    weight += 2

                # Same region: medium connection
                if inst1.get("region") and inst1["region"] == inst2.get("region"):
                    weight += 1

                # Same type: weak connection
                if inst1.get("type") and inst1["type"] == inst2.get("type"):
                    weight += 0.5

                if weight > 0:
                    g.add_edge(
                        ghcid_to_idx[inst1["ghcid"]],
                        ghcid_to_idx[inst2["ghcid"]],
                        weight=weight
                    )

        return g

    def detect_communities(self, graph: ig.Graph) -> dict[str, list[str]]:
        """
        Apply Leiden algorithm for community detection.
        Returns mapping: community_id -> [ghcid_list]
        """
        # Leiden with modularity optimization
        partition = leidenalg.find_partition(
            graph,
            leidenalg.ModularityVertexPartition,
            weights="weight"
        )

        communities = {}
        for comm_idx, members in enumerate(partition):
            comm_id = f"comm_{comm_idx:04d}"
            ghcids = [graph.vs[idx]["ghcid"] for idx in members]
            communities[comm_id] = ghcids

        logger.info(f"Detected {len(communities)} communities")
        return communities

    def generate_community_summary(
        self,
        community_id: str,
        ghcids: list[str]
    ) -> Community:
        """
        Generate LLM summary for a community.
        """
        # Fetch metadata for all institutions
        institutions = self._fetch_institutions(ghcids)

        # Generate summary
        result = self.summarizer(
            institutions=json.dumps(institutions, indent=2)
        )

        # Determine dominant type and region
        types = [i.get("type", "") for i in institutions]
        regions = [i.get("region", "") for i in institutions]

        dominant_type = max(set(types), key=types.count) if types else ""
        dominant_region = max(set(regions), key=regions.count) if regions else ""
|
||||
|
||||
return Community(
|
||||
community_id=community_id,
|
||||
ghcids=ghcids,
|
||||
summary=result.summary,
|
||||
institution_count=len(ghcids),
|
||||
dominant_type=dominant_type,
|
||||
dominant_region=dominant_region,
|
||||
themes=result.themes
|
||||
)
|
||||
|
||||
def index_summaries(self, communities: list[Community]) -> None:
|
||||
"""
|
||||
Store community summaries in Qdrant for global search.
|
||||
"""
|
||||
# Create collection if not exists
|
||||
self.qdrant.recreate_collection(
|
||||
collection_name=self.collection_name,
|
||||
vectors_config=models.VectorParams(
|
||||
size=384, # MiniLM embedding size
|
||||
distance=models.Distance.COSINE
|
||||
)
|
||||
)
|
||||
|
||||
# Index each community
|
||||
points = []
|
||||
for comm in communities:
|
||||
embedding = self._embed(comm.summary)
|
||||
|
||||
points.append(models.PointStruct(
|
||||
id=hash(comm.community_id) % (2**63),
|
||||
vector=embedding,
|
||||
payload={
|
||||
"community_id": comm.community_id,
|
||||
"summary": comm.summary,
|
||||
"ghcids": comm.ghcids,
|
||||
"institution_count": comm.institution_count,
|
||||
"dominant_type": comm.dominant_type,
|
||||
"dominant_region": comm.dominant_region,
|
||||
"themes": comm.themes
|
||||
}
|
||||
))
|
||||
|
||||
self.qdrant.upsert(
|
||||
collection_name=self.collection_name,
|
||||
points=points
|
||||
)
|
||||
|
||||
logger.info(f"Indexed {len(points)} community summaries")
|
||||
|
||||
def global_search(self, query: str, limit: int = 5) -> list[dict]:
|
||||
"""
|
||||
Search community summaries for holistic questions.
|
||||
"""
|
||||
embedding = self._embed(query)
|
||||
|
||||
results = self.qdrant.search(
|
||||
collection_name=self.collection_name,
|
||||
query_vector=embedding,
|
||||
limit=limit
|
||||
)
|
||||
|
||||
return [
|
||||
{
|
||||
"community_id": r.payload["community_id"],
|
||||
"summary": r.payload["summary"],
|
||||
"themes": r.payload["themes"],
|
||||
"institution_count": r.payload["institution_count"],
|
||||
"score": r.score
|
||||
}
|
||||
for r in results
|
||||
]
|
||||
|
||||
def build_and_index(self) -> int:
|
||||
"""
|
||||
Full pipeline: build graph, detect communities, generate summaries, index.
|
||||
Returns number of communities indexed.
|
||||
"""
|
||||
logger.info("Building institution graph...")
|
||||
graph = self.build_institution_graph()
|
||||
|
||||
logger.info("Detecting communities...")
|
||||
community_map = self.detect_communities(graph)
|
||||
|
||||
logger.info("Generating community summaries...")
|
||||
communities = []
|
||||
for comm_id, ghcids in community_map.items():
|
||||
if len(ghcids) >= 3: # Only summarize communities with 3+ members
|
||||
comm = self.generate_community_summary(comm_id, ghcids)
|
||||
communities.append(comm)
|
||||
|
||||
logger.info(f"Indexing {len(communities)} community summaries...")
|
||||
self.index_summaries(communities)
|
||||
|
||||
return len(communities)
|
||||
```
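The shared-property weighting used in `build_institution_graph` can be checked in isolation. A minimal stdlib-only sketch — `shared_property_weight` is a hypothetical standalone helper, not part of the class, but it mirrors the weights above:

```python
def shared_property_weight(inst1: dict, inst2: dict) -> float:
    """Score how strongly two institutions should be connected,
    using the same heuristic as CommunityIndexer.build_institution_graph."""
    weight = 0.0
    # Same city: strong connection (+2)
    if inst1.get("city") and inst1["city"] == inst2.get("city"):
        weight += 2
    # Same region: medium connection (+1)
    if inst1.get("region") and inst1["region"] == inst2.get("region"):
        weight += 1
    # Same type: weak connection (+0.5)
    if inst1.get("type") and inst1["type"] == inst2.get("type"):
        weight += 0.5
    return weight

a = {"city": "Haarlem", "region": "NH", "type": "archive"}
b = {"city": "Haarlem", "region": "NH", "type": "museum"}
print(shared_property_weight(a, b))  # 3.0 (city + region match, type differs)
```

Missing properties contribute nothing, so institutions with sparse metadata simply end up weakly connected rather than spuriously linked.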

---

## Pattern D: Temporal Query Templates

### Rationale

From Zep: Bitemporal modeling enables point-in-time queries and provenance tracking.

### Implementation

Add to `template_sparql.py`:

```python
# =============================================================================
# TEMPORAL QUERY TEMPLATES (Zep Pattern)
# =============================================================================

TEMPORAL_QUERY_TEMPLATES = {
    "point_in_time_state": TemplateDefinition(
        id="temporal_pit",
        name="Point-in-Time Institution State",
        description="Get institution state at a specific point in time",
        intent_patterns=["what was", "in [year]", "before", "after", "at that time"],
        sparql_template="""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX schema: <http://schema.org/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        SELECT ?ghcid ?name ?type ?city ?validFrom ?validTo WHERE {
            ?s hc:ghcid ?ghcid ;
               skos:prefLabel ?name ;
               hc:institutionType ?type ;
               hc:validFrom ?validFrom .
            OPTIONAL { ?s schema:addressLocality ?city }
            OPTIONAL { ?s hc:validTo ?validTo }

            # Temporal filter: valid at query date
            FILTER(?validFrom <= "{{ query_date }}"^^xsd:date)
            FILTER(!BOUND(?validTo) || ?validTo > "{{ query_date }}"^^xsd:date)

            {% if ghcid_filter %}
            FILTER(STRSTARTS(?ghcid, "{{ ghcid_filter }}"))
            {% endif %}
        }
        ORDER BY ?ghcid
        """,
        slots=[
            SlotDefinition(type=SlotType.STRING, name="query_date", required=True),
            SlotDefinition(type=SlotType.STRING, name="ghcid_filter", required=False)
        ]
    ),

    "institution_history": TemplateDefinition(
        id="temporal_history",
        name="Institution Change History",
        description="Get full history of changes for an institution",
        intent_patterns=["history of", "changes to", "evolution of", "timeline"],
        sparql_template="""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

        SELECT ?ghcid ?name ?validFrom ?validTo ?changeType ?description WHERE {
            ?entry hc:ghcid "{{ ghcid }}" ;
                   skos:prefLabel ?name ;
                   hc:validFrom ?validFrom .
            OPTIONAL { ?entry hc:validTo ?validTo }
            OPTIONAL { ?entry hc:changeType ?changeType }
            OPTIONAL { ?entry hc:changeDescription ?description }
        }
        ORDER BY ?validFrom
        """,
        slots=[
            SlotDefinition(type=SlotType.STRING, name="ghcid", required=True)
        ]
    ),

    "institutions_founded_before": TemplateDefinition(
        id="temporal_founded_before",
        name="Institutions Founded Before Date",
        description="Find institutions founded before a specific date",
        intent_patterns=["founded before", "established before", "older than", "before [year]"],
        sparql_template="""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX schema: <http://schema.org/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        SELECT ?ghcid ?name ?type ?city ?foundingDate WHERE {
            ?s hc:ghcid ?ghcid ;
               skos:prefLabel ?name ;
               hc:institutionType ?type ;
               schema:foundingDate ?foundingDate .
            OPTIONAL { ?s schema:addressLocality ?city }

            FILTER(?foundingDate < "{{ cutoff_date }}"^^xsd:date)

            {% if institution_type %}
            FILTER(?type = "{{ institution_type }}")
            {% endif %}
        }
        ORDER BY ?foundingDate
        LIMIT {{ limit | default(50) }}
        """,
        slots=[
            SlotDefinition(type=SlotType.STRING, name="cutoff_date", required=True),
            SlotDefinition(type=SlotType.INSTITUTION_TYPE, name="institution_type", required=False),
            SlotDefinition(type=SlotType.INTEGER, name="limit", required=False, default="50")
        ]
    ),

    "merger_history": TemplateDefinition(
        id="temporal_mergers",
        name="Institution Merger History",
        description="Find institutions that merged or were absorbed",
        intent_patterns=["merged", "merger", "combined", "absorbed", "joined"],
        sparql_template="""
        PREFIX hc: <https://nde.nl/ontology/hc/>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>

        SELECT ?event ?eventDate ?description
               ?sourceGhcid ?sourceName
               ?targetGhcid ?targetName WHERE {
            ?event a hc:MergerEvent ;
                   hc:eventDate ?eventDate ;
                   hc:description ?description .

            OPTIONAL {
                ?event hc:sourceInstitution ?source .
                ?source hc:ghcid ?sourceGhcid ;
                        skos:prefLabel ?sourceName .
            }

            OPTIONAL {
                ?event hc:resultingInstitution ?target .
                ?target hc:ghcid ?targetGhcid ;
                        skos:prefLabel ?targetName .
            }

            {% if region_filter %}
            FILTER(STRSTARTS(?sourceGhcid, "{{ region_filter }}") ||
                   STRSTARTS(?targetGhcid, "{{ region_filter }}"))
            {% endif %}
        }
        ORDER BY ?eventDate
        """,
        slots=[
            SlotDefinition(type=SlotType.STRING, name="region_filter", required=False)
        ]
    )
}
```
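The templates above use Jinja-style `{{ slot }}` placeholders and `{% if %}` guards for optional filters. How slot filling behaves can be sketched with a deliberately simplified stdlib-only renderer — `render_slots` is a hypothetical stand-in for the project's real template engine and only handles flat slots and single-variable `if` blocks, not filters like `default(50)`:

```python
import re


def render_slots(template: str, values: dict) -> str:
    """Toy renderer: drops {% if var %}...{% endif %} blocks whose
    variable is missing/empty, then fills {{ name }} placeholders."""
    def if_block(m: re.Match) -> str:
        var, body = m.group(1), m.group(2)
        return body if values.get(var) else ""

    template = re.sub(r"{%\s*if\s+(\w+)\s*%}(.*?){%\s*endif\s*%}",
                      if_block, template, flags=re.S)
    return re.sub(r"{{\s*(\w+)\s*}}",
                  lambda m: str(values.get(m.group(1), "")), template)


sparql = ('FILTER(?validFrom <= "{{ query_date }}"^^xsd:date)\n'
          '{% if ghcid_filter %}'
          'FILTER(STRSTARTS(?ghcid, "{{ ghcid_filter }}"))'
          '{% endif %}')
print(render_slots(sparql, {"query_date": "1950-01-01"}))
```

With only `query_date` supplied, the optional `STRSTARTS` filter disappears; supplying `ghcid_filter` keeps it and fills the prefix.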

---

## Integration Checklist

### Immediate Actions

- [ ] Add `ArgumentVerifier` signature to `dspy_heritage_rag.py`
- [ ] Add `DualLevelEntityExtractor` signature
- [ ] Integrate verification into retrieval pipeline
- [ ] Add temporal query templates to `template_sparql.py`

### Short-Term Actions

- [ ] Create `backend/rag/community_indexer.py`
- [ ] Add Leiden algorithm dependency: `pip install leidenalg python-igraph`
- [ ] Create Qdrant collection for community summaries
- [ ] Add global search mode to RAG pipeline

### Testing

```bash
# Test verification layer
python -c "
from backend.rag.dspy_heritage_rag import ArgumentVerifier
import dspy
dspy.configure(lm=...)
verifier = dspy.ChainOfThought(ArgumentVerifier)
result = verifier(
    query='How many archives are in Haarlem?',
    context='Haarlem has several heritage institutions including archives.'
)
print(f'Can answer: {result.can_answer}')
print(f'Missing: {result.missing_info}')
"

# Test dual-level extraction
python -c "
from backend.rag.dspy_heritage_rag import DualLevelEntityExtractor
import dspy
dspy.configure(lm=...)
extractor = dspy.ChainOfThought(DualLevelEntityExtractor)
result = extractor(query='Which archives in Haarlem have digitized medieval manuscripts?')
print(f'Entities: {result.entities}')
print(f'Relations: {result.relations}')
print(f'Strategy: {result.search_strategy}')
"
```
File diff suppressed because it is too large

@@ -164,10 +164,10 @@ imports:
 - modules/slots/managing_unit
 - modules/slots/managed_collections
 
-# Enums (12 files - added CustodianPrimaryTypeEnum + EncompassingBodyTypeEnum)
+# Enums (11 files - CustodianPrimaryTypeEnum ARCHIVED per Rule 9: Enum-to-Class Promotion)
+# See: schemas/20251121/linkml/archive/enums/CustodianPrimaryTypeEnum.yaml.archived_20260105
 - modules/enums/AgentTypeEnum
 - modules/enums/AppellationTypeEnum
-- modules/enums/CustodianPrimaryTypeEnum
 - modules/enums/EncompassingBodyTypeEnum
 - modules/enums/EntityTypeEnum
 - modules/enums/LegalStatusEnum

@@ -423,6 +423,102 @@ imports:
 - modules/classes/WebArchive
 - modules/classes/WomensArchives
 
+# Archive RecordSetTypes - concrete subclasses of rico:RecordSetType (v0.9.12)
+# These define the types of record sets held by each archive type
+# Updated: all 92 archive types now have RecordSetTypes files
+- modules/classes/AcademicArchiveRecordSetTypes
+- modules/classes/AdvertisingRadioArchiveRecordSetTypes
+- modules/classes/AnimalSoundArchiveRecordSetTypes
+- modules/classes/ArchitecturalArchiveRecordSetTypes
+- modules/classes/ArchiveOfInternationalOrganizationRecordSetTypes
+- modules/classes/ArchivesForBuildingRecordsRecordSetTypes
+- modules/classes/ArchivesRegionalesRecordSetTypes
+- modules/classes/ArtArchiveRecordSetTypes
+- modules/classes/AudiovisualArchiveRecordSetTypes
+- modules/classes/BankArchiveRecordSetTypes
+- modules/classes/CantonalArchiveRecordSetTypes
+- modules/classes/CathedralArchiveRecordSetTypes
+- modules/classes/ChurchArchiveRecordSetTypes
+- modules/classes/ChurchArchiveSwedenRecordSetTypes
+- modules/classes/ClimateArchiveRecordSetTypes
+- modules/classes/CollectingArchivesRecordSetTypes
+- modules/classes/ComarcalArchiveRecordSetTypes
+- modules/classes/CommunityArchiveRecordSetTypes
+- modules/classes/CompanyArchiveRecordSetTypes
+- modules/classes/CurrentArchiveRecordSetTypes
+- modules/classes/CustodianArchiveRecordSetTypes
+- modules/classes/DarkArchiveRecordSetTypes
+- modules/classes/DepartmentalArchivesRecordSetTypes
+- modules/classes/DepositArchiveRecordSetTypes
+- modules/classes/DigitalArchiveRecordSetTypes
+- modules/classes/DimArchivesRecordSetTypes
+- modules/classes/DiocesanArchiveRecordSetTypes
+- modules/classes/DistrictArchiveGermanyRecordSetTypes
+- modules/classes/DistritalArchiveRecordSetTypes
+- modules/classes/EconomicArchiveRecordSetTypes
+- modules/classes/FilmArchiveRecordSetTypes
+- modules/classes/FoundationArchiveRecordSetTypes
+- modules/classes/FreeArchiveRecordSetTypes
+- modules/classes/FrenchPrivateArchivesRecordSetTypes
+- modules/classes/GovernmentArchiveRecordSetTypes
+- modules/classes/HistoricalArchiveRecordSetTypes
+- modules/classes/HospitalArchiveRecordSetTypes
+- modules/classes/HouseArchiveRecordSetTypes
+- modules/classes/IconographicArchivesRecordSetTypes
+- modules/classes/InstitutionalArchiveRecordSetTypes
+- modules/classes/JointArchivesRecordSetTypes
+- modules/classes/LGBTArchiveRecordSetTypes
+- modules/classes/LightArchivesRecordSetTypes
+- modules/classes/LiteraryArchiveRecordSetTypes
+- modules/classes/LocalGovernmentArchiveRecordSetTypes
+- modules/classes/LocalHistoryArchiveRecordSetTypes
+- modules/classes/MailingListArchiveRecordSetTypes
+- modules/classes/MediaArchiveRecordSetTypes
+- modules/classes/MilitaryArchiveRecordSetTypes
+- modules/classes/MonasteryArchiveRecordSetTypes
+- modules/classes/MunicipalArchiveRecordSetTypes
+- modules/classes/MuseumArchiveRecordSetTypes
+- modules/classes/MusicArchiveRecordSetTypes
+- modules/classes/NationalArchivesRecordSetTypes
+- modules/classes/NewspaperClippingsArchiveRecordSetTypes
+- modules/classes/NobilityArchiveRecordSetTypes
+- modules/classes/NotarialArchiveRecordSetTypes
+- modules/classes/OnlineNewsArchiveRecordSetTypes
+- modules/classes/ParishArchiveRecordSetTypes
+- modules/classes/ParliamentaryArchivesRecordSetTypes
+- modules/classes/PartyArchiveRecordSetTypes
+- modules/classes/PerformingArtsArchiveRecordSetTypes
+- modules/classes/PhotoArchiveRecordSetTypes
+- modules/classes/PoliticalArchiveRecordSetTypes
+- modules/classes/PostcustodialArchiveRecordSetTypes
+- modules/classes/PressArchiveRecordSetTypes
+- modules/classes/ProvincialArchiveRecordSetTypes
+- modules/classes/ProvincialHistoricalArchiveRecordSetTypes
+- modules/classes/PublicArchiveRecordSetTypes
+- modules/classes/PublicArchivesInFranceRecordSetTypes
+- modules/classes/RadioArchiveRecordSetTypes
+- modules/classes/RegionalArchiveRecordSetTypes
+- modules/classes/RegionalArchivesInIcelandRecordSetTypes
+- modules/classes/RegionalEconomicArchiveRecordSetTypes
+- modules/classes/RegionalStateArchivesRecordSetTypes
+- modules/classes/ReligiousArchiveRecordSetTypes
+- modules/classes/SchoolArchiveRecordSetTypes
+- modules/classes/ScientificArchiveRecordSetTypes
+- modules/classes/SectorOfArchivesInSwedenRecordSetTypes
+- modules/classes/SecurityArchivesRecordSetTypes
+- modules/classes/SoundArchiveRecordSetTypes
+- modules/classes/SpecializedArchiveRecordSetTypes
+- modules/classes/SpecializedArchivesCzechiaRecordSetTypes
+- modules/classes/StateArchivesRecordSetTypes
+- modules/classes/StateArchivesSectionRecordSetTypes
+- modules/classes/StateDistrictArchiveRecordSetTypes
+- modules/classes/StateRegionalArchiveCzechiaRecordSetTypes
+- modules/classes/TelevisionArchiveRecordSetTypes
+- modules/classes/TradeUnionArchiveRecordSetTypes
+- modules/classes/UniversityArchiveRecordSetTypes
+- modules/classes/WebArchiveRecordSetTypes
+- modules/classes/WomensArchivesRecordSetTypes
+
 # New slots for registration info
 - modules/slots/country
 - modules/slots/description

@@ -468,6 +564,9 @@ imports:
 - modules/slots/is_legal_status_of
 - modules/slots/has_derived_observation
 - modules/slots/offers_donation_schemes
 
+# Rico:isOrWasHolderOf relationship slot (links custodians to record set types)
+- modules/slots/holds_record_set_types
+
 comments:
 - "HYPER-MODULAR STRUCTURE: Direct imports of all component files"

@@ -476,7 +575,7 @@ comments:
 - "Namespace structure: https://nde.nl/ontology/hc/{class|enum|slot}/[Name]"
 - "Total components: 44 classes + 12 enums + 102 slots = 158 definition files"
 - "Legal entity classes (5): LegalEntityType, LegalForm, LegalName, RegistrationInfo (4 classes within), total 8 classes"
-- "Type classification: CustodianType (base) + specialized subclasses (ArchiveOrganizationType, MuseumType, LibraryType, GalleryType, ResearchOrganizationType, OfficialInstitutionType, BioCustodianType, EducationProviderType) + CustodianPrimaryTypeEnum (19 types)"
+- "Type classification: CustodianType (base) + 19 specialized subclasses (ArchiveOrganizationType, MuseumType, LibraryType, GalleryType, ResearchOrganizationType, OfficialInstitutionType, BioCustodianType, EducationProviderType, HeritageSocietyType, FeatureCustodianType, IntangibleHeritageGroupType, PersonalCollectionType, HolySacredSiteType, DigitalPlatformType, NonProfitType, TasteScentHeritageType, CommercialOrganizationType, MixedCustodianType, UnspecifiedType)"
 - "Specialized types: ArchiveOrganizationType (144 Wikidata), MuseumType (187), LibraryType (60), GalleryType (78), ResearchOrganizationType (44), OfficialInstitutionType (50+), BioCustodianType (1,393 Wikidata), EducationProviderType (60+ Wikidata) with domain-specific slots"
 - "Collection aspect: CustodianCollection with 10 collection-specific slots (added managing_unit in v0.7.0, managed_by_cms in v0.8.9)"
 - "Organizational aspect: OrganizationalStructure with 7 unit-specific slots (staff_members, managed_collections)"
@@ -0,0 +1,922 @@
id: https://nde.nl/ontology/hc/enum/ArchiveTypeEnum
name: ArchiveTypeEnum
title: Archive Type Classification
description: 'Types of archives extracted from Wikidata hyponyms of Q166118 (archive).


  Generated: 2025-12-01T16:01:19Z

  Total values: 144'
enums:
  ArchiveTypeEnum:
    permissible_values:
      ACADEMIC_ARCHIVE:
        description: archive of a higher education institution
        meaning: wikidata:Q27032435
        comments:
        - Hochschularchiv (de)
        - archivo académico (es)
        - archives académiques (fr)
      ADVERTISING_RADIO_ARCHIVE:
        description: sound archive with advertising radio productions
        meaning: wikidata:Q60658673
        comments:
        - Werbefunkarchiv (de)
        - Archives radiophoniques publicitaires (fr)
        - Archivio radio pubblicitaria (it)
      ANIMAL_SOUND_ARCHIVE:
        description: collection of animal sound recordings
        meaning: wikidata:Q18574935
        comments:
        - Tierstimmenarchiv (de)
        - Archives de voix d'animaux (fr)
        - Archivio vocale degli animali (it)
      ARCHITECTURAL_ARCHIVE:
        description: archive that safeguards architectural heritage
        meaning: wikidata:Q121409581
        comments:
        - Architekturarchiv (de)
        - archives architecturales (fr)
        - architectonisch archief (nl)
      ARCHIVAL_LIBRARY:
        description: library of an archive
        meaning: wikidata:Q25504402
        comments:
        - Archivbibliothek (de)
        - biblioteca de archivo (es)
        - bibliothèque liée à une institution conservant des archives (fr)
      ARCHIVAL_REPOSITORY:
        description: digital repository for archival purposes
        meaning: wikidata:Q66656823
        comments:
        - Archivierungsstelle (de)
        - repositorio (es)
      ARCHIVE:
        description: agency or institution responsible for the preservation and communication
          of records selected for permanent preservation
        meaning: wikidata:Q166118
        comments:
        - Archiv (de)
        - archivo (es)
        - archives (fr)
      ARCHIVE_ASSOCIATION:
        description: Booster, history and heritage societies for archival institutions
        meaning: wikidata:Q130427366
        comments:
        - Archivverein (de)
        - Association des amis des archives (fr)
      ARCHIVE_NETWORK:
        description: consortium among archives for co-operation
        meaning: wikidata:Q96636857
        comments:
        - Archivverbund (de)
        - rete di archivi (it)
      ARCHIVE_OF_AN_INTERNATIONAL_ORGANIZATION:
        description: archive of an inter-governmental organization or of an international
          umbrella organization
        meaning: wikidata:Q27031014
        comments:
        - Archiv einer internationalen Organisation (de)
        - archives d'une organisation internationale (fr)
      ARCHIVES_FOR_BUILDING_RECORDS:
        description: Public archives for building records or construction documents
        meaning: wikidata:Q136027937
        comments:
        - Bauaktenarchiv (de)
      ARCHIVES_RÉGIONALES:
        description: archives régionales (Q2860567)
        meaning: wikidata:Q2860567
        comments:
        - Regionsarchiv (Frankreich) (de)
        - archives régionales (fr)
      ART_ARCHIVE:
        description: specialized archive
        meaning: wikidata:Q27032254
        comments:
        - Kunstarchiv (de)
        - archivo de arte (es)
        - archives artistiques (fr)
      ASSOCIATION_ARCHIVE:
        description: association archive (Q27030820)
        meaning: wikidata:Q27030820
        comments:
        - Verbandsarchiv (de)
        - archivo de asociación (es)
        - archives associatives (fr)
      AUDIOVISUAL_ARCHIVE:
        description: archive that contains audio-visual materials
        meaning: wikidata:Q27030766
        comments:
        - audio-visuelles Archiv (de)
        - archivo audiovisual (es)
        - archive audiovisuelle (fr)
      BANK_ARCHIVE:
        description: bank archive (Q52718263)
        meaning: wikidata:Q52718263
        comments:
        - Bankarchiv (de)
        - archivo bancario (es)
        - archives bancaires (fr)
      BILDSTELLE:
        description: German institutions that build and manage collections of visual
          media for teaching and research
        meaning: wikidata:Q861125
        comments:
        - Bildstelle (de)
      BRANCH:
        description: local subdivision of an organization
        meaning: wikidata:Q232846
        comments:
        - Zweigniederlassung (de)
        - branche (fr)
      BRANCH_OFFICE:
        description: outlet of an organization or a company that – unlike a subsidiary
          – does not constitute a separate legal entity, while being physically separated
          from the organization's main office
        meaning: wikidata:Q1880737
        comments:
        - Filiale (de)
        - sucursal (es)
        - succursale (fr)
      CANTONAL_ARCHIVE:
        description: state archives of one of the cantons of Switzerland
        meaning: wikidata:Q2860410
        comments:
        - Kantonsarchiv (de)
        - archivo cantonal (es)
        - archives cantonales (fr)
      CAST_COLLECTION:
        description: art-historical or archeological collection, usually for education,
          where copies, usually of gypsum, of art works are collected and shown
        meaning: wikidata:Q29380643
        comments:
        - Abgusssammlung (de)
        - Afgietsel verzameling (nl)
      CATHEDRAL_ARCHIVE:
        description: cathedral archive (Q132201761)
        meaning: wikidata:Q132201761
        comments:
        - archivo catedralicio (es)
      CHURCH_ARCHIVE:
        description: archive for church books about a parish
        meaning: wikidata:Q64166606
        comments:
        - Kirchenarchiv (Schweden) (de)
        - archives paroissiales (fr)
        - kerkarchief (nl)
      CHURCH_ARCHIVE_1:
        description: archive kept by a church or ecclesiastical organisation
        meaning: wikidata:Q2877653
        comments:
        - Kirchenarchiv (de)
        - archivo eclesiástico (es)
        - archives ecclésiastiques (fr)
      CINEMATHEQUE:
        description: organisation responsible for preserving and restoring cinematographic
          heritage
        meaning: wikidata:Q1352795
        comments:
        - Kinemathek (de)
        - filmoteca (es)
        - cinémathèque (fr)
      CLIMATE_ARCHIVE:
        description: archive that provides information about the climatic past
        meaning: wikidata:Q1676725
        comments:
        - Klimaarchiv (de)
      CLOSED_SPACE:
        description: an abstract space with borders
        meaning: wikidata:Q78642244
        comments:
        - geschlossener Raum (de)
        - espacio cerrado (es)
        - spazio chiuso (it)
      COLLECTING_ARCHIVES:
        description: archive that collects materials from multiple sources
        meaning: wikidata:Q117246276
      COMARCAL_ARCHIVE:
        description: comarcal archive (Q21086734)
        meaning: wikidata:Q21086734
        comments:
        - Bezirksarchiv (Katalonien) (de)
        - archivo comarcal (es)
      COMMUNITY_ARCHIVE:
        description: archive created by individuals and community groups who desire
          to document their cultural heritage
        meaning: wikidata:Q25105971
        comments:
        - Gemeinschaftsarchiv (de)
        - archivo comunitario (es)
        - archives communautaires (fr)
      COMPANY_ARCHIVES:
        description: organizational entity that keeps or archives fonds of a company
        meaning: wikidata:Q10605195
        comments:
        - Unternehmensarchiv (de)
        - archivo empresarial (es)
        - archives d'entreprise (fr)
      CONSERVATÓRIA:
        description: Conservatória (Q9854379)
        meaning: wikidata:Q9854379
      COUNTY_RECORD_OFFICE:
        description: local authority repository
        meaning: wikidata:Q5177943
        comments:
        - archivio pubblico territoriale (it)
      COURT_RECORDS:
        description: court records (Q11906844)
        meaning: wikidata:Q11906844
        comments:
        - Justizarchiv (de)
        - archivo judicial (es)
        - archives judiciaires (fr)
      CULTURAL_INSTITUTION:
        description: organization that works for the preservation or promotion of culture
        meaning: wikidata:Q3152824
        comments:
        - kulturelle Organisation (de)
        - institución cultural (es)
        - institution culturelle (fr)
      CURRENT_ARCHIVE:
        description: type of archive
        meaning: wikidata:Q3621648
        comments:
        - archivo corriente (es)
        - archive courante (fr)
        - archivio corrente (it)
      DARK_ARCHIVE:
        description: collection of materials preserved for future use but with no current
          access
        meaning: wikidata:Q112796578
        comments:
        - Dark Archive (de)
      DEPARTMENT:
        description: office within an organization
        meaning: wikidata:Q2366457
        comments:
        - Abteilung (de)
        - departamento (es)
        - service (fr)
      DEPARTMENTAL_ARCHIVES:
        description: departmental archives in France
        meaning: wikidata:Q2860456
        comments:
        - Département-Archiv (de)
        - archivos departamentales (es)
        - archives départementales (fr)
      DEPOSIT_ARCHIVE:
        description: part of an archive
        meaning: wikidata:Q244904
        comments:
        - Zwischenarchiv (de)
        - archivo de depósito (es)
        - archive intermédiaire (fr)
      DIGITAL_ARCHIVE:
        description: information system whose aim is to collect different digital resources
          and to make them available to a defined group of users
        meaning: wikidata:Q1224984
        comments:
        - digitales Archiv (de)
        - archivo digital (es)
        - archives numériques (fr)
      DIM_ARCHIVES:
        description: archive with only limited access
        meaning: wikidata:Q112796779
        comments:
        - Dim Archive (de)
      DIOCESAN_ARCHIVE:
        description: archive of a bishopric
        meaning: wikidata:Q11906839
        comments:
        - Bischöfliches Archiv (de)
        - archivo diocesano (es)
        - archives diocésaines (fr)
      DISTRICT_ARCHIVE_GERMANY:
        description: Archive type in Germany
        meaning: wikidata:Q130757255
        comments:
        - Kreisarchiv (de)
      DISTRITAL_ARCHIVE:
        description: distrital archives in Portugal
        meaning: wikidata:Q10296259
        comments:
        - Bezirksarchiv (Portugal) (de)
      DIVISION:
        description: distinct and large part of an organization
        meaning: wikidata:Q334453
        comments:
        - Abteilung (de)
        - división (es)
        - division (fr)
      DOCUMENTATION_CENTRE:
        description: organisation that deals with documentation
        meaning: wikidata:Q2945282
        comments:
        - Dokumentationszentrum (de)
        - centro de documentación (es)
        - centre de documentation (fr)
      ECONOMIC_ARCHIVE:
        description: archive documenting the economic history of a country, region etc.
        meaning: wikidata:Q27032167
        comments:
        - Wirtschaftsarchiv (de)
        - archivo económico (es)
        - archives économiques (fr)
      FILM_ARCHIVE:
        description: archive that safeguards film heritage
        meaning: wikidata:Q726929
        comments:
        - Filmarchiv (de)
        - archivo fílmico (es)
        - archives cinématographiques (fr)
      FOUNDATION_ARCHIVE:
        description: foundation archive (Q27030827)
        meaning: wikidata:Q27030827
        comments:
        - Stiftungsarchiv (de)
        - archivo de fundación (es)
      FREE_ARCHIVE:
        description: Archive that preserves documents on the history of a social movement
        meaning: wikidata:Q635801
        comments:
        - freies Archiv (de)
        - archivio libero (it)
      FRENCH_PRIVATE_ARCHIVES:
        description: non-public archives in France
        meaning: wikidata:Q2860565
        comments:
        - Privatarchiv (Frankreich) (de)
        - archives privées en France (fr)
      FYLKESARKIV:
        description: fylkesarkiv (Q15119463)
        meaning: wikidata:Q15119463
      FÖREMÅLSARKIV:
        description: Föremålsarkiv (Q10501208)
        meaning: wikidata:Q10501208
      GLAM:
        description: acronym for "galleries, libraries, archives, and museums" that
          refers to cultural institutions that have access to knowledge as their mission
        meaning: wikidata:Q1030034
        comments:
        - GLAM (de)
        - GLAM (es)
        - GLAM (fr)
      GOVERNMENT_ARCHIVE:
        description: official archive of a government
        meaning: wikidata:Q119712417
        comments:
        - Staatsarchiv (de)
        - archivos gubernamentales (es)
        - archives gouvernementales (fr)
      HISTORICAL_ARCHIVE:
        description: historical archive (Q3621673)
        meaning: wikidata:Q3621673
        comments:
        - Historisches Archiv (de)
        - archivo histórico (es)
        - archive historique (fr)
      HOSPITAL_ARCHIVE:
        description: hospital archive (Q17301917)
        meaning: wikidata:Q17301917
        comments:
        - Krankenhausarchiv (de)
|
||||
- archivo hospitalario (es)
|
||||
- archives hospitalières (fr)
|
||||
HOUSE_ARCHIVE:
|
||||
description: archive containing documents and letters that concern a family
|
||||
meaning: wikidata:Q4344572
|
||||
comments:
|
||||
- Familienarchiv (de)
|
||||
- archivo familiar (es)
|
||||
- archives familiales (fr)
|
||||
ICONOGRAPHIC_ARCHIVES:
|
||||
description: archives containing predominantly pictorial materials
|
||||
meaning: wikidata:Q117810712
|
||||
INSTITUTION:
|
||||
description: structure or mechanism of social order and cooperation governing the behaviour of
|
||||
a set of individuals within a given community
|
||||
meaning: wikidata:Q178706
|
||||
comments:
|
||||
- Institution (de)
|
||||
- institución (es)
|
||||
- institution sociale (fr)
|
||||
INSTITUTIONAL_ARCHIVE:
|
||||
description: repository that holds records created or received by its parent institution
|
||||
meaning: wikidata:Q124762372
|
||||
comments:
|
||||
- Institutionsarchiv (de)
|
||||
- archivo institucional (es)
|
||||
INSTITUTIONAL_REPOSITORY:
|
||||
description: archive of publications by an institution's staff
|
||||
meaning: wikidata:Q1065413
|
||||
comments:
|
||||
- Instituts-Repository (de)
|
||||
- repositorio institucional (es)
|
||||
- dépôt institutionnel (fr)
|
||||
JOINT_ARCHIVES:
|
||||
description: archive containing records or two or more entities
|
||||
meaning: wikidata:Q117442301
|
||||
comments:
|
||||
- Gemeinsames Archiv (de)
|
||||
KUSTODIE:
|
||||
description: Archives and administration of art collections in higher educational institutions
|
||||
meaning: wikidata:Q58482422
|
||||
comments:
|
||||
- Kustodie (de)
|
||||
LANDSARKIV:
|
||||
description: Landsarkiv (Q16324008)
|
||||
meaning: wikidata:Q16324008
|
||||
comments:
|
||||
- Landesarchiv (de)
|
||||
LGBT_ARCHIVE:
|
||||
description: archive related to LGBT topics
|
||||
meaning: wikidata:Q61710689
|
||||
comments:
|
||||
- LGBT-Archiv (de)
|
||||
- archivo LGBT (es)
|
||||
- archives LGBT (fr)
|
||||
LIGHT_ARCHIVES:
|
||||
description: repository whose holdings are broadly accessible
|
||||
meaning: wikidata:Q112815447
|
||||
comments:
|
||||
- Light Archive (de)
|
||||
LITERARY_ARCHIVE:
|
||||
description: archive for literary works
|
||||
meaning: wikidata:Q28607652
|
||||
comments:
|
||||
- Literaturarchiv (de)
|
||||
- archivo literario (es)
|
||||
- archives littéraires (fr)
|
||||
LOCAL_GOVERNMENT_ARCHIVE:
|
||||
description: archive of records belonging to a local government
|
||||
meaning: wikidata:Q118281267
|
||||
comments:
|
||||
- Kommunalarchiv (de)
|
||||
LOCAL_HERITAGE_INSTITUTION_IN_SWEDEN:
|
||||
description: a Swedish type of local history and cultural heritage museums
|
||||
meaning: wikidata:Q10520688
|
||||
comments:
|
||||
- Heimatmuseen in Schweden (de)
|
||||
- Hembygdsgård (nl)
|
||||
LOCAL_HISTORY_ARCHIVE:
|
||||
description: archive dealing with local history
|
||||
meaning: wikidata:Q12324798
|
||||
comments:
|
||||
- Lokalarchiv (de)
|
||||
- archivo de historia local (es)
|
||||
- archives d'histoire locale (fr)
|
||||
LOCATION_LIBRARY:
|
||||
description: a collection of visual and references information of locations, or places that might
|
||||
be used for filming or photography.
|
||||
meaning: wikidata:Q6664811
|
||||
comments:
|
||||
- biblioteca de localizaciones (es)
|
||||
MAILING_LIST_ARCHIVE:
|
||||
description: mailing list archive (Q104018626)
|
||||
meaning: wikidata:Q104018626
|
||||
comments:
|
||||
- Archiv der Mailingliste (de)
|
||||
- archive de la liste de diffusion (fr)
|
||||
- archief van mailinglijst (nl)
|
||||
MEDIA_ARCHIVE:
|
||||
description: media archive (Q116809817)
|
||||
meaning: wikidata:Q116809817
|
||||
comments:
|
||||
- Medienarchiv (de)
|
||||
- archives de médias (fr)
|
||||
- media-achief (nl)
|
||||
MEDIENZENTRUM:
|
||||
description: Medienzentrum (Q1284615)
|
||||
meaning: wikidata:Q1284615
|
||||
comments:
|
||||
- Medienzentrum (de)
|
||||
MEMORY_INSTITUTION:
|
||||
description: institution which has curatorial care over a collection and whose mission it is to
|
||||
preserve the collection for future generations
|
||||
meaning: wikidata:Q1497649
|
||||
comments:
|
||||
- Gedächtnisinstitution (de)
|
||||
- institución del patrimonio (es)
|
||||
- institution patrimoniale (fr)
|
||||
MILITARY_ARCHIVE:
|
||||
description: archive for documents regarding military topics
|
||||
meaning: wikidata:Q1934883
|
||||
comments:
|
||||
- Militärarchiv (de)
|
||||
- archivo militar (es)
|
||||
- archive militaire (fr)
|
||||
MONASTERY_ARCHIVE:
|
||||
description: archive of a monastery
|
||||
meaning: wikidata:Q27030561
|
||||
comments:
|
||||
- Klosterarchiv (de)
|
||||
- archivo monástico (es)
|
||||
MUNICIPAL_ARCHIVE:
|
||||
description: accumulation of historical records of a town or city
|
||||
meaning: wikidata:Q604177
|
||||
comments:
|
||||
- Stadt- oder Gemeindearchiv (de)
|
||||
- archivo municipal (es)
|
||||
- archives communales (fr)
|
||||
MUSEUM_ARCHIVE:
|
||||
description: archive established by a museum to collect, organize, preserve, and provide access
|
||||
to its organizational records
|
||||
meaning: wikidata:Q53566456
|
||||
comments:
|
||||
- Museumsarchiv (de)
|
||||
- archivo de museo (es)
|
||||
- museumarchief (nl)
|
||||
MUSIC_ARCHIVE:
|
||||
description: archive of musical recordings and documents
|
||||
meaning: wikidata:Q53759838
|
||||
comments:
|
||||
- Musikarchiv (de)
|
||||
- archivo musical (es)
|
||||
- archives musicales (fr)
|
||||
NACHLASS:
|
||||
description: collection of manuscripts, notes, correspondence, and so on left behind when a scholar
|
||||
or an artist dies
|
||||
meaning: wikidata:Q3827332
|
||||
comments:
|
||||
- Nachlass (de)
|
||||
- Nachlass (es)
|
||||
- archives formées du legs (fr)
|
||||
NATIONAL_ARCHIVES:
|
||||
description: archives of a country
|
||||
meaning: wikidata:Q2122214
|
||||
comments:
|
||||
- Nationalarchiv (de)
|
||||
- archivo nacional (es)
|
||||
- archives nationales (fr)
|
||||
NATIONAL_TREASURE:
|
||||
description: treasure or artifact that is regarded as emblematic as a nation's cultural heritage,
|
||||
identity or significance
|
||||
meaning: wikidata:Q60606520
|
||||
NATIONAL_TREASURE_OF_FRANCE:
|
||||
description: designation for entities of cultural significance in France
|
||||
meaning: wikidata:Q2986426
|
||||
comments:
|
||||
- trésor national (fr)
|
||||
NEWSPAPER_CLIPPINGS_ARCHIVE:
|
||||
description: archive of press clippings, organized by topics
|
||||
meaning: wikidata:Q65651503
|
||||
comments:
|
||||
- Zeitungsausschnittsarchiv (de)
|
||||
- archivo de recortes de periódicos (es)
|
||||
- tijdschriftenknipselarchief (nl)
|
||||
NOBILITY_ARCHIVE:
|
||||
description: collection of historical documents and information about members of the nobility
|
||||
meaning: wikidata:Q355358
|
||||
comments:
|
||||
- Adelsarchiv (de)
|
||||
- archivo nobiliario (es)
|
||||
- archive de noblesse (fr)
|
||||
NOTARIAL_ARCHIVE:
|
||||
description: type of archive housing notarial records
|
||||
meaning: wikidata:Q8203685
|
||||
comments:
|
||||
- Notariatsarchiv (de)
|
||||
- archivo notarial (es)
|
||||
- archives notariales (fr)
|
||||
ONLINE_NEWS_ARCHIVE:
|
||||
description: archive of newspapers, magazines and other periodicals that can be consulted online
|
||||
meaning: wikidata:Q2001867
|
||||
comments:
|
||||
- Zeitungsbank (de)
|
||||
- archivo de periódicos (es)
|
||||
- archives de journaux (fr)
|
||||
ORGANIZATION:
|
||||
description: social entity established to meet needs or pursue goals
|
||||
meaning: wikidata:Q43229
|
||||
comments:
|
||||
- Organisation (de)
|
||||
- organización (es)
|
||||
- organisation (fr)
|
||||
ORGANIZATIONAL_SUBDIVISION:
|
||||
description: organization that is a part of a larger organization
|
||||
meaning: wikidata:Q9261468
|
||||
comments:
|
||||
- Untereinheit (de)
|
||||
- subdivisión organizacional (es)
|
||||
- sous-division organisationnelle (fr)
|
||||
PARENT_ORGANIZATIONUNIT:
|
||||
description: organization that has a subsidiary unit, e.g. for companies, which owns enough voting
|
||||
stock in another firm to control management and operations
|
||||
meaning: wikidata:Q1956113
|
||||
comments:
|
||||
- Mutterunternehmen (de)
|
||||
- organización matriz (es)
|
||||
- société mère (fr)
|
||||
PARISH_ARCHIVE:
|
||||
description: parish archive (Q34544468)
|
||||
meaning: wikidata:Q34544468
|
||||
comments:
|
||||
- Pfarrarchiv (de)
|
||||
- archivo parroquial (es)
|
||||
- archivio parrocchiale (it)
|
||||
PARLIAMENTARY_ARCHIVES:
|
||||
description: political archives
|
||||
meaning: wikidata:Q53251146
|
||||
comments:
|
||||
- Parlamentsarchiv (de)
|
||||
- archivo parlamentario (es)
|
||||
- archives parlementaires (fr)
|
||||
PARTY_ARCHIVE:
|
||||
description: subclass of political archive
|
||||
meaning: wikidata:Q53252161
|
||||
comments:
|
||||
- Parteiarchiv (de)
|
||||
- archivo de partido político (es)
|
||||
PERFORMING_ARTS_ARCHIVE:
|
||||
description: performing arts archive (Q27030945)
|
||||
meaning: wikidata:Q27030945
|
||||
comments:
|
||||
- Archiv für darstellende Kunst (de)
|
||||
- archives des arts de la scène (fr)
|
||||
PERSON_OR_ORGANIZATION:
|
||||
description: class of agents
|
||||
meaning: wikidata:Q106559804
|
||||
comments:
|
||||
- Person oder Organisation (de)
|
||||
- persona u organización (es)
|
||||
- personne ou organisation (fr)
|
||||
PERSONAL_LIBRARY:
|
||||
description: the private library collection of an individual
|
||||
meaning: wikidata:Q106402388
|
||||
comments:
|
||||
- Autorenbibliothek (de)
|
||||
- biblioteca de autor (es)
|
||||
- bibliothèque personnelle (fr)
|
||||
PERSONENSTANDSARCHIV:
|
||||
description: Personenstandsarchiv (Q2072394)
|
||||
meaning: wikidata:Q2072394
|
||||
comments:
|
||||
- Personenstandsarchiv (de)
|
||||
PHOTO_ARCHIVE:
|
||||
description: physical image collection
|
||||
meaning: wikidata:Q27032363
|
||||
comments:
|
||||
- Fotoarchiv (de)
|
||||
- archivo fotográfico (es)
|
||||
- archive photographique (fr)
|
||||
PHOTOGRAPH_COLLECTION:
|
||||
description: photograph collection (Q130486108)
|
||||
meaning: wikidata:Q130486108
|
||||
comments:
|
||||
- Fotosammlung (de)
|
||||
- colección de fotografías (es)
|
||||
- collection de photographies (fr)
|
||||
POLITICAL_ARCHIVE:
|
||||
description: political archive (Q27030921)
|
||||
meaning: wikidata:Q27030921
|
||||
comments:
|
||||
- Politikarchiv (de)
|
||||
- archivo político (es)
|
||||
POSTCUSTODIAL_ARCHIVE:
|
||||
description: postcustodial archive (Q124223197)
|
||||
meaning: wikidata:Q124223197
|
||||
PRESS_ARCHIVE:
|
||||
description: collection of press, newspaper materials and content
|
||||
meaning: wikidata:Q56650887
|
||||
comments:
|
||||
- Pressearchiv (de)
|
||||
- archivo periodístico (es)
|
||||
- archives de presse (fr)
|
||||
PRINT_ROOM:
|
||||
description: collection of prints, and sometimes drawings, watercolours and photographs
|
||||
meaning: wikidata:Q445396
|
||||
comments:
|
||||
- Kupferstichkabinett (de)
|
||||
- gabinete de estampas (es)
|
||||
- cabinet des estampes (fr)
|
||||
PROVINCIAL_ARCHIVE:
|
||||
description: provincial archive (Q5403345)
|
||||
meaning: wikidata:Q5403345
|
||||
comments:
|
||||
- Provinzarchiv (de)
|
||||
PROVINCIAL_HISTORICAL_ARCHIVE:
|
||||
description: type of local archive
|
||||
meaning: wikidata:Q21087388
|
||||
comments:
|
||||
- Historisches Provinzarchiv (Katalonien) (de)
|
||||
- archivo histórico provincial (es)
|
||||
PUBLIC_ARCHIVE:
|
||||
description: repository for official documents
|
||||
meaning: wikidata:Q27031009
|
||||
comments:
|
||||
- Öffentliches Archiv (de)
|
||||
- archivo público (es)
|
||||
- archives publiques (fr)
|
||||
PUBLIC_ARCHIVES_IN_FRANCE:
|
||||
description: Type of archives in France
|
||||
meaning: wikidata:Q2421452
|
||||
comments:
|
||||
- Öffentliches Archiv (de)
|
||||
- archives publiques en France (fr)
|
||||
PUBLIC_SPACE:
|
||||
description: places for public use
|
||||
meaning: wikidata:Q294440
|
||||
comments:
|
||||
- öffentlicher Raum (de)
|
||||
- espacio público (es)
|
||||
- espace public (fr)
|
||||
RADIO_ARCHIVE:
|
||||
description: radio archive (Q109326271)
|
||||
meaning: wikidata:Q109326271
|
||||
comments:
|
||||
- Radioarchiv (de)
|
||||
- archivo radiofónico (es)
|
||||
- archives radiophoniques (fr)
|
||||
REGIONAL_ARCHIVE:
|
||||
description: archive with a regional scope
|
||||
meaning: wikidata:Q27032392
|
||||
comments:
|
||||
- Regionalarchiv (de)
|
||||
- archivo regional (es)
|
||||
- archives régionales (fr)
|
||||
REGIONAL_ARCHIVES_IN_ICELAND:
|
||||
description: regional archives in Iceland (Q16428785)
|
||||
meaning: wikidata:Q16428785
|
||||
comments:
|
||||
- Regionalarchiv (Island) (de)
|
||||
REGIONAL_ECONOMIC_ARCHIVE:
|
||||
description: archive documenting the economic history of a region
|
||||
meaning: wikidata:Q2138319
|
||||
comments:
|
||||
- regionales Wirtschaftsarchiv (de)
|
||||
- archivo económico regional (es)
|
||||
REGIONAL_HISTORIC_CENTER:
|
||||
description: name for archives in the Netherlands
|
||||
meaning: wikidata:Q1882512
|
||||
comments:
|
||||
- Regionalhistorisches Zentrum (de)
|
||||
- centre régional historique (fr)
|
||||
- Regionaal Historisch Centrum (nl)
|
||||
REGIONAL_STATE_ARCHIVES:
|
||||
description: regional state archives in Sweden
|
||||
meaning: wikidata:Q8727648
|
||||
comments:
|
||||
- Provinzarchiv (de)
|
||||
- archivo regional (es)
|
||||
- archives régionales (fr)
|
||||
RELIGIOUS_ARCHIVE:
|
||||
description: accumulation of records of a religious denomination or society
|
||||
meaning: wikidata:Q85545753
|
||||
comments:
|
||||
- Religionsarchiv (de)
|
||||
- archivo religioso (es)
|
||||
SCHOOL_ARCHIVE:
|
||||
description: school archive (Q27030883)
|
||||
meaning: wikidata:Q27030883
|
||||
comments:
|
||||
- Schularchiv (de)
|
||||
- archivo escolar (es)
|
||||
- archives scolaires (fr)
|
||||
SCIENTIFIC_ARCHIVE:
|
||||
description: archive created for academic purposes
|
||||
meaning: wikidata:Q27032095
|
||||
comments:
|
||||
- Forschungsarchiv (de)
|
||||
- archives scientifiques (fr)
|
||||
SCIENTIFIC_TECHNIC_AND_INDUSTRIAL_CULTURE_CENTER:
|
||||
description: popular science place in France
|
||||
meaning: wikidata:Q2945276
|
||||
comments:
|
||||
- centre de culture scientifique, technique et industrielle (fr)
|
||||
- centro di cultura scientifica, tecnica e industriale (it)
|
||||
- wetenschappelijk, technisch en industrieel cultuurcentrum (nl)
|
||||
SECTOR_OF_ARCHIVES_IN_SWEDEN:
|
||||
description: sector of archives
|
||||
meaning: wikidata:Q84171278
|
||||
comments:
|
||||
- Archivwesen in Schweden (de)
|
||||
SECURITY_ARCHIVES:
|
||||
description: type of archives in Czechia
|
||||
meaning: wikidata:Q101475797
|
||||
SOCIAL_SPACE:
|
||||
description: physical or virtual space such as a social center, online social media, or other
|
||||
gathering place where people gather and interact
|
||||
meaning: wikidata:Q4430275
|
||||
comments:
|
||||
- sozialer Raum (de)
|
||||
- espacio social (es)
|
||||
- espace social (fr)
|
||||
SOUND_ARCHIVE:
|
||||
description: collection of sounds
|
||||
meaning: wikidata:Q2230431
|
||||
comments:
|
||||
- Schallarchiv (de)
|
||||
- fonoteca (es)
|
||||
- phonothèque (fr)
|
||||
SPECIAL_COLLECTION:
|
||||
description: library or library unit that houses materials requiring specialized security and
|
||||
user services or whose relation (period, subject, etc.) is to be preserved
|
||||
meaning: wikidata:Q4431094
|
||||
comments:
|
||||
- Spezialsammlung (de)
|
||||
- colección especial (es)
|
||||
- fonds spéciaux (fr)
|
||||
SPECIALIZED_ARCHIVE:
|
||||
description: archive specialized in a specific field
|
||||
meaning: wikidata:Q27030941
|
||||
comments:
|
||||
- Facharchiv (de)
|
||||
- archivo especial (es)
|
||||
- archives spécialisées (fr)
|
||||
SPECIALIZED_ARCHIVES:
|
||||
description: type of archives in Czechia
|
||||
meaning: wikidata:Q101470010
|
||||
comments:
|
||||
- archivo especializado (es)
|
||||
- archives spécialisées (fr)
|
||||
STATE_ARCHIVES:
|
||||
description: archive of a state
|
||||
meaning: wikidata:Q52341833
|
||||
comments:
|
||||
- Staatsarchiv (de)
|
||||
- archivo estatal (es)
|
||||
- archives de l'État (fr)
|
||||
STATE_ARCHIVES_SECTION:
|
||||
description: section of a national archive in Italy
|
||||
meaning: wikidata:Q44796387
|
||||
comments:
|
||||
- Staatsarchiv-Abteilung (de)
|
||||
- sezione di archivio di Stato (it)
|
||||
- sectie staatsarchief (nl)
|
||||
STATE_DISTRICT_ARCHIVE:
|
||||
description: Archive type in the Czech Republic
|
||||
meaning: wikidata:Q53131316
|
||||
comments:
|
||||
- Bezirksarchiv (Tschechien) (de)
|
||||
STATE_REGIONAL_ARCHIVE_CZECHIA:
|
||||
description: state regional archive (Czechia) (Q53130134)
|
||||
meaning: wikidata:Q53130134
|
||||
SUBSIDIARY_ORGANIZATION:
|
||||
description: entity or organization administered by a larger entity or organization
|
||||
meaning: wikidata:Q62079110
|
||||
comments:
|
||||
- Tochterorganisation (de)
|
||||
- entidad subsidiaria (es)
|
||||
- entité subsidiaire (fr)
|
||||
TELEVISION_ARCHIVE:
|
||||
description: a collection of television programs, recordings, and broadcasts
|
||||
meaning: wikidata:Q109326243
|
||||
comments:
|
||||
- Fernseharchiv (de)
|
||||
- archivo de televisión (es)
|
||||
- archives télévisuelles (fr)
|
||||
TENTATIVE_WORLD_HERITAGE_SITE:
|
||||
description: Wikimedia list article
|
||||
meaning: wikidata:Q1459900
|
||||
comments:
|
||||
- Tentativliste (de)
|
||||
- lista indicativa del Patrimonio de la Humanidad (es)
|
||||
- liste indicative du patrimoine mondial (fr)
|
||||
TRADE_UNION_ARCHIVE:
|
||||
description: archive formed by the documentation of the labor organisations
|
||||
meaning: wikidata:Q66604802
|
||||
comments:
|
||||
- Gewerkschaftsarchiv (de)
|
||||
UNIVERSITY_ARCHIVE:
|
||||
description: collection of historical records of a college or university
|
||||
meaning: wikidata:Q2496264
|
||||
comments:
|
||||
- Universitätsarchiv (de)
|
||||
- archivo universitario (es)
|
||||
- archives universitaires (fr)
|
||||
VEREINSARCHIV:
|
||||
description: Vereinsarchiv (Q130758889)
|
||||
meaning: wikidata:Q130758889
|
||||
comments:
|
||||
- Vereinsarchiv (de)
|
||||
VERLAGSARCHIV:
|
||||
description: Verlagsarchiv (Q130759004)
|
||||
meaning: wikidata:Q130759004
|
||||
comments:
|
||||
- Verlagsarchiv (de)
|
||||
VERWALTUNGSARCHIV:
|
||||
description: Subclass of archives
|
||||
meaning: wikidata:Q2519292
|
||||
comments:
|
||||
- Verwaltungsarchiv (de)
|
||||
VIRTUAL_MAP_LIBRARY:
|
||||
description: type of library for virtual maps or cartographic products
|
||||
meaning: wikidata:Q5995078
|
||||
comments:
|
||||
- Virtuelle Kartenbibliothek (de)
|
||||
- Mapoteca virtual (es)
|
||||
WEB_ARCHIVE:
|
||||
description: publication type, collection of preserved web pages
|
||||
meaning: wikidata:Q30047053
|
||||
comments:
|
||||
- Webarchiv (de)
|
||||
- archivo web (es)
|
||||
- archive du Web (fr)
|
||||
WOMENS_ARCHIVES:
|
||||
description: archives of documents and records written by and about women
|
||||
meaning: wikidata:Q130217628
|
||||
comments:
|
||||
- Frauenarchiv (de)
|
||||
WORLD_HERITAGE_SITE:
|
||||
description: place of significance listed by UNESCO
|
||||
meaning: wikidata:Q9259
|
||||
comments:
|
||||
- UNESCO-Welterbe (de)
|
||||
- Patrimonio de la Humanidad (es)
|
||||
- patrimoine mondial (fr)
|
||||
|
|
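The commit also regenerates RDF outputs (OWL, N-Triples, RDF/XML, N3, JSON-LD context) from these enums, so each `meaning: wikidata:Q…` CURIE ends up as a full Wikidata URI in the generated triples. A minimal sketch of that expansion is below; the base URI and the `skos:exactMatch` predicate are illustrative assumptions, and the real LinkML generators may emit different predicates and URIs.

```python
# Sketch: expand a LinkML permissible value's `meaning` CURIE into one
# N-Triples line. Illustrative only -- not the actual gen-owl/gen-rdf output.

WIKIDATA = "http://www.wikidata.org/entity/"
ENUM_BASE = "https://example.org/ArchiveTypeEnum#"  # hypothetical base URI

def expand_curie(curie: str) -> str:
    """Expand a `wikidata:Qxxx` CURIE to a full Wikidata entity URI."""
    prefix, local = curie.split(":", 1)
    assert prefix == "wikidata", f"unexpected prefix: {prefix}"
    return WIKIDATA + local

def to_ntriple(value_name: str, meaning: str) -> str:
    """Emit one skos:exactMatch triple for a permissible value."""
    subject = ENUM_BASE + value_name
    return (f"<{subject}> "
            f"<http://www.w3.org/2004/02/skos/core#exactMatch> "
            f"<{expand_curie(meaning)}> .")

triple = to_ntriple("WEB_ARCHIVE", "wikidata:Q30047053")
print(triple)
```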
@@ -0,0 +1,204 @@

id: https://nde.nl/ontology/hc/enum/CustodianPrimaryTypeEnum
name: CustodianPrimaryTypeEnum
title: GLAMORCUBESFIXPHDNT Primary Type Categories

description: |
  Top-level classification of heritage custodian types using the
  GLAMORCUBESFIXPHDNT taxonomy (19 categories).

  **Mnemonic**: GLAMORCUBESFIXPHDNT
  - **G**alleries
  - **L**ibraries
  - **A**rchives
  - **M**useums
  - **O**fficial institutions
  - **R**esearch centers
  - **C**orporations (commercial)
  - **U**nknown/unspecified
  - **B**otanical gardens/zoos (bio custodians)
  - **E**ducation providers
  - **S**ocieties (heritage/collecting societies)
  - **F**eatures (geographic features AS custodians)
  - **I**ntangible heritage groups
  - mi**X**ed (multiple types)
  - **P**ersonal collections
  - **H**oly/sacred sites
  - **D**igital platforms
  - **N**GOs (non-profit organizations)
  - **T**aste/smell heritage

  Each category has specialized subclasses with Wikidata-derived enum values.

enums:
  CustodianPrimaryTypeEnum:
    permissible_values:
      GALLERY:
        description: "Art gallery or exhibition space (Q118554787, Q1007870)"
        meaning: wikidata:Q118554787
        comments:
          - "Visual arts organizations"
          - "Exhibition spaces (may or may not hold permanent collections)"
          - "Kunsthallen, art galleries, visual arts centers"

      LIBRARY:
        description: "Library - institution preserving and providing access to books and documents (Q7075)"
        meaning: wikidata:Q7075
        comments:
          - "Public libraries, academic libraries, national libraries"
          - "Special libraries, digital libraries"
          - "Includes bibliotheken, bibliotecas, bibliothèques"

      ARCHIVE:
        description: "Archive - institution preserving historical documents and records (Q166118)"
        meaning: wikidata:Q166118
        comments:
          - "National archives, city archives, corporate archives"
          - "Government archives, religious archives"
          - "Includes archieven, archivos, archives"

      MUSEUM:
        description: "Museum - institution preserving and exhibiting cultural or scientific collections (Q33506)"
        meaning: wikidata:Q33506
        comments:
          - "Art museums, history museums, natural history museums"
          - "Science museums, ethnographic museums, local museums"
          - "Includes musea, museos, musées, museums"

      OFFICIAL_INSTITUTION:
        description: "Government heritage agency, platform, or official cultural institution (Q895526)"
        meaning: wikidata:Q895526
        comments:
          - "Provincial heritage services"
          - "Heritage aggregation platforms"
          - "Government cultural agencies"
          - "TOOI: tooi:Overheidsorganisatie (Dutch government)"
          - "CPOV: cpov:PublicOrganisation (EU public sector)"

      RESEARCH_CENTER:
        description: "Research organization or documentation center (Q136410232)"
        meaning: wikidata:Q136410232
        comments:
          - "Research institutes with heritage collections"
          - "Documentation centers"
          - "University research units"
          - "Policy institutes with archives"

      COMMERCIAL:
        description: "Corporation or business with heritage collections (Q21980538)"
        meaning: wikidata:Q21980538
        comments:
          - "Company archives"
          - "Corporate museums"
          - "Brand heritage centers"
          - "ROV: rov:RegisteredOrganization (if legally registered)"

      UNSPECIFIED:
        description: "Institution type cannot be determined (data quality flag)"
        comments:
          - "NOT a real institution type - indicates missing/ambiguous data"
          - "Should be resolved during data curation"
          - "NOT mapped to Wikidata"

      BIO_CUSTODIAN:
        description: "Botanical garden, zoo, aquarium, or living collections (Q473972, Q23790, Q43501)"
        meaning: wikidata:Q473972
        comments:
          - "Botanical gardens (Q473972)"
          - "Zoological gardens (Q23790)"
          - "Arboreta (Q43501)"
          - "Herbaria (Q2982911)"
          - "Aquariums (Q4915239)"

      EDUCATION_PROVIDER:
        description: "Educational institution with heritage collections (Q5341295)"
        meaning: wikidata:Q5341295
        comments:
          - "Universities with archives or collections"
          - "Schools with historical materials"
          - "Training centers preserving educational heritage"
          - "Schema.org: schema:EducationalOrganization, schema:CollegeOrUniversity"

      HERITAGE_SOCIETY:
        description: "Historical society, heritage society, or collecting society (Q5774403, Q10549978)"
        meaning: wikidata:Q5774403
        comments:
          - "Historical societies (Q5774403)"
          - "Heritage societies / heemkundige kring (Q10549978)"
          - "Philatelic societies (Q955824)"
          - "Numismatic clubs"
          - "Ephemera collectors"

      FEATURE_CUSTODIAN:
        description: "Geographic feature that IS the heritage custodian (special case)"
        comments:
          - "SPECIAL: Also links to FeaturePlace (dual aspect)"
          - "Used when custodian IS a geofeature (e.g., historic mansion as museum)"
          - "Examples: Q1802963 (mansion), Q44539 (temple), Q16560 (palace)"
          - "Requires BOTH custodian_type AND custodian_place.place_type"

      INTANGIBLE_HERITAGE_GROUP:
        description: "Organization preserving intangible cultural heritage (Q105815710)"
        meaning: wikidata:Q105815710
        comments:
          - "Traditional performance groups"
          - "Oral history societies"
          - "Folklore organizations"
          - "Indigenous cultural practice groups"
          - "UNESCO intangible cultural heritage"

      MIXED:
        description: "Institution with multiple simultaneous type classifications"
        comments:
          - "GHCID uses 'X' code"
          - "actual_types slot documents all applicable types"
          - "Example: Combined museum/archive/library facility"

      PERSONAL_COLLECTION:
        description: "Private personal collection managed by individual collector (Q134886297)"
        meaning: wikidata:Q134886297
        comments:
          - "Individual collectors"
          - "Family archives"
          - "Private art collections (non-commercial)"
          - "Distinguished from commercial galleries"

      HOLY_SACRED_SITE:
        description: "Religious site with heritage collections (Q4588528)"
        meaning: wikidata:Q4588528
        comments:
          - "Church archives (parish records, baptismal registers)"
          - "Monastery libraries (manuscript collections)"
          - "Cathedral treasuries (liturgical objects, religious art)"
          - "Temple museums (Buddhist artifacts)"
          - "Mosque libraries (Islamic manuscripts)"
          - "Synagogue archives (Jewish community records)"
          - "Schema.org: schema:PlaceOfWorship"

      DIGITAL_PLATFORM:
        description: "Born-digital heritage platform or online repository (Q28017710)"
        meaning: wikidata:Q28017710
        comments:
          - "Online archives (Internet Archive)"
          - "Digital libraries (HathiTrust)"
          - "Heritage aggregators (Europeana, DPLA)"
          - "Virtual museums"
          - "Schema.org: schema:WebSite, schema:SoftwareApplication"

      NON_PROFIT:
        description: "Non-governmental heritage organization (Q163740)"
        meaning: wikidata:Q163740
        comments:
          - "Heritage preservation NGOs"
          - "Cultural advocacy organizations"
          - "Conservation societies managing heritage sites"
          - "Schema.org: schema:NGO"

      TASTE_SCENT_HERITAGE:
        description: "Organization preserving culinary or olfactory heritage"
        comments:
          - "Historic restaurants preserving culinary traditions"
          - "Parfumeries with historic formulation archives"
          - "Distilleries maintaining traditional production methods"
          - "Culinary heritage museums"
          - "Potential Wikidata: Q11707 (restaurant), Q185329 (perfumery), Q131734 (distillery)"
          - "NEW CATEGORY - not yet formally recognized in Wikidata"
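The manifest later notes that roughly 93% of values carry a Wikidata mapping, and values such as `UNSPECIFIED` are deliberately left unmapped. A small check like the following could flag absent or malformed `meaning` CURIEs; the sample dict is hand-copied from the enum above, not loaded from the schema file, and the function name is illustrative.

```python
# Sketch: find permissible values without a well-formed `wikidata:Q...`
# meaning. In practice the values would be parsed from the YAML schema.
import re

SAMPLE_VALUES = {
    "GALLERY": {"description": "Art gallery or exhibition space",
                "meaning": "wikidata:Q118554787"},
    "LIBRARY": {"description": "Library", "meaning": "wikidata:Q7075"},
    # UNSPECIFIED is intentionally NOT mapped to Wikidata (data quality flag)
    "UNSPECIFIED": {"description": "Institution type cannot be determined"},
}

CURIE_RE = re.compile(r"^wikidata:Q\d+$")

def unmapped_values(values: dict) -> list[str]:
    """Return names of values whose `meaning` is absent or malformed."""
    return [name for name, body in values.items()
            if not CURIE_RE.match(body.get("meaning", ""))]

missing = unmapped_values(SAMPLE_VALUES)
print(missing)  # expect only the deliberately unmapped value
```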
@@ -13,6 +13,7 @@ When an enum is converted to a class hierarchy, the original enum file is:

 | File | Archived Date | Replaced By | Rationale |
 |------|--------------|-------------|-----------|
+| `ArchiveTypeEnum.yaml.archived_20250105` | 2025-01-05 | 96 archive class files (e.g., `AcademicArchive.yaml`, `MunicipalArchive.yaml`) | 144 enum values replaced by a class hierarchy with the dual-class pattern (custodian type + rico:RecordSetType), rich ontology mappings (Schema.org, RiC-O, CIDOC-CRM, Wikidata), and multilingual labels. The enum contained non-archive types (BRANCH, DEPARTMENT, ORGANIZATION) that did not belong. |
 | `StaffRoleTypeEnum.yaml.archived_20251206` | 2025-12-06 | `StaffRole.yaml`, `StaffRoles.yaml` | Enum promoted to a class hierarchy to capture the formal-title vs. de facto work distinction and to enable rich properties (role_category, common_variants, typical_domains) |

 ## See Also
@@ -1,6 +1,6 @@
 # Enum Instances Index
 # Generated: 2025-11-30
-# Updated: 2025-12-06 (Session 4 - WebPortalTypeEnum migrated to class hierarchy)
+# Updated: 2026-01-05 (Session - CustodianPrimaryTypeEnum migrated to CustodianType class hierarchy)
 #
 # This file provides a manifest of all enum instance files
 # for programmatic loading by the frontend, RDF generators, and UML tools.
@@ -12,29 +12,31 @@ description: |
   Each enum value is represented as a rich instance with extended metadata,
   ontology mappings, Wikidata links, and documentation for enrichment.

-version: "1.8.0"
+version: "1.9.0"
 generated: "2025-11-30T00:00:00Z"
-last_updated: "2025-12-06T00:00:00Z"
+last_updated: "2026-01-05T00:00:00Z"

 # Statistics
 statistics:
-  total_enums: 29  # Was 30, SocialMediaPlatformTypeEnum migrated to class hierarchy
-  completed_instances: 24  # Was 25, one more enum migrated to class hierarchy
-  total_values_elaborated: 623+  # Was 650+, minus 27 SocialMediaPlatformTypeEnum values
-  total_values_estimated: 650+
+  total_enums: 28  # Was 29, CustodianPrimaryTypeEnum migrated to CustodianType class hierarchy
+  completed_instances: 23  # Was 24, one more enum migrated to class hierarchy
+  total_values_elaborated: 604+  # Was 623+, minus 19 CustodianPrimaryTypeEnum values
+  total_values_estimated: 630+
   with_wikidata_mapping: 93%
   with_ontology_mapping: 96%

 # Completed Enum Instance Files
 completed:
   # === Original 8 (Session 1) ===
-  - id: 1
-    name: Custodian Type Classification
-    file: custodian_primary_type.yaml
-    enum: CustodianPrimaryTypeEnum
-    count: 19
-    status: completed
-    description: "GLAMORCUBESFIXPHDNT taxonomy - top-level heritage custodian categories"
+  # ID 1 (CustodianPrimaryTypeEnum) - MIGRATED to CustodianType class hierarchy
+  # See: modules/classes/CustodianType.yaml and 19 specialized subclasses
+  # Archived: archive/enums/CustodianPrimaryTypeEnum.yaml.archived_20260105
+  # Instance archived: instances/enums/archive/custodian_primary_type.yaml.archived_20260105
+  # Migration date: 2026-01-05
+  # Rationale: Enum promoted to class hierarchy per Rule 9 (Enum-to-Class Promotion).
+  #   CustodianType subclasses support rich properties (wikidata_entity,
+  #   custodian_type_broader, etc.) and inheritance. Single Source of Truth principle.

   - id: 2
     name: Organizational Change Events
|
||||
|
File diff suppressed because it is too large
@@ -19,19 +19,7 @@ classes:
   AcademicArchive:
     is_a: ArchiveOrganizationType
     class_uri: schema:ArchiveOrganization
-    description: |
-      Archive of a higher education institution (university, college, polytechnic).
-
-      **Dual-Class Pattern**:
-      This class represents the CUSTODIAN type (the archive organization).
-      For the collection type, see `AcademicArchiveRecordSetType` which maps to `rico:RecordSetType`.
-
-      **Holdings** (linked via rico:isOrWasHolderOf):
-      Academic archives typically hold records classified under these RecordSetTypes:
-      - UniversityAdministrativeFonds - Governance, committee, policy records
-      - StudentRecordSeries - Enrollment, transcripts, graduation records
-      - FacultyPaperCollection - Personal papers of faculty members
-      - CampusDocumentationCollection - Photos, publications, ephemera
+    description: Archive of a higher education institution (university, college, polytechnic).
     slots:
       - custodian_types
       - custodian_types_rationale
@@ -73,15 +61,7 @@ classes:
       - campus life documentation
     slot_usage:
       holds_record_set_types:
-        description: |
-          Links this custodian type to the record set types it typically holds.
-          Uses RiC-O property rico:isOrWasHolderOf to express custodial relationship.
-
-          **Academic Archive Holdings**:
-          - UniversityAdministrativeFonds - Governance, committee, policy records
-          - StudentRecordSeries - Enrollment, transcripts, graduation records
-          - FacultyPaperCollection - Personal papers of faculty members
-          - CampusDocumentationCollection - Photos, publications, ephemera
+        description: Record set types typically held by academic archives.
         equals_expression: |
           ["hc:UniversityAdministrativeFonds", "hc:StudentRecordSeries", "hc:FacultyPaperCollection", "hc:CampusDocumentationCollection"]
       wikidata_entity:
@@ -95,9 +75,9 @@ classes:
           Typically 'university', 'college', or 'institutional'.
           Reflects the educational institution's administrative scope.
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: AcademicArchive is an archival institution - maps to ARCHIVE
+        equals_string: AcademicArchive is an archival institution - maps to ArchiveOrganizationType
           (A)
       wikidata_alignment:
         range: WikidataAlignment
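The recurring `equals_expression` change in these hunks swaps opaque single-letter codes for resolvable class CURIEs. A minimal sketch of the before/after pattern (abridged from the AcademicArchive entry; the surrounding structure is illustrative, not the full class definition):

```yaml
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType
    slot_usage:
      custodian_types:
        # before: equals_expression: '["A"]'  (opaque letter code)
        # after: a CURIE resolvable in the hc: namespace
        equals_expression: '["hc:ArchiveOrganizationType"]'
```

The CURIE form lets the RDF generators emit a real class reference instead of a string that only humans can decode.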
@@ -140,13 +120,7 @@ classes:
       - wd:Q1065413
       - AcademicArchiveRecordSetType
   AcademicArchiveRecordSetType:
-    description: "A rico:RecordSetType for classifying collections of academic and\
-      \ higher \neducation institutional records within heritage institutions.\n\n\
-      **Dual-Class Pattern**:\nThis class represents the COLLECTION type (rico:RecordSetType).\n\
-      For the custodian organization type, see `AcademicArchive`.\n\n**Scope**:\n\
-      Used to classify record sets that contain academic institutional materials:\n\
-      - University administrative fonds\n- Student record series\n- Faculty paper\
-      \ collections\n- Campus documentation collections\n"
+    description: A rico:RecordSetType for classifying collections of academic and higher education institutional records.
     is_a: CollectionType
     class_uri: rico:RecordSetType
     slots:
@@ -166,13 +140,8 @@ classes:
         Structured scope definitions for AcademicArchiveRecordSetType.
         Formally documents what types of record sets are classified under this type.
     comments:
-      - "**Subclasses (concrete RecordSetTypes)**:\n\nThis abstract type has four concrete\
-        \ subclasses defined in \nAcademicArchiveRecordSetTypes.yaml:\n\n1. UniversityAdministrativeFonds\
-        \ - Governance, committee, policy records\n2. StudentRecordSeries - Enrollment,\
-        \ transcripts, graduation records\n3. FacultyPaperCollection - Personal papers\
-        \ of faculty members\n4. CampusDocumentationCollection - Photos, publications,\
-        \ ephemera\n\nEach subclass maps to rico:RecordSetType with appropriate broad_mappings\
-        \ \nto RiC-O organizational concepts (rico:Fonds, rico:Series, rico:Collection).\n"
+      - Collection type class for academic/higher education record sets
+      - Part of dual-class pattern with AcademicArchive (custodian type)
     structured_aliases:
       - literal_form: Hochschularchivbestand
         in_language: de
@@ -186,7 +155,7 @@ classes:
       wikidata_equivalent:
        equals_string: Q27032435
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: AcademicArchiveRecordSetType classifies collections held by
           ARCHIVE (A) type custodians

@@ -72,7 +72,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: University administrative fonds are held by ARCHIVE (A) type
           custodians
@@ -97,7 +97,7 @@ classes:
           "strategic planning", "accreditation records"]'
         scope_excludes:
           equals_string: '["student records", "faculty papers", "research data"]'
-  StudentRecordSeries:
+  AcademicStudentRecordSeries:
     is_a: AcademicArchiveRecordSetType
     class_uri: rico:RecordSetType
     description: "A rico:RecordSetType for student records organized as archival series.\n\
@@ -155,7 +155,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: Student record series are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -165,7 +165,7 @@ classes:
         range: TemplateSpecificityScores
         inlined: true
       rico_record_set_type:
-        equals_string: StudentRecordSeries
+        equals_string: AcademicStudentRecordSeries
       rico_organizational_principle:
         equals_string: series
       rico_organizational_principle_uri:
@@ -245,7 +245,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "L"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType"]'
       custodian_types_rationale:
         equals_string: Faculty papers may be held by ARCHIVE (A) or LIBRARY (L) special
           collections
@@ -334,7 +334,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "L", "M"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType", "hc:MuseumType"]'
       custodian_types_rationale:
         equals_string: Campus documentation may be held by ARCHIVE (A), LIBRARY (L),
           or MUSEUM (M) depending on material type

@@ -87,7 +87,7 @@ classes:
       wikidata_equivalent:
         equals_string: Q60658673
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: AdvertisingRadioArchive is an archival institution - maps to
           ARCHIVE (A)

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: RadioAdvertisementCollection records are held by ARCHIVE (A)
           type custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CampaignDocumentationSeries records are held by ARCHIVE (A)
           type custodians

@@ -100,9 +100,9 @@ classes:
       wikidata_equivalent:
         equals_string: Q18574935
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: AnimalSoundArchive is an archival institution - maps to ARCHIVE
+        equals_string: AnimalSoundArchive is an archival institution - maps to ArchiveOrganizationType
           (A)
       wikidata_alignment:
         range: WikidataAlignment

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: BioacousticRecordingCollection records are held by ARCHIVE
           (A) type custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: FieldRecordingSeries records are held by ARCHIVE (A) type custodians
       specificity_annotation:

@@ -49,10 +49,10 @@ classes:
         Typically includes: architectural drawings, blueprints, building plans,
         models, photographs, specifications, correspondence, competition entries.
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchitecturalArchive is a specialized archive type for architectural
-          documentation - maps to ARCHIVE type (A)
+          documentation - maps to ArchiveOrganizationType type (A)
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchitecturalDrawingCollection records are held by ARCHIVE
           (A) type custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchitectPapersCollection records are held by ARCHIVE (A) type
           custodians
@@ -123,7 +123,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: BuildingProjectFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:

@@ -48,7 +48,7 @@ classes:
           All ArchivalLibrary instances MUST be linked to a parent archive.
         required: true
       custodian_types:
-        equals_expression: '["A", "L"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType"]'
       custodian_types_rationale:
         equals_string: Archival library is an OrganizationBranch combining archive
           (A) and library (L) functions.

@@ -54,7 +54,7 @@ classes:
           Advocacy, public programming, and engagement activities.
           Key focus for archive associations as support organizations.
       custodian_types:
-        equals_expression: '["A", "S"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HeritageSocietyType"]'
       custodian_types_rationale:
         equals_string: Archive association combines archive (A) and society/association
           (S).

@@ -47,8 +47,6 @@ classes:
     slots:
-      - custodian_types
-      - custodian_types_rationale
       - encompassing_body_link
       - member_archives
       - specificity_annotation
       - template_specificity
     slot_usage:
@@ -65,9 +63,9 @@ classes:
         minimum_cardinality: 1
         maximum_cardinality: 1
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: ArchiveNetwork is an archival institution - maps to ARCHIVE
+        equals_string: ArchiveNetwork is an archival institution - maps to ArchiveOrganizationType
           (A)
       specificity_annotation:
         range: SpecificityAnnotation

@@ -61,7 +61,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchiveOfInternationalOrganizationRecordSetType classifies
           collections held by ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: InternationalOrgFonds records are held by ARCHIVE (A) type
           custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: TreatyCollection records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -122,7 +122,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ConferenceRecordSeries records are held by ARCHIVE (A) type
           custodians

@@ -97,7 +97,7 @@ classes:
         range: ArchiveOrganizationType
         required: false
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchiveOrganizationType is specific to archives - institutions
           preserving original records and historical documents

@@ -86,7 +86,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchivesForBuildingRecordsRecordSetType classifies collections
           held by ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: BuildingPermitSeries records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -79,7 +79,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ConstructionDocumentCollection records are held by ARCHIVE
           (A) type custodians

@@ -80,7 +80,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArchivesRegionalesRecordSetType classifies collections held
           by ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: RegionalAdministrationFonds records are held by ARCHIVE (A)
           type custodians

@@ -87,7 +87,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArtArchiveRecordSetType classifies collections held by ARCHIVE
           (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ArtistPapersCollection records are held by ARCHIVE (A) type
           custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: GalleryRecordsFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -122,7 +122,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ExhibitionDocumentationCollection records are held by ARCHIVE
           (A) type custodians

@@ -87,7 +87,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: AudiovisualArchiveRecordSetType classifies collections held
           by ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: AudiovisualRecordingCollection records are held by ARCHIVE
           (A) type custodians
@@ -58,7 +58,7 @@ classes:
       rico_has_or_had_holder_note:
         equals_string: This RecordSetType is typically held by AudiovisualArchive
           custodians. Inverse of rico:isOrWasHolderOf.
-  MediaProductionFonds:
+  AudiovisualProductionFonds:
     is_a: AudiovisualArchiveRecordSetType
     class_uri: rico:RecordSetType
     description: "A rico:RecordSetType for Media production records.\n\n**RiC-O Alignment**:\n\
@@ -80,9 +80,9 @@ classes:
       - template_specificity
     slot_usage:
      custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: MediaProductionFonds records are held by ARCHIVE (A) type custodians
+        equals_string: AudiovisualProductionFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true
@@ -90,7 +90,7 @@ classes:
         range: TemplateSpecificityScores
         inlined: true
       rico_record_set_type:
-        equals_string: MediaProductionFonds
+        equals_string: AudiovisualProductionFonds
       rico_organizational_principle:
         equals_string: fonds
       rico_organizational_principle_uri:

@@ -57,28 +57,10 @@ classes:
       - Deutsche Bank Historical Archive
       - Rothschild Archive (London)
       - Archives historiques de la Société Générale
-
-      **Dual-Class Pattern**:
-      This class represents the CUSTODIAN type (the archive organization).
-      For the collection type, see `BankRecordSetType` (rico:RecordSetType).
-
-      **Ontological Alignment**:
-      - **SKOS**: skos:Concept with skos:broader Q166118 (archive)
-      - **Schema.org**: schema:ArchiveOrganization
-      - **RiC-O**: rico:CorporateBody (as agent)
-
-      **Multilingual Labels**:
-      - de: Bankarchiv
-      - es: archivo bancario
-      - fr: archives bancaires
+    slot_usage: null
   BankArchiveRecordSetType:
     description: |
       A rico:RecordSetType for classifying collections held by BankArchive custodians.
 
       **Dual-Class Pattern**:
       This class represents the COLLECTION type (rico:RecordSetType).
       For the custodian organization type, see `BankArchive`.
     is_a: CollectionType
     class_uri: rico:RecordSetType
     slots:
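The dual-class pattern referenced throughout these files can be reduced to two paired class declarations. A hypothetical, abridged sketch (slot lists and ontology mappings omitted; only the pairing is shown):

```yaml
classes:
  BankArchive:                      # custodian side: the holding organization
    is_a: ArchiveOrganizationType
    class_uri: schema:ArchiveOrganization
  BankArchiveRecordSetType:         # collection side: what the custodian holds
    is_a: CollectionType
    class_uri: rico:RecordSetType
    see_also:
      - BankArchive                 # back-link pairing the two halves
```

Keeping the two halves as separate classes lets the custodian map to an agent concept (schema:ArchiveOrganization) while the collection maps to rico:RecordSetType, without conflating the two URIs.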
@@ -93,7 +75,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: BankArchiveRecordSetType classifies collections held by ARCHIVE
           (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: BankingRecordsFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -79,7 +79,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: FinancialTransactionSeries records are held by ARCHIVE (A)
           type custodians
@@ -122,7 +122,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CustomerAccountSeries records are held by ARCHIVE (A) type
           custodians

@@ -60,9 +60,9 @@ classes:
       - de: Bildstelle
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: Bildstelle is an archival institution - maps to ARCHIVE (A)
+        equals_string: Bildstelle is an archival institution - maps to ArchiveOrganizationType (A)
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true

@@ -416,7 +416,7 @@ classes:
         range: string
         required: false
       custodian_types:
-        equals_expression: '["B"]'
+        equals_expression: '["hc:BioCustodianType"]'
       custodian_types_rationale:
         equals_string: BioCustodianType is specific to botanical gardens, zoos, aquariums
           - institutions with living collections

@@ -487,7 +487,7 @@ classes:
           - value: MW123456
           - value: MN987654
       custodian_types:
-        equals_expression: '["B", "M", "R"]'
+        equals_expression: '["hc:BioCustodianType", "hc:MuseumType", "hc:ResearchOrganizationType"]'
       custodian_types_rationale:
         equals_string: |
           BiologicalObject is primarily relevant to:

@@ -98,7 +98,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CantonalArchiveRecordSetType classifies collections held by
           ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CantonalGovernmentFonds records are held by ARCHIVE (A) type
           custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CantonalLegislationCollection records are held by ARCHIVE (A)
           type custodians

@@ -75,7 +75,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["M"]'
+        equals_expression: '["hc:MuseumType"]'
       custodian_types_rationale:
         equals_string: Cast collection is a museum collection type (M).
       specificity_annotation:

@@ -86,7 +86,7 @@ classes:
       - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CathedralArchiveRecordSetType classifies collections held by
           ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ChapterRecordsFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -79,7 +79,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: LiturgicalDocumentCollection records are held by ARCHIVE (A)
           type custodians
@@ -122,7 +122,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: FabricRecordsSeries records are held by ARCHIVE (A) type custodians
       specificity_annotation:

@@ -105,37 +105,3 @@ classes:
       - PastoralCorrespondenceCollection
       - ChurchPropertyFonds
       - CongregationalLifeCollection
-  ChurchArchiveRecordSetType:
-    description: |
-      A rico:RecordSetType for classifying collections held by ChurchArchive custodians.
-
-      **Dual-Class Pattern**:
-      This class represents the COLLECTION type (rico:RecordSetType).
-      For the custodian organization type, see `ChurchArchive`.
-    is_a: CollectionType
-    class_uri: rico:RecordSetType
-    slots:
-      - custodian_types
-      - custodian_types_rationale
-      - dual_class_link
-      - specificity_annotation
-      - template_specificity
-      - type_scope
-    see_also:
-      - ChurchArchive
-      - rico:RecordSetType
-    slot_usage:
-      custodian_types:
-        equals_expression: '["A"]'
-      custodian_types_rationale:
-        equals_string: ChurchArchiveRecordSetType classifies collections held by ARCHIVE
-          (A) type custodians
-      dual_class_link:
-        range: DualClassLink
-        inlined: true
-      specificity_annotation:
-        range: SpecificityAnnotation
-        inlined: true
-      template_specificity:
-        range: TemplateSpecificityScores
-        inlined: true

@@ -49,7 +49,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "H"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
       custodian_types_rationale:
         equals_string: Church archive record set types are held by ARCHIVE (A) or
           HOLY_SITES (H) type custodians
@@ -121,7 +121,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "H"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
       custodian_types_rationale:
         equals_string: Church governance fonds are held by ARCHIVE (A) or HOLY_SITES
           (H) type custodians
@@ -199,12 +199,12 @@ classes:
       - wd:Q185583
     close_mappings:
       - skos:Concept
-      - CivilRegistrySeries
     see_also:
       - ChurchArchiveRecordSetType
       - rico:RecordSetType
       - rico-rst:Series
       - ParishArchive
+      - CivilRegistrySeries
     annotations:
       genealogy_note: Primary source for genealogical research, especially pre-civil
         registration periods. Many digitized and indexed by organizations like FamilySearch, Alle
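The move shown in this hunk illustrates the rule applied across the RecordSetTypes files: `close_mappings` is reserved for external ontology terms, while schema-internal class names belong in `see_also`. A sketch of the corrected shape (abridged; entry names taken from this hunk, surrounding slots omitted):

```yaml
ParishRegisterSeries:
  close_mappings:
    - skos:Concept          # external vocabulary term: stays here
  see_also:
    - rico-rst:Series       # RiC-O named individual
    - CivilRegistrySeries   # internal class, moved out of close_mappings
```

Keeping internal class names out of `close_mappings` avoids the RDF generators emitting SKOS mapping triples between terms that are not independent vocabularies.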
@@ -219,7 +219,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "H"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
       custodian_types_rationale:
         equals_string: Parish register series are held by ARCHIVE (A) or HOLY_SITES
           (H), often transferred to regional archives
@@ -294,11 +294,11 @@ classes:
       - wd:Q22075301
     close_mappings:
       - skos:Concept
-      - FacultyPaperCollection
     see_also:
       - ChurchArchiveRecordSetType
       - rico:RecordSetType
       - rico-rst:Fonds
+      - FacultyPaperCollection
     slots:
       - custodian_types
       - custodian_types_rationale
@@ -306,7 +306,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "H", "L"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType", "hc:LibraryType"]'
       custodian_types_rationale:
         equals_string: Pastoral correspondence may be held by ARCHIVE (A), HOLY_SITES
           (H), or LIBRARY (L) special collections
@@ -396,7 +396,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "H"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType"]'
       custodian_types_rationale:
         equals_string: Church property fonds are held by ARCHIVE (A) or HOLY_SITES
           (H) type custodians
@@ -477,11 +477,11 @@ classes:
     close_mappings:
       - skos:Concept
       - schema:Collection
-      - CampusDocumentationCollection
     see_also:
       - ChurchArchiveRecordSetType
       - rico:RecordSetType
       - rico-rst:Collection
+      - CampusDocumentationCollection
     annotations:
       collection_nature_note: Often includes artificial/assembled collections. Materials
         reflect the lived religious experience of the community beyond formal administration.
@@ -492,7 +492,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "H", "S"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:HolySacredSiteType", "hc:HeritageSocietyType"]'
       custodian_types_rationale:
         equals_string: Congregational life collections may be held by ARCHIVE (A),
           HOLY_SITES (H), or COLLECTING_SOCIETY (S)

@@ -92,7 +92,7 @@ classes:
     - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ChurchArchiveSwedenRecordSetType classifies collections held
           by ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: SwedishParishRecordSeries records are held by ARCHIVE (A) type
           custodians
@@ -58,12 +58,13 @@ classes:
       rico_has_or_had_holder_note:
         equals_string: This RecordSetType is typically held by ChurchArchiveSweden
           custodians. Inverse of rico:isOrWasHolderOf.
-  ChurchPropertyFonds:
+  SwedishChurchPropertyFonds:
     is_a: ChurchArchiveSwedenRecordSetType
     class_uri: rico:RecordSetType
-    description: "A rico:RecordSetType for Church property records.\n\n**RiC-O Alignment**:\n\
+    description: "A rico:RecordSetType for Swedish Church property records.\n\n**RiC-O Alignment**:\n\
       This class is a specialized rico:RecordSetType following the fonds \norganizational\
-      \ principle as defined by rico-rst:Fonds.\n"
+      \ principle as defined by rico-rst:Fonds.\n\n**Note**: This is a Swedish-specific\
+      \ variant. For the general church property fonds type, see ChurchPropertyFonds.\n"
     exact_mappings:
     - rico:RecordSetType
     related_mappings:
@@ -73,6 +74,7 @@ classes:
     see_also:
     - ChurchArchiveSwedenRecordSetType
     - rico:RecordSetType
+    - ChurchPropertyFonds
     slots:
     - custodian_types
    - custodian_types_rationale
@@ -80,9 +82,9 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: ChurchPropertyFonds records are held by ARCHIVE (A) type custodians
+        equals_string: SwedishChurchPropertyFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true
@@ -90,7 +92,7 @@ classes:
         range: TemplateSpecificityScores
         inlined: true
       rico_record_set_type:
-        equals_string: ChurchPropertyFonds
+        equals_string: SwedishChurchPropertyFonds
       rico_organizational_principle:
         equals_string: fonds
       rico_organizational_principle_uri:

@@ -64,9 +64,9 @@ classes:
     - fr: cinémathèque
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: Cinematheque is an archival institution - maps to ARCHIVE (A)
+        equals_string: Cinematheque is an archival institution - maps to ArchiveOrganizationType (A)
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true

@@ -90,7 +90,7 @@ classes:
     - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ClimateArchiveRecordSetType classifies collections held by
           ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ClimateDataCollection records are held by ARCHIVE (A) type
           custodians
@@ -80,7 +80,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: MeteorologicalObservationSeries records are held by ARCHIVE
           (A) type custodians

@@ -69,9 +69,9 @@ classes:
       - **RiC-O**: rico:CorporateBody (as agent)
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: CollectingArchives is an archival institution - maps to ARCHIVE
+        equals_string: CollectingArchives is an archival institution - maps to ArchiveOrganizationType
           (A)
       specificity_annotation:
         range: SpecificityAnnotation
@@ -100,7 +100,7 @@ classes:
     - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CollectingArchivesRecordSetType classifies collections held
           by ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CollectedMaterialsFonds records are held by ARCHIVE (A) type
           custodians
@@ -80,7 +80,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: DonatedPapersCollection records are held by ARCHIVE (A) type
           custodians

@@ -24,16 +24,33 @@ imports:
   - ./FindingAid
+  - ./ExhibitedObject
+  - ./CurationActivity
+  - ../slots/access_policy_ref
+  - ../slots/acquisition_date
+  - ../slots/acquisition_method
+  - ../slots/acquisition_source
+  - ../slots/arrangement
+  - ../slots/class_metadata_slots
+  - ../slots/collection_description
+  - ../slots/collection_id
+  - ../slots/collection_name
+  - ../slots/collection_type_ref
+  - ../slots/curation_activities
+  - ../slots/custodial_history
+  - ../slots/digital_surrogate_url
+  - ../slots/digitization_status
+  - ../slots/extent
+  - ../slots/extent_items
+  - ../slots/finding_aids
+  - ../slots/items
+  - ../slots/parent_collection
+  - ../slots/part_of_custodian_collection
+  - ../slots/provenance_statement
+  - ../slots/rico_record_set_type
+  - ../slots/sub_collections
+  - ../slots/subject_areas
+  - ../slots/temporal_coverage
+  - ../slots/valid_from
+  - ../slots/valid_to
-  - ../slots/collection_name
-  - ../slots/collection_description
-  - ../slots/extent
-  - ../slots/temporal_coverage
-  - ../slots/digitization_status
-  - ../slots/acquisition_method
-  - ../slots/acquisition_date
-  - ../slots/class_metadata_slots
 classes:
   Collection:
     class_uri: rico:RecordSet
@@ -616,7 +633,7 @@ classes:
           Date when this collection ended at current custodian (if transferred/deaccessioned).
         range: date
       custodian_types:
-        equals_expression: '["G", "L", "A", "M", "B", "H"]'
+        equals_expression: '["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType", "hc:MuseumType", "hc:BioCustodianType", "hc:HolySacredSiteType"]'
       custodian_types_rationale:
         equals_string: 'Collection is relevant to institutions that hold catalogued
           collections: Galleries, Libraries, Archives, Museums, Botanical gardens/zoos,
@@ -681,65 +698,9 @@ classes:
         digitization_status: PARTIAL
         part_of_custodian_collection: https://nde.nl/ontology/hc/custodian-collection/nationaal-archief
         description: VOC archival fonds at Nationaal Archief
-    slots:
-      collection_id:
-        description: Unique identifier for this collection
-        range: uriorcurie
-      collection_type_ref:
-        description: Classification from CollectionType hierarchy
-        range: CollectionType
-      rico_record_set_type:
-        description: RiC-O RecordSetType vocabulary mapping
-        range: uriorcurie
-      extent_items:
-        description: Numeric item count
-        range: integer
-      subject_areas:
-        description: Thematic subjects
-        range: string
-        multivalued: true
-      provenance_statement:
-        description: Provenance narrative
-        range: string
-      custodial_history:
-        description: Chain of custody
-        range: string
-        multivalued: true
-      acquisition_source:
-        description: From whom collection was acquired
-        range: string
-      access_policy_ref:
-        description: Access policy governing collection
-        range: AccessPolicy
-      arrangement:
-        description: Intellectual arrangement system
-        range: string
-      finding_aids:
-        description: Finding aids describing this collection
-        range: FindingAid
-        multivalued: true
-        slot_uri: rico:isDescribedBy
-      digital_surrogate_url:
-        description: URL to digital surrogate
-        range: uri
-        multivalued: true
-      parent_collection:
-        description: Parent collection (hierarchical)
-        range: Collection
-      sub_collections:
-        description: Child collections (hierarchical)
-        range: Collection
-        multivalued: true
-      items:
-        description: Individual ExhibitedObject items within this collection
-        range: ExhibitedObject
-        multivalued: true
-        slot_uri: rico:hasOrHadConstituent
-      curation_activities:
-        description: Ongoing curation activities performed on this collection
-        range: CurationActivity
-        multivalued: true
-        slot_uri: crm:P147i_was_curated_by
-      part_of_custodian_collection:
-        description: Link to abstract CustodianCollection
-        range: CustodianCollection
+
+    # NOTE: All slots are defined in centralized modules/slots/ files
+    # Slots used by this class: collection_id, collection_type_ref, rico_record_set_type,
+    # extent_items, subject_areas, provenance_statement, custodial_history, acquisition_source,
+    # access_policy_ref, arrangement, finding_aids, digital_surrogate_url, parent_collection,
+    # sub_collections, items, curation_activities, part_of_custodian_collection

@@ -89,7 +89,7 @@ classes:
     - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ComarcalArchiveRecordSetType classifies collections held by
           ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ComarcalAdministrationFonds records are held by ARCHIVE (A)
           type custodians
@@ -58,7 +58,7 @@ classes:
       rico_has_or_had_holder_note:
         equals_string: This RecordSetType is typically held by ComarcalArchive custodians.
           Inverse of rico:isOrWasHolderOf.
-  LocalHistoryCollection:
+  ComarcalHistoryCollection:
     is_a: ComarcalArchiveRecordSetType
     class_uri: rico:RecordSetType
     description: "A rico:RecordSetType for Regional historical documentation.\n\n\
@@ -80,9 +80,9 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: LocalHistoryCollection records are held by ARCHIVE (A) type
+        equals_string: ComarcalHistoryCollection records are held by ARCHIVE (A) type
           custodians
       specificity_annotation:
         range: SpecificityAnnotation
@@ -91,7 +91,7 @@ classes:
         range: TemplateSpecificityScores
         inlined: true
       rico_record_set_type:
-        equals_string: LocalHistoryCollection
+        equals_string: ComarcalHistoryCollection
       rico_organizational_principle:
         equals_string: collection
       rico_organizational_principle_uri:

@@ -368,10 +368,10 @@ classes:
         - value: Corporate events, Weddings, Conference space
           description: Company museum activities
       custodian_types:
-        equals_expression: '["C"]'
+        equals_expression: '["hc:CommercialOrganizationType"]'
       custodian_types_rationale:
         equals_string: CommercialOrganizationType represents for-profit commercial
-          heritage custodians (corporate archives, company museums) - maps to CORPORATION
+          heritage custodians (corporate archives, company museums) - maps to CommercialOrganizationType
           type (C)
       specificity_annotation:
         range: SpecificityAnnotation

@@ -96,7 +96,7 @@ classes:
     - rico:RecordSetType
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CommunityArchiveRecordSetType classifies collections held by
           ARCHIVE (A) type custodians

@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CommunityOrganizationFonds records are held by ARCHIVE (A)
           type custodians
@@ -80,7 +80,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: OralHistoryCollection records are held by ARCHIVE (A) type
           custodians
@@ -123,7 +123,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: LocalEventDocumentation records are held by ARCHIVE (A) type
           custodians

@@ -51,7 +51,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "C"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType"]'
       custodian_types_rationale:
         equals_string: Company archive record set types are held by ARCHIVE (A) or
           CORPORATION (C) type custodians
@@ -114,12 +114,12 @@ classes:
     - wd:Q1643722
     close_mappings:
     - skos:Concept
-    - CouncilGovernanceFonds
     see_also:
     - CompanyArchiveRecordSetType
     - rico:RecordSetType
     - rico-rst:Fonds
     - CompanyArchives
+    - CouncilGovernanceFonds
     slots:
     - custodian_types
     - custodian_types_rationale
@@ -127,7 +127,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "C"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType"]'
       custodian_types_rationale:
         equals_string: Corporate governance fonds are held by ARCHIVE (A) or CORPORATION
           (C) type custodians
@@ -223,7 +223,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "C", "R"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType", "hc:ResearchOrganizationType"]'
       custodian_types_rationale:
         equals_string: Product development collections may be held by ARCHIVE (A),
           CORPORATION (C), or RESEARCH_CENTER (R)
@@ -318,7 +318,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "C", "M"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType", "hc:MuseumType"]'
       custodian_types_rationale:
         equals_string: Marketing archives may be held by ARCHIVE (A), CORPORATION
           (C), or MUSEUM (M) for design/advertising collections
@@ -397,11 +397,11 @@ classes:
     - wd:Q185583
     close_mappings:
     - skos:Concept
-    - StudentRecordSeries
     see_also:
     - CompanyArchiveRecordSetType
     - rico:RecordSetType
     - rico-rst:Series
+    - StudentRecordSeries
     slots:
     - custodian_types
     - custodian_types_rationale
@@ -409,7 +409,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "C"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType"]'
       custodian_types_rationale:
         equals_string: Personnel records series are held by ARCHIVE (A) or CORPORATION
           (C) type custodians
@@ -504,7 +504,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A", "C", "L"]'
+        equals_expression: '["hc:ArchiveOrganizationType", "hc:CommercialOrganizationType", "hc:LibraryType"]'
       custodian_types_rationale:
         equals_string: Corporate publications may be held by ARCHIVE (A), CORPORATION
           (C), or LIBRARY (L)

@@ -15,8 +15,12 @@ imports:
   - ./Department
   - ./OrganizationBranch
   - ./CompanyArchiveRecordSetTypes
   - ../slots/archive_branches
   - ../slots/archive_department_of
   - ../slots/holds_record_set_types
   - ../slots/parent_corporation
   - ../slots/type_scope
   - ../slots/wikidata_entity
 
 classes:
   CompanyArchives:

@@ -414,7 +414,7 @@ classes:
         - value: Treatment coincided with preparation for 1995 exhibition
         - value: Discovery of Vermeer's signature during cleaning
       custodian_types:
-        equals_expression: '["G", "M", "A", "L", "R", "H", "B"]'
+        equals_expression: '["hc:GalleryType", "hc:MuseumType", "hc:ArchiveOrganizationType", "hc:LibraryType", "hc:ResearchOrganizationType", "hc:HolySacredSiteType", "hc:BioCustodianType"]'
       custodian_types_rationale:
         equals_string: |
           ConservationRecord is relevant to all custodian types managing physical collections:

@@ -81,9 +81,9 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: CountyRecordOffice is an archival institution - maps to ARCHIVE
+        equals_string: CountyRecordOffice is an archival institution - maps to ArchiveOrganizationType
           (A)
       specificity_annotation:
         range: SpecificityAnnotation

@@ -68,9 +68,9 @@ classes:
       - commercial
         description: General court archive covering main jurisdictions
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: CourtRecords is an archival institution - maps to ARCHIVE (A)
+        equals_string: CourtRecords is an archival institution - maps to ArchiveOrganizationType (A)
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true

@@ -88,7 +88,7 @@ classes:
         range: string
         multivalued: true
       custodian_types:
-        equals_expression: '["X"]'
+        equals_expression: '["hc:MixedCustodianType"]'
       custodian_types_rationale:
         equals_string: CulturalInstitution is a broad type encompassing multiple heritage
           categories (G, L, A, M, etc.). Maps to MIXED (X) as it spans categories.

@@ -525,7 +525,7 @@ classes:
         - value: condition-assessment
           description: SPECTRUM Condition Checking
       custodian_types:
-        equals_expression: '["G", "L", "A", "M", "R", "H", "B"]'
+        equals_expression: '["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType", "hc:MuseumType", "hc:ResearchOrganizationType", "hc:HolySacredSiteType", "hc:BioCustodianType"]'
       custodian_types_rationale:
         equals_string: |
           CurationActivity is relevant to ALL heritage custodian types that manage collections:

@@ -99,10 +99,10 @@ classes:
         multivalued: true
         required: false
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: Current Archive is an archive for active/current records -
-          maps to ARCHIVE (A)
+          maps to ArchiveOrganizationType (A)
       specificity_annotation:
         range: SpecificityAnnotation
         inlined: true
@@ -141,20 +141,7 @@ classes:
         creating_organization: Ministry of Finance
         retention_schedule: Finance Records Schedule 2023
         description: Current archive for ministry records
-    slots:
-      creating_organization:
-        description: Organization creating the records
-        range: string
-      transfer_policy:
-        description: Policy for transferring to permanent archive
-        range: string
-      has_narrower_instance:
-        slot_uri: skos:narrowerTransitive
-        description: |
-          Links archive TYPE to specific CustodianArchive INSTANCES.
-          SKOS narrowerTransitive for type-to-instance relationship.
-        range: CustodianArchive
-        multivalued: true
+
   CurrentArchiveRecordSetType:
     description: |
       A rico:RecordSetType for classifying collections held by CurrentArchive custodians.
@@ -170,23 +157,29 @@ slots:
     - CurrentArchive
     - rico:RecordSetType
     annotations:
-      custodian_types: '["A"]'
+      custodian_types: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale: CurrentArchiveRecordSetType classifies collections
-        held by ARCHIVE (A) type custodians
+        held by ArchiveOrganizationType custodians
       linked_custodian_type: CurrentArchive
       dual_class_pattern: collection_type
       specificity_score: 0.7
       specificity_rationale: Type taxonomy class.
       specificity_annotation_timestamp: '2026-01-06T00:26:29.675099Z'
       specificity_annotation_agent: opencode-claude-sonnet-4
-      template_specificity:
-        archive_search: 0.2
-        museum_search: 0.75
-        library_search: 0.75
-        collection_discovery: 0.75
-        person_research: 0.75
-        location_browse: 0.75
-        identifier_lookup: 0.75
-        organizational_change: 0.75
-        digital_platform: 0.75
-        general_heritage: 0.75
+      template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
+
+
+slots:
+  creating_organization:
+    description: Organization creating the records
+    range: string
+  transfer_policy:
+    description: Policy for transferring to permanent archive
+    range: string
+  has_narrower_instance:
+    slot_uri: skos:narrowerTransitive
+    description: |
+      Links archive TYPE to specific CustodianArchive INSTANCES.
+      SKOS narrowerTransitive for type-to-instance relationship.
+    range: CustodianArchive
+    multivalued: true

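These hunks also collapse the nested `template_specificity` mapping into a single JSON-string annotation value, presumably so the annotation stays a plain scalar. A sketch of that collapse, assuming the annotations have been parsed into a plain dict; `json.dumps` with default separators reproduces the quoted string seen in the diff:

```python
import json

def inline_template_specificity(annotations: dict) -> None:
    """Replace a nested template_specificity mapping with its JSON string form."""
    scores = annotations.get("template_specificity")
    if isinstance(scores, dict):
        # Default json.dumps separators (", " and ": ") match the diff output.
        annotations["template_specificity"] = json.dumps(scores)
```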
@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: ActiveRecordsFonds records are held by ARCHIVE (A) type custodians
       specificity_annotation:

@@ -560,7 +560,7 @@ classes:
         - value: wikidata:Q3621648
           description: Current archive / active records
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: Custodian archive is an archive type (A).
       specificity_annotation:
@@ -619,6 +619,33 @@ classes:
           schedule.
         refers_to_custodian: https://nde.nl/ontology/hc/nl-na
         description: Government records in active processing (9 years after accession)
+  CustodianArchiveRecordSetType:
+    description: |
+      A rico:RecordSetType for classifying collections held by CustodianArchive custodians.
+
+      **Dual-Class Pattern**:
+      This class represents the COLLECTION type (rico:RecordSetType).
+      For the custodian organization type, see `CustodianArchive`.
+    is_a: CollectionType
+    class_uri: rico:RecordSetType
+    slots:
+    - type_scope
+    see_also:
+    - CustodianArchive
+    - rico:RecordSetType
+    annotations:
+      custodian_types: '["hc:ArchiveOrganizationType"]'
+      custodian_types_rationale: CustodianArchiveRecordSetType classifies collections
+        held by ArchiveOrganizationType custodians
+      linked_custodian_type: CustodianArchive
+      dual_class_pattern: collection_type
+      specificity_score: 0.7
+      specificity_rationale: Type taxonomy class.
+      specificity_annotation_timestamp: '2026-01-06T00:26:29.676176Z'
+      specificity_annotation_agent: opencode-claude-sonnet-4
+      template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
 
 slots:
   archive_name:
     description: Name/title for operational archive accession
@@ -677,38 +704,3 @@ slots:
       \ for instance-to-type relationship.\nValues: CurrentArchive (Q3621648), DepositArchive\
       \ (Q244904), \nHistoricalArchive (Q3621673).\n"
     range: uriorcurie
-  CustodianArchiveRecordSetType:
-    description: |
-      A rico:RecordSetType for classifying collections held by CustodianArchive custodians.
-
-      **Dual-Class Pattern**:
-      This class represents the COLLECTION type (rico:RecordSetType).
-      For the custodian organization type, see `CustodianArchive`.
-    is_a: CollectionType
-    class_uri: rico:RecordSetType
-    slots:
-    - type_scope
-    see_also:
-    - CustodianArchive
-    - rico:RecordSetType
-    annotations:
-      custodian_types: '["A"]'
-      custodian_types_rationale: CustodianArchiveRecordSetType classifies collections
-        held by ARCHIVE (A) type custodians
-      linked_custodian_type: CustodianArchive
-      dual_class_pattern: collection_type
-      specificity_score: 0.7
-      specificity_rationale: Type taxonomy class.
-      specificity_annotation_timestamp: '2026-01-06T00:26:29.676176Z'
-      specificity_annotation_agent: opencode-claude-sonnet-4
-      template_specificity:
-        archive_search: 0.2
-        museum_search: 0.75
-        library_search: 0.75
-        collection_discovery: 0.75
-        person_research: 0.75
-        location_browse: 0.75
-        identifier_lookup: 0.75
-        organizational_change: 0.75
-        digital_platform: 0.75
-        general_heritage: 0.75

@@ -37,7 +37,7 @@ classes:
     - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: CustodialRecordsFonds records are held by ARCHIVE (A) type
           custodians

@@ -112,8 +112,7 @@ classes:
         description: Confidence in observation accuracy
         range: ConfidenceMeasure
       custodian_types:
-        equals_expression: '["G", "L", "A", "M", "O", "R", "C", "U", "B", "E", "S",
-          "F", "I", "X", "P", "H", "D", "N", "T"]'
+        equals_expression: '["hc:GalleryType", "hc:LibraryType", "hc:ArchiveOrganizationType", "hc:MuseumType", "hc:OfficialInstitutionType", "hc:ResearchOrganizationType", "hc:CommercialOrganizationType", "hc:UnspecifiedType", "hc:BioCustodianType", "hc:EducationProviderType", "hc:HeritageSocietyType", "hc:FeatureCustodianType", "hc:IntangibleHeritageGroupType", "hc:MixedCustodianType", "hc:PersonalCollectionType", "hc:HolySacredSiteType", "hc:DigitalPlatformType", "hc:NonProfitType", "hc:TasteScentHeritageType"]'
       custodian_types_rationale:
         equals_string: CustodianObservation is universal - source-based evidence can
           apply to any heritage custodian type

@@ -8,11 +8,10 @@ imports:
   - ../slots/created
   - ../slots/modified
   - ../slots/class_metadata_slots
+  - ../slots/wikidata_entity
 slots:
   type_id:
     range: uriorcurie
-  wikidata_entity:
-    range: string
   type_label:
     range: string
   type_description:

@@ -104,7 +104,7 @@ classes:
         range: AccessPolicy
         required: true
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: DarkArchive is a type of archive with restricted access - maps
           to ARCHIVE type (A)
@@ -168,17 +168,6 @@ classes:
         access_level: CLOSED
         restriction_reason: Donor restriction - 50 year embargo
         description: Embargoed materials dark archive
-    slots:
-      access_trigger_events:
-        description: Events that trigger access
-        range: string
-        multivalued: true
-      preservation_purpose:
-        description: Purpose for dark archive
-        range: string
-      refers_to_access_policy:
-        description: Link to access policy
-        range: AccessPolicy
   DarkArchiveRecordSetType:
     description: |
       A rico:RecordSetType for classifying collections held by DarkArchive custodians.
@@ -194,7 +183,7 @@ slots:
     - DarkArchive
     - rico:RecordSetType
     annotations:
-      custodian_types: '["A"]'
+      custodian_types: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale: DarkArchiveRecordSetType classifies collections held
         by ARCHIVE (A) type custodians
       linked_custodian_type: DarkArchive
@@ -203,14 +192,17 @@ slots:
       specificity_rationale: Type taxonomy class.
       specificity_annotation_timestamp: '2026-01-06T00:26:29.676643Z'
       specificity_annotation_agent: opencode-claude-sonnet-4
-      template_specificity:
-        archive_search: 0.2
-        museum_search: 0.75
-        library_search: 0.75
-        collection_discovery: 0.75
-        person_research: 0.75
-        location_browse: 0.75
-        identifier_lookup: 0.75
-        organizational_change: 0.75
-        digital_platform: 0.75
-        general_heritage: 0.75
+      template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
+
+
+slots:
+  access_trigger_events:
+    description: Events that trigger access
+    range: string
+    multivalued: true
+  preservation_purpose:
+    description: Purpose for dark archive
+    range: string
+  refers_to_access_policy:
+    description: Link to access policy
+    range: AccessPolicy

@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: PreservationCopyCollection records are held by ARCHIVE (A)
           type custodians
@@ -80,7 +80,7 @@ classes:
      - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: DigitalPreservationFonds records are held by ARCHIVE (A) type
           custodians
@@ -139,9 +139,9 @@ classes:
         minimum_cardinality: 1
         maximum_cardinality: 1
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: DepartmentalArchives is an archival institution - maps to ARCHIVE
+        equals_string: DepartmentalArchives is an archival institution - maps to ArchiveOrganizationType
           (A)
       specificity_annotation:
         range: SpecificityAnnotation
@@ -206,9 +206,9 @@ classes:
       wikidata_equivalent:
         equals_string: Q2860456
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: DepartmentalArchives is an archival institution - maps to ARCHIVE
+        equals_string: DepartmentalArchives is an archival institution - maps to ArchiveOrganizationType
           (A)
       wikidata_alignment:
         range: WikidataAlignment
@@ -37,7 +37,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: DepartmentAdministrationFonds records are held by ARCHIVE (A)
           type custodians
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: PrefectureSeries records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -117,9 +117,9 @@ classes:
       - permanent archive transfer
       - depositor return
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
-        equals_string: DepositArchive is an archival institution - maps to ARCHIVE
+        equals_string: DepositArchive is an archival institution - maps to ArchiveOrganizationType
           (A)
       specificity_annotation:
         range: SpecificityAnnotation
@@ -172,6 +172,33 @@ classes:
       - secure destruction
       - transfer to national archives
       description: Federal records center deposit archive
+  DepositArchiveRecordSetType:
+    description: |
+      A rico:RecordSetType for classifying collections held by DepositArchive custodians.
+
+      **Dual-Class Pattern**:
+      This class represents the COLLECTION type (rico:RecordSetType).
+      For the custodian organization type, see `DepositArchive`.
+    is_a: CollectionType
+    class_uri: rico:RecordSetType
+    slots:
+      - type_scope
+    see_also:
+      - DepositArchive
+      - rico:RecordSetType
+    annotations:
+      custodian_types: '["hc:ArchiveOrganizationType"]'
+      custodian_types_rationale: DepositArchiveRecordSetType classifies collections
+        held by ArchiveOrganizationType custodians
+      linked_custodian_type: DepositArchive
+      dual_class_pattern: collection_type
+      specificity_score: 0.7
+      specificity_rationale: Type taxonomy class.
+      specificity_annotation_timestamp: '2026-01-06T00:26:29.677478Z'
+      specificity_annotation_agent: opencode-claude-sonnet-4
+      template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'

 slots:
   operates_storage_types:
     description: Storage types operated by deposit archive
@@ -188,38 +215,3 @@ slots:
     description: Disposition services provided
     range: string
     multivalued: true
-  DepositArchiveRecordSetType:
-    description: |
-      A rico:RecordSetType for classifying collections held by DepositArchive custodians.
-
-      **Dual-Class Pattern**:
-      This class represents the COLLECTION type (rico:RecordSetType).
-      For the custodian organization type, see `DepositArchive`.
-    is_a: CollectionType
-    class_uri: rico:RecordSetType
-    slots:
-      - type_scope
-    see_also:
-      - DepositArchive
-      - rico:RecordSetType
-    annotations:
-      custodian_types: '["A"]'
-      custodian_types_rationale: DepositArchiveRecordSetType classifies collections
-        held by ARCHIVE (A) type custodians
-      linked_custodian_type: DepositArchive
-      dual_class_pattern: collection_type
-      specificity_score: 0.7
-      specificity_rationale: Type taxonomy class.
-      specificity_annotation_timestamp: '2026-01-06T00:26:29.677478Z'
-      specificity_annotation_agent: opencode-claude-sonnet-4
-      template_specificity:
-        archive_search: 0.2
-        museum_search: 0.75
-        library_search: 0.75
-        collection_discovery: 0.75
-        person_research: 0.75
-        location_browse: 0.75
-        identifier_lookup: 0.75
-        organizational_change: 0.75
-        digital_platform: 0.75
-        general_heritage: 0.75
|||
|
|
@ -37,7 +37,7 @@ classes:
|
|||
- template_specificity
|
||||
slot_usage:
|
||||
custodian_types:
|
||||
equals_expression: '["A"]'
|
||||
equals_expression: '["hc:ArchiveOrganizationType"]'
|
||||
custodian_types_rationale:
|
||||
equals_string: DepositedRecordsFonds records are held by ARCHIVE (A) type
|
||||
custodians
|
||||
|
|
|
|||
|
|
@ -167,10 +167,10 @@ classes:
|
|||
- JPEG2000
|
||||
- XML
|
||||
custodian_types:
|
||||
equals_expression: '["A", "D"]'
|
||||
equals_expression: '["hc:ArchiveOrganizationType", "hc:DigitalPlatformType"]'
|
||||
custodian_types_rationale:
|
||||
equals_string: DigitalArchive bridges archive and digital platform types -
|
||||
maps to ARCHIVE (A) and DIGITAL_PLATFORM (D)
|
||||
maps to ArchiveOrganizationType (A) and DIGITAL_PLATFORM (D)
|
||||
specificity_annotation:
|
||||
range: SpecificityAnnotation
|
||||
inlined: true
|
||||
|
|
@@ -220,21 +220,6 @@ classes:
       - JPEG2000
       - WARC
       description: Government digital archive with mixed content
-slots:
-  operates_platform_types:
-    description: Digital platform types operated
-    range: DigitalPlatformType
-    multivalued: true
-  content_origin:
-    description: Origin of content (born_digital, digitized, mixed)
-    range: string
-  access_interface_url:
-    description: URL of access interface
-    range: uri
-  supported_formats:
-    description: Supported file formats
-    range: string
-    multivalued: true
   DigitalArchiveRecordSetType:
     description: |
       A rico:RecordSetType for classifying collections held by DigitalArchive custodians.
@@ -250,23 +235,30 @@ slots:
       - DigitalArchive
       - rico:RecordSetType
     annotations:
-      custodian_types: '["A"]'
+      custodian_types: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale: DigitalArchiveRecordSetType classifies collections
-        held by ARCHIVE (A) type custodians
+        held by ArchiveOrganizationType custodians
       linked_custodian_type: DigitalArchive
       dual_class_pattern: collection_type
       specificity_score: 0.7
       specificity_rationale: Type taxonomy class.
       specificity_annotation_timestamp: '2026-01-06T00:26:29.677967Z'
       specificity_annotation_agent: opencode-claude-sonnet-4
-      template_specificity:
-        archive_search: 0.2
-        museum_search: 0.75
-        library_search: 0.75
-        collection_discovery: 0.75
-        person_research: 0.75
-        location_browse: 0.75
-        identifier_lookup: 0.75
-        organizational_change: 0.75
-        digital_platform: 0.75
-        general_heritage: 0.75
+      template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
+
+slots:
+  operates_platform_types:
+    description: Digital platform types operated
+    range: DigitalPlatformType
+    multivalued: true
+  content_origin:
+    description: Origin of content (born_digital, digitized, mixed)
+    range: string
+  access_interface_url:
+    description: URL of access interface
+    range: uri
+  supported_formats:
+    description: Supported file formats
+    range: string
+    multivalued: true
|||
|
|
@ -37,7 +37,7 @@ classes:
|
|||
- template_specificity
|
||||
slot_usage:
|
||||
custodian_types:
|
||||
equals_expression: '["A"]'
|
||||
equals_expression: '["hc:ArchiveOrganizationType"]'
|
||||
custodian_types_rationale:
|
||||
equals_string: DigitalObjectCollection records are held by ARCHIVE (A) type
|
||||
custodians
|
||||
|
|
@@ -80,7 +80,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: DigitizedCollection records are held by ARCHIVE (A) type custodians
       specificity_annotation:
@@ -122,7 +122,7 @@ classes:
       - template_specificity
     slot_usage:
       custodian_types:
-        equals_expression: '["A"]'
+        equals_expression: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale:
         equals_string: WebArchiveCollection records are held by ARCHIVE (A) type custodians
       specificity_annotation:
|||
|
|
@ -158,7 +158,7 @@ classes:
|
|||
examples:
|
||||
- value: 2-3 business days
|
||||
custodian_types:
|
||||
equals_expression: '["A", "D"]'
|
||||
equals_expression: '["hc:ArchiveOrganizationType", "hc:DigitalPlatformType"]'
|
||||
custodian_types_rationale:
|
||||
equals_string: Digital archives combine archive (A) and digital platform (D).
|
||||
specificity_annotation:
|
||||
|
|
@@ -209,20 +209,6 @@ classes:
       access_application_url: https://archive.example.org/apply
       typical_approval_time: 5-10 business days
       description: Dim archive with researcher access only
-slots:
-  default_access_policy:
-    description: Default access policy for dim archive
-    range: AccessPolicy
-  restriction_categories:
-    description: Categories of restrictions applied
-    range: string
-    multivalued: true
-  access_application_url:
-    description: URL for access application
-    range: uri
-  typical_approval_time:
-    description: Typical time for approval
-    range: string
   DimArchivesRecordSetType:
     description: |
       A rico:RecordSetType for classifying collections held by DimArchives custodians.
@@ -238,7 +224,7 @@ slots:
       - DimArchives
       - rico:RecordSetType
     annotations:
-      custodian_types: '["A"]'
+      custodian_types: '["hc:ArchiveOrganizationType"]'
       custodian_types_rationale: DimArchivesRecordSetType classifies collections held
         by ARCHIVE (A) type custodians
       linked_custodian_type: DimArchives
@@ -247,14 +233,20 @@ slots:
       specificity_rationale: Type taxonomy class.
       specificity_annotation_timestamp: '2026-01-06T00:26:29.678263Z'
       specificity_annotation_agent: opencode-claude-sonnet-4
-      template_specificity:
-        archive_search: 0.2
-        museum_search: 0.75
-        library_search: 0.75
-        collection_discovery: 0.75
-        person_research: 0.75
-        location_browse: 0.75
-        identifier_lookup: 0.75
-        organizational_change: 0.75
-        digital_platform: 0.75
-        general_heritage: 0.75
+      template_specificity: '{"archive_search": 0.2, "museum_search": 0.75, "library_search": 0.75, "collection_discovery": 0.75, "person_research": 0.75, "location_browse": 0.75, "identifier_lookup": 0.75, "organizational_change": 0.75, "digital_platform": 0.75, "general_heritage": 0.75}'
+
+slots:
+  default_access_policy:
+    description: Default access policy for dim archive
+    range: AccessPolicy
+  restriction_categories:
+    description: Categories of restrictions applied
+    range: string
+    multivalued: true
+  access_application_url:
+    description: URL for access application
+    range: uri
+  typical_approval_time:
+    description: Typical time for approval
+    range: string
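The recurring substitution across these hunks — single-letter custodian codes rewritten to `hc:` CURIEs inside quoted-JSON `equals_expression` values — can be sketched as a small migration helper. The mapping below is inferred from the removed/added pairs visible in this diff (`"A"` → `"hc:ArchiveOrganizationType"`, `"D"` → `"hc:DigitalPlatformType"`); the function name is hypothetical and not part of the repository's tooling.

```python
import json

# Mapping inferred from the diff; other letter codes would need entries too.
CODE_TO_CURIE = {
    "A": "hc:ArchiveOrganizationType",
    "D": "hc:DigitalPlatformType",
}

def migrate_equals_expression(value: str) -> str:
    """Rewrite a quoted-JSON list like '["A", "D"]' into its hc: CURIE form,
    leaving already-migrated or unknown entries untouched."""
    codes = json.loads(value)
    return json.dumps([CODE_TO_CURIE.get(c, c) for c in codes])

print(migrate_equals_expression('["A", "D"]'))
# → ["hc:ArchiveOrganizationType", "hc:DigitalPlatformType"]
```

Running such a script over every `equals_expression` and `custodian_types` annotation is one way to keep the 1,151 changed files consistent, since the same pair appears verbatim in each hunk.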
Some files were not shown because too many files have changed in this diff.