Add new slots for heritage custodian ontology

- Introduced `has_api_version`, `has_appellation_language`, `has_appellation_type`, `has_appellation_value`, `has_applicable_country`, `has_application_deadline`, `has_application_opening_date`, `has_appraisal_note`, `has_approval_date`, `has_archdiocese_name`, `has_architectural_style`, `has_archival_reference`, `has_archive_description`, `has_archive_memento_uri`, `has_archive_name`, `has_archive_path`, `has_archive_search_score`, `has_arrangement`, `has_arrangement_level`, `has_arrangement_note`, `has_articles_archival_stage`, `has_articles_document_format`, `has_articles_document_url`, `has_articles_of_association`, `has_or_had_altitude`, `has_or_had_annotation`, `has_or_had_arrangement`, `has_or_had_document`, `has_or_had_reason`, `has_or_had_style`, `is_or_was_amended_through`, `is_or_was_approved_on`, `is_or_was_archived_as`, `is_or_was_due_on`, `is_or_was_opened_on`, and `is_or_was_used_in` slots.
- Each slot includes detailed descriptions, range specifications, and appropriate mappings to existing ontologies.
kempersc 2026-01-27 10:07:16 +01:00
parent 140ef25b96
commit 80eb3d969c
82 changed files with 3786 additions and 791 deletions


@ -0,0 +1,17 @@
# Rule: Do Not Delete Entries from slot_fixes.yaml
**CRITICAL**: Entries in `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` MUST NEVER be deleted.
This file serves as a persistent audit log and migration tracking registry.
**Protocol**:
1. **Process** the migration specified in the `revision` section.
2. **Update** the `processed` section:
* Set `status: true`.
* Add a `notes` field describing the action taken (e.g., "Migrated to has_or_had_name + PersonName class. Slot archived.").
* Add a `date` field (YYYY-MM-DD).
3. **Keep** the original entry intact.
**Forbidden**:
* ❌ Deleting a processed block.
* ❌ Removing an entry because the slot file doesn't exist (mark as processed with note "Slot file not found, skipped").
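
The protocol above can be sketched in Python. This is a minimal illustration, not the project's tooling: it assumes each entry is a mapping with `revision` and `processed` keys as described in the rule, and the `original_slot_id` key and the sample values are hypothetical.

```python
from copy import deepcopy
from datetime import date
from typing import Optional


def mark_processed(entry: dict, notes: str, on: Optional[date] = None) -> dict:
    """Return a copy of a slot_fixes entry with its `processed` block filled in.

    Nothing is deleted: every other key (including `revision`) is kept as-is.
    """
    updated = deepcopy(entry)
    processed = dict(updated.get("processed") or {})
    processed["status"] = True
    processed["notes"] = notes
    processed["date"] = (on or date.today()).isoformat()  # YYYY-MM-DD
    updated["processed"] = processed
    return updated


# Hypothetical entry; only `revision` and `processed` are named in the rule.
entry = {
    "original_slot_id": "has_example_slot",
    "revision": [{"label": "has_or_had_example", "type": "slot"}],
    "processed": {"status": False},
}
done = mark_processed(entry, "Slot file not found, skipped", on=date(2026, 1, 27))
```

Returning a copy rather than mutating in place keeps the original entry available for comparison before the file is written back.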


@ -0,0 +1,32 @@
# Rule: Preserve Bespoke Slots Until Refactoring
**Identifier**: `preserve-bespoke-slots-until-refactoring`
**Severity**: **CRITICAL**
## Core Directive
**DO NOT remove or migrate "additional" bespoke slots during generic migration passes unless they are the specific target of the current task.**
## Context
When migrating a specific slot (e.g., `has_approval_date`), you may encounter other bespoke or legacy slots in the same class file (e.g., `innovation_budget`, `operating_budget`).
**YOU MUST**:
* ✅ Migrate ONLY the specific slot you were instructed to work on.
* ✅ Leave other bespoke slots exactly as they are.
* ✅ Focus strictly on the current migration target.
**YOU MUST NOT**:
* ❌ Proactively migrate "nearby" slots just because they look like they need refactoring.
* ❌ Remove slots that seem unused or redundant without specific instruction.
* ❌ "Clean up" the class file by removing legacy attributes.
## Rationale
Refactoring is a separate, planned phase. Mixing opportunistic refactoring with systematic slot migration increases the risk of regression and makes changes harder to review. The remaining bespoke slots will be refactored later, in that dedicated phase.
## Workflow
1. **Identify Target**: Identify the specific slot(s) assigned for migration (from `slot_fixes.yaml` or user prompt).
2. **Execute Migration**: Apply changes ONLY for those slots.
3. **Ignore Others**: Do not touch other slots in the file, even if they violate other rules (like Rule 39 or Rule 53). Those will be handled in their own dedicated tasks.
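
The workflow above amounts to applying a rename only to an explicit target set. The following is a rough sketch under stated assumptions: the `migrate_slots` helper and the rename mapping are hypothetical, chosen only to illustrate that non-target slots pass through unchanged even when a rename for them is known.

```python
def migrate_slots(slots: list[str], renames: dict[str, str], targets: set[str]) -> list[str]:
    """Rename only the slots listed in `targets`; leave every other slot as-is."""
    return [renames[s] if s in targets and s in renames else s for s in slots]


# Hypothetical class slot list and rename table (illustration only).
slots = ["has_approval_date", "innovation_budget", "operating_budget"]
renames = {
    "has_approval_date": "is_or_was_approved_on",
    "innovation_budget": "has_or_had_budget",  # known rename, but NOT a current target
}
migrated = migrate_slots(slots, renames, targets={"has_approval_date"})
```

Here `innovation_budget` stays untouched because it is outside the target set, which is the behaviour the rule asks for.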

File diff suppressed because it is too large.


@ -1660,6 +1660,7 @@ class MultiSourceRetriever:
only_heritage_relevant: bool = False,
only_wcms: bool = False,
using: str | None = None,
extra_filters: dict[str, Any] | None = None,
) -> list[Any]:
"""Search for persons/staff in the heritage_persons collection.
@ -1672,20 +1673,29 @@ class MultiSourceRetriever:
only_heritage_relevant: Only return heritage-relevant staff
only_wcms: Only return WCMS-registered profiles
using: Optional embedding model to use (e.g., 'minilm_384', 'openai_1536')
extra_filters: Optional extra filters for Qdrant
Returns:
List of RetrievedPerson objects
"""
if self.qdrant:
try:
return self.qdrant.search_persons( # type: ignore[no-any-return]
query=query,
k=k,
filter_custodian=filter_custodian,
only_heritage_relevant=only_heritage_relevant,
only_wcms=only_wcms,
using=using,
)
# Dynamically check whether qdrant.search_persons supports extra_filters.
# This handles the case where the underlying retriever's signature varies.
import inspect
sig = inspect.signature(self.qdrant.search_persons)
kwargs = {
"query": query,
"k": k,
"filter_custodian": filter_custodian,
"only_heritage_relevant": only_heritage_relevant,
"only_wcms": only_wcms,
"using": using,
}
if "extra_filters" in sig.parameters:
kwargs["extra_filters"] = extra_filters
return self.qdrant.search_persons(**kwargs) # type: ignore[no-any-return]
except Exception as e:
logger.error(f"Person search failed: {e}")
return []
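
The signature check in the hunk above can be isolated into a small helper. This is a generalised sketch, not the code in the commit: `call_with_supported_kwargs`, `legacy_search`, and `filtered_search` are hypothetical names, and the helper drops any unsupported keyword rather than only `extra_filters`.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> Any:
    """Call `func`, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callee accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    supported = {name: value for name, value in kwargs.items() if name in params}
    return func(**supported)


def legacy_search(query: str, k: int = 10) -> list[str]:
    """Stand-in for a retriever without extra_filters support."""
    return [f"{query}:{k}"]


def filtered_search(query: str, k: int = 10, extra_filters: Any = None) -> list[str]:
    """Stand-in for a retriever that accepts extra_filters."""
    return [f"{query}:{k}:{extra_filters}"]


legacy = call_with_supported_kwargs(legacy_search, query="nos", k=5, extra_filters={"a": 1})
filtered = call_with_supported_kwargs(filtered_search, query="nos", k=5, extra_filters={"a": 1})
```

One trade-off of this approach is that a silently dropped filter can widen the result set, so logging when an argument is discarded may be worth considering.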
@ -2755,11 +2765,18 @@ async def person_search(request: PersonSearchRequest) -> PersonSearchResponse:
# Augment query for better recall on domain names if it looks like a domain search
# "nos" -> "nos email domain nos" to guide vector search towards email addresses
search_query = request.query
extra_filters = None
# Check for single word domain-like queries
if len(search_query.split()) == 1 and len(search_query) > 2 and "@" not in search_query:
# Heuristic: single word queries might be domain searches
# We append "email domain" context to guide the embedding
search_query = f"{search_query} email domain {search_query}"
# We use MatchText filtering on the email field to find matching addresses
# Qdrant "match": {"text": "nos"} performs token-based matching, so filter on
# the original query term rather than the augmented search_query
extra_filters = {"email": {"match": {"text": request.query}}}
logger.info(f"[PersonSearch] Potential domain search detected for '{request.query}'. Applying strict email filter: {extra_filters}")
logger.info(f"[PersonSearch] Executing search for '{search_query}' (extra_filters={extra_filters})")
# Use the hybrid retriever's person search
results = retriever.search_persons(
query=search_query,
@ -2768,8 +2785,27 @@ async def person_search(request: PersonSearchRequest) -> PersonSearchResponse:
only_heritage_relevant=request.only_heritage_relevant,
only_wcms=request.only_wcms,
using=request.embedding_model, # Pass embedding model
extra_filters=extra_filters,
)
# FALLBACK: If strict domain filter yielded no results, try standard vector search
# This fixes the issue where searching for names like "willem" (which look like domains)
# would fail because they don't appear in emails.
if extra_filters and not results:
logger.info(f"[PersonSearch] No results with email filter for '{search_query}'. Falling back to standard vector search.")
results = retriever.search_persons(
query=search_query,
k=request.k,
filter_custodian=request.filter_custodian,
only_heritage_relevant=request.only_heritage_relevant,
only_wcms=request.only_wcms,
using=request.embedding_model,
extra_filters=None, # Disable filter for fallback
)
logger.info(f"[PersonSearch] Fallback search returned {len(results)} results")
logger.info(f"[PersonSearch] Final result count: {len(results)}")
# Determine which embedding model was actually used
embedding_model_used = None
qdrant = retriever.qdrant
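
The filter-then-fallback flow in this hunk can be summarised as a small function. This is a minimal sketch, assuming a search callable that takes a query and optional filters; `search_with_fallback` and `fake_search` are hypothetical and stand in for the retriever call.

```python
from typing import Any, Callable, Optional

Filters = Optional[dict[str, Any]]


def search_with_fallback(
    search: Callable[[str, Filters], list[Any]],
    query: str,
    filters: Filters,
) -> tuple[list[Any], bool]:
    """Run a filtered search; if it returns nothing, retry once without filters.

    Returns the results and a flag telling whether the fallback was used.
    """
    results = search(query, filters)
    if filters and not results:
        return search(query, None), True
    return results, False


def fake_search(query: str, filters: Filters) -> list[str]:
    """Stand-in retriever: the strict filter never matches, plain search does."""
    if filters is not None:
        return []
    return [f"person matching {query}"]


results, used_fallback = search_with_fallback(
    fake_search, "willem", {"email": {"match": {"text": "willem"}}}
)
```

Returning the fallback flag alongside the results makes it easy to log or report which path produced the final answer.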


@ -0,0 +1,846 @@
"""
Multi-Embedding Retriever for Heritage Data
Supports multiple embedding models using Qdrant's named vectors feature.
This enables:
- A/B testing different embedding models
- Cost optimization (cheap local embeddings vs paid API embeddings)
- Gradual migration between embedding models
- Fallback when one model is unavailable
Supported Embedding Models:
- openai_1536: text-embedding-3-small (1536-dim, $0.02/1M tokens)
- minilm_384: all-MiniLM-L6-v2 (384-dim, free/local)
- bge_768: bge-base-en-v1.5 (768-dim, free/local, high quality)
Collection Architecture:
Each collection has named vectors for each embedding model:
heritage_custodians:
vectors:
"openai_1536": VectorParams(size=1536)
"minilm_384": VectorParams(size=384)
payload: {name, ghcid, institution_type, ...}
heritage_persons:
vectors:
"openai_1536": VectorParams(size=1536)
"minilm_384": VectorParams(size=384)
payload: {name, headline, custodian_name, ...}
Usage:
retriever = MultiEmbeddingRetriever()
# Search with default model (auto-select based on availability)
results = retriever.search("museums in Amsterdam")
# Search with specific model
results = retriever.search("museums in Amsterdam", using="minilm_384")
# A/B test comparison
comparison = retriever.compare_models("museums in Amsterdam")
"""
import hashlib
import logging
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Literal
logger = logging.getLogger(__name__)
class EmbeddingModel(str, Enum):
"""Supported embedding models with their configurations."""
OPENAI_1536 = "openai_1536"
MINILM_384 = "minilm_384"
BGE_768 = "bge_768"
@property
def dimension(self) -> int:
"""Get the vector dimension for this model."""
dims = {
"openai_1536": 1536,
"minilm_384": 384,
"bge_768": 768,
}
return dims[self.value]
@property
def model_name(self) -> str:
"""Get the actual model name for loading."""
names = {
"openai_1536": "text-embedding-3-small",
"minilm_384": "all-MiniLM-L6-v2",
"bge_768": "BAAI/bge-base-en-v1.5",
}
return names[self.value]
@property
def is_local(self) -> bool:
"""Check if this model runs locally (no API calls)."""
return self.value in ("minilm_384", "bge_768")
@property
def cost_per_1m_tokens(self) -> float:
"""Approximate cost per 1M tokens (0 for local models)."""
costs = {
"openai_1536": 0.02,
"minilm_384": 0.0,
"bge_768": 0.0,
}
return costs[self.value]
@dataclass
class MultiEmbeddingConfig:
"""Configuration for multi-embedding retriever."""
# Qdrant connection
qdrant_host: str = "localhost"
qdrant_port: int = 6333
qdrant_https: bool = False
qdrant_prefix: str | None = None
# API keys
openai_api_key: str | None = None
# Default embedding model preference order
# First available model is used if no explicit model is specified
model_preference: list[EmbeddingModel] = field(default_factory=lambda: [
EmbeddingModel.MINILM_384, # Free, fast, good quality
EmbeddingModel.OPENAI_1536, # Higher quality, paid
EmbeddingModel.BGE_768, # Free, high quality, slower
])
# Collection names
institutions_collection: str = "heritage_custodians"
persons_collection: str = "heritage_persons"
# Search defaults
default_k: int = 10
class MultiEmbeddingRetriever:
"""Retriever supporting multiple embedding models via Qdrant named vectors.
This class manages multiple embedding models and allows searching with
any available model. It handles:
- Model lazy-loading
- Automatic model selection based on availability
- Named vector creation and search
- A/B testing between models
"""
def __init__(self, config: MultiEmbeddingConfig | None = None):
"""Initialize multi-embedding retriever.
Args:
config: Configuration options. If None, uses environment variables.
"""
self.config = config or self._config_from_env()
# Lazy-loaded clients
self._qdrant_client = None
self._openai_client = None
self._st_models: dict[str, Any] = {} # Sentence transformer models
# Track available models per collection
self._available_models: dict[str, set[EmbeddingModel]] = {}
# Track whether each collection uses named vectors (vs single unnamed vector)
self._uses_named_vectors: dict[str, bool] = {}
logger.info(f"MultiEmbeddingRetriever initialized with preference: {[m.value for m in self.config.model_preference]}")
@staticmethod
def _config_from_env() -> MultiEmbeddingConfig:
"""Create configuration from environment variables."""
# Accept the same truthy values as create_multi_embedding_retriever()
use_production = os.getenv("QDRANT_USE_PRODUCTION", "false").lower() in ("true", "1", "yes")
if use_production:
return MultiEmbeddingConfig(
qdrant_host=os.getenv("QDRANT_PROD_HOST", "bronhouder.nl"),
qdrant_port=443,
qdrant_https=True,
qdrant_prefix=os.getenv("QDRANT_PROD_PREFIX", "qdrant"),
openai_api_key=os.getenv("OPENAI_API_KEY"),
)
else:
return MultiEmbeddingConfig(
qdrant_host=os.getenv("QDRANT_HOST", "localhost"),
qdrant_port=int(os.getenv("QDRANT_PORT", "6333")),
openai_api_key=os.getenv("OPENAI_API_KEY"),
)
@property
def qdrant_client(self):
"""Lazy-load Qdrant client."""
if self._qdrant_client is None:
from qdrant_client import QdrantClient
if self.config.qdrant_https:
self._qdrant_client = QdrantClient(
host=self.config.qdrant_host,
port=self.config.qdrant_port,
https=True,
prefix=self.config.qdrant_prefix,
prefer_grpc=False,
timeout=30,
)
logger.info(f"Connected to Qdrant: https://{self.config.qdrant_host}/{self.config.qdrant_prefix or ''}")
else:
self._qdrant_client = QdrantClient(
host=self.config.qdrant_host,
port=self.config.qdrant_port,
)
logger.info(f"Connected to Qdrant: {self.config.qdrant_host}:{self.config.qdrant_port}")
return self._qdrant_client
@property
def openai_client(self):
"""Lazy-load OpenAI client."""
if self._openai_client is None:
if not self.config.openai_api_key:
raise RuntimeError("OpenAI API key not configured")
import openai
self._openai_client = openai.OpenAI(api_key=self.config.openai_api_key)
return self._openai_client
def _load_sentence_transformer(self, model: EmbeddingModel) -> Any:
"""Lazy-load a sentence-transformers model.
Args:
model: The embedding model to load
Returns:
Loaded SentenceTransformer model
"""
if model.value not in self._st_models:
try:
from sentence_transformers import SentenceTransformer
self._st_models[model.value] = SentenceTransformer(model.model_name)
logger.info(f"Loaded sentence-transformers model: {model.model_name}")
except ImportError:
raise RuntimeError(
"sentence-transformers not installed. Run: pip install sentence-transformers"
)
return self._st_models[model.value]
def get_embedding(self, text: str, model: EmbeddingModel) -> list[float]:
"""Get embedding vector for text using specified model.
Args:
text: Text to embed
model: Embedding model to use
Returns:
Embedding vector as list of floats
"""
if model == EmbeddingModel.OPENAI_1536:
response = self.openai_client.embeddings.create(
input=text,
model=model.model_name,
)
return response.data[0].embedding
elif model in (EmbeddingModel.MINILM_384, EmbeddingModel.BGE_768):
st_model = self._load_sentence_transformer(model)
embedding = st_model.encode(text)
return embedding.tolist()
else:
raise ValueError(f"Unknown embedding model: {model}")
def get_embeddings_batch(
self,
texts: list[str],
model: EmbeddingModel,
batch_size: int = 32,
) -> list[list[float]]:
"""Get embedding vectors for multiple texts.
Args:
texts: List of texts to embed
model: Embedding model to use
batch_size: Batch size for processing
Returns:
List of embedding vectors
"""
if not texts:
return []
if model == EmbeddingModel.OPENAI_1536:
# OpenAI batch API (max 2048 per request)
all_embeddings = []
for i in range(0, len(texts), 2048):
batch = texts[i:i + 2048]
response = self.openai_client.embeddings.create(
input=batch,
model=model.model_name,
)
batch_embeddings = [item.embedding for item in sorted(response.data, key=lambda x: x.index)]
all_embeddings.extend(batch_embeddings)
return all_embeddings
elif model in (EmbeddingModel.MINILM_384, EmbeddingModel.BGE_768):
st_model = self._load_sentence_transformer(model)
embeddings = st_model.encode(texts, batch_size=batch_size, show_progress_bar=len(texts) > 100)
return embeddings.tolist()
else:
raise ValueError(f"Unknown embedding model: {model}")
def get_available_models(self, collection_name: str) -> set[EmbeddingModel]:
"""Get the embedding models available for a collection.
Checks which named vectors exist in the collection.
For single-vector collections, returns models matching the dimension.
Args:
collection_name: Name of the Qdrant collection
Returns:
Set of available EmbeddingModel values
"""
if collection_name in self._available_models:
return self._available_models[collection_name]
try:
info = self.qdrant_client.get_collection(collection_name)
vectors_config = info.config.params.vectors
available = set()
uses_named_vectors = False
# Check for named vectors (dict of vector configs)
if isinstance(vectors_config, dict):
# Named vectors - each key is a vector name
uses_named_vectors = True
for vector_name in vectors_config.keys():
try:
model = EmbeddingModel(vector_name)
available.add(model)
except ValueError:
logger.warning(f"Unknown vector name in collection: {vector_name}")
else:
# Single unnamed vector - check dimension to find compatible model
# Note: This doesn't mean we can use `using=model.value` in queries
uses_named_vectors = False
if hasattr(vectors_config, 'size'):
dim = vectors_config.size
for model in EmbeddingModel:
if model.dimension == dim:
available.add(model)
# Store both available models and whether named vectors are used
self._available_models[collection_name] = available
self._uses_named_vectors[collection_name] = uses_named_vectors
if uses_named_vectors:
logger.info(f"Collection '{collection_name}' uses named vectors: {[m.value for m in available]}")
else:
logger.info(f"Collection '{collection_name}' uses single vector (compatible with: {[m.value for m in available]})")
return available
except Exception as e:
logger.warning(f"Could not get available models for {collection_name}: {e}")
return set()
def uses_named_vectors(self, collection_name: str) -> bool:
"""Check if a collection uses named vectors (vs single unnamed vector).
Args:
collection_name: Name of the Qdrant collection
Returns:
True if collection has named vectors, False for single-vector collections
"""
# Ensure models are loaded (populates _uses_named_vectors)
self.get_available_models(collection_name)
return self._uses_named_vectors.get(collection_name, False)
def select_model(
self,
collection_name: str,
preferred: EmbeddingModel | None = None,
) -> EmbeddingModel | None:
"""Select the best available embedding model for a collection.
Args:
collection_name: Name of the collection
preferred: Preferred model (used if available)
Returns:
Selected EmbeddingModel or None if none available
"""
available = self.get_available_models(collection_name)
if not available:
# No named vectors - check if we can use any model
# This happens for legacy single-vector collections
try:
info = self.qdrant_client.get_collection(collection_name)
vectors_config = info.config.params.vectors
# Get vector dimension
dim = None
if hasattr(vectors_config, 'size'):
dim = vectors_config.size
elif isinstance(vectors_config, dict):
# Get first vector config
first_config = next(iter(vectors_config.values()), None)
if first_config and hasattr(first_config, 'size'):
dim = first_config.size
if dim:
for model in self.config.model_preference:
if model.dimension == dim:
return model
except Exception:
pass
return None
# If preferred model is available, use it
if preferred and preferred in available:
return preferred
# Otherwise, follow preference order
for model in self.config.model_preference:
if model in available:
# Check if model is usable (has API key if needed)
if model == EmbeddingModel.OPENAI_1536 and not self.config.openai_api_key:
continue
return model
return None
def search(
self,
query: str,
collection_name: str | None = None,
k: int | None = None,
using: EmbeddingModel | str | None = None,
filter_conditions: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
"""Search for similar documents using specified or auto-selected model.
Args:
query: Search query text
collection_name: Collection to search (default: institutions)
k: Number of results
using: Embedding model to use (auto-selected if None)
filter_conditions: Optional Qdrant filter conditions
Returns:
List of results with scores and payloads
"""
collection_name = collection_name or self.config.institutions_collection
k = k or self.config.default_k
# Resolve model
if using is not None:
if isinstance(using, str):
model = EmbeddingModel(using)
else:
model = using
else:
model = self.select_model(collection_name)
if model is None:
raise RuntimeError(f"No compatible embedding model for collection '{collection_name}'")
logger.info(f"Searching '{collection_name}' with {model.value}: {query[:50]}...")
# Get query embedding
query_vector = self.get_embedding(query, model)
# Build filter
from qdrant_client.http import models
query_filter = None
if filter_conditions:
query_filter = models.Filter(
must=[
models.FieldCondition(
key=key,
match=models.MatchValue(value=value),
)
for key, value in filter_conditions.items()
]
)
# Check if collection uses named vectors (not just single unnamed vector)
# Only pass `using=model.value` if collection has actual named vectors
use_named_vector = self.uses_named_vectors(collection_name)
# Search
if use_named_vector:
results = self.qdrant_client.query_points(
collection_name=collection_name,
query=query_vector,
using=model.value,
limit=k,
with_payload=True,
query_filter=query_filter,
)
else:
# Legacy single-vector search
results = self.qdrant_client.query_points(
collection_name=collection_name,
query=query_vector,
limit=k,
with_payload=True,
query_filter=query_filter,
)
return [
{
"id": str(point.id),
"score": point.score,
"model": model.value,
"payload": point.payload or {},
}
for point in results.points
]
def search_persons(
self,
query: str,
k: int | None = None,
using: EmbeddingModel | str | None = None,
filter_custodian: str | None = None,
only_heritage_relevant: bool = False,
only_wcms: bool = False,
) -> list[dict[str, Any]]:
"""Search for persons/staff in the heritage_persons collection.
Args:
query: Search query text
k: Number of results
using: Embedding model to use
filter_custodian: Optional custodian slug to filter by
only_heritage_relevant: Only return heritage-relevant staff
only_wcms: Only return WCMS-registered profiles (heritage sector users)
Returns:
List of person results with scores
"""
k = k or self.config.default_k
# Build filters
filters = {}
if filter_custodian:
filters["custodian_slug"] = filter_custodian
if only_wcms:
filters["has_wcms"] = True
# Search with over-fetch for post-filtering
results = self.search(
query=query,
collection_name=self.config.persons_collection,
k=k * 2,
using=using,
filter_conditions=filters if filters else None,
)
# Post-filter for heritage_relevant if needed
if only_heritage_relevant:
results = [r for r in results if r.get("payload", {}).get("heritage_relevant", False)]
# Format results
formatted = []
for r in results[:k]:
payload = r.get("payload", {})
formatted.append({
"person_id": payload.get("staff_id", "") or hashlib.md5(
f"{payload.get('custodian_slug', '')}:{payload.get('name', '')}".encode()
).hexdigest()[:16],
"name": payload.get("name", ""),
"headline": payload.get("headline"),
"custodian_name": payload.get("custodian_name"),
"custodian_slug": payload.get("custodian_slug"),
"location": payload.get("location"),
"heritage_relevant": payload.get("heritage_relevant", False),
"heritage_type": payload.get("heritage_type"),
"linkedin_url": payload.get("linkedin_url"),
"score": r["score"],
"model": r["model"],
})
return formatted
def compare_models(
self,
query: str,
collection_name: str | None = None,
k: int = 10,
models: list[EmbeddingModel] | None = None,
) -> dict[str, Any]:
"""A/B test comparison of multiple embedding models.
Args:
query: Search query
collection_name: Collection to search
k: Number of results per model
models: Models to compare (default: all available)
Returns:
Dict with results per model and overlap analysis
"""
collection_name = collection_name or self.config.institutions_collection
# Determine which models to compare
available = self.get_available_models(collection_name)
if models:
models_to_test = [m for m in models if m in available]
else:
models_to_test = list(available)
if not models_to_test:
return {"error": "No models available for comparison"}
results = {}
all_ids = {}
for model in models_to_test:
try:
model_results = self.search(
query=query,
collection_name=collection_name,
k=k,
using=model,
)
results[model.value] = model_results
all_ids[model.value] = {r["id"] for r in model_results}
except Exception as e:
results[model.value] = {"error": str(e)}
all_ids[model.value] = set()
# Calculate overlap between models
overlap = {}
model_values = list(all_ids.keys())
for i, m1 in enumerate(model_values):
for m2 in model_values[i + 1:]:
if all_ids[m1] and all_ids[m2]:
intersection = all_ids[m1] & all_ids[m2]
union = all_ids[m1] | all_ids[m2]
jaccard = len(intersection) / len(union) if union else 0
overlap[f"{m1}_vs_{m2}"] = {
"jaccard_similarity": round(jaccard, 3),
"common_results": len(intersection),
"total_unique": len(union),
}
return {
"query": query,
"collection": collection_name,
"k": k,
"results": results,
"overlap_analysis": overlap,
}
def create_multi_embedding_collection(
self,
collection_name: str,
models: list[EmbeddingModel] | None = None,
) -> bool:
"""Create a new collection with named vectors for multiple embedding models.
Args:
collection_name: Name for the new collection
models: Embedding models to support (default: all)
Returns:
True if created successfully
"""
from qdrant_client.http.models import Distance, VectorParams
models = models or list(EmbeddingModel)
vectors_config = {
model.value: VectorParams(
size=model.dimension,
distance=Distance.COSINE,
)
for model in models
}
try:
self.qdrant_client.create_collection(
collection_name=collection_name,
vectors_config=vectors_config,
)
logger.info(f"Created multi-embedding collection '{collection_name}' with {[m.value for m in models]}")
# Clear cache
self._available_models.pop(collection_name, None)
return True
except Exception as e:
logger.error(f"Failed to create collection: {e}")
return False
def add_documents_multi_embedding(
self,
documents: list[dict[str, Any]],
collection_name: str,
models: list[EmbeddingModel] | None = None,
batch_size: int = 100,
) -> int:
"""Add documents with embeddings from multiple models.
Args:
documents: List of documents with 'text' and optional 'metadata' fields
collection_name: Target collection
models: Models to generate embeddings for (default: all available)
batch_size: Batch size for processing
Returns:
Number of documents added
"""
from qdrant_client.http import models as qmodels
# Determine which models to use
available = self.get_available_models(collection_name)
if models:
models_to_use = [m for m in models if m in available]
else:
models_to_use = list(available)
if not models_to_use:
raise RuntimeError(f"No embedding models available for collection '{collection_name}'")
# Filter valid documents
valid_docs = [d for d in documents if d.get("text")]
total_indexed = 0
for i in range(0, len(valid_docs), batch_size):
batch = valid_docs[i:i + batch_size]
texts = [d["text"] for d in batch]
# Generate embeddings for each model
embeddings_by_model = {}
for model in models_to_use:
try:
embeddings_by_model[model] = self.get_embeddings_batch(texts, model)
except Exception as e:
logger.warning(f"Failed to get {model.value} embeddings: {e}")
if not embeddings_by_model:
continue
# Create points with named vectors
points = []
for j, doc in enumerate(batch):
text = doc["text"]
metadata = doc.get("metadata", {})
point_id = doc.get("id") or hashlib.md5(text.encode()).hexdigest()
# Build named vectors dict
vectors = {}
for model, model_embeddings in embeddings_by_model.items():
vectors[model.value] = model_embeddings[j]
points.append(qmodels.PointStruct(
id=point_id,
vector=vectors,
payload={
"text": text,
**metadata,
}
))
# Upsert batch
self.qdrant_client.upsert(
collection_name=collection_name,
points=points,
)
total_indexed += len(points)
logger.info(f"Indexed {total_indexed}/{len(valid_docs)} documents with {len(models_to_use)} models")
return total_indexed
def get_stats(self) -> dict[str, Any]:
"""Get statistics about collections and available models.
Returns:
Dict with collection stats and model availability
"""
stats = {
"config": {
"qdrant_host": self.config.qdrant_host,
"qdrant_port": self.config.qdrant_port,
"model_preference": [m.value for m in self.config.model_preference],
"openai_available": bool(self.config.openai_api_key),
},
"collections": {},
}
for collection_name in [self.config.institutions_collection, self.config.persons_collection]:
try:
info = self.qdrant_client.get_collection(collection_name)
available_models = self.get_available_models(collection_name)
selected_model = self.select_model(collection_name)
stats["collections"][collection_name] = {
"vectors_count": info.vectors_count,
"points_count": info.points_count,
"status": info.status.value if info.status else "unknown",
"available_models": [m.value for m in available_models],
"selected_model": selected_model.value if selected_model else None,
}
except Exception as e:
stats["collections"][collection_name] = {"error": str(e)}
return stats
def close(self):
"""Close all connections."""
if self._qdrant_client:
self._qdrant_client.close()
self._qdrant_client = None
self._st_models.clear()
self._available_models.clear()
self._uses_named_vectors.clear()
def create_multi_embedding_retriever(use_production: bool | None = None) -> MultiEmbeddingRetriever:
"""Factory function to create a MultiEmbeddingRetriever.
Args:
use_production: If True, connect to production Qdrant.
Defaults to QDRANT_USE_PRODUCTION env var.
Returns:
Configured MultiEmbeddingRetriever instance
"""
if use_production is None:
use_production = os.getenv("QDRANT_USE_PRODUCTION", "").lower() in ("true", "1", "yes")
if use_production:
config = MultiEmbeddingConfig(
qdrant_host=os.getenv("QDRANT_PROD_HOST", "bronhouder.nl"),
qdrant_port=443,
qdrant_https=True,
qdrant_prefix=os.getenv("QDRANT_PROD_PREFIX", "qdrant"),
openai_api_key=os.getenv("OPENAI_API_KEY"),
)
else:
config = MultiEmbeddingConfig(
qdrant_host=os.getenv("QDRANT_HOST", "localhost"),
qdrant_port=int(os.getenv("QDRANT_PORT", "6333")),
openai_api_key=os.getenv("OPENAI_API_KEY"),
)
return MultiEmbeddingRetriever(config)
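
The overlap analysis in `compare_models` is a pairwise Jaccard similarity over result IDs. The following standalone sketch reproduces that calculation on made-up ID sets; `overlap_analysis` and the sample IDs are hypothetical, and no Qdrant connection is involved.

```python
from itertools import combinations


def overlap_analysis(ids_by_model: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Compute pairwise Jaccard similarity between per-model result ID sets.

    Pairs where either model returned no results are skipped.
    """
    overlap: dict[str, dict[str, float]] = {}
    for m1, m2 in combinations(ids_by_model, 2):
        a, b = ids_by_model[m1], ids_by_model[m2]
        if not a or not b:
            continue
        common = a & b
        union = a | b
        overlap[f"{m1}_vs_{m2}"] = {
            "jaccard_similarity": round(len(common) / len(union), 3),
            "common_results": len(common),
            "total_unique": len(union),
        }
    return overlap


# Made-up result IDs for illustration.
ids = {
    "minilm_384": {"a", "b", "c"},
    "openai_1536": {"b", "c", "d"},
    "bge_768": set(),
}
analysis = overlap_analysis(ids)
```

With these sets the two non-empty models share two of four distinct IDs, giving a Jaccard similarity of 0.5.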


@ -1,5 +1,5 @@
{
"generated": "2026-01-27T08:03:23.376Z",
"generated": "2026-01-27T08:04:51.838Z",
"schemaRoot": "/schemas/20251121/linkml",
"totalFiles": 3014,
"categoryCounts": {


@ -1,5 +1,5 @@
{
"generated": "2026-01-27T08:04:51.838Z",
"generated": "2026-01-27T09:07:17.016Z",
"schemaRoot": "/schemas/20251121/linkml",
"totalFiles": 3014,
"categoryCounts": {


@ -0,0 +1,7 @@
classes:
APIEndpoint:
class_uri: schema:EntryPoint
description: "An API endpoint."
slots:
- has_or_had_url
- has_or_had_description


@ -0,0 +1,8 @@
classes:
APIRequest:
class_uri: prov:Activity
description: "An API request event."
slots:
- has_or_had_provenance
- has_or_had_endpoint
- has_or_had_version


@ -0,0 +1,7 @@
classes:
APIVersion:
class_uri: schema:SoftwareApplication
description: "Version of an API."
slots:
- has_or_had_label
- has_or_had_identifier


@ -0,0 +1,7 @@
classes:
Altitude:
class_uri: schema:QuantitativeValue
description: "The altitude of a place."
slots:
- has_or_had_value
- has_or_had_unit


@ -0,0 +1,8 @@
classes:
AmendmentEvent:
class_uri: prov:Activity
description: "An event where a document or agreement was amended."
slots:
- temporal_extent
- has_or_had_description
- has_or_had_identifier


@ -0,0 +1,8 @@
classes:
AnnexCreationEvent:
class_uri: prov:Activity
description: "An event where an annex was created or established."
slots:
- temporal_extent
- has_or_had_description
- has_or_had_reason


@ -0,0 +1,6 @@
classes:
AppellationType:
class_uri: skos:Concept
description: "Type of appellation/name."
slots:
- has_or_had_label


@ -0,0 +1,6 @@
classes:
Archdiocese:
class_uri: schema:AdministrativeArea
description: "An archdiocese."
slots:
- has_or_had_label


@ -0,0 +1,7 @@
classes:
ArchitecturalStyle:
class_uri: skos:Concept
description: "An architectural style."
slots:
- has_or_had_label
- has_or_had_description


@ -0,0 +1,7 @@
classes:
ArchivalReference:
class_uri: rico:Identifier
description: "An archival reference code."
slots:
- has_or_had_identifier
- has_or_had_description


@ -0,0 +1,9 @@
classes:
Arrangement:
class_uri: rico:Arrangement
description: "The arrangement of a collection."
slots:
- has_or_had_description
- has_or_had_type
- has_or_had_level
- has_or_had_note

View file

@ -0,0 +1,7 @@
classes:
ArrangementLevel:
class_uri: skos:Concept
description: "Level of arrangement."
slots:
- has_or_had_label
- has_or_had_rank

View file

@ -0,0 +1,6 @@
classes:
ArrangementType:
class_uri: skos:Concept
description: "Type of arrangement."
slots:
- has_or_had_label

View file

@ -16,11 +16,15 @@ imports:
- ../slots/supersede_articles # was: supersede, superseded_by - migrated to class-specific slots 2026-01-16
- ../slots/is_or_was_effective_at
- ./ReconstructedEntity
- ../slots/has_amendment_history
- ../slots/is_or_was_amended_through # was: has_amendment_history - migrated per Rule 53 (2026-01-27)
- ./AmendmentEvent
- ../slots/is_or_was_archived_in
- ../slots/has_articles_archival_stage
- ../slots/has_articles_document_format
- ../slots/has_articles_document_url
- ../slots/has_or_had_status # was: has_articles_archival_stage - migrated per Rule 53 (2026-01-27)
- ../slots/has_or_had_format # was: has_articles_document_format - migrated per Rule 53 (2026-01-27)
- ../slots/has_or_had_url # was: has_articles_document_url - migrated per Rule 53 (2026-01-27)
- ./RecordCycleStatus
- ./DocumentFormat
- ./URL
- ../slots/is_or_was_included_in # was: collected_in - migrated per Rule 53 (2026-01-19)
- ../slots/has_or_had_description
- ./Description
@ -129,11 +133,11 @@ classes:
- prov:Entity
- rov:orgType
slots:
- has_amendment_history
- is_or_was_amended_through # was: has_amendment_history - migrated per Rule 53 (2026-01-27)
- is_or_was_archived_in
- has_articles_archival_stage
- has_articles_document_format
- has_articles_document_url
- has_or_had_status # was: has_articles_archival_stage
- has_or_had_format # was: has_articles_document_format
- has_or_had_url # was: has_articles_document_url
- is_or_was_included_in # was: collected_in - migrated per Rule 53 (2026-01-19)
- has_or_had_description
- has_or_had_title

View file

@ -10,7 +10,9 @@ imports:
- ./OrganizationalStructure
- ./ReconstructedEntity
- ../slots/revision_date
- ../slots/has_approval_date
- ../slots/is_or_was_approved_on
- ../classes/Timestamp
- ../classes/TimeSpan
- ../slots/has_or_had_acquisition_budget
- ../slots/is_or_was_approved_by # MIGRATED: was ../slots/approved_by (2026-01-15)
# REMOVED - migrated to has_or_had_currency (Rule 53)
@ -470,7 +472,8 @@ classes:
has_or_had_label: "External Grants & Subsidies"
internal_funding: 25000000.0
has_or_had_endowment_draw: 5000000.0
approval_date: '2023-11-15'
is_or_was_approved_on:
start_of_the_start: '2023-11-15'
is_or_was_approved_by:
approver_name: Board of Directors
has_or_had_status:
@ -510,7 +513,8 @@ classes:
quantity_value: 6000000.0
has_or_had_label: "Province Subsidy"
internal_funding: 2500000.0
approval_date: '2024-03-01'
is_or_was_approved_on:
start_of_the_start: '2024-03-01'
is_or_was_approved_by:
approver_name: Province of Noord-Holland
has_or_had_status:

View file

@ -17,8 +17,10 @@ imports:
- ./FundingRequirement
- ../slots/contact_email
- ../slots/keyword
- ../slots/has_application_deadline
- ../slots/has_application_opening_date
- ../slots/is_or_was_due_on
- ../slots/end_of_the_end
- ../slots/is_or_was_opened_on
- ../slots/start_of_the_start
# REMOVED 2026-01-17: call_description - migrated to has_or_had_description per Rule 53
# REMOVED 2026-01-17: call_id, call_identifier - migrated to has_or_had_identifier per Rule 53
# REMOVED 2026-01-17: call_short_name, call_title - migrated to has_or_had_label per Rule 53
@ -111,146 +113,29 @@ classes:
- schema:Action
- dcterms:BibliographicResource
slots:
- has_application_deadline
- has_application_opening_date
- has_or_had_description # was: call_description - migrated per Rule 53 (2026-01-17)
- has_or_had_identifier # was: call_id, call_identifier - migrated per Rule 53 (2026-01-17)
- has_or_had_label # was: call_short_name, call_title - migrated per Rule 53 (2026-01-17)
- has_or_had_status # was: call_status - migrated per Rule 53 (2026-01-17)
- has_or_had_url # was: call_url - migrated per Rule 53 (2026-01-17)
# REMOVED 2026-01-19: co_funding_required - migrated to requires_or_required + CoFunding (Rule 53)
- requires_or_required # was: co_funding_required - migrated per Rule 53 (2026-01-19)
- contact_email
- eligible_applicant
- eligible_country
- has_or_had_funded # was: funded_project - migrated per Rule 53 (2026-01-26)
- offers_or_offered # was: funding_rate - migrated per Rule 53 (2026-01-26)
- heritage_type
- info_session_date
- issuing_organisation
- keyword
- minimum_partner
- parent_programme
- partnership_required
- programme_year
- related_call
- has_or_had_requirement
- results_expected_date
- specificity_annotation
- has_or_had_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
- is_or_was_categorized_as # was: thematic_area - migrated per Rule 53
- has_or_had_budget # was: total_budget - migrated per Rule 53 (2026-01-15)
- has_or_had_range
- has_or_had_provenance # was: web_observation - migrated per Rule 53
- is_or_was_due_on
- is_or_was_opened_on
slot_usage:
has_or_had_identifier:
identifier: true
required: true
range: Identifier
multivalued: true
inlined: true
inlined_as_list: true
is_or_was_due_on:
range: TimeSpan
description: |
Unique identifier(s) for this funding call.
MIGRATED from call_id, call_identifier per slot_fixes.yaml (Rule 53, 2026-01-17).
Consolidates:
- call_id (dcterms:identifier) - Primary call identifier (identifier: true)
- call_identifier (dcterms:identifier) - External identifiers (EU F&T, etc.)
Format: https://nde.nl/ontology/hc/call/{issuing-org-slug}/{call-code}
Deadline for submitting applications.
Replaces has_application_deadline per Rule 53.
Use end_of_the_end for the exact deadline timestamp.
examples:
- value:
identifier_value: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01
identifier_scheme: URI
description: Horizon Europe CL2 heritage call (primary identifier)
- value:
identifier_value: HORIZON-CL2-2025-HERITAGE-01
identifier_scheme: EU_FUNDING_TENDERS
description: EU Funding & Tenders portal ID
- value:
identifier_value: https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025
identifier_scheme: URI
description: National Lottery Heritage Fund medium grants
has_or_had_label:
required: true
range: string
multivalued: true
end_of_the_end: "2023-12-31T23:59:59Z"
description: Application deadline
is_or_was_opened_on:
range: TimeSpan
description: |
Human-readable labels for this funding call.
MIGRATED from call_title, call_short_name per slot_fixes.yaml (Rule 53, 2026-01-17).
Consolidates:
- call_title (dcterms:title) - Official call title (required)
- call_short_name (skos:altLabel) - Short name/code
First label should be the official title, additional labels are short names/codes.
examples:
- value: Cultural heritage, cultural and creative industries
description: Horizon Europe Cluster 2 call title (official)
- value: HORIZON-CL2-2025-HERITAGE-01
description: Horizon Europe call code (short name)
- value: European Cooperation Projects
description: Creative Europe call title (official)
- value: CREA-CULT-2025-COOP
description: Creative Europe cooperation call code
has_or_had_status:
required: true
range: CallForApplicationStatusEnum
description: |
Current lifecycle status of the funding call.
MIGRATED from call_status per slot_fixes.yaml (Rule 53, 2026-01-17).
See CallForApplicationStatusEnum for status values:
- ANNOUNCED: Call published, not yet open
- OPEN: Currently accepting applications
- CLOSING_SOON: < 30 days until deadline
- CLOSED: Deadline passed
- UNDER_REVIEW: Evaluation in progress
- RESULTS_PUBLISHED: Decisions announced
- CANCELLED: Call terminated
- REOPENED: Previously closed call reactivated
examples:
- value: OPEN
description: Currently accepting applications
- value: CLOSING_SOON
description: Deadline approaching
has_or_had_description:
range: string
description: |
Detailed description of the funding call and its objectives.
MIGRATED from call_description per slot_fixes.yaml (Rule 53, 2026-01-17).
Maps to dcterms:description for grant/funding opportunity descriptions.
examples:
- value: |
This call supports research and innovation addressing cultural heritage
preservation, digitisation, and access. Projects should develop new
methods, technologies, and approaches for safeguarding tangible and
intangible cultural heritage.
description: Horizon Europe heritage call description
has_or_had_url:
range: URL
multivalued: true
inlined: true
inlined_as_list: true
description: |
Official call documentation or application portal URL(s).
MIGRATED from call_url per slot_fixes.yaml (Rule 53, 2026-01-17).
Maps to schema:url for web addresses.
Date when applications opened.
Replaces has_application_opening_date per Rule 53.
Use start_of_the_start for the opening timestamp.
examples:
- value:
url_value: https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl2-2025-heritage-01
url_type: application_portal
description: Horizon Europe call application portal
- value:
url_value: https://www.heritagefund.org.uk/funding/medium-grants
url_type: documentation
description: National Lottery Heritage Fund documentation
has_application_deadline:
required: true
range: date
start_of_the_start: "2023-01-01T00:00:00Z"
description: Opening date
examples:
- value: '2025-09-16'
description: Horizon Europe CL2 2025 deadline
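
The slot_usage descriptions in this file replace the flat date slots `has_application_deadline` and `has_application_opening_date` with `TimeSpan` values, using `end_of_the_end` for the deadline and `start_of_the_start` for the opening. A minimal sketch of how existing instance data might be converted follows. The mapping table, the function names, and the choice to expand a bare date to the first or last second of the day in UTC are assumptions made for illustration; they are not part of the commit.

```python
from datetime import date, datetime, time, timezone

# Hypothetical mapping: legacy slot -> (replacement slot, TimeSpan boundary).
# The slot names come from the diff; the table itself is illustrative.
DATE_SLOT_MIGRATIONS = {
    "has_application_deadline": ("is_or_was_due_on", "end_of_the_end"),
    "has_application_opening_date": ("is_or_was_opened_on", "start_of_the_start"),
}


def to_timestamp(day: str, boundary: str) -> str:
    """Expand a YYYY-MM-DD date to a UTC timestamp at the start or end of that day."""
    parsed = date.fromisoformat(day)
    # Assumption: a deadline runs until the last second of the day.
    clock = time(23, 59, 59) if boundary == "end_of_the_end" else time(0, 0, 0)
    stamp = datetime.combine(parsed, clock, tzinfo=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


def migrate_call(record: dict) -> dict:
    """Return a copy of the record with legacy date slots replaced by TimeSpan values."""
    migrated = dict(record)
    for old_slot, (new_slot, boundary) in DATE_SLOT_MIGRATIONS.items():
        if old_slot in migrated:
            day = migrated.pop(old_slot)
            migrated[new_slot] = {boundary: to_timestamp(day, boundary)}
    return migrated
```

Slots that are not in the mapping table pass through unchanged, so the function can be applied to whole records without first filtering them.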

View file

@ -58,7 +58,8 @@ imports:
- ../slots/is_or_was_revision_of # was: was_revision_of - migrated per Rule 53 (2026-01-15)
- ../slots/identifier
- ../slots/is_or_was_responsible_for # was: collections_under_responsibility - migrated per Rule 53 (2026-01-19)
- ../slots/has_articles_of_association
- ../slots/has_or_had_document # was: has_articles_of_association - migrated per Rule 53 (2026-01-27)
- ./ArticlesOfAssociation
- ../slots/registration_date
- ../slots/specificity_annotation
- ../slots/has_or_had_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
@ -117,7 +118,7 @@ classes:
- is_or_was_responsible_for # was: collections_under_responsibility - migrated per Rule 53 (2026-01-19)
- is_or_was_dissolved_by
- defines_or_defined
- has_articles_of_association
- has_or_had_document # was: has_articles_of_association
- identifier
- legal_entity_type
- legal_form
@ -270,8 +271,12 @@ classes:
has_or_had_type: hierarchical
has_or_had_description: Board of trustees with director-led departments
description: Museum governance structure
has_articles_of_association:
has_or_had_document:
range: ArticlesOfAssociation
inlined: true
description: >-
Articles of Association or other founding documents.
MIGRATED from has_articles_of_association per Rule 53 (2026-01-27).
multivalued: true
required: false
examples:

View file

@ -13,7 +13,8 @@ imports:
- ../metadata
- ../slots/has_or_had_coordinates # was: latitude, longitude, accuracy - migrated per Rule 53 (2026-01-26)
- ./Coordinates
- ../slots/has_altitude
- ../slots/has_or_had_altitude # was: has_altitude - migrated per Rule 53 (2026-01-27)
- ./Altitude
- ../slots/has_or_had_geographic_extent # was: bounding_box - migrated per Rule 53/56 (2026-01-17)
- ../slots/has_or_had_identifier
- ../slots/coordinate_reference_system
@ -164,7 +165,7 @@ classes:
- crm:E53_Place
slots:
- has_or_had_coordinates # was: latitude, longitude, accuracy
- has_altitude
- has_or_had_altitude # was: has_altitude - migrated per Rule 53 (2026-01-27)
- has_or_had_geographic_extent # was: bounding_box - migrated per Rule 53/56 (2026-01-17)
- has_or_had_identifier
- coordinate_reference_system

View file

@ -14,7 +14,8 @@ imports:
- ../metadata
- ./TimeSpan
- ../enums/LoanStatusEnum
- ../slots/has_approval_date
- ../slots/is_or_was_approved_on
- ../classes/Timestamp
- ../slots/has_actual_return_date
- ../slots/is_or_was_based_on
- ../classes/Agreement
@ -101,133 +102,18 @@ classes:
slots:
- temporal_extent # was: has_actual_return_date - migrated per Rule 53 (2026-01-26)
- is_or_was_based_on
- has_approval_date
- custody_received_by # was: borrower - migrated per Rule 53/56 (2026-01-17)
- has_or_had_contact_point # was: borrower_contact - migrated per Rule 53/56 (2026-01-17)
# MIGRATED 2026-01-22: condition_on_return → is_or_was_returned + ReturnEvent (Rule 53)
- is_or_was_returned
- courier_detail
- courier_required
- has_or_had_custodian_type
- is_or_was_displayed_at
- has_or_had_objective # was: exhibition_ref - migrated per Rule 53 (2026-01-26)
- is_or_was_extended
- insurance_currency
- insurance_provider
- insurance_value
- lender
- lender_contact
- loan_agreement_url
- loan_end_date
- loan_id
- loan_note
- loan_number
- loan_purpose
- loan_start_date
- loan_status
- loan_timespan
- loan_type
- has_or_had_loaned_object
- original_end_date
- outbound_condition_report_url
- request_date
- return_condition_report_url
- shipping_method
- special_requirement
- specificity_annotation
- has_or_had_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
- is_or_was_approved_on
slot_usage:
is_or_was_based_on:
range: Agreement
is_or_was_approved_on:
range: TimeSpan
description: |
The formal agreement governing the loan.
Replaces has_agreement_signed_date per Rule 53.
Date when the loan was approved.
Replaces has_approval_date per Rule 53.
Use start_of_the_start for the approval timestamp.
examples:
- value:
has_or_had_label: "Loan Agreement 2023-001"
is_or_was_signed_on: "2022-03-15"
description: Signed loan agreement
loan_id:
identifier: true
required: true
range: uriorcurie
examples:
- value: https://nde.nl/ontology/hc/loan/mauritshuis-rijksmuseum-2023-001
- value: https://nde.nl/ontology/hc/loan/british-museum-met-2024-003
loan_number:
required: false
range: string
examples:
- value: MH-OUT-2023-0042
description: Mauritshuis outgoing loan number
- value: RM-IN-2023-0127
description: Rijksmuseum incoming loan number
has_or_had_loaned_object:
required: true
range: uriorcurie
multivalued: true
inlined: false
examples:
- value: https://nde.nl/ontology/hc/object/mauritshuis-girl-pearl-earring
- value: https://nde.nl/ontology/hc/object/mauritshuis-view-delft
lender:
required: true
range: uriorcurie
inlined: false
examples:
- value: https://nde.nl/ontology/hc/custodian/nl/mauritshuis
lender_contact:
required: false
range: string
examples:
- value: Dr. Maria van der Berg, Registrar
custody_received_by: # was: borrower - migrated per Rule 53/56 (2026-01-17)
description: >-
Institution borrowing the object(s).
CIDOC-CRM: P29_custody_received_by - identifies the E39 Actor who receives custody.
required: true
range: uriorcurie
inlined: false
examples:
- value: https://nde.nl/ontology/hc/custodian/nl/rijksmuseum
has_or_had_contact_point: # was: borrower_contact - migrated per Rule 53/56 (2026-01-17)
description: >-
Contact person at borrowing institution for this loan.
required: false
range: string
examples:
- value: Anna de Wit, Exhibition Coordinator
loan_status:
required: true
range: LoanStatusEnum
examples:
- value: CLOSED
description: Completed loan
- value: ON_LOAN
description: Object currently at borrower
loan_type:
required: false
range: string
examples:
- value: EXHIBITION_LOAN
- value: STUDY_LOAN
- value: LONG_TERM_LOAN
loan_purpose:
required: false
range: string
examples:
- value: Major Vermeer retrospective exhibition marking 350th anniversary
- value: Technical examination for catalogue raisonné research
request_date:
required: false
range: date
examples:
- value: '2021-06-15'
has_approval_date:
required: false
range: date
examples:
- value: '2021-09-20'
start_of_the_start: "2021-09-20"
description: Approval date
has_agreement_signed_date:
required: false
range: date

View file

@ -0,0 +1,7 @@
classes:
Memento:
class_uri: schema:WebPage
description: "A web archive memento."
slots:
- has_or_had_url
- temporal_extent

View file

@ -0,0 +1,6 @@
classes:
ProvenancePath:
class_uri: prov:Plan
description: "A path or chain of provenance."
slots:
- has_or_had_description

View file

@ -0,0 +1,7 @@
classes:
Reason:
class_uri: skos:Concept
description: "A reason or justification."
slots:
- has_or_had_label
- has_or_had_description

View file

@ -0,0 +1,7 @@
classes:
RecordCycleStatus:
class_uri: skos:Concept
description: "The status of a record within its lifecycle."
slots:
- has_or_had_label
- has_or_had_description

View file

@ -0,0 +1,6 @@
classes:
SearchScore:
class_uri: schema:Rating
description: "A search relevance score."
slots:
- has_or_had_value

View file

@ -79,9 +79,9 @@ classes:
- as:Activity
- schema:ClaimReview
slots:
- has_annotation_motivation
- has_annotation_segment
- has_annotation_type
- has_or_had_rationale
- contains_or_contained
- has_or_had_type
# MIGRATED 2026-01-25: detection_count, detection_threshold → filters_or_filtered (Rule 53)
- filters_or_filtered
# REMOVED 2026-01-22: frame_sample_rate - migrated to analyzes_or_analyzed + VideoFrame + has_or_had_quantity (Rule 53)
@ -94,20 +94,36 @@ classes:
- has_or_had_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
- analyzes_or_analyzed
slot_usage:
has_annotation_type:
range: AnnotationTypeEnum
has_or_had_type:
range: AnnotationType
required: true
description: Type of annotation (Object detection, Scene detection, etc.)
examples:
- value: OBJECT_DETECTION
- value:
has_or_had_code: OBJECT_DETECTION
has_or_had_label: Object Detection
description: Object and face detection annotation
has_annotation_segment:
range: VideoTimeSegment
contains_or_contained:
range: Segment
multivalued: true
required: false
inlined_as_list: true
description: >-
Segments (temporal or spatial) identified by the annotation.
MIGRATED from has_annotation_segment per Rule 53.
examples:
- value: '[{start_seconds: 30.0, end_seconds: 35.0, segment_text: ''Night Watch painting visible''}]'
- value:
has_or_had_label: 'Night Watch painting visible'
has_or_had_description: '30.0 - 35.0 seconds'
description: Object detection segment
has_or_had_rationale:
range: Rationale
required: false
description: Motivation for the annotation.
examples:
- value:
has_or_had_label: ClassifyingMotivation
description: Annotation for classification purposes
# DEPRECATED 2026-01-25: detection_threshold, detection_count → filters_or_filtered + DetectedEntity (Rule 53)
# Old: detection_threshold: 0.5, detection_count: 342
# New: filters_or_filtered with DetectedEntity containing Quantity and DetectionThreshold
@ -146,13 +162,6 @@ classes:
has_or_had_label: "High Precision"
description: "89 high-confidence detections"
# MIGRATED 2026-01-22: frame_sample_rate → analyzes_or_analyzed + VideoFrame + has_or_had_quantity (Rule 53)
# frame_sample_rate:
# range: float
# required: false
# minimum_value: 0.0
# examples:
# - value: 1.0
# description: Analyzed 1 frame per second
analyzes_or_analyzed:
description: |
MIGRATED 2026-01-22: Now supports VideoFrame class for frame_sample_rate migration.
@ -216,12 +225,6 @@ classes:
examples:
- value: false
description: No segmentation masks included
has_annotation_motivation:
range: AnnotationMotivationType
required: false
examples:
- value: ClassifyingMotivation
description: Annotation for classification purposes
comments:
- Abstract base for all CV/multimodal video annotations
- Extends VideoTextContent with frame-based analysis parameters
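
The three annotation slots migrated in this file change shape as well as name: an enum code becomes an `AnnotationType` object, a timed segment becomes a `Segment` with a label and description, and the motivation becomes a `Rationale` with a label. The sketch below mirrors the before and after examples shown in the slot_usage block. The function name and the rule for deriving a label from the enum code are assumptions for illustration only.

```python
def migrate_annotation(record: dict) -> dict:
    """Convert legacy annotation slots to the class-valued shapes used in the diff."""
    migrated = dict(record)

    if "has_annotation_type" in migrated:
        code = migrated.pop("has_annotation_type")
        migrated["has_or_had_type"] = {
            "has_or_had_code": code,
            # Assumption: the label is the code in title case, e.g. "Object Detection".
            "has_or_had_label": code.replace("_", " ").title(),
        }

    if "has_annotation_segment" in migrated:
        segments = migrated.pop("has_annotation_segment")
        migrated["contains_or_contained"] = [
            {
                "has_or_had_label": seg["segment_text"],
                "has_or_had_description": (
                    f"{seg['start_seconds']} - {seg['end_seconds']} seconds"
                ),
            }
            for seg in segments
        ]

    if "has_annotation_motivation" in migrated:
        motivation = migrated.pop("has_annotation_motivation")
        migrated["has_or_had_rationale"] = {"has_or_had_label": motivation}

    return migrated
```

Note that folding the start and end seconds into a free-text description, as the example in the diff does, loses the numeric values; a consumer that needs them would have to parse the string.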

View file

@ -0,0 +1,5 @@
name: has_or_had_altitude
description: The altitude of a place.
slot_uri: wgs84:alt
range: Altitude
multivalued: false

View file

@ -0,0 +1,5 @@
name: has_or_had_annotation
description: An annotation on the entity.
slot_uri: oa:hasAnnotation
range: Annotation
multivalued: true

View file

@ -0,0 +1,5 @@
name: has_or_had_arrangement
description: The arrangement of the collection.
slot_uri: rico:hasArrangement
range: Arrangement
multivalued: true

View file

@ -0,0 +1,5 @@
name: has_or_had_document
description: A document associated with the entity.
slot_uri: foaf:isPrimaryTopicOf
range: ArticlesOfAssociation
multivalued: true

View file

@ -14,7 +14,7 @@ prefixes:
imports:
- linkml:types
- ../classes/XPath
- ../classes/ProvenancePath
default_prefix: hc
slots:
@ -38,7 +38,7 @@ slots:
Typically used within a Provenance class to link the provenance activity
to the specific document location from which data was extracted.
range: XPath
range: ProvenancePath
slot_uri: prov:atLocation
inlined: true
@ -65,4 +65,4 @@ slots:
comments:
- Created from slot_fixes.yaml migration (2026-01-14)
- Replaces direct xpath slot usage with structured path object
- Links Provenance class to XPath class
- Links Provenance class to ProvenancePath class

View file

@ -19,10 +19,11 @@ default_prefix: hc
imports:
- linkml:types
- ../classes/Rationale
slots:
has_or_had_rationale:
slot_uri: hc:hasOrHadRationale
slot_uri: prov:used
description: |
The rationale or justification for a decision or mapping.
@ -33,22 +34,17 @@ slots:
- Explanation notes
**Ontological Alignment**:
- **Primary** (`slot_uri`): `hc:hasOrHadRationale` - Heritage Custodian ObjectProperty
for class-valued Rationale range
- **Primary** (`slot_uri`): `prov:used` (per 2026-01-26 update)
- **Close**: `skos:note` - SKOS note (DatatypeProperty)
- **Close**: `prov:wasInfluencedBy` - PROV-O provenance
**Note**: slot_uri changed from skos:note to hc:hasOrHadRationale (2026-01-16)
to allow class-valued ranges when classes use Rationale class.
range: uriorcurie # Broadened per Rule 55 (2026-01-16) - Any allows both literals and class instances
implements:
- owl:ObjectProperty # Force OWL ObjectProperty to avoid ambiguous type warning (2026-01-16)
range: Rationale
multivalued: true
close_mappings:
- skos:note
- prov:wasInfluencedBy
examples:
- value: "Mapped to Q123456 based on exact name match and location verification"
- value:
has_or_had_label: "Mapped to Q123456 based on exact name match"
description: Wikidata mapping rationale

View file

@ -0,0 +1,5 @@
name: has_or_had_reason
description: The reason for an activity or state.
slot_uri: prov:used
range: Reason
multivalued: true

View file

@ -0,0 +1,5 @@
name: has_or_had_style
description: The style of the entity.
slot_uri: schema:genre
range: ArchitecturalStyle
multivalued: true

View file

@ -0,0 +1,5 @@
name: is_or_was_amended_through
description: The event through which the entity was amended.
slot_uri: prov:wasInfluencedBy
range: AmendmentEvent
multivalued: true

View file

@ -0,0 +1,5 @@
name: is_or_was_approved_on
description: The approval date.
slot_uri: schema:datePublished
range: TimeSpan
multivalued: false

View file

@ -0,0 +1,5 @@
name: is_or_was_archived_as
description: The archived version (memento) of the resource.
slot_uri: schema:archivedAt
range: Memento
multivalued: true

View file

@ -0,0 +1,5 @@
name: is_or_was_due_on
description: The deadline or due date.
slot_uri: schema:endDate
range: TimeSpan
multivalued: false

View file

@ -0,0 +1,5 @@
name: is_or_was_opened_on
description: The opening date of an application or event.
slot_uri: schema:startDate
range: TimeSpan
multivalued: false

View file

@ -0,0 +1,5 @@
name: is_or_was_used_in
description: The context in which something is used.
slot_uri: prov:wasUsedBy
range: GovernanceStructure
multivalued: true

View file

@ -27,10 +27,6 @@ fixes:
type: slot
- label: TimeSpan
type: class
processed:
status: true
date: '2026-01-26'
notes: Migrated to is_or_was_acquired_through + AcquisitionEvent. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_acquisition_date
revision:
@ -48,368 +44,13 @@ fixes:
type: class
processed:
status: true
date: '2026-01-26'
notes: Migrated to temporal_extent + TimeSpan (end_of_the_end) in Loan.yaml. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_actual_return_date
revision:
- label: temporal_extent
type: slot
- label: TimeSpan
type: class
- label: end_of_the_end
type: slot
- label: Timestamp
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_address
revision:
- label: has_or_had_address
type: slot
- label: Address
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_admin_office_description
revision:
- label: has_or_had_description
type: slot
- label: Description
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_admin_office_identifier
revision:
- label: has_or_had_identifier
type: slot
- label: Identifier
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_admin_office_name
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
processed:
status: true
date: '2026-01-26'
notes: Migrated to has_or_had_label + Label in CustodianAdministration.yaml. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_administration_name
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_administrative_level
revision:
- label: is_or_was_part_of
type: slot
- label: GovernmentHierarchy
type: class
- label: has_or_had_tier
type: slot
- label: AdministrativeLevel
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_adoption_context
revision:
- label: describes_or_described
type: slot
- label: Policy
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_age
revision:
- label: has_or_had_age
type: slot
- label: Age
type: class
processed:
status: true
date: '2026-01-26'
notes: Migrated to has_or_had_age + Age in PersonObservation.yaml. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agenda_description
revision:
- label: has_or_had_description
type: slot
- label: Description
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agenda_document_url
revision:
- label: has_or_had_url
type: slot
- label: URL
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agenda_short_name
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
- label: has_or_had_type
type: slot
- label: LabelType
type: class
- label: includes_or_included
type: slot
- label: LabelTypes
type: class
note: AbbreviationLabel class is defined in the LinkML file
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agenda_title
revision:
- label: has_or_had_title
type: slot
- label: Title
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agenda_url
revision:
- label: has_or_had_url
type: slot
- label: URL
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agent_name
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agent_type
revision:
- label: has_or_had_type
type: slot
- label: AgentType
type: class
- label: includes_or_included
type: slot
- label: AgentTypes
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_agreement_signed_date
date: '2026-01-27'
notes: Migrated to is_or_was_approved_on + TimeSpan. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_approval_date
processed:
status: true
date: '2026-01-27'
notes: Fully migrated to is_or_was_based_on + Agreement class + is_or_was_signed_on slot (Rule 53). Loan.yaml updated. Slot archived.
revision:
- label: is_or_was_based_on
type: slot
- label: Agreement
type: class
- label: is_or_was_signed_on
type: slot
- label: TimeSpan
type: class
- label: start_of_the_start
type: slot
- label: Timestamp
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_air_changes_per_hour
processed:
status: true
date: '2026-01-27'
notes: Fully migrated to specifies_or_specified + Ventilation class + AirChanges class (Rule 53). StorageConditionPolicy.yaml updated. Slot archived.
revision:
- label: specifies_or_specified
type: slot
- label: Ventilation
type: class
- label: requires_or_required
type: slot
- label: AirChanges
type: class
- label: has_or_had_quantity
type: slot
- label: Quantity
type: class
- label: has_or_had_unit
type: slot
- label: Unit
type: class
value: air changes per hour
- original_slot_id: https://nde.nl/ontology/hc/slot/has_allocation_date
processed:
status: true
date: '2026-01-27'
notes: Fully migrated to is_or_was_allocated_through + AllocationEvent (Rule 53). CustodianIdentifier.yaml updated. Slot archived.
revision:
- label: is_or_was_allocated_through
type: slot
- label: AllocationEvent
type: class
- label: temporal_extent
type: slot
- label: TimeSpan
type: class
- label: temporal_extent
type: slot
- label: TimeSpan
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_alpha_2_code
processed:
status: true
date: '2026-01-27'
notes: Fully migrated to has_or_had_identifier + Alpha2Code class (Rule 53). Country.yaml updated. Slot archived.
revision:
- label: has_or_had_identifier
type: slot
- label: Alpha2Code
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_alpha_3_code
processed:
status: true
date: '2026-01-27'
notes: Fully migrated to has_or_had_identifier + Alpha3Code class (Rule 53). Country.yaml updated. Slot archived.
revision:
- label: has_or_had_identifier
type: slot
- label: Alpha3Code
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_alpha_3_code_dup
processed:
status: true
date: '2026-01-27'
notes: Duplicate entry processed.
revision:
- label: has_or_had_identifier
type: slot
- label: Alpha3Code
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_altitude
revision:
- label: has_or_had_altitude
type: slot
- label: Altitude
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_amendment_history
revision:
- label: is_or_was_amended_through
type: slot
- label: AmendmentEvent
type: class
- label: has_or_had_provenance
type: slot
- label: Provenance
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annex_description
revision:
- label: has_or_had_description
type: slot
- label: Description
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annex_name
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annex_reason
revision:
- label: is_or_was_created_through
type: slot
- label: AnnexCreationEvent
type: class
- label: has_or_had_reason
type: slot
- label: Reason
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annotation_by
revision:
- label: contains_or_contained
type: slot
- label: Annotation
type: class
- label: is_or_was_created_by
type: slot
- label: Agent
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annotation_motivation
revision:
- label: has_or_had_rationale
type: slot
- label: Rationale
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annotation_segment
revision:
- label: contains_or_contained
type: slot
- label: Segment
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_annotation_type
revision:
- label: has_or_had_type
type: slot
- label: AnnotationType
type: class
- label: includes_or_included
type: slot
- label: AnnotationTypes
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_api_version
revision:
- label: has_or_had_provenance
type: slot
- label: Provenance
type: class
- label: is_or_was_retrieved_through
type: slot
- label: APIRequest
type: class
- label: has_or_had_endpoint
type: slot
- label: APIEndpoint
type: class
- label: has_or_had_version
type: slot
- label: APIVersion
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_appellation_language
revision:
- label: has_or_had_language
type: slot
- label: Language
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_appellation_type
revision:
- label: has_or_had_type
type: slot
- label: AppellationType
type: class
- label: includes_or_included
type: slot
- label: AppellationTypes
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_appellation_value
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_applicable_country
revision:
- label: is_or_was_applicable_in
type: slot
- label: Country
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_application_deadline
revision:
- label: is_or_was_due_on
type: slot
- label: TimeSpan
type: class
- label: end_of_the_end
type: slot
- label: Timestamp
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_application_opening_date
revision:
- label: is_or_was_opened_on
type: slot
- label: TimeSpan
type: class
- label: start_of_the_start
type: slot
- label: Timestamp
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_appraisal_note
revision:
- label: has_or_had_note
type: slot
- label: Note
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_approval_date
notes: Fully migrated to is_or_was_approved_on + TimeSpan (Rule 53). Loan.yaml and Budget.yaml updated. Slot archived.
revision:
- label: is_or_was_approved_on
type: slot
@@ -419,6 +60,10 @@ nda_description
type: slot
- label: Timestamp
type: class
- label: start_of_the_start
type: slot
- label: Timestamp
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_archdiocese_name
revision:
- label: is_or_was_part_of
@@ -469,94 +114,31 @@ nda_description
type: slot
- label: URL
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_archive_name
revision:
- label: has_or_had_label
type: slot
- label: Label
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_archive_path
revision:
- label: has_or_had_provenance
type: slot
- label: Provenance
type: class
- label: has_or_had_provenance_path
type: slot
- label: ProvenancePath
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_archive_search_score
revision:
- label: has_or_had_score
type: slot
- label: SearchScore
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_arrangement
revision:
- label: has_or_had_arrangement
type: slot
- label: Arrangement
type: class
- label: has_or_had_type
type: slot
- label: ArrangementType
type: class
- label: includes_or_included
type: slot
- label: ArrangementTypes
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_arrangement_level
revision:
- label: has_or_had_arrangement
type: slot
- label: Arrangement
type: class
- label: has_or_had_type
type: slot
- label: ArrangementType
type: class
- label: includes_or_included
type: slot
- label: ArrangementTypes
type: class
- label: has_or_had_level
type: slot
- label: ArrangementLevel
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_arrangement_note
revision:
- label: has_or_had_arrangement
type: slot
- label: Arrangement
type: class
- label: has_or_had_note
type: slot
- label: Note
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_articles_archival_stage
revision:
- label: has_or_had_status
type: slot
- label: RecordCycleStatus
type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/has_articles_document_format
revision:
- label: has_or_had_format
type: slot
- label: DocumentFormat
type: class
processed:
status: true
date: '2026-01-27'
notes: Migrated to has_or_had_format + DocumentFormat in ArticlesOfAssociation.yaml. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_articles_document_url
revision:
- label: has_or_had_url
type: slot
- label: URL
type: class
processed:
status: true
date: '2026-01-27'
notes: Migrated to has_or_had_url + URL in ArticlesOfAssociation.yaml. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_articles_of_association
revision:
- label: has_or_had_document
type: slot
- label: ArticlesOfAssociation
type: class
processed:
status: true
date: '2026-01-27'
notes: Migrated to has_or_had_document + ArticlesOfAssociation in relevant classes. Slot archived.
- original_slot_id: https://nde.nl/ontology/hc/slot/has_aspect_ratio
revision:
- label: has_or_had_degree
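
The registry entries above share one shape: an `original_slot_id` key, a `revision` list of `label`/`type` pairs, and an optional `processed` block with `status`, `date`, and `notes`. A minimal sketch of a consistency check for that shape follows. It assumes the entries have already been loaded into plain Python dicts (for example with a YAML parser); the function name `check_entry` and the first sample entry are hypothetical and not part of the repository.

```python
EXPECTED_KEY = "original_slot_id"
PROCESSED_FIELDS = ("status", "date", "notes")
ALLOWED_TYPES = ("slot", "class")


def check_entry(entry):
    """Return a list of human-readable problems found in one registry entry."""
    problems = []

    # Flag a missing identifier key, and report a near-miss spelling if one exists.
    if EXPECTED_KEY not in entry:
        near = [key for key in entry if key.endswith("_slot_id")]
        if near:
            problems.append(f"misspelled key {near[0]!r}, expected {EXPECTED_KEY!r}")
        else:
            problems.append(f"missing {EXPECTED_KEY!r}")

    # Every revision item should be typed as either a slot or a class.
    for item in entry.get("revision", []):
        if item.get("type") not in ALLOWED_TYPES:
            problems.append(
                f"revision item {item.get('label')!r} has unexpected type {item.get('type')!r}"
            )

    # A processed block, when present, should carry all three fields.
    processed = entry.get("processed")
    if processed is not None:
        for field in PROCESSED_FIELDS:
            if field not in processed:
                problems.append(f"processed block lacks {field!r}")

    return problems


entries = [
    # Hypothetical entry with a misspelled identifier key.
    {
        "orignal_slot_id": "https://example.org/slot/has_example",
        "revision": [
            {"label": "has_or_had_label", "type": "slot"},
            {"label": "Label", "type": "class"},
        ],
    },
    # Entry shaped like the processed has_articles_document_url record.
    {
        "original_slot_id": "https://nde.nl/ontology/hc/slot/has_articles_document_url",
        "revision": [
            {"label": "has_or_had_url", "type": "slot"},
            {"label": "URL", "type": "class"},
        ],
        "processed": {
            "status": True,
            "date": "2026-01-27",
            "notes": "Migrated to has_or_had_url + URL in ArticlesOfAssociation.yaml. Slot archived.",
        },
    },
]

for entry in entries:
    for problem in check_entry(entry):
        print(problem)
```

The check only reports problems and never drops entries, which keeps it compatible with the rule that registry entries must not be deleted.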

@@ -43,39 +43,17 @@ def update_manifest(add_files, remove_files):
if __name__ == "__main__":
# Define files to add
add_files = [
{"name": "AccessApplication", "path": "modules/classes/AccessApplication.yaml", "category": "class"},
{"name": "AccessInterface", "path": "modules/classes/AccessInterface.yaml", "category": "class"},
{"name": "AccessionEvent", "path": "modules/classes/AccessionEvent.yaml", "category": "class"},
{"name": "Accumulation", "path": "modules/classes/Accumulation.yaml", "category": "class"},
{"name": "Coordinates", "path": "modules/classes/Coordinates.yaml", "category": "class"},
{"name": "AcquisitionEvent", "path": "modules/classes/AcquisitionEvent.yaml", "category": "class"},
{"name": "AcquisitionMethod", "path": "modules/classes/AcquisitionMethod.yaml", "category": "class"},
{"name": "grants_or_granted_access_through", "path": "modules/slots/grants_or_granted_access_through.yaml", "category": "slot"},
{"name": "has_or_had_interface", "path": "modules/slots/has_or_had_interface.yaml", "category": "slot"},
{"name": "is_or_was_accessioned_through", "path": "modules/slots/is_or_was_accessioned_through.yaml", "category": "slot"},
{"name": "has_or_had_accumulation", "path": "modules/slots/has_or_had_accumulation.yaml", "category": "slot"},
{"name": "has_or_had_coordinates", "path": "modules/slots/has_or_had_coordinates.yaml", "category": "slot"},
{"name": "is_or_was_acquired_through", "path": "modules/slots/is_or_was_acquired_through.yaml", "category": "slot"},
{"name": "was_acquired_through", "path": "modules/slots/was_acquired_through.yaml", "category": "slot"},
{"name": "has_or_had_method", "path": "modules/slots/has_or_had_method.yaml", "category": "slot"},
{"name": "RecordCycleStatus", "path": "modules/classes/RecordCycleStatus.yaml", "category": "class"},
{"name": "DocumentFormat", "path": "modules/classes/DocumentFormat.yaml", "category": "class"},
{"name": "has_or_had_document", "path": "modules/slots/has_or_had_document.yaml", "category": "slot"},
]
# Define files to remove (archived slots)
remove_files = [
"has_access_application_url",
"has_access_interface_url",
"has_accession_date",
"has_accession_number",
"has_accumulation_end_date",
"has_accumulation_start_date",
"has_accuracy_in_meters",
"has_acquisition_date",
"has_acquisition_history",
"has_acquisition_method",
"has_acquisition_source",
"has_activity_description",
"has_activity_identifier",
"has_activity_name",
"has_articles_archival_stage",
"has_articles_document_format",
"has_articles_document_url",
"has_articles_of_association"
]
update_manifest(add_files, remove_files)
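
The body of `update_manifest` is not shown in this hunk, so the following is only a sketch of how such an update could behave, not the script's actual implementation. It assumes the manifest is a list of dicts with the same `name`, `path`, and `category` keys used in `add_files`, and that removals match on `name`. The function name `apply_manifest_changes` and the path given for the archived slot are hypothetical.

```python
def apply_manifest_changes(manifest, add_files, remove_files):
    """Return a new manifest list with removals applied first, then additions."""
    to_remove = set(remove_files)

    # Drop archived entries by name.
    kept = [entry for entry in manifest if entry["name"] not in to_remove]

    # Append new entries, skipping names that are already present so that
    # running the update twice does not create duplicates.
    present = {entry["name"] for entry in kept}
    for entry in add_files:
        if entry["name"] not in present:
            kept.append(dict(entry))
            present.add(entry["name"])
    return kept


# Hypothetical starting manifest; the path of the archived slot is an assumption.
manifest = [
    {"name": "has_articles_of_association",
     "path": "modules/slots/has_articles_of_association.yaml",
     "category": "slot"},
    {"name": "DocumentFormat",
     "path": "modules/classes/DocumentFormat.yaml",
     "category": "class"},
]
add_files = [
    {"name": "DocumentFormat",
     "path": "modules/classes/DocumentFormat.yaml",
     "category": "class"},
    {"name": "has_or_had_document",
     "path": "modules/slots/has_or_had_document.yaml",
     "category": "slot"},
]
remove_files = ["has_articles_of_association"]

updated = apply_manifest_changes(manifest, add_files, remove_files)
print([entry["name"] for entry in updated])
```

Matching on `name` and skipping names already present makes the operation idempotent, so re-running it with the same lists leaves the manifest unchanged.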