# Specificity Score System - External Dependencies

## Overview

This document lists the external dependencies required for the specificity score system. Dependencies are categorized by purpose and include both required and optional packages.

## Required Dependencies

### Core Python Packages

These packages are essential for the specificity score system to function:
| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `pydantic` | >=2.0 | Score model validation and structured output | pydantic |
| `pyyaml` | >=6.0 | LinkML schema parsing, template definitions | PyYAML |
| `dspy-ai` | >=2.6 | Template classification, RAG integration | dspy-ai |
| `linkml` | >=1.6 | Schema validation, annotations access | linkml |
### Already in Project

These packages are already in `pyproject.toml` and will be available:

```toml
# From pyproject.toml
dependencies = [
    "pydantic>=2.0",
    "pyyaml>=6.0",
    "dspy-ai>=2.6",
    "linkml>=1.6",
]
```
## Optional Dependencies

### Schema Processing (Recommended)

For batch processing of LinkML schema annotations:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `linkml-runtime` | >=1.6 | Runtime schema loading and traversal | linkml-runtime |
| `linkml-validator` | >=0.5 | Validate annotated schemas | linkml-validator |
**Usage Example:**

```python
from linkml_runtime import SchemaView

# Load schema and access annotations
schema = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")

# Get specificity score for a class
archive_class = schema.get_class("Archive")
specificity = archive_class.annotations.get("specificity_score")
rationale = archive_class.annotations.get("specificity_rationale")

print(f"Archive specificity: {specificity.value}")
# Output: Archive specificity: 0.75
```
**Installation:**

```bash
pip install linkml-runtime linkml-validator
```
### Caching (Recommended)

For caching computed scores during RAG retrieval:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `cachetools` | >=5.0 | In-memory LRU cache for scores | cachetools |
| `diskcache` | >=5.6 | Persistent disk cache for large deployments | diskcache |
**Usage Example:**

```python
from cachetools import TTLCache

# Cache with 1-hour TTL, max 1000 entries
_score_cache = TTLCache(maxsize=1000, ttl=3600)

def cached_template_score(class_name: str, template_id: str) -> float:
    """Get template-specific score with caching."""
    cache_key = f"{template_id}:{class_name}"
    if cache_key in _score_cache:
        return _score_cache[cache_key]
    score = compute_template_score(class_name, template_id)
    _score_cache[cache_key] = score
    return score
```
**Installation:**

```bash
pip install cachetools diskcache
```
### UML Visualization (Optional)

For generating filtered UML diagrams based on specificity scores:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `graphviz` | >=0.20 | DOT graph generation for UML | graphviz |
| `pydot` | >=1.4 | DOT file parsing and manipulation | pydot |
| `plantuml` | >=0.3 | PlantUML diagram generation | plantuml |
**Usage Example:**

```python
from graphviz import Digraph
from linkml_runtime import SchemaView

def create_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
) -> Digraph:
    """Generate UML with classes filtered by specificity threshold."""
    dot = Digraph(comment=f"Heritage Ontology - {template_id}")
    dot.attr(rankdir="TB", splines="ortho")
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            # Add node with an alpha channel derived from the score
            opacity = int(score * 255)
            color = f"#4A90D9{opacity:02X}"
            dot.node(class_name, fillcolor=color, style="filled")
    return dot
```
**System Dependency:**

```bash
# macOS
brew install graphviz

# Ubuntu/Debian
sudo apt-get install graphviz

# Windows
choco install graphviz
```
**Installation:**

```bash
pip install graphviz pydot plantuml
```
### Monitoring & Observability (Optional)

For production monitoring of score calculations:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `prometheus-client` | >=0.17 | Metrics collection for score usage | prometheus-client |
| `structlog` | >=23.0 | Structured logging for score decisions | structlog |
**Usage Example:**

```python
from prometheus_client import Counter, Histogram

# Track template classification distribution
TEMPLATE_COUNTER = Counter(
    "specificity_template_classifications_total",
    "Number of questions classified per template",
    ["template_id"],
)

# Track score computation latency
SCORE_LATENCY = Histogram(
    "specificity_score_computation_seconds",
    "Time to compute specificity scores",
    ["score_type"],  # "general" or "template"
)

def classify_with_metrics(question: str) -> str:
    """Classify question and record metrics."""
    with SCORE_LATENCY.labels(score_type="template").time():
        template_id = classify_template(question)
    TEMPLATE_COUNTER.labels(template_id=template_id).inc()
    return template_id
```
**Installation:**

```bash
pip install prometheus-client structlog
```
## External Services

### Required Services

| Service | Endpoint | Purpose |
|---|---|---|
| None | - | Specificity scoring is self-contained |
The specificity score system is fully self-contained and does not require external services. All scores are computed from:
- Static annotations in LinkML schema files
- In-memory template definitions
- DSPy classification (optional LLM backend)
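As a concrete illustration of this self-contained lookup, the fallback behavior might be sketched as follows. The class names, score values, and the `resolve_score` helper are hypothetical, not part of the project's actual API:

```python
from typing import Optional

# Hypothetical data: general scores read from static schema annotations,
# and template-specific overrides from in-memory template definitions.
GENERAL_SCORES = {"Archive": 0.75, "Monument": 0.60}
TEMPLATE_SCORES = {("custodian_lookup", "Archive"): 0.90}

def resolve_score(class_name: str, template_id: Optional[str] = None) -> float:
    """Prefer a template-specific score; fall back to the general annotation."""
    if template_id is not None:
        override = TEMPLATE_SCORES.get((template_id, class_name))
        if override is not None:
            return override
    return GENERAL_SCORES.get(class_name, 0.0)
```

Because everything is an in-memory lookup, this path works even when no LLM backend is configured for DSPy.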
### Optional Services

| Service | Endpoint | Purpose |
|---|---|---|
| Qdrant Vector DB | http://localhost:6333 | RAG integration for score-weighted retrieval |
| Oxigraph SPARQL | http://localhost:7878/query | Schema metadata queries |
| LLM API (OpenAI, Z.AI) | Varies | DSPy template classification |
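For score-weighted retrieval against a vector store such as Qdrant, the re-ranking step can be sketched independently of any client library. The hit shape and the `alpha` blend below are illustrative assumptions, not the project's actual retriever interface:

```python
from typing import Dict, List, Tuple

def reweight_hits(
    hits: List[Tuple[str, float]],   # (class_name, vector similarity) pairs
    specificity: Dict[str, float],   # class_name -> specificity score
    alpha: float = 0.7,              # weight given to vector similarity
) -> List[Tuple[str, float]]:
    """Blend similarity with specificity and sort by the combined score."""
    combined = [
        (name, alpha * sim + (1.0 - alpha) * specificity.get(name, 0.0))
        for name, sim in hits
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)
```

With `alpha=0.5`, a highly specific class can outrank a slightly more similar but generic one, which is the point of score-weighted retrieval.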
## Project Files Required

### Existing Files

These files must exist for the specificity score system to function:

| File | Purpose | Status |
|---|---|---|
| `schemas/20251121/linkml/01_custodian_name.yaml` | Main schema with annotations | ✅ Exists |
| `schemas/20251121/linkml/modules/classes/*.yaml` | 304 class YAML files to annotate | ✅ Exists |
| `backend/rag/dspy_heritage_rag.py` | RAG integration point | ✅ Exists |
| `docs/plan/specificity_score/04-prompt-conversation-templates.md` | Template definitions | ✅ Exists |
### New Files to Create

| File | Purpose | Status |
|---|---|---|
| `backend/rag/specificity_scorer.py` | Score calculation engine | ❌ To create |
| `backend/rag/template_classifier.py` | DSPy template classifier | ❌ To create |
| `backend/rag/specificity_aware_retriever.py` | Score-weighted retrieval | ❌ To create |
| `data/validation/specificity_scores.json` | Cached general scores | ❌ To create |
| `tests/rag/test_specificity_scorer.py` | Unit tests | ❌ To create |
| `scripts/annotate_specificity_scores.py` | Batch annotation script | ❌ To create |
## pyproject.toml Updates

Add optional dependencies for specificity scoring:

```toml
[project.optional-dependencies]

# Core specificity scoring
specificity = [
    "linkml-runtime>=1.6",
    "cachetools>=5.0",
]

# Full specificity system with visualization
specificity-full = [
    "linkml-runtime>=1.6",
    "linkml-validator>=0.5",
    "cachetools>=5.0",
    "diskcache>=5.6",
    "graphviz>=0.20",
    "pydot>=1.4",
]

# Specificity with monitoring
specificity-monitored = [
    "linkml-runtime>=1.6",
    "cachetools>=5.0",
    "prometheus-client>=0.17",
    "structlog>=23.0",
]
```
**Installation:**

```bash
# Minimal specificity support
pip install -e ".[specificity]"

# Full specificity support with visualization
pip install -e ".[specificity-full]"

# Specificity with production monitoring
pip install -e ".[specificity-monitored]"
```
## Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `SPECIFICITY_CACHE_TTL` | `3600` | Cache TTL in seconds |
| `SPECIFICITY_DEFAULT_THRESHOLD` | `0.5` | Default filtering threshold |
| `SPECIFICITY_TEMPLATE_FALLBACK` | `general_heritage` | Fallback template ID |
| `SPECIFICITY_ENABLE_METRICS` | `false` | Enable Prometheus metrics |
| `ZAI_API_TOKEN` | (required for DSPy) | Z.AI API token for classification |
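Reading these variables with their documented defaults might look like the following; the `SpecificityConfig` dataclass and `load_config` helper are illustrative, not existing project code:

```python
import os
from dataclasses import dataclass

@dataclass
class SpecificityConfig:
    cache_ttl: int
    default_threshold: float
    template_fallback: str
    enable_metrics: bool

def load_config() -> SpecificityConfig:
    """Read the variables above, falling back to their documented defaults."""
    return SpecificityConfig(
        cache_ttl=int(os.environ.get("SPECIFICITY_CACHE_TTL", "3600")),
        default_threshold=float(os.environ.get("SPECIFICITY_DEFAULT_THRESHOLD", "0.5")),
        template_fallback=os.environ.get("SPECIFICITY_TEMPLATE_FALLBACK", "general_heritage"),
        enable_metrics=os.environ.get("SPECIFICITY_ENABLE_METRICS", "false").lower() == "true",
    )
```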
## Version Compatibility Matrix
| Python | LinkML | DSPy | Pydantic | Status |
|---|---|---|---|---|
| 3.11+ | 1.6+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.10 | 1.6+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.9 | 1.5+ | 2.5+ | 2.0+ | ⚠️ Limited |
| <3.9 | - | - | - | ❌ Not supported |
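A startup guard that enforces this floor could be sketched as follows; the `check_python_version` helper and its error message are hypothetical:

```python
import sys

def check_python_version(minimum: tuple = (3, 10)) -> None:
    """Fail fast when the interpreter is below the supported floor."""
    if sys.version_info[:2] < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ is required for full "
            f"specificity support; found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
```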
## Docker Considerations

If deploying in Docker, ensure these are in the Dockerfile:
```dockerfile
# System dependencies for graphviz (if using UML visualization)
RUN apt-get update && apt-get install -y graphviz && rm -rf /var/lib/apt/lists/*

# Python dependencies (version pins quoted so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "pydantic>=2.0" \
    "pyyaml>=6.0" \
    "dspy-ai>=2.6" \
    "linkml>=1.6" \
    "linkml-runtime>=1.6" \
    "cachetools>=5.0"

# Optional: graphviz Python bindings
# RUN pip install "graphviz>=0.20" "pydot>=1.4"
```
## Dependency Security
All recommended packages are actively maintained and have no known critical CVEs as of 2025-01.
| Package | Last Updated | Security Status |
|---|---|---|
| pydantic | 2024-12 | ✅ No known CVEs |
| linkml | 2024-12 | ✅ No known CVEs |
| linkml-runtime | 2024-12 | ✅ No known CVEs |
| dspy-ai | 2025-01 | ✅ No known CVEs |
| cachetools | 2024-11 | ✅ No known CVEs |
Run a security audit with:

```bash
pip-audit --requirement requirements.txt
```
## Dependency Graph

```text
specificity_scorer.py
├── linkml-runtime (schema loading)
│   └── pyyaml
├── pydantic (data models)
├── cachetools (performance)
└── dspy-ai (classification)
    └── httpx (LLM API calls)

specificity_aware_retriever.py
├── specificity_scorer.py
├── qdrant-client (vector store)
└── numpy (score calculations)

uml_visualizer.py (optional)
├── graphviz
├── pydot
└── specificity_scorer.py
```
## Summary

**Minimum viable installation:**

```bash
pip install pydantic pyyaml linkml linkml-runtime
```

**Recommended installation:**

```bash
pip install pydantic pyyaml linkml linkml-runtime cachetools dspy-ai
```

**Full installation (with visualization and monitoring):**

```bash
pip install pydantic pyyaml linkml linkml-runtime linkml-validator cachetools diskcache dspy-ai graphviz pydot prometheus-client structlog
```
## References

- `docs/plan/prompt-query_template_mapping/external-dependencies.md` - Related dependencies
- `docs/plan/specificity_score/03-rag-dspy-integration.md` - DSPy integration details
- `pyproject.toml` - Current project dependencies