# Specificity Score System - External Dependencies

## Overview

This document lists the external dependencies required for the specificity score system. Dependencies are categorized by purpose and include both required and optional packages.

## Required Dependencies

### Core Python Packages

These packages are essential for the specificity score system to function:
| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `pydantic` | >=2.0 | Score model validation and structured output | pydantic |
| `pyyaml` | >=6.0 | LinkML schema parsing, template definitions | PyYAML |
| `dspy-ai` | >=2.6 | Template classification, RAG integration | dspy-ai |
| `linkml` | >=1.6 | Schema validation, annotations access | linkml |
### Already in Project

These packages are already in `pyproject.toml` and will be available:

```toml
# From pyproject.toml
dependencies = [
    "pydantic>=2.0",
    "pyyaml>=6.0",
    "dspy-ai>=2.6",
    "linkml>=1.6",
]
```
## Optional Dependencies

### Schema Processing (Recommended)

For batch processing of LinkML schema annotations:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `linkml-runtime` | >=1.6 | Runtime schema loading and traversal | linkml-runtime |
| `linkml-validator` | >=0.5 | Validate annotated schemas | linkml-validator |
**Usage Example:**

```python
from linkml_runtime import SchemaView

# Load schema and access annotations
schema = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")

# Get specificity score for a class
archive_class = schema.get_class("Archive")
specificity = archive_class.annotations.get("specificity_score")
rationale = archive_class.annotations.get("specificity_rationale")

print(f"Archive specificity: {specificity.value}")
# Output: Archive specificity: 0.75
```
**Installation:**

```bash
pip install linkml-runtime linkml-validator
```
### Caching (Recommended)

For caching computed scores during RAG retrieval:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `cachetools` | >=5.0 | In-memory LRU cache for scores | cachetools |
| `diskcache` | >=5.6 | Persistent disk cache for large deployments | diskcache |
**Usage Example:**

```python
from cachetools import TTLCache

# Cache with 1-hour TTL, max 1000 entries
_score_cache = TTLCache(maxsize=1000, ttl=3600)

def cached_template_score(class_name: str, template_id: str) -> float:
    """Get template-specific score with caching."""
    cache_key = f"{template_id}:{class_name}"
    if cache_key in _score_cache:
        return _score_cache[cache_key]
    score = compute_template_score(class_name, template_id)
    _score_cache[cache_key] = score
    return score
```
**Installation:**

```bash
pip install cachetools diskcache
```
### UML Visualization (Optional)

For generating filtered UML diagrams based on specificity scores:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `graphviz` | >=0.20 | DOT graph generation for UML | graphviz |
| `pydot` | >=1.4 | DOT file parsing and manipulation | pydot |
| `plantuml` | >=0.3 | PlantUML diagram generation | plantuml |
**Usage Example:**

```python
from graphviz import Digraph
from linkml_runtime import SchemaView

def create_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
) -> Digraph:
    """Generate UML with classes filtered by specificity threshold."""
    dot = Digraph(comment=f"Heritage Ontology - {template_id}")
    dot.attr(rankdir="TB", splines="ortho")
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            # Add node with an alpha channel derived from the score
            opacity = int(score * 255)
            color = f"#4A90D9{opacity:02X}"
            dot.node(class_name, fillcolor=color, style="filled")
    return dot
```
**System Dependency:**

```bash
# macOS
brew install graphviz

# Ubuntu/Debian
sudo apt-get install graphviz

# Windows
choco install graphviz
```
**Installation:**

```bash
pip install graphviz pydot plantuml
```
### Monitoring & Observability (Optional)

For production monitoring of score calculations:

| Package | Version | Purpose | PyPI |
|---|---|---|---|
| `prometheus-client` | >=0.17 | Metrics collection for score usage | prometheus-client |
| `structlog` | >=23.0 | Structured logging for score decisions | structlog |
**Usage Example:**

```python
from prometheus_client import Counter, Histogram

# Track template classification distribution
TEMPLATE_COUNTER = Counter(
    "specificity_template_classifications_total",
    "Number of questions classified per template",
    ["template_id"],
)

# Track score computation latency
SCORE_LATENCY = Histogram(
    "specificity_score_computation_seconds",
    "Time to compute specificity scores",
    ["score_type"],  # "general" or "template"
)

def classify_with_metrics(question: str) -> str:
    """Classify question and record metrics."""
    with SCORE_LATENCY.labels(score_type="template").time():
        template_id = classify_template(question)
    TEMPLATE_COUNTER.labels(template_id=template_id).inc()
    return template_id
```
**Installation:**

```bash
pip install prometheus-client structlog
```
## External Services

### Required Services

| Service | Endpoint | Purpose |
|---|---|---|
| None | - | Specificity scoring is self-contained |
The specificity score system is fully self-contained and does not require external services. All scores are computed from:
- Static annotations in LinkML schema files
- In-memory template definitions
- DSPy classification (optional LLM backend)
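As a concrete illustration of this self-contained lookup, the fallback behavior might be sketched as follows. The class names, score values, and the `resolve_score` helper are hypothetical, not part of the project's actual API:

```python
from typing import Optional

# Hypothetical data: general scores read from static schema annotations,
# and template-specific overrides from in-memory template definitions.
GENERAL_SCORES = {"Archive": 0.75, "Monument": 0.60}
TEMPLATE_SCORES = {("custodian_lookup", "Archive"): 0.90}

def resolve_score(class_name: str, template_id: Optional[str] = None) -> float:
    """Prefer a template-specific score; fall back to the general annotation."""
    if template_id is not None:
        override = TEMPLATE_SCORES.get((template_id, class_name))
        if override is not None:
            return override
    return GENERAL_SCORES.get(class_name, 0.0)
```

Because everything is an in-memory lookup, this path works even when no LLM backend is configured for DSPy.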
### Optional Services

| Service | Endpoint | Purpose |
|---|---|---|
| Qdrant Vector DB | http://localhost:6333 | RAG integration for score-weighted retrieval |
| Oxigraph SPARQL | http://localhost:7878/query | Schema metadata queries |
| LLM API (OpenAI, Z.AI) | Varies | DSPy template classification |
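For score-weighted retrieval against a vector store such as Qdrant, the re-ranking step can be sketched independently of any client library. The hit shape and the `alpha` blend below are illustrative assumptions, not the project's actual retriever interface:

```python
from typing import Dict, List, Tuple

def reweight_hits(
    hits: List[Tuple[str, float]],   # (class_name, vector similarity) pairs
    specificity: Dict[str, float],   # class_name -> specificity score
    alpha: float = 0.7,              # weight given to vector similarity
) -> List[Tuple[str, float]]:
    """Blend similarity with specificity and sort by the combined score."""
    combined = [
        (name, alpha * sim + (1.0 - alpha) * specificity.get(name, 0.0))
        for name, sim in hits
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)
```

With `alpha=0.5`, a highly specific class can outrank a slightly more similar but generic one, which is the point of score-weighted retrieval.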
## Project Files Required

### Existing Files

These files must exist for the specificity score system to function:

| File | Purpose | Status |
|---|---|---|
| `schemas/20251121/linkml/01_custodian_name.yaml` | Main schema with annotations | ✅ Exists |
| `schemas/20251121/linkml/modules/classes/*.yaml` | 304 class YAML files to annotate | ✅ Exists |
| `backend/rag/dspy_heritage_rag.py` | RAG integration point | ✅ Exists |
| `docs/plan/specificity_score/04-prompt-conversation-templates.md` | Template definitions | ✅ Exists |
### New Files to Create

| File | Purpose | Status |
|---|---|---|
| `backend/rag/specificity_scorer.py` | Score calculation engine | ❌ To create |
| `backend/rag/template_classifier.py` | DSPy template classifier | ❌ To create |
| `backend/rag/specificity_aware_retriever.py` | Score-weighted retrieval | ❌ To create |
| `data/validation/specificity_scores.json` | Cached general scores | ❌ To create |
| `tests/rag/test_specificity_scorer.py` | Unit tests | ❌ To create |
| `scripts/annotate_specificity_scores.py` | Batch annotation script | ❌ To create |
## pyproject.toml Updates

Add optional dependencies for specificity scoring:

```toml
[project.optional-dependencies]

# Core specificity scoring
specificity = [
    "linkml-runtime>=1.6",
    "cachetools>=5.0",
]

# Full specificity system with visualization
specificity-full = [
    "linkml-runtime>=1.6",
    "linkml-validator>=0.5",
    "cachetools>=5.0",
    "diskcache>=5.6",
    "graphviz>=0.20",
    "pydot>=1.4",
]

# Specificity with monitoring
specificity-monitored = [
    "linkml-runtime>=1.6",
    "cachetools>=5.0",
    "prometheus-client>=0.17",
    "structlog>=23.0",
]
```
**Installation:**

```bash
# Minimal specificity support
pip install -e ".[specificity]"

# Full specificity support with visualization
pip install -e ".[specificity-full]"

# Specificity with production monitoring
pip install -e ".[specificity-monitored]"
```
## Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `SPECIFICITY_CACHE_TTL` | `3600` | Cache TTL in seconds |
| `SPECIFICITY_DEFAULT_THRESHOLD` | `0.5` | Default filtering threshold |
| `SPECIFICITY_TEMPLATE_FALLBACK` | `general_heritage` | Fallback template ID |
| `SPECIFICITY_ENABLE_METRICS` | `false` | Enable Prometheus metrics |
| `ZAI_API_TOKEN` | (required for DSPy) | Z.AI API token for classification |
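Reading these variables with their documented defaults might look like the following; the `SpecificityConfig` dataclass and `load_config` helper are illustrative, not existing project code:

```python
import os
from dataclasses import dataclass

@dataclass
class SpecificityConfig:
    cache_ttl: int
    default_threshold: float
    template_fallback: str
    enable_metrics: bool

def load_config() -> SpecificityConfig:
    """Read the variables above, falling back to their documented defaults."""
    return SpecificityConfig(
        cache_ttl=int(os.environ.get("SPECIFICITY_CACHE_TTL", "3600")),
        default_threshold=float(os.environ.get("SPECIFICITY_DEFAULT_THRESHOLD", "0.5")),
        template_fallback=os.environ.get("SPECIFICITY_TEMPLATE_FALLBACK", "general_heritage"),
        enable_metrics=os.environ.get("SPECIFICITY_ENABLE_METRICS", "false").lower() == "true",
    )
```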
## Version Compatibility Matrix
| Python | LinkML | DSPy | Pydantic | Status |
|---|---|---|---|---|
| 3.11+ | 1.6+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.10 | 1.6+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.9 | 1.5+ | 2.5+ | 2.0+ | ⚠️ Limited |
| <3.9 | - | - | - | ❌ Not supported |
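A startup guard that enforces this floor could be sketched as follows; the `check_python_version` helper and its error message are hypothetical:

```python
import sys

def check_python_version(minimum: tuple = (3, 10)) -> None:
    """Fail fast when the interpreter is below the supported floor."""
    if sys.version_info[:2] < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ is required for full "
            f"specificity support; found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
```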
## Docker Considerations

If deploying in Docker, ensure these are in the Dockerfile:
```dockerfile
# System dependencies for graphviz (if using UML visualization)
RUN apt-get update && apt-get install -y graphviz && rm -rf /var/lib/apt/lists/*

# Python dependencies (version pins quoted so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "pydantic>=2.0" \
    "pyyaml>=6.0" \
    "dspy-ai>=2.6" \
    "linkml>=1.6" \
    "linkml-runtime>=1.6" \
    "cachetools>=5.0"

# Optional: graphviz Python bindings
# RUN pip install "graphviz>=0.20" "pydot>=1.4"
```
## Dependency Security
All recommended packages are actively maintained and have no known critical CVEs as of 2025-01.
| Package | Last Updated | Security Status |
|---|---|---|
| pydantic | 2024-12 | ✅ No known CVEs |
| linkml | 2024-12 | ✅ No known CVEs |
| linkml-runtime | 2024-12 | ✅ No known CVEs |
| dspy-ai | 2025-01 | ✅ No known CVEs |
| cachetools | 2024-11 | ✅ No known CVEs |
Run a security audit with:

```bash
pip-audit --requirement requirements.txt
```
## Dependency Graph

```text
specificity_scorer.py
├── linkml-runtime (schema loading)
│   └── pyyaml
├── pydantic (data models)
├── cachetools (performance)
└── dspy-ai (classification)
    └── httpx (LLM API calls)

specificity_aware_retriever.py
├── specificity_scorer.py
├── qdrant-client (vector store)
└── numpy (score calculations)

uml_visualizer.py (optional)
├── graphviz
├── pydot
└── specificity_scorer.py
```
## Summary

**Minimum viable installation:**

```bash
pip install pydantic pyyaml linkml linkml-runtime
```

**Recommended installation:**

```bash
pip install pydantic pyyaml linkml linkml-runtime cachetools dspy-ai
```

**Full installation (with visualization and monitoring):**

```bash
pip install pydantic pyyaml linkml linkml-runtime linkml-validator cachetools diskcache dspy-ai graphviz pydot prometheus-client structlog
```
## References

- `docs/plan/prompt-query_template_mapping/external-dependencies.md` - Related dependencies
- `docs/plan/specificity_score/03-rag-dspy-integration.md` - DSPy integration details
- `pyproject.toml` - Current project dependencies