# Specificity Score System - External Dependencies

## Overview

This document lists the external dependencies required for the specificity score system. Dependencies are categorized by purpose and include both required and optional packages.

> **INTEGRATION NOTE**: This document has been updated to reflect the **existing infrastructure** in the codebase. Several components listed as "to create" already exist and should be **extended** rather than recreated.

---

## Required Dependencies

### Core Python Packages

These packages are essential for the specificity score system to function:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `pydantic` | >=2.0 | Score model validation and structured output | [pydantic](https://pypi.org/project/pydantic/) |
| `pyyaml` | >=6.0 | LinkML schema parsing, template definitions | [PyYAML](https://pypi.org/project/PyYAML/) |
| `dspy-ai` | >=2.6 | Template classification, RAG integration | [dspy-ai](https://pypi.org/project/dspy-ai/) |
| `linkml` | >=1.6 | Schema validation, annotations access | [linkml](https://pypi.org/project/linkml/) |

### Already in Project

These packages are already declared in `pyproject.toml` and will be available:

```toml
# From pyproject.toml
dependencies = [
    "pydantic>=2.0",
    "pyyaml>=6.0",
    "dspy-ai>=2.6",
    "linkml>=1.6",
]
```

---

## Optional Dependencies

### Schema Processing (Recommended)

For batch processing of LinkML schema annotations:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `linkml-runtime` | >=1.6 | Runtime schema loading and traversal | [linkml-runtime](https://pypi.org/project/linkml-runtime/) |
| `linkml-validator` | >=0.5 | Validate annotated schemas | [linkml-validator](https://pypi.org/project/linkml-validator/) |

**Usage Example:**

```python
from linkml_runtime import SchemaView

# Load schema and access annotations
schema = SchemaView("schemas/20251121/linkml/01_custodian_name.yaml")

# Get specificity score for a class
archive_class = \
    schema.get_class("Archive")
specificity = archive_class.annotations.get("specificity_score")
rationale = archive_class.annotations.get("specificity_rationale")

print(f"Archive specificity: {specificity.value}")
# Output: Archive specificity: 0.75
```

**Installation:**

```bash
pip install linkml-runtime linkml-validator
```

---

### Caching (Recommended)

For caching computed scores during RAG retrieval:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `cachetools` | >=5.0 | In-memory LRU cache for scores | [cachetools](https://pypi.org/project/cachetools/) |
| `diskcache` | >=5.6 | Persistent disk cache for large deployments | [diskcache](https://pypi.org/project/diskcache/) |

**Usage Example:**

```python
from cachetools import TTLCache

# Cache with 1-hour TTL, max 1000 entries
_score_cache = TTLCache(maxsize=1000, ttl=3600)

def cached_template_score(class_name: str, template_id: str) -> float:
    """Get template-specific score with caching."""
    cache_key = f"{template_id}:{class_name}"
    if cache_key in _score_cache:
        return _score_cache[cache_key]
    score = compute_template_score(class_name, template_id)
    _score_cache[cache_key] = score
    return score
```

**Installation:**

```bash
pip install cachetools diskcache
```

---

### UML Visualization (Optional)

For generating filtered UML diagrams based on specificity scores:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `graphviz` | >=0.20 | DOT graph generation for UML | [graphviz](https://pypi.org/project/graphviz/) |
| `pydot` | >=1.4 | DOT file parsing and manipulation | [pydot](https://pypi.org/project/pydot/) |
| `plantuml` | >=0.3 | PlantUML diagram generation | [plantuml](https://pypi.org/project/plantuml/) |

**Usage Example:**

```python
from graphviz import Digraph
from linkml_runtime import SchemaView

def create_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> Digraph:
    """Generate UML with classes filtered by specificity threshold."""
    dot \
        = Digraph(comment=f"Heritage Ontology - {template_id}")
    dot.attr(rankdir="TB", splines="ortho")

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            # Add node with opacity based on score
            opacity = int(score * 255)
            color = f"#4A90D9{opacity:02X}"
            dot.node(class_name, fillcolor=color, style="filled")

    return dot
```

**System Dependency:**

```bash
# macOS
brew install graphviz

# Ubuntu/Debian
sudo apt-get install graphviz

# Windows
choco install graphviz
```

**Installation:**

```bash
pip install graphviz pydot plantuml
```

---

### Monitoring & Observability (Optional)

For production monitoring of score calculations:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `prometheus-client` | >=0.17 | Metrics collection for score usage | [prometheus-client](https://pypi.org/project/prometheus-client/) |
| `structlog` | >=23.0 | Structured logging for score decisions | [structlog](https://pypi.org/project/structlog/) |

**Usage Example:**

```python
from prometheus_client import Counter, Histogram

# Track template classification distribution
TEMPLATE_COUNTER = Counter(
    "specificity_template_classifications_total",
    "Number of questions classified per template",
    ["template_id"]
)

# Track score computation latency
SCORE_LATENCY = Histogram(
    "specificity_score_computation_seconds",
    "Time to compute specificity scores",
    ["score_type"]  # "general" or "template"
)

def classify_with_metrics(question: str) -> str:
    """Classify question and record metrics."""
    with SCORE_LATENCY.labels(score_type="template").time():
        template_id = classify_template(question)
    TEMPLATE_COUNTER.labels(template_id=template_id).inc()
    return template_id
```

**Installation:**

```bash
pip install prometheus-client structlog
```

---

## External Services

### Required Services

| Service | Endpoint | Purpose |
|---------|----------|---------|
| None | - | Specificity scoring is self-contained |

The
specificity score system is **fully self-contained** and does not require external services. All scores are computed from:

1. Static annotations in LinkML schema files
2. In-memory template definitions
3. DSPy classification (optional LLM backend)

### Optional Services

| Service | Endpoint | Purpose |
|---------|----------|---------|
| Qdrant Vector DB | `http://localhost:6333` | RAG integration for score-weighted retrieval |
| Oxigraph SPARQL | `http://localhost:7878/query` | Schema metadata queries |
| LLM API (OpenAI, Z.AI) | Varies | DSPy template classification |

---

## Project Files Required

### Existing Files (DO NOT RECREATE)

These files **already exist** and provide the foundation for specificity scoring:

| File | Purpose | Status |
|------|---------|--------|
| `backend/rag/template_sparql.py` | **TemplateClassifier** (line 1104), **SlotExtractor**, **ConversationContextResolver** | ✅ Exists - EXTEND |
| `backend/rag/template_sparql.py:634` | **TemplateClassifierSignature** (DSPy Signature) | ✅ Exists - EXTEND |
| `data/sparql_templates.yaml` | SPARQL template definitions (11+ templates) | ✅ Exists - EXTEND |
| `schemas/20251121/linkml/01_custodian_name.yaml` | Main schema with annotations | ✅ Exists |
| `schemas/20251121/linkml/modules/classes/*.yaml` | 304 class YAML files to annotate | ✅ Exists |
| `backend/rag/dspy_heritage_rag.py` | RAG integration point | ✅ Exists |
| `docs/plan/specificity_score/04-prompt-conversation-templates.md` | Template definitions | ✅ Exists |

### New Files to Create

| File | Purpose | Status |
|------|---------|--------|
| `backend/rag/specificity_scorer.py` | Score calculation engine | ❌ To create |
| `backend/rag/sparql_to_context_mapper.py` | Maps SPARQL templates → Context templates | ❌ To create |
| `backend/rag/specificity_lookup.py` | Reads scores from LinkML annotations | ❌ To create |
| `backend/rag/specificity_aware_retriever.py` | Score-weighted retrieval | ❌ To create |
| `data/validation/specificity_scores.json` | Cached general scores | ❌ To create |
| `tests/rag/test_specificity_scorer.py` | Unit tests | ❌ To create |
| `scripts/annotate_specificity_scores.py` | Batch annotation script | ❌ To create |

### Key Integration Points

The existing `TemplateClassifier` in `backend/rag/template_sparql.py:1104` already:

- Classifies questions to SPARQL template IDs
- Extracts slots (institution_type, location, etc.)
- Uses DSPy for classification

**New code should WRAP this classifier**, not replace it:

```python
# backend/rag/specificity_aware_classifier.py
from backend.rag.template_sparql import TemplateClassifier

class SpecificityAwareClassifier:
    """Wraps existing TemplateClassifier with specificity score lookup."""

    def __init__(self, base_classifier: TemplateClassifier, specificity_lookup):
        self.base_classifier = base_classifier
        self.specificity_lookup = specificity_lookup

    def classify_with_scores(self, question: str) -> ClassificationWithScores:
        # Use existing classifier
        result = self.base_classifier.classify(question)

        # Map SPARQL template → context template
        context_template = self._map_to_context_template(
            result.template_id, result.slots
        )

        # Look up specificity scores for context template
        scores = self.specificity_lookup.get_scores(context_template)

        return ClassificationWithScores(
            sparql_template=result.template_id,
            context_template=context_template,
            slots=result.slots,
            class_scores=scores
        )
```

---

## pyproject.toml Updates

Add optional dependencies for specificity scoring:

```toml
[project.optional-dependencies]
# Core specificity scoring
specificity = [
    "linkml-runtime>=1.6",
    "cachetools>=5.0",
]

# Full specificity system with visualization
specificity-full = [
    "linkml-runtime>=1.6",
    "linkml-validator>=0.5",
    "cachetools>=5.0",
    "diskcache>=5.6",
    "graphviz>=0.20",
    "pydot>=1.4",
]

# Specificity with monitoring
specificity-monitored = [
    "linkml-runtime>=1.6",
    "cachetools>=5.0",
    "prometheus-client>=0.17",
    "structlog>=23.0",
]
```
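Because caching and visualization ship as optional extras, runtime code should guard those imports and degrade gracefully when an extra is not installed. A minimal sketch of that pattern, assuming the `TTLCache` parameters from the caching example above (the `make_score_cache` helper and `_NoCache` fallback are illustrative names, not existing project code):

```python
# Graceful degradation when the optional "specificity" extra is absent.
# make_score_cache and _NoCache are illustrative, not part of the codebase.
try:
    from cachetools import TTLCache
    HAVE_CACHETOOLS = True
except ImportError:
    HAVE_CACHETOOLS = False


class _NoCache(dict):
    """Unbounded dict fallback, used only when cachetools is unavailable."""


def make_score_cache(maxsize: int = 1000, ttl: int = 3600):
    """Return a TTL-bounded cache if cachetools is installed, else a plain dict."""
    if HAVE_CACHETOOLS:
        return TTLCache(maxsize=maxsize, ttl=ttl)
    return _NoCache()


# Usage: score lookups work identically with either backing store
cache = make_score_cache()
cache["temple_names:Archive"] = 0.75
```

Keeping the fallback behind a single factory function means callers never need to know which extra is installed.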
**Installation:**

```bash
# Minimal specificity support
pip install -e ".[specificity]"

# Full specificity support with visualization
pip install -e ".[specificity-full]"

# Specificity with production monitoring
pip install -e ".[specificity-monitored]"
```

---

## Environment Variables

| Variable | Default | Purpose |
|----------|---------|---------|
| `SPECIFICITY_CACHE_TTL` | `3600` | Cache TTL in seconds |
| `SPECIFICITY_DEFAULT_THRESHOLD` | `0.5` | Default filtering threshold |
| `SPECIFICITY_TEMPLATE_FALLBACK` | `general_heritage` | Fallback template ID |
| `SPECIFICITY_ENABLE_METRICS` | `false` | Enable Prometheus metrics |
| `ZAI_API_TOKEN` | (required for DSPy) | Z.AI API token for classification |

---

## Version Compatibility Matrix

| Python | LinkML | DSPy | Pydantic | Status |
|--------|--------|------|----------|--------|
| 3.11+ | 1.6+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.10 | 1.6+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.9 | 1.5+ | 2.5+ | 2.0+ | ⚠️ Limited |
| <3.9 | - | - | - | ❌ Not supported |

---

## Docker Considerations

If deploying in Docker, ensure these are in the Dockerfile (note the quotes around version specifiers, which prevent the shell from treating `>=` as a redirect):

```dockerfile
# System dependencies for graphviz (if using UML visualization)
RUN apt-get update && apt-get install -y graphviz && rm -rf /var/lib/apt/lists/*

# Python dependencies
RUN pip install --no-cache-dir \
    "pydantic>=2.0" \
    "pyyaml>=6.0" \
    "dspy-ai>=2.6" \
    "linkml>=1.6" \
    "linkml-runtime>=1.6" \
    "cachetools>=5.0"

# Optional: graphviz Python bindings
# RUN pip install "graphviz>=0.20" "pydot>=1.4"
```

---

## Dependency Security

All recommended packages are actively maintained and have no known critical CVEs as of 2025-01.
| Package | Last Updated | Security Status |
|---------|--------------|-----------------|
| pydantic | 2024-12 | ✅ No known CVEs |
| linkml | 2024-12 | ✅ No known CVEs |
| linkml-runtime | 2024-12 | ✅ No known CVEs |
| dspy-ai | 2025-01 | ✅ No known CVEs |
| cachetools | 2024-11 | ✅ No known CVEs |

Run a security audit:

```bash
pip-audit --requirement requirements.txt
```

---

## Dependency Graph

```
specificity_scorer.py
├── linkml-runtime (schema loading)
│   └── pyyaml
├── pydantic (data models)
├── cachetools (performance)
└── dspy-ai (classification)
    └── httpx (LLM API calls)

specificity_aware_retriever.py
├── specificity_scorer.py
├── qdrant-client (vector store)
└── numpy (score calculations)

uml_visualizer.py (optional)
├── graphviz
├── pydot
└── specificity_scorer.py
```

---

## Summary

**Minimum viable installation:**

```bash
pip install pydantic pyyaml linkml linkml-runtime
```

**Recommended installation:**

```bash
pip install pydantic pyyaml linkml linkml-runtime cachetools dspy-ai
```

**Full installation (with visualization and monitoring):**

```bash
pip install pydantic pyyaml linkml linkml-runtime linkml-validator cachetools diskcache dspy-ai graphviz pydot prometheus-client structlog
```

---

## References

- `docs/plan/prompt-query_template_mapping/external-dependencies.md` - Related dependencies
- `docs/plan/specificity_score/03-rag-dspy-integration.md` - DSPy integration details
- `pyproject.toml` - Current project dependencies
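As a quick sanity check of the minimum viable installation listed in the Summary, a small stdlib-only sketch can report which required distributions are importable and at what version (the `check_minimal_install` helper is illustrative, not an existing script):

```python
# Smoke test for the "Minimum viable installation" dependency set.
# check_minimal_install is an illustrative helper, not project code.
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version

# Distribution name -> import name (they differ for PyYAML and linkml-runtime)
REQUIRED = {
    "pydantic": "pydantic",
    "PyYAML": "yaml",
    "linkml": "linkml",
    "linkml-runtime": "linkml_runtime",
}


def check_minimal_install() -> dict:
    """Return {distribution: version or None} without raising on missing packages."""
    found = {}
    for dist, module in REQUIRED.items():
        try:
            import_module(module)
            found[dist] = version(dist)
        except (ImportError, PackageNotFoundError):
            found[dist] = None  # report the gap instead of crashing
    return found


if __name__ == "__main__":
    for dist, ver in check_minimal_install().items():
        print(f"{dist}: {ver or 'MISSING'}")
```

Running this after `pip install` gives a one-line-per-package report, which is easier to act on than the first `ImportError` raised deep inside application code.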