External Dependencies
Overview
This document lists the external dependencies required for the template-based SPARQL query generation system. Dependencies are categorized by purpose and include both required and optional packages.
Required Dependencies
Core Python Packages
These packages are essential for the template system to function:
| Package | Version | Purpose | PyPI |
|---|---|---|---|
| pydantic | >=2.0 | Structured output validation, slot schemas | pydantic |
| pyyaml | >=6.0 | Template definition loading | PyYAML |
| dspy-ai | >=2.6 | DSPy framework for template classification | dspy-ai |
| httpx | >=0.25 | SPARQL endpoint HTTP client | httpx |
| jinja2 | >=3.0 | Template instantiation engine | Jinja2 |
Already in Project
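These packages are already declared in pyproject.toml and need no further action. jinja2 is not among them; it is added through the optional dependency groups described under pyproject.toml Updates.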
# From pyproject.toml
dependencies = [
    "pydantic>=2.0",
    "pyyaml>=6.0",
    "dspy-ai>=2.6",
    "httpx>=0.25",
]
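To confirm at startup that the installed packages satisfy these minimums, a check along the following lines may help. It is only a sketch: `MINIMUMS`, `parse_version`, and `find_outdated` are hypothetical names, and the version parsing is deliberately naive (leading numeric components only). A production check would more likely rely on the `packaging` library.

```python
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions copied from the dependency list above (hypothetical constant).
MINIMUMS: Dict[str, Tuple[int, ...]] = {
    "pydantic": (2, 0),
    "pyyaml": (6, 0),
    "dspy-ai": (2, 6),
    "httpx": (0, 25),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Naively parse the leading numeric components of a version string."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            # Stop at suffixes such as "rc1" or "post2".
            break
    return tuple(parts)


def find_outdated(minimums: Dict[str, Tuple[int, ...]]) -> Dict[str, str]:
    """Return {package: problem} for missing or too-old distributions."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        parsed = parse_version(installed)
        # Pad so that "2" compares equal to (2, 0).
        parsed += (0,) * (len(minimum) - len(parsed))
        if parsed < minimum:
            problems[name] = f"{installed} is older than required"
    return problems


if __name__ == "__main__":
    print(find_outdated(MINIMUMS) or "all minimums satisfied")
```

An empty result means every listed package is present at or above its floor.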
Optional Dependencies
Fuzzy Matching (Recommended)
For improved slot value resolution when user input doesn't exactly match enum values:
| Package | Version | Purpose | PyPI |
|---|---|---|---|
| rapidfuzz | >=3.0 | Fast fuzzy string matching for slot values | rapidfuzz |
| python-Levenshtein | >=0.21 | Optional Levenshtein API (itself built on rapidfuzz); not required for rapidfuzz performance | python-Levenshtein |
Usage Example:
from rapidfuzz import fuzz, process

# Match user input to valid province names
PROVINCES = ["Noord-Holland", "Zuid-Holland", "Utrecht", "Drenthe", "Gelderland"]

def match_province(user_input: str, threshold: float = 70.0) -> str | None:
    """Fuzzy match user input to a valid province, ignoring case."""
    result = process.extractOne(
        user_input,
        PROVINCES,
        scorer=fuzz.WRatio,
        # rapidfuzz >=3.0 no longer preprocesses strings by default,
        # so lowercase both sides explicitly.
        processor=str.lower,
        score_cutoff=threshold,
    )
    # extractOne returns (choice, score, index), or None below the cutoff
    return result[0] if result else None

# Examples
match_province("drente")       # -> "Drenthe"
match_province("N-Holland")    # -> "Noord-Holland"
match_province("zuudholland")  # -> "Zuid-Holland"
Installation:
pip install rapidfuzz python-Levenshtein
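Because rapidfuzz is optional, slot resolution probably needs a path for environments where it is not installed. One possible fallback is sketched here with the standard library's `difflib`. The function name `match_slot_value` and the 0.7 cutoff are assumptions, and difflib's ratio is on a 0 to 1 scale and is not numerically identical to `fuzz.WRatio`.

```python
import difflib
from typing import List, Optional


def match_slot_value(
    user_input: str,
    choices: List[str],
    cutoff: float = 0.7,
) -> Optional[str]:
    """Case-insensitive closest match using only the standard library."""
    # Map lowercased forms back to their canonical spelling.
    canonical = {choice.lower(): choice for choice in choices}
    hits = difflib.get_close_matches(
        user_input.lower(),
        list(canonical),
        n=1,            # only the best candidate is needed
        cutoff=cutoff,  # similarity ratio in [0, 1]
    )
    return canonical[hits[0]] if hits else None


PROVINCES = ["Noord-Holland", "Zuid-Holland", "Utrecht", "Drenthe", "Gelderland"]

if __name__ == "__main__":
    print(match_slot_value("drente", PROVINCES))
    print(match_slot_value("xyz", PROVINCES))
```

The fallback trades matching quality for zero extra dependencies, so it may be reasonable to log which backend is in use.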
Semantic Similarity (Optional)
For intent classification when questions don't match patterns exactly:
| Package | Version | Purpose | PyPI |
|---|---|---|---|
| sentence-transformers | >=2.2 | Semantic similarity for template matching | sentence-transformers |
Usage Example:
from sentence_transformers import SentenceTransformer, util

# Load multilingual model for Dutch/English
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Template question patterns
PATTERNS = [
    "Welke archieven zijn er in {province}?",
    "Hoeveel musea zijn er in Nederland?",
    "Wat is het oudste archief?",
]

# Encode the fixed patterns once rather than on every call
PATTERN_EMBEDDINGS = model.encode(PATTERNS)

def find_best_template(question: str, threshold: float = 0.7) -> int | None:
    """Find best matching template by semantic similarity."""
    question_embedding = model.encode(question)
    similarities = util.cos_sim(question_embedding, PATTERN_EMBEDDINGS)[0]
    best_idx = similarities.argmax().item()
    best_score = similarities[best_idx].item()
    return best_idx if best_score >= threshold else None

# Example
find_best_template("Welke archieven heeft Drenthe?")  # -> 0 if the score clears the threshold
Installation:
pip install sentence-transformers
Note: This adds ~500MB of model weights. Only use if DSPy classification is insufficient.
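For readers unfamiliar with what `util.cos_sim` followed by `argmax` computes, the same selection rule can be written out on toy vectors with the standard library alone. This is only an illustration: the three-dimensional vectors are invented, and `cosine` and `best_match` are hypothetical helpers, not part of sentence-transformers.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    threshold: float = 0.7,
) -> Optional[int]:
    """Index of the most similar candidate, or None if it scores below threshold."""
    if not candidates:
        return None
    scores = [cosine(query, candidate) for candidate in candidates]
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx if scores[best_idx] >= threshold else None


# Invented embeddings standing in for three encoded patterns.
TOY_PATTERNS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]

if __name__ == "__main__":
    print(best_match([0.9, 0.1, 0.0], TOY_PATTERNS))  # closest to the first pattern
    print(best_match([0.0, 0.0, 1.0], TOY_PATTERNS))  # orthogonal to all: no match
```

The threshold is what keeps an unrelated question from being forced onto the nearest template.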
SPARQL Validation (Optional)
For deeper SPARQL syntax validation beyond regex:
| Package | Version | Purpose | PyPI |
|---|---|---|---|
| rdflib | >=6.0 | RDF/SPARQL parsing and validation | rdflib |
Usage Example:
from pyparsing import ParseException  # pyparsing is installed as an rdflib dependency
from rdflib.plugins.sparql import prepareQuery

def validate_sparql_syntax(query: str) -> tuple[bool, str | None]:
    """Validate SPARQL syntax using rdflib parser."""
    try:
        prepareQuery(query)
        return True, None
    except ParseException as e:
        return False, str(e)

# Example
valid, error = validate_sparql_syntax("""
    PREFIX hc: <https://nde.nl/ontology/hc/>
    SELECT ?s WHERE { ?s a hc:Custodian }
""")
# -> (True, None)
Installation:
pip install rdflib
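Since rdflib is optional, callers may prefer a validator that degrades gracefully when it is missing instead of failing at import time. The following sketch shows one way to do that; `validate_if_available` is a hypothetical name, and returning `None` as the first element to signal "not checked" is an assumed convention rather than anything the project prescribes.

```python
from typing import Optional, Tuple


def validate_if_available(query: str) -> Tuple[Optional[bool], Optional[str]]:
    """Return (True, None), (False, error), or (None, reason) if rdflib is absent."""
    try:
        # Imported lazily so the module loads without the optional dependency.
        from rdflib.plugins.sparql import prepareQuery
    except ImportError:
        return None, "rdflib not installed; syntax check skipped"
    try:
        prepareQuery(query)
    except Exception as exc:  # parser errors surface as pyparsing exceptions
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    print(validate_if_available("SELECT ?s WHERE { ?s ?p ?o }"))
    print(validate_if_available("SELECT WHERE {"))
```

Callers can then treat `None` as "fall back to the regex-based checks" rather than as a failure.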
External Services
Required Services
| Service | Endpoint | Purpose |
|---|---|---|
| Oxigraph SPARQL | http://localhost:7878/query | SPARQL query execution |
| Qdrant Vector DB | http://localhost:6333 | Semantic search fallback |
Service Availability Checks
import httpx

async def check_sparql_endpoint(
    endpoint: str = "http://localhost:7878/query",
    timeout: float = 5.0,
) -> bool:
    """Check if SPARQL endpoint is available."""
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(
                endpoint.replace("/query", "/"),
                timeout=timeout,
            )
            return response.status_code == 200
    except Exception:
        return False

async def check_qdrant(
    host: str = "localhost",
    port: int = 6333,
    timeout: float = 5.0,
) -> bool:
    """Check if Qdrant is available."""
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"http://{host}:{port}/",
                timeout=timeout,
            )
            return response.status_code == 200
    except Exception:
        return False
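The two checks are independent, so they can be awaited concurrently at startup. The sketch below shows one way to do that with `asyncio.gather`. `run_health_checks` is a hypothetical helper, and the stand-in coroutines `fake_up`, `fake_down`, and `fake_error` take the place of `check_sparql_endpoint` and `check_qdrant` so that the example runs without any service.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def run_health_checks(
    checks: Dict[str, Callable[[], Awaitable[bool]]],
) -> Dict[str, bool]:
    """Run all checks concurrently and map each name to its availability."""
    names = list(checks)
    results = await asyncio.gather(
        *(checks[name]() for name in names),
        return_exceptions=True,  # one failing check must not cancel the others
    )
    # Anything other than a literal True (including an exception) counts as down.
    return {name: result is True for name, result in zip(names, results)}


# Stand-ins for the real service checks (hypothetical).
async def fake_up() -> bool:
    return True


async def fake_down() -> bool:
    return False


async def fake_error() -> bool:
    raise RuntimeError("connection refused")


if __name__ == "__main__":
    status = asyncio.run(
        run_health_checks({"sparql": fake_up, "qdrant": fake_down})
    )
    print(status)
```

In real use the dictionary would map `"sparql"` and `"qdrant"` to the two check functions defined above.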
Project Files Required
Existing Files
These files must exist for the template system to function:
| File | Purpose | Status |
|---|---|---|
| data/validation/sparql_validation_rules.json | Slot enum values (provinces, types) | ✅ Exists |
| backend/rag/ontology_mapping.py | Entity extraction, fuzzy matching | ✅ Exists |
| src/glam_extractor/api/sparql_linter.py | SPARQL validation/correction | ✅ Exists |
| backend/rag/dspy_heritage_rag.py | Integration point | ✅ Exists |
New Files to Create
| File | Purpose | Status |
|---|---|---|
| backend/rag/template_sparql.py | Template loading, classification, instantiation | ❌ To create |
| data/sparql_templates.yaml | Template definitions | ❌ To create |
| tests/rag/test_template_sparql.py | Unit tests | ❌ To create |
pyproject.toml Updates
Add optional dependency groups for the template system:
[project.optional-dependencies]
# Template-based SPARQL generation
sparql-templates = [
    "rapidfuzz>=3.0",
    "python-Levenshtein>=0.21",
    "jinja2>=3.0",
]
# Full template system with semantic matching
sparql-templates-full = [
    "rapidfuzz>=3.0",
    "python-Levenshtein>=0.21",
    "jinja2>=3.0",
    "sentence-transformers>=2.2",
    "rdflib>=6.0",
]
Installation:
# Minimal template support
pip install -e ".[sparql-templates]"
# Full template support with semantic matching
pip install -e ".[sparql-templates-full]"
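Code that uses these extras may want to detect at runtime which ones were actually installed rather than fail on import. A minimal sketch using `importlib.util.find_spec` follows. The `OPTIONAL_MODULES` mapping and the function names are assumptions for illustration; note that import names can differ from distribution names (`sentence_transformers` versus `sentence-transformers`).

```python
from importlib.util import find_spec
from typing import Dict

# Feature name -> top-level import name (hypothetical mapping).
OPTIONAL_MODULES: Dict[str, str] = {
    "fuzzy_matching": "rapidfuzz",
    "semantic_matching": "sentence_transformers",
    "sparql_validation": "rdflib",
}


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


def available_features(modules: Dict[str, str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Map each optional feature to whether its backing module is present."""
    return {feature: is_installed(module) for feature, module in modules.items()}


if __name__ == "__main__":
    for feature, present in available_features().items():
        print(f"{feature}: {'available' if present else 'missing'}")
```

Checking with `find_spec` avoids paying the import cost of heavy packages such as sentence-transformers just to learn whether they exist.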
Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| SPARQL_ENDPOINT | http://localhost:7878/query | SPARQL endpoint URL |
| QDRANT_HOST | localhost | Qdrant host |
| QDRANT_PORT | 6333 | Qdrant port |
| TEMPLATE_CONFIDENCE_THRESHOLD | 0.7 | Minimum confidence for template use |
| ENABLE_FUZZY_MATCHING | true | Enable rapidfuzz for slot matching |
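One way these variables could be read, with the defaults from the table applied, is sketched below. The `TemplateSettings` dataclass and `load_settings` function are hypothetical, and the set of strings accepted as "true" for `ENABLE_FUZZY_MATCHING` is an assumption, since the table only gives the default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TemplateSettings:
    sparql_endpoint: str
    qdrant_host: str
    qdrant_port: int
    confidence_threshold: float
    enable_fuzzy_matching: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> TemplateSettings:
    """Build settings from the environment, falling back to documented defaults."""
    env = os.environ if env is None else env
    fuzzy_raw = env.get("ENABLE_FUZZY_MATCHING", "true")
    return TemplateSettings(
        sparql_endpoint=env.get("SPARQL_ENDPOINT", "http://localhost:7878/query"),
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        confidence_threshold=float(env.get("TEMPLATE_CONFIDENCE_THRESHOLD", "0.7")),
        # Assumption: "1", "true", and "yes" (any case) enable the flag.
        enable_fuzzy_matching=fuzzy_raw.strip().lower() in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in unit tests.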
Version Compatibility Matrix
| Python | DSPy | Pydantic | Status |
|---|---|---|---|
| 3.11+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.10 | 2.6+ | 2.0+ | ✅ Supported |
| 3.9 | 2.5+ | 2.0+ | ⚠️ Limited (no match statements) |
| <3.9 | - | - | ❌ Not supported |
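The matrix could be enforced with a small guard at import time. The sketch below maps an interpreter version to the status column above; `support_level` is a hypothetical helper, and the three return strings are invented labels for the table's Supported, Limited, and Not supported rows.

```python
import sys
from typing import Optional, Sequence


def support_level(version: Optional[Sequence[int]] = None) -> str:
    """Classify a Python version according to the compatibility matrix."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor >= (3, 10):
        return "supported"
    if major_minor == (3, 9):
        return "limited"  # no match statements
    return "unsupported"


if __name__ == "__main__":
    level = support_level()
    if level == "unsupported":
        raise SystemExit("Python 3.9 or newer is required")
    print(f"Running on Python {sys.version_info[0]}.{sys.version_info[1]}: {level}")
```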
Docker Considerations
If deploying in Docker, ensure these are in the Dockerfile:
# Python dependencies (specifiers are quoted so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "pydantic>=2.0" \
    "pyyaml>=6.0" \
    "dspy-ai>=2.6" \
    "httpx>=0.25" \
    "jinja2>=3.0" \
    "rapidfuzz>=3.0"
# Optional: sentence-transformers (adds ~500MB)
# RUN pip install "sentence-transformers>=2.2"
Dependency Security
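All recommended packages are actively maintained, and their current releases had no known critical CVEs as of 2025-06. The version floors listed in this document also admit older releases, so audit the versions that are actually resolved in your environment.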
| Package | Last Updated | Security Status |
|---|---|---|
| pydantic | 2025-05 | ✅ No known CVEs |
| rapidfuzz | 2025-06 | ✅ No known CVEs |
| dspy-ai | 2025-06 | ✅ No known CVEs |
| jinja2 | 2025-04 | ✅ No known CVEs |
Run security audit:
pip-audit --requirement requirements.txt
Summary
Minimum viable installation:
pip install pydantic pyyaml dspy-ai httpx jinja2
Recommended installation:
pip install pydantic pyyaml dspy-ai httpx jinja2 rapidfuzz python-Levenshtein
Full installation (with semantic matching):
pip install pydantic pyyaml dspy-ai httpx jinja2 rapidfuzz python-Levenshtein sentence-transformers rdflib