# External Dependencies

## Overview

This document lists the external dependencies required for the template-based SPARQL query generation system. Dependencies are grouped by purpose and include both required and optional packages.

## Required Dependencies

### Core Python Packages

These packages are essential for the template system to function:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `pydantic` | >=2.0 | Structured output validation, slot schemas | [pydantic](https://pypi.org/project/pydantic/) |
| `pyyaml` | >=6.0 | Template definition loading | [PyYAML](https://pypi.org/project/PyYAML/) |
| `dspy-ai` | >=2.6 | DSPy framework for template classification | [dspy-ai](https://pypi.org/project/dspy-ai/) |
| `httpx` | >=0.25 | SPARQL endpoint HTTP client | [httpx](https://pypi.org/project/httpx/) |
| `jinja2` | >=3.0 | Template instantiation engine | [Jinja2](https://pypi.org/project/Jinja2/) |

### Already in Project

The following packages are already declared in `pyproject.toml` and will be available. `jinja2` is not yet listed there; the pyproject.toml Updates section below adds it through the optional extras.

```toml
# From pyproject.toml
dependencies = [
    "pydantic>=2.0",
    "pyyaml>=6.0",
    "dspy-ai>=2.6",
    "httpx>=0.25",
]
```

## Optional Dependencies

### Fuzzy Matching (Recommended)

For better slot value resolution when user input does not exactly match enum values:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `rapidfuzz` | >=3.0 | Fast fuzzy string matching for slot values | [rapidfuzz](https://pypi.org/project/rapidfuzz/) |
| `python-Levenshtein` | >=0.21 | Standalone `Levenshtein` module for direct edit-distance calls (rapidfuzz ships its own C++ implementation and does not require it) | [python-Levenshtein](https://pypi.org/project/python-Levenshtein/) |

**Usage Example:**

```python
from rapidfuzz import fuzz, process, utils

# Valid province names to match user input against
PROVINCES = ["Noord-Holland", "Zuid-Holland", "Utrecht", "Drenthe", "Gelderland"]


def match_province(user_input: str, threshold: float = 70.0) -> str | None:
    """Fuzzy match user input to a valid province name."""
    result = process.extractOne(
        user_input,
        PROVINCES,
        scorer=fuzz.WRatio,
        # rapidfuzz >= 3.0 applies no preprocessing by default; lowercase and
        # strip punctuation so that e.g. "drente" can match "Drenthe"
        processor=utils.default_process,
        score_cutoff=threshold,
    )
    # extractOne returns (choice, score, index), or None if nothing reaches the cutoff
    return result[0] if result else None


# Examples
match_province("drente")       # -> "Drenthe"
match_province("N-Holland")    # -> "Noord-Holland"
match_province("zuudholland")  # -> "Zuid-Holland"
```

**Installation:**

```bash
pip install rapidfuzz python-Levenshtein
```

### Semantic Similarity (Optional)

For intent classification when questions do not match patterns exactly:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `sentence-transformers` | >=2.2 | Semantic similarity for template matching | [sentence-transformers](https://pypi.org/project/sentence-transformers/) |

**Usage Example:**

```python
from sentence_transformers import SentenceTransformer, util

# Multilingual model that handles both Dutch and English
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Template question patterns
PATTERNS = [
    "Welke archieven zijn er in {province}?",
    "Hoeveel musea zijn er in Nederland?",
    "Wat is het oudste archief?",
]

# Encode the fixed patterns once instead of on every call
PATTERN_EMBEDDINGS = model.encode(PATTERNS, convert_to_tensor=True)


def find_best_template(question: str, threshold: float = 0.7) -> int | None:
    """Find the best matching template index by cosine similarity."""
    question_embedding = model.encode(question, convert_to_tensor=True)
    similarities = util.cos_sim(question_embedding, PATTERN_EMBEDDINGS)[0]
    best_idx = similarities.argmax().item()
    best_score = similarities[best_idx].item()
    return best_idx if best_score >= threshold else None


# Example
find_best_template("Welke archieven heeft Drenthe?")  # -> 0
```

**Installation:**

```bash
pip install sentence-transformers
```

**Note:** This adds roughly 500 MB of model weights, on top of the PyTorch dependency. Only use it if DSPy classification is insufficient.
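The selection step in `find_best_template` reduces to "take the argmax of cosine similarity, then apply a threshold". The sketch below reproduces only that step in plain Python so it can be tested without downloading model weights. `cosine_similarity` and `select_template` are hypothetical helper names, and the toy vectors stand in for real sentence embeddings.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_template(
    question_vec: list[float],
    pattern_vecs: list[list[float]],
    threshold: float = 0.7,
) -> int | None:
    """Return the index of the most similar pattern, or None if below threshold."""
    if not pattern_vecs:
        return None
    scores = [cosine_similarity(question_vec, p) for p in pattern_vecs]
    # max() keeps the first index on ties, like argmax
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx if scores[best_idx] >= threshold else None


# Toy "embeddings" for three patterns (hypothetical values)
patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
select_template([0.9, 0.1, 0.0], patterns)  # -> 0
select_template([0.5, 0.5, 0.5], patterns)  # -> None (best score is about 0.58)
```

The threshold is what lets the caller fall back to other strategies: a question that is roughly equidistant from every pattern gets no template rather than an arbitrary one.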
### SPARQL Validation (Optional)

For deeper SPARQL syntax validation than regex checks can provide:

| Package | Version | Purpose | PyPI |
|---------|---------|---------|------|
| `rdflib` | >=6.0 | RDF/SPARQL parsing and validation | [rdflib](https://pypi.org/project/rdflib/) |

**Usage Example:**

```python
from pyparsing import ParseException
from rdflib.plugins.sparql import prepareQuery


def validate_sparql_syntax(query: str) -> tuple[bool, str | None]:
    """Validate SPARQL syntax using the rdflib parser."""
    try:
        prepareQuery(query)
        return True, None
    except ParseException as e:
        # Grammar-level syntax error
        return False, f"Syntax error: {e}"
    except Exception as e:
        # Errors raised after parsing, e.g. an undeclared prefix
        return False, str(e)


# Example
valid, error = validate_sparql_syntax("""
    PREFIX hc: <https://example.org/hc#>
    SELECT ?s WHERE { ?s a hc:Custodian }
""")
# Placeholder namespace IRI above; substitute the project's actual hc: namespace
# -> (True, None)
```

**Installation:**

```bash
pip install rdflib
```

## External Services

### Required Services

| Service | Endpoint | Purpose |
|---------|----------|---------|
| Oxigraph SPARQL | `http://localhost:7878/query` | SPARQL query execution |
| Qdrant Vector DB | `http://localhost:6333` | Semantic search fallback |

### Service Availability Checks

```python
import httpx


async def check_sparql_endpoint(
    endpoint: str = "http://localhost:7878/query",
    timeout: float = 5.0,
) -> bool:
    """Check whether the SPARQL endpoint's server is reachable."""
    # Probe the server root rather than /query, which expects a query
    base_url = endpoint.removesuffix("/query") + "/"
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.get(base_url)
            return response.status_code == 200
    except Exception:
        return False


async def check_qdrant(
    host: str = "localhost",
    port: int = 6333,
    timeout: float = 5.0,
) -> bool:
    """Check whether Qdrant is reachable."""
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.get(f"http://{host}:{port}/")
            return response.status_code == 200
    except Exception:
        return False
```

## Project Files Required

### Existing Files

These files must exist for the template system to function:

| File | Purpose | Status |
|------|---------|--------|
| `data/validation/sparql_validation_rules.json` | Slot enum values (provinces, types) | ✅ Exists |
| `backend/rag/ontology_mapping.py` | Entity extraction, fuzzy matching | ✅ Exists |
| `src/glam_extractor/api/sparql_linter.py` | SPARQL validation/correction | ✅ Exists |
| `backend/rag/dspy_heritage_rag.py` | Integration point | ✅ Exists |

### New Files to Create

| File | Purpose | Status |
|------|---------|--------|
| `backend/rag/template_sparql.py` | Template loading, classification, instantiation | ❌ To create |
| `data/sparql_templates.yaml` | Template definitions | ❌ To create |
| `tests/rag/test_template_sparql.py` | Unit tests | ❌ To create |

## pyproject.toml Updates

Add optional dependencies for the template system:

```toml
[project.optional-dependencies]
# Template-based SPARQL generation
sparql-templates = [
    "rapidfuzz>=3.0",
    "python-Levenshtein>=0.21",
    "jinja2>=3.0",
]

# Full template system with semantic matching
sparql-templates-full = [
    "rapidfuzz>=3.0",
    "python-Levenshtein>=0.21",
    "jinja2>=3.0",
    "sentence-transformers>=2.2",
    "rdflib>=6.0",
]
```

**Installation:**

```bash
# Minimal template support
pip install -e ".[sparql-templates]"

# Full template support with semantic matching
pip install -e ".[sparql-templates-full]"
```

## Environment Variables

| Variable | Default | Purpose |
|----------|---------|---------|
| `SPARQL_ENDPOINT` | `http://localhost:7878/query` | SPARQL endpoint URL |
| `QDRANT_HOST` | `localhost` | Qdrant host |
| `QDRANT_PORT` | `6333` | Qdrant port |
| `TEMPLATE_CONFIDENCE_THRESHOLD` | `0.7` | Minimum confidence for using a template |
| `ENABLE_FUZZY_MATCHING` | `true` | Enable rapidfuzz for slot matching |

## Version Compatibility Matrix

| Python | DSPy | Pydantic | Status |
|--------|------|----------|--------|
| 3.11+ | 2.6+ | 2.0+ | ✅ Supported |
| 3.10 | 2.6+ | 2.0+ | ✅ Supported |
| 3.9 | 2.5+ | 2.0+ | ⚠️ Limited (no `match` statements; `X \| None` annotations need `from __future__ import annotations`) |
| <3.9 | - | - | ❌ Not supported |

## Docker Considerations

If deploying in Docker, make sure the Dockerfile installs these packages:

```dockerfile
# Python dependencies (specifiers are quoted so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "pydantic>=2.0" \
    "pyyaml>=6.0" \
    "dspy-ai>=2.6" \
    "httpx>=0.25" \
    "jinja2>=3.0" \
    "rapidfuzz>=3.0"

# Optional: sentence-transformers (adds ~500MB)
# RUN pip install --no-cache-dir "sentence-transformers>=2.2"
```

## Dependency Security

All recommended packages are actively maintained and had no known critical CVEs as of 2025-06.

| Package | Last Updated | Security Status |
|---------|--------------|-----------------|
| pydantic | 2025-05 | ✅ No known CVEs |
| rapidfuzz | 2025-06 | ✅ No known CVEs |
| dspy-ai | 2025-06 | ✅ No known CVEs |
| jinja2 | 2025-04 | ✅ No known CVEs |

Run a security audit:

```bash
pip-audit --requirement requirements.txt
```

## Summary

**Minimum viable installation:**

```bash
pip install pydantic pyyaml dspy-ai httpx jinja2
```

**Recommended installation:**

```bash
pip install pydantic pyyaml dspy-ai httpx jinja2 rapidfuzz python-Levenshtein
```

**Full installation (with semantic matching):**

```bash
pip install pydantic pyyaml dspy-ai httpx jinja2 rapidfuzz python-Levenshtein sentence-transformers rdflib
```
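The installation tiers can also be detected at runtime, for example to decide whether fuzzy or semantic matching can be enabled. The following is a minimal sketch using the standard library's `importlib.util.find_spec`, which locates a module without importing it. The `TIERS` mapping and both function names are hypothetical; the entries are the packages' import names, which differ from their PyPI names (`yaml` for PyYAML, `dspy` for `dspy-ai`, `Levenshtein` for `python-Levenshtein`).

```python
from __future__ import annotations

from importlib.util import find_spec

# Hypothetical tier -> import-name mapping mirroring the Summary above
TIERS: dict[str, list[str]] = {
    "minimum": ["pydantic", "yaml", "dspy", "httpx", "jinja2"],
    "recommended": ["rapidfuzz", "Levenshtein"],
    "full": ["sentence_transformers", "rdflib"],
}


def missing_modules(tiers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per tier, the top-level modules that cannot be found."""
    return {
        tier: [name for name in modules if find_spec(name) is None]
        for tier, modules in tiers.items()
    }


def installed_tier(tiers: dict[str, list[str]]) -> str | None:
    """Return the highest tier whose modules, and all lower tiers', are present."""
    missing = missing_modules(tiers)
    highest = None
    for tier in tiers:  # dicts keep insertion order: lowest tier first
        if missing[tier]:
            break
        highest = tier
    return highest
```

On a machine with only the minimum tier installed, `installed_tier(TIERS)` would return `"minimum"`, and a caller could use that to decide whether honoring `ENABLE_FUZZY_MATCHING` is possible.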