# RAG Performance Guide

## Overview

This document provides performance benchmarks and optimization guidelines for the multi-database RAG (Retrieval-Augmented Generation) pipeline used by bronhouder.nl.

## Database Architecture

The RAG pipeline integrates four databases. Each is optimized for a different query pattern:

| Database | Purpose | Vectors/Records | Typical Latency |
|----------|---------|-----------------|-----------------|
| **Qdrant** | Vector semantic search | 27,000+ vectors | **38-44ms** (cached) |
| **Oxigraph** | RDF/SPARQL knowledge graph | 148,000+ triples | **36-143ms** (cached) |
| **TypeDB 3.0** | Graph relationships | 27,452 custodians | **200-230ms** (connection overhead) |
| **PostgreSQL/PostGIS** | Source data & geospatial | Full dataset | **11-14ms** (cached) |

## Performance Benchmarks

### Tested Configuration

- Server: Hetzner Cloud (91.98.224.44)
- Date: December 2025
- Test: `/api/rag/query` endpoint with `generate_answer: false` (retrieval only, with no LLM answer generation)

### Results by Source Combination

| Sources | First Query | Cached Query | Notes |
|---------|------------:|-------------:|-------|
| **qdrant** only | ~2,200ms | 38-44ms | First query loads the embedding model |
| **qdrant + sparql** | 357-382ms | 36-143ms | **✅ Recommended default** |
| **typedb** only | ~1,750ms | 200-230ms | Connection warm-up on the first query |
| **qdrant + typedb** | ~320ms | 200-250ms | TypeDB adds ~200ms overhead |
| **postgis** only | ~160ms | 11-14ms | Very fast, but needs a geographic query |
| **qdrant + postgis** | ~73ms | n/a | Fast combination |
| **All sources** | ~2,100ms | 230-330ms | TypeDB dominates latency |

### Key Insights

1. **Qdrant + SPARQL is optimal for most queries** (~350ms on the first query, ~40ms cached).
2. **TypeDB adds ~200ms of overhead** because of connection management.
3. **PostGIS is the fastest source**, but it is only useful for geographic queries.
4. **The first query pays a ~2s penalty** while the embedding model loads.

## Recommended Configurations

### Default: Semantic + Knowledge Graph

```json
{
  "sources": ["qdrant", "sparql"],
  "k": 10,
  "generate_answer": true
}
```

Best for: general heritage-institution queries, name searches, and collection queries.

### Relationship Queries

```json
{
  "sources": ["qdrant", "sparql", "typedb"],
  "k": 10,
  "generate_answer": true
}
```

Best for: complex relationship queries, organizational hierarchies, and custodian networks.

### Geographic Queries

```json
{
  "sources": ["qdrant", "postgis"],
  "k": 10,
  "generate_answer": true
}
```

Best for: location-based queries (e.g. "museums in Amsterdam") and proximity searches.

### Maximum Coverage (Slower)

```json
{
  "sources": ["qdrant", "sparql", "typedb", "postgis"],
  "k": 10,
  "generate_answer": true
}
```

Best for: comprehensive searches where accuracy matters more than speed (~230-330ms cached, ~2.1s on the first query).

## Architecture Diagram

```
┌───────────────────────────────────────────────────────────────────┐
│                            User Query                             │
│                      "museums in Amsterdam"                       │
└───────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                           Source Router                           │
│                 (based on selectedSources array)                  │
└───────────────────────────────────────────────────────────────────┘
                                  │
           ┌──────────────────────┼──────────────────────┐
           ▼                      ▼                      ▼
   ┌───────────────┐      ┌───────────────┐      ┌───────────────┐
   │    Qdrant     │      │   Oxigraph    │      │    TypeDB     │
   │  (Semantic)   │      │   (SPARQL)    │      │    (Graph)    │
   │     ~40ms     │      │    ~100ms     │      │    ~200ms     │
   └───────────────┘      └───────────────┘      └───────────────┘
           │                      │                      │
           └──────────────────────┼──────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                      Result Merger & Ranker                       │
│        (deduplication, relevance scoring, top-k selection)        │
└───────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                      Answer Generator (LLM)                       │
│               (Claude via DSPy, only if requested)                │
└───────────────────────────────────────────────────────────────────┘
```

## Optimization History

### December 2025: SPARQL LLM Bypass

**Problem**: Queries with `sources=["qdrant", "sparql"]` took 7-8 seconds.

**Root Cause**: `retrieve_from_sparql()` called the DSPy/Claude LLM (~7s) to generate a SPARQL query. This call was redundant because batch SPARQL in `retrieve_from_qdrant()` already performs the graph expansion.

**Solution**: `/opt/glam-backend/rag/main.py` (line ~736) was modified to skip LLM-based SPARQL retrieval:

```python
# Excerpt from the per-source dispatch loop in rag/main.py.
# `source`, `DataSource`, and `logger` are defined in the surrounding code.
elif source == DataSource.SPARQL:
    # OPTIMIZATION: Skip LLM-based SPARQL generation for RAG queries.
    # Graph expansion is already handled in retrieve_from_qdrant() via batch SPARQL.
    logger.debug("SPARQL source selected - graph expansion handled by Qdrant retrieval")
    continue
```

**Result**: Query time dropped from 7-8s to 350-380ms, roughly a 20x improvement.

## Caching Behavior

### Server-Side Caching

- **Semantic Cache**: Embeds each question and reuses cached results for similar questions.
- **TTL**: 15 minutes by default.
- **Scope**: Results are cached per query.

### Client-Side Caching (IndexedDB)

- **Database**: `GLAM_SemanticCache`
- **Purpose**: Reduces redundant API calls.
- **Clear**: Use the "Clear cache" button in the UI.

### Cache Invalidation

```bash
# Clear the server-side cache
curl -X POST "http://localhost:8010/api/cache/invalidate" \
  -H "Content-Type: application/json" \
  -d '{"clear_all": true}'

# Or restart the service
systemctl restart glam-rag-api
```

## Monitoring

### Service Status

```bash
# Connect to the server
ssh root@91.98.224.44

# Check the service state and the 50 most recent log lines
systemctl status glam-rag-api
journalctl -u glam-rag-api -n 50 --no-pager
```

### Performance Testing

```bash
# Time a specific source combination (run on the server)
curl -s -X POST "http://localhost:8010/api/rag/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "museums in Amsterdam",
    "sources": ["qdrant", "sparql"],
    "k": 5,
    "generate_answer": false
  }' | jq '{time_ms: .query_time_ms, count: (.results | length)}'
```

### Expected Results

A cached query returns output similar to:

```json
{
  "time_ms": 45.2,
  "count": 5
}
```

## Troubleshooting

### Slow SPARQL Queries (>1s)

1. Check that the LLM bypass is active in `rag/main.py`.
2. Verify that Oxigraph is running: `systemctl status oxigraph`.
3. Check the query for complex graph patterns.

### TypeDB Connection Timeout

1. A connection overhead of ~200ms is expected for TypeDB.
2. The first query after an idle period may take longer (~1.7s).
3. Consider excluding TypeDB from `sources` for latency-critical queries.

### High First-Query Latency

1. The embedding model loads on the first query (~2s).
2. This is expected behavior.
3. Subsequent queries reuse the loaded model.

## API Reference

### Query Endpoint

```
POST /api/rag/query
```

### Request Body

```json
{
  "question": "string",
  "sources": ["qdrant", "sparql", "typedb", "postgis"],
  "k": 10,
  "generate_answer": true,
  "language": "en"
}
```

### Response

```json
{
  "question": "museums in Amsterdam",
  "answer": "There are several museums...",
  "results": [...],
  "sources_used": ["qdrant", "sparql"],
  "query_time_ms": 45.2
}
```

## Related Documentation

- [Architecture](./01-architecture.md): system components
- [Retrieval Patterns](./06-retrieval-patterns.md): hybrid search strategies
- [SPARQL Templates](./07-sparql-templates.md): query patterns
- [Evaluation](./08-evaluation.md): metrics and benchmarks
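
## Example: Building Query Payloads from the Recommended Configurations

A client can keep the four recommended source configurations in one place and build `/api/rag/query` request bodies from them. The following is a minimal sketch that uses only the Python standard library. Several names here are hypothetical and do not come from the existing codebase: the profile names (`default`, `relationship`, `geographic`, `max_coverage`), the `build_payload` and `post_query` helpers, and the `http://localhost:8010` base URL. The request fields mirror the Request Body shown in the API Reference.

```python
import json
import urllib.request

# Hypothetical presets mirroring the "Recommended Configurations" section.
SOURCE_PROFILES = {
    "default": ["qdrant", "sparql"],
    "relationship": ["qdrant", "sparql", "typedb"],
    "geographic": ["qdrant", "postgis"],
    "max_coverage": ["qdrant", "sparql", "typedb", "postgis"],
}


def build_payload(question, profile="default", k=10, generate_answer=True, language="en"):
    """Build a request body for POST /api/rag/query from a named profile."""
    if profile not in SOURCE_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {
        "question": question,
        # Copy the list so callers cannot mutate the shared preset.
        "sources": list(SOURCE_PROFILES[profile]),
        "k": k,
        "generate_answer": generate_answer,
        "language": language,
    }


def post_query(payload, base_url="http://localhost:8010", timeout=30.0):
    """Send the payload and return the decoded JSON response (base URL is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/api/rag/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints a payload; no network call is made here.
    body = build_payload("museums in Amsterdam", profile="geographic", generate_answer=False)
    print(json.dumps(body))
```

Running the file prints the geographic request body. On the server, `post_query(build_payload("museums in Amsterdam"))` would send a query that uses the recommended default sources. Because the presets live in a single mapping, it is straightforward to leave TypeDB out of latency-critical paths, as the benchmarks suggest.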
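
## Example: Merging and Ranking Results

The Result Merger & Ranker stage in the architecture diagram deduplicates the results returned by several sources, scores them, and keeps the top `k`. The sketch below shows one plausible way to do this. It is not the pipeline's actual implementation. It assumes each result is a dict with hypothetical `id` and `score` keys, and that a higher score is better and scores are comparable across sources. For each `id` it keeps the best-scoring copy and records every source that returned it.

```python
from typing import Any


def merge_results(result_lists: dict[str, list[dict[str, Any]]], k: int = 10) -> list[dict[str, Any]]:
    """Deduplicate results by id, keep the highest score, and return the top k.

    Assumption (hypothetical schema): each result has "id" and "score" keys,
    and scores from different sources are on a comparable scale.
    """
    best: dict[str, dict[str, Any]] = {}
    for source, results in result_lists.items():
        for result in results:
            current = best.get(result["id"])
            if current is None:
                # First time this id is seen: copy it and start its source list.
                best[result["id"]] = {**result, "sources": [source]}
            else:
                current["sources"].append(source)
                if result["score"] > current["score"]:
                    # Keep the better-scoring copy but preserve the merged source list.
                    best[result["id"]] = {**result, "sources": current["sources"]}
    # Sort by score, highest first; break ties by id for a deterministic order.
    ranked = sorted(best.values(), key=lambda r: (-r["score"], r["id"]))
    return ranked[:k]


if __name__ == "__main__":
    # Illustrative ids and scores only.
    merged = merge_results(
        {
            "qdrant": [{"id": "rijksmuseum", "score": 0.91}, {"id": "stedelijk", "score": 0.84}],
            "sparql": [{"id": "rijksmuseum", "score": 0.95}, {"id": "eye", "score": 0.70}],
        },
        k=2,
    )
    for item in merged:
        print(item["id"], item["score"], item["sources"])
```

Taking the maximum score only makes sense when scores are comparable across sources. If they are not, rank-based fusion such as reciprocal rank fusion is a common alternative.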
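
## Example: A Semantic Cache with a TTL

The server-side semantic cache embeds each question and reuses cached results for similar questions until the TTL expires (15 minutes by default). The sketch below illustrates that idea using cosine similarity over stored question embeddings. Several details are assumptions for illustration only: the `SemanticCache` class, the `embed` callable, and the 0.92 similarity threshold. The production cache's embedding model and matching rule are not described in this guide. The example uses a toy letter-frequency embedding so it runs without a model.

```python
import math
import time
from typing import Any, Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache: embedding lookup with a similarity threshold and a TTL."""

    def __init__(self, embed: Callable[[str], list[float]], ttl_seconds: float = 15 * 60,
                 threshold: float = 0.92, clock: Callable[[], float] = time.monotonic):
        self.embed = embed
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.clock = clock
        self.entries: list[tuple[list[float], float, Any]] = []  # (vector, stored_at, value)

    def get(self, question: str) -> Optional[Any]:
        now = self.clock()
        # Drop expired entries before matching.
        self.entries = [e for e in self.entries if now - e[1] < self.ttl]
        vector = self.embed(question)
        best_value, best_score = None, self.threshold
        for stored_vector, _, value in self.entries:
            score = cosine(vector, stored_vector)
            if score >= best_score:
                best_value, best_score = value, score
        return best_value

    def put(self, question: str, value: Any) -> None:
        self.entries.append((self.embed(question), self.clock(), value))

    def clear(self) -> None:
        """Analogous to the {"clear_all": true} invalidation request."""
        self.entries.clear()


def toy_embed(text: str) -> list[float]:
    """Toy embedding: counts of the letters a-z (stands in for a real model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

Lowering the threshold raises the hit rate, but it also increases the risk of serving results for a question that is only superficially similar.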
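
## Example: Timing Source Combinations

The benchmark tables report the first (uncached) query separately from later cached queries. The following sketch does the same for each source combination. It first clears the server cache with the invalidation request documented above. It then repeats one retrieval-only query several times and reports the first run alongside the median of the remaining runs, using the `query_time_ms` field from the response. The combination list, the repeat count, the helper names, and the `http://localhost:8010` base URL are assumptions. Run it on the server, as with the curl example.

```python
import json
import statistics
import urllib.request

# Assumption: run on the API host, using the endpoints shown in this guide.
BASE_URL = "http://localhost:8010"
COMBINATIONS = [
    ["qdrant"],
    ["qdrant", "sparql"],
    ["typedb"],
    ["qdrant", "typedb"],
    ["postgis"],
    ["qdrant", "postgis"],
    ["qdrant", "sparql", "typedb", "postgis"],
]


def _post(path, payload, timeout=60.0):
    """POST a JSON payload and return the raw response body as bytes."""
    request = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def clear_server_cache():
    """Send the same request as the curl cache-invalidation example."""
    _post("/api/cache/invalidate", {"clear_all": True})


def query_time_ms(question, sources, k=5):
    """Run a retrieval-only query and return the server-reported query_time_ms."""
    body = {"question": question, "sources": sources, "k": k, "generate_answer": False}
    return float(json.loads(_post("/api/rag/query", body))["query_time_ms"])


def summarize(timings):
    """Separate the first (uncached) run from the median of the later (cached) runs."""
    if not timings:
        raise ValueError("need at least one timing")
    rest = timings[1:]
    return {
        "first_ms": timings[0],
        "cached_median_ms": statistics.median(rest) if rest else None,
    }


def run_benchmark(question="museums in Amsterdam", repeats=5):
    """Time each source combination, clearing the cache first so run 1 is uncached."""
    report = {}
    for sources in COMBINATIONS:
        clear_server_cache()
        timings = [query_time_ms(question, sources) for _ in range(repeats)]
        report["+".join(sources)] = summarize(timings)
    return report
```

Call `run_benchmark()` on the server and compare its output with the Results by Source Combination table. The embedding model stays loaded after its first use, so after a service restart only the first combination measured reflects the ~2s model-loading penalty.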