glam/docs/dspy_rag/09-performance-guide.md
2025-12-16 11:57:34 +01:00

# RAG Performance Guide
## Overview
This document provides performance benchmarks and optimization guidelines for the multi-database RAG (Retrieval-Augmented Generation) pipeline used in bronhouder.nl.
## Database Architecture
The RAG pipeline integrates four databases, each optimized for different query patterns:
| Database | Purpose | Vectors/Records | Typical Latency |
|----------|---------|-----------------|-----------------|
| **Qdrant** | Vector semantic search | 27,000+ vectors | **38-44ms** (cached) |
| **Oxigraph** | RDF/SPARQL knowledge graph | 148,000+ triples | **36-143ms** (cached) |
| **TypeDB 3.0** | Graph relationships | 27,452 custodians | **200-230ms** (connection overhead) |
| **PostgreSQL/PostGIS** | Source data & geospatial | Full dataset | **11-14ms** (cached) |
## Performance Benchmarks
### Tested Configuration
- Server: Hetzner Cloud (91.98.224.44)
- Date: December 2025
- Test: `/api/rag/query` endpoint with `generate_answer: false`
### Results by Source Combination
| Sources | First Query | Cached Query | Notes |
|---------|------------:|-------------:|-------|
| **qdrant** only | ~2,200ms | 38-44ms | First query loads embedding model |
| **qdrant + sparql** | 357-382ms | 36-143ms | **✅ Recommended default** |
| **typedb** only | ~1,750ms | 200-230ms | Connection warmup on first query |
| **qdrant + typedb** | ~320ms | 200-250ms | TypeDB adds ~200ms overhead |
| **postgis** only | ~160ms | 11-14ms | Very fast, needs geo context |
| **qdrant + postgis** | ~73ms | not recorded | Fast combination |
| **All sources** | ~2,100ms | 230-330ms | TypeDB dominates latency |
### Key Insights
1. **Qdrant + SPARQL is optimal for most queries** (357-382ms on the first query, 36-143ms cached)
2. **TypeDB adds ~200ms overhead** per query due to connection management, even when results are cached
3. **PostGIS is the fastest source** (11-14ms cached) but is only useful for geographic queries
4. **The first query pays a ~2s penalty** while the embedding model loads
## Recommended Configurations
### Default: Semantic + Knowledge Graph
```json
{
"sources": ["qdrant", "sparql"],
"k": 10,
"generate_answer": true
}
```
Best for: General heritage institution queries, name searches, collection queries.
### Relationship Queries
```json
{
"sources": ["qdrant", "sparql", "typedb"],
"k": 10,
"generate_answer": true
}
```
Best for: Complex relationship queries, organizational hierarchies, custodian networks.
### Geographic Queries
```json
{
"sources": ["qdrant", "postgis"],
"k": 10,
"generate_answer": true
}
```
Best for: Location-based queries, "museums in Amsterdam", proximity searches.
### Maximum Coverage (Slower)
```json
{
"sources": ["qdrant", "sparql", "typedb", "postgis"],
"k": 10,
"generate_answer": true
}
```
Best for: Comprehensive searches where accuracy is more important than speed.
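The four configurations above can be collected into a small lookup so that callers pick a profile by name instead of repeating source lists. This is a minimal sketch: the profile names and the `build_request` helper are hypothetical, while the source lists and the `k` and `generate_answer` fields are taken from the examples above.

```python
# Source lists copied from the recommended configurations above.
# The profile names are illustrative and not part of the API.
SOURCE_PROFILES = {
    "default": ["qdrant", "sparql"],
    "relationship": ["qdrant", "sparql", "typedb"],
    "geographic": ["qdrant", "postgis"],
    "max_coverage": ["qdrant", "sparql", "typedb", "postgis"],
}


def build_request(question, profile="default", k=10, generate_answer=True):
    """Build a request body for POST /api/rag/query from a named profile."""
    if profile not in SOURCE_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return {
        "question": question,
        # Copy the list so callers cannot mutate the shared profile.
        "sources": list(SOURCE_PROFILES[profile]),
        "k": k,
        "generate_answer": generate_answer,
    }
```

For example, `build_request("museums in Amsterdam", "geographic")` yields the body shown under Geographic Queries, with the question filled in.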
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────┐
│                             User Query                              │
│                       "museums in Amsterdam"                        │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                            Source Router                            │
│                  (based on selectedSources array)                   │
└─────────────────────────────────────────────────────────────────────┘
           ┌───────────────────────┼───────────────────────┐
           ▼                       ▼                       ▼
   ┌───────────────┐       ┌───────────────┐       ┌───────────────┐
   │    Qdrant     │       │   Oxigraph    │       │    TypeDB     │
   │  (Semantic)   │       │   (SPARQL)    │       │    (Graph)    │
   │     ~40ms     │       │    ~100ms     │       │    ~200ms     │
   └───────────────┘       └───────────────┘       └───────────────┘
           │                       │                       │
           └───────────────────────┼───────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                       Result Merger & Ranker                        │
│         (deduplication, relevance scoring, top-k selection)         │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                       Answer Generator (LLM)                        │
│                (Claude via DSPy, only if requested)                 │
└─────────────────────────────────────────────────────────────────────┘
```
## Optimization History
### December 2025: SPARQL LLM Bypass
**Problem**: Queries with `sources=["qdrant", "sparql"]` were taking 7-8 seconds.
**Root Cause**: The `retrieve_from_sparql()` function redundantly called the DSPy/Claude LLM (~7s) to generate a SPARQL query, even though graph expansion was already handled by batch SPARQL in `retrieve_from_qdrant()`.
**Solution**: Modified `/opt/glam-backend/rag/main.py` (line ~736) to skip LLM-based SPARQL retrieval:
```python
elif source == DataSource.SPARQL:
# OPTIMIZATION: Skip LLM-based SPARQL generation for RAG queries.
# Graph expansion is already handled in retrieve_from_qdrant() via batch SPARQL.
logger.debug("SPARQL source selected - graph expansion handled by Qdrant retrieval")
continue
```
**Result**: Query time dropped from 7-8s to 350-380ms, roughly a 20x improvement.
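To show where a branch like this sits, here is a self-contained sketch of a per-source dispatch loop. Everything apart from the `DataSource.SPARQL` check is an assumption made for illustration: the enum values, the `retrieve` function, and the `retrievers` mapping are hypothetical and do not reproduce the actual code in `rag/main.py`.

```python
import logging
from enum import Enum

logger = logging.getLogger("rag")


class DataSource(Enum):
    # Hypothetical values; the real enum is defined in rag/main.py.
    QDRANT = "qdrant"
    SPARQL = "sparql"
    TYPEDB = "typedb"
    POSTGIS = "postgis"


def retrieve(question, sources, retrievers):
    """Collect results from each selected source, skipping SPARQL.

    `retrievers` maps a DataSource to a callable(question) -> list.
    """
    results = []
    for source in sources:
        if source == DataSource.SPARQL:
            # Graph expansion already happens inside the Qdrant retriever,
            # so no separate LLM-generated SPARQL query is issued here.
            logger.debug("SPARQL source selected - handled by Qdrant retrieval")
            continue
        results.extend(retrievers[source](question))
    return results
```

With this shape, selecting `sparql` costs nothing extra at dispatch time, which matches the observation that `qdrant + sparql` performs close to `qdrant` alone once the embedding model is loaded.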
## Caching Behavior
### Server-Side Caching
- **Semantic Cache**: Embeds questions and matches similar queries
- **TTL**: 15 minutes default
- **Scope**: Per-query result caching
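The idea behind a semantic cache with a TTL can be sketched in a few lines. This is not the server's implementation: the `SemanticCache` class, the cosine-similarity threshold of 0.95, and the injected `embed` callable are all assumptions; only the 15-minute default TTL comes from the list above.

```python
import math
import time


class SemanticCache:
    """In-memory cache keyed by question embedding, with a TTL."""

    def __init__(self, embed, ttl_seconds=15 * 60, threshold=0.95,
                 clock=time.monotonic):
        self.embed = embed          # callable: str -> list of floats
        self.ttl = ttl_seconds      # 15 minutes, the documented default
        self.threshold = threshold  # assumed similarity cut-off
        self.clock = clock          # injectable for testing
        self.entries = []           # (vector, stored_at, value) tuples

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, question):
        """Return the cached value for the most similar question, or None."""
        now = self.clock()
        # Drop expired entries before searching.
        self.entries = [e for e in self.entries if now - e[1] < self.ttl]
        vec = self.embed(question)
        best = max(self.entries, key=lambda e: self._cosine(vec, e[0]),
                   default=None)
        if best is not None and self._cosine(vec, best[0]) >= self.threshold:
            return best[2]
        return None

    def put(self, question, value):
        self.entries.append((self.embed(question), self.clock(), value))
```

A linear scan is fine for a sketch; a production cache would more likely use a vector index and bound the number of entries.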
### Client-Side Caching (IndexedDB)
- **Database**: `GLAM_SemanticCache`
- **Purpose**: Reduces redundant API calls
- **Clear**: Use "Clear cache" button in UI
### Cache Invalidation
```bash
# Server-side cache clear
curl -X POST "http://localhost:8010/api/cache/invalidate" \
-H "Content-Type: application/json" \
-d '{"clear_all": true}'
# Or via service restart
systemctl restart glam-rag-api
```
## Monitoring
### Service Status
```bash
ssh root@91.98.224.44
systemctl status glam-rag-api
journalctl -u glam-rag-api -n 50 --no-pager
```
### Performance Testing
```bash
# Test specific source combination
curl -s -X POST "http://localhost:8010/api/rag/query" \
-H "Content-Type: application/json" \
-d '{
"question": "museums in Amsterdam",
"sources": ["qdrant", "sparql"],
"k": 5,
"generate_answer": false
}' | jq '{time_ms: .query_time_ms, count: (.results | length)}'
```
### Expected Results
```json
{
"time_ms": 45.2,
"count": 5
}
```
## Troubleshooting
### Slow SPARQL Queries (>1s)
1. Check that the LLM bypass is still active in `rag/main.py` (see Optimization History)
2. Verify Oxigraph is running: `systemctl status oxigraph`
3. Check for complex graph patterns in the query
### TypeDB Connection Timeout
1. TypeDB adds roughly 200ms of connection overhead per query; this is expected
2. The first query after an idle period may take longer (~1.7s)
3. Consider excluding TypeDB from latency-critical queries
### High First-Query Latency
1. The embedding model loads on the first query (~2s)
2. This is expected behavior
3. Subsequent queries reuse the loaded model
## API Reference
### Query Endpoint
```
POST /api/rag/query
```
### Request Body
```json
{
"question": "string",
"sources": ["qdrant", "sparql", "typedb", "postgis"],
"k": 10,
"generate_answer": true,
"language": "en"
}
```
### Response
```json
{
"question": "museums in Amsterdam",
"answer": "There are several museums...",
"results": [...],
"sources_used": ["qdrant", "sparql"],
"query_time_ms": 45.2
}
```
## Related Documentation
- [Architecture](./01-architecture.md) - System components
- [Retrieval Patterns](./06-retrieval-patterns.md) - Hybrid search strategies
- [SPARQL Templates](./07-sparql-templates.md) - Query patterns
- [Evaluation](./08-evaluation.md) - Metrics and benchmarks