# RAG Performance Guide

## Overview

This document provides performance benchmarks and optimization guidelines for the multi-database RAG (Retrieval-Augmented Generation) pipeline used in bronhouder.nl.

## Database Architecture

The RAG pipeline integrates four databases, each optimized for different query patterns:

| Database | Purpose | Vectors/Records | Typical Latency |
|----------|---------|-----------------|-----------------|
| **Qdrant** | Vector semantic search | 27,000+ vectors | **38-44ms** (cached) |
| **Oxigraph** | RDF/SPARQL knowledge graph | 148,000+ triples | **36-143ms** (cached) |
| **TypeDB 3.0** | Graph relationships | 27,452 custodians | **200-230ms** (connection overhead) |
| **PostgreSQL/PostGIS** | Source data & geospatial | Full dataset | **11-14ms** (cached) |

## Performance Benchmarks

### Tested Configuration
- Server: Hetzner Cloud (91.98.224.44)
- Date: December 2025
- Test: `/api/rag/query` endpoint with `generate_answer: false`

### Results by Source Combination

| Sources | First Query | Cached Query | Notes |
|---------|------------:|-------------:|-------|
| **qdrant** only | ~2,200ms | 38-44ms | First query loads embedding model |
| **qdrant + sparql** | 357-382ms | 36-143ms | **✅ Recommended default** |
| **typedb** only | ~1,750ms | 200-230ms | Connection warmup on first query |
| **qdrant + typedb** | ~320ms | 200-250ms | TypeDB adds ~200ms overhead |
| **postgis** only | ~160ms | 11-14ms | Very fast, needs geo context |
| **qdrant + postgis** | ~73ms | n/a | Fast combination (only one timing was recorded) |
| **All sources** | ~2,100ms | 230-330ms | TypeDB dominates latency |

### Key Insights

1. **Qdrant + SPARQL is optimal for most queries** (~350ms on the first query, as low as ~40ms cached)
2. **TypeDB adds ~200ms overhead** due to connection management
3. **PostGIS is fastest** but only useful for geographic queries
4. **First query penalty** of ~2s exists while the embedding model loads

## Recommended Configurations

### Default: Semantic + Knowledge Graph
```json
{
  "sources": ["qdrant", "sparql"],
  "k": 10,
  "generate_answer": true
}
```
Best for: General heritage institution queries, name searches, collection queries.

### Relationship Queries
```json
{
  "sources": ["qdrant", "sparql", "typedb"],
  "k": 10,
  "generate_answer": true
}
```
Best for: Complex relationship queries, organizational hierarchies, custodian networks.

### Geographic Queries
```json
{
  "sources": ["qdrant", "postgis"],
  "k": 10,
  "generate_answer": true
}
```
Best for: Location-based queries, "museums in Amsterdam", proximity searches.

### Maximum Coverage (Slower)
```json
{
  "sources": ["qdrant", "sparql", "typedb", "postgis"],
  "k": 10,
  "generate_answer": true
}
```
Best for: Comprehensive searches where accuracy is more important than speed.
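
The four configurations above can be captured in a small lookup helper. This is an illustrative sketch only: the `QueryKind` categories and the `build_request` function are hypothetical names, not part of the documented API. Only the `question`, `sources`, `k`, and `generate_answer` fields come from the request bodies shown in this guide.

```python
from enum import Enum


class QueryKind(Enum):
    """Hypothetical query categories mirroring the four configurations above."""
    GENERAL = "general"
    RELATIONSHIP = "relationship"
    GEOGRAPHIC = "geographic"
    COMPREHENSIVE = "comprehensive"


# Source lists copied from the recommended configurations.
SOURCES_BY_KIND = {
    QueryKind.GENERAL: ["qdrant", "sparql"],
    QueryKind.RELATIONSHIP: ["qdrant", "sparql", "typedb"],
    QueryKind.GEOGRAPHIC: ["qdrant", "postgis"],
    QueryKind.COMPREHENSIVE: ["qdrant", "sparql", "typedb", "postgis"],
}


def build_request(question: str, kind: QueryKind = QueryKind.GENERAL,
                  k: int = 10, generate_answer: bool = True) -> dict:
    """Return a request body for POST /api/rag/query."""
    return {
        "question": question,
        # Copy the list so callers cannot mutate the shared table.
        "sources": list(SOURCES_BY_KIND[kind]),
        "k": k,
        "generate_answer": generate_answer,
    }
```

Keeping the mapping in one table makes it easy to drop TypeDB from a category if its ~200ms overhead turns out to be too costly.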

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────┐
│                             User Query                              │
│                       "museums in Amsterdam"                        │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            Source Router                            │
│                  (based on selectedSources array)                   │
└─────────────────────────────────────────────────────────────────────┘
                                   │
           ┌───────────────────────┼───────────────────────┐
           ▼                       ▼                       ▼
   ┌───────────────┐       ┌───────────────┐       ┌───────────────┐
   │    Qdrant     │       │   Oxigraph    │       │    TypeDB     │
   │  (Semantic)   │       │   (SPARQL)    │       │    (Graph)    │
   │     ~40ms     │       │    ~100ms     │       │    ~200ms     │
   └───────────────┘       └───────────────┘       └───────────────┘
           │                       │                       │
           └───────────────────────┼───────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Result Merger & Ranker                        │
│         (deduplication, relevance scoring, top-k selection)         │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Answer Generator (LLM)                        │
│                (Claude via DSPy, only if requested)                 │
└─────────────────────────────────────────────────────────────────────┘
```
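
The merge stage in the diagram (deduplication, relevance scoring, top-k selection) can be sketched as follows. This is a minimal illustration, not the pipeline's actual merger: it assumes each hit is a dict with hypothetical `id` and `score` keys, and that scores from different sources are already on a comparable scale.

```python
from __future__ import annotations


def merge_results(per_source: dict[str, list[dict]], k: int = 10) -> list[dict]:
    """Deduplicate hits by id, keep the best score per id, and return the top k."""
    best: dict[str, dict] = {}
    for source, hits in per_source.items():
        for hit in hits:
            current = best.get(hit["id"])
            if current is None:
                # First time this id is seen: copy the hit and record its source.
                best[hit["id"]] = {**hit, "sources": [source]}
                continue
            if source not in current["sources"]:
                current["sources"].append(source)
            # Keep the highest score; other fields stay as first seen.
            if hit["score"] > current["score"]:
                current["score"] = hit["score"]
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:k]
```

Recording which sources returned each hit makes it possible to see, per result, whether a slower source contributed anything that Qdrant did not.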

## Optimization History

### December 2025: SPARQL LLM Bypass
**Problem**: Queries with `sources=["qdrant", "sparql"]` were taking 7-8 seconds.

**Root Cause**: The `retrieve_from_sparql()` function was redundantly calling the DSPy/Claude LLM (~7s) to generate a SPARQL query, even though graph expansion was already handled by batch SPARQL in `retrieve_from_qdrant()`.

**Solution**: Modified `/opt/glam-backend/rag/main.py` (line ~736) to skip LLM-based SPARQL retrieval:

```python
# Excerpt from the per-source retrieval loop in rag/main.py
elif source == DataSource.SPARQL:
    # OPTIMIZATION: Skip LLM-based SPARQL generation for RAG queries.
    # Graph expansion is already handled in retrieve_from_qdrant() via batch SPARQL.
    logger.debug("SPARQL source selected - graph expansion handled by Qdrant retrieval")
    continue
```

**Result**: Query time reduced from 7-8s to 350-380ms (roughly a 20x improvement).

## Caching Behavior

### Server-Side Caching
- **Semantic Cache**: Embeds questions and matches similar queries
- **TTL**: 15 minutes default
- **Scope**: Per-query result caching
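
To make the server-side behavior concrete, here is a minimal sketch of a semantic cache with a 15-minute TTL. It is not the service's implementation: `toy_embed` is a bag-of-words stand-in for the real embedding model, the `SemanticCache` class and its method names are hypothetical, and the 0.9 similarity threshold is an assumed value.

```python
import math
import time
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Caches results keyed by question similarity rather than exact text."""

    def __init__(self, ttl_seconds: float = 15 * 60, threshold: float = 0.9,
                 clock=time.monotonic):
        self.ttl = ttl_seconds
        self.threshold = threshold  # assumed value, tune for the real model
        self.clock = clock
        self.entries = []  # (embedding, result, stored_at)

    def put(self, question: str, result) -> None:
        self.entries.append((toy_embed(question), result, self.clock()))

    def get(self, question: str):
        now = self.clock()
        # Drop entries older than the TTL before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        query = toy_embed(question)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None
```

Because matching is by similarity, a rephrased question can return a cached result, which is worth remembering when benchmark numbers look suspiciously fast.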

### Client-Side Caching (IndexedDB)
- **Database**: `GLAM_SemanticCache`
- **Purpose**: Reduces redundant API calls
- **Clear**: Use "Clear cache" button in UI

### Cache Invalidation
```bash
# Server-side cache clear
curl -X POST "http://localhost:8010/api/cache/invalidate" \
  -H "Content-Type: application/json" \
  -d '{"clear_all": true}'

# Or via service restart
systemctl restart glam-rag-api
```

## Monitoring

### Service Status
```bash
ssh root@91.98.224.44
systemctl status glam-rag-api
journalctl -u glam-rag-api -n 50 --no-pager
```

### Performance Testing
```bash
# Test specific source combination
curl -s -X POST "http://localhost:8010/api/rag/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "museums in Amsterdam",
    "sources": ["qdrant", "sparql"],
    "k": 5,
    "generate_answer": false
  }' | jq '{time_ms: .query_time_ms, count: (.results | length)}'
```

### Expected Results
```json
{
  "time_ms": 45.2,
  "count": 5
}
```
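
The same check can be scripted to compare first-query and cached latency for a source combination. The sketch below uses only the Python standard library. The `post_query` and `measure` names are hypothetical, the base URL is taken from the curl examples in this guide, and the script assumes the response carries `query_time_ms` as shown in the API reference.

```python
import json
import urllib.request


def post_query(payload: dict, base_url: str = "http://localhost:8010") -> dict:
    """POST a payload to /api/rag/query and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/rag/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def measure(sources, question="museums in Amsterdam", runs=3, post=post_query):
    """Return query_time_ms of the first run and of the fastest later run."""
    if runs < 2:
        raise ValueError("runs must be at least 2 to observe a cached query")
    payload = {
        "question": question,
        "sources": sources,
        "k": 5,
        "generate_answer": False,
    }
    # Repeating the identical question is intentional: later runs hit the cache.
    times = [post(payload)["query_time_ms"] for _ in range(runs)]
    return {"first_ms": times[0], "cached_ms": min(times[1:])}
```

Calling `measure(["qdrant", "sparql"])` on the server right after a cache clear should give numbers comparable to the benchmark table. The `post` parameter exists so the function can be exercised without a running service.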

## Troubleshooting

### Slow SPARQL Queries (>1s)
1. Check that the LLM bypass is active in `rag/main.py`
2. Verify Oxigraph is running: `systemctl status oxigraph`
3. Check for complex graph patterns in the query

### TypeDB Connection Timeout
1. TypeDB adds roughly 200ms of connection overhead; this is expected
2. The first query after an idle period may take longer (~1.7s)
3. Consider excluding TypeDB for latency-critical queries

### High First-Query Latency
1. The embedding model loads on the first query (~2s)
2. This is expected behavior
3. Subsequent queries use the already loaded model

## API Reference

### Query Endpoint
```
POST /api/rag/query
```

### Request Body
```json
{
  "question": "string",
  "sources": ["qdrant", "sparql", "typedb", "postgis"],
  "k": 10,
  "generate_answer": true,
  "language": "en"
}
```

### Response
```json
{
  "question": "museums in Amsterdam",
  "answer": "There are several museums...",
  "results": [...],
  "sources_used": ["qdrant", "sparql"],
  "query_time_ms": 45.2
}
```
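
A client can catch malformed requests before they reach the endpoint. The following is a sketch under stated assumptions: `validate_request` is a hypothetical helper, and the rules it applies (non-empty question, at least one known source, positive integer `k`) are inferred from the request body above rather than taken from the server's own validation.

```python
KNOWN_SOURCES = {"qdrant", "sparql", "typedb", "postgis"}


def validate_request(body: dict) -> list:
    """Return a list of problems found in a /api/rag/query request body."""
    problems = []

    question = body.get("question")
    if not isinstance(question, str) or not question.strip():
        problems.append("question must be a non-empty string")

    sources = body.get("sources")
    if not isinstance(sources, list) or not sources:
        problems.append("sources must list at least one database")
    else:
        unknown = [s for s in sources if s not in KNOWN_SOURCES]
        if unknown:
            problems.append("unknown sources: " + ", ".join(map(str, unknown)))

    # Assumed default of 10, matching the examples in this guide.
    k = body.get("k", 10)
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        problems.append("k must be a positive integer")

    return problems
```

An empty list means the body passed these client-side checks; the server may still apply rules not covered here.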

## Related Documentation

- [Architecture](./01-architecture.md) - System components
- [Retrieval Patterns](./06-retrieval-patterns.md) - Hybrid search strategies
- [SPARQL Templates](./07-sparql-templates.md) - Query patterns
- [Evaluation](./08-evaluation.md) - Metrics and benchmarks