RAG Performance Guide
Overview
This document provides performance benchmarks and optimization guidelines for the multi-database RAG (Retrieval-Augmented Generation) pipeline used in bronhouder.nl.
Database Architecture
The RAG pipeline integrates four databases, each optimized for different query patterns:
| Database | Purpose | Vectors/Records | Typical Latency |
|---|---|---|---|
| Qdrant | Vector semantic search | 27,000+ vectors | 38-44ms (cached) |
| Oxigraph | RDF/SPARQL knowledge graph | 148,000+ triples | 36-143ms (cached) |
| TypeDB 3.0 | Graph relationships | 27,452 custodians | 200-230ms (connection overhead) |
| PostgreSQL/PostGIS | Source data & geospatial | Full dataset | 11-14ms (cached) |
Performance Benchmarks
Tested Configuration
- Server: Hetzner Cloud (91.98.224.44)
- Date: December 2025
- Test: /api/rag/query endpoint with generate_answer: false
Results by Source Combination
| Sources | First Query | Cached Query | Notes |
|---|---|---|---|
| qdrant only | ~2,200ms | 38-44ms | First query loads embedding model |
| qdrant + sparql | 357-382ms | 36-143ms | ✅ Recommended default |
| typedb only | ~1,750ms | 200-230ms | Connection warmup on first |
| qdrant + typedb | ~320ms | 200-250ms | TypeDB adds ~200ms overhead |
| postgis only | ~160ms | 11-14ms | Very fast, needs geo context |
| qdrant + postgis | ~73ms | | Fast combination |
| All sources | ~2,100ms | 230-330ms | TypeDB dominates latency |
Key Insights
- Qdrant + SPARQL is optimal for most queries (~360-380ms on the first query, 36-143ms cached)
- TypeDB adds ~200ms overhead due to connection management
- PostGIS is fastest but only useful for geographic queries
- First query penalty exists for embedding model loading (~2s)
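As a rough planning aid, the cached figures in the table can be turned into an estimate for any source combination. The sketch below assumes the selected sources are queried concurrently, so the slowest source sets the floor. `CACHED_LATENCY_MS` and `estimate_cached_latency_ms` are illustrative names, not part of the pipeline, and the estimate ignores merge and ranking overhead (the measured "All sources" range of 230-330ms is higher than the 230ms this returns).

```python
# Upper bounds of the per-database cached latency ranges (ms), taken from
# the Database Architecture table. Illustrative constants only.
CACHED_LATENCY_MS = {
    "qdrant": 44,
    "sparql": 143,
    "typedb": 230,
    "postgis": 14,
}


def estimate_cached_latency_ms(sources: list[str]) -> int:
    """Estimate cached latency, ASSUMING sources are queried in parallel.

    With parallel fan-out the slowest source dominates, so the estimate is
    the maximum per-source latency. Merge overhead is not included.
    """
    if not sources:
        raise ValueError("at least one source is required")
    unknown = [s for s in sources if s not in CACHED_LATENCY_MS]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    return max(CACHED_LATENCY_MS[s] for s in sources)


print(estimate_cached_latency_ms(["qdrant", "sparql"]))  # 143
print(estimate_cached_latency_ms(["qdrant", "typedb"]))  # 230
```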
Recommended Configurations
Default: Semantic + Knowledge Graph
{
"sources": ["qdrant", "sparql"],
"k": 10,
"generate_answer": true
}
Best for: General heritage institution queries, name searches, collection queries.
Relationship Queries
{
"sources": ["qdrant", "sparql", "typedb"],
"k": 10,
"generate_answer": true
}
Best for: Complex relationship queries, organizational hierarchies, custodian networks.
Geographic Queries
{
"sources": ["qdrant", "postgis"],
"k": 10,
"generate_answer": true
}
Best for: Location-based queries, "museums in Amsterdam", proximity searches.
Maximum Coverage (Slower)
{
"sources": ["qdrant", "sparql", "typedb", "postgis"],
"k": 10,
"generate_answer": true
}
Best for: Comprehensive searches where accuracy is more important than speed.
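A client could keep these four configurations as named presets rather than repeating the source lists. The following is a minimal sketch; the preset names and the `build_query_body` helper are hypothetical and chosen here only for illustration.

```python
# Source lists mirroring the four recommended configurations above.
# The preset names are hypothetical labels, not API values.
PRESETS = {
    "default": ["qdrant", "sparql"],
    "relationship": ["qdrant", "sparql", "typedb"],
    "geographic": ["qdrant", "postgis"],
    "maximum": ["qdrant", "sparql", "typedb", "postgis"],
}


def build_query_body(question: str, preset: str = "default", k: int = 10,
                     generate_answer: bool = True) -> dict:
    """Build a request body for POST /api/rag/query from a named preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    return {
        "question": question,
        "sources": list(PRESETS[preset]),  # copy so callers cannot mutate the preset
        "k": k,
        "generate_answer": generate_answer,
    }


print(build_query_body("museums in Amsterdam", preset="geographic"))
```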
Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                             User Query                              │
│                       "museums in Amsterdam"                        │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            Source Router                            │
│                  (based on selectedSources array)                   │
└─────────────────────────────────────────────────────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
  ┌───────────────┐       ┌───────────────┐       ┌───────────────┐
  │    Qdrant     │       │   Oxigraph    │       │    TypeDB     │
  │  (Semantic)   │       │   (SPARQL)    │       │    (Graph)    │
  │     ~40ms     │       │    ~100ms     │       │    ~200ms     │
  └───────────────┘       └───────────────┘       └───────────────┘
          │                       │                       │
          └───────────────────────┼───────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Result Merger & Ranker                        │
│         (deduplication, relevance scoring, top-k selection)         │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Answer Generator (LLM)                        │
│                (Claude via DSPy, only if requested)                 │
└─────────────────────────────────────────────────────────────────────┘
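The router and merger stages in the diagram can be sketched as a concurrent fan-out followed by deduplication and top-k ranking. This is not the code in rag/main.py. The `route_and_merge` function, the stub retrievers, and the assumption that every result carries comparable `id` and `score` fields are all hypothetical; in practice scores from different databases usually need normalizing before they can be ranked together.

```python
import asyncio
from typing import Awaitable, Callable

Result = dict  # ASSUMPTION: each result carries "id" and "score" keys
Retriever = Callable[[str, int], Awaitable[list[Result]]]


async def route_and_merge(question: str, selected_sources: list[str],
                          retrievers: dict[str, Retriever], k: int = 10) -> list[Result]:
    """Query the selected sources concurrently, then deduplicate and rank."""
    tasks = [retrievers[name](question, k) for name in selected_sources]
    per_source = await asyncio.gather(*tasks)

    best: dict[str, Result] = {}
    for name, results in zip(selected_sources, per_source):
        for item in results:
            seen = best.get(item["id"])
            # Keep the highest-scoring copy of each record.
            if seen is None or item["score"] > seen["score"]:
                best[item["id"]] = {**item, "source": name}

    ranked = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return ranked[:k]


# Stub retrievers standing in for the real database clients.
async def fake_qdrant(question: str, k: int) -> list[Result]:
    return [{"id": "rijksmuseum", "score": 0.92}, {"id": "stedelijk", "score": 0.81}]


async def fake_sparql(question: str, k: int) -> list[Result]:
    return [{"id": "rijksmuseum", "score": 0.75}, {"id": "anne-frank-huis", "score": 0.88}]


merged = asyncio.run(route_and_merge(
    "museums in Amsterdam", ["qdrant", "sparql"],
    {"qdrant": fake_qdrant, "sparql": fake_sparql}, k=2))
print([r["id"] for r in merged])  # ['rijksmuseum', 'anne-frank-huis']
```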
Optimization History
December 2025: SPARQL LLM Bypass
Problem: Queries with sources=["qdrant", "sparql"] were taking 7-8 seconds.
Root Cause: The retrieve_from_sparql() function was redundantly calling DSPy/Claude LLM (~7s) for SPARQL query generation, even though graph expansion was already handled by batch SPARQL in retrieve_from_qdrant().
Solution: Modified /opt/glam-backend/rag/main.py (line ~736) to skip LLM-based SPARQL retrieval:
elif source == DataSource.SPARQL:
    # OPTIMIZATION: Skip LLM-based SPARQL generation for RAG queries.
    # Graph expansion is already handled in retrieve_from_qdrant() via batch SPARQL.
    logger.debug("SPARQL source selected - graph expansion handled by Qdrant retrieval")
    continue
Result: Query time reduced from 7-8s to 350-380ms (roughly a 20x improvement).
Caching Behavior
Server-Side Caching
- Semantic Cache: Embeds questions and matches similar queries
- TTL: 15 minutes default
- Scope: Per-query result caching
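To illustrate how a semantic cache with a 15-minute TTL can behave, here is a minimal in-memory sketch. It is not the server's implementation: the `SemanticCache` class, the 0.95 similarity threshold, and the toy letter-frequency embedding are all assumptions made for the example, and a real deployment would use the pipeline's embedding model instead.

```python
import math
import time
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Caches results keyed by question embedding, with a TTL per entry."""

    def __init__(self, embed: Callable[[str], Vector], ttl_seconds: float = 15 * 60,
                 threshold: float = 0.95, clock: Callable[[], float] = time.monotonic):
        self.embed = embed
        self.ttl = ttl_seconds
        self.threshold = threshold  # ASSUMED cut-off, not taken from the source
        self.clock = clock
        self.entries: list[tuple[Vector, float, object]] = []

    def get(self, question: str) -> Optional[object]:
        now = self.clock()
        # Drop expired entries before matching.
        self.entries = [e for e in self.entries if now - e[1] < self.ttl]
        query = self.embed(question)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[2]
        return None

    def put(self, question: str, result: object) -> None:
        self.entries.append((self.embed(question), self.clock(), result))

    def clear(self) -> None:
        self.entries.clear()


def toy_embed(text: str) -> Vector:
    """Toy stand-in for a real embedding model: letter-frequency vector."""
    text = text.lower()
    return [float(text.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]
```

Injecting the clock keeps the TTL logic testable without sleeping; `clear()` corresponds to the `clear_all` style of invalidation.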
Client-Side Caching (IndexedDB)
- Database: GLAM_SemanticCache
- Purpose: Reduces redundant API calls
- Clear: Use "Clear cache" button in UI
Cache Invalidation
# Server-side cache clear
curl -X POST "http://localhost:8010/api/cache/invalidate" \
-H "Content-Type: application/json" \
-d '{"clear_all": true}'
# Or via service restart
systemctl restart glam-rag-api
Monitoring
Service Status
ssh root@91.98.224.44
systemctl status glam-rag-api
journalctl -u glam-rag-api -n 50 --no-pager
Performance Testing
# Test specific source combination
curl -s -X POST "http://localhost:8010/api/rag/query" \
-H "Content-Type: application/json" \
-d '{
"question": "museums in Amsterdam",
"sources": ["qdrant", "sparql"],
"k": 5,
"generate_answer": false
}' | jq '{time_ms: .query_time_ms, count: (.results | length)}'
Expected Results
{
"time_ms": 45.2,
"count": 5
}
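To compare repeated runs of the same query, the curl test can be wrapped in a small script. The sketch below uses only the standard library and the endpoint and fields shown in this guide; the helper names `build_benchmark_request` and `time_query` are illustrative. Whether the first run is actually uncached depends on the state of the server-side cache when the script starts.

```python
import json
import time
import urllib.request

API_URL = "http://localhost:8010/api/rag/query"


def build_benchmark_request(question: str, sources: list[str],
                            k: int = 5) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = {"question": question, "sources": sources, "k": k, "generate_answer": False}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_query(question: str, sources: list[str], runs: int = 3) -> list[dict]:
    """Send the query several times and record server and wall-clock times."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        request = build_benchmark_request(question, sources)
        with urllib.request.urlopen(request, timeout=30) as resp:
            payload = json.load(resp)
        timings.append({
            "server_ms": payload.get("query_time_ms"),
            "wall_ms": round((time.perf_counter() - start) * 1000, 1),
            "count": len(payload.get("results", [])),
        })
    return timings
```

On the server, calling `time_query("museums in Amsterdam", ["qdrant", "sparql"])` returns one timing record per run, which can be compared against the benchmark table.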
Troubleshooting
Slow SPARQL Queries (>1s)
- Check if LLM bypass is active in rag/main.py
- Verify Oxigraph is running: systemctl status oxigraph
- Check for complex graph patterns in query
TypeDB Connection Timeout
- TypeDB adds roughly 200-230ms of connection overhead per query; this is expected
- First query after idle may take longer (~1.7s)
- Consider excluding TypeDB for latency-critical queries
High First-Query Latency
- Embedding model loads on first query (~2s)
- This is expected behavior
- Subsequent queries will use cached model
API Reference
Query Endpoint
POST /api/rag/query
Request Body
{
"question": "string",
"sources": ["qdrant", "sparql", "typedb", "postgis"],
"k": 10,
"generate_answer": true,
"language": "en"
}
Response
{
"question": "museums in Amsterdam",
"answer": "There are several museums...",
"results": [...],
"sources_used": ["qdrant", "sparql"],
"query_time_ms": 45.2
}
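For Python clients, the request and response shapes above can be mirrored with dataclasses so that malformed requests fail before they reach the API. This is a sketch under assumptions: the class names, the default source list, and the validation rules are illustrative, and the server may accept or reject different values.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

VALID_SOURCES = {"qdrant", "sparql", "typedb", "postgis"}


@dataclass
class RagQuery:
    """Client-side mirror of the request body (validation rules are assumed)."""
    question: str
    sources: list[str] = field(default_factory=lambda: ["qdrant", "sparql"])
    k: int = 10
    generate_answer: bool = True
    language: str = "en"

    def __post_init__(self) -> None:
        if not self.question.strip():
            raise ValueError("question must not be empty")
        invalid = set(self.sources) - VALID_SOURCES
        if invalid:
            raise ValueError(f"unsupported sources: {sorted(invalid)}")
        if self.k < 1:
            raise ValueError("k must be at least 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class RagResponse:
    """Client-side mirror of the response body shown above."""
    question: str
    answer: Optional[str]
    results: list
    sources_used: list[str]
    query_time_ms: float

    @classmethod
    def from_json(cls, raw: str) -> "RagResponse":
        data = json.loads(raw)
        return cls(
            question=data["question"],
            answer=data.get("answer"),  # may be absent when generate_answer is false
            results=data.get("results", []),
            sources_used=data.get("sources_used", []),
            query_time_ms=data["query_time_ms"],
        )
```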
Related Documentation
- Architecture - System components
- Retrieval Patterns - Hybrid search strategies
- SPARQL Templates - Query patterns
- Evaluation - Metrics and benchmarks