kempersc b0416efc7d enrich custodians and persons

2025-12-16 11:57:34 +01:00

RAG Performance Guide

Overview

This document provides performance benchmarks and optimization guidelines for the multi-database RAG (Retrieval-Augmented Generation) pipeline used in bronhouder.nl.

Database Architecture

The RAG pipeline integrates four databases, each optimized for different query patterns:

Database	Purpose	Vectors/Records	Typical Latency
Qdrant	Vector semantic search	27,000+ vectors	38-44ms (cached)
Oxigraph	RDF/SPARQL knowledge graph	148,000+ triples	36-143ms (cached)
TypeDB 3.0	Graph relationships	27,452 custodians	200-230ms (connection overhead)
PostgreSQL/PostGIS	Source data & geospatial	Full dataset	11-14ms (cached)

Performance Benchmarks

Tested Configuration

Server: Hetzner Cloud (91.98.224.44)
Date: December 2025
Test: /api/rag/query endpoint with generate_answer: false

Results by Source Combination

Sources	First Query	Cached Query	Notes
qdrant only	~2,200ms	38-44ms	First query loads embedding model
qdrant + sparql	357-382ms	36-143ms	✅ Recommended default
typedb only	~1,750ms	200-230ms	Connection warmup on first
qdrant + typedb	~320ms	200-250ms	TypeDB adds ~200ms overhead
postgis only	~160ms	11-14ms	Very fast, needs geo context
qdrant + postgis	~73ms	not reported	Fast combination
All sources	~2,100ms	230-330ms	TypeDB dominates latency

Key Insights

Qdrant + SPARQL is the best default for most queries (357-382ms on the first query, 36-143ms cached)
TypeDB adds ~200ms of overhead per query due to connection management
PostGIS is the fastest source (11-14ms cached) but only useful for geographic queries
The first query pays a ~2s penalty while the embedding model loads
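The benchmark table can also be read as a lookup: given a latency budget, which source combinations fit? The sketch below copies the cached ranges from the table and filters on the upper bound. The helper name `combinations_within_budget` is hypothetical, and the function only restates the measured numbers; it does not predict latency for untested combinations.

```python
# Cached-query latency ranges (ms) copied from the benchmark table.
# The qdrant + postgis row is left out because the table lists no cached range for it.
CACHED_LATENCY_MS = {
    ("qdrant",): (38, 44),
    ("qdrant", "sparql"): (36, 143),
    ("typedb",): (200, 230),
    ("qdrant", "typedb"): (200, 250),
    ("postgis",): (11, 14),
    ("qdrant", "sparql", "typedb", "postgis"): (230, 330),
}


def combinations_within_budget(budget_ms: float) -> list[tuple[str, ...]]:
    """Return benchmarked combinations whose worst cached latency fits the budget.

    Hypothetical helper: a lookup over the published numbers, not a predictor.
    """
    fitting = [
        (high, sources)
        for sources, (_low, high) in CACHED_LATENCY_MS.items()
        if high <= budget_ms
    ]
    # Order by worst-case latency, fastest first.
    return [sources for _high, sources in sorted(fitting)]


if __name__ == "__main__":
    for combo in combinations_within_budget(150):
        print(" + ".join(combo))
```

With a 150ms budget this leaves postgis only, qdrant only, and qdrant + sparql; every combination that includes TypeDB is excluded.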

Recommended Configurations

Default: Semantic + Knowledge Graph

{
  "sources": ["qdrant", "sparql"],
  "k": 10,
  "generate_answer": true
}

Best for: General heritage institution queries, name searches, collection queries.

Relationship Queries

{
  "sources": ["qdrant", "sparql", "typedb"],
  "k": 10,
  "generate_answer": true
}

Best for: Complex relationship queries, organizational hierarchies, custodian networks.

Geographic Queries

{
  "sources": ["qdrant", "postgis"],
  "k": 10,
  "generate_answer": true
}

Best for: Location-based queries, "museums in Amsterdam", proximity searches.

Maximum Coverage (Slower)

{
  "sources": ["qdrant", "sparql", "typedb", "postgis"],
  "k": 10,
  "generate_answer": true
}

Best for: Comprehensive searches where accuracy is more important than speed.
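The four configurations differ only in their `sources` list, so a client can keep them in one table and build the request body from a profile name. This is a minimal sketch: the profile names (`default`, `relationships`, `geographic`, `maximum`) and the `build_query` helper are illustrative labels, not identifiers from the API.

```python
import json

# Source lists copied from the four recommended configurations.
# The profile names are hypothetical labels chosen for this example.
PROFILES = {
    "default": ["qdrant", "sparql"],
    "relationships": ["qdrant", "sparql", "typedb"],
    "geographic": ["qdrant", "postgis"],
    "maximum": ["qdrant", "sparql", "typedb", "postgis"],
}


def build_query(question: str, profile: str = "default",
                k: int = 10, generate_answer: bool = True) -> dict:
    """Build a request body for POST /api/rag/query from a named profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {sorted(PROFILES)}")
    return {
        "question": question,
        "sources": list(PROFILES[profile]),  # copy so callers cannot mutate the table
        "k": k,
        "generate_answer": generate_answer,
    }


if __name__ == "__main__":
    print(json.dumps(build_query("museums in Amsterdam", "geographic"), indent=2))
```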

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                          User Query                                  │
│                    "museums in Amsterdam"                            │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Source Router                                  │
│              (based on selectedSources array)                        │
└─────────────────────────────────────────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        ▼                       ▼                       ▼
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│    Qdrant     │       │   Oxigraph    │       │    TypeDB     │
│  (Semantic)   │       │   (SPARQL)    │       │    (Graph)    │
│   ~40ms       │       │   ~100ms      │       │   ~200ms      │
└───────────────┘       └───────────────┘       └───────────────┘
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Result Merger & Ranker                            │
│         (deduplication, relevance scoring, top-k selection)          │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Answer Generator (LLM)                           │
│              (Claude via DSPy, only if requested)                    │
└─────────────────────────────────────────────────────────────────────┘
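The merge step in the diagram (deduplication, relevance scoring, top-k selection) can be sketched as follows. This is an illustration under stated assumptions, not the code in `rag/main.py`: it assumes every hit carries a record identifier and a score that is comparable across sources, and it resolves duplicates by keeping the highest-scoring copy. The `Hit` type and `merge_and_rank` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved record (hypothetical shape for this sketch)."""
    record_id: str
    score: float
    source: str


def merge_and_rank(hits_by_source: dict[str, list[Hit]], k: int) -> list[Hit]:
    """Deduplicate hits across sources and return the k highest-scoring ones."""
    best: dict[str, Hit] = {}
    for hits in hits_by_source.values():
        for hit in hits:
            current = best.get(hit.record_id)
            # Keep only the best-scoring copy of each record.
            if current is None or hit.score > current.score:
                best[hit.record_id] = hit
    # Sort by descending score; break ties by id so the order is deterministic.
    ranked = sorted(best.values(), key=lambda h: (-h.score, h.record_id))
    return ranked[:k]


if __name__ == "__main__":
    hits = {
        "qdrant": [Hit("a", 0.9, "qdrant"), Hit("b", 0.5, "qdrant")],
        "typedb": [Hit("a", 0.7, "typedb"), Hit("c", 0.8, "typedb")],
    }
    for hit in merge_and_rank(hits, k=2):
        print(hit)
```

If scores from different backends are not on a common scale, they would need normalising before this step; the sketch does not attempt that.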

Optimization History

December 2025: SPARQL LLM Bypass

Problem: Queries with sources=["qdrant", "sparql"] were taking 7-8 seconds.

Root Cause: The retrieve_from_sparql() function was redundantly calling DSPy/Claude LLM (~7s) for SPARQL query generation, even though graph expansion was already handled by batch SPARQL in retrieve_from_qdrant().

Solution: Modified /opt/glam-backend/rag/main.py (line ~736) to skip LLM-based SPARQL retrieval:

elif source == DataSource.SPARQL:
    # OPTIMIZATION: Skip LLM-based SPARQL generation for RAG queries.
    # Graph expansion is already handled in retrieve_from_qdrant() via batch SPARQL.
    logger.debug("SPARQL source selected - graph expansion handled by Qdrant retrieval")
    continue

Result: Query time reduced from 7-8s to 350-380ms (roughly a 20x improvement).

Caching Behavior

Server-Side Caching

Semantic Cache: Embeds questions and matches similar queries
TTL: 15 minutes default
Scope: Per-query result caching
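A semantic cache of this kind can be sketched in a few lines: embed the question, compare it with the embeddings of cached questions, and return a stored result when the similarity is high enough and the entry is younger than the TTL. The 15-minute TTL comes from the list above; the class name, the 0.95 cosine threshold, and the injected `embed` and `clock` callables are assumptions made for this example, and the server's implementation may differ.

```python
import math
import time
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy semantic cache; threshold and structure are assumptions, TTL is from the docs."""

    def __init__(self, embed: Callable[[str], Vector],
                 ttl_seconds: float = 15 * 60,
                 threshold: float = 0.95,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._embed = embed
        self._ttl = ttl_seconds
        self._threshold = threshold
        self._clock = clock
        self._entries: list[tuple[Vector, float, object]] = []

    def put(self, question: str, result: object) -> None:
        self._entries.append((self._embed(question), self._clock(), result))

    def get(self, question: str) -> Optional[object]:
        now = self._clock()
        # Drop entries older than the TTL before matching.
        self._entries = [e for e in self._entries if now - e[1] < self._ttl]
        query_vec = self._embed(question)
        best, best_sim = None, self._threshold
        for vec, _stored_at, result in self._entries:
            sim = cosine(query_vec, vec)
            if sim >= best_sim:
                best, best_sim = result, sim
        return best
```

The linear scan is fine for a sketch; a production cache would more likely store the embeddings in a vector index.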

Client-Side Caching (IndexedDB)

Database: GLAM_SemanticCache
Purpose: Reduces redundant API calls
Clear: Use "Clear cache" button in UI

Cache Invalidation

# Server-side cache clear
curl -X POST "http://localhost:8010/api/cache/invalidate" \
  -H "Content-Type: application/json" \
  -d '{"clear_all": true}'

# Or via service restart
systemctl restart glam-rag-api
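For scripted use (for example, clearing the cache before a benchmark run), the same invalidation call can be made from Python with the standard library. The URL and body mirror the curl command above; the function names are hypothetical, and the sketch makes no assumption about the response body beyond reading the HTTP status.

```python
import json
import urllib.request


def build_invalidate_request(base_url: str = "http://localhost:8010") -> urllib.request.Request:
    """Build the POST request that asks the server to clear its whole cache."""
    body = json.dumps({"clear_all": True}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/cache/invalidate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invalidate_cache(base_url: str = "http://localhost:8010", timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_invalidate_request(base_url), timeout=timeout) as response:
        return response.status
```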

Monitoring

Service Status

ssh root@91.98.224.44
systemctl status glam-rag-api
journalctl -u glam-rag-api -n 50 --no-pager

Performance Testing

# Test specific source combination
curl -s -X POST "http://localhost:8010/api/rag/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "museums in Amsterdam",
    "sources": ["qdrant", "sparql"],
    "k": 5,
    "generate_answer": false
  }' | jq '{time_ms: .query_time_ms, count: (.results | length)}'

Expected Results

{
  "time_ms": 45.2,
  "count": 5
}
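To reproduce the first-query versus cached-query columns of the benchmark table, the same request can be sent several times and the reported `query_time_ms` values compared. The sketch below assumes the first call lands on a cold cache and that repeating an identical question hits the cache afterwards. The `benchmark` and `post_query` names are hypothetical, and the `send` parameter is injectable so the logic can be exercised without a running server.

```python
import json
import statistics
import urllib.request
from typing import Callable, Optional, Sequence


def post_query(payload: dict, url: str = "http://localhost:8010/api/rag/query") -> dict:
    """POST a query to the RAG endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def benchmark(sources: Sequence[str],
              send: Callable[[dict], dict] = post_query,
              repeats: int = 5,
              question: str = "museums in Amsterdam") -> dict:
    """Send the same query `repeats` times; report first and median-of-rest timings."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    payload = {"question": question, "sources": list(sources),
               "k": 5, "generate_answer": False}
    times = [send(payload)["query_time_ms"] for _ in range(repeats)]
    cached: Optional[float] = statistics.median(times[1:]) if len(times) > 1 else None
    return {"first_ms": times[0], "cached_median_ms": cached}


if __name__ == "__main__":
    print(benchmark(["qdrant", "sparql"]))
```

Clearing the server-side cache or restarting the service before each run keeps the first-query figure comparable between runs.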

Troubleshooting

Slow SPARQL Queries (>1s)

Check that the LLM bypass (see Optimization History) is still active in rag/main.py
Verify Oxigraph is running: systemctl status oxigraph
Check for complex graph patterns in the query

TypeDB Connection Timeout

TypeDB adds about 200ms of connection overhead to every query; this is expected
The first query after an idle period may take longer (~1.7s) while the connection warms up
Consider excluding TypeDB from latency-critical queries

High First-Query Latency

Embedding model loads on first query (~2s)
This is expected behavior
Subsequent queries will use cached model
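The load-once behaviour described above is the usual lazy-initialisation pattern. The sketch below shows it with `functools.lru_cache` and a stand-in loader. How the service actually loads and holds its embedding model is not documented here, so `_load_model` and `get_embedding_model` are hypothetical names.

```python
import functools


def _load_model() -> dict:
    """Stand-in for the expensive model load (about 2s in the real service)."""
    return {"name": "stand-in-embedding-model"}


@functools.lru_cache(maxsize=1)
def get_embedding_model() -> dict:
    """Load the model on the first call and reuse the same object afterwards."""
    return _load_model()


if __name__ == "__main__":
    first = get_embedding_model()    # pays the load cost
    second = get_embedding_model()   # served from the cache
    print(first is second, get_embedding_model.cache_info())
```

Under this pattern, calling `get_embedding_model()` once at service startup would move the load cost off the first user query.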

API Reference

Query Endpoint

POST /api/rag/query

Request Body

{
  "question": "string",
  "sources": ["qdrant", "sparql", "typedb", "postgis"],
  "k": 10,
  "generate_answer": true,
  "language": "en"
}

Response

{
  "question": "museums in Amsterdam",
  "answer": "There are several museums...",
  "results": [...],
  "sources_used": ["qdrant", "sparql"],
  "query_time_ms": 45.2
}
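On the client side, the request body can be modelled as a small dataclass so that malformed requests fail before they reach the network. The field names come from the request body above. The defaults simply mirror the recommended configuration and the values shown in that example (whether the server applies the same defaults is not stated), and the validation rules are assumptions made for this sketch.

```python
from dataclasses import asdict, dataclass, field

# Source names taken from the request body example.
KNOWN_SOURCES = ("qdrant", "sparql", "typedb", "postgis")


@dataclass
class RagQuery:
    """Client-side model of the POST /api/rag/query body (hypothetical helper)."""
    question: str
    sources: list[str] = field(default_factory=lambda: ["qdrant", "sparql"])
    k: int = 10
    generate_answer: bool = True
    language: str = "en"

    def __post_init__(self) -> None:
        if not self.question.strip():
            raise ValueError("question must not be empty")
        unknown = [s for s in self.sources if s not in KNOWN_SOURCES]
        if unknown:
            raise ValueError(f"unknown sources: {unknown}")
        if self.k < 1:
            raise ValueError("k must be a positive integer")

    def to_payload(self) -> dict:
        """Return a JSON-serialisable dict matching the request body."""
        return asdict(self)


if __name__ == "__main__":
    print(RagQuery("museums in Amsterdam", sources=["qdrant", "postgis"]).to_payload())
```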

Related Documentation

Architecture - System components
Retrieval Patterns - Hybrid search strategies
SPARQL Templates - Query patterns
Evaluation - Metrics and benchmarks
