GraphRAG Pattern Comparison Matrix
Purpose: Quick reference comparing our current implementation against external patterns.
Comparison Matrix

| Capability | Our Current State | Microsoft GraphRAG | ROGRAG | Zep | HyperGraphRAG | LightRAG |
|---|---|---|---|---|---|---|
| Vector Search | Qdrant | LanceDB (default) / Azure AI Search | Faiss | Custom | Sentence-BERT | Faiss |
| Knowledge Graph | Oxigraph (RDF) + TypeDB | Parquet tables | TuGraph | Neo4j | Custom hypergraph | Neo4j |
| LLM Orchestration | DSPy | Azure OpenAI | Qwen | OpenAI | GPT-4o | Various |
| Community Detection | Not implemented | Leiden algorithm | None | Dynamic clustering | None | Louvain |
| Temporal Modeling | GHCID history | Not built-in | None | Bitemporal (T, T') | None | None |
| Multi-hop Retrieval | SPARQL traversal | Graph expansion | Logic form | BFS | Hyperedge walk | Graph paths |
| Verification Layer | Not implemented | Claim extraction | Argument checking | None | None | None |
| N-ary Relations | CIDOC-CRM events | Binary only | Binary only | Binary only | Hyperedges | Binary only |
| Cost Optimization | Semantic caching | Community summaries | Minimal graph | Caching | None | Simple graph |

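The N-ary Relations row marks a real modelling difference. A custody transfer has at least four participants (the object, the institution giving it up, the institution receiving it, and when), and a binary-only graph must split these into edges that no longer share a context. In CIDOC-CRM the event itself becomes a node. Below is a minimal sketch of that reification as plain triples: the `crm:` properties P28, P29, P30 and P4 are CIDOC-CRM's own, while the `ex:` identifiers, the `CustodyTransfer` class and the `custody_transfer_triples` helper are hypothetical.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class CustodyTransfer:
    event_id: str     # hypothetical local id for the event node
    obj: str          # the transferred collection or object
    from_actor: str   # institution surrendering custody
    to_actor: str     # institution receiving custody
    timespan: str     # hypothetical id of a time-span node


def custody_transfer_triples(t: CustodyTransfer) -> list[Triple]:
    """Reify one n-ary custody transfer as an E10 event node plus binary edges."""
    event = f"ex:{t.event_id}"
    return [
        (event, "rdf:type", "crm:E10_Transfer_of_Custody"),
        (event, "crm:P30_transferred_custody_of", f"ex:{t.obj}"),
        (event, "crm:P28_custody_surrendered_by", f"ex:{t.from_actor}"),
        (event, "crm:P29_custody_received_by", f"ex:{t.to_actor}"),
        (event, "crm:P4_has_time-span", f"ex:{t.timespan}"),
    ]
```

Because every participant hangs off one event node, a single SPARQL pattern on `?event a crm:E10_Transfer_of_Custody` recovers the whole transfer, including who surrendered custody and when.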
Gap Analysis
What We Have (Strengths)

| Feature | Description | Files / Location |
|---|---|---|
| Template SPARQL | 65% precision vs. 10% LLM-only | template_sparql.py |
| Semantic caching | Redis-backed; reduces repeat LLM calls | semantic_cache.py |
| Cost tracking | Token/latency monitoring | cost_tracker.py |
| Ontology grounding | LinkML schema validation | schema_loader.py |
| Temporal tracking | GHCID history with valid_from/to | LinkML schema |
| Multi-hop SPARQL | Graph traversal via SPARQL | dspy_heritage_rag.py |
| Entity extraction | Heritage-specific NER | DSPy signatures |

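The temporal-tracking and template-SPARQL strengths can be combined into the point-in-time query mode listed under Priority 1. A minimal sketch, assuming each GHCID history record stores its validity interval as `xsd:date` literals and that a record without an end date is still current; the `hc:` prefix IRI, the predicate names and `point_in_time_query` are hypothetical placeholders for whatever the LinkML schema actually generates.

```python
from datetime import date

# Hypothetical prefix and predicates; the real names come from the LinkML schema.
PREFIXES = (
    "PREFIX hc: <https://example.org/heritage#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>"
)

# Half-open validity interval: a record is valid on as_of if
# validFrom <= as_of and (no validTo, or as_of < validTo).
POINT_IN_TIME_TEMPLATE = """{prefixes}
SELECT ?institution ?ghcid WHERE {{
  ?record hc:institution ?institution ;
          hc:ghcid ?ghcid ;
          hc:validFrom ?validFrom .
  OPTIONAL {{ ?record hc:validTo ?validTo }}
  FILTER (?validFrom <= "{as_of}"^^xsd:date)
  FILTER (!BOUND(?validTo) || ?validTo > "{as_of}"^^xsd:date)
}}"""


def point_in_time_query(as_of: str) -> str:
    """Fill the template; parsing the date first rejects malformed or injected input."""
    parsed = date.fromisoformat(as_of)  # raises ValueError on bad input
    return POINT_IN_TIME_TEMPLATE.format(prefixes=PREFIXES, as_of=parsed.isoformat())
```

Parsing with `date.fromisoformat` before formatting means only a normalised `YYYY-MM-DD` string ever reaches the query text, so user input cannot alter the SPARQL structure.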
What We're Missing (Gaps)

| Gap | Priority | Implementation Effort | Benefit |
|---|---|---|---|
| Retrieval verification | High | Low (DSPy signature) | Reduces hallucination |
| Community summaries | High | Medium (Leiden + indexing) | Enables global (corpus-wide) questions |
| Dual-level extraction | High | Low (DSPy signature) | Better entity + relation matching |
| Graph context enrichment | Medium | Low (extend retrieval) | Compensates for weak embeddings |
| Exploration suggestions | Medium | Medium (session memory) | Improves user experience |
| Hypergraph memory | Low | High (new architecture) | Supports multi-step reasoning |

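Graph context enrichment is low effort because it only post-processes retrieval: each vector hit gets its one-hop neighbourhood attached, so a node matched through a terse or ambiguous embedding still reaches the LLM with its location, type and relations. A minimal in-memory sketch; in our pipeline the triples could come from a SPARQL query against Oxigraph, and `enrich_hits`, `build_adjacency` and `max_facts` are hypothetical names.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def build_adjacency(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Index every triple under both its subject and its object."""
    adj: dict[str, list[Triple]] = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((s, p, o))
        if o != s:
            adj[o].append((s, p, o))
    return adj


def enrich_hits(hits: list[str], triples: list[Triple], max_facts: int = 5) -> dict[str, list[str]]:
    """Attach up to max_facts one-hop facts (as text) to each vector-search hit."""
    adj = build_adjacency(triples)
    context: dict[str, list[str]] = {}
    for node in hits:
        # .get avoids creating empty entries for nodes missing from the graph
        context[node] = [f"{s} {p} {o}" for s, p, o in adj.get(node, [])[:max_facts]]
    return context
```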
Implementation Priority
Priority 1 (This Sprint)
├── Retrieval Verification Layer
│   └── ArgumentVerifier DSPy signature
├── Dual-Level Entity Extraction
│   └── Extend HeritageEntityExtractor
└── Temporal SPARQL Templates
    └── Point-in-time query mode
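The verification layer sits between retrieval and generation: each claim the answer intends to make is checked against the retrieved evidence, and unsupported claims are dropped or trigger another retrieval round. The sketch below assumes claims are already extracted as triples and uses an exact-match judge as a deterministic stand-in; in the planned design the `judge` callable would wrap the ArgumentVerifier DSPy module, whose signature is not specified here. `verify_claims`, `exact_match_judge` and `VerificationResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

Triple = tuple[str, str, str]


@dataclass
class VerificationResult:
    supported: list[Triple] = field(default_factory=list)
    unsupported: list[Triple] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.unsupported


def exact_match_judge(claim: Triple, evidence: set[Triple]) -> bool:
    """Stand-in for an LLM judge: a claim counts only if it was retrieved verbatim."""
    return claim in evidence


def verify_claims(
    claims: list[Triple],
    evidence: list[Triple],
    judge: Callable[[Triple, set[Triple]], bool] = exact_match_judge,
) -> VerificationResult:
    """Split candidate claims into supported / unsupported before generation."""
    evidence_set = set(evidence)
    result = VerificationResult()
    for claim in claims:
        if judge(claim, evidence_set):
            result.supported.append(claim)
        else:
            result.unsupported.append(claim)
    return result
```

Keeping the judge pluggable would let the cheap exact-match check run first and reserve LLM calls for the claims it cannot confirm.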
Priority 2 (Next Sprint)
├── Community Detection Pipeline
│   └── Leiden algorithm on institution graph
├── Community Summary Indexing
│   └── Store in Qdrant with embeddings
└── Global Search Mode
    └── Search summaries for holistic queries
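Global search mode, as planned here, is a vector search over community summaries rather than over individual institutions. This differs from Microsoft GraphRAG's global search, which maps over community reports and then reduces the partial answers. A minimal in-memory sketch, with cosine similarity standing in for the Qdrant query; the `global_search_context` helper, the toy two-dimensional embeddings and the `[community_id]` prefix format are hypothetical.

```python
import math

# (embedding, summary text) for one community; hypothetical storage format.
Summary = tuple[list[float], str]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def global_search_context(query_vec: list[float], summaries: dict[str, Summary], k: int = 2) -> str:
    """Return the k community summaries most similar to the query as one context string."""
    ranked = sorted(
        summaries,
        key=lambda cid: cosine(query_vec, summaries[cid][0]),
        reverse=True,
    )
    return "\n".join(f"[{cid}] {summaries[cid][1]}" for cid in ranked[:k])
```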
Priority 3 (Backlog)
├── Session Memory Evolution
│   └── HGMEM-style working memory
├── CIDOC-CRM Event Hyperedges
│   └── Rich custody transfer modeling
└── Exploration Suggestions
    └── Suggest related queries
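In their simplest form, exploration suggestions need no model call: entity types found in the last answer select follow-up templates, and questions already asked in the session are skipped. A minimal sketch; the `FOLLOW_UP_PATTERNS` table, the `suggest_queries` helper and the idea that `asked` comes from the session state in session_manager.py are all assumptions for illustration.

```python
# Hypothetical follow-up patterns keyed by entity type.
FOLLOW_UP_PATTERNS: dict[str, list[str]] = {
    "Museum": [
        "Which archives are located near {name}?",
        "How has the GHCID of {name} changed over time?",
    ],
    "Archive": ["Which institutions share a city with {name}?"],
}


def suggest_queries(entities: list[tuple[str, str]], asked: set[str], limit: int = 3) -> list[str]:
    """Fill type-specific templates for (name, type) pairs, skipping repeats, up to limit."""
    suggestions: list[str] = []
    for name, etype in entities:
        for pattern in FOLLOW_UP_PATTERNS.get(etype, []):
            query = pattern.format(name=name)
            if query not in asked and query not in suggestions:
                suggestions.append(query)
            if len(suggestions) >= limit:
                return suggestions
    return suggestions
```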
Quick Reference: Pattern Mapping

| External Pattern | Our Implementation Approach |
|---|---|
| GraphRAG communities | Pre-compute Leiden clusters over the Oxigraph institution graph; store community summaries in Qdrant |
| ROGRAG dual-level | DSPy signature: entities (low level) + relations (high level) |
| ROGRAG verification | DSPy signature: ArgumentVerifier runs before generation |
| Zep bitemporal | Valid-time history already exists via GHCID (valid_from/to); extend SPARQL templates |
| HyperGraphRAG hyperedges | CIDOC-CRM events (crm:E10_Transfer_of_Custody) |
| LightRAG simple graph | Our graph is richer, but we can adopt its "star graph sufficiency" idea |

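The dual-level mapping splits a question into low-level keywords (specific entities) and high-level keywords (relation types), retrieves on both, and ranks facts that match at both levels highest. In the planned design a DSPy signature would produce the two keyword lists; here they are passed in directly, and exact lower-cased string matching stands in for the fuzzy or embedding matching a real retriever would use. `dual_level_retrieve` is a hypothetical name.

```python
Triple = tuple[str, str, str]


def dual_level_retrieve(entities: list[str], relations: list[str], triples: list[Triple]) -> list[Triple]:
    """Rank triples by low-level (entity) plus high-level (relation) keyword hits."""
    ents = {e.lower() for e in entities}
    rels = {r.lower() for r in relations}
    scored: list[tuple[int, Triple]] = []
    for s, p, o in triples:
        low = int(s.lower() in ents) + int(o.lower() in ents)  # entity matches
        high = int(p.lower() in rels)                           # relation match
        if low or high:
            scored.append((low + high, (s, p, o)))
    # Stable sort: ties keep their original graph order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [t for _, t in scored]
```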
Files to Modify

| File | Changes |
|---|---|
| dspy_heritage_rag.py | Add ArgumentVerifier, DualLevelExtractor, global_search mode |
| template_sparql.py | Add temporal query templates |
| session_manager.py | Add working memory evolution |
| community_indexer.py (new) | Leiden detection, summary generation |
| exploration_suggester.py (new) | Pattern-based query suggestions |

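Leiden, like Louvain, searches for a partition that maximises a quality function (typically modularity, or CPM), with an extra refinement step that keeps communities well connected. Leiden itself is best taken from a library such as the `leidenalg` package on top of python-igraph; the pure-Python sketch below only computes Newman modularity, which could serve as a sanity check on the partitions community_indexer.py produces. The function name and the edge-list and node-to-community input formats are assumptions.

```python
from collections import defaultdict

Edge = tuple[str, str]


def modularity(edges: list[Edge], communities: dict[str, int]) -> float:
    """Newman modularity Q = sum_c [L_c/m - (d_c/2m)^2] for an undirected, unweighted graph.

    Assumes each undirected edge appears once in `edges`.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal: dict[int, int] = defaultdict(int)  # L_c: edges with both ends in c
    degree: dict[int, int] = defaultdict(int)    # d_c: summed degree of nodes in c
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single bridge edge, splitting at the bridge gives Q = 2 × (3/7 − 1/4) = 5/14, while putting every node in one community gives Q = 0.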