# GraphRAG Pattern Comparison Matrix

**Purpose**: Quick reference comparing our current implementation against external patterns.

## Comparison Matrix

| Capability | Our Current State | Microsoft GraphRAG | ROGRAG | Zep | HyperGraphRAG | LightRAG |
|------------|-------------------|-------------------|--------|-----|---------------|----------|
| **Vector Search** | Qdrant | Azure Cognitive | Faiss | Custom | Sentence-BERT | Faiss |
| **Knowledge Graph** | Oxigraph (RDF) + TypeDB | LanceDB | TuGraph | Neo4j | Custom hypergraph | Neo4j |
| **LLM Orchestration** | DSPy | Azure OpenAI | Qwen | OpenAI | GPT-4o | Various |
| **Community Detection** | Not implemented | Leiden algorithm | None | Dynamic clustering | None | Louvain |
| **Temporal Modeling** | GHCID history | Not built-in | None | Bitemporal (T, T') | None | None |
| **Multi-hop Retrieval** | SPARQL traversal | Graph expansion | Logic form | BFS | Hyperedge walk | Graph paths |
| **Verification Layer** | Not implemented | Claim extraction | Argument checking | None | None | None |
| **N-ary Relations** | CIDOC-CRM events | Binary only | Binary only | Binary only | Hyperedges | Binary only |
| **Cost Optimization** | Semantic caching | Community summaries | Minimal graph | Caching | None | Simple graph |

## Gap Analysis

### What We Have (Strengths)

| Feature | Description | Files |
|---------|-------------|-------|
| Template SPARQL | 65% precision vs 10% LLM-only | `template_sparql.py` |
| Semantic caching | Redis-backed, reduces LLM calls | `semantic_cache.py` |
| Cost tracking | Token/latency monitoring | `cost_tracker.py` |
| Ontology grounding | LinkML schema validation | `schema_loader.py` |
| Temporal tracking | GHCID history with valid_from/to | LinkML schema |
| Multi-hop SPARQL | Graph traversal via SPARQL | `dspy_heritage_rag.py` |
| Entity extraction | Heritage-specific NER | DSPy signatures |

### What We're Missing (Gaps)

| Gap | Priority | Implementation Effort | Benefit |
|-----|----------|----------------------|---------|
| Retrieval verification | High | Low (DSPy signature) | Reduces hallucination |
| Community summaries | High | Medium (Leiden + indexing) | Enables global questions |
| Dual-level extraction | High | Low (DSPy signature) | Better entity+relation matching |
| Graph context enrichment | Medium | Low (extend retrieval) | Fixes weak embeddings |
| Exploration suggestions | Medium | Medium (session memory) | Improves user experience |
| Hypergraph memory | Low | High (new architecture) | Multi-step reasoning |

## Implementation Priority

```
Priority 1 (This Sprint)
├── Retrieval Verification Layer
│   └── ArgumentVerifier DSPy signature
├── Dual-Level Entity Extraction
│   └── Extend HeritageEntityExtractor
└── Temporal SPARQL Templates
    └── Point-in-time query mode

Priority 2 (Next Sprint)
├── Community Detection Pipeline
│   └── Leiden algorithm on institution graph
├── Community Summary Indexing
│   └── Store in Qdrant with embeddings
└── Global Search Mode
    └── Search summaries for holistic queries

Priority 3 (Backlog)
├── Session Memory Evolution
│   └── HGMEM-style working memory
├── CIDOC-CRM Event Hyperedges
│   └── Rich custody transfer modeling
└── Exploration Suggestions
    └── Suggest related queries
```

## Quick Reference: Pattern Mapping

| External Pattern | Our Implementation Approach |
|-----------------|----------------------------|
| GraphRAG communities | Pre-compute Leiden clusters in Oxigraph, store summaries in Qdrant |
| ROGRAG dual-level | DSPy signature: entities (low) + relations (high) |
| ROGRAG verification | DSPy signature: ArgumentVerifier before generation |
| Zep bitemporal | Already have via GHCID history (extend SPARQL templates) |
| HyperGraphRAG hyperedges | CIDOC-CRM events (crm:E10_Transfer_of_Custody) |
| LightRAG simple graph | We use more complete graph, but can adopt "star graph sufficiency" thinking |

## Files to Modify

| File | Changes |
|------|---------|
| `dspy_heritage_rag.py` | Add ArgumentVerifier, DualLevelExtractor, global_search mode |
| `template_sparql.py` | Add temporal query templates |
| `session_manager.py` | Add working memory evolution |
| **New**: `community_indexer.py` | Leiden detection, summary generation |
| **New**: `exploration_suggester.py` | Pattern-based query suggestions |
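
As a rough illustration of the temporal query templates planned for `template_sparql.py`, a point-in-time template might look like the sketch below. It uses only the Python standard library. The `ex:` prefix, the predicate names (`ex:hasGhcidHistory`, `ex:ghcid`, `ex:validFrom`, `ex:validTo`), and the function name `build_point_in_time_query` are all assumptions made for illustration; the actual LinkML-derived vocabulary and the existing template API may differ.

```python
from datetime import date
from string import Template

# Hypothetical prefix and predicate names; the real schema may differ.
POINT_IN_TIME_TEMPLATE = Template("""\
PREFIX ex: <https://example.org/heritage/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?ghcid ?validFrom ?validTo WHERE {
  <$institution> ex:hasGhcidHistory ?entry .
  ?entry ex:ghcid ?ghcid ;
         ex:validFrom ?validFrom .
  OPTIONAL { ?entry ex:validTo ?validTo }
  FILTER (?validFrom <= "$as_of"^^xsd:date)
  FILTER (!BOUND(?validTo) || ?validTo > "$as_of"^^xsd:date)
}
""")


def build_point_in_time_query(institution_iri: str, as_of: date) -> str:
    """Render a SPARQL query for the GHCID entry valid on `as_of`.

    An entry counts as valid when validFrom <= as_of and validTo is
    either absent (still current) or strictly later than as_of.
    """
    # Reject characters that could break out of the <...> IRI term.
    if any(ch in institution_iri for ch in '<>"{}|^`\\ \n\t'):
        raise ValueError(f"unsafe IRI: {institution_iri!r}")
    return POINT_IN_TIME_TEMPLATE.substitute(
        institution=institution_iri,
        as_of=as_of.isoformat(),
    )


if __name__ == "__main__":
    # Example IRI is a placeholder, not a real institution identifier.
    print(build_point_in_time_query(
        "https://example.org/heritage/inst/1", date(2001, 5, 17)
    ))
```

Treating the validity window as half-open (`validFrom` inclusive, `validTo` exclusive) is one possible convention; it avoids returning two entries on the day one identifier is superseded by another, but the existing GHCID history may use a different rule.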