# GraphRAG Pattern Comparison Matrix

**Purpose**: Quick reference comparing our current implementation against five external GraphRAG patterns (Microsoft GraphRAG, ROGRAG, Zep, HyperGraphRAG, LightRAG).

## Comparison Matrix

| Capability | Our Current State | Microsoft GraphRAG | ROGRAG | Zep | HyperGraphRAG | LightRAG |
|------------|-------------------|-------------------|--------|-----|---------------|----------|
| **Vector Search** | Qdrant | Azure Cognitive Search | Faiss | Custom | Sentence-BERT | Faiss |
| **Knowledge Graph** | Oxigraph (RDF) + TypeDB | LanceDB | TuGraph | Neo4j | Custom hypergraph | Neo4j |
| **LLM Orchestration** | DSPy | Azure OpenAI | Qwen | OpenAI | GPT-4o | Various |
| **Community Detection** | Not implemented | Leiden algorithm | None | Dynamic clustering | None | Louvain |
| **Temporal Modeling** | GHCID history | Not built-in | None | Bitemporal (T, T') | None | None |
| **Multi-hop Retrieval** | SPARQL traversal | Graph expansion | Logic form | BFS | Hyperedge walk | Graph paths |
| **Verification Layer** | Not implemented | Claim extraction | Argument checking | None | None | None |
| **N-ary Relations** | CIDOC-CRM events | Binary only | Binary only | Binary only | Hyperedges | Binary only |
| **Cost Optimization** | Semantic caching | Community summaries | Minimal graph | Caching | None | Simple graph |

## Gap Analysis

### What We Have (Strengths)

| Feature | Description | Files |
|---------|-------------|-------|
| Template SPARQL | 65% precision, vs. 10% for the LLM-only approach | `template_sparql.py` |
| Semantic caching | Redis-backed, reduces LLM calls | `semantic_cache.py` |
| Cost tracking | Token/latency monitoring | `cost_tracker.py` |
| Ontology grounding | LinkML schema validation | `schema_loader.py` |
| Temporal tracking | GHCID history with `valid_from`/`valid_to` | LinkML schema |
| Multi-hop SPARQL | Graph traversal via SPARQL | `dspy_heritage_rag.py` |
| Entity extraction | Heritage-specific NER | DSPy signatures |

### What We're Missing (Gaps)

| Gap | Priority | Implementation Effort | Benefit |
|-----|----------|----------------------|---------|
| Retrieval verification | High | Low (DSPy signature) | Reduces hallucination |
| Community summaries | High | Medium (Leiden + indexing) | Enables global questions |
| Dual-level extraction | High | Low (DSPy signature) | Better entity+relation matching |
| Graph context enrichment | Medium | Low (extend retrieval) | Compensates for weak embeddings |
| Exploration suggestions | Medium | Medium (session memory) | Improves user experience |
| Hypergraph memory | Low | High (new architecture) | Multi-step reasoning |
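
The retrieval verification gap can be illustrated without committing to a particular DSPy API. The following is a minimal, library-free sketch: `verify_passages` and `keyword_judge` are hypothetical names, and the keyword-overlap judge is only a stand-in for the LLM-backed `ArgumentVerifier` signature named in this document. The point is the shape of the step, which filters retrieved passages through a pluggable judge before generation.

```python
from typing import Callable, List

# A judge takes (question, passage) and returns True when the passage
# plausibly supports an answer. In the real pipeline this would be an
# LLM-backed check; here it is a pluggable callable (assumption).
Judge = Callable[[str, str], bool]


def keyword_judge(question: str, passage: str, min_overlap: int = 2) -> bool:
    """Toy stand-in judge: require a few shared content words."""
    stop = {"the", "a", "an", "of", "in", "is", "was", "which", "what", "who"}
    q_words = {w.strip("?.,").lower() for w in question.split()} - stop
    p_words = {w.strip("?.,").lower() for w in passage.split()} - stop
    return len(q_words & p_words) >= min_overlap


def verify_passages(question: str, passages: List[str], judge: Judge) -> List[str]:
    """Keep only passages the judge accepts, preserving retrieval order."""
    return [p for p in passages if judge(question, p)]


question = "Which archive holds the city council records?"
passages = [
    "The municipal archive holds the city council records from 1850 onward.",
    "The museum cafe is open on weekends.",
]
verified = verify_passages(question, passages, keyword_judge)
```

Keeping the judge as a parameter means the toy heuristic can later be swapped for a model-backed verifier without changing the calling code.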

## Implementation Priority

```
Priority 1 (This Sprint)
├── Retrieval Verification Layer
│   └── ArgumentVerifier DSPy signature
├── Dual-Level Entity Extraction
│   └── Extend HeritageEntityExtractor
└── Temporal SPARQL Templates
    └── Point-in-time query mode

Priority 2 (Next Sprint)
├── Community Detection Pipeline
│   └── Leiden algorithm on institution graph
├── Community Summary Indexing
│   └── Store in Qdrant with embeddings
└── Global Search Mode
    └── Search summaries for holistic queries

Priority 3 (Backlog)
├── Session Memory Evolution
│   └── HGMEM-style working memory
├── CIDOC-CRM Event Hyperedges
│   └── Rich custody transfer modeling
└── Exploration Suggestions
    └── Suggest related queries
```
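
As an illustration of the point-in-time query mode listed under Priority 1, the sketch below renders a SPARQL template for a single date. It rests on assumptions: the `ex:` prefix, the property names (`ex:hasNameRecord`, `ex:validFrom`, `ex:validTo`), and the function name are placeholders rather than the project's actual schema, and validity intervals are treated as inclusive at the start and exclusive at the end.

```python
from datetime import date
from string import Template

# Hypothetical vocabulary: the `ex:` prefix and property names are
# placeholders, not the project's actual schema.
POINT_IN_TIME_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?institution ?name WHERE {
  ?institution ex:hasNameRecord ?record .
  ?record ex:name ?name ;
          ex:validFrom ?start .
  OPTIONAL { ?record ex:validTo ?end }
  FILTER (?start <= "$as_of"^^xsd:date)
  FILTER (!BOUND(?end) || ?end > "$as_of"^^xsd:date)
}
""")


def point_in_time_query(as_of: str) -> str:
    """Render the template for one date, rejecting non-ISO input."""
    parsed = date.fromisoformat(as_of)  # raises ValueError on malformed input
    return POINT_IN_TIME_TEMPLATE.substitute(as_of=parsed.isoformat())


query = point_in_time_query("1995-06-01")
```

Parsing the date before substitution means only a normalized `YYYY-MM-DD` string ever reaches the query text, which guards against malformed or injected input.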

## Quick Reference: Pattern Mapping

| External Pattern | Our Implementation Approach |
|-----------------|----------------------------|
| GraphRAG communities | Pre-compute Leiden clusters in Oxigraph, store summaries in Qdrant |
| ROGRAG dual-level | DSPy signature: entities (low) + relations (high) |
| ROGRAG verification | DSPy signature: ArgumentVerifier before generation |
| Zep bitemporal | Already have via GHCID history (extend SPARQL templates) |
| HyperGraphRAG hyperedges | CIDOC-CRM events (`crm:E10_Transfer_of_Custody`) |
| LightRAG simple graph | We use a more complete graph, but can adopt "star graph sufficiency" thinking |
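
To make the hyperedge mapping concrete, the sketch below shows how one custody transfer involving several participants can be expressed as binary triples around a single event node. The CIDOC-CRM class and property names (`E10`, `P28`, `P29`, `P30`, `P4`) come from the CIDOC-CRM standard; the `ex:` identifiers and the function name are hypothetical, and triples are plain string tuples rather than calls to any RDF library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def custody_transfer_triples(
    event_id: str,
    surrendered_by: str,
    received_by: str,
    objects: List[str],
    time_span: str,
) -> List[Triple]:
    """Reify one n-ary custody transfer as binary triples around an event node."""
    triples: List[Triple] = [
        (event_id, "rdf:type", "crm:E10_Transfer_of_Custody"),
        (event_id, "crm:P28_custody_surrendered_by", surrendered_by),
        (event_id, "crm:P29_custody_received_by", received_by),
        (event_id, "crm:P4_has_time-span", time_span),
    ]
    # One triple per transferred item, so the event can cover any number of them.
    triples += [(event_id, "crm:P30_transferred_custody_of", o) for o in objects]
    return triples


triples = custody_transfer_triples(
    event_id="ex:transfer/1",
    surrendered_by="ex:org/city-museum",
    received_by="ex:org/regional-archive",
    objects=["ex:collection/maps", "ex:collection/letters"],
    time_span="ex:timespan/1987",
)
```

Because every triple shares the event node as its subject, the event plays the role of a hyperedge while the store itself only ever holds binary relations.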

## Files to Modify

| File | Changes |
|------|---------|
| `dspy_heritage_rag.py` | Add ArgumentVerifier, DualLevelExtractor, global_search mode |
| `template_sparql.py` | Add temporal query templates |
| `session_manager.py` | Add working memory evolution |
| **New**: `community_indexer.py` | Leiden detection, summary generation |
| **New**: `exploration_suggester.py` | Pattern-based query suggestions |
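
A rough sketch of the shape `community_indexer.py` might take is given below, under stated assumptions. The partition step uses connected components as a placeholder and is explicitly not the Leiden algorithm; a real Leiden implementation from a third-party library could be passed in through the `partition` parameter. All function and field names are hypothetical, and the embedding and Qdrant upload steps are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def connected_components(edges: Iterable[Edge]) -> List[Set[str]]:
    """Placeholder partitioner (NOT Leiden): groups nodes by connectivity."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def build_summary_records(
    edges: Iterable[Edge],
    summarize: Callable[[List[str]], str],
    partition: Callable[[Iterable[Edge]], List[Set[str]]] = connected_components,
) -> List[dict]:
    """Return one record per community, ready to be embedded and indexed."""
    records = []
    for i, members in enumerate(partition(edges)):
        ordered = sorted(members)
        records.append(
            {"community_id": i, "members": ordered, "summary": summarize(ordered)}
        )
    return records


edges = [
    ("museum_a", "archive_b"),
    ("archive_b", "library_c"),
    ("museum_x", "archive_y"),
]
records = build_summary_records(
    edges, summarize=lambda members: "Community of " + ", ".join(members)
)
```

In practice `summarize` would call an LLM over the community's entities and relations; passing it in as a callable keeps the indexing logic testable without one.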