GraphRAG Pattern Comparison Matrix

Purpose: Quick reference comparing our current implementation against external patterns.

Comparison Matrix

| Capability | Our Current State | Microsoft GraphRAG | ROGRAG | Zep | HyperGraphRAG | LightRAG |
|---|---|---|---|---|---|---|
| Vector Search | Qdrant | Azure Cognitive | Faiss | Custom | Sentence-BERT | Faiss |
| Knowledge Graph | Oxigraph (RDF) + TypeDB | LanceDB | TuGraph | Neo4j | Custom hypergraph | Neo4j |
| LLM Orchestration | DSPy | Azure OpenAI | Qwen | OpenAI | GPT-4o | Various |
| Community Detection | Not implemented | Leiden algorithm | None | Dynamic clustering | None | Louvain |
| Temporal Modeling | GHCID history | Not built-in | None | Bitemporal (T, T') | None | None |
| Multi-hop Retrieval | SPARQL traversal | Graph expansion | Logic form | BFS | Hyperedge walk | Graph paths |
| Verification Layer | Not implemented | Claim extraction | Argument checking | None | None | None |
| N-ary Relations | CIDOC-CRM events | Binary only | Binary only | Binary only | Hyperedges | Binary only |
| Cost Optimization | Semantic caching | Community summaries | Minimal graph | Caching | None | Simple graph |

Gap Analysis

What We Have (Strengths)

| Feature | Description | Files |
|---|---|---|
| Template SPARQL | 65% precision vs 10% LLM-only | template_sparql.py |
| Semantic caching | Redis-backed, reduces LLM calls | semantic_cache.py |
| Cost tracking | Token/latency monitoring | cost_tracker.py |
| Ontology grounding | LinkML schema validation | schema_loader.py |
| Temporal tracking | GHCID history with valid_from/to | LinkML schema |
| Multi-hop SPARQL | Graph traversal via SPARQL | dspy_heritage_rag.py |
| Entity extraction | Heritage-specific NER | DSPy signatures |

What We're Missing (Gaps)

| Gap | Priority | Implementation Effort | Benefit |
|---|---|---|---|
| Retrieval verification | High | Low (DSPy signature) | Reduces hallucination |
| Community summaries | High | Medium (Leiden + indexing) | Enables global questions |
| Dual-level extraction | High | Low (DSPy signature) | Better entity+relation matching |
| Graph context enrichment | Medium | Low (extend retrieval) | Fixes weak embeddings |
| Exploration suggestions | Medium | Medium (session memory) | Improves user experience |
| Hypergraph memory | Low | High (new architecture) | Multi-step reasoning |
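
The retrieval verification gap can be pictured as a filter that sits between retrieval and generation. The sketch below is a minimal illustration, not the planned DSPy signature. `Passage`, `verify_retrieval`, and `keyword_judge` are hypothetical names, and the judge is a plain callable standing in for an LLM-backed module so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    """A retrieved passage; the field names here are hypothetical."""
    source: str
    text: str


# A judge decides whether a passage supports answering the question.
# In the planned design this role would be played by an LLM-backed
# module; here it is any callable so the sketch runs offline.
Judge = Callable[[str, Passage], bool]


def verify_retrieval(question: str, passages: List[Passage], judge: Judge) -> List[Passage]:
    """Keep only the passages the judge accepts as supporting the question."""
    return [p for p in passages if judge(question, p)]


def keyword_judge(question: str, passage: Passage) -> bool:
    """Toy stand-in judge: accept a passage that contains a long word from the question."""
    words = {w.strip("?.,").lower() for w in question.split() if len(w) > 4}
    text = passage.text.lower()
    return any(w in text for w in words)


passages = [
    Passage("doc-1", "The archive moved to a new building in 1998."),
    Passage("doc-2", "Opening hours are listed on the website."),
]
supported = verify_retrieval("When did the archive move?", passages, keyword_judge)
```

Only passages that pass the judge would be handed to the generator, which is where the reduction in hallucination is expected to come from. Swapping `keyword_judge` for a model-backed judge leaves `verify_retrieval` unchanged.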

Implementation Priority

Priority 1 (This Sprint)
├── Retrieval Verification Layer
│   └── ArgumentVerifier DSPy signature
├── Dual-Level Entity Extraction
│   └── Extend HeritageEntityExtractor
└── Temporal SPARQL Templates
    └── Point-in-time query mode

Priority 2 (Next Sprint)
├── Community Detection Pipeline
│   └── Leiden algorithm on institution graph
├── Community Summary Indexing
│   └── Store in Qdrant with embeddings
└── Global Search Mode
    └── Search summaries for holistic queries

Priority 3 (Backlog)
├── Session Memory Evolution
│   └── HGMEM-style working memory
├── CIDOC-CRM Event Hyperedges
│   └── Rich custody transfer modeling
└── Exploration Suggestions
    └── Suggest related queries
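
The point-in-time query mode listed under Priority 1 could be sketched as a parameterised SPARQL template. Everything about the vocabulary is an assumption: the `ex:` prefix, `ex:hasIdentifierRecord`, `ex:ghcid`, `ex:validFrom`, and `ex:validTo` are placeholders rather than the project's schema terms, and validity is assumed to be a half-open interval that includes the start date and excludes the end date.

```python
from datetime import date
from string import Template

# Hypothetical vocabulary: the ex: prefix and the property names below are
# placeholders, not the project's actual schema terms.
POINT_IN_TIME_TEMPLATE = Template("""\
PREFIX ex: <https://example.org/schema/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?ghcid ?validFrom ?validTo WHERE {
  <$institution> ex:hasIdentifierRecord ?record .
  ?record ex:ghcid ?ghcid ;
          ex:validFrom ?validFrom .
  OPTIONAL { ?record ex:validTo ?validTo }
  FILTER (?validFrom <= "$as_of"^^xsd:date)
  FILTER (!BOUND(?validTo) || ?validTo > "$as_of"^^xsd:date)
}
""")


def point_in_time_query(institution_iri: str, as_of: date) -> str:
    """Render a query for the identifier record that was valid on `as_of`."""
    # Reject characters that are not allowed inside a SPARQL IRI reference,
    # so the value cannot break out of the angle brackets.
    forbidden = '<>"{}|^`\\'
    if any(ch in forbidden or ch.isspace() for ch in institution_iri):
        raise ValueError(f"unsafe IRI: {institution_iri!r}")
    return POINT_IN_TIME_TEMPLATE.substitute(
        institution=institution_iri,
        as_of=as_of.isoformat(),
    )


query = point_in_time_query("https://example.org/institution/123", date(2020, 1, 1))
```

A record with no end date is treated as still valid, which is why the second filter accepts an unbound `?validTo`.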

Quick Reference: Pattern Mapping

| External Pattern | Our Implementation Approach |
|---|---|
| GraphRAG communities | Pre-compute Leiden clusters in Oxigraph, store summaries in Qdrant |
| ROGRAG dual-level | DSPy signature: entities (low) + relations (high) |
| ROGRAG verification | DSPy signature: ArgumentVerifier before generation |
| Zep bitemporal | Already available via GHCID history (extend SPARQL templates) |
| HyperGraphRAG hyperedges | CIDOC-CRM events (crm:E10_Transfer_of_Custody) |
| LightRAG simple graph | We use a more complete graph, but can adopt "star graph sufficiency" thinking |
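
For the GraphRAG communities row, the step between clustering and indexing might look roughly like the sketch below. It assumes community assignments have already been produced by a Leiden run elsewhere and only shows how they could be grouped into one document per community before summarisation and embedding. `build_community_documents` and the document fields are hypothetical, and no Qdrant or Oxigraph calls are made.

```python
from collections import defaultdict
from typing import Dict, List


def build_community_documents(
    membership: Dict[str, int],
    labels: Dict[str, str],
) -> List[dict]:
    """Group nodes by community ID and emit one document per community.

    `membership` maps a node ID to its community ID (assumed to come from a
    Leiden run performed elsewhere); `labels` maps node IDs to display names.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for node_id, community_id in membership.items():
        # Fall back to the raw node ID when no display name is known.
        groups[community_id].append(labels.get(node_id, node_id))

    documents = []
    for community_id in sorted(groups):
        members = sorted(groups[community_id])
        documents.append({
            "community_id": community_id,
            "size": len(members),
            # Placeholder text: a real pipeline would have an LLM summarise
            # the community before embedding it.
            "text": "Community of " + ", ".join(members),
        })
    return documents


docs = build_community_documents(
    {"a": 0, "b": 0, "c": 1},
    {"a": "Archive A", "b": "Museum B", "c": "Library C"},
)
```

Each resulting document would then be summarised, embedded, and stored, so that a global search mode can query summaries instead of individual entities.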

Files to Modify

| File | Changes |
|---|---|
| dspy_heritage_rag.py | Add ArgumentVerifier, DualLevelExtractor, global_search mode |
| template_sparql.py | Add temporal query templates |
| session_manager.py | Add working memory evolution |
| New: community_indexer.py | Leiden detection, summary generation |
| New: exploration_suggester.py | Pattern-based query suggestions |