# Triplestore Setup: Oxigraph for RDF Visualizer

**Date**: 2025-11-22
**Status**: Planning / Not Yet Implemented
**Priority**: Phase 3 - Task 6 & 7 (Query Builder + SPARQL Execution)

---

## Overview

The GLAM project will use **Oxigraph** as the RDF triplestore for the frontend visualization application. Oxigraph was explicitly chosen in the project planning documents (September 2025) as part of the knowledge graph infrastructure.

---

## Why Oxigraph?

### Selected Benefits

From the project documentation (ontology conversation `2025-09-09`):

> **Phase 4 - Knowledge Graph Infrastructure (120 hours):**
> - TypeDB hypergraph database
> - Oxigraph RDF triple store

### Advantages for This Project

1. **Lightweight & Embeddable**
   - Can run in-process with the frontend (via WASM) or as a separate server
   - Minimal setup compared to Virtuoso, Blazegraph, or GraphDB
   - Well suited to prototype/demonstration use cases

2. **SPARQL 1.1 Compliant**
   - Full SPARQL query support (SELECT, CONSTRUCT, ASK, DESCRIBE)
   - Standards-compliant implementation
   - Supports the common RDF serialization formats (Turtle, N-Triples, RDF/XML, JSON-LD, etc.; exact coverage depends on the Oxigraph version)

3. **Active Development**
   - Modern Rust implementation (high performance, memory safety)
   - GitHub: https://github.com/oxigraph/oxigraph
   - Actively maintained (2018-present)

4. **Multiple Deployment Options**
   - **Server mode**: HTTP API for remote queries
   - **Library mode**: Embedded in Python/Rust applications
   - **WASM mode**: In-browser triplestore (experimental)

5.
   **Cultural Heritage Sector Adoption**
   - Used in European cultural heritage projects
   - Recommended for prototype/exploratory projects (530-hour estimate research)
   - Proven for organizational data modeling

---

## Current Status

### Backend (Python)

The Python backend includes RDF processing dependencies:

```toml
# pyproject.toml - RDF and semantic web dependencies
rdflib = "^7.0.0"          # RDF parsing/serialization
SPARQLWrapper = "^2.0.0"   # SPARQL query execution
```

**Status**: ✅ Dependencies installed
**Usage**: Backend can parse RDF files and generate SPARQL queries

### Frontend (TypeScript/React)

**Status**: ❌ Not yet integrated
**Todo**: Add Oxigraph client or SPARQL HTTP client

---

## Architecture Options

### Option 1: Oxigraph Server (Recommended)

**Setup**:

```bash
# Install the Oxigraph server CLI
# (older 0.3.x releases used the `oxigraph_server` crate and binary instead)
cargo install oxigraph-cli

# Or use Docker
docker pull oxigraph/oxigraph

# Start server with persistent storage
oxigraph serve --location ./data/oxigraph --bind 127.0.0.1:7878
```

**Frontend Integration**:

```typescript
// src/lib/sparql/oxigraph-client.ts
export async function querySparql(query: string): Promise<unknown> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });
  return response.json();
}
```

**Pros**:
- ✅ Separate process (doesn't block frontend)
- ✅ Standard HTTP API (easy to integrate)
- ✅ Supports large datasets
- ✅ Can be shared across multiple clients

**Cons**:
- ⚠️ Requires running separate server process
- ⚠️ Adds deployment complexity

---

### Option 2: Oxigraph WASM (Experimental)

**Setup**:

```bash
# Install Oxigraph WASM package
npm install oxigraph
```

**Frontend Integration**:

```typescript
// src/lib/sparql/oxigraph-wasm.ts
import { Store } from 'oxigraph';

export class OxigraphStore {
  private store: Store;

  constructor() {
    this.store = new Store();
  }

  async loadRDF(rdfData: string, format: string) {
    // The argument shape of `load` differs between oxigraph npm releases;
    // check the typings of the installed version before relying on this call.
    await this.store.load(rdfData, format, null, null);
  }

  async query(sparql: string) {
    return this.store.query(sparql);
  }
}
```

**Pros**:
- ✅ No server required (runs in browser)
- ✅ Zero configuration
- ✅ Well suited to demos and prototypes
- ✅ Offline-capable

**Cons**:
- ⚠️ Experimental (WASM support still evolving)
- ⚠️ Limited to browser memory (dataset size constraints)
- ⚠️ May have performance limitations

---

### Option 3: Backend Proxy (Hybrid)

**Setup**:
- Backend runs Oxigraph server
- Frontend queries backend API
- Backend proxies to Oxigraph

**Frontend Integration**:

```typescript
// src/lib/sparql/backend-proxy.ts
export async function querySparql(query: string): Promise<unknown> {
  const response = await fetch('/api/sparql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  return response.json();
}
```

**Pros**:
- ✅ Backend controls triplestore lifecycle
- ✅ Can add authentication/authorization
- ✅ Can preprocess/validate queries
- ✅ Hides triplestore implementation details

**Cons**:
- ⚠️ Requires backend development
- ⚠️ More complex architecture

---

## Recommendation

**For Phase 3 (Task 6 & 7)**: Use **Option 1 (Oxigraph Server)**

### Rationale

1. **Proven Approach**: Standard HTTP API matches project architecture research
2. **Scalable**: Can handle Denmark dataset (43,429 triples) and beyond
3. **Simple Development**: Frontend can focus on SPARQL query building, not triplestore management
4. **Future-Proof**: Easy to swap for other triplestores (Virtuoso, Blazegraph) later if needed
5. **Docker-Ready**: Can be containerized for production deployment

---

## Implementation Plan

### Phase 3 - Task 6: Query Builder

**Goal**: Visual SPARQL query builder UI

**Deliverables**:
1. Query builder component with visual interface
2. Subject-Predicate-Object pattern builder
3. Filter conditions UI
4. SPARQL syntax preview (live syntax highlighting)
5. Query validation (syntax checking)
6.
   Query templates library (pre-built queries)

**Oxigraph Integration**:
- NOT REQUIRED yet
- Query builder generates SPARQL strings only
- Can test queries against static RDF files using `rdflib` (Python backend)

**Estimated Time**: 4-5 hours

---

### Phase 3 - Task 7: SPARQL Query Execution

**Goal**: Execute SPARQL queries against loaded RDF data

**Deliverables**:
1. Oxigraph server setup and configuration
2. SPARQL HTTP client (`src/lib/sparql/oxigraph-client.ts`)
3. Query execution hook (`src/hooks/useSparqlQuery.ts`)
4. Results viewer component (table, JSON, graph views)
5. Export results (CSV, JSON, RDF)
6. Query performance metrics (execution time, result count)

**Oxigraph Integration Steps**:

#### Step 1: Install Oxigraph Server

```bash
# Via Cargo (older 0.3.x releases used the `oxigraph_server` crate and binary)
cargo install oxigraph-cli

# OR via Docker
docker pull oxigraph/oxigraph
```

#### Step 2: Start Oxigraph with Test Data

```bash
# Start the server with persistent storage
oxigraph serve \
  --location ./data/oxigraph \
  --bind 127.0.0.1:7878

# In another terminal, load the Denmark dataset (43,429 triples).
# `?default` targets the default graph (SPARQL 1.1 Graph Store Protocol),
# so queries without a GRAPH clause can see the data.
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  'http://localhost:7878/store?default'
```

#### Step 3: Create SPARQL Client Module

```typescript
// src/lib/sparql/oxigraph-client.ts

// Shape of a SPARQL 1.1 JSON SELECT result
export interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

export async function executeSparql(query: string): Promise<SparqlResult> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });

  if (!response.ok) {
    throw new Error(`SPARQL query failed: ${response.statusText}`);
  }

  return response.json();
}
```

#### Step 4: Create Query Execution Hook

```typescript
// src/hooks/useSparqlQuery.ts
import { useState, useCallback } from 'react';
import { executeSparql, type SparqlResult } from '../lib/sparql/oxigraph-client';

export function useSparqlQuery() {
  const [results, setResults] = useState<SparqlResult | null>(null);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [executionTime, setExecutionTime] = useState(0);

  const executeQuery = useCallback(async (query: string) => {
    setIsLoading(true);
    setError(null);
    const startTime = performance.now();

    try {
      const result = await executeSparql(query);
      setResults(result);
      // Round-trip time as seen by the client, in milliseconds
      setExecutionTime(performance.now() - startTime);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Query execution failed');
      setResults(null);
    } finally {
      setIsLoading(false);
    }
  }, []);

  return { results, isLoading, error, executionTime, executeQuery };
}
```

#### Step 5: Create Results Viewer Component

```typescript
// src/components/query/ResultsViewer.tsx
import type { SparqlResult } from '../../lib/sparql/oxigraph-client';

interface ResultsViewerProps {
  results: SparqlResult;
  executionTime: number;
}

export function ResultsViewer({ results, executionTime }: ResultsViewerProps) {
  const { head, results: data } = results;
  const bindings = data.bindings;

  // The markup below is a plain-table reconstruction; element and class
  // choices are placeholders to be adapted to the project's UI components.
  return (
    <div>
      <div>
        <span>{bindings.length} results</span>
        <span>Execution time: {executionTime.toFixed(2)}ms</span>
      </div>
      <table>
        <thead>
          <tr>
            {head.vars.map(variable => (
              <th key={variable}>{variable}</th>
            ))}
          </tr>
        </thead>
        <tbody>
          {bindings.map((binding, index) => (
            <tr key={index}>
              {head.vars.map(variable => (
                <td key={variable}>{binding[variable]?.value || '-'}</td>
              ))}
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}
```

**Estimated Time**: 6-8 hours

---

## Testing Strategy

### Unit Tests

```typescript
// tests/unit/sparql-client.test.ts
import { describe, it, expect, vi } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('executeSparql', () => {
  it('should execute SELECT query and return results', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({
        head: { vars: ['subject'] },
        results: {
          bindings: [
            { subject: { type: 'uri', value: 'http://example.org/inst1' } },
          ],
        },
      }),
    });

    const query = 'SELECT ?subject WHERE { ?subject a schema:Museum }';
    const results = await executeSparql(query);

    expect(results.results.bindings).toHaveLength(1);
  });

  it('should throw error on failed query', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: false,
      statusText: 'Bad Request',
    });

    const query = 'INVALID SPARQL';
    await expect(executeSparql(query)).rejects.toThrow('SPARQL query failed');
  });
});
```

### Integration Tests

```typescript
// tests/integration/oxigraph.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('Oxigraph Integration', () => {
  beforeAll(async () => {
    // Assumes Oxigraph server is running on localhost:7878
    // with test data loaded
  });

  it('should query loaded RDF data', async () => {
    const query = `
      PREFIX schema: <http://schema.org/>
      SELECT ?museum WHERE {
        ?museum a schema:Museum .
      }
      LIMIT 10
    `;

    const results = await executeSparql(query);
    expect(results.results.bindings.length).toBeGreaterThan(0);
  });
});
```

---

## Sample SPARQL Queries

### Query 1: Find All Museums

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?museum ?name WHERE {
  ?museum a schema:Museum .
  ?museum schema:name ?name .
}
ORDER BY ?name
LIMIT 100
```

### Query 2: Count Institutions by Type

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?type (COUNT(?institution) AS ?count) WHERE {
  ?institution a ?type .
  FILTER(?type IN (schema:Museum, schema:Library, schema:ArchiveOrganization))
}
GROUP BY ?type
ORDER BY DESC(?count)
```

### Query 3: Find Institutions in a City

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?address WHERE {
  ?institution schema:name ?name .
  ?institution schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}
ORDER BY ?name
```

### Query 4: Find Institutions with Wikidata Links

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?wikidataURI WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
}
LIMIT 100
```

---

## Configuration

### Oxigraph Server Config

```bash
# Start Oxigraph server with configuration:
#   --location  data directory
#   --bind      bind address
#   --cors      enable CORS for frontend development
# `serve` allows data loading; use `serve-read-only` for a read-only endpoint.
# (Comments cannot follow a trailing backslash, so they are listed here.)
oxigraph serve \
  --location ./data/oxigraph \
  --bind 0.0.0.0:7878 \
  --cors
```

### Environment Variables

```bash
# .env.local (frontend)
VITE_SPARQL_ENDPOINT=http://localhost:7878
VITE_SPARQL_QUERY_TIMEOUT=30000  # 30 seconds
```

### TypeScript Config

```typescript
// src/config/sparql.ts
export const SPARQL_CONFIG = {
  endpoint: import.meta.env.VITE_SPARQL_ENDPOINT || 'http://localhost:7878',
  timeout: Number(import.meta.env.VITE_SPARQL_QUERY_TIMEOUT) || 30000,
  corsEnabled: true,
};
```

---

## Performance Considerations

### Dataset Size

| Dataset | Triples | Expected Query Time | Notes |
|---------|---------|---------------------|-------|
| Denmark (Current) | 43,429 | <100ms | Fast for prototyping |
| Netherlands (Estimated) | ~500,000 | <500ms | Should be manageable |
| Global (Future) | 5-10M | 1-5s | May need optimization |

### Optimization Strategies

1. **Indexing**: Oxigraph automatically indexes triples (no manual configuration)
2. **Query Limits**: Always use `LIMIT` clauses during development
3. **Caching**: Cache frequent query results in frontend (localStorage)
4.
   **Pagination**: Implement OFFSET/LIMIT for large result sets
5. **Prefixes**: Use PREFIX declarations to keep queries compact and readable

---

## Deployment

### Development

```bash
# Start Oxigraph locally
oxigraph serve --location ./data/oxigraph --bind 127.0.0.1:7878

# Load test data into the default graph
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  'http://localhost:7878/store?default'
```

### Docker (Production)

```dockerfile
# Dockerfile.oxigraph
FROM oxigraph/oxigraph:latest

# Copy RDF data to container
COPY data/rdf /data/rdf

# Expose SPARQL endpoint
EXPOSE 7878

# Start server with persistent storage
# (the image entrypoint is the Oxigraph CLI, so CMD starts with the subcommand)
CMD ["serve", "--location", "/data/oxigraph", "--bind", "0.0.0.0:7878"]
```

```yaml
# docker-compose.yml
version: '3.8'

services:
  oxigraph:
    image: oxigraph/oxigraph:latest
    ports:
      - "7878:7878"
    volumes:
      - ./data/oxigraph:/data/oxigraph
      - ./data/rdf:/data/rdf:ro
    command: serve --location /data/oxigraph --bind 0.0.0.0:7878 --cors

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    environment:
      # Queries are issued from the browser, which cannot resolve the
      # `oxigraph` service name, so point at the host-published port.
      - VITE_SPARQL_ENDPOINT=http://localhost:7878
    depends_on:
      - oxigraph
```

---

## Alternatives Considered

### Virtuoso
- **Pros**: Very mature, enterprise-grade, excellent performance
- **Cons**: Complex setup, heavyweight (2GB+ memory), overkill for prototype

### Blazegraph
- **Pros**: Full SPARQL 1.1, good documentation
- **Cons**: Java dependency, discontinued (last release 2019)

### Apache Jena Fuseki
- **Pros**: Mature, full-featured, active development
- **Cons**: Java dependency, more complex than Oxigraph

### GraphDB
- **Pros**: Commercial support, advanced reasoning, SHACL validation
- **Cons**: Proprietary (free edition has limits), complex setup

**Decision**: Oxigraph wins for simplicity, modern stack (Rust), and cultural heritage sector adoption.

---

## Next Steps

### Immediate (Task 6 - Query Builder)

1. ✅ Document Oxigraph decision (this file)
2. ⏳ Implement visual query builder UI (4-5 hours)
3. ⏳ Create SPARQL syntax validator
4.
   ⏳ Build query templates library

### Next Session (Task 7 - SPARQL Execution)

1. ⏳ Install Oxigraph server (local + Docker)
2. ⏳ Load Denmark dataset (43,429 triples)
3. ⏳ Create SPARQL client module
4. ⏳ Implement query execution hook
5. ⏳ Build results viewer component
6. ⏳ Add export functionality (CSV, JSON, RDF)
7. ⏳ Write integration tests

---

## References

### Oxigraph Documentation
- **GitHub**: https://github.com/oxigraph/oxigraph
- **Architecture**: https://github.com/oxigraph/oxigraph/wiki/Architecture
- **HTTP API**: https://github.com/oxigraph/oxigraph/wiki/HTTP-API

### Project Documentation
- **Planning**: `ontology/2025-09-09T08-31-07-*-Linked_Data_Cultural_Heritage_Project.json`
- **RDF Datasets**: `data/rdf/README.md`
- **Schema**: `schemas/20251121/rdf/` (8 RDF formats)

### SPARQL Resources
- **W3C SPARQL 1.1**: https://www.w3.org/TR/sparql11-query/
- **SPARQL by Example**: https://www.w3.org/2009/Talks/0615-qbe/
- **RDF/SPARQL Tutorial**: https://www.linkeddatatools.com/querying-semantic-data

---

**Status**: Planning Complete ✅
**Next**: Implement Query Builder (Phase 3 - Task 6)
**Estimated Timeline**: 10-13 hours total (Query Builder 4-5h + SPARQL Execution 6-8h)

**Last Updated**: 2025-11-22
**Author**: OpenCode AI Agent
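
---

## Appendix: Pagination and CSV Export Sketch

The plan above lists OFFSET/LIMIT pagination as an optimization strategy and CSV export as a Task 7 deliverable, but does not show what either might look like. The following is a minimal sketch under stated assumptions, not the project's implementation: the helper names `paginateQuery` and `resultsToCsv` are hypothetical, the result interface mirrors the standard SPARQL JSON SELECT shape, and the base query is assumed to carry no `LIMIT` or `OFFSET` of its own. For stable pages the base query should also include an `ORDER BY`.

```typescript
// Hypothetical helpers for illustration; names are not part of the project.

interface SparqlTerm {
  type: string;
  value: string;
}

interface SparqlSelectResult {
  head: { vars: string[] };
  results: { bindings: Array<Record<string, SparqlTerm>> };
}

// Append LIMIT/OFFSET for a zero-based page number.
// Assumes the base query has no LIMIT or OFFSET clause of its own.
function paginateQuery(baseQuery: string, page: number, pageSize: number): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  return `${baseQuery.trim()}\nLIMIT ${pageSize}\nOFFSET ${page * pageSize}`;
}

// Quote a CSV field only when it contains a quote, comma, or line break,
// doubling any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Convert SELECT results to CSV: one header row of variable names, then
// one row per binding. Unbound variables become empty fields.
function resultsToCsv(result: SparqlSelectResult): string {
  const header = result.head.vars.map(csvField).join(',');
  const rows = result.results.bindings.map(binding =>
    result.head.vars.map(v => csvField(binding[v]?.value ?? '')).join(','),
  );
  return [header, ...rows].join('\r\n');
}
```

A results viewer could call `paginateQuery(query, page, 100)` before handing the string to the SPARQL client, and pass the returned result object to `resultsToCsv` when the user requests a CSV download. Keeping both helpers as pure string functions means they can be unit-tested without a running Oxigraph server.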