Triplestore Setup: Oxigraph for RDF Visualizer
Date: 2025-11-22
Status: Planning / Not Yet Implemented
Priority: Phase 3 - Task 6 & 7 (Query Builder + SPARQL Execution)
Overview
The GLAM project will use Oxigraph as the RDF triplestore for the frontend visualization application. Oxigraph was explicitly chosen in the project planning documents (September 2025) as part of the knowledge graph infrastructure.
Why Oxigraph?
Selected Benefits
From the project documentation (ontology conversation 2025-09-09):
Phase 4 - Knowledge Graph Infrastructure (120 hours):
- TypeDB hypergraph database
- Oxigraph RDF triple store
Advantages for This Project
- Lightweight & Embeddable
  - Can run in-process with the frontend (via WASM) or as a separate server
  - Minimal setup compared to Virtuoso, Blazegraph, or GraphDB
  - Well suited to prototype and demonstration use cases
- SPARQL 1.1 Compliant
  - Full SPARQL query support (SELECT, CONSTRUCT, ASK, DESCRIBE)
  - Standards-compliant implementation
  - Supports the common RDF serializations (Turtle, N-Triples, RDF/XML, JSON-LD, etc.)
- Active Development
  - Modern Rust implementation (high performance, memory safety)
  - GitHub: https://github.com/oxigraph/oxigraph
  - Actively maintained (2018-present)
- Multiple Deployment Options
  - Server mode: HTTP API for remote queries
  - Library mode: Embedded in Python/Rust applications
  - WASM mode: In-browser triplestore (experimental)
- Cultural Heritage Sector Adoption
  - Used in European cultural heritage projects
  - Recommended for prototype/exploratory projects (530-hour estimate research)
  - Proven for organizational data modeling
Current Status
Backend (Python)
The Python backend includes RDF processing dependencies:
# pyproject.toml - RDF and semantic web dependencies
rdflib = "^7.0.0" # RDF parsing/serialization
SPARQLWrapper = "^2.0.0" # SPARQL query execution
Status: ✅ Dependencies installed
Usage: Backend can parse RDF files and generate SPARQL queries
Frontend (TypeScript/React)
Status: ❌ Not yet integrated
Todo: Add Oxigraph client or SPARQL HTTP client
Architecture Options
Option 1: Oxigraph Server (Recommended)
Setup:
# Install Oxigraph server
cargo install oxigraph_server
# Or use Docker
docker pull oxigraph/oxigraph
# Start server with persistent storage
oxigraph_server --location ./data/oxigraph --bind 127.0.0.1:7878
Frontend Integration:
// src/lib/sparql/oxigraph-client.ts
export async function querySparql(query: string): Promise<any> {
const response = await fetch('http://localhost:7878/query', {
method: 'POST',
headers: {
'Content-Type': 'application/sparql-query',
'Accept': 'application/sparql-results+json',
},
body: query,
});
return response.json();
}
Pros:
- ✅ Separate process (doesn't block frontend)
- ✅ Standard HTTP API (easy to integrate)
- ✅ Supports large datasets
- ✅ Can be shared across multiple clients
Cons:
- ⚠️ Requires running separate server process
- ⚠️ Adds deployment complexity
Option 2: Oxigraph WASM (Experimental)
Setup:
# Install Oxigraph WASM package
npm install oxigraph
Frontend Integration:
// src/lib/sparql/oxigraph-wasm.ts
import { Store } from 'oxigraph';
export class OxigraphStore {
private store: Store;
constructor() {
this.store = new Store();
}
async loadRDF(rdfData: string, format: string) {
await this.store.load(rdfData, format, null, null);
}
async query(sparql: string) {
return this.store.query(sparql);
}
}
Pros:
- ✅ No server required (runs in browser)
- ✅ Zero configuration
- ✅ Perfect for demos and prototypes
- ✅ Offline-capable
Cons:
- ⚠️ Experimental (WASM support still evolving)
- ⚠️ Limited to browser memory (dataset size constraints)
- ⚠️ May have performance limitations
Option 3: Backend Proxy (Hybrid)
Setup:
- Backend runs Oxigraph server
- Frontend queries backend API
- Backend proxies to Oxigraph
Frontend Integration:
// src/lib/sparql/backend-proxy.ts
export async function querySparql(query: string): Promise<any> {
const response = await fetch('/api/sparql', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ query }),
});
return response.json();
}
Pros:
- ✅ Backend controls triplestore lifecycle
- ✅ Can add authentication/authorization
- ✅ Can preprocess/validate queries
- ✅ Hides triplestore implementation details
Cons:
- ⚠️ Requires backend development
- ⚠️ More complex architecture
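One way the proxy could "preprocess/validate queries" is to accept only the read-only query forms (SELECT, CONSTRUCT, ASK, DESCRIBE) before forwarding anything to Oxigraph. The sketch below is illustrative only: `queryForm` and `isReadOnlyQuery` are hypothetical names, and a keyword check is a coarse heuristic rather than a substitute for a real SPARQL parser.

```typescript
// Hypothetical guard for the backend proxy; these names are not part of the project.
const READ_ONLY_FORMS = ['SELECT', 'CONSTRUCT', 'ASK', 'DESCRIBE'];

/** Skip leading comments and PREFIX/BASE declarations, then return the first keyword in upper case. */
function queryForm(query: string): string | null {
  // Matches one leading comment line, PREFIX declaration, or BASE declaration.
  const prologue = /^\s*(?:#[^\n]*(?:\n|$)|PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)/i;
  let rest = query;
  let match: RegExpMatchArray | null;
  while ((match = rest.match(prologue)) !== null) {
    rest = rest.slice(match[0].length);
  }
  const keyword = rest.match(/^\s*([A-Za-z]+)/);
  return keyword ? keyword[1].toUpperCase() : null;
}

/** True when the query starts with a read-only form; update operations such as INSERT or DELETE are rejected. */
function isReadOnlyQuery(query: string): boolean {
  const form = queryForm(query);
  return form !== null && READ_ONLY_FORMS.includes(form);
}
```

A proxy route would call `isReadOnlyQuery` on the incoming query and respond with an error status instead of forwarding it when the check fails.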
Recommendation
For Phase 3 (Task 6 & 7): Use Option 1 (Oxigraph Server)
Rationale
- Proven Approach: Standard HTTP API matches project architecture research
- Scalable: Can handle Denmark dataset (43,429 triples) and beyond
- Simple Development: Frontend can focus on SPARQL query building, not triplestore management
- Future-Proof: Easy to swap for other triplestores (Virtuoso, Blazegraph) later if needed
- Docker-Ready: Can be containerized for production deployment
Implementation Plan
Phase 3 - Task 6: Query Builder
Goal: Visual SPARQL query builder UI
Deliverables:
- Query builder component with visual interface
- Subject-Predicate-Object pattern builder
- Filter conditions UI
- SPARQL syntax preview (live syntax highlighting)
- Query validation (syntax checking)
- Query templates library (pre-built queries)
Oxigraph Integration:
- NOT REQUIRED yet
- Query builder generates SPARQL strings only
- Can test queries against static RDF files using rdflib (Python backend)
Estimated Time: 4-5 hours
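Since the query builder only has to produce SPARQL strings at this stage, its core can be a pure function over a small data model. The following is a minimal sketch under that assumption; `TriplePattern`, `SelectQuerySpec`, and `buildSelectQuery` are hypothetical names chosen for illustration, and the terms are assumed to be already-valid SPARQL (variables, prefixed names, or IRIs).

```typescript
// Hypothetical data model for the Subject-Predicate-Object pattern builder.
interface TriplePattern {
  subject: string; // variable (?x), prefixed name, or <IRI>
  predicate: string;
  object: string;
}

interface SelectQuerySpec {
  prefixes: Record<string, string>; // prefix label -> namespace IRI
  variables: string[]; // projected variables, e.g. ['?museum', '?name']
  patterns: TriplePattern[];
  filters?: string[]; // raw filter expressions, e.g. 'STRSTARTS(?name, "A")'
  limit?: number;
}

/** Assemble a SELECT query string from the spec; no validation of the terms is performed. */
function buildSelectQuery(spec: SelectQuerySpec): string {
  const prefixLines = Object.entries(spec.prefixes).map(
    ([prefix, iri]) => `PREFIX ${prefix}: <${iri}>`,
  );
  const patternLines = spec.patterns.map(
    (p) => `  ${p.subject} ${p.predicate} ${p.object} .`,
  );
  const filterLines = (spec.filters ?? []).map((f) => `  FILTER(${f})`);
  const lines = [
    ...prefixLines,
    `SELECT ${spec.variables.join(' ')} WHERE {`,
    ...patternLines,
    ...filterLines,
    '}',
  ];
  if (spec.limit !== undefined) {
    lines.push(`LIMIT ${spec.limit}`);
  }
  return lines.join('\n');
}
```

Keeping the builder free of UI and network concerns means the live SPARQL preview and the unit tests can share the same function.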
Phase 3 - Task 7: SPARQL Query Execution
Goal: Execute SPARQL queries against loaded RDF data
Deliverables:
- Oxigraph server setup and configuration
- SPARQL HTTP client (src/lib/sparql/oxigraph-client.ts)
- Query execution hook (src/hooks/useSparqlQuery.ts)
- Results viewer component (table, JSON, graph views)
- Export results (CSV, JSON, RDF)
- Query performance metrics (execution time, result count)
Oxigraph Integration Steps:
Step 1: Install Oxigraph Server
# Via Cargo
cargo install oxigraph_server
# OR via Docker
docker pull oxigraph/oxigraph
Step 2: Start Oxigraph with Test Data
# Start the server
oxigraph_server \
  --location ./data/oxigraph \
  --bind 127.0.0.1:7878
# In another terminal, load the Denmark dataset (43,429 triples).
# ?default targets the default graph, which the sample queries read from.
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  'http://localhost:7878/store?default'
Step 3: Create SPARQL Client Module
// src/lib/sparql/oxigraph-client.ts
export interface SparqlResult {
head: { vars: string[] };
results: {
bindings: Array<Record<string, { type: string; value: string }>>;
};
}
export async function executeSparql(query: string): Promise<SparqlResult> {
const response = await fetch('http://localhost:7878/query', {
method: 'POST',
headers: {
'Content-Type': 'application/sparql-query',
'Accept': 'application/sparql-results+json',
},
body: query,
});
if (!response.ok) {
throw new Error(`SPARQL query failed: ${response.statusText}`);
}
return response.json();
}
Step 4: Create Query Execution Hook
// src/hooks/useSparqlQuery.ts
import { useState, useCallback } from 'react';
import { executeSparql, type SparqlResult } from '../lib/sparql/oxigraph-client';
export function useSparqlQuery() {
const [results, setResults] = useState<SparqlResult | null>(null);
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const [executionTime, setExecutionTime] = useState<number>(0);
const executeQuery = useCallback(async (query: string) => {
setIsLoading(true);
setError(null);
const startTime = performance.now();
try {
const result = await executeSparql(query);
setResults(result);
setExecutionTime(performance.now() - startTime);
} catch (err) {
setError(err instanceof Error ? err.message : 'Query execution failed');
setResults(null);
} finally {
setIsLoading(false);
}
}, []);
return { results, isLoading, error, executionTime, executeQuery };
}
Step 5: Create Results Viewer Component
// src/components/query/ResultsViewer.tsx
import type { SparqlResult } from '../../lib/sparql/oxigraph-client';
interface ResultsViewerProps {
results: SparqlResult;
executionTime: number;
}
export function ResultsViewer({ results, executionTime }: ResultsViewerProps) {
const { head, results: data } = results;
const bindings = data.bindings;
return (
<div className="results-viewer">
<div className="results-header">
<span>{bindings.length} results</span>
<span>Execution time: {executionTime.toFixed(2)}ms</span>
</div>
<table className="results-table">
<thead>
<tr>
{head.vars.map(variable => (
<th key={variable}>{variable}</th>
))}
</tr>
</thead>
<tbody>
{bindings.map((binding, index) => (
<tr key={index}>
{head.vars.map(variable => (
<td key={variable}>
{binding[variable]?.value ?? '-'}
</td>
))}
</tr>
))}
</tbody>
</table>
</div>
);
}
Estimated Time: 6-8 hours
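The export deliverable (CSV, JSON, RDF) is not covered by the steps above. For the CSV case, a minimal sketch could convert the SPARQL JSON results into text with RFC 4180 style quoting. The names `csvField` and `sparqlResultToCsv` are hypothetical, and the result type is repeated here only to keep the example self-contained.

```typescript
// Same shape as the SparqlResult interface in the client module, repeated so this sketch stands alone.
interface SparqlJsonResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

/** Quote a field when it contains a comma, a double quote, or a line break. */
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

/** One header row with the variable names, then one row per binding; unbound variables become empty fields. */
function sparqlResultToCsv(result: SparqlJsonResult): string {
  const header = result.head.vars.map(csvField).join(',');
  const rows = result.results.bindings.map((binding) =>
    result.head.vars.map((v) => csvField(binding[v]?.value ?? '')).join(','),
  );
  return [header, ...rows].join('\r\n');
}
```

The returned string could be wrapped in a `Blob` and offered as a download from the results viewer; JSON export needs no conversion because the results are already JSON.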
Testing Strategy
Unit Tests
// tests/unit/sparql-client.test.ts
import { describe, it, expect, vi } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';
describe('executeSparql', () => {
it('should execute SELECT query and return results', async () => {
global.fetch = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
head: { vars: ['subject'] },
results: {
bindings: [
{ subject: { type: 'uri', value: 'http://example.org/inst1' } },
],
},
}),
});
const query = 'PREFIX schema: <http://schema.org/> SELECT ?subject WHERE { ?subject a schema:Museum }';
const results = await executeSparql(query);
expect(results.results.bindings).toHaveLength(1);
});
it('should throw error on failed query', async () => {
global.fetch = vi.fn().mockResolvedValue({
ok: false,
statusText: 'Bad Request',
});
const query = 'INVALID SPARQL';
await expect(executeSparql(query)).rejects.toThrow('SPARQL query failed');
});
});
Integration Tests
// tests/integration/oxigraph.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';
describe('Oxigraph Integration', () => {
beforeAll(async () => {
// Assumes Oxigraph server is running on localhost:7878
// with test data loaded
});
it('should query loaded RDF data', async () => {
const query = `
PREFIX schema: <http://schema.org/>
SELECT ?museum WHERE {
?museum a schema:Museum .
} LIMIT 10
`;
const results = await executeSparql(query);
expect(results.results.bindings.length).toBeGreaterThan(0);
});
});
Sample SPARQL Queries
Query 1: Find All Museums
PREFIX schema: <http://schema.org/>
SELECT ?museum ?name WHERE {
?museum a schema:Museum .
?museum schema:name ?name .
}
ORDER BY ?name
LIMIT 100
Query 2: Count Institutions by Type
PREFIX schema: <http://schema.org/>
SELECT ?type (COUNT(?institution) AS ?count) WHERE {
?institution a ?type .
FILTER(?type IN (schema:Museum, schema:Library, schema:ArchiveOrganization))
}
GROUP BY ?type
ORDER BY DESC(?count)
Query 3: Find Institutions in a City
PREFIX schema: <http://schema.org/>
SELECT ?institution ?name ?address WHERE {
?institution schema:name ?name .
?institution schema:address ?addrNode .
?addrNode schema:addressLocality "København K" .
?addrNode schema:streetAddress ?address .
}
ORDER BY ?name
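Query 3 hard-codes the city name. If the value comes from user input in the query builder, it should be escaped before being placed inside a SPARQL string literal, otherwise a stray quote can break or alter the query. A minimal sketch, with hypothetical helper names `sparqlStringLiteral` and `institutionsInCityQuery`:

```typescript
/** Escape a value and wrap it as a double-quoted SPARQL string literal. */
function sparqlStringLiteral(value: string): string {
  const escaped = value
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t');
  return `"${escaped}"`;
}

/** Query 3 with the city supplied as a parameter instead of a hard-coded literal. */
function institutionsInCityQuery(city: string): string {
  return [
    'PREFIX schema: <http://schema.org/>',
    'SELECT ?institution ?name ?address WHERE {',
    '  ?institution schema:name ?name .',
    '  ?institution schema:address ?addrNode .',
    `  ?addrNode schema:addressLocality ${sparqlStringLiteral(city)} .`,
    '  ?addrNode schema:streetAddress ?address .',
    '}',
    'ORDER BY ?name',
  ].join('\n');
}
```

Calling `institutionsInCityQuery('København K')` reproduces Query 3, while input containing quotes stays inside the literal.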
Query 4: Find Institutions with Wikidata Links
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>
SELECT ?institution ?name ?wikidataURI WHERE {
?institution schema:name ?name .
?institution owl:sameAs ?wikidataURI .
FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
}
LIMIT 100
Configuration
Oxigraph Server Config
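# Start Oxigraph server with configuration:
#   --location (data directory), --bind (bind address),
#   --cors (CORS for frontend development), --readonly false (allow data loading).
# Comments cannot follow a trailing backslash, so they are collected here.
# Option names vary between Oxigraph releases; confirm with --help.
oxigraph_server \
  --location ./data/oxigraph \
  --bind 0.0.0.0:7878 \
  --cors "*" \
  --readonly false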
Environment Variables
# .env.local (frontend)
VITE_SPARQL_ENDPOINT=http://localhost:7878
VITE_SPARQL_QUERY_TIMEOUT=30000 # 30 seconds
TypeScript Config
// src/config/sparql.ts
export const SPARQL_CONFIG = {
endpoint: import.meta.env.VITE_SPARQL_ENDPOINT || 'http://localhost:7878',
timeout: Number(import.meta.env.VITE_SPARQL_QUERY_TIMEOUT) || 30000,
corsEnabled: true,
};
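The config defines a timeout, but the client shown under Step 3 never applies it. One possible way to wire the two together is to build the request from the config values and attach an abort signal. This is a sketch under assumptions: `buildSparqlRequest` and `SparqlRequest` are hypothetical names, and `AbortSignal.timeout` requires a reasonably recent browser or Node.js runtime.

```typescript
// Hypothetical request description; the field names mirror the arguments of fetch(url, init).
interface SparqlRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
    signal: AbortSignal;
  };
}

/** Build the fetch arguments for a SPARQL query against `<endpoint>/query` with a timeout. */
function buildSparqlRequest(endpoint: string, query: string, timeoutMs: number): SparqlRequest {
  return {
    // Tolerate a trailing slash in the configured endpoint.
    url: `${endpoint.replace(/\/+$/, '')}/query`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/sparql-query',
        Accept: 'application/sparql-results+json',
      },
      body: query,
      // The signal aborts the request once the timeout elapses.
      signal: AbortSignal.timeout(timeoutMs),
    },
  };
}
```

A client could then call `fetch(request.url, request.init)` with `SPARQL_CONFIG.endpoint` and `SPARQL_CONFIG.timeout` as inputs; a timed-out request rejects, which the query hook's existing `catch` branch would surface as an error.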
Performance Considerations
Dataset Size
| Dataset | Triples | Expected Query Time | Notes |
|---|---|---|---|
| Denmark (Current) | 43,429 | <100ms | Fast for prototyping |
| Netherlands (Estimated) | ~500,000 | <500ms | Should be manageable |
| Global (Future) | 5-10M | 1-5s | May need optimization |
Optimization Strategies
- Indexing: Oxigraph automatically indexes triples (no manual configuration)
- Query Limits: Always use LIMIT clauses during development
- Caching: Cache frequent query results in frontend (localStorage)
- Pagination: Implement OFFSET/LIMIT for large result sets
- Prefixes: Use PREFIX declarations to reduce query size
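The pagination strategy can be sketched as a small helper that appends LIMIT and OFFSET to a query. `paginateQuery` is a hypothetical name, and the sketch assumes the incoming query has no LIMIT or OFFSET of its own. Note that pages are only stable when the query also has an ORDER BY clause.

```typescript
/**
 * Append LIMIT/OFFSET for a 0-based page number.
 * Assumes the query does not already end with LIMIT or OFFSET.
 */
function paginateQuery(query: string, page: number, pageSize: number): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  return `${query.trimEnd()}\nLIMIT ${pageSize}\nOFFSET ${page * pageSize}`;
}
```

For example, page 2 with a page size of 50 yields `LIMIT 50` and `OFFSET 100`, so the results viewer can fetch one page at a time instead of the full result set.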
Deployment
Development
# Start Oxigraph locally
oxigraph_server --location ./data/oxigraph --bind 127.0.0.1:7878
# Load test data into the default graph (the sample queries read from it)
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  'http://localhost:7878/store?default'
Docker (Production)
# Dockerfile.oxigraph
FROM oxigraph/oxigraph:latest
# Copy RDF data to container
COPY data/rdf /data/rdf
# Expose SPARQL endpoint
EXPOSE 7878
# Start server with persistent storage
CMD ["--location", "/data/oxigraph", "--bind", "0.0.0.0:7878"]
# docker-compose.yml
version: '3.8'
services:
  oxigraph:
    image: oxigraph/oxigraph:latest
    ports:
      - "7878:7878"
    volumes:
      - ./data/oxigraph:/data/oxigraph
      - ./data/rdf:/data/rdf:ro
    command: --location /data/oxigraph --bind 0.0.0.0:7878 --cors "*"
  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    environment:
      # Queries are sent by the browser, which cannot resolve the service name "oxigraph",
      # so point at the port published on the host.
      - VITE_SPARQL_ENDPOINT=http://localhost:7878
    depends_on:
      - oxigraph
Alternatives Considered
Virtuoso
- Pros: Very mature, enterprise-grade, excellent performance
- Cons: Complex setup, heavyweight (2GB+ memory), overkill for prototype
Blazegraph
- Pros: Full SPARQL 1.1, good documentation
- Cons: Java dependency, discontinued (last release 2019)
Apache Jena Fuseki
- Pros: Mature, full-featured, active development
- Cons: Java dependency, more complex than Oxigraph
GraphDB
- Pros: Commercial support, advanced reasoning, SHACL validation
- Cons: Proprietary (free edition has limits), complex setup
Decision: Oxigraph wins for simplicity, modern stack (Rust), and cultural heritage sector adoption.
Next Steps
Immediate (Task 6 - Query Builder)
- ✅ Document Oxigraph decision (this file)
- ⏳ Implement visual query builder UI (4-5 hours)
- ⏳ Create SPARQL syntax validator
- ⏳ Build query templates library
Next Session (Task 7 - SPARQL Execution)
- ⏳ Install Oxigraph server (local + Docker)
- ⏳ Load Denmark dataset (43,429 triples)
- ⏳ Create SPARQL client module
- ⏳ Implement query execution hook
- ⏳ Build results viewer component
- ⏳ Add export functionality (CSV, JSON, RDF)
- ⏳ Write integration tests
References
Oxigraph Documentation
- GitHub: https://github.com/oxigraph/oxigraph
- Architecture: https://github.com/oxigraph/oxigraph/wiki/Architecture
- HTTP API: https://github.com/oxigraph/oxigraph/wiki/HTTP-API
Project Documentation
- Planning: ontology/2025-09-09T08-31-07-*-Linked_Data_Cultural_Heritage_Project.json
- RDF Datasets: data/rdf/README.md
- Schema: schemas/20251121/rdf/ (8 RDF formats)
SPARQL Resources
- W3C SPARQL 1.1: https://www.w3.org/TR/sparql11-query/
- SPARQL by Example: https://www.w3.org/2009/Talks/0615-qbe/
- RDF/SPARQL Tutorial: https://www.linkeddatatools.com/querying-semantic-data
Status: Planning Complete ✅
Next: Implement Query Builder (Phase 3 - Task 6)
Estimated Timeline: 10-13 hours total (Query Builder 4-5h + SPARQL Execution 6-8h)
Last Updated: 2025-11-22
Author: OpenCode AI Agent