
Triplestore Setup: Oxigraph for RDF Visualizer

Date: 2025-11-22
Status: Planning / Not Yet Implemented
Priority: Phase 3 - Task 6 & 7 (Query Builder + SPARQL Execution)


Overview

The GLAM project will use Oxigraph as the RDF triplestore for the frontend visualization application. Oxigraph was explicitly chosen in the project planning documents (September 2025) as part of the knowledge graph infrastructure.


Why Oxigraph?

Selected Benefits

From the project documentation (ontology conversation 2025-09-09):

Phase 4 - Knowledge Graph Infrastructure (120 hours):

  • TypeDB hypergraph database
  • Oxigraph RDF triple store

Advantages for This Project

  1. Lightweight & Embeddable

    • Can run in-process with the frontend (via WASM) OR as a separate server
    • Minimal setup compared to Virtuoso, Blazegraph, or GraphDB
    • Perfect for prototype/demonstration use cases
  2. SPARQL 1.1 Compliant

    • Full SPARQL query support (SELECT, CONSTRUCT, ASK, DESCRIBE)
    • Standards-compliant implementation
    • Compatible with all RDF formats (Turtle, N-Triples, RDF/XML, JSON-LD, etc.)
  3. Active Development

    • Written in Rust with an actively maintained codebase and regular releases
  4. Multiple Deployment Options

    • Server mode: HTTP API for remote queries
    • Library mode: Embedded in Python/Rust applications
    • WASM mode: In-browser triplestore (experimental)
  5. Cultural Heritage Sector Adoption

    • Used in European cultural heritage projects
    • Recommended for prototype/exploratory projects (per the 530-hour estimate research)
    • Proven for organizational data modeling

Current Status

Backend (Python)

The Python backend includes RDF processing dependencies:

# pyproject.toml - RDF and semantic web dependencies
rdflib = "^7.0.0"           # RDF parsing/serialization
SPARQLWrapper = "^2.0.0"    # SPARQL query execution

Status: Dependencies installed
Usage: Backend can parse RDF files and generate SPARQL queries

Frontend (TypeScript/React)

Status: Not yet integrated
Todo: Add Oxigraph client or SPARQL HTTP client


Architecture Options

Option 1: Oxigraph Server (Recommended)

Setup:

# Install Oxigraph server
cargo install oxigraph_server

# Or use Docker
docker pull oxigraph/oxigraph

# Start server with persistent storage
oxigraph_server --location ./data/oxigraph --bind 127.0.0.1:7878

Frontend Integration:

// src/lib/sparql/oxigraph-client.ts
export async function querySparql(query: string): Promise<any> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });
  
  return response.json();
}

Pros:

  • Separate process (doesn't block frontend)
  • Standard HTTP API (easy to integrate)
  • Supports large datasets
  • Can be shared across multiple clients

Cons:

  • ⚠️ Requires running separate server process
  • ⚠️ Adds deployment complexity

Option 2: Oxigraph WASM (Experimental)

Setup:

# Install Oxigraph WASM package
npm install oxigraph

Frontend Integration:

// src/lib/sparql/oxigraph-wasm.ts
import { Store } from 'oxigraph';

export class OxigraphStore {
  private store: Store;
  
  constructor() {
    this.store = new Store();
  }
  
  async loadRDF(rdfData: string, format: string) {
    // Note: load() arguments vary across oxigraph package versions;
    // older releases take (data, mimeType, baseIRI, toGraphName)
    await this.store.load(rdfData, format, null, null);
  }
  
  async query(sparql: string) {
    return this.store.query(sparql);
  }
}

Pros:

  • No server required (runs in browser)
  • Zero configuration
  • Perfect for demos and prototypes
  • Offline-capable

Cons:

  • ⚠️ Experimental (WASM support still evolving)
  • ⚠️ Limited to browser memory (dataset size constraints)
  • ⚠️ May have performance limitations

Option 3: Backend Proxy (Hybrid)

Setup:

  • Backend runs Oxigraph server
  • Frontend queries backend API
  • Backend proxies to Oxigraph

Frontend Integration:

// src/lib/sparql/backend-proxy.ts
export async function querySparql(query: string): Promise<any> {
  const response = await fetch('/api/sparql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  
  return response.json();
}

Pros:

  • Backend controls triplestore lifecycle
  • Can add authentication/authorization
  • Can preprocess/validate queries
  • Hides triplestore implementation details

Cons:

  • ⚠️ Requires backend development
  • ⚠️ More complex architecture
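
The proxy's translation step boils down to unwrapping { query } from the frontend's JSON body and re-posting it to Oxigraph with the SPARQL content type. A TypeScript sketch of that contract only (the real backend would be Python; toOxigraphRequest is an illustrative name):

```typescript
// Sketch of the Option 3 proxy translation: the backend receives
// { query } as JSON and forwards it as application/sparql-query.
interface ProxyRequest {
  query: string;
}

function toOxigraphRequest(body: string, endpoint = 'http://localhost:7878') {
  let parsed: ProxyRequest;
  try {
    parsed = JSON.parse(body);
  } catch {
    throw new Error('request body must be JSON');
  }
  if (typeof parsed.query !== 'string' || parsed.query.trim() === '') {
    throw new Error('missing "query" field');
  }
  return {
    url: `${endpoint}/query`,
    method: 'POST' as const,
    headers: {
      'Content-Type': 'application/sparql-query',
      Accept: 'application/sparql-results+json',
    },
    body: parsed.query,
  };
}
```

Validating here is also where query preprocessing and authorization checks (the pros above) would hook in.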

Recommendation

For Phase 3 (Task 6 & 7): Use Option 1 (Oxigraph Server)

Rationale

  1. Proven Approach: Standard HTTP API matches project architecture research
  2. Scalable: Can handle Denmark dataset (43,429 triples) and beyond
  3. Simple Development: Frontend can focus on SPARQL query building, not triplestore management
  4. Future-Proof: Easy to swap for other triplestores (Virtuoso, Blazegraph) later if needed
  5. Docker-Ready: Can be containerized for production deployment

Implementation Plan

Phase 3 - Task 6: Query Builder

Goal: Visual SPARQL query builder UI

Deliverables:

  1. Query builder component with visual interface
  2. Subject-Predicate-Object pattern builder
  3. Filter conditions UI
  4. SPARQL syntax preview (live syntax highlighting)
  5. Query validation (syntax checking)
  6. Query templates library (pre-built queries)

Oxigraph Integration:

  • NOT REQUIRED yet
  • Query builder generates SPARQL strings only
  • Can test queries against static RDF files using rdflib (Python backend)
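
Since Task 6 only needs to emit SPARQL strings, the core of the builder can be a pure function from triple patterns to query text. A minimal sketch (TriplePattern and buildSelectQuery are illustrative names, not the planned component API):

```typescript
// Sketch of a SPARQL SELECT builder for Task 6: turns
// subject-predicate-object patterns into query text, no triplestore needed.
interface TriplePattern {
  subject: string;   // e.g. "?museum" or a full IRI in <...>
  predicate: string; // e.g. "a" or "schema:name"
  object: string;    // e.g. "?name" or a literal
}

function buildSelectQuery(
  vars: string[],
  patterns: TriplePattern[],
  prefixes: Record<string, string> = { schema: 'http://schema.org/' },
  limit?: number,
): string {
  const prefixLines = Object.entries(prefixes)
    .map(([p, iri]) => `PREFIX ${p}: <${iri}>`)
    .join('\n');
  const where = patterns
    .map(t => `  ${t.subject} ${t.predicate} ${t.object} .`)
    .join('\n');
  const limitClause = limit !== undefined ? `\nLIMIT ${limit}` : '';
  return `${prefixLines}\nSELECT ${vars.join(' ')} WHERE {\n${where}\n}${limitClause}`;
}

// Example: generates a "find all museums" query like the samples in this document.
const query = buildSelectQuery(
  ['?museum', '?name'],
  [
    { subject: '?museum', predicate: 'a', object: 'schema:Museum' },
    { subject: '?museum', predicate: 'schema:name', object: '?name' },
  ],
  { schema: 'http://schema.org/' },
  100,
);
```

Because the output is plain text, the same function feeds both the live syntax preview (deliverable 4) and later execution against Oxigraph.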

Estimated Time: 4-5 hours


Phase 3 - Task 7: SPARQL Query Execution

Goal: Execute SPARQL queries against loaded RDF data

Deliverables:

  1. Oxigraph server setup and configuration
  2. SPARQL HTTP client (src/lib/sparql/oxigraph-client.ts)
  3. Query execution hook (src/hooks/useSparqlQuery.ts)
  4. Results viewer component (table, JSON, graph views)
  5. Export results (CSV, JSON, RDF)
  6. Query performance metrics (execution time, result count)
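
Deliverable 5's CSV export reduces to flattening the bindings against head.vars. A hedged sketch (resultsToCsv is an illustrative name, not a fixed API):

```typescript
// Sketch of the CSV export deliverable: flatten SPARQL JSON results into CSV.
interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

function csvEscape(value: string): string {
  // Quote fields containing delimiters, quotes, or newlines (RFC 4180 style).
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function resultsToCsv(result: SparqlResult): string {
  const { vars } = result.head;
  const header = vars.map(csvEscape).join(',');
  const rows = result.results.bindings.map(binding =>
    vars.map(v => csvEscape(binding[v]?.value ?? '')).join(','),
  );
  return [header, ...rows].join('\n');
}
```

Unbound variables become empty fields, matching the '-' placeholder the results table uses.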

Oxigraph Integration Steps:

Step 1: Install Oxigraph Server

# Via Cargo
cargo install oxigraph_server

# OR via Docker
docker pull oxigraph/oxigraph

Step 2: Start Oxigraph with Test Data

# Load Denmark dataset (43,429 triples)
oxigraph_server \
  --location ./data/oxigraph \
  --bind 127.0.0.1:7878

# In another terminal, load RDF data
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  http://localhost:7878/store

Step 3: Create SPARQL Client Module

// src/lib/sparql/oxigraph-client.ts
export interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

export async function executeSparql(query: string): Promise<SparqlResult> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });
  
  if (!response.ok) {
    throw new Error(`SPARQL query failed: ${response.statusText}`);
  }
  
  return response.json();
}

Step 4: Create Query Execution Hook

// src/hooks/useSparqlQuery.ts
import { useState, useCallback } from 'react';
import { executeSparql, type SparqlResult } from '../lib/sparql/oxigraph-client';

export function useSparqlQuery() {
  const [results, setResults] = useState<SparqlResult | null>(null);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [executionTime, setExecutionTime] = useState<number>(0);
  
  const executeQuery = useCallback(async (query: string) => {
    setIsLoading(true);
    setError(null);
    
    const startTime = performance.now();
    
    try {
      const result = await executeSparql(query);
      setResults(result);
      setExecutionTime(performance.now() - startTime);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Query execution failed');
      setResults(null);
    } finally {
      setIsLoading(false);
    }
  }, []);
  
  return { results, isLoading, error, executionTime, executeQuery };
}

Step 5: Create Results Viewer Component

// src/components/query/ResultsViewer.tsx
import type { SparqlResult } from '../../lib/sparql/oxigraph-client';

interface ResultsViewerProps {
  results: SparqlResult;
  executionTime: number;
}

export function ResultsViewer({ results, executionTime }: ResultsViewerProps) {
  const { head, results: data } = results;
  const bindings = data.bindings;
  
  return (
    <div className="results-viewer">
      <div className="results-header">
        <span>{bindings.length} results</span>
        <span>Execution time: {executionTime.toFixed(2)}ms</span>
      </div>
      
      <table className="results-table">
        <thead>
          <tr>
            {head.vars.map(variable => (
              <th key={variable}>{variable}</th>
            ))}
          </tr>
        </thead>
        <tbody>
          {bindings.map((binding, index) => (
            <tr key={index}>
              {head.vars.map(variable => (
                <td key={variable}>
                  {binding[variable]?.value || '-'}
                </td>
              ))}
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}

Estimated Time: 6-8 hours


Testing Strategy

Unit Tests

// tests/unit/sparql-client.test.ts
import { describe, it, expect, vi } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('executeSparql', () => {
  it('should execute SELECT query and return results', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({
        head: { vars: ['subject'] },
        results: {
          bindings: [
            { subject: { type: 'uri', value: 'http://example.org/inst1' } },
          ],
        },
      }),
    });
    
    const query = 'SELECT ?subject WHERE { ?subject a schema:Museum }';
    const results = await executeSparql(query);
    
    expect(results.results.bindings).toHaveLength(1);
  });
  
  it('should throw error on failed query', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: false,
      statusText: 'Bad Request',
    });
    
    const query = 'INVALID SPARQL';
    await expect(executeSparql(query)).rejects.toThrow('SPARQL query failed');
  });
});

Integration Tests

// tests/integration/oxigraph.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('Oxigraph Integration', () => {
  beforeAll(async () => {
    // Assumes Oxigraph server is running on localhost:7878
    // with test data loaded
  });
  
  it('should query loaded RDF data', async () => {
    const query = `
      PREFIX schema: <http://schema.org/>
      SELECT ?museum WHERE {
        ?museum a schema:Museum .
      } LIMIT 10
    `;
    
    const results = await executeSparql(query);
    expect(results.results.bindings.length).toBeGreaterThan(0);
  });
});

Sample SPARQL Queries

Query 1: Find All Museums

PREFIX schema: <http://schema.org/>

SELECT ?museum ?name WHERE {
  ?museum a schema:Museum .
  ?museum schema:name ?name .
}
ORDER BY ?name
LIMIT 100

Query 2: Count Institutions by Type

PREFIX schema: <http://schema.org/>

SELECT ?type (COUNT(?institution) AS ?count) WHERE {
  ?institution a ?type .
  FILTER(?type IN (schema:Museum, schema:Library, schema:ArchiveOrganization))
}
GROUP BY ?type
ORDER BY DESC(?count)

Query 3: Find Institutions in a City

PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?address WHERE {
  ?institution schema:name ?name .
  ?institution schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}
ORDER BY ?name

Query 4: Find Institutions Linked to Wikidata

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?wikidataURI WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
}
LIMIT 100

Configuration

Oxigraph Server Config

# Start Oxigraph server with configuration
# Flags: data directory, bind address, CORS for frontend development,
# and a writable store
oxigraph_server \
  --location ./data/oxigraph \
  --bind 0.0.0.0:7878 \
  --cors "*" \
  --readonly false

Environment Variables

# .env.local (frontend)
VITE_SPARQL_ENDPOINT=http://localhost:7878
VITE_SPARQL_QUERY_TIMEOUT=30000  # 30 seconds

TypeScript Config

// src/config/sparql.ts
export const SPARQL_CONFIG = {
  endpoint: import.meta.env.VITE_SPARQL_ENDPOINT || 'http://localhost:7878',
  timeout: Number(import.meta.env.VITE_SPARQL_QUERY_TIMEOUT) || 30000,
  corsEnabled: true,
};
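
The client sketches in this document never actually apply SPARQL_CONFIG.timeout; one way to wire it in is an AbortController around fetch. A sketch under that assumption (names mirror the config above but are illustrative):

```typescript
// Sketch: applying the configured query timeout with AbortController.
const config = { endpoint: 'http://localhost:7878', timeout: 30000 };

interface SparqlRequestInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Pure helper so the request shape is testable without a running server.
function buildSparqlRequest(query: string): SparqlRequestInit {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      Accept: 'application/sparql-results+json',
    },
    body: query,
  };
}

async function executeWithTimeout(query: string): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.timeout);
  try {
    const response = await fetch(`${config.endpoint}/query`, {
      ...buildSparqlRequest(query),
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`SPARQL query failed: ${response.statusText}`);
    }
    return response.json();
  } finally {
    clearTimeout(timer); // avoid aborting after a successful response
  }
}
```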

Performance Considerations

Dataset Size

Dataset                   Triples     Query Time   Notes
Denmark (Current)         43,429      <100 ms      Fast for prototyping
Netherlands (Estimated)   ~500,000    <500 ms      Should be manageable
Global (Future)           5-10M       1-5 s        May need optimization

Optimization Strategies

  1. Indexing: Oxigraph automatically indexes triples (no manual configuration)
  2. Query Limits: Always use LIMIT clauses during development
  3. Caching: Cache frequent query results in frontend (localStorage)
  4. Pagination: Implement OFFSET/LIMIT for large result sets
  5. Prefixes: Use PREFIX declarations to reduce query size
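
Strategy 4 can be implemented by rewriting the query text before execution. A naive sketch that assumes the incoming query carries no LIMIT/OFFSET of its own (paginate is an illustrative name):

```typescript
// Sketch of OFFSET/LIMIT pagination (optimization strategy 4).
// Naive on purpose: assumes the query has no solution modifiers already.
function paginate(query: string, page: number, pageSize: number): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new Error('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new Error('pageSize must be a positive integer');
  }
  return `${query.trim()}\nLIMIT ${pageSize}\nOFFSET ${page * pageSize}`;
}
```

For stable pages across requests, the base query should include an ORDER BY, since SPARQL result order is otherwise unspecified.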

Deployment

Development

# Start Oxigraph locally
oxigraph_server --location ./data/oxigraph --bind 127.0.0.1:7878

# Load test data
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  http://localhost:7878/store

Docker (Production)

# Dockerfile.oxigraph
FROM oxigraph/oxigraph:latest

# Copy RDF data to container
COPY data/rdf /data/rdf

# Expose SPARQL endpoint
EXPOSE 7878

# Start server with persistent storage
CMD ["--location", "/data/oxigraph", "--bind", "0.0.0.0:7878"]

# docker-compose.yml
version: '3.8'

services:
  oxigraph:
    image: oxigraph/oxigraph:latest
    ports:
      - "7878:7878"
    volumes:
      - ./data/oxigraph:/data/oxigraph
      - ./data/rdf:/data/rdf:ro
    command: --location /data/oxigraph --bind 0.0.0.0:7878 --cors "*"
  
  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    environment:
      # NOTE: VITE_ vars are baked into the browser bundle at build time, and the
      # browser cannot resolve Docker-internal hostnames; in practice the endpoint
      # must be reachable from the client (localhost or a reverse proxy)
      - VITE_SPARQL_ENDPOINT=http://oxigraph:7878
    depends_on:
      - oxigraph

Alternatives Considered

Virtuoso

  • Pros: Very mature, enterprise-grade, excellent performance
  • Cons: Complex setup, heavyweight (2GB+ memory), overkill for prototype

Blazegraph

  • Pros: Full SPARQL 1.1, good documentation
  • Cons: Java dependency, discontinued (last release 2019)

Apache Jena Fuseki

  • Pros: Mature, full-featured, active development
  • Cons: Java dependency, more complex than Oxigraph

GraphDB

  • Pros: Commercial support, advanced reasoning, SHACL validation
  • Cons: Proprietary (free edition has limits), complex setup

Decision: Oxigraph wins for simplicity, modern stack (Rust), and cultural heritage sector adoption.


Next Steps

Immediate (Task 6 - Query Builder)

  1. Document Oxigraph decision (this file)
  2. Implement visual query builder UI (4-5 hours)
  3. Create SPARQL syntax validator
  4. Build query templates library

Next Session (Task 7 - SPARQL Execution)

  1. Install Oxigraph server (local + Docker)
  2. Load Denmark dataset (43,429 triples)
  3. Create SPARQL client module
  4. Implement query execution hook
  5. Build results viewer component
  6. Add export functionality (CSV, JSON, RDF)
  7. Write integration tests

References

Oxigraph Documentation

Project Documentation

  • Planning: ontology/2025-09-09T08-31-07-*-Linked_Data_Cultural_Heritage_Project.json
  • RDF Datasets: data/rdf/README.md
  • Schema: schemas/20251121/rdf/ (8 RDF formats)

SPARQL Resources


Status: Planning Complete
Next: Implement Query Builder (Phase 3 - Task 6)
Estimated Timeline: 10-13 hours total (Query Builder 4-5h + SPARQL Execution 6-8h)

Last Updated: 2025-11-22
Author: OpenCode AI Agent