
Triplestore Setup: Oxigraph for RDF Visualizer

Date: 2025-11-22
Status: Planning / Not Yet Implemented
Priority: Phase 3 - Task 6 & 7 (Query Builder + SPARQL Execution)


Overview

The GLAM project will use Oxigraph as the RDF triplestore for the frontend visualization application. Oxigraph was explicitly chosen in the project planning documents (September 2025) as part of the knowledge graph infrastructure.


Why Oxigraph?

Selected Benefits

From the project documentation (ontology conversation 2025-09-09):

Phase 4 - Knowledge Graph Infrastructure (120 hours):

  • TypeDB hypergraph database
  • Oxigraph RDF triple store

Advantages for This Project

  1. Lightweight & Embeddable

    • Can run in-process with the frontend (via WASM) OR as a separate server
    • Minimal setup compared to Virtuoso, Blazegraph, or GraphDB
    • Perfect for prototype/demonstration use cases
  2. SPARQL 1.1 Compliant

    • Full SPARQL query support (SELECT, CONSTRUCT, ASK, DESCRIBE)
    • Standards-compliant implementation
    • Compatible with all RDF formats (Turtle, N-Triples, RDF/XML, JSON-LD, etc.)
  3. Active Development

    • Written in Rust with an actively maintained codebase and regular releases
  4. Multiple Deployment Options

    • Server mode: HTTP API for remote queries
    • Library mode: Embedded in Python/Rust applications
    • WASM mode: In-browser triplestore (experimental)
  5. Cultural Heritage Sector Adoption

    • Used in European cultural heritage projects
    • Recommended for prototype/exploratory projects (per the 530-hour estimate research)
    • Proven for organizational data modeling

Current Status

Backend (Python)

The Python backend includes RDF processing dependencies:

# pyproject.toml - RDF and semantic web dependencies
rdflib = "^7.0.0"           # RDF parsing/serialization
SPARQLWrapper = "^2.0.0"    # SPARQL query execution

Status: Dependencies installed
Usage: Backend can parse RDF files and generate SPARQL queries

Frontend (TypeScript/React)

Status: Not yet integrated
Todo: Add Oxigraph client or SPARQL HTTP client


Architecture Options

Option 1: Oxigraph Server (Recommended)

Setup:

# Install Oxigraph server
cargo install oxigraph_server

# Or use Docker
docker pull oxigraph/oxigraph

# Start server with persistent storage
oxigraph_server --location ./data/oxigraph --bind 127.0.0.1:7878

Frontend Integration:

// src/lib/sparql/oxigraph-client.ts
export async function querySparql(query: string): Promise<any> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });
  
  return response.json();
}

Pros:

  • Separate process (doesn't block frontend)
  • Standard HTTP API (easy to integrate)
  • Supports large datasets
  • Can be shared across multiple clients

Cons:

  • ⚠️ Requires running separate server process
  • ⚠️ Adds deployment complexity

Option 2: Oxigraph WASM (Experimental)

Setup:

# Install Oxigraph WASM package
npm install oxigraph

Frontend Integration:

// src/lib/sparql/oxigraph-wasm.ts
import { Store } from 'oxigraph';

export class OxigraphStore {
  private store: Store;
  
  constructor() {
    this.store = new Store();
  }
  
  async loadRDF(rdfData: string, format: string) {
    // Note: load() arguments vary across oxigraph package versions;
    // older releases take (data, mimeType, baseIRI, toGraphName)
    await this.store.load(rdfData, format, null, null);
  }
  
  async query(sparql: string) {
    return this.store.query(sparql);
  }
}

Pros:

  • No server required (runs in browser)
  • Zero configuration
  • Perfect for demos and prototypes
  • Offline-capable

Cons:

  • ⚠️ Experimental (WASM support still evolving)
  • ⚠️ Limited to browser memory (dataset size constraints)
  • ⚠️ May have performance limitations

Option 3: Backend Proxy (Hybrid)

Setup:

  • Backend runs Oxigraph server
  • Frontend queries backend API
  • Backend proxies to Oxigraph

Frontend Integration:

// src/lib/sparql/backend-proxy.ts
export async function querySparql(query: string): Promise<any> {
  const response = await fetch('/api/sparql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  
  return response.json();
}

Pros:

  • Backend controls triplestore lifecycle
  • Can add authentication/authorization
  • Can preprocess/validate queries
  • Hides triplestore implementation details

Cons:

  • ⚠️ Requires backend development
  • ⚠️ More complex architecture
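
The proxy's translation step boils down to unwrapping { query } from the frontend's JSON body and re-posting it to Oxigraph with the SPARQL content type. A TypeScript sketch of that contract only (the real backend would be Python; toOxigraphRequest is an illustrative name):

```typescript
// Sketch of the Option 3 proxy translation: the backend receives
// { query } as JSON and forwards it as application/sparql-query.
interface ProxyRequest {
  query: string;
}

function toOxigraphRequest(body: string, endpoint = 'http://localhost:7878') {
  let parsed: ProxyRequest;
  try {
    parsed = JSON.parse(body);
  } catch {
    throw new Error('request body must be JSON');
  }
  if (typeof parsed.query !== 'string' || parsed.query.trim() === '') {
    throw new Error('missing "query" field');
  }
  return {
    url: `${endpoint}/query`,
    method: 'POST' as const,
    headers: {
      'Content-Type': 'application/sparql-query',
      Accept: 'application/sparql-results+json',
    },
    body: parsed.query,
  };
}
```

Validating here is also where query preprocessing and authorization checks (the pros above) would hook in.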

Recommendation

For Phase 3 (Task 6 & 7): Use Option 1 (Oxigraph Server)

Rationale

  1. Proven Approach: Standard HTTP API matches project architecture research
  2. Scalable: Can handle Denmark dataset (43,429 triples) and beyond
  3. Simple Development: Frontend can focus on SPARQL query building, not triplestore management
  4. Future-Proof: Easy to swap for other triplestores (Virtuoso, Blazegraph) later if needed
  5. Docker-Ready: Can be containerized for production deployment

Implementation Plan

Phase 3 - Task 6: Query Builder

Goal: Visual SPARQL query builder UI

Deliverables:

  1. Query builder component with visual interface
  2. Subject-Predicate-Object pattern builder
  3. Filter conditions UI
  4. SPARQL syntax preview (live syntax highlighting)
  5. Query validation (syntax checking)
  6. Query templates library (pre-built queries)

Oxigraph Integration:

  • NOT REQUIRED yet
  • Query builder generates SPARQL strings only
  • Can test queries against static RDF files using rdflib (Python backend)
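
Since Task 6 only needs to emit SPARQL strings, the core of the builder can be a pure function from triple patterns to query text. A minimal sketch (TriplePattern and buildSelectQuery are illustrative names, not the planned component API):

```typescript
// Sketch of a SPARQL SELECT builder for Task 6: turns
// subject-predicate-object patterns into query text, no triplestore needed.
interface TriplePattern {
  subject: string;   // e.g. "?museum" or a full IRI in <...>
  predicate: string; // e.g. "a" or "schema:name"
  object: string;    // e.g. "?name" or a literal
}

function buildSelectQuery(
  vars: string[],
  patterns: TriplePattern[],
  prefixes: Record<string, string> = { schema: 'http://schema.org/' },
  limit?: number,
): string {
  const prefixLines = Object.entries(prefixes)
    .map(([p, iri]) => `PREFIX ${p}: <${iri}>`)
    .join('\n');
  const where = patterns
    .map(t => `  ${t.subject} ${t.predicate} ${t.object} .`)
    .join('\n');
  const limitClause = limit !== undefined ? `\nLIMIT ${limit}` : '';
  return `${prefixLines}\nSELECT ${vars.join(' ')} WHERE {\n${where}\n}${limitClause}`;
}

// Example: generates a "find all museums" query like the samples in this document.
const query = buildSelectQuery(
  ['?museum', '?name'],
  [
    { subject: '?museum', predicate: 'a', object: 'schema:Museum' },
    { subject: '?museum', predicate: 'schema:name', object: '?name' },
  ],
  { schema: 'http://schema.org/' },
  100,
);
```

Because the output is plain text, the same function feeds both the live syntax preview (deliverable 4) and later execution against Oxigraph.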

Estimated Time: 4-5 hours


Phase 3 - Task 7: SPARQL Query Execution

Goal: Execute SPARQL queries against loaded RDF data

Deliverables:

  1. Oxigraph server setup and configuration
  2. SPARQL HTTP client (src/lib/sparql/oxigraph-client.ts)
  3. Query execution hook (src/hooks/useSparqlQuery.ts)
  4. Results viewer component (table, JSON, graph views)
  5. Export results (CSV, JSON, RDF)
  6. Query performance metrics (execution time, result count)
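
Deliverable 5's CSV export reduces to flattening the bindings against head.vars. A hedged sketch (resultsToCsv is an illustrative name, not a fixed API):

```typescript
// Sketch of the CSV export deliverable: flatten SPARQL JSON results into CSV.
interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

function csvEscape(value: string): string {
  // Quote fields containing delimiters, quotes, or newlines (RFC 4180 style).
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function resultsToCsv(result: SparqlResult): string {
  const { vars } = result.head;
  const header = vars.map(csvEscape).join(',');
  const rows = result.results.bindings.map(binding =>
    vars.map(v => csvEscape(binding[v]?.value ?? '')).join(','),
  );
  return [header, ...rows].join('\n');
}
```

Unbound variables become empty fields, matching the '-' placeholder the results table uses.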

Oxigraph Integration Steps:

Step 1: Install Oxigraph Server

# Via Cargo
cargo install oxigraph_server

# OR via Docker
docker pull oxigraph/oxigraph

Step 2: Start Oxigraph with Test Data

# Load Denmark dataset (43,429 triples)
oxigraph_server \
  --location ./data/oxigraph \
  --bind 127.0.0.1:7878

# In another terminal, load RDF data
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  http://localhost:7878/store

Step 3: Create SPARQL Client Module

// src/lib/sparql/oxigraph-client.ts
export interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

export async function executeSparql(query: string): Promise<SparqlResult> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });
  
  if (!response.ok) {
    throw new Error(`SPARQL query failed: ${response.statusText}`);
  }
  
  return response.json();
}

Step 4: Create Query Execution Hook

// src/hooks/useSparqlQuery.ts
import { useState, useCallback } from 'react';
import { executeSparql, type SparqlResult } from '../lib/sparql/oxigraph-client';

export function useSparqlQuery() {
  const [results, setResults] = useState<SparqlResult | null>(null);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [executionTime, setExecutionTime] = useState<number>(0);
  
  const executeQuery = useCallback(async (query: string) => {
    setIsLoading(true);
    setError(null);
    
    const startTime = performance.now();
    
    try {
      const result = await executeSparql(query);
      setResults(result);
      setExecutionTime(performance.now() - startTime);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Query execution failed');
      setResults(null);
    } finally {
      setIsLoading(false);
    }
  }, []);
  
  return { results, isLoading, error, executionTime, executeQuery };
}

Step 5: Create Results Viewer Component

// src/components/query/ResultsViewer.tsx
import type { SparqlResult } from '../../lib/sparql/oxigraph-client';

interface ResultsViewerProps {
  results: SparqlResult;
  executionTime: number;
}

export function ResultsViewer({ results, executionTime }: ResultsViewerProps) {
  const { head, results: data } = results;
  const bindings = data.bindings;
  
  return (
    <div className="results-viewer">
      <div className="results-header">
        <span>{bindings.length} results</span>
        <span>Execution time: {executionTime.toFixed(2)}ms</span>
      </div>
      
      <table className="results-table">
        <thead>
          <tr>
            {head.vars.map(variable => (
              <th key={variable}>{variable}</th>
            ))}
          </tr>
        </thead>
        <tbody>
          {bindings.map((binding, index) => (
            <tr key={index}>
              {head.vars.map(variable => (
                <td key={variable}>
                  {binding[variable]?.value || '-'}
                </td>
              ))}
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}

Estimated Time: 6-8 hours


Testing Strategy

Unit Tests

// tests/unit/sparql-client.test.ts
import { describe, it, expect, vi } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('executeSparql', () => {
  it('should execute SELECT query and return results', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({
        head: { vars: ['subject'] },
        results: {
          bindings: [
            { subject: { type: 'uri', value: 'http://example.org/inst1' } },
          ],
        },
      }),
    });
    
    const query = 'SELECT ?subject WHERE { ?subject a schema:Museum }';
    const results = await executeSparql(query);
    
    expect(results.results.bindings).toHaveLength(1);
  });
  
  it('should throw error on failed query', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: false,
      statusText: 'Bad Request',
    });
    
    const query = 'INVALID SPARQL';
    await expect(executeSparql(query)).rejects.toThrow('SPARQL query failed');
  });
});

Integration Tests

// tests/integration/oxigraph.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('Oxigraph Integration', () => {
  beforeAll(async () => {
    // Assumes Oxigraph server is running on localhost:7878
    // with test data loaded
  });
  
  it('should query loaded RDF data', async () => {
    const query = `
      PREFIX schema: <http://schema.org/>
      SELECT ?museum WHERE {
        ?museum a schema:Museum .
      } LIMIT 10
    `;
    
    const results = await executeSparql(query);
    expect(results.results.bindings.length).toBeGreaterThan(0);
  });
});

Sample SPARQL Queries

Query 1: Find All Museums

PREFIX schema: <http://schema.org/>

SELECT ?museum ?name WHERE {
  ?museum a schema:Museum .
  ?museum schema:name ?name .
}
ORDER BY ?name
LIMIT 100

Query 2: Count Institutions by Type

PREFIX schema: <http://schema.org/>

SELECT ?type (COUNT(?institution) AS ?count) WHERE {
  ?institution a ?type .
  FILTER(?type IN (schema:Museum, schema:Library, schema:ArchiveOrganization))
}
GROUP BY ?type
ORDER BY DESC(?count)

Query 3: Find Institutions in a City

PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?address WHERE {
  ?institution schema:name ?name .
  ?institution schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}
ORDER BY ?name

Query 4: Find Institutions Linked to Wikidata

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?wikidataURI WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
}
LIMIT 100

Configuration

Oxigraph Server Config

# Start Oxigraph server with configuration
# Flags: data directory, bind address, CORS for frontend development,
# and a writable store
oxigraph_server \
  --location ./data/oxigraph \
  --bind 0.0.0.0:7878 \
  --cors "*" \
  --readonly false

Environment Variables

# .env.local (frontend)
VITE_SPARQL_ENDPOINT=http://localhost:7878
VITE_SPARQL_QUERY_TIMEOUT=30000  # 30 seconds

TypeScript Config

// src/config/sparql.ts
export const SPARQL_CONFIG = {
  endpoint: import.meta.env.VITE_SPARQL_ENDPOINT || 'http://localhost:7878',
  timeout: Number(import.meta.env.VITE_SPARQL_QUERY_TIMEOUT) || 30000,
  corsEnabled: true,
};
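
The client sketches in this document never actually apply SPARQL_CONFIG.timeout; one way to wire it in is an AbortController around fetch. A sketch under that assumption (names mirror the config above but are illustrative):

```typescript
// Sketch: applying the configured query timeout with AbortController.
const config = { endpoint: 'http://localhost:7878', timeout: 30000 };

interface SparqlRequestInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Pure helper so the request shape is testable without a running server.
function buildSparqlRequest(query: string): SparqlRequestInit {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      Accept: 'application/sparql-results+json',
    },
    body: query,
  };
}

async function executeWithTimeout(query: string): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.timeout);
  try {
    const response = await fetch(`${config.endpoint}/query`, {
      ...buildSparqlRequest(query),
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`SPARQL query failed: ${response.statusText}`);
    }
    return response.json();
  } finally {
    clearTimeout(timer); // avoid aborting after a successful response
  }
}
```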

Performance Considerations

Dataset Size

Dataset                   Triples     Query Time   Notes
Denmark (Current)         43,429      <100 ms      Fast for prototyping
Netherlands (Estimated)   ~500,000    <500 ms      Should be manageable
Global (Future)           5-10M       1-5 s        May need optimization

Optimization Strategies

  1. Indexing: Oxigraph automatically indexes triples (no manual configuration)
  2. Query Limits: Always use LIMIT clauses during development
  3. Caching: Cache frequent query results in frontend (localStorage)
  4. Pagination: Implement OFFSET/LIMIT for large result sets
  5. Prefixes: Use PREFIX declarations to reduce query size
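
Strategy 4 can be implemented by rewriting the query text before execution. A naive sketch that assumes the incoming query carries no LIMIT/OFFSET of its own (paginate is an illustrative name):

```typescript
// Sketch of OFFSET/LIMIT pagination (optimization strategy 4).
// Naive on purpose: assumes the query has no solution modifiers already.
function paginate(query: string, page: number, pageSize: number): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new Error('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new Error('pageSize must be a positive integer');
  }
  return `${query.trim()}\nLIMIT ${pageSize}\nOFFSET ${page * pageSize}`;
}
```

For stable pages across requests, the base query should include an ORDER BY, since SPARQL result order is otherwise unspecified.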

Deployment

Development

# Start Oxigraph locally
oxigraph_server --location ./data/oxigraph --bind 127.0.0.1:7878

# Load test data
curl -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  http://localhost:7878/store

Docker (Production)

# Dockerfile.oxigraph
FROM oxigraph/oxigraph:latest

# Copy RDF data to container
COPY data/rdf /data/rdf

# Expose SPARQL endpoint
EXPOSE 7878

# Start server with persistent storage
CMD ["--location", "/data/oxigraph", "--bind", "0.0.0.0:7878"]

# docker-compose.yml
version: '3.8'

services:
  oxigraph:
    image: oxigraph/oxigraph:latest
    ports:
      - "7878:7878"
    volumes:
      - ./data/oxigraph:/data/oxigraph
      - ./data/rdf:/data/rdf:ro
    command: --location /data/oxigraph --bind 0.0.0.0:7878 --cors "*"
  
  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    environment:
      # NOTE: VITE_ vars are baked into the browser bundle at build time, and the
      # browser cannot resolve Docker-internal hostnames; in practice the endpoint
      # must be reachable from the client (localhost or a reverse proxy)
      - VITE_SPARQL_ENDPOINT=http://oxigraph:7878
    depends_on:
      - oxigraph

Alternatives Considered

Virtuoso

  • Pros: Very mature, enterprise-grade, excellent performance
  • Cons: Complex setup, heavyweight (2GB+ memory), overkill for prototype

Blazegraph

  • Pros: Full SPARQL 1.1, good documentation
  • Cons: Java dependency, discontinued (last release 2019)

Apache Jena Fuseki

  • Pros: Mature, full-featured, active development
  • Cons: Java dependency, more complex than Oxigraph

GraphDB

  • Pros: Commercial support, advanced reasoning, SHACL validation
  • Cons: Proprietary (free edition has limits), complex setup

Decision: Oxigraph wins for simplicity, modern stack (Rust), and cultural heritage sector adoption.


Next Steps

Immediate (Task 6 - Query Builder)

  1. Document Oxigraph decision (this file)
  2. Implement visual query builder UI (4-5 hours)
  3. Create SPARQL syntax validator
  4. Build query templates library

Next Session (Task 7 - SPARQL Execution)

  1. Install Oxigraph server (local + Docker)
  2. Load Denmark dataset (43,429 triples)
  3. Create SPARQL client module
  4. Implement query execution hook
  5. Build results viewer component
  6. Add export functionality (CSV, JSON, RDF)
  7. Write integration tests

References

Oxigraph Documentation

Project Documentation

  • Planning: ontology/2025-09-09T08-31-07-*-Linked_Data_Cultural_Heritage_Project.json
  • RDF Datasets: data/rdf/README.md
  • Schema: schemas/20251121/rdf/ (8 RDF formats)

SPARQL Resources


Status: Planning Complete
Next: Implement Query Builder (Phase 3 - Task 6)
Estimated Timeline: 10-13 hours total (Query Builder 4-5h + SPARQL Execution 6-8h)

Last Updated: 2025-11-22
Author: OpenCode AI Agent