glam/frontend/TRIPLESTORE_OXIGRAPH_SETUP.md

# Triplestore Setup: Oxigraph for RDF Visualizer

**Date**: 2025-11-22
**Status**: Planning / Not Yet Implemented
**Priority**: Phase 3 - Task 6 & 7 (Query Builder + SPARQL Execution)

---

## Overview

The GLAM project will use **Oxigraph** as the RDF triplestore for the frontend visualization application. Oxigraph was explicitly chosen in the project planning documents (September 2025) as part of the knowledge graph infrastructure.

---

## Why Oxigraph?

### Selected Benefits

From the project documentation (ontology conversation `2025-09-09`):

> **Phase 4 - Knowledge Graph Infrastructure (120 hours):**
> - TypeDB hypergraph database
> - Oxigraph RDF triple store

### Advantages for This Project

1. **Lightweight & Embeddable**
   - Can run in-process with the frontend (via WASM) OR as a separate server
   - Minimal setup compared to Virtuoso, Blazegraph, or GraphDB
   - Perfect for prototype/demonstration use cases

2. **SPARQL 1.1 Compliant**
   - Full SPARQL query support (SELECT, CONSTRUCT, ASK, DESCRIBE)
   - Standards-compliant implementation
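   - Supports the common RDF serializations (Turtle, N-Triples, RDF/XML, and others); JSON-LD support depends on the Oxigraph version, so check the release notes before relying on it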

3. **Active Development**
   - Modern Rust implementation (high performance, memory safety)
   - GitHub: https://github.com/oxigraph/oxigraph
   - Actively maintained (2018-present)

4. **Multiple Deployment Options**
   - **Server mode**: HTTP API for remote queries
   - **Library mode**: Embedded in Python/Rust applications
   - **WASM mode**: In-browser triplestore (experimental)

5. **Cultural Heritage Sector Adoption**
   - Used in European cultural heritage projects
   - Recommended for prototype/exploratory projects (530-hour estimate research)
   - Proven for organizational data modeling

---

## Current Status

### Backend (Python)

The Python backend includes RDF processing dependencies:

```toml
# pyproject.toml - RDF and semantic web dependencies
rdflib = "^7.0.0"           # RDF parsing/serialization
SPARQLWrapper = "^2.0.0"    # SPARQL query execution
```

**Status**: ✅ Dependencies installed
**Usage**: Backend can parse RDF files and generate SPARQL queries

### Frontend (TypeScript/React)

**Status**: ❌ Not yet integrated
**Todo**: Add Oxigraph client or SPARQL HTTP client

---

## Architecture Options

### Option 1: Oxigraph Server (Recommended)

**Setup**:
```bash
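# Install the Oxigraph server.
# Recent releases ship it as the `oxigraph` binary from the `oxigraph-cli` crate;
# older releases used `oxigraph_server`. Run `--help` to confirm for your version.
cargo install oxigraph-cli

# Or use Docker
docker pull oxigraph/oxigraph

# Start server with persistent storage
oxigraph serve --location ./data/oxigraph --bind 127.0.0.1:7878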
```

**Frontend Integration**:
```typescript
// src/lib/sparql/oxigraph-client.ts
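export async function querySparql(query: string): Promise<any> {
  // SPARQL 1.1 Protocol: POST the query text directly to the /query endpoint
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });

  // fetch() only rejects on network errors, so check the HTTP status explicitly
  if (!response.ok) {
    throw new Error(`SPARQL query failed: HTTP ${response.status}`);
  }

  return response.json();
}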
```

**Pros**:
- ✅ Separate process (doesn't block frontend)
- ✅ Standard HTTP API (easy to integrate)
- ✅ Supports large datasets
- ✅ Can be shared across multiple clients

**Cons**:
- ⚠️ Requires running separate server process
- ⚠️ Adds deployment complexity

---

### Option 2: Oxigraph WASM (Experimental)

**Setup**:
```bash
# Install Oxigraph WASM package
npm install oxigraph
```

**Frontend Integration**:
```typescript
// src/lib/sparql/oxigraph-wasm.ts
import { Store } from 'oxigraph';

export class OxigraphStore {
  private store: Store;

  constructor() {
    this.store = new Store();
  }

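  // NOTE: the load() argument list has changed between oxigraph releases
  // (positional arguments vs. an options object); check the docs for the installed version.
  async loadRDF(rdfData: string, format: string) {
    await this.store.load(rdfData, format, null, null);
  }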

  async query(sparql: string) {
    return this.store.query(sparql);
  }
}
```

**Pros**:
- ✅ No server required (runs in browser)
- ✅ Zero configuration
- ✅ Perfect for demos and prototypes
- ✅ Offline-capable

**Cons**:
- ⚠️ Experimental (WASM support still evolving)
- ⚠️ Limited to browser memory (dataset size constraints)
- ⚠️ May have performance limitations

---

### Option 3: Backend Proxy (Hybrid)

**Setup**:
- Backend runs Oxigraph server
- Frontend queries backend API
- Backend proxies to Oxigraph

**Frontend Integration**:
```typescript
// src/lib/sparql/backend-proxy.ts
export async function querySparql(query: string): Promise<any> {
  const response = await fetch('/api/sparql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });

  return response.json();
}
```

**Pros**:
- ✅ Backend controls triplestore lifecycle
- ✅ Can add authentication/authorization
- ✅ Can preprocess/validate queries
- ✅ Hides triplestore implementation details

**Cons**:
- ⚠️ Requires backend development
- ⚠️ More complex architecture
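
The "preprocess/validate queries" point could start as a simple read-only guard in the proxy. The sketch below assumes a hypothetical `isReadOnlyQuery` helper (not existing project code). It is a coarse keyword filter rather than a SPARQL parser, so it should complement, not replace, forwarding requests only to the triplestore's query endpoint:

```typescript
// Hypothetical guard for the /api/sparql proxy; all names are illustrative.
const UPDATE_KEYWORDS = [
  'INSERT', 'DELETE', 'LOAD', 'CLEAR', 'CREATE', 'DROP', 'COPY', 'MOVE', 'ADD',
];

/** Blank out IRIs, literals, comments, variables and prefixed names
 *  so that keywords appearing inside them are not counted. */
function stripNoise(query: string): string {
  return query
    .replace(/<[^<>\s]*>/g, ' ')          // IRIs (done first: they may contain '#')
    .replace(/"(?:[^"\\]|\\.)*"/g, ' ')   // double-quoted literals
    .replace(/'(?:[^'\\]|\\.)*'/g, ' ')   // single-quoted literals
    .replace(/#[^\n]*/g, ' ')             // line comments
    .replace(/[?$]\w+/g, ' ')             // variables such as ?add
    .replace(/[\w-]*:[\w.-]+/g, ' ');     // prefixed names such as ex:delete
}

/** Returns true when no SPARQL Update keyword remains after stripping. */
export function isReadOnlyQuery(query: string): boolean {
  const cleaned = stripNoise(query).toUpperCase();
  return !UPDATE_KEYWORDS.some((kw) => new RegExp(`\\b${kw}\\b`).test(cleaned));
}
```

A proxy handler would call `isReadOnlyQuery(query)` before forwarding and answer with an HTTP 400 when it returns `false`.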

---

## Recommendation

**For Phase 3 (Task 6 & 7)**: Use **Option 1 (Oxigraph Server)**

### Rationale

1. **Proven Approach**: Standard HTTP API matches project architecture research
2. **Scalable**: Can handle Denmark dataset (43,429 triples) and beyond
3. **Simple Development**: Frontend can focus on SPARQL query building, not triplestore management
4. **Future-Proof**: Easy to swap for other triplestores (Virtuoso, Blazegraph) later if needed
5. **Docker-Ready**: Can be containerized for production deployment
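
Point 4 holds as long as the frontend only speaks the standard SPARQL 1.1 Protocol. A minimal sketch of keeping the endpoint URL behind a small interface, assuming hypothetical names (`SparqlClient`, `createSparqlClient`) that are not existing project code:

```typescript
// Hypothetical abstraction: only the query URL is store-specific.
export interface SparqlClient {
  select(query: string): Promise<unknown>;
}

export function createSparqlClient(
  queryUrl: string,                    // e.g. 'http://localhost:7878/query' for Oxigraph
  fetchImpl: typeof fetch = fetch,     // injectable so tests need no running server
): SparqlClient {
  return {
    async select(query: string): Promise<unknown> {
      // SPARQL 1.1 Protocol "query via POST directly"
      const response = await fetchImpl(queryUrl, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/sparql-query',
          Accept: 'application/sparql-results+json',
        },
        body: query,
      });
      if (!response.ok) {
        throw new Error(`SPARQL query failed: HTTP ${response.status}`);
      }
      return response.json();
    },
  };
}
```

Swapping triplestores then means changing the URL passed to `createSparqlClient`, provided the replacement also implements the protocol.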

---

## Implementation Plan

### Phase 3 - Task 6: Query Builder

**Goal**: Visual SPARQL query builder UI

**Deliverables**:
1. Query builder component with visual interface
2. Subject-Predicate-Object pattern builder
3. Filter conditions UI
4. SPARQL syntax preview (live syntax highlighting)
5. Query validation (syntax checking)
6. Query templates library (pre-built queries)

**Oxigraph Integration**:
- NOT REQUIRED yet
- Query builder generates SPARQL strings only
- Can test queries against static RDF files using `rdflib` (Python backend)
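
Because the query builder only emits SPARQL strings at this stage, its core can be a pure function that the UI calls on every change. A minimal sketch, assuming hypothetical `TriplePattern`, `SelectQuerySpec`, and `buildSelectQuery` names (not existing project code) and terms that are already written in SPARQL syntax:

```typescript
export interface TriplePattern {
  subject: string;   // e.g. '?museum' or '<http://example.org/inst1>'
  predicate: string; // e.g. 'a' or 'schema:name'
  object: string;    // e.g. 'schema:Museum' or '?name'
}

export interface SelectQuerySpec {
  prefixes: Record<string, string>; // prefix label -> namespace IRI
  variables: string[];              // projected variables, without the leading '?'
  patterns: TriplePattern[];
  filters?: string[];               // raw FILTER expressions, e.g. 'LANG(?name) = "da"'
  limit?: number;
}

/** Assemble a SELECT query string from the builder state. */
export function buildSelectQuery(spec: SelectQuerySpec): string {
  const prefixLines = Object.entries(spec.prefixes).map(
    ([label, iri]) => `PREFIX ${label}: <${iri}>`,
  );
  // An empty projection falls back to SELECT *
  const projection =
    spec.variables.length > 0 ? spec.variables.map((v) => `?${v}`).join(' ') : '*';
  const body = [
    ...spec.patterns.map((p) => `  ${p.subject} ${p.predicate} ${p.object} .`),
    ...(spec.filters ?? []).map((f) => `  FILTER(${f})`),
  ];
  const lines = [...prefixLines, `SELECT ${projection} WHERE {`, ...body, '}'];
  if (spec.limit !== undefined) {
    lines.push(`LIMIT ${Math.max(0, Math.floor(spec.limit))}`);
  }
  return lines.join('\n');
}
```

The returned string can feed the live preview directly; it performs no validation of the individual terms, which is left to the syntax-checking deliverable.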

**Estimated Time**: 4-5 hours

---

### Phase 3 - Task 7: SPARQL Query Execution

**Goal**: Execute SPARQL queries against loaded RDF data

**Deliverables**:
1. Oxigraph server setup and configuration
2. SPARQL HTTP client (`src/lib/sparql/oxigraph-client.ts`)
3. Query execution hook (`src/hooks/useSparqlQuery.ts`)
4. Results viewer component (table, JSON, graph views)
5. Export results (CSV, JSON, RDF)
6. Query performance metrics (execution time, result count)

**Oxigraph Integration Steps**:

#### Step 1: Install Oxigraph Server
```bash
# Via Cargo (recent releases: `oxigraph-cli` crate, `oxigraph` binary;
# older releases used `oxigraph_server`)
cargo install oxigraph-cli

# OR via Docker
docker pull oxigraph/oxigraph
```

#### Step 2: Start Oxigraph with Test Data
```bash
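# Start the server (persistent storage in ./data/oxigraph)
oxigraph serve \
  --location ./data/oxigraph \
  --bind 127.0.0.1:7878

# In another terminal, load the Denmark dataset (43,429 triples) into the default graph.
# Without `?default`, a Graph Store POST to /store creates a new named graph,
# which the sample queries in this document would not see.
curl -f -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  'http://localhost:7878/store?default'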
```

#### Step 3: Create SPARQL Client Module
```typescript
// src/lib/sparql/oxigraph-client.ts
export interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

export async function executeSparql(query: string): Promise<SparqlResult> {
  const response = await fetch('http://localhost:7878/query', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      'Accept': 'application/sparql-results+json',
    },
    body: query,
  });

  if (!response.ok) {
    throw new Error(`SPARQL query failed: ${response.statusText}`);
  }

  return response.json();
}
```

#### Step 4: Create Query Execution Hook
```typescript
// src/hooks/useSparqlQuery.ts
import { useState, useCallback } from 'react';
import { executeSparql, type SparqlResult } from '../lib/sparql/oxigraph-client';

export function useSparqlQuery() {
  const [results, setResults] = useState<SparqlResult | null>(null);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [executionTime, setExecutionTime] = useState<number>(0);

  const executeQuery = useCallback(async (query: string) => {
    setIsLoading(true);
    setError(null);

    const startTime = performance.now();

    try {
      const result = await executeSparql(query);
      setResults(result);
      setExecutionTime(performance.now() - startTime);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Query execution failed');
      setResults(null);
    } finally {
      setIsLoading(false);
    }
  }, []);

  return { results, isLoading, error, executionTime, executeQuery };
}
```

#### Step 5: Create Results Viewer Component
```typescript
// src/components/query/ResultsViewer.tsx
import type { SparqlResult } from '../../lib/sparql/oxigraph-client';

interface ResultsViewerProps {
  results: SparqlResult;
  executionTime: number;
}

export function ResultsViewer({ results, executionTime }: ResultsViewerProps) {
  const { head, results: data } = results;
  const bindings = data.bindings;

  return (
    <div className="results-viewer">
      <div className="results-header">
        <span>{bindings.length} results</span>
        <span>Execution time: {executionTime.toFixed(2)}ms</span>
      </div>

      <table className="results-table">
        <thead>
          <tr>
            {head.vars.map(variable => (
              <th key={variable}>{variable}</th>
            ))}
          </tr>
        </thead>
        <tbody>
          {bindings.map((binding, index) => (
            <tr key={index}>
              {head.vars.map(variable => (
                <td key={variable}>
                  {binding[variable]?.value ?? '-'}
                </td>
              ))}
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}
```
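
For the export deliverable, the SPARQL JSON results shape maps directly onto CSV rows. A sketch, assuming a hypothetical `resultsToCsv` helper (not existing project code); it uses the usual CSV quoting convention from RFC 4180 and writes unbound variables as empty cells:

```typescript
// Mirrors the SparqlResult interface from Step 3 so the block is self-contained.
interface SparqlResult {
  head: { vars: string[] };
  results: {
    bindings: Array<Record<string, { type: string; value: string }>>;
  };
}

/** Quote a cell only when it contains a comma, quote or line break. */
function csvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

/** Convert SELECT results to CSV text with a header row. */
export function resultsToCsv(result: SparqlResult): string {
  const { vars } = result.head;
  const header = vars.map(csvCell).join(',');
  const rows = result.results.bindings.map((binding) =>
    vars.map((v) => csvCell(binding[v]?.value ?? '')).join(','),
  );
  return [header, ...rows].join('\r\n');
}
```

The resulting string can be wrapped in a `Blob` with type `text/csv` and offered as a download from the results viewer.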

**Estimated Time**: 6-8 hours

---

## Testing Strategy

### Unit Tests

```typescript
// tests/unit/sparql-client.test.ts
import { describe, it, expect, vi } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('executeSparql', () => {
  it('should execute SELECT query and return results', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({
        head: { vars: ['subject'] },
        results: {
          bindings: [
            { subject: { type: 'uri', value: 'http://example.org/inst1' } },
          ],
        },
      }),
    });

    const query = 'SELECT ?subject WHERE { ?subject a schema:Museum }';
    const results = await executeSparql(query);

    expect(results.results.bindings).toHaveLength(1);
  });

  it('should throw error on failed query', async () => {
    global.fetch = vi.fn().mockResolvedValue({
      ok: false,
      statusText: 'Bad Request',
    });

    const query = 'INVALID SPARQL';
    await expect(executeSparql(query)).rejects.toThrow('SPARQL query failed');
  });
});
```

### Integration Tests

```typescript
// tests/integration/oxigraph.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import { executeSparql } from '../../src/lib/sparql/oxigraph-client';

describe('Oxigraph Integration', () => {
  beforeAll(async () => {
    // Assumes Oxigraph server is running on localhost:7878
    // with test data loaded
  });

  it('should query loaded RDF data', async () => {
    const query = `
      PREFIX schema: <http://schema.org/>
      SELECT ?museum WHERE {
        ?museum a schema:Museum .
      } LIMIT 10
    `;

    const results = await executeSparql(query);
    expect(results.results.bindings.length).toBeGreaterThan(0);
  });
});
```

---

## Sample SPARQL Queries

### Query 1: Find All Museums

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?museum ?name WHERE {
  ?museum a schema:Museum .
  ?museum schema:name ?name .
}
ORDER BY ?name
LIMIT 100
```

### Query 2: Count Institutions by Type

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?type (COUNT(?institution) AS ?count) WHERE {
  ?institution a ?type .
  FILTER(?type IN (schema:Museum, schema:Library, schema:ArchiveOrganization))
}
GROUP BY ?type
ORDER BY DESC(?count)
```

### Query 3: Find Institutions in a City

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?address WHERE {
  ?institution schema:name ?name .
  ?institution schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}
ORDER BY ?name
```

### Query 4: Find Institutions with Wikidata Links

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?wikidataURI WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
}
LIMIT 100
```
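
Queries such as Query 3 become reusable templates once the literal is a parameter. A sketch, assuming hypothetical `escapeSparqlString` and `institutionsInCity` helpers (not existing project code); the escaping follows the SPARQL 1.1 string-literal escape sequences so that user input cannot break out of the quotes:

```typescript
/** Escape a value for use inside a double-quoted SPARQL string literal. */
export function escapeSparqlString(value: string): string {
  return value
    .replace(/\\/g, '\\\\') // backslash first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t');
}

/** Template for Query 3: institutions whose address is in the given city. */
export function institutionsInCity(city: string): string {
  return `PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?address WHERE {
  ?institution schema:name ?name .
  ?institution schema:address ?addrNode .
  ?addrNode schema:addressLocality "${escapeSparqlString(city)}" .
  ?addrNode schema:streetAddress ?address .
}
ORDER BY ?name`;
}
```

Calling `institutionsInCity('København K')` reproduces Query 3; a templates library could collect several such functions keyed by a display name.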

---

## Configuration

### Oxigraph Server Config

```bash
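# Start Oxigraph server with configuration.
# (Comments cannot follow a trailing backslash, so the options are explained here.)
#   --location  data directory
#   --bind      bind address (0.0.0.0 exposes the server on all interfaces)
#   --cors      allow cross-origin requests from the frontend dev server
# The server is writable by default; recent releases offer a separate
# `serve-read-only` subcommand instead of a `--readonly` flag (verify with `--help`).
oxigraph serve \
  --location ./data/oxigraph \
  --bind 0.0.0.0:7878 \
  --cors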
```

### Environment Variables

```bash
# .env.local (frontend)
VITE_SPARQL_ENDPOINT=http://localhost:7878
VITE_SPARQL_QUERY_TIMEOUT=30000  # 30 seconds
```

### TypeScript Config

```typescript
// src/config/sparql.ts
export const SPARQL_CONFIG = {
  endpoint: import.meta.env.VITE_SPARQL_ENDPOINT || 'http://localhost:7878',
  timeout: Number(import.meta.env.VITE_SPARQL_QUERY_TIMEOUT) || 30000,
  corsEnabled: true,
};
```
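
`VITE_SPARQL_QUERY_TIMEOUT` only has an effect if the client enforces it, which the `fetch` calls earlier in this document do not. One way to do so is with `AbortController`. The sketch below assumes a hypothetical `fetchWithTimeout` wrapper (not existing project code) and takes the fetch implementation as a parameter so it can be exercised without a server:

```typescript
/** Run a fetch request and abort it if it takes longer than timeoutMs. */
export async function fetchWithTimeout(
  url: string,
  init: RequestInit,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch, // injectable for tests
): Promise<Response> {
  const controller = new AbortController();
  // Abort the request once the timeout elapses
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after completion
    clearTimeout(timer);
  }
}
```

A client would pass `SPARQL_CONFIG.timeout` as `timeoutMs`; an aborted request rejects, so the existing `catch` branch in a query hook would surface it as an error.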

---

## Performance Considerations

### Dataset Size

| Dataset | Triples | Query Time (estimated) | Notes |
|---------|---------|------------|-------|
| Denmark (Current) | 43,429 | <100ms | Fast for prototyping |
| Netherlands (Estimated) | ~500,000 | <500ms | Should be manageable |
| Global (Future) | 5-10M | 1-5s | May need optimization |

### Optimization Strategies

1. **Indexing**: Oxigraph automatically indexes triples (no manual configuration)
2. **Query Limits**: Always use `LIMIT` clauses during development
3. **Caching**: Cache frequent query results in frontend (localStorage)
4. **Pagination**: Implement OFFSET/LIMIT for large result sets
5. **Prefixes**: Use PREFIX declarations to keep queries short and readable (this helps maintainability more than execution time)
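
Pagination (strategy 4) can be layered on top of any SELECT query by appending `LIMIT` and `OFFSET`. A sketch, assuming a hypothetical `paginate` helper (not existing project code) and a query that does not already end in its own `LIMIT`/`OFFSET`:

```typescript
/**
 * Append LIMIT/OFFSET for a zero-based page index.
 * Pages are only stable if the query itself contains an ORDER BY.
 */
export function paginate(query: string, page: number, pageSize: number): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  // Strip trailing whitespace so the modifiers start on their own lines
  const base = query.replace(/\s+$/, '');
  return `${base}\nLIMIT ${pageSize}\nOFFSET ${page * pageSize}`;
}
```

For example, `paginate(query, 2, 50)` requests rows 100 to 149 of the ordered result.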

---

## Deployment

### Development

```bash
# Start Oxigraph locally (older releases name the binary `oxigraph_server`)
oxigraph serve --location ./data/oxigraph --bind 127.0.0.1:7878

# Load test data into the default graph
curl -f -X POST \
  -H 'Content-Type: text/turtle' \
  --data-binary @data/rdf/denmark_complete.ttl \
  'http://localhost:7878/store?default'
```

### Docker (Production)

```dockerfile
# Dockerfile.oxigraph
FROM oxigraph/oxigraph:latest

# Copy RDF data to container
COPY data/rdf /data/rdf

# Expose SPARQL endpoint
EXPOSE 7878

# Start server with persistent storage
# (the image entrypoint is the Oxigraph binary, so the subcommand comes first)
CMD ["serve", "--location", "/data/oxigraph", "--bind", "0.0.0.0:7878"]
```

```yaml
# docker-compose.yml
version: '3.8'

services:
  oxigraph:
    image: oxigraph/oxigraph:latest
    ports:
      - "7878:7878"
    volumes:
      - ./data/oxigraph:/data/oxigraph
      - ./data/rdf:/data/rdf:ro
    command: serve --location /data/oxigraph --bind 0.0.0.0:7878 --cors

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
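    environment:
      # The browser, not the container, calls this URL, so use the published host port
      # (the service name `oxigraph` only resolves inside the Docker network)
      - VITE_SPARQL_ENDPOINT=http://localhost:7878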
    depends_on:
      - oxigraph
```

---

## Alternatives Considered

### Virtuoso

- **Pros**: Very mature, enterprise-grade, excellent performance
- **Cons**: Complex setup, heavyweight (2GB+ memory), overkill for prototype

### Blazegraph

- **Pros**: Full SPARQL 1.1, good documentation
- **Cons**: Java dependency, discontinued (last release 2019)

### Apache Jena Fuseki

- **Pros**: Mature, full-featured, active development
- **Cons**: Java dependency, more complex than Oxigraph

### GraphDB

- **Pros**: Commercial support, advanced reasoning, SHACL validation
- **Cons**: Proprietary (free edition has limits), complex setup

**Decision**: Oxigraph wins for simplicity, modern stack (Rust), and cultural heritage sector adoption.

---

## Next Steps

### Immediate (Task 6 - Query Builder)

1. ✅ Document Oxigraph decision (this file)
2. ⏳ Implement visual query builder UI (4-5 hours)
3. ⏳ Create SPARQL syntax validator
4. ⏳ Build query templates library

### Next Session (Task 7 - SPARQL Execution)

1. ⏳ Install Oxigraph server (local + Docker)
2. ⏳ Load Denmark dataset (43,429 triples)
3. ⏳ Create SPARQL client module
4. ⏳ Implement query execution hook
5. ⏳ Build results viewer component
6. ⏳ Add export functionality (CSV, JSON, RDF)
7. ⏳ Write integration tests

---

## References

### Oxigraph Documentation

- **GitHub**: https://github.com/oxigraph/oxigraph
- **Architecture**: https://github.com/oxigraph/oxigraph/wiki/Architecture
- **HTTP API**: https://github.com/oxigraph/oxigraph/wiki/HTTP-API

### Project Documentation

- **Planning**: `ontology/2025-09-09T08-31-07-*-Linked_Data_Cultural_Heritage_Project.json`
- **RDF Datasets**: `data/rdf/README.md`
- **Schema**: `schemas/20251121/rdf/` (8 RDF formats)

### SPARQL Resources

- **W3C SPARQL 1.1**: https://www.w3.org/TR/sparql11-query/
- **SPARQL by Example**: https://www.w3.org/2009/Talks/0615-qbe/
- **RDF/SPARQL Tutorial**: https://www.linkeddatatools.com/querying-semantic-data

---

**Status**: Planning Complete ✅
**Next**: Implement Query Builder (Phase 3 - Task 6)
**Estimated Timeline**: 10-13 hours total (Query Builder 4-5h + SPARQL Execution 6-8h)

**Last Updated**: 2025-11-22
**Author**: OpenCode AI Agent