glam/docs/MULTI_DATABASE_ARCHITECTURE.md
2025-12-06 19:50:04 +01:00

Multi-Database Architecture for bronhouder.nl

This document describes the multi-database architecture implemented for the GLAM data platform at bronhouder.nl.

Overview

The Database page (/database) provides a unified interface for exploring heritage custodian data across four different database systems, each optimized for different use cases:

┌─────────────────────────────────────────────────────────────────┐
│                     bronhouder.nl/database                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
│  │ 🦆 DuckDB│  │🐘Postgres│  │ 🧠 TypeDB│  │🔗Oxigraph│        │
│  │ (Browser)│  │ (Server) │  │ (Server) │  │ (Server) │        │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘        │
│       │             │             │             │               │
│  In-Browser     REST API      REST API      SPARQL             │
│  WASM           Proxy         Proxy         Endpoint           │
└───────┼─────────────┼─────────────┼─────────────┼───────────────┘
        │             │             │             │
        ▼             ▼             ▼             ▼
   ┌─────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
   │ Browser │  │PostgreSQL│  │  TypeDB  │  │ Oxigraph │
   │ Memory  │  │ Database │  │ Database │  │  Store   │
   └─────────┘  └──────────┘  └──────────┘  └──────────┘

Database Systems

1. DuckDB (In-Browser OLAP)

Status: Fully Operational

Technology: DuckDB-WASM running entirely in the browser

Use Cases:

  • Ad-hoc SQL analytics on heritage institution data
  • Fast aggregations and filtering
  • Data exploration without server round-trips

Data Source: /data/nde_institutions.json (10.8 MB, 1,863 institutions)

Features:

  • Upload JSON/CSV/Parquet files directly
  • Run SQL queries in-browser
  • No server dependency
  • Export query results

Hook: frontend/src/hooks/useDuckDB.ts

Panel: frontend/src/components/database/DuckDBPanel.tsx

2. PostgreSQL (Relational)

Status: Fully Operational (as of 2025-12-06)

Technology: PostgreSQL 16.11 with FastAPI REST proxy

Endpoint: https://bronhouder.nl/api/postgres

Use Cases:

  • Complex relational queries
  • Full-text search on institution names
  • Transactional operations
  • Integration with existing tools

Data: 1,838 NDE heritage institutions with:

  • 32 columns including coordinates, ratings, reviews
  • GHCID identifiers (text, UUID, numeric)
  • JSONB fields for wikidata_types, reviews, identifiers, genealogiewerkbalk

API Endpoints:

  • GET / - Health check and statistics
  • POST /query - Execute SQL query (read-only SELECT/WITH)
  • GET /tables - List all tables with metadata
  • GET /schema/{table} - Get table schema
  • GET /stats - Detailed database statistics
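
The read-only restriction on POST /query can be illustrated with a small guard. This is a minimal sketch, not the code in main.py: the helper name is_read_only and the keyword list are assumptions made for illustration. The check is deliberately conservative (it also rejects harmless queries that mention a forbidden keyword inside a string literal), and because PostgreSQL allows data-modifying statements inside WITH, a prefix check alone would not be enough. A database role without write privileges remains the more robust safeguard.

```python
import re

# Hypothetical helper for illustration; the real validation in main.py may differ.
_ALLOWED_START = re.compile(r"^\s*(select|with)\b", re.IGNORECASE)
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True if sql looks like a single read-only SELECT/WITH statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        # More than one statement (or a semicolon in a literal): reject.
        return False
    if not _ALLOWED_START.match(statement):
        return False
    # Reject write keywords anywhere, including inside a WITH clause.
    return _FORBIDDEN.search(statement) is None
```

A query such as "SELECT * FROM institutions LIMIT 10" passes, while "WITH d AS (DELETE FROM institutions RETURNING *) SELECT * FROM d" is rejected even though it starts with WITH.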

Backend: /opt/glam-backend/postgres/ on server

  • main.py - FastAPI application
  • load_nde_data.py - Data loading script
  • Systemd service: glam-postgres-api.service

Hook: frontend/src/hooks/usePostgreSQL.ts

Panel: frontend/src/components/database/PostgreSQLPanel.tsx

3. Oxigraph (RDF/SPARQL)

Status: Fully Operational

Technology: Oxigraph SPARQL endpoint on server

Use Cases:

  • Linked Data queries
  • Ontology exploration
  • Cross-referencing with Wikidata, Schema.org
  • Semantic reasoning

Endpoint: https://bronhouder.nl/sparql (proxied to 91.98.224.44:7878)

Triple Count: 426,243 triples

Features:

  • SPARQL 1.1 query interface
  • Graph exploration
  • Namespace prefix management
  • RDF upload (Turtle, N-Triples, JSON-LD)
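
As a rough illustration of querying the endpoint from a script, the sketch below builds a SPARQL 1.1 Protocol request ("query via POST directly") with the Python standard library. The endpoint URL is the one given in this document; the function names and the counting query are assumptions for illustration, and whether the proxy accepts this exact request form has not been verified here.

```python
import json
import urllib.request

SPARQL_ENDPOINT = "https://bronhouder.nl/sparql"  # endpoint named in this document

# Illustrative query: counts all triples in the default graph.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_sparql_request(
    query: str, endpoint: str = SPARQL_ENDPOINT
) -> urllib.request.Request:
    """Build a POST request that carries the query as the raw request body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_select(
    query: str, endpoint: str = SPARQL_ENDPOINT, timeout: float = 30.0
) -> list[dict]:
    """Send a SELECT query and return the result bindings (needs network access)."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # SPARQL JSON results place the rows under results.bindings.
    return payload["results"]["bindings"]
```

Calling run_select(COUNT_TRIPLES) against a reachable endpoint should return a single binding whose ?triples value can be compared with the triple count listed above.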

Hook: frontend/src/hooks/useOxigraph.ts

Panel: frontend/src/components/database/OxigraphPanel.tsx

4. TypeDB (Knowledge Graph)

Status: Deferred - Server has only 3.7GB RAM, TypeDB requires 4GB+

Technology: TypeDB with REST API proxy

Use Cases:

  • Complex knowledge graph queries
  • Multi-hop relationship traversal
  • Temporal reasoning (organizational changes)
  • Entity resolution

Note: To enable TypeDB, upgrade server to cx32 (8GB RAM) or higher.
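
The memory constraint can be checked before attempting an installation. The sketch below parses the MemTotal line of /proc/meminfo on Linux and compares it with the 4 GB minimum stated above; the function names and the sample values in the usage note are assumptions for illustration, not measurements from the server.

```python
TYPEDB_MIN_KIB = 4 * 1024 * 1024  # 4 GiB, the minimum stated in this document


def mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (reported in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:        3879731 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def can_host_typedb(meminfo_text: str, minimum_kib: int = TYPEDB_MIN_KIB) -> bool:
    """Return True if total memory meets the assumed TypeDB minimum."""
    return mem_total_kib(meminfo_text) >= minimum_kib


if __name__ == "__main__":
    with open("/proc/meminfo", encoding="utf-8") as handle:
        print("TypeDB feasible:", can_host_typedb(handle.read()))
```

With an illustrative input of "MemTotal: 3879731 kB" (about 3.7 GiB) the check fails, which matches the deferred status described above.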

Hook: frontend/src/hooks/useTypeDB.ts

Panel: frontend/src/components/database/TypeDBPanel.tsx

Frontend Components

Database Page (/database)

The main Database page provides:

  1. Tab Navigation: Switch between database views
  2. All Databases Overview: Comparison grid with status indicators
  3. Individual Database Panels: Full-featured interface for each system

Component Structure

frontend/src/
├── pages/
│   ├── Database.tsx        # Main page with tab navigation
│   └── Database.css        # Styles for all database components
├── hooks/
│   ├── useDuckDB.ts        # DuckDB-WASM hook
│   ├── useOxigraph.ts      # Oxigraph SPARQL hook
│   ├── usePostgreSQL.ts    # PostgreSQL REST hook
│   └── useTypeDB.ts        # TypeDB REST hook
└── components/database/
    ├── DuckDBPanel.tsx     # DuckDB interface
    ├── OxigraphPanel.tsx   # Oxigraph interface
    ├── PostgreSQLPanel.tsx # PostgreSQL interface
    ├── TypeDBPanel.tsx     # TypeDB interface
    └── index.ts            # Exports

Data Flow

NDE Institution Data

YAML Files (data/nde/enriched/entries/)
    │
    ├── scripts/export_nde_for_duckdb.py
    │   └── frontend/public/data/nde_institutions.json (DuckDB)
    │
    ├── scripts/nde_to_hc_rdf.py
    │   └── data/nde/rdf/*.ttl → Oxigraph
    │
    └── [Future] scripts/nde_to_typedb.py
        └── TypeDB

LinkML Schema

schemas/20251121/linkml/
├── 01_custodian_name_modular.yaml    # Main ontology schema
├── nde_enriched_entry.yaml           # NDE entry schema
└── modules/                          # Modular components

API Contracts

PostgreSQL REST API

// Endpoints (paths shown with the /api/postgres prefix)
POST /api/postgres/query
{
  "sql": "SELECT * FROM institutions LIMIT 10"
}
// Response: { rows: [...], columns: [...] }

GET /api/postgres/tables
// Response: { tables: [...] }

GET /api/postgres/schema/:table
// Response: { columns: [...] }

TypeDB REST API (Required)

// Expected endpoints
POST /api/typedb/query
{
  "query": "match $x isa institution; get $x; limit 10;"
}
// Response: { results: [...] }

GET /api/typedb/schema
// Response: { entity_types: [...], relation_types: [...] }

GET /api/typedb/entity-types
// Response: { types: [...] }

Environment Variables

# Frontend (.env)
VITE_SPARQL_ENDPOINT=https://bronhouder.nl/sparql
VITE_POSTGRES_API_URL=https://bronhouder.nl/api/postgres
VITE_TYPEDB_API_URL=https://bronhouder.nl/api/typedb

Server Infrastructure

Server: 91.98.224.44 (Hetzner cx22)
├── Caddy (reverse proxy)
│   ├── / → /var/www/glam-frontend/
│   ├── /sparql → localhost:7878 (Oxigraph)
│   └── /api → localhost:8000 (FastAPI)
├── Oxigraph (port 7878)
├── GLAM API (port 8000)
├── PostgreSQL API (glam-postgres-api.service)
└── [Future] TypeDB

Deployment

# Deploy frontend only
./infrastructure/deploy.sh --frontend

# Check status
./infrastructure/deploy.sh --status

# Deploy everything
./infrastructure/deploy.sh --all

Next Steps

  1. TypeDB Backend: Create FastAPI endpoints for TypeDB queries once the server has been upgraded to 8GB RAM or more
  2. Data Sync: Implement a data loading script for TypeDB (PostgreSQL is already loaded by load_nde_data.py)
  3. Query Builder: Add visual query builder for non-technical users
  4. Export: Enable data export in multiple formats