glam/docs/MULTI_DATABASE_ARCHITECTURE.md
2025-12-06 19:50:04 +01:00


# Multi-Database Architecture for bronhouder.nl
This document describes the multi-database architecture implemented for the GLAM data platform at bronhouder.nl.
## Overview
The Database page (`/database`) provides a unified interface for exploring heritage custodian data across four different database systems, each optimized for different use cases:
```
┌─────────────────────────────────────────────────────────────────┐
│                     bronhouder.nl/database                      │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ 🦆 DuckDB│  │🐘Postgres│  │ 🧠 TypeDB│  │🔗Oxigraph│         │
│  │ (Browser)│  │ (Server) │  │ (Server) │  │ (Server) │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
│       │             │             │             │               │
│   In-Browser    REST API      REST API       SPARQL             │
│     WASM          Proxy         Proxy       Endpoint            │
└───────┼─────────────┼─────────────┼─────────────┼───────────────┘
        │             │             │             │
        ▼             ▼             ▼             ▼
   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
   │ Browser  │  │PostgreSQL│  │  TypeDB  │  │ Oxigraph │
   │  Memory  │  │ Database │  │ Database │  │  Store   │
   └──────────┘  └──────────┘  └──────────┘  └──────────┘
```
## Database Systems
### 1. DuckDB (In-Browser OLAP)
**Status**: ✅ Fully Operational
**Technology**: DuckDB-WASM running entirely in the browser
**Use Cases**:
- Ad-hoc SQL analytics on heritage institution data
- Fast aggregations and filtering
- Data exploration without server round-trips
**Data Source**: `/data/nde_institutions.json` (10.8 MB, 1,863 institutions)
**Features**:
- Upload JSON/CSV/Parquet files directly
- Run SQL queries in-browser
- No server dependency
- Export query results
**Hook**: `frontend/src/hooks/useDuckDB.ts`
**Panel**: `frontend/src/components/database/DuckDBPanel.tsx`
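DuckDB can read JSON, CSV, and Parquet directly through table functions, so a single statement can turn an uploaded file into a queryable table. The helper below is a hypothetical sketch: `createTableFromFileSql` and `readerFor` are illustrative names and do not come from `useDuckDB.ts`. It assumes the file has already been registered with the DuckDB-WASM virtual file system under the given name, and it only builds the SQL string.

```typescript
// Hypothetical helper (not taken from useDuckDB.ts). It builds the SQL that turns
// a file already registered with DuckDB-WASM into a table.

// Choose the DuckDB table function from the file extension.
function readerFor(fileName: string): string {
  const ext = fileName.split(".").pop()?.toLowerCase() ?? "";
  switch (ext) {
    case "json":
      return "read_json_auto";
    case "csv":
      return "read_csv_auto";
    case "parquet":
      return "read_parquet";
    default:
      throw new Error(`Unsupported file type: ${fileName}`);
  }
}

// Double-quote an SQL identifier, escaping embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Single-quote an SQL string literal, escaping embedded single quotes.
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

export function createTableFromFileSql(table: string, fileName: string): string {
  const reader = readerFor(fileName);
  return `CREATE OR REPLACE TABLE ${quoteIdent(table)} AS SELECT * FROM ${reader}(${quoteLiteral(fileName)})`;
}
```

For the bundled dataset, `createTableFromFileSql("institutions", "nde_institutions.json")` yields `CREATE OR REPLACE TABLE "institutions" AS SELECT * FROM read_json_auto('nde_institutions.json')`. The table name `institutions` is illustrative here.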
### 2. PostgreSQL (Relational)
**Status**: ✅ Fully Operational (as of 2025-12-06)
**Technology**: PostgreSQL 16.11 with FastAPI REST proxy
**Endpoint**: `https://bronhouder.nl/api/postgres`
**Use Cases**:
- Complex relational queries
- Full-text search on institution names
- Transactional operations
- Integration with existing tools
**Data**: 1,838 NDE heritage institutions with:
- 32 columns including coordinates, ratings, reviews
- GHCID identifiers (text, UUID, numeric)
- JSONB fields for wikidata_types, reviews, identifiers, genealogiewerkbalk
**API Endpoints**:
- `GET /` - Health check and statistics
- `POST /query` - Execute SQL query (read-only SELECT/WITH)
- `GET /tables` - List all tables with metadata
- `GET /schema/{table}` - Get table schema
- `GET /stats` - Detailed database statistics
**Backend**: `/opt/glam-backend/postgres/` on server
- `main.py` - FastAPI application
- `load_nde_data.py` - Data loading script
- Systemd service: `glam-postgres-api.service`
**Hook**: `frontend/src/hooks/usePostgreSQL.ts`
**Panel**: `frontend/src/components/database/PostgreSQLPanel.tsx`
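The `POST /query` endpoint accepts only read-only `SELECT`/`WITH` statements. A frontend could pre-check queries to give faster feedback. The sketch below is a hypothetical, deliberately conservative heuristic; `looksReadOnly` is not part of `usePostgreSQL.ts`. It is not a security boundary. PostgreSQL permits data-modifying statements inside `WITH`, and `SELECT ... INTO` creates a table, so whatever check the FastAPI service performs server-side has to remain authoritative.

```typescript
// Hypothetical client-side pre-check mirroring the API's "read-only SELECT/WITH"
// rule. It only improves error messages; the server must still enforce the rule.

// Remove /* block */ and -- line comments. String literals are not parsed, so a
// comment marker inside a string is stripped too (acceptable for a pre-check).
function stripSqlComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

// Constructs that can write despite starting with SELECT or WITH:
// data-modifying CTEs (INSERT/UPDATE/DELETE/MERGE) and SELECT ... INTO.
const WRITE_KEYWORDS = /\b(INSERT|UPDATE|DELETE|MERGE|INTO)\b/i;

export function looksReadOnly(sql: string): boolean {
  // Drop comments, surrounding whitespace, and one trailing semicolon.
  const body = stripSqlComments(sql).trim().replace(/;\s*$/, "");
  // Reject empty input and anything that looks like multiple statements.
  if (body.length === 0 || body.includes(";")) return false;
  // The first word must be SELECT or WITH.
  const first = body.match(/^[a-z]+/i)?.[0].toUpperCase();
  if (first !== "SELECT" && first !== "WITH") return false;
  return !WRITE_KEYWORDS.test(body);
}
```

The keyword scan may reject harmless queries that merely mention these words, for example inside a string literal. For a UX pre-check, false rejections are preferable to false acceptances.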
### 3. Oxigraph (RDF/SPARQL)
**Status**: ✅ Fully Operational
**Technology**: Oxigraph SPARQL endpoint on server
**Use Cases**:
- Linked Data queries
- Ontology exploration
- Cross-referencing with Wikidata, Schema.org
- Semantic reasoning
**Endpoint**: `https://bronhouder.nl/sparql` (proxied to 91.98.224.44:7878)
**Triple Count**: 426,243 triples
**Features**:
- SPARQL 1.1 query interface
- Graph exploration
- Namespace prefix management
- RDF upload (Turtle, N-Triples, JSON-LD)
**Hook**: `frontend/src/hooks/useOxigraph.ts`
**Panel**: `frontend/src/components/database/OxigraphPanel.tsx`
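Under the SPARQL 1.1 Protocol, a query can be sent as the raw body of a POST request with `Content-Type: application/sparql-query`. SELECT results come back as JSON with `head.vars` and `results.bindings`. The sketch below shows the request shape and how bindings can be flattened into table rows. It assumes the `/sparql` proxy forwards such requests unchanged to Oxigraph. The function names are illustrative and are not taken from `useOxigraph.ts`.

```typescript
// Minimal sketch of a SPARQL 1.1 Protocol SELECT call, assuming the proxied
// endpoint accepts a direct POST with Content-Type: application/sparql-query.

interface SparqlTerm {
  type: string; // e.g. "uri", "literal", "bnode"
  value: string;
  datatype?: string;
  "xml:lang"?: string;
}

interface SparqlJson {
  head: { vars: string[] };
  results: { bindings: Array<Record<string, SparqlTerm>> };
}

interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build the request: the query text is the raw POST body.
export function buildSparqlRequest(endpoint: string, query: string): HttpRequest {
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      Accept: "application/sparql-results+json",
    },
    body: query,
  };
}

// Flatten SELECT results into plain rows keyed by variable name;
// variables left unbound in a solution become null.
export function flattenBindings(json: SparqlJson): Array<Record<string, string | null>> {
  return json.results.bindings.map((binding) => {
    const row: Record<string, string | null> = {};
    for (const v of json.head.vars) row[v] = binding[v]?.value ?? null;
    return row;
  });
}
```

In the browser, the request could be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, and the parsed JSON passed to `flattenBindings`. ASK and CONSTRUCT queries return different response shapes and would need separate handling.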
### 4. TypeDB (Knowledge Graph)
**Status**: ⏳ Deferred (the server has only 3.7 GB of RAM; TypeDB requires 4 GB or more)
**Technology**: TypeDB with REST API proxy
**Use Cases**:
- Complex knowledge graph queries
- Multi-hop relationship traversal
- Temporal reasoning (organizational changes)
- Entity resolution
**Note**: To enable TypeDB, upgrade the server to a Hetzner cx32 (8 GB RAM) or larger.
**Hook**: `frontend/src/hooks/useTypeDB.ts`
**Panel**: `frontend/src/components/database/TypeDBPanel.tsx`
## Frontend Components
### Database Page (`/database`)
The main Database page provides:
1. **Tab Navigation**: Switch between database views
2. **All Databases Overview**: Comparison grid with status indicators
3. **Individual Database Panels**: Full-featured interface for each system
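The tab bar and the overview grid both need the same per-database metadata. One way to keep them consistent is a single registry. The sketch below is hypothetical: the `DATABASES` array, its field names, and `overviewRows` are not taken from `Database.tsx`. Its statuses are copied from this document as of 2025-12-06.

```typescript
// Hypothetical registry shared by the tab navigation and the overview grid.
// Statuses mirror this document; ids and field names are illustrative.

type DatabaseId = "duckdb" | "postgres" | "typedb" | "oxigraph";
type Status = "operational" | "deferred";

interface DatabaseInfo {
  id: DatabaseId;
  label: string;
  runsIn: "browser" | "server";
  access: string; // how the frontend reaches the database
  status: Status;
}

export const DATABASES: readonly DatabaseInfo[] = [
  { id: "duckdb", label: "DuckDB", runsIn: "browser", access: "In-browser WASM", status: "operational" },
  { id: "postgres", label: "PostgreSQL", runsIn: "server", access: "REST API proxy", status: "operational" },
  { id: "typedb", label: "TypeDB", runsIn: "server", access: "REST API proxy", status: "deferred" },
  { id: "oxigraph", label: "Oxigraph", runsIn: "server", access: "SPARQL endpoint", status: "operational" },
];

// Map a status to the indicator used in this document.
export function statusIndicator(status: Status): string {
  return status === "operational" ? "✅" : "⏳";
}

// One summary line per database, in tab order.
export function overviewRows(dbs: readonly DatabaseInfo[] = DATABASES): string[] {
  return dbs.map((db) => `${statusIndicator(db.status)} ${db.label} (${db.access})`);
}
```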
### Component Structure
```
frontend/src/
├── pages/
│   ├── Database.tsx          # Main page with tab navigation
│   └── Database.css          # Styles for all database components
├── hooks/
│   ├── useDuckDB.ts          # DuckDB-WASM hook
│   ├── useOxigraph.ts        # Oxigraph SPARQL hook
│   ├── usePostgreSQL.ts      # PostgreSQL REST hook
│   └── useTypeDB.ts          # TypeDB REST hook
└── components/database/
    ├── DuckDBPanel.tsx       # DuckDB interface
    ├── OxigraphPanel.tsx     # Oxigraph interface
    ├── PostgreSQLPanel.tsx   # PostgreSQL interface
    ├── TypeDBPanel.tsx       # TypeDB interface
    └── index.ts              # Exports
```
## Data Flow
### NDE Institution Data
```
YAML Files (data/nde/enriched/entries/)
├── scripts/export_nde_for_duckdb.py
│   └── frontend/public/data/nde_institutions.json (DuckDB)
├── scripts/nde_to_hc_rdf.py
│   └── data/nde/rdf/*.ttl → Oxigraph
└── [Future] scripts/nde_to_typedb.py
    └── TypeDB
```
### LinkML Schema
```
schemas/20251121/linkml/
├── 01_custodian_name_modular.yaml # Main ontology schema
├── nde_enriched_entry.yaml # NDE entry schema
└── modules/ # Modular components
```
## API Contracts
### PostgreSQL REST API (Required)
```typescript
// Expected endpoints
POST /api/postgres/query
{
"sql": "SELECT * FROM institutions LIMIT 10"
}
// Response: { rows: [...], columns: [...] }
GET /api/postgres/tables
// Response: { tables: [...] }
GET /api/postgres/schema/:table
// Response: { columns: [...] }
```
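A minimal typed client for the PostgreSQL contract above might look like the following. The response envelopes (`rows`/`columns`, `tables`, `columns`) come from the contract. Several parts are assumptions, made so the sketch can be tested without a network: the class name, the element types (`unknown[]`), and the injected `fetchFn` parameter.

```typescript
// Sketch of a typed client for the PostgreSQL REST contract. The injected
// fetch-like function is an assumption that keeps the client testable.

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export interface QueryResult {
  rows: unknown[];
  columns: unknown[];
}

export class PostgresApiClient {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, fetchFn: FetchLike) {
    // e.g. the value of VITE_POSTGRES_API_URL; trailing slashes are dropped.
    this.baseUrl = baseUrl.replace(/\/+$/, "");
    this.fetchFn = fetchFn;
  }

  // GET {base}{path} and return the parsed JSON body, failing on non-2xx.
  private async getJson(path: string): Promise<unknown> {
    const res = await this.fetchFn(`${this.baseUrl}${path}`);
    if (!res.ok) throw new Error(`PostgreSQL API error: HTTP ${res.status} on ${path}`);
    return res.json();
  }

  // POST {base}/query with JSON body { sql }; expects { rows, columns }.
  async query(sql: string): Promise<QueryResult> {
    const res = await this.fetchFn(`${this.baseUrl}/query`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ sql }),
    });
    if (!res.ok) throw new Error(`PostgreSQL API error: HTTP ${res.status} on /query`);
    return (await res.json()) as QueryResult;
  }

  // GET {base}/tables; expects { tables }.
  async tables(): Promise<unknown[]> {
    const data = (await this.getJson("/tables")) as { tables: unknown[] };
    return data.tables;
  }

  // GET {base}/schema/:table with the table name URL-encoded; expects { columns }.
  async schema(table: string): Promise<unknown[]> {
    const data = (await this.getJson(`/schema/${encodeURIComponent(table)}`)) as { columns: unknown[] };
    return data.columns;
  }
}
```

In the frontend, pass a wrapper such as `(url, init) => fetch(url, init)` rather than `fetch` itself. The client invokes `this.fetchFn(...)`, and calling the browser's `fetch` with a non-window `this` throws an "Illegal invocation" error.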
### TypeDB REST API (Required)
```typescript
// Expected endpoints
POST /api/typedb/query
{
"query": "match $x isa institution; get $x; limit 10;"
}
// Response: { results: [...] }
GET /api/typedb/schema
// Response: { entity_types: [...], relation_types: [...] }
GET /api/typedb/entity-types
// Response: { types: [...] }
```
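Because the TypeDB backend is deferred, a TypeDB panel has to cope with the proxy being absent. The sketch below is hypothetical; `queryTypeDB` and the `TypeDBResult` union do not come from `useTypeDB.ts`. It posts `{ query }` as in the contract above. Instead of throwing, it returns a discriminated result, so the UI can show an "unavailable" state for HTTP errors and network failures alike.

```typescript
// Sketch: call the TypeDB REST contract but degrade gracefully, since the
// TypeDB backend is currently deferred and the proxy may not be deployed.

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export type TypeDBResult =
  | { available: true; results: unknown[] }
  | { available: false; reason: string };

export async function queryTypeDB(baseUrl: string, typeql: string, fetchFn: FetchLike): Promise<TypeDBResult> {
  const url = `${baseUrl.replace(/\/+$/, "")}/query`;
  try {
    const res = await fetchFn(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: typeql }),
    });
    // A missing or failing backend (e.g. 502 from the reverse proxy) is reported, not thrown.
    if (!res.ok) return { available: false, reason: `HTTP ${res.status}` };
    const data = (await res.json()) as { results?: unknown[] };
    return { available: true, results: data.results ?? [] };
  } catch (err) {
    // Network-level failures (refused connection, DNS, CORS) end up here.
    return { available: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```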
## Environment Variables
```bash
# Frontend (.env)
VITE_SPARQL_ENDPOINT=https://bronhouder.nl/sparql
VITE_POSTGRES_API_URL=https://bronhouder.nl/api/postgres
VITE_TYPEDB_API_URL=https://bronhouder.nl/api/typedb
```
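Vite exposes only variables prefixed with `VITE_` to client code, through `import.meta.env`, and substitutes them at build time. A small resolver can supply fallbacks when a variable is unset. The sketch below takes the env object as a parameter so it can be tested outside Vite. Its same-origin defaults behind the Caddy proxy (`/sparql`, `/api/postgres`, `/api/typedb`) are an assumption, not documented behavior.

```typescript
// Hypothetical helper: resolve backend URLs from Vite env vars, falling back to
// assumed same-origin paths behind the reverse proxy.

interface Endpoints {
  sparql: string;
  postgres: string;
  typedb: string;
}

type Env = Record<string, string | undefined>;

const DEFAULTS: Endpoints = {
  sparql: "/sparql",
  postgres: "/api/postgres",
  typedb: "/api/typedb",
};

// Treat unset or blank values as missing; strip trailing slashes so callers
// can append paths such as "/query".
function pick(value: string | undefined, fallback: string): string {
  const v = value?.trim();
  return (v ? v : fallback).replace(/\/+$/, "");
}

export function resolveEndpoints(env: Env): Endpoints {
  return {
    sparql: pick(env.VITE_SPARQL_ENDPOINT, DEFAULTS.sparql),
    postgres: pick(env.VITE_POSTGRES_API_URL, DEFAULTS.postgres),
    typedb: pick(env.VITE_TYPEDB_API_URL, DEFAULTS.typedb),
  };
}
```

In the app this would be called as `resolveEndpoints(import.meta.env)`.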
## Server Infrastructure
```
Server: 91.98.224.44 (Hetzner cx22)
├── Caddy (reverse proxy)
│   ├── /       → /var/www/glam-frontend/
│   ├── /sparql → localhost:7878 (Oxigraph)
│   └── /api    → localhost:8000 (FastAPI)
├── Oxigraph (port 7878)
├── GLAM API (port 8000)
├── PostgreSQL 16.11 + FastAPI proxy (glam-postgres-api.service)
└── [Future] TypeDB
```
## Deployment
```bash
# Deploy frontend only
./infrastructure/deploy.sh --frontend
# Check status
./infrastructure/deploy.sh --status
# Deploy everything
./infrastructure/deploy.sh --all
```
## Next Steps
1. **PostgreSQL Backend**: ✅ Completed 2025-12-06 (FastAPI proxy at `/api/postgres`)
2. **TypeDB Backend**: Create FastAPI endpoints for TypeDB queries
3. **Data Sync**: Implement data loading scripts for TypeDB (PostgreSQL is already loaded by `load_nde_data.py`)
4. **Query Builder**: Add visual query builder for non-technical users
5. **Export**: Enable data export in multiple formats
## Related Documentation
- [AGENTS.md](../AGENTS.md) - AI agent instructions
- [DEPLOYMENT_GUIDE.md](./DEPLOYMENT_GUIDE.md) - Deployment procedures
- [SCHEMA_MODULES.md](./SCHEMA_MODULES.md) - LinkML schema architecture