# GLAM Infrastructure as Code
This directory contains Terraform configuration for deploying the GLAM Heritage Custodian Ontology infrastructure to Hetzner Cloud.
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        Hetzner Cloud                            │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │           CX21 Server (2 vCPU, 4GB RAM)                 │   │
│  │                                                         │   │
│  │  ┌──────────────────┐    ┌──────────────────────────┐   │   │
│  │  │      Caddy       │    │        Oxigraph          │   │   │
│  │  │ (Reverse Proxy)  │──▶│  (SPARQL Triplestore)    │   │   │
│  │  │   :443, :80      │    │        :7878             │   │   │
│  │  └──────────────────┘    └──────────────────────────┘   │   │
│  │           │                         │                   │   │
│  │           │              ┌──────────┴──────────────┐    │   │
│  │           │              │  /mnt/data/oxigraph     │    │   │
│  │           │              │  (Persistent Storage)   │    │   │
│  │           │              └─────────────────────────┘    │   │
│  │           │                                             │   │
│  │  ┌────────▼────────────────────────────────────────┐    │   │
│  │  │              Qdrant (Docker)                    │    │   │
│  │  │          Vector Database for RAG                │    │   │
│  │  │       :6333 (REST), :6334 (gRPC)                │    │   │
│  │  └─────────────────────────────────────────────────┘    │   │
│  │           │                                             │   │
│  │  ┌────────▼────────────────────────────────────────┐    │   │
│  │  │          Frontend (Static Files)                │    │   │
│  │  │          /var/www/glam-frontend                 │    │   │
│  │  └─────────────────────────────────────────────────┘    │   │
│  │                                                         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Hetzner Volume (50GB SSD)                  │   │
│  │                    /mnt/data                            │   │
│  │   - Oxigraph database files                             │   │
│  │   - Qdrant storage & snapshots                          │   │
│  │   - Ontology files (.ttl, .rdf, .owl)                   │   │
│  │   - LinkML schemas (.yaml)                              │   │
│  │   - UML diagrams (.mmd)                                 │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Components
- Oxigraph: High-performance SPARQL triplestore for RDF data
- Qdrant: Vector database for semantic search and RAG (Retrieval-Augmented Generation)
- Caddy: Modern web server with automatic HTTPS
- Hetzner Cloud Server: CX21 (2 vCPU, 4GB RAM, 40GB SSD)
- Hetzner Volume: 50GB persistent storage for data
## Prerequisites
- Terraform (>= 1.0)
- Hetzner Cloud Account
- Hetzner API Token
## Setup

### 1. Create Hetzner API Token
- Log in to Hetzner Cloud Console
- Select your project (or create a new one)
- Go to Security > API Tokens
- Generate a new token with Read & Write permissions
- Copy the token (it's only shown once!)
### 2. Configure Terraform

```bash
cd infrastructure/terraform

# Copy the example variables file
cp terraform.tfvars.example terraform.tfvars

# Edit with your values
nano terraform.tfvars
```

Set the following variables:

- `hcloud_token`: Your Hetzner API token
- `domain`: Your domain name (e.g., `sparql.glam-ontology.org`)
- `ssh_public_key`: Path to your SSH public key
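A filled-in `terraform.tfvars` might look like the following (illustrative values only; `variables.tf` is the authoritative source for variable names and types):

```hcl
# Illustrative values -- replace with your own.
hcloud_token   = "YOUR_HETZNER_API_TOKEN"
domain         = "sparql.glam-ontology.org"
ssh_public_key = "~/.ssh/id_ed25519.pub"
```

Keep `terraform.tfvars` out of version control, since it contains your API token.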
### 3. Initialize and Apply

```bash
# Initialize Terraform
terraform init

# Preview changes
terraform plan

# Apply configuration
terraform apply
```
### 4. Deploy Data
After the infrastructure is created, deploy your ontology data:
```bash
# SSH into the server
ssh root@$(terraform output -raw server_ip)

# Load ontology files into Oxigraph
cd /var/lib/glam/scripts
./load-ontologies.sh
```
## Configuration Files

| File | Description |
|---|---|
| `main.tf` | Main infrastructure resources |
| `variables.tf` | Input variables |
| `outputs.tf` | Output values |
| `cloud-init.yaml` | Server initialization script |
| `terraform.tfvars.example` | Example variable values |
## Costs
Estimated monthly costs (EUR):
| Resource | Cost |
|---|---|
| CX21 Server | ~€5.95/month |
| 50GB Volume | ~€2.50/month |
| IPv4 Address | Included |
| Total | ~€8.45/month |
## Maintenance

### Backup Data
```bash
# Backup Oxigraph database
ssh root@$SERVER_IP "tar -czf /tmp/oxigraph-backup.tar.gz /mnt/data/oxigraph"
scp root@$SERVER_IP:/tmp/oxigraph-backup.tar.gz ./backups/
```
### Update Ontologies
```bash
# Copy new ontology files
scp -r schemas/20251121/rdf/*.ttl root@$SERVER_IP:/mnt/data/ontologies/

# Reload into Oxigraph
ssh root@$SERVER_IP "/var/lib/glam/scripts/load-ontologies.sh"
```
### View Logs

```bash
ssh root@$SERVER_IP "journalctl -u oxigraph -f"
ssh root@$SERVER_IP "journalctl -u caddy -f"
```
## CI/CD with GitHub Actions
The project includes automatic deployment via GitHub Actions. Changes to infrastructure, data, or frontend trigger automatic deployments.
### Automatic Triggers
Deployments are triggered on push to the `main` branch when files change in:

- `infrastructure/**` → Infrastructure deployment (Terraform)
- `schemas/20251121/rdf/**` → Data sync
- `schemas/20251121/linkml/**` → Data sync
- `data/ontology/**` → Data sync
- `frontend/**` → Frontend build & deploy
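In a GitHub Actions workflow, triggers like these correspond to `paths` filters on the `push` event. A hypothetical sketch (the actual workflow files under `.github/workflows/` are authoritative):

```yaml
# Hypothetical trigger block; see .github/workflows/ for the real configuration.
on:
  push:
    branches: [main]
    paths:
      - "infrastructure/**"
      - "schemas/20251121/rdf/**"
      - "schemas/20251121/linkml/**"
      - "data/ontology/**"
      - "frontend/**"
```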
### Setup CI/CD
Run the setup script to generate SSH keys and get instructions:
```bash
./setup-cicd.sh
```
This will:

- Generate an ED25519 SSH key pair for deployments
- Create a `.env` file template (if missing)
- Print instructions for GitHub repository setup
### Required GitHub Secrets
Go to Repository → Settings → Secrets and variables → Actions → Secrets:
| Secret | Description |
|---|---|
| `HETZNER_HC_API_TOKEN` | Hetzner Cloud API token (same as in `.env`) |
| `DEPLOY_SSH_PRIVATE_KEY` | Content of `.ssh/glam_deploy_key` (generated by the setup script) |
| `TF_API_TOKEN` | (Optional) Terraform Cloud token for remote state |
### Required GitHub Variables
Go to Repository → Settings → Secrets and variables → Actions → Variables:
| Variable | Example |
|---|---|
| `GLAM_DOMAIN` | `sparql.glam-ontology.org` |
| `ADMIN_EMAIL` | `admin@example.org` |
### Manual Deployment
You can manually trigger deployments from the GitHub Actions tab with options:
- Deploy infrastructure changes - Run Terraform apply
- Deploy ontology/schema data - Sync data files to server
- Deploy frontend build - Build and deploy frontend
- Reload data into Oxigraph - Reimport all RDF data
### Local Deployment
For local deployments without GitHub Actions:
```bash
# Full deployment
./deploy.sh --all

# Just infrastructure
./deploy.sh --infra

# Just data
./deploy.sh --data

# Just frontend
./deploy.sh --frontend

# Just Qdrant vector database
./deploy.sh --qdrant

# Reload Oxigraph
./deploy.sh --reload
```
## Qdrant Vector Database
Qdrant is a self-hosted vector database used for semantic search over heritage institution data. It enables RAG (Retrieval-Augmented Generation) in the DSPy SPARQL generation module.
### Configuration

- REST API: `http://localhost:6333` (internal)
- gRPC API: `http://localhost:6334` (internal)
- External Access: `https://<domain>/qdrant/` (via Caddy reverse proxy)
- Storage: `/mnt/data/qdrant/storage/`
- Snapshots: `/mnt/data/qdrant/snapshots/`
- Resource Limits: 2GB RAM, 2 CPUs
### Deployment

```bash
# Deploy/restart Qdrant
./deploy.sh --qdrant
```
### Health Check

```bash
# Check Qdrant health
ssh root@$SERVER_IP "curl http://localhost:6333/health"

# List collections
ssh root@$SERVER_IP "curl http://localhost:6333/collections"
```
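The `/collections` endpoint returns JSON of the shape `{"result": {"collections": [...]}, "status": "ok"}`. A small helper to pull out the collection names from such a response (the sample payload below is illustrative, not real server output):

```python
import json

def collection_names(payload: str) -> list[str]:
    """Extract collection names from a Qdrant /collections JSON response."""
    data = json.loads(payload)
    return [c["name"] for c in data.get("result", {}).get("collections", [])]

# Illustrative payload in the shape Qdrant returns
sample = '{"result": {"collections": [{"name": "heritage_institutions"}]}, "status": "ok"}'
print(collection_names(sample))  # → ['heritage_institutions']
```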
### DSPy Integration
The Qdrant retriever is integrated with DSPy for RAG-enhanced SPARQL query generation:
```python
from glam_extractor.api.qdrant_retriever import HeritageCustodianRetriever

# Create retriever
retriever = HeritageCustodianRetriever(
    host="localhost",
    port=6333,
)

# Search for relevant institutions
results = retriever("museums in Amsterdam", k=5)

# Add institution to index
retriever.add_institution(
    name="Rijksmuseum",
    description="National museum of arts and history",
    institution_type="MUSEUM",
    country="NL",
    city="Amsterdam",
)
```
### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `QDRANT_HOST` | `localhost` | Qdrant server hostname |
| `QDRANT_PORT` | `6333` | Qdrant REST API port |
| `QDRANT_ENABLED` | `true` | Enable/disable Qdrant integration |
| `OPENAI_API_KEY` | - | Required for embedding generation |
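These variables can be read with the defaults above, for example (a sketch only; the actual client code in `glam_extractor` may structure this differently):

```python
import os

def qdrant_settings() -> dict:
    """Collect Qdrant connection settings from the environment, with defaults."""
    return {
        "host": os.environ.get("QDRANT_HOST", "localhost"),
        "port": int(os.environ.get("QDRANT_PORT", "6333")),
        "enabled": os.environ.get("QDRANT_ENABLED", "true").lower() == "true",
    }

settings = qdrant_settings()
print(settings)
```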
## Destroy Infrastructure

```bash
terraform destroy
```

**Warning:** This will delete all data. Make sure to back up first!