# GLAM Infrastructure as Code

This directory contains the Terraform configuration for deploying the GLAM Heritage Custodian Ontology infrastructure to Hetzner Cloud.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                          Hetzner Cloud                          │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              CX21 Server (2 vCPU, 4GB RAM)              │    │
│  │                                                         │    │
│  │  ┌──────────────────┐   ┌──────────────────────────┐    │    │
│  │  │      Caddy       │   │         Oxigraph         │    │    │
│  │  │ (Reverse Proxy)  │──▶│   (SPARQL Triplestore)   │    │    │
│  │  │     :443, :80    │   │          :7878           │    │    │
│  │  └────────┬─────────┘   └────────────┬─────────────┘    │    │
│  │           │                          │                  │    │
│  │           │               ┌──────────┴───────────┐      │    │
│  │           │               │  /mnt/data/oxigraph  │      │    │
│  │           │               │ (Persistent Storage) │      │    │
│  │           │               └──────────────────────┘      │    │
│  │           │                                             │    │
│  │           │    ┌───────────────────────────────────┐    │    │
│  │           │    │  Qdrant (Docker)                  │    │    │
│  │           ├───▶│  Vector Database for RAG          │    │    │
│  │           │    │  :6333 (REST), :6334 (gRPC)       │    │    │
│  │           │    └───────────────────────────────────┘    │    │
│  │           │                                             │    │
│  │           │    ┌───────────────────────────────────┐    │    │
│  │           └───▶│  Frontend (Static Files)          │    │    │
│  │                │  /var/www/glam-frontend           │    │    │
│  │                └───────────────────────────────────┘    │    │
│  │                                                         │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Hetzner Volume (50GB SSD)                              │    │
│  │  /mnt/data                                              │    │
│  │  - Oxigraph database files                              │    │
│  │  - Qdrant storage & snapshots                           │    │
│  │  - Ontology files (.ttl, .rdf, .owl)                    │    │
│  │  - LinkML schemas (.yaml)                               │    │
│  │  - UML diagrams (.mmd)                                  │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Components

- **Oxigraph**: High-performance SPARQL triplestore for RDF data
- **Qdrant**: Vector database for semantic search and RAG (Retrieval-Augmented Generation)
- **Caddy**: Modern web server with automatic HTTPS
- **Hetzner Cloud Server**: CX21 (2 vCPU, 4GB RAM, 40GB SSD)
- **Hetzner Volume**: 50GB persistent storage for data

## Prerequisites

1. [Terraform](https://www.terraform.io/downloads) (>= 1.0)
2. [Hetzner Cloud Account](https://console.hetzner.cloud/)
3. Hetzner API Token

## Setup

### 1. Create Hetzner API Token

1. Log in to the [Hetzner Cloud Console](https://console.hetzner.cloud/)
2. Select your project (or create a new one)
3. Go to **Security** > **API Tokens**
4. Generate a new token with **Read & Write** permissions
5. Copy the token (it is only shown once!)

### 2. Configure Terraform

```bash
cd infrastructure/terraform

# Copy the example variables file
cp terraform.tfvars.example terraform.tfvars

# Edit with your values
nano terraform.tfvars
```

Set the following variables:

- `hcloud_token`: Your Hetzner API token
- `domain`: Your domain name (e.g., `sparql.glam-ontology.org`)
- `ssh_public_key`: Path to your SSH public key

### 3. Initialize and Apply

```bash
# Initialize Terraform (downloads the providers)
terraform init

# Preview changes
terraform plan

# Apply configuration
terraform apply
```

### 4. Deploy Data

After the infrastructure is created, deploy your ontology data:

```bash
# SSH into the server
ssh root@$(terraform output -raw server_ip)

# On the server: load the ontology files into Oxigraph
cd /var/lib/glam/scripts
./load-ontologies.sh
```

## Configuration Files

| File | Description |
|------|-------------|
| `main.tf` | Main infrastructure resources |
| `variables.tf` | Input variables |
| `outputs.tf` | Output values |
| `cloud-init.yaml` | Server initialization script |
| `terraform.tfvars.example` | Example variable values |

## Costs

Estimated monthly costs (EUR):

| Resource | Cost |
|----------|------|
| CX21 Server | ~€5.95/month |
| 50GB Volume | ~€2.50/month |
| IPv4 Address | Included |
| **Total** | ~€8.45/month |

## Maintenance

The commands in this section assume `SERVER_IP` holds the server's IP address, as set at the top of the backup example.

### Backup Data

```bash
# Look up the server address (run in infrastructure/terraform)
export SERVER_IP=$(terraform output -raw server_ip)

# Back up the Oxigraph database
ssh root@$SERVER_IP "tar -czf /tmp/oxigraph-backup.tar.gz /mnt/data/oxigraph"
mkdir -p ./backups
scp root@$SERVER_IP:/tmp/oxigraph-backup.tar.gz ./backups/
```

### Update Ontologies

```bash
# Copy new ontology files (run from the repository root)
scp schemas/20251121/rdf/*.ttl root@$SERVER_IP:/mnt/data/ontologies/

# Reload into Oxigraph
ssh root@$SERVER_IP "/var/lib/glam/scripts/load-ontologies.sh"
```

### View Logs

```bash
# Follow Oxigraph logs (Ctrl+C to stop)
ssh root@$SERVER_IP "journalctl -u oxigraph -f"

# Follow Caddy logs (Ctrl+C to stop)
ssh root@$SERVER_IP "journalctl -u caddy -f"
```

## CI/CD with GitHub Actions

The project deploys automatically via GitHub Actions: changes to the infrastructure, data, or frontend trigger the corresponding deployment.

### Automatic Triggers

Deployments are triggered on pushes to the `main` branch when files change in:

- `infrastructure/**` → Infrastructure deployment (Terraform)
- `schemas/20251121/rdf/**` → Data sync
- `schemas/20251121/linkml/**` → Data sync
- `data/ontology/**` → Data sync
- `frontend/**` → Frontend build & deploy

### Setup CI/CD

Run the setup script to generate SSH keys and print the setup instructions:

```bash
./setup-cicd.sh
```

This will:

1. Generate an ED25519 SSH key pair for deployments
2. Create a `.env` file template (if missing)
3. Print instructions for GitHub repository setup

### Required GitHub Secrets

Go to **Repository → Settings → Secrets and variables → Actions → Secrets**:

| Secret | Description |
|--------|-------------|
| `HETZNER_HC_API_TOKEN` | Hetzner Cloud API token (same as in `.env`) |
| `DEPLOY_SSH_PRIVATE_KEY` | Content of `.ssh/glam_deploy_key` (generated by the setup script) |
| `TF_API_TOKEN` | (Optional) Terraform Cloud token for remote state |

### Required GitHub Variables

Go to **Repository → Settings → Secrets and variables → Actions → Variables**:

| Variable | Example |
|----------|---------|
| `GLAM_DOMAIN` | `sparql.glam-ontology.org` |
| `ADMIN_EMAIL` | `admin@example.org` |

### Manual Deployment

You can trigger deployments manually from the GitHub Actions tab, with these options:

- **Deploy infrastructure changes**: Run Terraform apply
- **Deploy ontology/schema data**: Sync data files to the server
- **Deploy frontend build**: Build and deploy the frontend
- **Reload data into Oxigraph**: Reimport all RDF data

### Local Deployment

For local deployments without GitHub Actions:

```bash
# Full deployment
./deploy.sh --all

# Just infrastructure
./deploy.sh --infra

# Just data
./deploy.sh --data

# Just frontend
./deploy.sh --frontend

# Just the Qdrant vector database
./deploy.sh --qdrant

# Reload Oxigraph
./deploy.sh --reload
```

## Qdrant Vector Database

Qdrant is a self-hosted vector database used for semantic search over heritage institution data. It enables RAG (Retrieval-Augmented Generation) in the DSPy SPARQL generation module.
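Conceptually, the retrieval step of RAG ranks stored embedding vectors by their similarity to the embedding of the query and hands the best matches to the language model. The plain-Python sketch below illustrates that idea as a brute-force cosine-similarity search. It is an illustration only, not how Qdrant works internally (Qdrant uses an approximate nearest-neighbour index, HNSW), and it assumes cosine similarity as the metric; the three-dimensional vectors and the `cosine`/`top_k` helpers are hypothetical stand-ins for real model embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], points: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Score every stored point against the query and return the k best, highest first."""
    scored = [(point_id, cosine(query, vector)) for point_id, vector in points.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones come from an embedding model
points = {
    "Rijksmuseum": [0.9, 0.1, 0.0],
    "Stedelijk Museum": [0.8, 0.2, 0.1],
    "Nationaal Archief": [0.1, 0.9, 0.2],
}

# Rijksmuseum ranks first, then Stedelijk Museum
print(top_k([1.0, 0.0, 0.0], points, k=2))
```

A brute-force scan is O(n) per query, which is exactly the cost an index such as Qdrant's is designed to avoid at scale.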
### Configuration

- **REST API**: `http://localhost:6333` (internal)
- **gRPC API**: `http://localhost:6334` (internal)
- **External Access**: `https:///qdrant/` (via Caddy reverse proxy)
- **Storage**: `/mnt/data/qdrant/storage/`
- **Snapshots**: `/mnt/data/qdrant/snapshots/`
- **Resource Limits**: 2GB RAM, 2 CPUs

### Deployment

```bash
# Deploy/restart Qdrant
./deploy.sh --qdrant
```

### Health Check

```bash
# Check Qdrant health (Qdrant's health endpoints are /healthz, /livez and /readyz)
ssh root@$SERVER_IP "curl -s http://localhost:6333/healthz"

# List collections
ssh root@$SERVER_IP "curl -s http://localhost:6333/collections"
```

### DSPy Integration

The Qdrant retriever is integrated with DSPy for RAG-enhanced SPARQL query generation:

```python
from glam_extractor.api.qdrant_retriever import HeritageCustodianRetriever

# Create a retriever connected to the Qdrant REST API
retriever = HeritageCustodianRetriever(
    host="localhost",
    port=6333,
)

# Search for the 5 institutions most relevant to the query
results = retriever("museums in Amsterdam", k=5)

# Add an institution to the index
retriever.add_institution(
    name="Rijksmuseum",
    description="National museum of arts and history",
    institution_type="MUSEUM",
    country="NL",
    city="Amsterdam",
)
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `QDRANT_HOST` | `localhost` | Qdrant server hostname |
| `QDRANT_PORT` | `6333` | Qdrant REST API port |
| `QDRANT_ENABLED` | `true` | Enable/disable Qdrant integration |
| `OPENAI_API_KEY` | - | Required for embedding generation |

## Destroy Infrastructure

```bash
terraform destroy
```

**Warning**: This will delete all data. Make sure to back up first!
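Since `terraform destroy` deletes all data, the Qdrant collections need backing up as well as the Oxigraph database. One option is Qdrant's snapshot endpoint (`POST /collections/{collection_name}/snapshots`), which writes a snapshot file to the configured snapshots directory. The standard-library sketch below creates one snapshot per collection. It assumes it runs on the server (the REST API only listens on `localhost:6333`) and that no Qdrant API key is configured; the function names are hypothetical. Because `/mnt/data/qdrant/snapshots/` sits on the same volume, copy the snapshot files off the server (for example with `scp`, as in the Oxigraph backup) before destroying anything.

```python
import json
import urllib.parse
import urllib.request

# Assumption: executed on the server, where Qdrant's REST API is reachable
# on localhost:6333 without an API key.
QDRANT_URL = "http://localhost:6333"


def collection_names(listing: dict) -> list[str]:
    """Extract collection names from a GET /collections response body."""
    return [c["name"] for c in listing["result"]["collections"]]


def snapshot_all_collections(base_url: str = QDRANT_URL) -> list[str]:
    """Create one snapshot per collection; return the snapshot file names."""
    with urllib.request.urlopen(f"{base_url}/collections") as resp:
        names = collection_names(json.load(resp))

    created = []
    for name in names:
        # Quote the name so unusual characters cannot break the URL path
        url = f"{base_url}/collections/{urllib.parse.quote(name, safe='')}/snapshots"
        req = urllib.request.Request(url, method="POST")
        with urllib.request.urlopen(req) as resp:
            # The response's "result" describes the snapshot, including its file name
            created.append(json.load(resp)["result"]["name"])
    return created
```

Calling `snapshot_all_collections()` on the server returns the names of the snapshot files it created, which tells you what to copy from the snapshots directory.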