kempersc 83ab098cf7 feat: add PostGIS international boundary architecture
Add schema and tooling for storing administrative boundaries in PostGIS:
- 002_postgis_boundaries.sql: Complete PostGIS schema with:
  - boundary_countries (ISO 3166-1)
  - boundary_admin1 (states/provinces/regions)
  - boundary_admin2 (municipalities/districts)
  - boundary_historical (HALC pre-modern territories)
  - custodian_service_areas (computed werkgebied geometries)
  - geonames_settlements (reverse geocoding)
  - Spatial functions: find_admin_for_point, find_nearest_settlement
  - Views for API access

- load_boundaries_postgis.py: Python loader supporting:
  - GADM (Global Administrative Areas) - primary global source
  - CBS (Dutch municipality boundaries)
  - GeoNames settlements for reverse geocoding
  - Cached downloads and upsert logic

- POSTGIS_BOUNDARY_ARCHITECTURE.md: Design documentation

This replaces the static GeoJSON approach for international coverage.
2025-12-07 14:34:39 +01:00


PostGIS International Boundary Architecture

Overview

This document describes the PostGIS-based architecture for storing and querying international administrative boundaries to compute heritage custodian service areas ("werkgebied").

Problem Statement

The current implementation uses static GeoJSON files for Netherlands-only boundaries:

  • netherlands_provinces.geojson - 12 provinces
  • netherlands_municipalities.geojson - ~350 municipalities
  • netherlands_historical_1500.geojson - HALC historical territories

This approach does not scale for international coverage (Japan, Czechia, Germany, Belgium, Brazil, etc.) because:

  1. Full-resolution GeoJSON files become too large for client-side loading (Germany alone has about 400 districts)
  2. No server-side spatial queries (point-in-polygon, intersection)
  3. No temporal versioning for boundary changes
  4. No consistent administrative hierarchy across countries

Solution Architecture

Database Schema

```text
┌─────────────────────────────────────────────────────────────────┐
│                        PostGIS Database                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐     ┌─────────────────────┐            │
│  │ boundary_countries  │────▶│  boundary_admin1    │            │
│  │ (ISO 3166-1)        │     │  (States/Provinces) │            │
│  │ - iso_a2: NL, DE... │     │  - iso_3166_2       │            │
│  │ - geom: POLYGON     │     │  - geom: POLYGON    │            │
│  └─────────────────────┘     └──────────┬──────────┘            │
│                                         │                       │
│                                         ▼                       │
│                              ┌─────────────────────┐            │
│                              │  boundary_admin2    │            │
│                              │  (Municipalities)   │            │
│                              │  - geonames_id      │            │
│                              │  - cbs_gemeente_code│            │
│                              │  - geom: POLYGON    │            │
│                              └──────────┬──────────┘            │
│                                         │                       │
│  ┌─────────────────────┐                │                       │
│  │ boundary_historical │                │                       │
│  │ (HALC, pre-modern)  │                │                       │
│  │ - halc_adm1_code    │                │                       │
│  │ - period_start/end  │                │                       │
│  └─────────────────────┘                │                       │
│                                         ▼                       │
│                           ┌───────────────────────────┐         │
│                           │ custodian_service_areas   │         │
│                           │ (Computed werkgebied)     │         │
│                           │ - ghcid: NL-NH-HAA-A-NHA  │         │
│                           │ - admin2_ids: [1,2,3...]  │         │
│                           │ - geom: MULTIPOLYGON      │         │
│                           └───────────────────────────┘         │
│                                                                 │
│  ┌─────────────────────┐                                        │
│  │ geonames_settlements│ (For reverse geocoding)                │
│  │ - geonames_id       │                                        │
│  │ - name, ascii_name  │                                        │
│  │ - geom: POINT       │                                        │
│  └─────────────────────┘                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
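The `geonames_settlements` table backs reverse geocoding (`find_nearest_settlement` in the schema). In PostGIS, that lookup would typically be an index-assisted nearest-neighbour query using the `<->` distance operator. The Python below is a minimal sketch of the underlying computation only. It ranks candidate settlements by great-circle (haversine) distance. The `Settlement` class, the `nearest_settlement` function and its `max_km` cutoff are hypothetical, and the example coordinates are approximate.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

# Mean Earth radius in km; adequate for ranking nearby places.
EARTH_RADIUS_KM = 6371.0088


@dataclass(frozen=True)
class Settlement:
    """Hypothetical in-memory mirror of a geonames_settlements row."""
    geonames_id: int
    name: str
    lon: float
    lat: float


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two WGS84 points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to guard against floating-point drift just above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_settlement(
    lon: float, lat: float, settlements: Iterable[Settlement], max_km: Optional[float] = None
) -> Optional[Settlement]:
    """Return the closest settlement, or None if none exists or none lies within max_km."""
    best, best_km = None, math.inf
    for s in settlements:
        d = haversine_km(lon, lat, s.lon, s.lat)
        if d < best_km:
            best, best_km = s, d
    if best is None or (max_km is not None and best_km > max_km):
        return None
    return best
```

A linear scan like this is only reasonable for a handful of candidates. At GeoNames scale, the ranking would belong in the database, where a spatial index prunes candidates before any distance is computed.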

Data Sources

| Source | Coverage | License | Admin levels | Use case |
|---|---|---|---|---|
| GADM | Global | Non-commercial (custom GADM terms) | 0, 1, 2, 3+ | Primary global boundaries |
| Natural Earth | Global | Public domain | 0, 1 | Simplified country shapes |
| CBS | Netherlands | CC BY 4.0 | 2 (gemeente) | Official NL municipalities |
| HALC | Low Countries | Academic | Historical | Pre-1800 territories |
| OSM | Global | ODbL | Variable | Crowdsourced, current |
| Eurostat | Europe | Eurostat | NUTS/LAU | EU statistical regions |
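Several sources overlap, so the loader needs a rule for which one wins for a given country and admin level. The table suggests one reading: CBS for Dutch municipalities, HALC for historical Low Countries territories, and GADM everywhere else. The sketch below encodes that reading. The function name, the `LOW_COUNTRIES` set and the precedence order are assumptions, not documented behaviour of `load_boundaries_postgis.py`.

```python
# Hypothetical precedence rules read off the Data Sources table.
LOW_COUNTRIES = frozenset({"NL", "BE", "LU"})  # assumed HALC coverage


def pick_boundary_source(iso_a2: str, admin_level: int, historical: bool = False) -> str:
    """Return the preferred source name for one country/admin-level combination."""
    iso = iso_a2.upper()
    if historical:
        if iso in LOW_COUNTRIES:
            return "HALC"
        raise LookupError(f"no historical boundary source configured for {iso}")
    if admin_level < 0:
        raise ValueError("admin_level must be >= 0")
    if iso == "NL" and admin_level == 2:
        return "CBS"  # official Dutch gemeente boundaries
    return "GADM"  # primary global source for levels 0, 1, 2, 3+
```

Keeping the rule in one function makes it easy to add, for example, a Natural Earth fallback for simplified level-0 shapes later.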

API Endpoints

The boundary data in PostGIS will be exposed through a REST API:

```text
GET /api/boundaries/countries
GET /api/boundaries/countries/{iso_a2}
GET /api/boundaries/admin1/{iso_a2}
GET /api/boundaries/admin2/{iso_a2}/{admin1_code}
GET /api/boundaries/point?lon={lon}&lat={lat}
GET /api/boundaries/service-area/{ghcid}
GET /api/boundaries/service-area/{ghcid}/geojson
```
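Behind `GET /api/boundaries/point`, PostGIS would answer with an index-backed containment test (for example `ST_Contains` on a GiST-indexed geometry column), presumably wrapped by `find_admin_for_point`. The pure-Python sketch below shows the same two-step idea: a cheap bounding-box prefilter, then an exact even-odd ray-casting test. `AdminUnit` and `units_containing` are hypothetical names. The sketch treats lon/lat as planar coordinates and ignores holes and multipolygons.

```python
from dataclasses import dataclass
from typing import Iterable

Ring = tuple[tuple[float, float], ...]


@dataclass(frozen=True)
class AdminUnit:
    """Hypothetical in-memory stand-in for a boundary_admin2 row."""
    unit_id: int
    name: str
    ring: Ring  # exterior ring as (lon, lat) pairs; closing vertex optional

    def bbox(self) -> tuple[float, float, float, float]:
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd rule: count crossings of a ray running east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray, so y1 != y2
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def units_containing(lon: float, lat: float, units: Iterable[AdminUnit]) -> list[int]:
    """Validate coordinates, prefilter by bounding box, then test exactly."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: lon={lon}, lat={lat}")
    hits = []
    for unit in units:
        min_x, min_y, max_x, max_y = unit.bbox()
        if min_x <= lon <= max_x and min_y <= lat <= max_y and point_in_ring(lon, lat, unit.ring):
            hits.append(unit.unit_id)
    return hits
```

In the database, the GiST index plays the role of the bounding-box loop, so only a few candidate polygons reach the exact test.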

Frontend Integration

The frontend will:

  1. Fetch service area GeoJSON via API (not static files)
  2. Use MapLibre vector tiles for admin boundaries (optional optimization)
  3. Cache frequently-accessed boundaries in browser
  4. Support historical boundary display with temporal filtering
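```javascript
// Example: fetch the service area for a custodian and draw it on the map.
// Assumes `map` is a loaded MapLibre map with a GeoJSON source named 'werkgebied'.
const response = await fetch(`/api/boundaries/service-area/${encodeURIComponent(ghcid)}/geojson`);
if (!response.ok) throw new Error(`Service area request failed: ${response.status}`);
const geojson = await response.json();
map.getSource('werkgebied').setData(geojson);
```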

Temporal Versioning

Boundaries change over time, for example through municipal mergers. Each boundary row carries a valid_to date (NULL for the current version), so the schema supports both current and historical lookups:

```sql
-- Current boundary (valid_to IS NULL)
SELECT * FROM boundary_admin2
WHERE cbs_gemeente_code = 'GM0363' AND valid_to IS NULL;

-- Historical boundary (before merger)
SELECT * FROM boundary_admin2
WHERE cbs_gemeente_code = 'GM0363' AND valid_to = '2001-01-01';
```
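Matching on an exact `valid_to` date requires knowing when a boundary changed. A point-in-time lookup ("which boundary applied on date D?") avoids that. The sketch below assumes each row also has a `valid_from` column, which this document does not show. It also assumes half-open `[valid_from, valid_to)` periods, which make the version that starts on the merger date the one that applies on that day. The equivalent SQL predicate would be `valid_from <= :d AND (valid_to IS NULL OR valid_to > :d)`. `BoundaryVersion`, `version_on` and the example code and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class BoundaryVersion:
    """Hypothetical mirror of one boundary_admin2 row's identity and validity."""
    code: str                 # e.g. a CBS gemeente code
    valid_from: date          # assumed column: first day the version applies
    valid_to: Optional[date]  # None mirrors valid_to IS NULL (still current)
    label: str


def version_on(versions: Iterable[BoundaryVersion], code: str, on: date) -> Optional[BoundaryVersion]:
    """Return the version valid on `on`, using half-open [valid_from, valid_to) periods."""
    matches = [
        v for v in versions
        if v.code == code and v.valid_from <= on and (v.valid_to is None or on < v.valid_to)
    ]
    if len(matches) > 1:
        raise ValueError(f"overlapping validity periods for {code}")
    return matches[0] if matches else None
```

Raising on overlaps surfaces data-loading errors early. With the half-open convention, consecutive versions of a correctly loaded boundary never overlap.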

Service Area Computation

Service areas are computed as the union of the admin2 units a custodian serves:

```sql
-- Compute service area for Noord-Hollands Archief (serves Haarlem + region)
SELECT ST_Union(geom) AS service_area_geom
FROM boundary_admin2
WHERE id IN (
    SELECT unnest(admin2_ids)
    FROM custodian_service_areas
    WHERE ghcid = 'NL-NH-HAA-A-NHA'
);
```
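`ST_Union` handles arbitrary geometries. For municipalities that tile the map without gaps or overlaps, its effect can be illustrated with a much simpler technique: edge cancellation. A border shared by two tiles appears twice, so dropping every edge used twice leaves only the outline of the merged area. This is not how PostGIS implements `ST_Union`. The sketch assumes counter-clockwise rings with exactly matching vertices along shared borders, and `dissolve_tiles` is a hypothetical name.

```python
from collections import Counter

Point = tuple[float, float]


def dissolve_tiles(rings: list[list[Point]]) -> list[list[Point]]:
    """Union edge-adjacent polygons by cancelling shared edges.

    Assumes every ring is counter-clockwise, implicitly closed (no repeated
    first vertex), and that neighbours share exactly the same vertices along
    common borders. Enclosed gaps come out as separate clockwise rings.
    """
    directed = [(ring[i], ring[(i + 1) % len(ring)]) for ring in rings for i in range(len(ring))]
    # An edge shared by two tiles is interior to the union; count it undirected.
    uses = Counter(frozenset(edge) for edge in directed)
    boundary = [edge for edge in directed if uses[frozenset(edge)] == 1]

    # Chain the remaining edges head-to-tail into closed rings.
    next_vertex: dict[Point, Point] = {}
    for start, end in boundary:
        if start in next_vertex:
            raise ValueError(f"tiles touch only at vertex {start}; not supported")
        next_vertex[start] = end

    result = []
    while next_vertex:
        start = next(iter(next_vertex))
        ring, current = [start], next_vertex.pop(start)
        while current != start:
            ring.append(current)
            current = next_vertex.pop(current)
        result.append(ring)
    return result


def shoelace_area(ring: list[Point]) -> float:
    """Signed area of a ring: positive when counter-clockwise."""
    n = len(ring)
    return sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1] for i in range(n)) / 2
```

Real CBS and GADM polygons rarely share vertices exactly, which is one reason the union is left to PostGIS rather than done in application code.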

Alternatively, the geometry can be pre-computed and stored:

```sql
-- Pre-compute and cache
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
VALUES (
    'NL-NH-HAA-A-NHA',
    'Noord-Hollands Archief Werkgebied',
    compute_service_area_geometry(ARRAY[123, 124, 125, 126]),  -- admin2 IDs
    ARRAY[123, 124, 125, 126]
);
```
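Re-running the pre-computation, for example after a municipal merger, would fail on a plain `INSERT` if the custodian already has a row. The loader's upsert logic might build an `INSERT ... ON CONFLICT` statement instead. The sketch below assumes a `UNIQUE` constraint on `custodian_service_areas.ghcid`, which this document does not confirm. It also assumes `compute_service_area_geometry()` takes an `integer[]` as in the example above. It uses psycopg2's `%s` placeholders; `service_area_upsert` is a hypothetical helper.

```python
from typing import Iterable

# Assumes UNIQUE (ghcid) on custodian_service_areas; %s is psycopg2's paramstyle.
UPSERT_SERVICE_AREA_SQL = """
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
VALUES (%s, %s, compute_service_area_geometry(%s::integer[]), %s::integer[])
ON CONFLICT (ghcid) DO UPDATE
SET service_area_name = EXCLUDED.service_area_name,
    geom = EXCLUDED.geom,
    admin2_ids = EXCLUDED.admin2_ids
"""


def service_area_upsert(ghcid: str, name: str, admin2_ids: Iterable[int]) -> tuple[str, tuple]:
    """Build (sql, params) for cursor.execute(); duplicate IDs are removed and sorted."""
    ids = sorted({int(i) for i in admin2_ids})
    if not ids:
        raise ValueError("a service area needs at least one admin2 unit")
    # psycopg2 adapts Python lists to PostgreSQL arrays.
    return UPSERT_SERVICE_AREA_SQL, (ghcid, name, ids, ids)
```

With an open psycopg2 cursor `cur`, usage would look like `cur.execute(*service_area_upsert("NL-NH-HAA-A-NHA", "Noord-Hollands Archief Werkgebied", [123, 124, 125, 126]))`.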

Implementation Plan

Phase 1: Schema & Initial Data (Current Sprint)

  • Create PostGIS schema (002_postgis_boundaries.sql)
  • Create boundary loading script (load_boundaries_postgis.py)
  • Load Netherlands boundaries (CBS + provinces)
  • Load HALC historical boundaries
  • Migrate existing GeoJSON data

Phase 2: International Expansion

  • Load GADM for priority countries: JP, CZ, DE, BE, CH, AT
  • Load GeoNames settlements for reverse geocoding
  • Create API endpoints for boundary queries
  • Update frontend to use API instead of static files

Phase 3: Service Area Management

  • Compute service areas for existing custodians
  • Create admin UI for service area editing
  • Implement temporal boundary display
  • Add vector tile generation (optional optimization)

Files Created

| File | Description |
|---|---|
| infrastructure/sql/002_postgis_boundaries.sql | PostGIS schema for boundaries |
| scripts/load_boundaries_postgis.py | Python script to load boundary data |
| docs/POSTGIS_BOUNDARY_ARCHITECTURE.md | This document |

Dependencies

  • PostgreSQL 14+ with PostGIS 3.3+
  • Python: psycopg2, geopandas, shapely
  • GADM data (downloaded on demand)
  • CBS GeoJSON (already present in frontend/public/data/)

Migration from Static GeoJSON

The current static GeoJSON approach will be deprecated but not immediately removed:

  1. PostGIS becomes the source of truth for boundaries
  2. API serves boundary GeoJSON on demand
  3. Static files remain as fallback for development
  4. Frontend gradually migrates to API-based loading