glam/docs/POSTGIS_BOUNDARY_ARCHITECTURE.md
kempersc 83ab098cf7 feat: add PostGIS international boundary architecture
Add schema and tooling for storing administrative boundaries in PostGIS:
- 002_postgis_boundaries.sql: Complete PostGIS schema with:
  - boundary_countries (ISO 3166-1)
  - boundary_admin1 (states/provinces/regions)
  - boundary_admin2 (municipalities/districts)
  - boundary_historical (HALC pre-modern territories)
  - custodian_service_areas (computed werkgebied geometries)
  - geonames_settlements (reverse geocoding)
  - Spatial functions: find_admin_for_point, find_nearest_settlement
  - Views for API access

- load_boundaries_postgis.py: Python loader supporting:
  - GADM (Global Administrative Areas) - primary global source
  - CBS (Dutch municipality boundaries)
  - GeoNames settlements for reverse geocoding
  - Cached downloads and upsert logic

- POSTGIS_BOUNDARY_ARCHITECTURE.md: Design documentation

This replaces the static GeoJSON approach for international coverage.
2025-12-07 14:34:39 +01:00


# PostGIS International Boundary Architecture
## Overview
This document describes the PostGIS-based architecture for storing and querying international administrative boundaries to compute heritage custodian service areas ("werkgebied").
## Problem Statement
The current implementation uses static GeoJSON files for Netherlands-only boundaries:
- `netherlands_provinces.geojson` - 12 provinces
- `netherlands_municipalities.geojson` - ~350 municipalities
- `netherlands_historical_1500.geojson` - HALC historical territories
This approach **does not scale** to international coverage (Japan, Czechia, Germany, Belgium, Brazil, etc.) because:
1. GeoJSON files are too large for client-side loading (Germany has 400+ districts)
2. No server-side spatial queries (point-in-polygon, intersection)
3. No temporal versioning for boundary changes
4. No consistent administrative hierarchy across countries
## Solution Architecture
### Database Schema
```
┌─────────────────────────────────────────────────────────────────┐
│ PostGIS Database │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ boundary_countries │────▶│ boundary_admin1 │ │
│ │ (ISO 3166-1) │ │ (States/Provinces) │ │
│ │ - iso_a2: NL, DE... │ │ - iso_3166_2 │ │
│ │ - geom: POLYGON │ │ - geom: POLYGON │ │
│ └─────────────────────┘ └──────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ boundary_admin2 │ │
│ │ (Municipalities) │ │
│ │ - geonames_id │ │
│ │ - cbs_gemeente_code│ │
│ │ - geom: POLYGON │ │
│ └──────────┬──────────┘ │
│ │ │
│ ┌─────────────────────┐ │ │
│ │ boundary_historical │ │ │
│ │ (HALC, pre-modern) │ │ │
│ │ - halc_adm1_code │ │ │
│ │ - period_start/end │ │ │
│ └─────────────────────┘ │ │
│ ▼ │
│ ┌───────────────────────────┐ │
│ │ custodian_service_areas │ │
│ │ (Computed werkgebied) │ │
│ │ - ghcid: NL-NH-HAA-A-NHA │ │
│ │ - admin2_ids: [1,2,3...] │ │
│ │ - geom: MULTIPOLYGON │ │
│ └───────────────────────────┘ │
│ │
│ ┌─────────────────────┐ │
│ │ geonames_settlements│ (For reverse geocoding) │
│ │ - geonames_id │ │
│ │ - name, ascii_name │ │
│ │ - geom: POINT │ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
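The hierarchy in the diagram can be sketched as plain Python records. This is only an illustration of how an admin2 unit resolves upward to its country, not the actual schema: the `admin1_id` foreign key and the `name` field are assumptions, since the diagram shows only a subset of columns. It relies on the fact that an ISO 3166-2 code begins with the ISO 3166-1 alpha-2 country code followed by a hyphen.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Admin1:
    id: int
    iso_3166_2: str  # e.g. "NL-NH"; the part before "-" is the country code


@dataclass(frozen=True)
class Admin2:
    id: int
    admin1_id: int  # assumed foreign key to Admin1 (not shown in the diagram)
    name: str       # assumed column
    cbs_gemeente_code: Optional[str] = None  # only set for Dutch municipalities


def country_code(admin1: Admin1) -> str:
    """Derive the ISO 3166-1 alpha-2 code from an ISO 3166-2 code."""
    return admin1.iso_3166_2.split("-", 1)[0]


def describe(admin2: Admin2, admin1_by_id: Dict[int, Admin1]) -> str:
    """Walk admin2 -> admin1 -> country and render the chain as text."""
    admin1 = admin1_by_id[admin2.admin1_id]
    return f"{country_code(admin1)} > {admin1.iso_3166_2} > {admin2.name}"


# Illustrative rows; the numeric ids are made up.
noord_holland = Admin1(id=7, iso_3166_2="NL-NH")
amsterdam = Admin2(id=123, admin1_id=7, name="Amsterdam", cbs_gemeente_code="GM0363")
print(describe(amsterdam, {noord_holland.id: noord_holland}))
```

In the database itself the same walk would presumably be a join across the three boundary tables.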
### Data Sources
| Source | Coverage | License | Admin Levels | Use Case |
|--------|----------|---------|--------------|----------|
| **GADM** | Global | Free for academic and other non-commercial use; redistribution requires permission | 0, 1, 2, 3+ | Primary global boundaries |
| **Natural Earth** | Global | Public Domain | 0, 1 | Simplified country shapes |
| **CBS** | Netherlands | CC-BY-4.0 | 2 (gemeente) | Official NL municipalities |
| **HALC** | Low Countries | Academic | Historical | Pre-1800 territories |
| **OSM** | Global | ODbL | Variable | Crowdsourced, current |
| **Eurostat** | Europe | Eurostat | NUTS/LAU | EU statistical regions |
### API Endpoints
The PostGIS database will be exposed via a REST API:
```
GET /api/boundaries/countries
GET /api/boundaries/countries/{iso_a2}
GET /api/boundaries/admin1/{iso_a2}
GET /api/boundaries/admin2/{iso_a2}/{admin1_code}
GET /api/boundaries/point?lon={lon}&lat={lat}
GET /api/boundaries/service-area/{ghcid}
GET /api/boundaries/service-area/{ghcid}/geojson
```
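A small client-side helper shows how these routes might be assembled safely from Python, for example in a test script. It is a sketch under assumptions: the function names are hypothetical, the routes are copied from the list above, and the coordinate range check is an added safeguard that the API itself may or may not perform.

```python
from urllib.parse import quote, urlencode

API_ROOT = "/api/boundaries"  # path prefix taken from the endpoint list


def _segment(value: str) -> str:
    """Percent-encode one path segment so '/' or spaces cannot alter the route."""
    return quote(value, safe="")


def admin2_url(iso_a2: str, admin1_code: str) -> str:
    """Route listing the admin2 units inside one admin1 unit."""
    return f"{API_ROOT}/admin2/{_segment(iso_a2)}/{_segment(admin1_code)}"


def point_url(lon: float, lat: float) -> str:
    """Route for the point lookup; note the lon, lat parameter order."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: lon={lon}, lat={lat}")
    query = urlencode({"lon": lon, "lat": lat})
    return f"{API_ROOT}/point?{query}"


def service_area_url(ghcid: str, geojson: bool = False) -> str:
    """Route for a custodian's service area, optionally as GeoJSON."""
    url = f"{API_ROOT}/service-area/{_segment(ghcid)}"
    return f"{url}/geojson" if geojson else url


print(point_url(4.6462, 52.3874))
print(service_area_url("NL-NH-HAA-A-NHA", geojson=True))
```

Encoding each path segment separately keeps an identifier that happens to contain a slash from being read as an extra path component.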
### Frontend Integration
The frontend will:
1. Fetch service area GeoJSON via API (not static files)
2. Use MapLibre vector tiles for admin boundaries (optional optimization)
3. Cache frequently accessed boundaries in the browser
4. Support historical boundary display with temporal filtering
```typescript
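import type { GeoJSONSource } from 'maplibre-gl';

// Example: fetch the service area for a custodian and display it.
// Assumes `map` is an existing maplibregl.Map with a GeoJSON source named 'werkgebied'.
const response = await fetch(`/api/boundaries/service-area/${encodeURIComponent(ghcid)}/geojson`);
if (!response.ok) throw new Error(`Service area request failed: ${response.status}`);
const geojson = await response.json();
(map.getSource('werkgebied') as GeoJSONSource).setData(geojson);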
```
### Temporal Versioning
Boundaries change over time (municipal mergers, etc.). The schema handles this by versioning rows with a `valid_to` date, where `NULL` marks the current version:
```sql
-- Current boundary (valid_to IS NULL)
SELECT * FROM boundary_admin2 WHERE cbs_gemeente_code = 'GM0363' AND valid_to IS NULL;
-- Historical boundary (the version retired on 2001-01-01, e.g. before a merger)
SELECT * FROM boundary_admin2 WHERE cbs_gemeente_code = 'GM0363' AND valid_to = '2001-01-01';
```
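The second query needs the exact retirement date. A more common lookup is "which version was valid on a given day", which can be sketched with half-open validity intervals. This is an in-memory illustration only: the `valid_from` field is an assumption (the queries above show only `valid_to`), and the dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class BoundaryVersion:
    code: str                  # e.g. a cbs_gemeente_code
    valid_from: date           # assumed column, inclusive start
    valid_to: Optional[date]   # exclusive end; None means "current"


def version_on(versions: Iterable[BoundaryVersion], code: str, on: date) -> Optional[BoundaryVersion]:
    """Return the version of `code` whose interval [valid_from, valid_to) contains `on`."""
    for v in versions:
        if v.code != code:
            continue
        if v.valid_from <= on and (v.valid_to is None or on < v.valid_to):
            return v
    return None


# Invented history: one version retired on 2001-01-01, one current version.
history = [
    BoundaryVersion("GM0363", date(1990, 1, 1), date(2001, 1, 1)),
    BoundaryVersion("GM0363", date(2001, 1, 1), None),
]
print(version_on(history, "GM0363", date(2000, 6, 1)))
print(version_on(history, "GM0363", date(2001, 1, 1)))
```

Treating `valid_to` as exclusive means that on the changeover day exactly one version matches, so adjacent versions never overlap.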
### Service Area Computation
A service area is computed on the fly as the union of the geometries of the admin2 units a custodian serves:
```sql
-- Compute service area for Noord-Hollands Archief (serves Haarlem + region)
SELECT ST_Union(geom) AS service_area_geom
FROM boundary_admin2
WHERE id IN (SELECT unnest(admin2_ids) FROM custodian_service_areas WHERE ghcid = 'NL-NH-HAA-A-NHA');
```
Alternatively, the geometry can be pre-computed once and stored alongside the list of admin2 IDs:
```sql
-- Pre-compute and cache
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
VALUES (
    'NL-NH-HAA-A-NHA',
    'Noord-Hollands Archief Werkgebied',
    compute_service_area_geometry(ARRAY[123, 124, 125, 126]), -- admin2 IDs
    ARRAY[123, 124, 125, 126]
);
```
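From Python, the pre-compute step could be made repeatable with an upsert. The sketch below is not the loader's actual code. It assumes a unique constraint on `ghcid` (needed for `ON CONFLICT`), that `compute_service_area_geometry` accepts an integer array as in the SQL above, and a psycopg2-style cursor, which uses `%s` placeholders and adapts Python lists to PostgreSQL arrays. `RecordingCursor` is a hypothetical stand-in so the example runs without a database.

```python
from typing import Iterable, List, Tuple

UPSERT_SQL = """
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
VALUES (%s, %s, compute_service_area_geometry(%s), %s)
ON CONFLICT (ghcid) DO UPDATE
SET service_area_name = EXCLUDED.service_area_name,
    geom = EXCLUDED.geom,
    admin2_ids = EXCLUDED.admin2_ids
"""


def store_service_area(cur, ghcid: str, name: str, admin2_ids: Iterable[int]) -> List[int]:
    """Insert or refresh one service area; returns the de-duplicated, sorted IDs used."""
    ids = sorted(set(admin2_ids))
    if not ids:
        raise ValueError("a service area needs at least one admin2 unit")
    # The same ID list is passed twice: once to build the geometry, once to store.
    cur.execute(UPSERT_SQL, (ghcid, name, ids, ids))
    return ids


class RecordingCursor:
    """Stand-in for a database cursor that only records what would be executed."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []

    def execute(self, sql: str, params: tuple) -> None:
        self.calls.append((sql, params))


cur = RecordingCursor()
used = store_service_area(cur, "NL-NH-HAA-A-NHA", "Noord-Hollands Archief Werkgebied", [125, 123, 124, 123])
print(used)
```

With a real connection, the same function would be called with `conn.cursor()` and followed by `conn.commit()`.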
## Implementation Plan
### Phase 1: Schema & Initial Data (Current Sprint)
- [x] Create PostGIS schema (`002_postgis_boundaries.sql`)
- [x] Create boundary loading script (`load_boundaries_postgis.py`)
- [ ] Load Netherlands boundaries (CBS + provinces)
- [ ] Load HALC historical boundaries
- [ ] Migrate existing GeoJSON data
### Phase 2: International Expansion
- [ ] Load GADM for priority countries: JP, CZ, DE, BE, CH, AT
- [ ] Load GeoNames settlements for reverse geocoding
- [ ] Create API endpoints for boundary queries
- [ ] Update frontend to use API instead of static files
### Phase 3: Service Area Management
- [ ] Compute service areas for existing custodians
- [ ] Create admin UI for service area editing
- [ ] Implement temporal boundary display
- [ ] Add vector tile generation (optional optimization)
## Files Created
| File | Description |
|------|-------------|
| `infrastructure/sql/002_postgis_boundaries.sql` | PostGIS schema for boundaries |
| `scripts/load_boundaries_postgis.py` | Python script to load boundary data |
| `docs/POSTGIS_BOUNDARY_ARCHITECTURE.md` | This document |
## Dependencies
- PostgreSQL 14+ with PostGIS 3.3+
- Python: `psycopg2`, `geopandas`, `shapely`
- GADM data (downloaded on demand)
- CBS GeoJSON (existing in `frontend/public/data/`)
## Migration from Static GeoJSON
The current static GeoJSON approach will be **deprecated** but **not immediately removed**:
1. PostGIS becomes the source of truth for boundaries
2. API serves boundary GeoJSON on demand
3. Static files remain as fallback for development
4. Frontend gradually migrates to API-based loading