# PostGIS International Boundary Architecture

## Overview

This document describes the PostGIS-based architecture for storing and querying international administrative boundaries to compute heritage custodian service areas ("werkgebied").

## Problem Statement

The current implementation uses static GeoJSON files for Netherlands-only boundaries:

- `netherlands_provinces.geojson` - 12 provinces
- `netherlands_municipalities.geojson` - ~350 municipalities
- `netherlands_historical_1500.geojson` - HALC historical territories

This approach **does not scale** to international coverage (Japan, Czechia, Germany, Belgium, Brazil, etc.) because:

1. GeoJSON files become too large for client-side loading (Germany alone has 400+ districts)
2. Static files cannot answer server-side spatial queries (point-in-polygon, intersection)
3. There is no temporal versioning for boundary changes
4. There is no consistent administrative hierarchy across countries

## Solution Architecture

### Database Schema

```
┌─────────────────────────────────────────────────────────────────┐
│                        PostGIS Database                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐     ┌─────────────────────┐            │
│  │ boundary_countries  │────▶│ boundary_admin1     │            │
│  │ (ISO 3166-1)        │     │ (States/Provinces)  │            │
│  │ - iso_a2: NL, DE... │     │ - iso_3166_2        │            │
│  │ - geom: POLYGON     │     │ - geom: POLYGON     │            │
│  └─────────────────────┘     └──────────┬──────────┘            │
│                                         │                       │
│                                         ▼                       │
│                              ┌─────────────────────┐            │
│                              │ boundary_admin2     │            │
│                              │ (Municipalities)    │            │
│                              │ - geonames_id       │            │
│                              │ - cbs_gemeente_code │            │
│                              │ - geom: POLYGON     │            │
│                              └──────────┬──────────┘            │
│                                         │                       │
│  ┌─────────────────────┐                │                       │
│  │ boundary_historical │                │                       │
│  │ (HALC, pre-modern)  │                │                       │
│  │ - halc_adm1_code    │                │                       │
│  │ - period_start/end  │                │                       │
│  └─────────────────────┘                │                       │
│                                         ▼                       │
│                           ┌───────────────────────────┐         │
│                           │ custodian_service_areas   │         │
│                           │ (Computed werkgebied)     │         │
│                           │ - ghcid: NL-NH-HAA-A-NHA  │         │
│                           │ - admin2_ids: [1,2,3...]  │         │
│                           │ - geom: MULTIPOLYGON      │         │
│                           └───────────────────────────┘         │
│                                                                 │
│  ┌─────────────────────┐                                        │
│  │ geonames_settlements│  (For reverse geocoding)               │
│  │ - geonames_id       │                                        │
│  │ - name, ascii_name  │                                        │
│  │ - geom: POINT       │                                        │
│  └─────────────────────┘                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Data Sources

| Source | Coverage | License | Admin Levels | Use Case |
|--------|----------|---------|--------------|----------|
| **GADM** | Global | Non-commercial, no redistribution (GADM license) | 0, 1, 2, 3+ | Primary global boundaries |
| **Natural Earth** | Global | Public Domain | 0, 1 | Simplified country shapes |
| **CBS** | Netherlands | CC-BY-4.0 | 2 (gemeente) | Official NL municipalities |
| **HALC** | Low Countries | Academic | Historical | Pre-1800 territories |
| **OSM** | Global | ODbL | Variable | Crowdsourced, current |
| **Eurostat** | Europe | Eurostat | NUTS/LAU | EU statistical regions |

### API Endpoints

The PostGIS database will be exposed via a REST API:

```
GET /api/boundaries/countries
GET /api/boundaries/countries/{iso_a2}
GET /api/boundaries/admin1/{iso_a2}
GET /api/boundaries/admin2/{iso_a2}/{admin1_code}
GET /api/boundaries/point?lon={lon}&lat={lat}
GET /api/boundaries/service-area/{ghcid}
GET /api/boundaries/service-area/{ghcid}/geojson
```

### Frontend Integration

The frontend will:

1. Fetch service area GeoJSON via the API (not static files)
2. Use MapLibre vector tiles for admin boundaries (optional optimization)
3. Cache frequently accessed boundaries in the browser
4. Support historical boundary display with temporal filtering

```typescript
import type { GeoJSONSource } from 'maplibre-gl';

// Example: Fetch service area for a custodian (`map` and `ghcid` are in scope)
const response = await fetch(`/api/boundaries/service-area/${encodeURIComponent(ghcid)}/geojson`);
if (!response.ok) throw new Error(`Service area request failed: ${response.status}`);
const geojson = await response.json();
// 'werkgebied' must already exist as a GeoJSON source on the map
(map.getSource('werkgebied') as GeoJSONSource).setData(geojson);
```

### Temporal Versioning

Boundaries change over time (municipal mergers, etc.). The schema supports querying both current and superseded versions:

```sql
-- Current boundary (valid_to IS NULL)
SELECT * FROM boundary_admin2
WHERE cbs_gemeente_code = 'GM0363' AND valid_to IS NULL;

-- Historical boundary (the version that ended at the merger date)
SELECT * FROM boundary_admin2
WHERE cbs_gemeente_code = 'GM0363' AND valid_to = '2001-01-01';
```

### Service Area Computation

Service areas are computed from admin units:

```sql
-- Compute service area for Noord-Hollands Archief (serves Haarlem + region)
SELECT ST_Union(geom) AS service_area_geom
FROM boundary_admin2
WHERE id IN (
    SELECT unnest(admin2_ids)
    FROM custodian_service_areas
    WHERE ghcid = 'NL-NH-HAA-A-NHA'
);
```

Alternatively, they can be pre-computed and stored:

```sql
-- Pre-compute and cache
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
VALUES (
    'NL-NH-HAA-A-NHA',
    'Noord-Hollands Archief Werkgebied',
    compute_service_area_geometry(ARRAY[123, 124, 125, 126]),  -- admin2 IDs
    ARRAY[123, 124, 125, 126]
);
```

## Implementation Plan

### Phase 1: Schema & Initial Data (Current Sprint)

- [x] Create PostGIS schema (`002_postgis_boundaries.sql`)
- [x] Create boundary loading script (`load_boundaries_postgis.py`)
- [ ] Load Netherlands boundaries (CBS + provinces)
- [ ] Load HALC historical boundaries
- [ ] Migrate existing GeoJSON data

### Phase 2: International Expansion

- [ ] Load GADM for priority countries: JP, CZ, DE, BE, CH, AT
- [ ] Load GeoNames settlements for reverse geocoding
- [ ] Create API endpoints for boundary queries
- [ ] Update frontend to use API instead of static files

### Phase 3: Service Area Management

- [ ] Compute service areas for existing custodians
- [ ] Create admin UI for service area editing
- [ ] Implement temporal boundary display
- [ ] Add vector tile generation (optional optimization)

## Files Created

| File | Description |
|------|-------------|
| `infrastructure/sql/002_postgis_boundaries.sql` | PostGIS schema for boundaries |
| `scripts/load_boundaries_postgis.py` | Python script to load boundary data |
| `docs/POSTGIS_BOUNDARY_ARCHITECTURE.md` | This document |

## Dependencies

- PostgreSQL 14+ with PostGIS 3.3+
- Python: `psycopg2`, `geopandas`, `shapely`
- GADM data (downloaded on demand)
- CBS GeoJSON (existing in `frontend/public/data/`)

## Migration from Static GeoJSON

The current static GeoJSON approach will be **deprecated** but **not immediately removed**:

1. PostGIS becomes the source of truth for boundaries
2. API serves boundary GeoJSON on demand
3. Static files remain as fallback for development
4. Frontend gradually migrates to API-based loading
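
During the migration, the "API first, static file as fallback" order could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `loadGeoJsonWithFallback`, `MinimalResponse`, and `Fetcher` are hypothetical names, and the fetch function is injected so the helper can be exercised without a network.

```typescript
// Minimal response shape needed here; the browser's Response satisfies it.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type Fetcher = (url: string) => Promise<MinimalResponse>;

// Hypothetical helper (an assumption, not part of the codebase): try the
// boundary API first, then fall back to a static GeoJSON file when the API
// is unreachable or returns a non-2xx status.
async function loadGeoJsonWithFallback(
  apiUrl: string,
  staticUrl: string,
  fetcher: Fetcher,
): Promise<{ source: "api" | "static"; data: unknown }> {
  try {
    const res = await fetcher(apiUrl);
    if (res.ok) {
      return { source: "api", data: await res.json() };
    }
  } catch {
    // Network failure: fall through to the static file.
  }
  const fallback = await fetcher(staticUrl);
  if (!fallback.ok) {
    throw new Error(`Boundary load failed (static fallback status ${fallback.status})`);
  }
  return { source: "static", data: await fallback.json() };
}
```

In the browser this might be called as `loadGeoJsonWithFallback('/api/boundaries/admin1/NL', '/data/netherlands_provinces.geojson', (url) => fetch(url))`. The `/data/...` path assumes that `frontend/public/data/` is served at `/data/`, which this document does not confirm. Returning the `source` alongside the data lets the caller log how often the fallback is still being hit before the static files are removed.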