# PostGIS International Boundary Architecture

## Overview

This document describes a PostGIS-based architecture for storing and querying international administrative boundaries, which are then used to compute the service areas ("werkgebied") of heritage custodians.

## Problem Statement

The current implementation uses static GeoJSON files that cover only the Netherlands:

- `netherlands_provinces.geojson` - 12 provinces
- `netherlands_municipalities.geojson` - ~350 municipalities
- `netherlands_historical_1500.geojson` - HALC historical territories

This approach **does not scale** to international coverage (Japan, Czechia, Germany, Belgium, Brazil, etc.) because:

1. GeoJSON files become too large to load client-side (Germany alone has 400+ districts)
2. Spatial queries (point-in-polygon, intersection) cannot run server-side
3. There is no temporal versioning for boundary changes
4. There is no consistent administrative hierarchy across countries

## Solution Architecture

### Database Schema

```
┌─────────────────────────────────────────────────────────────────┐
│                        PostGIS Database                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐     ┌─────────────────────┐            │
│  │ boundary_countries  │────▶│ boundary_admin1     │            │
│  │ (ISO 3166-1)        │     │ (States/Provinces)  │            │
│  │ - iso_a2: NL, DE... │     │ - iso_3166_2        │            │
│  │ - geom: POLYGON     │     │ - geom: POLYGON     │            │
│  └─────────────────────┘     └──────────┬──────────┘            │
│                                         │                       │
│                                         ▼                       │
│                              ┌─────────────────────┐            │
│                              │ boundary_admin2     │            │
│                              │ (Municipalities)    │            │
│                              │ - geonames_id       │            │
│                              │ - cbs_gemeente_code │            │
│                              │ - geom: POLYGON     │            │
│                              └──────────┬──────────┘            │
│                                         │                       │
│  ┌─────────────────────┐                │                       │
│  │ boundary_historical │                │                       │
│  │ (HALC, pre-modern)  │                │                       │
│  │ - halc_adm1_code    │                │                       │
│  │ - period_start/end  │                │                       │
│  └─────────────────────┘                │                       │
│                                         ▼                       │
│                           ┌───────────────────────────┐         │
│                           │ custodian_service_areas   │         │
│                           │ (Computed werkgebied)     │         │
│                           │ - ghcid: NL-NH-HAA-A-NHA  │         │
│                           │ - admin2_ids: [1,2,3...]  │         │
│                           │ - geom: MULTIPOLYGON      │         │
│                           └───────────────────────────┘         │
│                                                                 │
│  ┌─────────────────────┐                                        │
│  │ geonames_settlements│  (For reverse geocoding)               │
│  │ - geonames_id       │                                        │
│  │ - name, ascii_name  │                                        │
│  │ - geom: POINT       │                                        │
│  └─────────────────────┘                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
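
Reverse geocoding against `geonames_settlements` is a nearest-neighbour search over settlement points. As a rough illustration of what that lookup returns, here is a plain-Python sketch using haversine distance; the `Settlement` class, the function names and the sample coordinates are assumptions for illustration, not part of the schema or the loader. Inside PostGIS the same search would normally use an index-assisted KNN ordering (`ORDER BY geom <-> point LIMIT 1`) rather than a linear scan.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # IUGG mean Earth radius


@dataclass(frozen=True)
class Settlement:
    """Hypothetical in-memory stand-in for a geonames_settlements row."""

    geonames_id: int
    name: str
    lon: float
    lat: float


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two WGS84 points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against a tiny float overshoot above 1.0 for near-antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_settlement(lon: float, lat: float, settlements: list[Settlement]) -> Settlement:
    """Return the settlement closest to (lon, lat) using a linear scan (no spatial index)."""
    if not settlements:
        raise ValueError("no settlements to search")
    return min(settlements, key=lambda s: haversine_km(lon, lat, s.lon, s.lat))


# Usage with approximate city-centre coordinates; the IDs are placeholders, not real GeoNames IDs.
towns = [
    Settlement(1, "Haarlem", 4.6462, 52.3874),
    Settlement(2, "Amsterdam", 4.9041, 52.3676),
]
print(nearest_settlement(4.63, 52.38, towns).name)  # Haarlem
```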

### Data Sources

| Source | Coverage | License | Admin Levels | Use Case |
|--------|----------|---------|--------------|----------|
| **GADM** | Global | CC-BY-NC | 0, 1, 2, 3+ | Primary global boundaries |
| **Natural Earth** | Global | Public Domain | 0, 1 | Simplified country shapes |
| **CBS** | Netherlands | CC-BY-4.0 | 2 (gemeente) | Official NL municipalities |
| **HALC** | Low Countries | Academic | Historical | Pre-1800 territories |
| **OSM** | Global | ODbL | Variable | Crowdsourced, current |
| **Eurostat** | Europe | Eurostat | NUTS/LAU | EU statistical regions |
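
GADM publishes its per-country downloads under ISO 3166-1 alpha-3 codes (for example `NLD`), whereas `boundary_countries` is keyed by `iso_a2`, so a loader needs a mapping between the two. The sketch below covers the Netherlands plus the Phase 2 priority countries and encodes one reading of the table above (CBS for Dutch municipalities, GADM otherwise). The dictionary, the `pick_source` helper and the selection rule are illustrative assumptions, not the contents of `load_boundaries_postgis.py`.

```python
# Hypothetical mapping: ISO 3166-1 alpha-2 (as in boundary_countries.iso_a2)
# to alpha-3 (as used in GADM file names), for NL and the Phase 2 countries.
ISO_A2_TO_A3 = {
    "NL": "NLD",
    "JP": "JPN",
    "CZ": "CZE",
    "DE": "DEU",
    "BE": "BEL",
    "CH": "CHE",
    "AT": "AUT",
}


def pick_source(iso_a2: str, admin_level: int) -> tuple[str, str]:
    """Return (source, country code in that source's convention).

    Illustrative rule based on the Data Sources table: CBS supplies the
    official Dutch municipalities (admin level 2); GADM covers the rest.
    """
    code = iso_a2.strip().upper()
    if code not in ISO_A2_TO_A3:
        raise KeyError(f"no alpha-3 mapping configured for {iso_a2!r}")
    if code == "NL" and admin_level == 2:
        return ("CBS", code)
    return ("GADM", ISO_A2_TO_A3[code])


print(pick_source("de", 1))  # ('GADM', 'DEU')
print(pick_source("NL", 2))  # ('CBS', 'NL')
```

Failing loudly on an unmapped country keeps a typo in a country list from silently loading nothing.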

### API Endpoints

The PostGIS database will be exposed via a REST API:

```
GET /api/boundaries/countries
GET /api/boundaries/countries/{iso_a2}
GET /api/boundaries/admin1/{iso_a2}
GET /api/boundaries/admin2/{iso_a2}/{admin1_code}
GET /api/boundaries/point?lon={lon}&lat={lat}
GET /api/boundaries/service-area/{ghcid}
GET /api/boundaries/service-area/{ghcid}/geojson
```
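
The point endpoint takes longitude before latitude, which is also the argument order of PostGIS's `ST_MakePoint(x, y)`; swapping the two is a classic silent bug. Below is a minimal sketch of how a handler might validate the query parameters and build a parameterised point-in-polygon query against the current admin2 rows. It assumes geometries are stored in WGS84 (SRID 4326) and uses psycopg2's `%s` placeholder style; the function and constant names are hypothetical.

```python
import math

# Current (valid_to IS NULL) municipality containing the point; SRID 4326 is assumed.
POINT_IN_ADMIN2_SQL = """
SELECT id, geonames_id, cbs_gemeente_code
FROM boundary_admin2
WHERE valid_to IS NULL
  AND ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
"""


def parse_lon_lat(lon_raw: str, lat_raw: str) -> tuple[float, float]:
    """Parse query-string values; raise ValueError on non-numeric or out-of-range input."""
    lon, lat = float(lon_raw), float(lat_raw)
    if not (math.isfinite(lon) and math.isfinite(lat)):
        raise ValueError("coordinates must be finite numbers")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return lon, lat


def point_lookup_query(lon_raw: str, lat_raw: str) -> tuple[str, tuple[float, float]]:
    """Return (sql, params) for cursor.execute(); params are (lon, lat), i.e. (x, y)."""
    lon, lat = parse_lon_lat(lon_raw, lat_raw)
    return POINT_IN_ADMIN2_SQL, (lon, lat)


sql, params = point_lookup_query("4.6462", "52.3874")
print(params)  # (4.6462, 52.3874)
```

Note that `ST_Contains` does not match points lying exactly on a boundary line; `ST_Covers` or `ST_Intersects` would, at the cost of possibly returning two neighbouring municipalities.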

### Frontend Integration

The frontend will:

1. Fetch service area GeoJSON via the API (not static files)
2. Use MapLibre vector tiles for admin boundaries (optional optimization)
3. Cache frequently accessed boundaries in the browser
4. Support historical boundary display with temporal filtering

```typescript
import type { GeoJSONSource } from 'maplibre-gl';

// Example: fetch the service area for a custodian (assumes `map` and `ghcid` are in scope)
const response = await fetch(`/api/boundaries/service-area/${ghcid}/geojson`);
if (!response.ok) throw new Error(`Service area request failed: HTTP ${response.status}`);
const geojson = await response.json();
(map.getSource('werkgebied') as GeoJSONSource).setData(geojson);
```
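
The `/geojson` variant of the service-area endpoint has to return something MapLibre's `setData()` accepts, such as a GeoJSON Feature (RFC 7946). Here is a sketch of the server-side wrapping step, assuming the handler selects `ST_AsGeoJSON(geom)` from `custodian_service_areas` together with the GHCID and `service_area_name`; the function name and the property layout are illustrative choices. RFC 7946 expects WGS84 longitude/latitude, so geometries stored in another SRID would need `ST_Transform(geom, 4326)` first.

```python
import json
from typing import Any


def service_area_feature(ghcid: str, name: str, geometry_json: str) -> dict[str, Any]:
    """Wrap the text returned by ST_AsGeoJSON(geom) in a GeoJSON Feature.

    Assumed query (psycopg2 placeholder style):
      SELECT ghcid, service_area_name, ST_AsGeoJSON(geom)
      FROM custodian_service_areas WHERE ghcid = %s
    """
    geometry = json.loads(geometry_json)
    # Service areas are stored as MULTIPOLYGON; reject anything unexpected early.
    if geometry.get("type") not in {"Polygon", "MultiPolygon"}:
        raise ValueError(f"unexpected geometry type: {geometry.get('type')!r}")
    return {
        "type": "Feature",
        "id": ghcid,
        "geometry": geometry,
        "properties": {"ghcid": ghcid, "service_area_name": name},
    }


# Usage with a toy one-ring geometry (not real boundary data):
feature = service_area_feature(
    "NL-NH-HAA-A-NHA",
    "Noord-Hollands Archief Werkgebied",
    '{"type": "MultiPolygon", "coordinates": '
    '[[[[4.6, 52.3], [4.7, 52.3], [4.7, 52.4], [4.6, 52.3]]]]}',
)
print(feature["geometry"]["type"])  # MultiPolygon
```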

### Temporal Versioning

Boundaries change over time (municipal mergers, etc.), so each boundary version records when it stopped being valid (`valid_to`); the current version has `valid_to IS NULL`:

```sql
-- Current boundary (valid_to IS NULL)
SELECT * FROM boundary_admin2 WHERE cbs_gemeente_code = 'GM0363' AND valid_to IS NULL;

-- Historical boundary (before merger)
SELECT * FROM boundary_admin2 WHERE cbs_gemeente_code = 'GM0363' AND valid_to = '2001-01-01';
```
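
Matching `valid_to` exactly only works when the change date is already known. Temporal filtering is more naturally phrased as "which version was valid on date d". Below is a sketch of that rule on in-memory rows, assuming each version also has a `valid_from` date (this document only shows `valid_to`) and treating validity as the half-open interval `[valid_from, valid_to)`, so a version whose `valid_to` is `2001-01-01` is no longer valid on that day. In SQL the same predicate would read `valid_from <= d AND (valid_to IS NULL OR valid_to > d)`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BoundaryVersion:
    """Hypothetical in-memory boundary_admin2 row; valid_from is an assumed column."""

    id: int
    cbs_gemeente_code: str
    valid_from: date
    valid_to: Optional[date]  # None = current version


def is_valid_on(version: BoundaryVersion, d: date) -> bool:
    """Half-open check: valid_from <= d < valid_to (no upper bound if valid_to is None)."""
    return version.valid_from <= d and (version.valid_to is None or d < version.valid_to)


def version_on(versions: list[BoundaryVersion], code: str, d: date) -> Optional[BoundaryVersion]:
    """Return the version of municipality `code` valid on date d, or None."""
    matches = [v for v in versions if v.cbs_gemeente_code == code and is_valid_on(v, d)]
    if len(matches) > 1:
        raise ValueError(f"overlapping validity periods for {code} on {d}")
    return matches[0] if matches else None


# Usage with illustrative dates (not real municipal history):
versions = [
    BoundaryVersion(1, "GM0363", date(1990, 1, 1), date(2001, 1, 1)),
    BoundaryVersion(2, "GM0363", date(2001, 1, 1), None),
]
print(version_on(versions, "GM0363", date(2000, 12, 31)).id)  # 1
print(version_on(versions, "GM0363", date(2001, 1, 1)).id)  # 2
```

The half-open convention lets one version's `valid_to` equal the next version's `valid_from` without any day where both, or neither, apply.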

### Service Area Computation

Service areas are computed as the union of the admin2 units that a custodian serves:

```sql
-- Compute service area for Noord-Hollands Archief (serves Haarlem + region)
SELECT ST_Union(geom) AS service_area_geom
FROM boundary_admin2
WHERE id IN (SELECT unnest(admin2_ids) FROM custodian_service_areas WHERE ghcid = 'NL-NH-HAA-A-NHA');
```

Alternatively, the geometry can be pre-computed and stored:

```sql
-- Pre-compute and cache
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
VALUES (
    'NL-NH-HAA-A-NHA',
    'Noord-Hollands Archief Werkgebied',
    compute_service_area_geometry(ARRAY[123, 124, 125, 126]), -- admin2 IDs
    ARRAY[123, 124, 125, 126]
);
```
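
If `ghcid` carries a unique constraint, re-running a plain `INSERT` for the same custodian fails with a duplicate-key error. Below is a sketch of an idempotent variant, assuming `UNIQUE (ghcid)` on `custodian_service_areas` (not confirmed here) and a `MULTIPOLYGON` geometry column as in the diagram. `ST_Union` can return a single `POLYGON` when the selected municipalities are contiguous, so wrapping it in `ST_Multi` keeps the result compatible with a typed `MULTIPOLYGON` column. The union is computed inline instead of through `compute_service_area_geometry`, whose definition is not shown in this document; the constant and helper names are hypothetical. psycopg2 adapts a Python list to a PostgreSQL array, which is what `= ANY(...)` expects.

```python
# Assumes UNIQUE (ghcid) on custodian_service_areas and a MULTIPOLYGON geom column.
UPSERT_SERVICE_AREA_SQL = """
INSERT INTO custodian_service_areas (ghcid, service_area_name, geom, admin2_ids)
SELECT %(ghcid)s, %(name)s, ST_Multi(ST_Union(a.geom)), %(admin2_ids)s
FROM boundary_admin2 AS a
WHERE a.id = ANY(%(admin2_ids)s)
HAVING count(*) > 0  -- skip the insert if none of the IDs exist
ON CONFLICT (ghcid) DO UPDATE
SET service_area_name = EXCLUDED.service_area_name,
    geom = EXCLUDED.geom,
    admin2_ids = EXCLUDED.admin2_ids
"""


def upsert_service_area_params(ghcid: str, name: str, admin2_ids: list[int]) -> dict:
    """Build the psycopg2 parameter dict; Python lists are adapted to SQL arrays."""
    if not admin2_ids:
        raise ValueError("a service area needs at least one admin2 unit")
    return {
        "ghcid": ghcid,
        "name": name,
        # Deduplicate and sort so repeated runs store an identical array.
        "admin2_ids": sorted({int(i) for i in admin2_ids}),
    }


# Usage (cursor is a psycopg2 cursor, not shown):
params = upsert_service_area_params(
    "NL-NH-HAA-A-NHA", "Noord-Hollands Archief Werkgebied", [125, 123, 124, 126, 124]
)
print(params["admin2_ids"])  # [123, 124, 125, 126]
# cursor.execute(UPSERT_SERVICE_AREA_SQL, params)
```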

## Implementation Plan

### Phase 1: Schema & Initial Data (Current Sprint)

- [x] Create PostGIS schema (`002_postgis_boundaries.sql`)
- [x] Create boundary loading script (`load_boundaries_postgis.py`)
- [ ] Load Netherlands boundaries (CBS + provinces)
- [ ] Load HALC historical boundaries
- [ ] Migrate existing GeoJSON data

### Phase 2: International Expansion

- [ ] Load GADM for priority countries: JP, CZ, DE, BE, CH, AT
- [ ] Load GeoNames settlements for reverse geocoding
- [ ] Create API endpoints for boundary queries
- [ ] Update frontend to use API instead of static files

### Phase 3: Service Area Management

- [ ] Compute service areas for existing custodians
- [ ] Create admin UI for service area editing
- [ ] Implement temporal boundary display
- [ ] Add vector tile generation (optional optimization)

## Files Created

| File | Description |
|------|-------------|
| `infrastructure/sql/002_postgis_boundaries.sql` | PostGIS schema for boundaries |
| `scripts/load_boundaries_postgis.py` | Python script to load boundary data |
| `docs/POSTGIS_BOUNDARY_ARCHITECTURE.md` | This document |

## Dependencies

- PostgreSQL 14+ with PostGIS 3.3+
- Python: `psycopg2`, `geopandas`, `shapely`
- GADM data (downloaded on demand)
- CBS GeoJSON (existing in `frontend/public/data/`)

## Migration from Static GeoJSON

The current static GeoJSON approach will be **deprecated** but **not immediately removed**:

1. PostGIS becomes the source of truth for boundaries
2. API serves boundary GeoJSON on demand
3. Static files remain as fallback for development
4. Frontend gradually migrates to API-based loading