# Kamp Westerbork Digital Collection - Data Harvesting Analysis

**Date**: 2025-11-17  
**Institution**: Herinneringscentrum Kamp Westerbork  
**URL**: https://collecties.kampwesterbork.nl/  
**ISIL Code**: NL-HhlHCKW

---

## Executive Summary

Herinneringscentrum Kamp Westerbork operates a digital collection platform built on **Atlantis CMS** with **IIIF** image services, **Spinque** search backend, and **Sanity.io** content management. The platform uses persistent HTTP URIs for linked data but has **limited public API access** - harvesting requires reverse-engineering Next.js data endpoints and the Spinque search API.

---

## Collections Management System

### Primary System: **Atlantis** (by Atlantis Erfgoed)

- **Provider**: [Atlantis Erfgoed](https://www.atlantis-erfgoed.nl/)
- **Type**: Web-based archive and collection management system
- **Standards**: ISAD(G), ISAAR(CPF), ISDF (archival standards)
- **Confirmed Usage**: Listed in Dutch Organizations CSV as using "Atlantis"

**Atlantis Features**:
- Multimedia information system (links images, audio/video, documents)
- Depot management for physical storage tracking
- Integrated online publication platform
- Plugin-based architecture for extensibility

**Reference**: Dutch Organizations CSV confirms "Atlantis" as their system:
```
Drenthe,Hooghalen,Oosthalen 8,Stichting Herinneringscentrum Kamp Westerbork,,https://kampwesterbork.nl/ ,museum,,NL-HhlHCKW,,Atlantis,ja,,ja
```

---

## Digital Infrastructure Architecture

### 1. **Frontend Framework**

- **Technology**: Next.js (React-based, server-side rendered)
- **Build ID**: `b6DuV8zPMsiXPQxO3Lrq8` (visible in `/_next/static/` paths)
- **Deployment**: Static assets via `_next/static/chunks/`

**Key Observation**: The site uses Next.js **data fetching** via JSON endpoints like:
```
https://collecties.kampwesterbork.nl/_next/data/b6DuV8zPMsiXPQxO3Lrq8/nl/zoeken.json?term=Amsterdam
```

This is a **harvestable endpoint** for search results!

---

### 2. **Search Backend: Spinque API**

**Endpoint**: `https://collecties.kampwesterbork.nl/api/spinque-proxy`  
**Method**: POST  
**Status**: ✅ **Active and responding** (observed multiple 200 OK responses during search)

**Observed Behavior**:
- 3 POST requests fired on search query "Amsterdam"
- Likely structure: `{query: "Amsterdam", page: 1, filters: {...}}`
- Returns JSON with search results (persons, works, entities)

**Search Result Statistics** (from "Amsterdam" query):
- **65,581 total results**
- **63,659 persons** (5,305 pages at ~12 results/page)
- **Documents**: Brief (894), Briefkaart (179), Bewijs van verzending (103), etc.
- Pagination: `/zoeken?term=Amsterdam&page=2` (URL-based) and `&personPage=2` (person-specific)

**Harvesting Strategy**:
1. Intercept or reverse-engineer Spinque API calls
2. Use Chrome DevTools Network tab to capture POST request payload
3. Replicate requests with Python `requests` library
4. Iterate through pagination parameters

---

### 3. **IIIF Image API 3.0**

**Base URL**: `https://kenniscentrum.kampwesterbork.nl/iiif/image/3.0/`

**Format**: Standard IIIF Image API
```
https://kenniscentrum.kampwesterbork.nl/iiif/image/3.0/{IMAGE_ID}/full/{SIZE}/0/default.jpg
```

**Examples**:
- `17614498/full/!100,100/0/default.jpg` (thumbnail, max 100x100px)
- `17614498/full/!300,300/0/default.jpg` (medium, max 300x300px)
- `17614498/full/!600,600/0/default.jpg` (large, max 600x600px)
- `17614498/full/max/0/default.jpg` (full resolution)

**IIIF Info Document**: `https://kenniscentrum.kampwesterbork.nl/iiif/image/3.0/{IMAGE_ID}/info.json`

**Compliance**: Fully IIIF 3.0 compliant ✅

**Harvesting**: Use IIIF manifest URLs if available, or construct URLs from image IDs

---

### 4. **Content Management: Sanity.io**

**CDN**: `https://cdn.sanity.io/images/e44e8rzu/production/`  
**Project ID**: `e44e8rzu`  
**Dataset**: `production`

**Image Examples**:
```
https://cdn.sanity.io/images/e44e8rzu/production/daba9f3b89cd9c794d4baa061296c803705bcf55-658x548.jpg?w=1599&fit=max
```

**Sanity API Endpoint** (potentially accessible):
```
https://e44e8rzu.api.sanity.io/v2021-10-21/data/query/production?query=*[_type == "story"]
```

**Use Case**: Sanity stores editorial content (stories/verhalen), not collection metadata

**Harvesting**: Query Sanity API for stories, but collection data is in Atlantis/Spinque

---

## Persistent Identifier Schemes

### 1. **Work/Object IDs**

**Format**: `https://data.kampwesterbork.nl/work/{IDENTIFIER}`

**Examples**:
- `TH0000017587` (ThesisHolocaust collection prefix)
- `HCIS.00011671` (Herinneringscentrum Information System)

**Web URL Structure** (URL-encoded PIDs):
```
https://collecties.kampwesterbork.nl/werk/https%3A%2F%2Fdata.kampwesterbork.nl%2Fwork%2FTH0000017587
```

**Note**: Direct access to `https://data.kampwesterbork.nl/work/TH0000017587` returns **connection refused** (likely internal-only or requires authentication).

---

### 2. **Person IDs**

**Format**: `https://kampwesterbork.nl/data/person/{NUMERIC_ID}`

**Examples**:
- `10907179` (Marie Schönberg-Amsterdam)
- `13442221` (Abraham Peekel)
- `14808721` (Henriette Kalker)

**Web URL Structure**:
```
https://collecties.kampwesterbork.nl/persoon/https%3A%2F%2Fkampwesterbork.nl%2Fdata%2Fperson%2F10907179
```

**Harvesting**: Person IDs are visible in search results - can be extracted from Spinque API responses.

---

### 3. **Entity/Concept IDs** (Thesaurus)

**Format**: `https://digitaalerfgoed.poolparty.biz/westerbork/{ID}`

**Examples**:
- `joodse%20gemeente%20nieuw-amsterdam` (Joodse gemeente Nieuw-Amsterdam)
- `doorgangskamp%20westerbork` (Doorgangskamp Westerbork periods)

**System**: **PoolParty Semantic Suite** (controlled vocabulary management)

**Use Case**: Subject headings, locations, events, organizational changes (1939-1971 camp periods)

---

### 4. **Image IDs**

**Format**: Simple numeric identifiers (no URI prefix)

**Examples**: `17614498`, `17614499`, `15557762`

**Retrieval**: Via IIIF Image API using numeric ID

---

## Data Harvesting Strategies

### Strategy 1: **Next.js Data Endpoints** ⭐ RECOMMENDED

**Approach**: Scrape Next.js JSON data files for server-side rendered pages

**Example Endpoint**:
```bash
curl "https://collecties.kampwesterbork.nl/_next/data/b6DuV8zPMsiXPQxO3Lrq8/nl/zoeken.json?term=Amsterdam&page=1"
```

**Advantages**:
- ✅ Returns structured JSON (not HTML parsing)
- ✅ Contains all data rendered on page (persons, works, entities)
- ✅ Pagination via `?page=N` parameter
- ✅ No authentication required

**Disadvantages**:
- ⚠️ Build ID (`b6DuV8zPMsiXPQxO3Lrq8`) may change on redeployment
- ⚠️ Requires discovering build ID from homepage HTML or `_buildManifest.js`

**Implementation**:
```python
import requests

# 1. Discover current build ID
response = requests.get("https://collecties.kampwesterbork.nl/")
build_id = extract_build_id_from_html(response.text)  # Parse from