# UNESCO Data Extraction: Dependencies

## Overview

This document catalogs all dependencies for extracting UNESCO World Heritage Site data and integrating it with the Global GLAM Dataset.

## External Data Dependencies

### 1. UNESCO DataHub

**Primary Source**:
- **URL**: https://data.unesco.org/
- **Platform**: OpenDataSoft (commercial open data platform)
- **World Heritage Dataset**: https://data.unesco.org/explore/dataset/whc001/
- **API Documentation**: https://help.opendatasoft.com/apis/ods-explore-v2/

**⚠️ CRITICAL FINDING**: UNESCO DataHub uses **OpenDataSoft's proprietary Explore API v2** (NOT OAI-PMH or a standard REST convention).

**API Architecture**:
- **API Version**: Explore API v2.0 and v2.1
- **Base Endpoints**: `https://data.unesco.org/api/explore/v2.0/` and `https://data.unesco.org/api/explore/v2.1/`
- **Query Language**: ODSQL (OpenDataSoft's SQL-like query language, NOT standard SQL)
- **Response Format**: JSON
- **Authentication**: none required for public datasets (optional for private data)

**Available Formats**:
- JSON via the OpenDataSoft Explore API (primary extraction method)
- CSV export via the portal
- DCAT-AP metadata
- RDF/XML (via OpenDataSoft serialization)
- GeoJSON (for geographic datasets)

**Rate Limits**:
- Public API: ~100 requests/minute (OpenDataSoft default, to be verified)
- Bulk download: available via CSV/JSON export from the portal
- SPARQL: NOT available (OpenDataSoft uses ODSQL instead)

**Authentication**: none required for public data

**Legacy JSON Endpoint** (BLOCKED):
- URL: https://whc.unesco.org/en/list/json/
- Status: returns 403 Forbidden (Cloudflare protection)
- Mitigation: use the OpenDataSoft API instead
### 2. Wikidata Integration

**Purpose**: Enrich UNESCO sites with Wikidata Q-numbers

**Endpoint**: https://query.wikidata.org/sparql

**Query Pattern**:

```sparql
SELECT ?item ?itemLabel ?whcID WHERE {
  ?item wdt:P757 ?whcID .  # UNESCO World Heritage Site ID (P757)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```

**Usage**:
- Cross-reference UNESCO WHC IDs with Wikidata
- Enrich with additional identifiers (VIAF, GND, etc.)
- Extract alternative names in multiple languages
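The query pattern above returns results in the standard SPARQL 1.1 JSON results format; a minimal sketch of turning such a response into a WHC-ID → Q-number lookup (the sample response below is illustrative, not real Wikidata output):

```python
# Sketch: turn standard SPARQL 1.1 JSON results into a {whc_id: qid} lookup.
# The binding names (item, whcID) match the query pattern above; the sample
# response is illustrative only.

def whc_lookup_from_sparql_json(results: dict) -> dict[str, str]:
    """Map UNESCO WHC IDs to Wikidata Q-numbers from a SPARQL JSON response."""
    lookup = {}
    for binding in results["results"]["bindings"]:
        qid = binding["item"]["value"].rsplit("/", 1)[-1]  # entity URI -> Q-number
        lookup[binding["whcID"]["value"]] = qid
    return lookup

sample = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q5086"},
                "whcID": {"type": "literal", "value": "91"},
            }
        ]
    }
}

print(whc_lookup_from_sparql_json(sample))  # {'91': 'Q5086'}
```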
### 3. GeoNames

**Purpose**: Geocode locations, generate UN/LOCODE for GHCID

**Endpoint**: http://api.geonames.org/

**API Key**: Required (free tier: 20,000 requests/day)

**Usage**:
- Validate coordinates
- Look up UN/LOCODE for cities
- Resolve place names to GeoNames IDs
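The "validate coordinates" step can start as a simple bounds check before a GeoNames request is spent on a lookup; a minimal sketch:

```python
# Minimal sketch of coordinate validation: reject values outside WGS84 bounds
# before calling GeoNames at all.

def valid_coordinates(lat: float, lon: float) -> bool:
    """True if (lat, lon) falls within valid latitude/longitude ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

print(valid_coordinates(41.9, 12.5))   # True  (sample coordinates for Rome)
print(valid_coordinates(95.0, 12.5))   # False (latitude out of range)
```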
### 4. ISO Standards Data

**ISO 3166-1 alpha-2** (Country Codes):
- Source: `pycountry` library
- Purpose: Standardize country identifiers for GHCID

**ISO 3166-2** (Region Codes):
- Source: `pycountry` library
- Purpose: Region codes for GHCID generation
## Python Library Dependencies

### Core Libraries

```toml
[tool.poetry.dependencies]
python = "^3.11"

# HTTP & API clients
httpx = "^0.25.0"           # Async HTTP for UNESCO API
aiohttp = "^3.9.0"          # Alternative async HTTP
requests-cache = "^1.1.0"   # Cache API responses
# NOTE: the OpenDataSoft API is plain HTTP/JSON, so no special client library
# is needed; ODSQL queries are passed as URL parameters to the Explore API v2.

# Data processing
pandas = "^2.1.0"           # Tabular data manipulation
pydantic = "^2.4.0"         # Data validation
jsonschema = "^4.19.0"      # JSON schema validation

# LinkML ecosystem
linkml = "^1.6.0"           # Schema tools
linkml-runtime = "^1.6.0"   # Runtime validation
linkml-map = "^0.1.0"       # Mapping transformations ⚠️ CRITICAL

# RDF & Semantic Web
rdflib = "^7.0.0"           # RDF graph manipulation
sparqlwrapper = "^2.0.0"    # SPARQL queries
jsonld = "^1.8.0"           # JSON-LD processing

# Geographic data
geopy = "^2.4.0"            # Geocoding
pycountry = "^22.3.5"       # Country/region codes
shapely = "^2.0.0"          # Geometric operations (optional)

# Text processing
unidecode = "^1.3.7"        # Unicode normalization
ftfy = "^6.1.1"             # Fix text encoding
rapidfuzz = "^3.5.0"        # Fuzzy matching

# Utilities
tqdm = "^4.66.0"            # Progress bars
rich = "^13.7.0"            # Rich terminal output
loguru = "^0.7.0"           # Structured logging
```

### Development Dependencies

```toml
[tool.poetry.group.dev.dependencies]
# Testing
pytest = "^7.4.0"
pytest-asyncio = "^0.21.0"  # Async test support
pytest-cov = "^4.1.0"       # Coverage
pytest-mock = "^3.12.0"     # Mocking
hypothesis = "^6.92.0"      # Property-based testing
responses = "^0.24.0"       # Mock HTTP responses

# Code quality
ruff = "^0.1.0"             # Fast linter
mypy = "^1.7.0"             # Type checking
pre-commit = "^3.5.0"       # Git hooks

# Documentation
mkdocs = "^1.5.0"
mkdocs-material = "^9.5.0"
```
### OpenDataSoft API Integration

**⚠️ CRITICAL**: UNESCO uses OpenDataSoft's proprietary Explore API, NOT a standard protocol like OAI-PMH.

**ODSQL Query Language**:
- **Type**: SQL-like query language (NOT standard SQL)
- **Purpose**: Filter, aggregate, and query OpenDataSoft datasets
- **Syntax**: similar to SQL SELECT/WHERE/GROUP BY, but with custom functions
- **Documentation**: https://help.opendatasoft.com/apis/ods-explore-v2/#odsql

**Example ODSQL Query**:

```python
# Fetch UNESCO World Heritage Sites inscribed after 2000 in Europe
import requests

url = "https://data.unesco.org/api/explore/v2.0/catalog/datasets/whc001/records"
params = {
    "select": "site_name, date_inscribed, latitude, longitude",
    "where": "date_inscribed > 2000 AND region = 'Europe'",
    "limit": 100,
    "offset": 0,
    "order_by": "date_inscribed DESC",
}
response = requests.get(url, params=params)
response.raise_for_status()
data = response.json()
```
**Key Differences from Standard REST APIs**:
- Queries are passed as URL parameters (not a POST body)
- ODSQL syntax in the `where`, `select`, and `group_by` parameters
- Pagination via `limit` and `offset` (no standard cursors)
- Response structure: `{"total_count": N, "results": [...]}`

**Implementation Strategy**:
- Use `requests` or `httpx` for HTTP calls (no special client library)
- Build ODSQL queries programmatically as strings
- Handle pagination with `limit`/`offset` loops
- Cache responses with `requests-cache`
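The `limit`/`offset` loop can be sketched as a generator that takes the page-fetching function as a parameter, so the same logic works with `requests`, `httpx`, or a cached session; `fetch_page` and the stub data below are assumptions for illustration:

```python
from typing import Callable, Iterator

# Sketch of the limit/offset pagination loop. `fetch_page(limit, offset)` is an
# injected callable (an assumption for illustration) returning the Explore API
# v2 response shape: {"total_count": N, "results": [...]}.

def iter_records(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every record, requesting `limit` rows per page until exhausted."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        results = page["results"]
        yield from results
        offset += len(results)
        if offset >= page["total_count"] or not results:
            break

# Offline demo with a stub standing in for the HTTP call:
DATA = [{"whc_id": i} for i in range(250)]

def fake_fetch(limit: int, offset: int) -> dict:
    return {"total_count": len(DATA), "results": DATA[offset:offset + limit]}

records = list(iter_records(fake_fetch))
print(len(records))  # 250
```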
### Critical: LinkML Map Extension

**⚠️ IMPORTANT**: This project requires **LinkML Map with conditional XPath/JSONPath/regex support**.

**Status**: To be implemented as a bespoke extension

**Required Features**:
1. **Conditional extraction** - extract based on field value conditions
2. **XPath support** - navigate XML/HTML structures from UNESCO
3. **JSONPath support** - query complex JSON API responses
4. **Regex patterns** - extract and validate identifier formats
5. **Multi-value handling** - map arrays to multivalued LinkML slots

**Implementation Path**:
- Extend the `linkml-map` package, OR
- Create a custom `linkml_map_extended` module in the project

**Example Usage**:

```yaml
# LinkML Map with conditional XPath
mappings:
  - source: unesco_site
    target: HeritageCustodian
    conditions:
      - path: "$.category"
        pattern: "^(Cultural|Mixed)$"  # Only cultural/mixed sites
    transforms:
      - source_path: "$.site_name"
        target_slot: name
      - source_path: "$.date_inscribed"
        target_slot: founded_date
        transform: "parse_year"
      - source_xpath: "//description[@lang='en']/text()"
        target_slot: description
```
## Schema Dependencies

### LinkML Global GLAM Schema (v0.2.1)

**Modules Required**:
- `schemas/core.yaml` - HeritageCustodian, Location, Identifier
- `schemas/enums.yaml` - InstitutionTypeEnum, DataSource
- `schemas/provenance.yaml` - Provenance, ChangeEvent
- `schemas/collections.yaml` - Collection metadata

**New Enums to Add**:

```yaml
# schemas/enums.yaml - Add to DataSource
DataSource:
  permissible_values:
    UNESCO_WORLD_HERITAGE:
      description: UNESCO World Heritage Centre API
      meaning: heritage:UNESCODataHub
```
### Ontology Dependencies

**Required Ontologies**:

1. **CPOV** (Core Public Organisation Vocabulary)
   - Path: `/data/ontology/core-public-organisation-ap.ttl`
   - Usage: Map UNESCO sites to public organizations

2. **Schema.org**
   - Path: `/data/ontology/schemaorg.owl`
   - Usage: `schema:Museum`, `schema:TouristAttraction`, `schema:Place`

3. **CIDOC-CRM** (Cultural Heritage)
   - Path: `/data/ontology/CIDOC_CRM_v7.1.3.rdf`
   - Usage: `crm:E53_Place`, `crm:E74_Group` for heritage sites

4. **UNESCO Thesaurus**
   - URL: http://vocabularies.unesco.org/thesaurus/
   - Usage: Map UNESCO criteria to the controlled vocabulary
   - Format: SKOS (RDF)

**To Download**:

```bash
# UNESCO Thesaurus
curl -o data/ontology/unesco_thesaurus.ttl \
  http://vocabularies.unesco.org/thesaurus/concept.ttl

# Verify CIDOC-CRM is present
ls data/ontology/CIDOC_CRM_v7.1.3.rdf
```
## External Service Dependencies

### 1. UNESCO OpenDataSoft API Service

**Platform**: OpenDataSoft Explore API v2.0/v2.1

**Primary Endpoint**: https://data.unesco.org/api/explore/v2.0/catalog/datasets/whc001/records

**Legacy Endpoint** (BLOCKED):
- `https://whc.unesco.org/en/list/json/`
- Status: returns 403 Forbidden (Cloudflare protection)
- Action: do NOT use - use the OpenDataSoft API instead

**Availability**: 99.5% uptime (estimated, OpenDataSoft SLA)

**API Response Structure** (additional fields omitted):

```json
{
  "total_count": 1199,
  "results": [
    {
      "site_name": "Historic Centre of Rome",
      "date_inscribed": 1980,
      "category": "Cultural",
      "latitude": 41.9,
      "longitude": 12.5,
      "criteria_txt": "(i)(ii)(iii)(iv)(vi)",
      "states_name_en": "Italy"
    }
  ]
}
```
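The `criteria_txt` field packs all inscription criteria into one string; a small sketch of splitting it into individual criterion codes, e.g. for the `unesco_criteria` table described later:

```python
import re

# Sketch: split a packed criteria string like "(i)(ii)(iii)(iv)(vi)" into
# individual criterion codes, following the "(i)".."(x)" roman-numeral form.

def parse_criteria(criteria_txt: str) -> list[str]:
    """Return criterion codes, e.g. '(i)(ii)' -> ['(i)', '(ii)']."""
    return [f"({code})" for code in re.findall(r"\(([ivx]+)\)", criteria_txt)]

print(parse_criteria("(i)(ii)(iii)(iv)(vi)"))
# ['(i)', '(ii)', '(iii)', '(iv)', '(vi)']
```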
**Pagination**:
- Use the `limit` (max 100 per request) and `offset` parameters
- The total number of records is available in the `total_count` field
- No cursor-based pagination
**Fallback Strategy**:
- Cache API responses locally (`requests-cache` with a 24-hour TTL)
- Implement exponential backoff on 5xx errors
- Fall back to the CSV export download if the API is unavailable for >1 hour
- CSV export URL: `https://data.unesco.org/api/explore/v2.0/catalog/datasets/whc001/exports/csv`
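The exponential-backoff item above can be sketched as a retry wrapper with the request and sleep functions injected; here `ConnectionError` stands in for a 5xx response, which is an assumption for illustration (in practice this would wrap an `httpx` call that raises on 5xx status codes):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Sketch of exponential backoff on transient (5xx-style) failures. Injecting
# the request callable and the sleep function keeps the policy testable
# without real HTTP traffic.

def with_backoff(
    request: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with delays of base_delay * 2**attempt."""
    for attempt in range(retries):
        try:
            return request()
        except ConnectionError:  # stand-in for "got a 5xx response"
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")

# Offline demo: fail twice, then succeed; record delays instead of sleeping.
calls = {"n": 0}
delays: list[float] = []

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("HTTP 503")
    return "ok"

print(with_backoff(flaky, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```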
**Monitoring**:
- Log all API response times
- Alert if >5% of requests fail or response time exceeds 2s
- Track `total_count` for dataset size changes
- Monitor for schema changes via response validation
### 2. Wikidata SPARQL Endpoint

**Endpoint**: https://query.wikidata.org/sparql

**Rate Limits**:
- 60 requests/minute (public)
- User-Agent header required

**Fallback Strategy**:
- Cache SPARQL results (24-hour TTL)
- Batch queries to minimize requests
- Skip enrichment if the endpoint is unavailable

### 3. GeoNames API

**Endpoint**: http://api.geonames.org/

**Rate Limits**:
- Free tier: 20,000 requests/day
- Premium: 200,000 requests/day

**API Key Storage**: environment variable `GEONAMES_API_KEY`

**Fallback Strategy**:
- Use UNESCO-provided coordinates as primary
- GeoNames lookup only for UN/LOCODE generation
- Cache lookups indefinitely (place names rarely change)
## File System Dependencies

### Required Directories

```
glam-extractor/
├── data/
│   ├── unesco/
│   │   ├── raw/                     # Raw API responses
│   │   ├── cache/                   # HTTP cache
│   │   ├── instances/               # LinkML instances
│   │   └── mappings/                # LinkML Map schemas
│   ├── ontology/
│   │   ├── unesco_thesaurus.ttl     # UNESCO vocabulary
│   │   └── [existing ontologies]
│   └── reference/
│       └── unesco_criteria.yaml     # Criteria definitions
├── src/glam_extractor/
│   ├── extractors/
│   │   └── unesco.py                # UNESCO extractor
│   ├── mappers/
│   │   └── unesco_mapper.py         # LinkML Map integration
│   └── validators/
│       └── unesco_validator.py      # UNESCO-specific validation
└── tests/
    └── unesco/
        ├── fixtures/                # Test data
        └── test_unesco_*.py         # Test modules
```
## Environment Variables

```bash
# .env file
GEONAMES_API_KEY=your_key_here
UNESCO_API_CACHE_TTL=86400   # 24 hours
WIKIDATA_RATE_LIMIT=60       # requests/minute
LOG_LEVEL=INFO
```
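A possible stdlib-only sketch of reading these variables with typed defaults mirroring the `.env` values above:

```python
import os

# Sketch: read the configuration above with typed defaults, so missing
# variables fall back to the documented values instead of failing.

def env_int(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to `default`."""
    raw = os.getenv(name)
    return int(raw) if raw else default

CACHE_TTL = env_int("UNESCO_API_CACHE_TTL", 86400)        # seconds
WIKIDATA_RATE_LIMIT = env_int("WIKIDATA_RATE_LIMIT", 60)  # requests/minute
GEONAMES_API_KEY = os.getenv("GEONAMES_API_KEY", "")      # required for GeoNames
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

print(CACHE_TTL, WIKIDATA_RATE_LIMIT, LOG_LEVEL)
```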
## Database Dependencies

### DuckDB (Intermediate Storage)

**Purpose**: Store extracted UNESCO data during processing

**Schema**:

```sql
CREATE TABLE unesco_sites (
    whc_id INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL,
    country TEXT,
    category TEXT,              -- Cultural, Natural, Mixed
    date_inscribed INTEGER,
    latitude DOUBLE,
    longitude DOUBLE,
    raw_json TEXT,              -- Full API response
    extracted_at TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);

CREATE TABLE unesco_criteria (
    whc_id INTEGER,
    criterion_code TEXT,        -- (i), (ii), ..., (x)
    criterion_text TEXT,
    FOREIGN KEY (whc_id) REFERENCES unesco_sites(whc_id)
);
```
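The DDL above is portable enough to smoke-test with the stdlib `sqlite3` module; in the pipeline itself the same statements would presumably run through a DuckDB connection, and the Rome row is illustrative sample data:

```python
import sqlite3

# Smoke-test of the schema above using stdlib sqlite3 (the DDL is portable;
# in the pipeline the same statements would run against DuckDB instead).
DDL = """
CREATE TABLE unesco_sites (
    whc_id INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL,
    country TEXT,
    category TEXT,
    date_inscribed INTEGER,
    latitude DOUBLE,
    longitude DOUBLE,
    raw_json TEXT,
    extracted_at TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);
CREATE TABLE unesco_criteria (
    whc_id INTEGER,
    criterion_code TEXT,
    criterion_text TEXT,
    FOREIGN KEY (whc_id) REFERENCES unesco_sites(whc_id)
);
"""

con = sqlite3.connect(":memory:")
con.executescript(DDL)
con.execute(
    "INSERT INTO unesco_sites (whc_id, site_name, country, category, date_inscribed) "
    "VALUES (?, ?, ?, ?, ?)",
    (91, "Historic Centre of Rome", "Italy", "Cultural", 1980),  # sample row
)
con.executemany(
    "INSERT INTO unesco_criteria (whc_id, criterion_code) VALUES (?, ?)",
    [(91, c) for c in ("(i)", "(ii)", "(iii)", "(iv)", "(vi)")],
)
rows = con.execute(
    "SELECT s.site_name, COUNT(*) FROM unesco_sites s "
    "JOIN unesco_criteria c ON c.whc_id = s.whc_id GROUP BY s.site_name"
).fetchall()
print(rows)  # [('Historic Centre of Rome', 5)]
```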
### SQLite (Cache)

**Purpose**: HTTP response caching (via `requests-cache`)

**Auto-managed**: the library handles the schema
## Network Dependencies

### Required Network Access

- `data.unesco.org` (port 443) - UNESCO OpenDataSoft API (primary)
- `whc.unesco.org` (port 443) - UNESCO WHC website (legacy, blocked by Cloudflare)
- `query.wikidata.org` (port 443) - Wikidata SPARQL
- `api.geonames.org` (ports 80/443) - GeoNames API
- `vocabularies.unesco.org` (port 443) - UNESCO Thesaurus

**Note**: The legacy `whc.unesco.org/en/list/json/` endpoint is protected by Cloudflare and returns 403. Use the `data.unesco.org` OpenDataSoft API instead.

### Offline Mode

**Supported**: Yes, with cached data

**Requirements**:
- Pre-downloaded UNESCO dataset snapshot
- Cached Wikidata SPARQL results
- GeoNames offline database (optional)

**Activate**:

```bash
export GLAM_OFFLINE_MODE=true
```
## Version Constraints

### Python Version
- **Minimum**: Python 3.11
- **Recommended**: Python 3.11 or 3.12
- **Reason**: modern type hints, performance improvements

### LinkML Version
- **Minimum**: linkml 1.6.0
- **Reason**: required for the latest schema features

### API Versions
- **UNESCO OpenDataSoft API**: Explore API v2.0 and v2.1 (both supported)
- **Wikidata SPARQL**: SPARQL 1.1 standard
- **GeoNames API**: v3.0

**Breaking Change Detection**:
- Monitor UNESCO API responses for schema changes
- The test suite validates responses against the expected structure
- Alert if field names or types change unexpectedly
## Dependency Installation

### Full Installation

```bash
# Clone repository
git clone https://github.com/user/glam-dataset.git
cd glam-dataset

# Install with Poetry (includes UNESCO dependencies)
poetry install

# Download UNESCO Thesaurus
mkdir -p data/ontology
curl -o data/ontology/unesco_thesaurus.ttl \
  http://vocabularies.unesco.org/thesaurus/concept.ttl

# Set up environment
cp .env.example .env
# Edit .env with your GeoNames API key
```

### Minimal Installation (Testing Only)

```bash
poetry install --without dev
# Use cached fixtures, no API access
```
## Dependency Risk Assessment

### High Risk

1. **LinkML Map Extension** ⚠️
   - **Risk**: custom extension not yet implemented
   - **Mitigation**: prioritize in Phase 1; create a fallback pure-Python mapper
   - **Timeline**: weeks 1-2

2. **UNESCO OpenDataSoft API Stability**
   - **Risk**: schema changes in the OpenDataSoft platform or dataset fields
   - **Mitigation**:
     - Version API responses in cache filenames
     - Automated schema validation tests
     - Monitor `total_count` for unexpected changes
   - **Detection**: weekly automated tests comparing field names/types
   - **Fallback**: CSV export download if breaking API changes are detected

3. **ODSQL Query Language Changes**
   - **Risk**: OpenDataSoft updates ODSQL syntax in breaking ways
   - **Mitigation**:
     - Pin to Explore API v2.0 (stable version)
     - Test queries against the live API weekly
     - Document the ODSQL syntax version in code comments
   - **Impact**: query failures; ODSQL expressions must be rewritten

### Medium Risk

1. **GeoNames Rate Limits**
   - **Risk**: exceeding the free tier (20,000 requests/day)
   - **Mitigation**: aggressive caching, batch processing
   - **Fallback**: manual UN/LOCODE lookup table

2. **Wikidata Endpoint Load**
   - **Risk**: SPARQL endpoint slow or unavailable
   - **Mitigation**: timeout handling; skip enrichment on failure
   - **Impact**: reduced data quality, but no blocking errors

### Low Risk

1. **Python Library Updates**
   - **Risk**: breaking changes in dependencies
   - **Mitigation**: pin versions in `poetry.lock`
   - **Testing**: CI/CD catches incompatibilities
## Monitoring Dependencies

### Health Checks

```python
# tests/test_dependencies.py
import os

import httpx


def test_unesco_opendatasoft_api_available():
    """Verify the UNESCO OpenDataSoft API is accessible."""
    url = "https://data.unesco.org/api/explore/v2.0/catalog/datasets/whc001/records"
    response = httpx.get(url, params={"limit": 1}, timeout=10)
    assert response.status_code == 200

    data = response.json()
    assert "total_count" in data
    assert "results" in data
    assert data["total_count"] > 1000  # Expect 1000+ UNESCO sites


def test_unesco_legacy_api_blocked():
    """Verify the legacy JSON endpoint is indeed blocked."""
    url = "https://whc.unesco.org/en/list/json/"
    response = httpx.get(url, timeout=10, follow_redirects=True)
    # Expect 403 Forbidden due to Cloudflare protection
    assert response.status_code == 403


def test_wikidata_sparql_available():
    """Verify the Wikidata SPARQL endpoint."""
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    assert len(results["results"]["bindings"]) > 0


def test_geonames_api_key():
    """Verify the GeoNames account is valid (GeoNames authenticates by username)."""
    username = os.getenv("GEONAMES_API_KEY")
    assert username, "GEONAMES_API_KEY not set in environment"

    url = "http://api.geonames.org/getJSON"
    params = {"geonameId": 6295630, "username": username}  # 6295630 = Earth
    response = httpx.get(url, params=params, timeout=10)

    assert response.status_code == 200
    assert "geonameId" in response.json()
```
### Dependency Update Schedule

- **Weekly**: security patches (automated via Dependabot)
- **Monthly**: minor version updates (manual review)
- **Quarterly**: major version updates (full regression testing)

## Related Documentation

- **Implementation Phases**: `docs/plan/unesco/03-implementation-phases.md`
- **LinkML Map Schema**: `docs/plan/unesco/06-linkml-map-schema.md`
- **TDD Strategy**: `docs/plan/unesco/04-tdd-strategy.md`
---

**Version**: 1.1
**Date**: 2025-11-09
**Status**: Updated - OpenDataSoft API architecture documented
**Changelog**:
- v1.1 (2025-11-09): updated to reflect the OpenDataSoft Explore API v2 (not OAI-PMH)
  - Added ODSQL query language documentation
  - Updated API endpoints and response structures
  - Added health checks for the OpenDataSoft API
  - Documented legacy endpoint blocking (Cloudflare 403)
- v1.0 (2025-11-09): initial version