# UNESCO Data Consumers and Use Cases

**Project**: Global GLAM Dataset - UNESCO World Heritage Sites Extraction
**Document**: 02 - Data Consumers
**Version**: 1.0
**Date**: 2025-11-09
**Status**: Draft

---

## Executive Summary

This document identifies the primary consumers and use cases for UNESCO World Heritage Site data integrated into the Global GLAM Dataset. UNESCO sites represent authoritative (TIER_1) heritage custodian data with global coverage across 167 countries, making them valuable for diverse stakeholders ranging from academic researchers to tourism applications.

---

## Primary Data Consumers

### 1. Heritage Researchers and Academic Institutions

**Profile**: Scholars studying global cultural heritage networks, institutional relationships, and collection preservation patterns.

**Use Cases**:
- **Network Analysis**: Mapping relationships between UNESCO sites and regional/national GLAM institutions
- **Collection Provenance Research**: Tracing cultural objects through institutional history (mergers, relocations, repatriation)
- **Comparative Studies**: Cross-regional analysis of heritage management practices
- **Citation Systems**: Persistent identifiers (GHCID, UUID v5) for academic references

**Key Requirements**:
- High-quality provenance metadata (data_tier: TIER_1_AUTHORITATIVE)
- Complete identifier sets (UNESCO WHC ID, Wikidata Q-numbers, ISIL codes where applicable)
- Temporal data (founding dates, organizational change events)
- Multilingual name support (UNESCO sites have names in multiple languages)

**Integration Points**:
- Export as RDF/Turtle for SPARQL queries
- JSON-LD for web-based discovery
- CSV exports for statistical analysis (R, Python pandas)

**Example Query**:
```sparql
# Find all UNESCO museums in Latin America with digital platforms
PREFIX glam: <https://w3id.org/heritage/custodian/>
PREFIX schema: <http://schema.org/>

SELECT ?museum ?name ?platform_url WHERE {
  ?museum a glam:HeritageCustodian ;
          glam:institution_type "MUSEUM" ;
          glam:data_source "UNESCO_WORLD_HERITAGE" ;
          glam:location/glam:region "Latin America" ;
          glam:name ?name ;
          glam:digital_platforms/glam:platform_url ?platform_url .
}
```

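
The citation use case depends on identifiers that anyone can regenerate from the same input. The sketch below shows one way a UUID v5 could be derived from a GHCID URI. It assumes the full w3id.org URI is hashed under the standard URL namespace; the namespace choice and the `ghcid_to_uuid` helper are illustrative assumptions, not the project's confirmed scheme.

```python
import uuid


def ghcid_to_uuid(ghcid_uri: str) -> uuid.UUID:
    """Derive a deterministic UUID v5 from a GHCID URI (hypothetical helper)."""
    # Assumption: the standard URL namespace is used; the project may define its own.
    return uuid.uuid5(uuid.NAMESPACE_URL, ghcid_uri)


louvre_uri = "https://w3id.org/heritage/custodian/fr/louvre"
first = ghcid_to_uuid(louvre_uri)
second = ghcid_to_uuid(louvre_uri)

# The same input always yields the same identifier, so citations stay stable.
print(first == second)
```
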
---

### 2. Cultural Heritage Aggregators

**Profile**: Large-scale aggregation platforms consolidating heritage data from multiple sources.

**Target Platforms**:
- **Europeana** (European cultural heritage)
- **Digital Public Library of America (DPLA)** (US heritage)
- **Trove** (Australian heritage)
- **Collectie Nederland** (Dutch heritage - already integrated)
- **Regional aggregators** (e.g., Biblioteca Digital Hispánica, Brasiliana Digital)

**Use Cases**:
- **Cross-platform Linking**: Connect UNESCO sites to digitized collections in aggregators
- **Authority Control**: Use GHCID/Wikidata Q-numbers to deduplicate institution records
- **Discovery Enhancement**: Enrich search results with UNESCO designation context
- **Geospatial Search**: Locate heritage collections near World Heritage Sites

**Key Requirements**:
- EDM (Europeana Data Model) compatibility via RDF export
- Schema.org/JSON-LD for web harvesters
- Stable persistent identifiers (UUID v5 for deterministic references)
- GeoJSON exports for map-based discovery

**Integration Pattern**:
```yaml
# Europeana EDM mapping
edm:ProvidedCHO:
  - dc:title: "Bibliothèque nationale de France"
    edm:type: "TEXT"  # Collection type
    dcterms:spatial: "http://www.wikidata.org/entity/Q90"  # Paris (Wikidata entity URI)
    owl:sameAs:
      - "https://w3id.org/heritage/custodian/fr/bnf"  # GHCID
      - "http://www.wikidata.org/entity/Q193563"  # Wikidata entity URI, not the wiki page
      - "https://viaf.org/viaf/137156502737605171529"  # VIAF

edm:WebResource:
  - edm:isShownAt: "https://www.bnf.fr"
    dcterms:isPartOf: "https://whc.unesco.org/en/list/600"  # UNESCO WHC ID
```

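
The GeoJSON export requirement can be illustrated with a short sketch. It assumes each record carries `id`, `name`, `lat`, and `lon` keys, which are hypothetical field names chosen for this example. Note that GeoJSON (RFC 7946) stores positions as longitude first, then latitude.

```python
import json


def to_geojson(records):
    """Convert custodian records into a GeoJSON FeatureCollection."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {"id": rec["id"], "name": rec["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Sample record with assumed field names.
sample = [{
    "id": "https://w3id.org/heritage/custodian/fr/louvre",
    "name": "Musée du Louvre",
    "lat": 48.8606,
    "lon": 2.3376,
}]
collection = to_geojson(sample)
print(json.dumps(collection, ensure_ascii=False, indent=2))
```
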
---

### 3. Tourism and Cultural Sector Applications

**Profile**: Public-facing applications helping visitors discover heritage sites and their collections.

**Use Cases**:
- **Heritage Tourism Apps**: Guide apps showing museums/archives at UNESCO sites
- **Educational Platforms**: Interactive learning tools for students visiting World Heritage Sites
- **Virtual Tours**: 360° tours linking physical sites to digital collections
- **Event Planning**: Identify exhibition spaces, lecture halls, conservation labs at UNESCO sites

**Key Requirements**:
- Geospatial data (lat/lon, GeoJSON boundaries)
- Operating hours, contact information (when available)
- Digital platform URLs (virtual tours, online exhibitions)
- Accessibility information (wheelchair access, multilingual guides)

**API Integration Example**:
A tourism app could call the planned REST endpoint, for example `GET /api/v1/heritage/custodians?near_lat=48.8566&near_lon=2.3522&radius=5km&institution_type=MUSEUM`, and receive a response such as:

```json
{
  "results": [
    {
      "id": "https://w3id.org/heritage/custodian/fr/louvre",
      "name": "Musée du Louvre",
      "institution_type": "MUSEUM",
      "location": {
        "city": "Paris",
        "country": "FR",
        "coordinates": [48.8606, 2.3376]
      },
      "unesco_site": {
        "whc_id": 600,
        "name": "Paris, Banks of the Seine"
      },
      "digital_platforms": [
        {
          "platform_name": "Louvre Collections Database",
          "platform_url": "https://collections.louvre.fr"
        }
      ]
    }
  ]
}
```

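
The `near_lat`, `near_lon`, and `radius` filters imply a distance test on the server. The sketch below shows that test with the haversine formula in plain Python. It is only an illustration of the logic: the record keys `lat` and `lon` are assumed names, and a production service would more likely delegate this to a spatial index such as PostGIS.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records, lat, lon, radius_km):
    """Keep records whose coordinates fall inside the search radius."""
    return [r for r in records
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]


# Query point from the request above; the Louvre coordinates from the response.
institutions = [{"name": "Musée du Louvre", "lat": 48.8606, "lon": 2.3376}]
nearby = within_radius(institutions, 48.8566, 2.3522, radius_km=5)
print([r["name"] for r in nearby])
```
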
---

### 4. Linked Open Data (LOD) Community

**Profile**: Semantic web developers building knowledge graphs and linked data applications.

**Use Cases**:
- **Knowledge Graph Construction**: Integrate GLAM data into Wikidata, DBpedia, YAGO
- **Entity Linking**: Connect heritage institutions across datasets using owl:sameAs
- **Ontology Alignment**: Map LinkML schema to CIDOC-CRM, PROV-O, FOAF, Schema.org
- **SPARQL Federation**: Query UNESCO data alongside Wikidata, GeoNames, VIAF

**Key Requirements**:
- W3C-compliant RDF serialization (Turtle, N-Triples, RDF/XML)
- Content negotiation (Accept: text/turtle, application/ld+json)
- Dereferenceable URIs (https://w3id.org/heritage/custodian/{id} resolves)
- Provenance tracking via PROV-O (prov:wasDerivedFrom, prov:generatedAtTime)

**Ontology Alignments**:
```turtle
@prefix glam: <https://w3id.org/heritage/custodian/> .
@prefix cpov: <http://data.europa.eu/m8g/> .
@prefix schema: <http://schema.org/> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# HeritageCustodian maps to multiple ontologies
glam:HeritageCustodian
    rdfs:subClassOf cpov:PublicOrganisation ;  # EU Core Public Org Vocabulary
    rdfs:subClassOf schema:Museum ;            # Schema.org
    rdfs:subClassOf crm:E74_Group ;            # CIDOC-CRM cultural organizations
    rdfs:subClassOf prov:Organization .        # PROV-O provenance

# Specific institution example
<https://w3id.org/heritage/custodian/fr/louvre>
    a glam:HeritageCustodian, schema:Museum, crm:E74_Group ;
    owl:sameAs <http://www.wikidata.org/entity/Q19675> ;  # Wikidata entity URI
    owl:sameAs <http://viaf.org/viaf/139708098> ;         # VIAF
    schema:name "Musée du Louvre"@fr, "Louvre Museum"@en ;
    schema:location <https://www.geonames.org/2988507> ;  # Paris GeoNames
    prov:wasDerivedFrom <https://whc.unesco.org/en/list/600> .
```

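
The PROV-O requirement (`prov:wasDerivedFrom`, `prov:generatedAtTime`) can be sketched without an RDF library by emitting N-Triples lines directly. The `provenance_triples` helper is hypothetical and only covers these two properties; a real pipeline would more likely rely on an RDF toolkit for serialization and escaping.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def provenance_triples(subject, source, generated_at):
    """Return N-Triples lines recording where and when a record was derived.

    `generated_at` must be a timezone-aware datetime.
    """
    stamp = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"<{subject}> <{PROV}wasDerivedFrom> <{source}> .",
        f'<{subject}> <{PROV}generatedAtTime> "{stamp}"^^<{XSD_DATETIME}> .',
    ]


lines = provenance_triples(
    "https://w3id.org/heritage/custodian/fr/louvre",
    "https://whc.unesco.org/en/list/600",
    datetime(2025, 11, 9, tzinfo=timezone.utc),
)
print("\n".join(lines))
```
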
---

### 5. Government Heritage Agencies

**Profile**: National and regional heritage authorities managing conservation, funding, and policy.

**Use Cases**:
- **Policy Analysis**: Assess coverage of heritage institutions across regions
- **Funding Allocation**: Identify under-resourced institutions at UNESCO sites
- **Conservation Planning**: Track organizational changes (mergers, closures) affecting site management
- **International Cooperation**: Coordinate transnational heritage projects

**Key Requirements**:
- Administrative metadata (governance structure, parent organizations)
- Change event tracking (mergers, relocations, name changes)
- Collection scope and extent (for resource planning)
- Data quality tiers (TIER_1 authoritative data preferred)

**Example Use Case - Netherlands**:
```yaml
# Dutch government analyzing UNESCO site coverage
Query: "List all GLAM institutions at UNESCO sites in the Netherlands
  with ISIL codes and connection to Archieven.nl or Collectie Nederland"

Result:
  - name: Nationaal Archief
    location: The Hague
    unesco_site: "Seventeenth-century canal ring area of Amsterdam inside the Singelgracht"
    isil_code: NL-HaNA
    platforms:
      - Archieven.nl
      - Netwerk Digitaal Erfgoed
    data_tier: TIER_1_AUTHORITATIVE
```

---

### 6. Machine Learning and AI Research

**Profile**: Data scientists building AI models for heritage data analysis, image recognition, and NLP.

**Use Cases**:
- **Named Entity Recognition**: Train models to extract heritage institutions from text
- **Image Classification**: Identify institutional logos, building facades from photos
- **Relationship Extraction**: Discover implicit connections between institutions
- **Data Quality Models**: Predict confidence scores for uncertain extractions

**Key Requirements**:
- Structured training data (JSON, Parquet formats)
- Rich provenance metadata (confidence scores, extraction methods)
- Large-scale exports (Parquet for efficient columnar storage)
- Reproducibility (deterministic UUID v5 identifiers)

**Example Training Data Export**:
```python
# Export UNESCO institutions as Parquet for ML training.
# Requires pandas plus a Parquet engine such as pyarrow.
import pandas as pd

df = pd.DataFrame([
    {
        'institution_name': 'Bibliothèque nationale de France',
        'institution_type': 'LIBRARY',
        'country': 'FR',
        'unesco_whc_id': 600,
        'wikidata_id': 'Q193563',
        'data_tier': 'TIER_1_AUTHORITATIVE',
        'confidence_score': 1.0
    },
    # ... 1,000+ rows
])

# pandas hands the write to the installed Parquet engine (it tries pyarrow first)
df.to_parquet('unesco_glam_institutions_training.parquet', compression='snappy')
```

---

## Secondary Data Consumers

### 7. Digital Humanities Projects

- **Timeline Visualizations**: Display organizational change events over centuries
- **Network Graphs**: Visualize institutional relationships (parent orgs, partnerships)
- **Geospatial Analysis**: Map heritage institution density vs. population

### 8. Educational Technology Platforms

- **Virtual Field Trips**: Link classrooms to UNESCO site collections
- **Curriculum Development**: Identify heritage institutions for lesson plans
- **Student Research Tools**: Provide authoritative sources for school projects

### 9. Media and Publishing

- **Fact-Checking**: Verify heritage institution information for articles
- **Travel Guides**: Enrich guidebooks with GLAM institution data
- **Documentary Research**: Locate archival collections for film/TV production

### 10. Private Sector Applications

- **Art Market**: Verify provenance of objects in collections
- **Insurance**: Assess collection value at heritage institutions
- **Cultural Consulting**: Advise on heritage site development

---

## Cross-Consumer Integration Scenarios

### Scenario 1: Researcher → Aggregator → Tourism App

1. **Researcher** exports RDF dataset from GLAM project
2. **Europeana** ingests RDF, creates EDM records
3. **Tourism App** harvests Europeana API, displays UNESCO sites on map
4. All three use **GHCID persistent identifier** to maintain referential integrity

### Scenario 2: Government → LOD Community → Academic Citation

1. **Netherlands Heritage Agency** queries Dutch UNESCO institutions
2. **Wikidata Editor** enriches Wikidata records with GHCID identifiers
3. **Academic Paper** cites institution using UUID v5 (stable across systems)
4. Paper DOI links back to w3id.org/heritage/custodian/{id}

### Scenario 3: ML Model → Data Quality → Human Review

1. **AI Model** extracts institutions from conversation text (TIER_4 confidence)
2. **Validation Script** cross-references against UNESCO TIER_1 data
3. **Conflicts Flagged** for manual review by heritage professionals
4. **Updated Records** re-exported with corrected provenance metadata

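
The validation step in Scenario 3 could look roughly like the sketch below. It assumes that extracted and authoritative records share a `ghcid` key and that `name`, `institution_type`, and `country` are the fields worth comparing; both the record shape and the `find_conflicts` helper are hypothetical.

```python
def find_conflicts(extracted, reference,
                   fields=("name", "institution_type", "country")):
    """Compare TIER_4 extractions with TIER_1 records that share a GHCID."""
    reference_by_id = {rec["ghcid"]: rec for rec in reference}
    conflicts = []
    for rec in extracted:
        trusted = reference_by_id.get(rec["ghcid"])
        if trusted is None:
            continue  # no authoritative record to compare against
        for field in fields:
            if rec.get(field) != trusted.get(field):
                conflicts.append({
                    "ghcid": rec["ghcid"],
                    "field": field,
                    "extracted": rec.get(field),
                    "authoritative": trusted.get(field),
                })
    return conflicts


tier1 = [{"ghcid": "fr/louvre", "name": "Musée du Louvre",
          "institution_type": "MUSEUM", "country": "FR"}]
tier4 = [{"ghcid": "fr/louvre", "name": "Louvre",
          "institution_type": "MUSEUM", "country": "FR"}]

# Each conflict would be queued for manual review by heritage professionals.
flagged = find_conflicts(tier4, tier1)
print(flagged)
```
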
---

## Data Access Methods

### REST API (Planned)

```
GET /api/v1/heritage/custodians
GET /api/v1/heritage/custodians/{ghcid}
GET /api/v1/heritage/custodians?country=FR&institution_type=MUSEUM
GET /api/v1/heritage/custodians?unesco_whc_id=600
```

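Since the API is still planned, a client can only be sketched. The example below builds the filter URLs listed above with the standard library; the host in `BASE_URL` and the `custodians_url` helper are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical host; only the path comes from the planned endpoint list.
BASE_URL = "https://glam.example.org/api/v1/heritage/custodians"


def custodians_url(**filters):
    """Build a list-endpoint URL, skipping filters that are unset."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


french_museums = custodians_url(country="FR", institution_type="MUSEUM")
paris_site = custodians_url(unesco_whc_id=600)
print(french_museums)
print(paris_site)
```

Keeping URL construction in one helper means filter names only need to change in a single place if the final API differs from this draft.
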
### SPARQL Endpoint (Planned)

```
POST https://glam.example.org/sparql
Content-Type: application/sparql-query

PREFIX glam: <https://w3id.org/heritage/custodian/>

SELECT ?custodian ?name WHERE {
  ?custodian glam:data_source "UNESCO_WORLD_HERITAGE" ;
             glam:name ?name .
}
```

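The request above can be prepared from Python with the standard library alone. This sketch only constructs the request object and does not send it, because `glam.example.org` is a placeholder and the endpoint is not yet live; the `Accept` value is the standard SPARQL JSON results media type, chosen here as an assumption about what a client would ask for.

```python
import urllib.request

ENDPOINT = "https://glam.example.org/sparql"  # placeholder endpoint from this document

QUERY = """\
PREFIX glam: <https://w3id.org/heritage/custodian/>

SELECT ?custodian ?name WHERE {
  ?custodian glam:data_source "UNESCO_WORLD_HERITAGE" ;
             glam:name ?name .
}
"""


def build_sparql_request(query, endpoint=ENDPOINT):
    """Prepare a SPARQL 1.1 Protocol POST with the query as the raw body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(QUERY)
print(request.get_method(), request.full_url)
```

Once the endpoint exists, passing `request` to `urllib.request.urlopen` would execute the query.
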
### File Exports

- **JSON-LD**: `exports/unesco_glam_institutions.jsonld`
- **RDF/Turtle**: `exports/unesco_glam_institutions.ttl`
- **CSV**: `exports/unesco_glam_institutions.csv`
- **Parquet**: `exports/unesco_glam_institutions.parquet`
- **SQLite**: `exports/glam_dataset.db`

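
For quick statistics, the CSV export can be read with the standard library. The sketch below uses an in-memory sample instead of the real file, and the column names (`name`, `institution_type`, `country`) are assumptions about the export layout.

```python
import csv
import io
from collections import Counter


def count_by(rows, column):
    """Count rows per distinct value of one column."""
    return Counter(row[column] for row in rows)


# In-memory stand-in for exports/unesco_glam_institutions.csv (assumed columns).
sample_csv = io.StringIO(
    "name,institution_type,country\n"
    "Musée du Louvre,MUSEUM,FR\n"
    "Bibliothèque nationale de France,LIBRARY,FR\n"
    "Nationaal Archief,ARCHIVE,NL\n"
)
rows = list(csv.DictReader(sample_csv))
by_country = count_by(rows, "country")
print(by_country.most_common())
```

Replacing `sample_csv` with `open("exports/unesco_glam_institutions.csv", newline="", encoding="utf-8")` would run the same count on the actual export, provided the columns match.
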
### Content Negotiation

```bash
# Request Turtle format
curl -H "Accept: text/turtle" https://w3id.org/heritage/custodian/fr/louvre

# Request JSON-LD format
curl -H "Accept: application/ld+json" https://w3id.org/heritage/custodian/fr/louvre
```

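
On the server side, content negotiation comes down to picking the best supported media type from the `Accept` header. The sketch below is a simplified version of that selection: it honours `q` values and `*/*` but ignores partial wildcards such as `text/*`, and the `negotiate` helper and its Turtle default are assumptions.

```python
SUPPORTED = ("text/turtle", "application/ld+json")


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending preference."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(header, supported=SUPPORTED, default="text/turtle"):
    """Pick a serialization, or None if nothing acceptable is supported."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return None  # caller would answer 406 Not Acceptable


chosen = negotiate("application/ld+json;q=0.9, text/turtle;q=0.5")
print(chosen)
```
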
---

## Performance Requirements

### Query Performance

- **Single Institution Lookup**: < 50ms (by GHCID or UUID)
- **Geospatial Queries**: < 500ms (5km radius, PostGIS optimized)
- **Full Dataset Export**: < 5 minutes (1,000+ institutions to Parquet)

### Data Freshness

- **UNESCO API Sync**: Weekly (UNESCO updates the site list roughly once a year)
- **Wikidata Enrichment**: Monthly (community-driven updates)
- **Provenance Updates**: On-demand (when extraction methods improve)

### Scalability Targets

- **Initial Load**: 1,000+ UNESCO site institutions
- **3-Year Projection**: 5,000+ institutions (including regional sites)
- **Query Load**: 1,000 requests/day (research community usage)

---

## Privacy and Licensing Considerations

### Data Licensing

- **UNESCO Data**: Reuse is governed by the UNESCO World Heritage Centre's terms of use; confirm them before redistribution (UN works are not automatically in the public domain)
- **Wikidata IDs**: CC0 (public domain dedication)
- **GLAM Project Schema**: CC-BY 4.0 (attribution required)
- **Aggregated Dataset**: CC0 (maximize reusability)

### Privacy Compliance

- **No Personal Data**: Institutional records only (no staff names unless public officials)
- **GDPR Compliance**: Limited applicability (records describe organizations, not individuals; any named public officials remain personal data)
- **Embargo Periods**: Respect institutional requests to delay publication (rare)

### Attribution Requirements

When using the GLAM Dataset with UNESCO data:

```
Citation: "Global GLAM Dataset - UNESCO World Heritage Sites.
Retrieved from https://w3id.org/heritage/custodian/.
Data sourced from UNESCO World Heritage Centre (whc.unesco.org).
Licensed under CC0 1.0 Universal."
```

---

## Success Metrics for Consumers

### Adoption Metrics

- **Academic Citations**: 10+ papers citing GHCID identifiers within 1 year
- **Aggregator Integrations**: 3+ platforms (Europeana, DPLA, regional) within 18 months
- **API Usage**: 500+ unique users within 6 months
- **Data Downloads**: 100+ dataset exports per month

### Quality Metrics

- **Identifier Resolution**: 99% of GHCIDs resolve to valid RDF
- **Cross-platform Consistency**: 95%+ match rate when cross-referencing with Wikidata
- **Provenance Completeness**: 100% of records have extraction_date and data_source
- **Error Reports**: < 1% of records flagged for correction by community

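
The provenance completeness target is straightforward to measure. The sketch below computes it as a percentage, assuming each record nests `extraction_date` and `data_source` under a `provenance` key; that record shape and the `provenance_completeness` helper are assumptions for illustration.

```python
REQUIRED_FIELDS = ("extraction_date", "data_source")


def provenance_completeness(records, required=REQUIRED_FIELDS):
    """Percentage of records whose provenance has every required field set."""
    if not records:
        return 0.0
    complete = sum(
        1 for rec in records
        if all(rec.get("provenance", {}).get(field) for field in required)
    )
    return 100.0 * complete / len(records)


records = [
    {"name": "Musée du Louvre",
     "provenance": {"extraction_date": "2025-11-09",
                    "data_source": "UNESCO_WORLD_HERITAGE"}},
    {"name": "Nationaal Archief",
     "provenance": {"data_source": "UNESCO_WORLD_HERITAGE"}},  # missing date
]
score = provenance_completeness(records)
print(f"{score:.1f}%")
```

A check like this could run in CI so that any export falling below the 100% target fails before publication.
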
### Impact Metrics

- **Wikidata Enrichment**: 200+ new/improved Wikidata entries for heritage institutions
- **Tourism App Integrations**: 5+ apps using geospatial API
- **Research Grants**: 3+ funded projects using GLAM Dataset as infrastructure
- **Policy Citations**: 2+ government reports referencing the dataset

---

## Consumer Feedback Mechanisms

### GitHub Issues

- **Bug Reports**: Schema validation errors, broken identifiers
- **Feature Requests**: New export formats, additional metadata fields
- **Data Corrections**: Incorrect institution types, location errors

### Community Forum (Planned)

- **Use Case Sharing**: Researchers describe how they use the data
- **Best Practices**: Documentation for common integration patterns
- **Office Hours**: Monthly Q&A sessions with maintainers

### API Analytics

- **Usage Tracking**: Monitor which endpoints/filters are most popular
- **Error Logging**: Identify common query mistakes (improve docs)
- **Performance Monitoring**: Detect slow queries, optimize indexes

---

## Next Steps

**Document Dependencies**:
- ✅ `01-dependencies.md` - Technical dependencies identified
- ✅ `02-consumers.md` - **THIS DOCUMENT** - Use cases defined
- ⏳ `03-implementation-phases.md` - Development timeline (next)
- ⏳ `04-tdd-strategy.md` - Test-driven development plan
- ⏳ `05-design-patterns.md` - Architectural patterns
- ⏳ `06-linkml-map-schema.md` - Data transformation rules

**Action Items**:
1. Validate consumer requirements with sample stakeholder interviews
2. Design REST API endpoints matching use case queries
3. Create LinkML → EDM transformation for Europeana integration
4. Implement content negotiation for RDF/JSON-LD

---

**Document Status**: Draft complete, pending review
**Review Needed**: Stakeholder validation of use cases
**Version**: 1.0