# UNESCO Data Consumers and Use Cases
**Project**: Global GLAM Dataset - UNESCO World Heritage Sites Extraction
**Document**: 02 - Data Consumers
**Version**: 1.0
**Date**: 2025-11-09
**Status**: Draft
---
## Executive Summary
This document identifies the primary consumers and use cases for UNESCO World Heritage Site data integrated into the Global GLAM Dataset. UNESCO sites represent authoritative (TIER_1) heritage custodian data with global coverage across 167 countries, making them valuable for diverse stakeholders ranging from academic researchers to tourism applications.
---
## Primary Data Consumers
### 1. Heritage Researchers and Academic Institutions
**Profile**: Scholars studying global cultural heritage networks, institutional relationships, and collection preservation patterns.
**Use Cases**:
- **Network Analysis**: Mapping relationships between UNESCO sites and regional/national GLAM institutions
- **Collection Provenance Research**: Tracing cultural objects through institutional history (mergers, relocations, repatriation)
- **Comparative Studies**: Cross-regional analysis of heritage management practices
- **Citation Systems**: Persistent identifiers (GHCID, UUID v5) for academic references
**Key Requirements**:
- High-quality provenance metadata (data_tier: TIER_1_AUTHORITATIVE)
- Complete identifier sets (UNESCO WHC ID, Wikidata Q-numbers, ISIL codes where applicable)
- Temporal data (founding dates, organizational change events)
- Multilingual name support (UNESCO sites have names in multiple languages)
**Integration Points**:
- Export as RDF/Turtle for SPARQL queries
- JSON-LD for web-based discovery
- CSV exports for statistical analysis (R, Python pandas)
**Example Query**:
```sparql
# Find all UNESCO museums in Latin America with digital platforms
PREFIX glam:
PREFIX schema:
SELECT ?museum ?name ?platform_url WHERE {
  ?museum a glam:HeritageCustodian ;
          glam:institution_type "MUSEUM" ;
          glam:data_source "UNESCO_WORLD_HERITAGE" ;
          glam:location/glam:region "Latin America" ;
          glam:name ?name ;
          glam:digital_platforms/glam:platform_url ?platform_url .
}
```
---
### 2. Cultural Heritage Aggregators
**Profile**: Large-scale aggregation platforms consolidating heritage data from multiple sources.
**Target Platforms**:
- **Europeana** (European cultural heritage)
- **Digital Public Library of America (DPLA)** (US heritage)
- **Trove** (Australian heritage)
- **Collectie Nederland** (Dutch heritage - already integrated)
- **Regional aggregators** (e.g., Biblioteca Digital Hispánica, Brasiliana Digital)
**Use Cases**:
- **Cross-platform Linking**: Connect UNESCO sites to digitized collections in aggregators
- **Authority Control**: Use GHCID/Wikidata Q-numbers to deduplicate institution records
- **Discovery Enhancement**: Enrich search results with UNESCO designation context
- **Geospatial Search**: Locate heritage collections near World Heritage Sites
**Key Requirements**:
- EDM (Europeana Data Model) compatibility via RDF export
- Schema.org/JSON-LD for web harvesters
- Stable persistent identifiers (UUID v5 for deterministic references)
- GeoJSON exports for map-based discovery
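
The "deterministic references" requirement can be illustrated with a minimal sketch of UUID v5 derivation. It assumes the UUID is computed from the full `https://w3id.org/heritage/custodian/` URI under the standard URL namespace, and that paths are lowercased first; the project's actual namespace and normalization rules are not stated here, so `GHCID_BASE` and `custodian_uuid` are hypothetical names.

```python
import uuid

# Assumption: identifiers live under the w3id.org base used throughout this document.
GHCID_BASE = "https://w3id.org/heritage/custodian/"


def custodian_uuid(ghcid_path: str) -> uuid.UUID:
    """Derive a deterministic UUID v5 from a GHCID path such as 'fr/louvre'."""
    # Normalize case and stray slashes so equivalent inputs hash identically.
    normalized = ghcid_path.strip().strip("/").lower()
    return uuid.uuid5(uuid.NAMESPACE_URL, GHCID_BASE + normalized)


if __name__ == "__main__":
    first = custodian_uuid("fr/louvre")
    second = custodian_uuid("/FR/Louvre/")
    # Same logical identifier, same UUID, on any machine and at any time.
    print(first, first == second, first.version)
```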
**Integration Pattern**:
```yaml
# Europeana EDM mapping
edm:ProvidedCHO:
  - dc:title: "Bibliothèque nationale de France"
    edm:type: "TEXT"  # Collection type
    dcterms:spatial: "https://www.wikidata.org/wiki/Q90"  # Paris
    owl:sameAs:
      - "https://w3id.org/heritage/custodian/fr/bnf"  # GHCID
      - "https://www.wikidata.org/wiki/Q193563"  # Wikidata
      - "https://viaf.org/viaf/137156502737605171529"  # VIAF
edm:WebResource:
  - edm:isShownAt: "https://www.bnf.fr"
    dcterms:isPartOf: "https://whc.unesco.org/en/list/600"  # UNESCO WHC ID
```
---
### 3. Tourism and Cultural Sector Applications
**Profile**: Public-facing applications helping visitors discover heritage sites and their collections.
**Use Cases**:
- **Heritage Tourism Apps**: Guide apps showing museums/archives at UNESCO sites
- **Educational Platforms**: Interactive learning tools for students visiting World Heritage Sites
- **Virtual Tours**: 360° tours linking physical sites to digital collections
- **Event Planning**: Identify exhibition spaces, lecture halls, conservation labs at UNESCO sites
**Key Requirements**:
- Geospatial data (lat/lon, GeoJSON boundaries)
- Operating hours, contact information (when available)
- Digital platform URLs (virtual tours, online exhibitions)
- Accessibility information (wheelchair access, multilingual guides)
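
For the geospatial requirement, the following is a minimal sketch of turning custodian records into a GeoJSON `FeatureCollection` of points. The flat `lat` and `lon` field names are assumptions for illustration. Note that GeoJSON positions are ordered longitude first, then latitude.

```python
import json


def to_feature(record: dict) -> dict:
    """Convert one record with 'lat'/'lon' keys into a GeoJSON Point feature."""
    properties = {k: v for k, v in record.items() if k not in ("lat", "lon")}
    return {
        "type": "Feature",
        # GeoJSON coordinate order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [record["lon"], record["lat"]]},
        "properties": properties,
    }


def to_feature_collection(records: list) -> dict:
    """Build a FeatureCollection, skipping records that lack coordinates."""
    features = [
        to_feature(r)
        for r in records
        if r.get("lat") is not None and r.get("lon") is not None
    ]
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [
        {"name": "Musée du Louvre", "institution_type": "MUSEUM", "lat": 48.8606, "lon": 2.3376},
        {"name": "Record without coordinates", "institution_type": "ARCHIVE"},
    ]
    print(json.dumps(to_feature_collection(sample), ensure_ascii=False, indent=2))
```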
**API Integration Example**:
```json
// REST API endpoint for tourism apps
GET /api/v1/heritage/custodians?near_lat=48.8566&near_lon=2.3522&radius=5km&institution_type=MUSEUM
Response:
{
  "results": [
    {
      "id": "https://w3id.org/heritage/custodian/fr/louvre",
      "name": "Musée du Louvre",
      "institution_type": "MUSEUM",
      "location": {
        "city": "Paris",
        "country": "FR",
        "coordinates": [48.8606, 2.3376]
      },
      "unesco_site": {
        "whc_id": 600,
        "name": "Paris, Banks of the Seine"
      },
      "digital_platforms": [
        {
          "platform_name": "Louvre Collections Database",
          "platform_url": "https://collections.louvre.fr"
        }
      ]
    }
  ]
}
```
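
The `near_lat`, `near_lon`, and `radius` parameters above imply a radius filter. The sketch below shows one way such a filter could work in plain Python using the haversine formula. It is not the planned server implementation (which this document associates with PostGIS); `custodians_near` is a hypothetical helper, and records are assumed to carry `[lat, lon]` pairs as in the response above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def custodians_near(records, near_lat, near_lon, radius_km):
    """Return (distance_km, record) pairs inside the radius, nearest first."""
    hits = []
    for record in records:
        lat, lon = record["location"]["coordinates"]  # [lat, lon] order
        distance = haversine_km(near_lat, near_lon, lat, lon)
        if distance <= radius_km:
            hits.append((distance, record))
    # Sort on distance only, so records themselves are never compared.
    return sorted(hits, key=lambda pair: pair[0])


if __name__ == "__main__":
    sample = [
        {"name": "Musée du Louvre", "location": {"coordinates": [48.8606, 2.3376]}},
        {"name": "Hypothetical distant museum", "location": {"coordinates": [52.3702, 4.8952]}},
    ]
    for distance, record in custodians_near(sample, 48.8566, 2.3522, 5.0):
        print(f"{record['name']}: {distance:.2f} km")
```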
---
### 4. Linked Open Data (LOD) Community
**Profile**: Semantic web developers building knowledge graphs and linked data applications.
**Use Cases**:
- **Knowledge Graph Construction**: Integrate GLAM data into Wikidata, DBpedia, YAGO
- **Entity Linking**: Connect heritage institutions across datasets using owl:sameAs
- **Ontology Alignment**: Map LinkML schema to CIDOC-CRM, PROV-O, FOAF, Schema.org
- **SPARQL Federation**: Query UNESCO data alongside Wikidata, GeoNames, VIAF
**Key Requirements**:
- W3C-compliant RDF serialization (Turtle, N-Triples, RDF/XML)
- Content negotiation (Accept: text/turtle, application/ld+json)
- Dereferenceable URIs (https://w3id.org/heritage/custodian/{id} resolves)
- Provenance tracking via PROV-O (prov:wasDerivedFrom, prov:generatedAtTime)
**Ontology Alignments**:
```turtle
@prefix glam: .
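@prefix cpov: <http://data.europa.eu/m8g/> .
@prefix schema: <http://schema.org/> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix prov: <http://www.w3.org/ns/prov#> .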
# HeritageCustodian maps to multiple ontologies
glam:HeritageCustodian
    rdfs:subClassOf cpov:PublicOrganisation ;  # EU Core Public Organisation Vocabulary
    rdfs:subClassOf schema:Organization ;      # Schema.org (schema:Museum and schema:Library are narrower types)
    rdfs:subClassOf crm:E74_Group ;            # CIDOC-CRM groups, including organizations
    rdfs:subClassOf prov:Organization .        # PROV-O provenance
# Specific institution example
a glam:HeritageCustodian, schema:Museum, crm:E74_Group ;
owl:sameAs ; # Wikidata
owl:sameAs ; # VIAF
schema:name "Musée du Louvre"@fr, "Louvre Museum"@en ;
schema:location ; # Paris GeoNames
prov:wasDerivedFrom .
```
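
For web harvesters that expect Schema.org JSON-LD rather than Turtle, a record can be serialized with a small type mapping. This is a sketch under stated assumptions: the `SCHEMA_TYPES` mapping and the `to_jsonld` helper are illustrative, not the project's published alignment, and the input field names follow the record examples elsewhere in this document.

```python
import json

# Assumption: illustrative mapping from institution_type to Schema.org types.
SCHEMA_TYPES = {
    "MUSEUM": "Museum",
    "LIBRARY": "Library",
    "ARCHIVE": "ArchiveOrganization",
}


def to_jsonld(record: dict) -> dict:
    """Serialize one custodian record as a Schema.org JSON-LD document."""
    doc = {
        "@context": "https://schema.org",
        # Unknown institution types fall back to the generic Organization.
        "@type": SCHEMA_TYPES.get(record["institution_type"], "Organization"),
        "@id": record["id"],
        "name": record["name"],
    }
    if record.get("wikidata_id"):
        # Link to the Wikidata entity URI rather than the wiki page URL.
        doc["sameAs"] = ["http://www.wikidata.org/entity/" + record["wikidata_id"]]
    return doc


if __name__ == "__main__":
    bnf = {
        "id": "https://w3id.org/heritage/custodian/fr/bnf",
        "name": "Bibliothèque nationale de France",
        "institution_type": "LIBRARY",
        "wikidata_id": "Q193563",
    }
    print(json.dumps(to_jsonld(bnf), ensure_ascii=False, indent=2))
```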
---
### 5. Government Heritage Agencies
**Profile**: National and regional heritage authorities managing conservation, funding, and policy.
**Use Cases**:
- **Policy Analysis**: Assess coverage of heritage institutions across regions
- **Funding Allocation**: Identify under-resourced institutions at UNESCO sites
- **Conservation Planning**: Track organizational changes (mergers, closures) affecting site management
- **International Cooperation**: Coordinate transnational heritage projects
**Key Requirements**:
- Administrative metadata (governance structure, parent organizations)
- Change event tracking (mergers, relocations, name changes)
- Collection scope and extent (for resource planning)
- Data quality tiers (TIER_1 authoritative data preferred)
**Example Use Case - Netherlands**:
```yaml
# Dutch government analyzing UNESCO site coverage
# Illustrative result shape only: the Nationaal Archief is in The Hague,
# outside the Amsterdam canal ring site, so a real result would not pair them
Query: "List all GLAM institutions at UNESCO sites in Netherlands
  with ISIL codes and connection to Archieven.nl or Collectie Nederland"
Result:
  - name: Nationaal Archief
    location: The Hague
    unesco_site: "Seventeenth-century canal ring area of Amsterdam inside the Singelgracht"
    isil_code: NL-HaNA
    platforms:
      - Archieven.nl
      - Netwerk Digitaal Erfgoed
    data_tier: TIER_1_AUTHORITATIVE
```
---
### 6. Machine Learning and AI Research
**Profile**: Data scientists building AI models for heritage data analysis, image recognition, and NLP.
**Use Cases**:
- **Named Entity Recognition**: Train models to extract heritage institutions from text
- **Image Classification**: Identify institutional logos, building facades from photos
- **Relationship Extraction**: Discover implicit connections between institutions
- **Data Quality Models**: Predict confidence scores for uncertain extractions
**Key Requirements**:
- Structured training data (JSON, Parquet formats)
- Rich provenance metadata (confidence scores, extraction methods)
- Large-scale exports (Parquet for efficient columnar storage)
- Reproducibility (deterministic UUID v5 identifiers)
**Example Training Data Export**:
```python
# Export UNESCO institutions as Parquet for ML training
# Requires pandas plus a Parquet engine (pyarrow) to be installed
import pandas as pd

df = pd.DataFrame([
    {
        'institution_name': 'Bibliothèque nationale de France',
        'institution_type': 'LIBRARY',
        'country': 'FR',
        'unesco_whc_id': 600,
        'wikidata_id': 'Q193563',
        'data_tier': 'TIER_1_AUTHORITATIVE',
        'confidence_score': 1.0
    },
    # ... 1,000+ rows
])

# Snappy trades a little file size for fast reads during training
df.to_parquet('unesco_glam_institutions_training.parquet', engine='pyarrow', compression='snappy')
```
---
## Secondary Data Consumers
### 7. Digital Humanities Projects
- **Timeline Visualizations**: Display organizational change events over centuries
- **Network Graphs**: Visualize institutional relationships (parent orgs, partnerships)
- **Geospatial Analysis**: Map heritage institution density vs. population
### 8. Educational Technology Platforms
- **Virtual Field Trips**: Link classrooms to UNESCO site collections
- **Curriculum Development**: Identify heritage institutions for lesson plans
- **Student Research Tools**: Provide authoritative sources for school projects
### 9. Media and Publishing
- **Fact-Checking**: Verify heritage institution information for articles
- **Travel Guides**: Enrich guidebooks with GLAM institution data
- **Documentary Research**: Locate archival collections for film/TV production
### 10. Private Sector Applications
- **Art Market**: Verify provenance of objects in collections
- **Insurance**: Assess collection value at heritage institutions
- **Cultural Consulting**: Advise on heritage site development
---
## Cross-Consumer Integration Scenarios
### Scenario 1: Researcher → Aggregator → Tourism App
1. **Researcher** exports RDF dataset from GLAM project
2. **Europeana** ingests RDF, creates EDM records
3. **Tourism App** harvests Europeana API, displays UNESCO sites on map
4. All three use **GHCID persistent identifier** to maintain referential integrity
### Scenario 2: Government → LOD Community → Academic Citation
1. **Netherlands Heritage Agency** queries Dutch UNESCO institutions
2. **Wikidata Editor** enriches Wikidata records with GHCID identifiers
3. **Academic Paper** cites institution using UUID v5 (stable across systems)
4. Paper DOI links back to w3id.org/heritage/custodian/{id}
### Scenario 3: ML Model → Data Quality → Human Review
1. **AI Model** extracts institutions from conversation text (TIER_4 confidence)
2. **Validation Script** cross-references against UNESCO TIER_1 data
3. **Conflicts Flagged** for manual review by heritage professionals
4. **Updated Records** re-exported with corrected provenance metadata
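
Step 2 of this scenario can be sketched as a field-by-field comparison keyed on a shared identifier. The example assumes both datasets carry a `wikidata_id` and reuses the field names from the training data export; `find_conflicts` is a hypothetical helper, and a real validation script would likely also match on GHCID or fuzzy names.

```python
def find_conflicts(extracted, authoritative,
                   fields=("institution_name", "institution_type", "country")):
    """Compare extracted records with authoritative ones that share a wikidata_id.

    Returns one entry per extracted record that disagrees on any compared field.
    """
    reference = {r["wikidata_id"]: r for r in authoritative if r.get("wikidata_id")}
    conflicts = []
    for record in extracted:
        trusted = reference.get(record.get("wikidata_id"))
        if trusted is None:
            continue  # no authoritative counterpart, nothing to compare
        differences = {
            field: {"extracted": record.get(field), "authoritative": trusted.get(field)}
            for field in fields
            if record.get(field) != trusted.get(field)
        }
        if differences:
            conflicts.append({"wikidata_id": record["wikidata_id"],
                              "differences": differences})
    return conflicts


if __name__ == "__main__":
    tier1 = [{"wikidata_id": "Q193563",
              "institution_name": "Bibliothèque nationale de France",
              "institution_type": "LIBRARY", "country": "FR"}]
    tier4 = [{"wikidata_id": "Q193563",
              "institution_name": "Bibliothèque nationale de France",
              "institution_type": "ARCHIVE", "country": "FR"}]
    for conflict in find_conflicts(tier4, tier1):
        print(conflict)  # flagged for manual review
```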
---
## Data Access Methods
### REST API (Planned)
```
GET /api/v1/heritage/custodians
GET /api/v1/heritage/custodians/{ghcid}
GET /api/v1/heritage/custodians?country=FR&institution_type=MUSEUM
GET /api/v1/heritage/custodians?unesco_whc_id=600
```
### SPARQL Endpoint (Planned)
```
POST https://glam.example.org/sparql
Content-Type: application/sparql-query
SELECT ?custodian ?name WHERE {
  ?custodian glam:data_source "UNESCO_WORLD_HERITAGE" ;
             glam:name ?name .
}
```
### File Exports
- **JSON-LD**: `exports/unesco_glam_institutions.jsonld`
- **RDF/Turtle**: `exports/unesco_glam_institutions.ttl`
- **CSV**: `exports/unesco_glam_institutions.csv`
- **Parquet**: `exports/unesco_glam_institutions.parquet`
- **SQLite**: `exports/glam_dataset.db`
### Content Negotiation
```bash
# Request Turtle format
curl -H "Accept: text/turtle" https://w3id.org/heritage/custodian/fr/louvre
# Request JSON-LD format
curl -H "Accept: application/ld+json" https://w3id.org/heritage/custodian/fr/louvre
```
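
On the server side, the requests above require choosing a serialization from the `Accept` header. The following is a simplified sketch of that selection, assuming only the two media types named in this document are offered and that JSON-LD is the default. It honours `q` values and wildcards but ignores media-type specificity rules, and `negotiate` is a hypothetical function name.

```python
SUPPORTED = ("text/turtle", "application/ld+json")  # media types named in this document


def parse_accept(header: str):
    """Return (media_type, q) pairs sorted by descending q, ties in header order."""
    preferences = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for parameter in pieces[1:]:
            name, _, value = parameter.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat malformed weights as "not acceptable"
        preferences.append((media_type, quality))
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def negotiate(header, supported=SUPPORTED, default="application/ld+json"):
    """Pick a supported media type, or None when nothing acceptable is offered."""
    if not header:
        return default
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
    return None  # a server would answer 406 Not Acceptable


if __name__ == "__main__":
    print(negotiate("text/turtle"))
    print(negotiate("application/ld+json;q=0.9, text/turtle;q=0.5"))
```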
---
## Performance Requirements
### Query Performance
- **Single Institution Lookup**: < 50ms (by GHCID or UUID)
- **Geospatial Queries**: < 500ms (5km radius, PostGIS optimized)
- **Full Dataset Export**: < 5 minutes (1,000+ institutions to Parquet)
### Data Freshness
- **UNESCO API Sync**: Weekly (UNESCO updates sites ~1x/year)
- **Wikidata Enrichment**: Monthly (community-driven updates)
- **Provenance Updates**: On-demand (when extraction methods improve)
### Scalability Targets
- **Initial Load**: 1,000+ UNESCO site institutions
- **3-Year Projection**: 5,000+ institutions (including regional sites)
- **Query Load**: 1,000 requests/day (research community usage)
---
## Privacy and Licensing Considerations
### Data Licensing
- **UNESCO Data**: Governed by UNESCO's terms of use (UN agency works are not automatically public domain, so redistribution rights must be confirmed before release)
- **Wikidata IDs**: CC0 (public domain dedication)
- **GLAM Project Schema**: CC-BY 4.0 (attribution required)
- **Aggregated Dataset**: CC0 intended (maximize reusability), provided the terms of every upstream source permit it
### Privacy Compliance
- **No Personal Data**: Institutional records only (no staff names unless public officials)
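- **GDPR Compliance**: Limited exposure, since records describe organizations; any named individuals (such as public officials) are still personal data and need a documented lawful basis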
- **Embargo Periods**: Respect institutional requests to delay publication (rare)
### Attribution Requirements
When using GLAM Dataset with UNESCO data:
```
Citation: "Global GLAM Dataset - UNESCO World Heritage Sites.
Retrieved from https://w3id.org/heritage/custodian/.
Data sourced from UNESCO World Heritage Centre (whc.unesco.org).
Licensed under CC0 1.0 Universal."
```
---
## Success Metrics for Consumers
### Adoption Metrics
- **Academic Citations**: 10+ papers citing GHCID identifiers within 1 year
- **Aggregator Integrations**: 3+ platforms (Europeana, DPLA, regional) within 18 months
- **API Usage**: 500+ unique users within 6 months
- **Data Downloads**: 100+ dataset exports per month
### Quality Metrics
- **Identifier Resolution**: 99% of GHCIDs resolve to valid RDF
- **Cross-platform Consistency**: 95%+ match rate when cross-referencing with Wikidata
- **Provenance Completeness**: 100% of records have extraction_date and data_source
- **Error Reports**: < 1% of records flagged for correction by community
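
The provenance completeness target can be checked mechanically. The sketch below assumes records are plain dictionaries carrying the `extraction_date` and `data_source` fields named above; `provenance_completeness` is a hypothetical helper, and empty strings are treated as missing.

```python
def provenance_completeness(records, required=("extraction_date", "data_source")):
    """Return the percentage of records with a non-empty value for every required field."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(field) for field in required)
    )
    return 100.0 * complete / len(records)


if __name__ == "__main__":
    sample = [
        {"name": "A", "extraction_date": "2025-11-09", "data_source": "UNESCO_WORLD_HERITAGE"},
        {"name": "B", "extraction_date": "", "data_source": "UNESCO_WORLD_HERITAGE"},
    ]
    score = provenance_completeness(sample)
    print(f"Provenance completeness: {score:.1f}% (target: 100%)")
```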
### Impact Metrics
- **Wikidata Enrichment**: 200+ new/improved Wikidata entries for heritage institutions
- **Tourism App Integrations**: 5+ apps using geospatial API
- **Research Grants**: 3+ funded projects using GLAM Dataset as infrastructure
- **Policy Citations**: 2+ government reports referencing the dataset
---
## Consumer Feedback Mechanisms
### GitHub Issues
- **Bug Reports**: Schema validation errors, broken identifiers
- **Feature Requests**: New export formats, additional metadata fields
- **Data Corrections**: Incorrect institution types, location errors
### Community Forum (Planned)
- **Use Case Sharing**: Researchers describe how they use the data
- **Best Practices**: Documentation for common integration patterns
- **Office Hours**: Monthly Q&A sessions with maintainers
### API Analytics
- **Usage Tracking**: Monitor which endpoints/filters are most popular
- **Error Logging**: Identify common query mistakes (improve docs)
- **Performance Monitoring**: Detect slow queries, optimize indexes
---
## Next Steps
**Document Dependencies**:
- ✅ `01-dependencies.md` - Technical dependencies identified
- ✅ `02-consumers.md` - **THIS DOCUMENT** - Use cases defined
- ⏳ `03-implementation-phases.md` - Development timeline (next)
- ⏳ `04-tdd-strategy.md` - Test-driven development plan
- ⏳ `05-design-patterns.md` - Architectural patterns
- ⏳ `06-linkml-map-schema.md` - Data transformation rules
**Action Items**:
1. Validate consumer requirements with sample stakeholder interviews
2. Design REST API endpoints matching use case queries
3. Create LinkML → EDM transformation for Europeana integration
4. Implement content negotiation for RDF/JSON-LD
---
**Document Status**: Draft (content complete, pending review)
**Review Needed**: Stakeholder validation of use cases
**Version**: 1.0