UNESCO Data Consumers and Use Cases
Project: Global GLAM Dataset - UNESCO World Heritage Sites Extraction
Document: 02 - Data Consumers
Version: 1.0
Date: 2025-11-09
Status: Draft
Executive Summary
This document identifies the primary consumers and use cases for UNESCO World Heritage Site data integrated into the Global GLAM Dataset. UNESCO sites represent authoritative (TIER_1) heritage custodian data with global coverage across 167 countries, making them valuable for diverse stakeholders ranging from academic researchers to tourism applications.
Primary Data Consumers
1. Heritage Researchers and Academic Institutions
Profile: Scholars studying global cultural heritage networks, institutional relationships, and collection preservation patterns.
Use Cases:
- Network Analysis: Mapping relationships between UNESCO sites and regional/national GLAM institutions
- Collection Provenance Research: Tracing cultural objects through institutional history (mergers, relocations, repatriation)
- Comparative Studies: Cross-regional analysis of heritage management practices
- Citation Systems: Persistent identifiers (GHCID, UUID v5) for academic references
Key Requirements:
- High-quality provenance metadata (data_tier: TIER_1_AUTHORITATIVE)
- Complete identifier sets (UNESCO WHC ID, Wikidata Q-numbers, ISIL codes where applicable)
- Temporal data (founding dates, organizational change events)
- Multilingual name support (UNESCO sites have names in multiple languages)
Integration Points:
- Export as RDF/Turtle for SPARQL queries
- JSON-LD for web-based discovery
- CSV exports for statistical analysis (R, Python pandas)
Example Query:
# Find all UNESCO museums in Latin America with digital platforms
PREFIX glam: <https://w3id.org/heritage/custodian/>
PREFIX schema: <http://schema.org/>
SELECT ?museum ?name ?platform_url WHERE {
  ?museum a glam:HeritageCustodian ;
          glam:institution_type "MUSEUM" ;
          glam:data_source "UNESCO_WORLD_HERITAGE" ;
          glam:location/glam:region "Latin America" ;
          glam:name ?name ;
          glam:digital_platforms/glam:platform_url ?platform_url .
}
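For consumers who take the CSV route rather than SPARQL, a roughly equivalent filter can be sketched with the Python standard library. The column names (`name`, `institution_type`, `data_source`, `region`, `platform_url`) mirror the properties in the query but are assumptions about the export layout, and the sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical CSV export: column names are assumed, rows are illustrative only.
SAMPLE_CSV = """name,institution_type,data_source,region,platform_url
Example Museum A,MUSEUM,UNESCO_WORLD_HERITAGE,Latin America,https://example.org/a
Example Library B,LIBRARY,UNESCO_WORLD_HERITAGE,Latin America,https://example.org/b
Example Museum C,MUSEUM,UNESCO_WORLD_HERITAGE,Europe,https://example.org/c
Example Museum D,MUSEUM,UNESCO_WORLD_HERITAGE,Latin America,
"""


def museums_with_platforms(csv_text, region):
    """Return (name, platform_url) pairs for UNESCO museums in a region."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["name"], row["platform_url"])
        for row in rows
        if row["institution_type"] == "MUSEUM"
        and row["data_source"] == "UNESCO_WORLD_HERITAGE"
        and row["region"] == region
        and row["platform_url"]  # skip institutions without a digital platform
    ]


matches = museums_with_platforms(SAMPLE_CSV, "Latin America")
for name, url in matches:
    print(name, url)
```

Unlike the SPARQL property paths, this flat layout assumes one platform URL per row; an export with several platforms per institution would need a separate table or a repeated row.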
2. Cultural Heritage Aggregators
Profile: Large-scale aggregation platforms consolidating heritage data from multiple sources.
Target Platforms:
- Europeana (European cultural heritage)
- Digital Public Library of America (DPLA) (US heritage)
- Trove (Australian heritage)
- Collectie Nederland (Dutch heritage - already integrated)
- Regional aggregators (e.g., Biblioteca Digital Hispánica, Brasiliana Digital)
Use Cases:
- Cross-platform Linking: Connect UNESCO sites to digitized collections in aggregators
- Authority Control: Use GHCID/Wikidata Q-numbers to deduplicate institution records
- Discovery Enhancement: Enrich search results with UNESCO designation context
- Geospatial Search: Locate heritage collections near World Heritage Sites
Key Requirements:
- EDM (Europeana Data Model) compatibility via RDF export
- Schema.org/JSON-LD for web harvesters
- Stable persistent identifiers (UUID v5 for deterministic references)
- GeoJSON exports for map-based discovery
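The determinism behind the UUID v5 requirement can be illustrated with Python's `uuid` module. This sketch assumes the identifier is derived by hashing the full GHCID URI in the standard URL namespace; the project may well define its own namespace UUID, so treat `uuid.NAMESPACE_URL` and the helper name `ghcid_to_uuid` as placeholders.

```python
import uuid

# Assumption: GHCID URIs are hashed in the standard URL namespace.
# The project may define its own namespace UUID instead.
GHCID_BASE = "https://w3id.org/heritage/custodian/"


def ghcid_to_uuid(ghcid_path):
    """Derive a deterministic UUID v5 from a GHCID path such as 'fr/louvre'."""
    return uuid.uuid5(uuid.NAMESPACE_URL, GHCID_BASE + ghcid_path)


first = ghcid_to_uuid("fr/louvre")
second = ghcid_to_uuid("fr/louvre")
other = ghcid_to_uuid("fr/bnf")

print(first == second)  # same input always yields the same UUID
print(first == other)   # different GHCIDs yield different UUIDs
```

Because the UUID is a pure function of the namespace and the name, two aggregators that agree on both can mint matching identifiers without exchanging a lookup table.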
Integration Pattern:
# Europeana EDM mapping
edm:ProvidedCHO:
  - dc:title: "Bibliothèque nationale de France"
    edm:type: "TEXT"  # Collection type
    dcterms:spatial: "http://www.wikidata.org/entity/Q90"  # Paris
    owl:sameAs:
      - "https://w3id.org/heritage/custodian/fr/bnf"  # GHCID
      - "http://www.wikidata.org/entity/Q193563"  # Wikidata entity URI
      - "https://viaf.org/viaf/137156502737605171529"  # VIAF
edm:WebResource:
  - edm:isShownAt: "https://www.bnf.fr"
    dcterms:isPartOf: "https://whc.unesco.org/en/list/600"  # UNESCO WHC ID
3. Tourism and Cultural Sector Applications
Profile: Public-facing applications helping visitors discover heritage sites and their collections.
Use Cases:
- Heritage Tourism Apps: Guide apps showing museums/archives at UNESCO sites
- Educational Platforms: Interactive learning tools for students visiting World Heritage Sites
- Virtual Tours: 360° tours linking physical sites to digital collections
- Event Planning: Identify exhibition spaces, lecture halls, conservation labs at UNESCO sites
Key Requirements:
- Geospatial data (lat/lon, GeoJSON boundaries)
- Operating hours, contact information (when available)
- Digital platform URLs (virtual tours, online exhibitions)
- Accessibility information (wheelchair access, multilingual guides)
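One detail that often trips up map integrations is coordinate order: GeoJSON positions are `[longitude, latitude]`, the reverse of the usual lat/lon convention. A minimal sketch of wrapping an institution as a GeoJSON point feature follows; the function name and the choice of properties are illustrative assumptions, not a defined export format.

```python
import json


def to_geojson_feature(name, lat, lon, institution_type):
    """Build a GeoJSON Feature; GeoJSON positions are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "institution_type": institution_type},
    }


feature = to_geojson_feature("Musée du Louvre", 48.8606, 2.3376, "MUSEUM")
collection = {"type": "FeatureCollection", "features": [feature]}

# ensure_ascii=False keeps accented names readable in the output file
geojson_text = json.dumps(collection, ensure_ascii=False)
print(geojson_text)
```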
API Integration Example:
// REST API endpoint for tourism apps
GET /api/v1/heritage/custodians?near_lat=48.8566&near_lon=2.3522&radius=5km&institution_type=MUSEUM
Response:
{
  "results": [
    {
      "id": "https://w3id.org/heritage/custodian/fr/louvre",
      "name": "Musée du Louvre",
      "institution_type": "MUSEUM",
      "location": {
        "city": "Paris",
        "country": "FR",
        "coordinates": [48.8606, 2.3376]
      },
      "unesco_site": {
        "whc_id": 600,
        "name": "Paris, Banks of the Seine"
      },
      "digital_platforms": [
        {
          "platform_name": "Louvre Collections Database",
          "platform_url": "https://collections.louvre.fr"
        }
      ]
    }
  ]
}
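The `near_lat`, `near_lon`, and `radius` parameters imply a great-circle distance filter. A production service would more likely push this into a spatial index, so the following is only an in-memory sketch using the haversine formula. It assumes `coordinates` are stored as `[lat, lon]` as in the response above; the second record is an invented far-away point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records, near_lat, near_lon, radius_km):
    """Keep records whose [lat, lon] coordinates lie inside the radius."""
    return [
        r for r in records
        if haversine_km(near_lat, near_lon, *r["coordinates"]) <= radius_km
    ]


records = [
    {"name": "Musée du Louvre", "coordinates": [48.8606, 2.3376]},
    {"name": "Far-away placeholder", "coordinates": [52.3676, 4.9041]},  # illustrative
]
nearby = within_radius(records, 48.8566, 2.3522, 5.0)
print([r["name"] for r in nearby])
```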
4. Linked Open Data (LOD) Community
Profile: Semantic web developers building knowledge graphs and linked data applications.
Use Cases:
- Knowledge Graph Construction: Integrate GLAM data into Wikidata, DBpedia, YAGO
- Entity Linking: Connect heritage institutions across datasets using owl:sameAs
- Ontology Alignment: Map LinkML schema to CIDOC-CRM, PROV-O, FOAF, Schema.org
- SPARQL Federation: Query UNESCO data alongside Wikidata, GeoNames, VIAF
Key Requirements:
- W3C-compliant RDF serialization (Turtle, N-Triples, RDF/XML)
- Content negotiation (Accept: text/turtle, application/ld+json)
- Dereferenceable URIs (https://w3id.org/heritage/custodian/{id} resolves)
- Provenance tracking via PROV-O (prov:wasDerivedFrom, prov:generatedAtTime)
Ontology Alignments:
@prefix glam: <https://w3id.org/heritage/custodian/> .
@prefix cpov: <http://data.europa.eu/m8g/> .
@prefix schema: <http://schema.org/> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# HeritageCustodian maps to multiple ontologies
glam:HeritageCustodian
    rdfs:subClassOf cpov:PublicOrganisation ;  # EU Core Public Org Vocabulary
    rdfs:subClassOf schema:Organization ;      # Schema.org (schema:Museum would exclude libraries and archives)
    rdfs:subClassOf crm:E74_Group ;            # CIDOC-CRM cultural organizations
    rdfs:subClassOf prov:Organization .        # PROV-O provenance
# Specific institution example
<https://w3id.org/heritage/custodian/fr/louvre>
    a glam:HeritageCustodian, schema:Museum, crm:E74_Group ;
    owl:sameAs <http://www.wikidata.org/entity/Q19675> ;  # Wikidata entity URI
    owl:sameAs <http://viaf.org/viaf/139708098> ;         # VIAF
    schema:name "Musée du Louvre"@fr, "Louvre Museum"@en ;
    schema:location <https://sws.geonames.org/2988507/> ; # Paris GeoNames
    prov:wasDerivedFrom <https://whc.unesco.org/en/list/600> .
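For entity linking without a full RDF toolkit, `owl:sameAs` links can be emitted directly as N-Triples, since each line is just three URIs and a full stop. The sketch below is a standard-library illustration and the helper name is hypothetical. It uses the Wikidata entity URI form (`http://www.wikidata.org/entity/...`), which is the form Wikidata's own RDF uses, rather than the wiki page URL.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(subject_uri, external_uris):
    """Return one N-Triples line per external identifier."""
    return [
        f"<{subject_uri}> <{OWL_SAME_AS}> <{target}> ."
        for target in external_uris
    ]


lines = same_as_triples(
    "https://w3id.org/heritage/custodian/fr/louvre",
    [
        "http://www.wikidata.org/entity/Q19675",  # Wikidata
        "http://viaf.org/viaf/139708098",         # VIAF
    ],
)
print("\n".join(lines))
```

This only works for plain URIs; literals and URIs containing characters that need escaping are better left to an RDF library.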
5. Government Heritage Agencies
Profile: National and regional heritage authorities managing conservation, funding, and policy.
Use Cases:
- Policy Analysis: Assess coverage of heritage institutions across regions
- Funding Allocation: Identify under-resourced institutions at UNESCO sites
- Conservation Planning: Track organizational changes (mergers, closures) affecting site management
- International Cooperation: Coordinate transnational heritage projects
Key Requirements:
- Administrative metadata (governance structure, parent organizations)
- Change event tracking (mergers, relocations, name changes)
- Collection scope and extent (for resource planning)
- Data quality tiers (TIER_1 authoritative data preferred)
Example Use Case - Netherlands:
# Dutch government analyzing UNESCO site coverage
Query: "List all GLAM institutions at UNESCO sites in the Netherlands
        with ISIL codes and connection to Archieven.nl or Collectie Nederland"
Result:
  - name: Nationaal Archief
    location: The Hague
    unesco_site: "Seventeenth-century canal ring area of Amsterdam inside the Singelgracht"
    isil_code: NL-HaNA
    platforms:
      - Archieven.nl
      - Netwerk Digitaal Erfgoed
    data_tier: TIER_1_AUTHORITATIVE
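A query like this depends on ISIL codes being well-formed. The sketch below filters records by country and by a rough ISIL shape check: a short prefix, a hyphen, then a unit identifier. The exact length limits in the pattern are an approximation of ISO 15511 and should be treated as an assumption. The record fields and sample institutions are hypothetical.

```python
import re

# Approximation of the ISIL shape (ISO 15511): a short prefix, a hyphen,
# then a unit identifier. Treat the exact limits here as an assumption.
ISIL_PATTERN = re.compile(r"^[A-Za-z0-9]{1,4}-[A-Za-z0-9:/-]{1,11}$")


def looks_like_isil(code):
    """Return True if the string has the general shape of an ISIL code."""
    return bool(ISIL_PATTERN.match(code))


def institutions_with_isil(records, country):
    """Keep records from one country that carry a plausible ISIL code."""
    return [
        r for r in records
        if r.get("country") == country
        and looks_like_isil(r.get("isil_code") or "")
    ]


# Hypothetical records for illustration only.
sample = [
    {"name": "Example Archive", "country": "NL", "isil_code": "NL-ExampleA"},
    {"name": "Example Museum", "country": "NL", "isil_code": None},
    {"name": "Example Library", "country": "FR", "isil_code": "FR-ExampleB"},
    {"name": "Malformed Entry", "country": "NL", "isil_code": "no hyphen"},
]
dutch = institutions_with_isil(sample, "NL")
print([r["name"] for r in dutch])
```

A shape check does not prove that a code is registered; confirming that would need a lookup against the relevant national ISIL agency.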
6. Machine Learning and AI Research
Profile: Data scientists building AI models for heritage data analysis, image recognition, and NLP.
Use Cases:
- Named Entity Recognition: Train models to extract heritage institutions from text
- Image Classification: Identify institutional logos, building facades from photos
- Relationship Extraction: Discover implicit connections between institutions
- Data Quality Models: Predict confidence scores for uncertain extractions
Key Requirements:
- Structured training data (JSON, Parquet formats)
- Rich provenance metadata (confidence scores, extraction methods)
- Large-scale exports (Parquet for efficient columnar storage)
- Reproducibility (deterministic UUID v5 identifiers)
Example Training Data Export:
# Export UNESCO institutions as Parquet for ML training
import pandas as pd  # to_parquet needs a Parquet engine (here pyarrow) installed

df = pd.DataFrame([
    {
        'institution_name': 'Bibliothèque nationale de France',
        'institution_type': 'LIBRARY',
        'country': 'FR',
        'unesco_whc_id': 600,
        'wikidata_id': 'Q193563',
        'data_tier': 'TIER_1_AUTHORITATIVE',
        'confidence_score': 1.0
    },
    # ... 1,000+ rows
])
# Snappy-compressed columnar output; index=False keeps the row index out of the file
df.to_parquet('unesco_glam_institutions_training.parquet',
              engine='pyarrow', compression='snappy', index=False)
Secondary Data Consumers
7. Digital Humanities Projects
- Timeline Visualizations: Display organizational change events over centuries
- Network Graphs: Visualize institutional relationships (parent orgs, partnerships)
- Geospatial Analysis: Map heritage institution density vs. population
8. Educational Technology Platforms
- Virtual Field Trips: Link classrooms to UNESCO site collections
- Curriculum Development: Identify heritage institutions for lesson plans
- Student Research Tools: Provide authoritative sources for school projects
9. Media and Publishing
- Fact-Checking: Verify heritage institution information for articles
- Travel Guides: Enrich guidebooks with GLAM institution data
- Documentary Research: Locate archival collections for film/TV production
10. Private Sector Applications
- Art Market: Verify provenance of objects in collections
- Insurance: Assess collection value at heritage institutions
- Cultural Consulting: Advise on heritage site development
Cross-Consumer Integration Scenarios
Scenario 1: Researcher → Aggregator → Tourism App
- Researcher exports RDF dataset from GLAM project
- Europeana ingests RDF, creates EDM records
- Tourism App harvests Europeana API, displays UNESCO sites on map
- All three use GHCID persistent identifier to maintain referential integrity
Scenario 2: Government → LOD Community → Academic Citation
- Netherlands Heritage Agency queries Dutch UNESCO institutions
- Wikidata Editor enriches Wikidata records with GHCID identifiers
- Academic Paper cites institution using UUID v5 (stable across systems)
- Paper DOI links back to w3id.org/heritage/custodian/{id}
Scenario 3: ML Model → Data Quality → Human Review
- AI Model extracts institutions from conversation text (TIER_4 confidence)
- Validation Script cross-references against UNESCO TIER_1 data
- Conflicts Flagged for manual review by heritage professionals
- Updated Records re-exported with corrected provenance metadata
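The cross-referencing step in Scenario 3 might look something like the following. This is a sketch under stated assumptions: records are matched on `wikidata_id`, only `institution_type` and `country` are compared, and the function name and return shape are hypothetical. The sample values reuse identifiers that appear elsewhere in this document.

```python
def cross_reference(extracted, authoritative, fields=("institution_type", "country")):
    """Split TIER_4 extractions into confirmed, conflicting, and unmatched records."""
    confirmed, conflicts, unmatched = [], [], []
    for record in extracted:
        reference = authoritative.get(record["wikidata_id"])
        if reference is None:
            unmatched.append(record)  # no TIER_1 record to compare against
            continue
        differing = {
            f: {"extracted": record.get(f), "authoritative": reference.get(f)}
            for f in fields
            if record.get(f) != reference.get(f)
        }
        if differing:
            # Flag for manual review rather than overwriting either side
            conflicts.append({"wikidata_id": record["wikidata_id"],
                              "differences": differing})
        else:
            confirmed.append(record)
    return confirmed, conflicts, unmatched


tier1 = {"Q193563": {"institution_type": "LIBRARY", "country": "FR"}}
tier4 = [
    {"wikidata_id": "Q193563", "institution_type": "MUSEUM", "country": "FR"},  # type disagrees
    {"wikidata_id": "Q19675", "institution_type": "MUSEUM", "country": "FR"},   # absent from tier1 sample
]
confirmed, conflicts, unmatched = cross_reference(tier4, tier1)
print(conflicts)
```

Keeping unmatched records separate from conflicts lets reviewers distinguish a disagreement with authoritative data from a lack of coverage.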
Data Access Methods
REST API (Planned)
GET /api/v1/heritage/custodians
GET /api/v1/heritage/custodians/{ghcid}
GET /api/v1/heritage/custodians?country=FR&institution_type=MUSEUM
GET /api/v1/heritage/custodians?unesco_whc_id=600
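Since the API is still planned, client code can only be sketched against the paths listed here. The helper below builds query URLs with `urllib.parse.urlencode` so that filter values are escaped consistently. The host `glam.example.org` is a placeholder and the function name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://glam.example.org"  # placeholder host, not a live service
CUSTODIANS_PATH = "/api/v1/heritage/custodians"


def custodians_url(**filters):
    """Build a custodian listing URL, dropping filters that are None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = BASE_URL + CUSTODIANS_PATH
    return f"{url}?{query}" if query else url


print(custodians_url())
print(custodians_url(country="FR", institution_type="MUSEUM"))
print(custodians_url(unesco_whc_id=600))
```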
SPARQL Endpoint (Planned)
POST https://glam.example.org/sparql
Content-Type: application/sparql-query

PREFIX glam: <https://w3id.org/heritage/custodian/>
SELECT ?custodian ?name WHERE {
  ?custodian glam:data_source "UNESCO_WORLD_HERITAGE" ;
             glam:name ?name .
}
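From Python, the same request could be assembled with the standard library. The sketch builds the request object without sending it, because the endpoint is planned rather than live. The `Accept` value is the standard media type for SPARQL JSON results, and whether the endpoint will serve it is an assumption.

```python
import urllib.request

ENDPOINT = "https://glam.example.org/sparql"  # planned endpoint, placeholder host

QUERY = """PREFIX glam: <https://w3id.org/heritage/custodian/>
SELECT ?custodian ?name WHERE {
  ?custodian glam:data_source "UNESCO_WORLD_HERITAGE" ;
             glam:name ?name .
}"""


def build_sparql_request(query, endpoint=ENDPOINT):
    """Prepare a direct-POST SPARQL query request (not sent here)."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",  # assumed to be supported
        },
        method="POST",
    )


request = build_sparql_request(QUERY)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it once the endpoint exists.
```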
File Exports
- JSON-LD: exports/unesco_glam_institutions.jsonld
- RDF/Turtle: exports/unesco_glam_institutions.ttl
- CSV: exports/unesco_glam_institutions.csv
- Parquet: exports/unesco_glam_institutions.parquet
- SQLite: exports/glam_dataset.db
Content Negotiation
# Request Turtle format
curl -H "Accept: text/turtle" https://w3id.org/heritage/custodian/fr/louvre
# Request JSON-LD format
curl -H "Accept: application/ld+json" https://w3id.org/heritage/custodian/fr/louvre
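On the server side, content negotiation comes down to picking a serialization from the `Accept` header. The following is a simplified sketch, not a full implementation of HTTP negotiation: it honours q-values and `*/*` but ignores partial wildcards such as `text/*`. The JSON-LD fallback is an assumption.

```python
SUPPORTED = {
    "text/turtle": "turtle",
    "application/ld+json": "json-ld",
}
DEFAULT_MEDIA_TYPE = "application/ld+json"  # assumption: JSON-LD as the fallback


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_media_type(header):
    """Pick the best supported media type, or None if nothing matches."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return DEFAULT_MEDIA_TYPE
    return None  # caller would answer 406 Not Acceptable


print(choose_media_type("text/turtle"))
print(choose_media_type("text/html;q=0.9, application/ld+json;q=0.8"))
```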
Performance Requirements
Query Performance
- Single Institution Lookup: < 50ms (by GHCID or UUID)
- Geospatial Queries: < 500ms (5km radius, PostGIS optimized)
- Full Dataset Export: < 5 minutes (1,000+ institutions to Parquet)
Data Freshness
- UNESCO API Sync: Weekly (UNESCO updates sites ~1x/year)
- Wikidata Enrichment: Monthly (community-driven updates)
- Provenance Updates: On-demand (when extraction methods improve)
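The weekly and monthly cadences can be turned into a simple staleness check for a scheduler. This sketch approximates "monthly" as 30 days, and the source keys and function name are hypothetical.

```python
from datetime import date, timedelta

# Refresh intervals taken from the cadences above; "monthly" approximated as 30 days.
SYNC_INTERVALS = {
    "unesco_api": timedelta(days=7),
    "wikidata_enrichment": timedelta(days=30),
}


def is_stale(source, last_synced, today):
    """Return True when the source is due for another sync."""
    return today - last_synced >= SYNC_INTERVALS[source]


today = date(2025, 11, 9)
last_run = date(2025, 11, 1)
print(is_stale("unesco_api", last_run, today))
print(is_stale("wikidata_enrichment", last_run, today))
```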
Scalability Targets
- Initial Load: 1,000+ UNESCO site institutions
- 3-Year Projection: 5,000+ institutions (including regional sites)
- Query Load: 1,000 requests/day (research community usage)
Privacy and Licensing Considerations
Data Licensing
- UNESCO Data: Governed by UNESCO's terms of use (UN works are not automatically public domain; confirm reuse conditions before redistribution)
- Wikidata IDs: CC0 (public domain dedication)
- GLAM Project Schema: CC-BY 4.0 (attribution required)
- Aggregated Dataset: CC0 (maximize reusability)
Privacy Compliance
- No Personal Data: Institutional records only (no staff names unless public officials)
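- GDPR Compliance: Largely out of scope (records describe organizations, not individuals); any named officials or personal contact details remain personal data and need a lawful basis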
- Embargo Periods: Respect institutional requests to delay publication (rare)
Attribution Requirements
When using GLAM Dataset with UNESCO data:
Citation: "Global GLAM Dataset - UNESCO World Heritage Sites.
Retrieved from https://w3id.org/heritage/custodian/.
Data sourced from UNESCO World Heritage Centre (whc.unesco.org).
Licensed under CC0 1.0 Universal."
Success Metrics for Consumers
Adoption Metrics
- Academic Citations: 10+ papers citing GHCID identifiers within 1 year
- Aggregator Integrations: 3+ platforms (Europeana, DPLA, regional) within 18 months
- API Usage: 500+ unique users within 6 months
- Data Downloads: 100+ dataset exports per month
Quality Metrics
- Identifier Resolution: 99% of GHCIDs resolve to valid RDF
- Cross-platform Consistency: 95%+ match rate when cross-referencing with Wikidata
- Provenance Completeness: 100% of records have extraction_date and data_source
- Error Reports: < 1% of records flagged for correction by community
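The provenance completeness target is straightforward to measure. The sketch below computes the share of records that carry both required fields; the record layout is an assumption and the sample rows are invented.

```python
REQUIRED_PROVENANCE = ("extraction_date", "data_source")


def provenance_completeness(records):
    """Percentage of records with every required provenance field filled in."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(field) for field in REQUIRED_PROVENANCE)
    )
    return 100.0 * complete / len(records)


# Illustrative rows only: one is missing extraction_date.
sample = [
    {"extraction_date": "2025-11-09", "data_source": "UNESCO_WORLD_HERITAGE"},
    {"extraction_date": "2025-11-09", "data_source": "UNESCO_WORLD_HERITAGE"},
    {"extraction_date": None, "data_source": "UNESCO_WORLD_HERITAGE"},
    {"extraction_date": "2025-11-09", "data_source": "UNESCO_WORLD_HERITAGE"},
]
score = provenance_completeness(sample)
print(f"{score:.1f}% complete")
```

The same pattern extends to the other quality metrics by swapping the per-record predicate, for example "GHCID resolves" or "matches Wikidata".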
Impact Metrics
- Wikidata Enrichment: 200+ new/improved Wikidata entries for heritage institutions
- Tourism App Integrations: 5+ apps using geospatial API
- Research Grants: 3+ funded projects using GLAM Dataset as infrastructure
- Policy Citations: 2+ government reports referencing the dataset
Consumer Feedback Mechanisms
GitHub Issues
- Bug Reports: Schema validation errors, broken identifiers
- Feature Requests: New export formats, additional metadata fields
- Data Corrections: Incorrect institution types, location errors
Community Forum (Planned)
- Use Case Sharing: Researchers describe how they use the data
- Best Practices: Documentation for common integration patterns
- Office Hours: Monthly Q&A sessions with maintainers
API Analytics
- Usage Tracking: Monitor which endpoints/filters are most popular
- Error Logging: Identify common query mistakes (improve docs)
- Performance Monitoring: Detect slow queries, optimize indexes
Next Steps
Document Dependencies:
- ✅ 01-dependencies.md - Technical dependencies identified
- ✅ 02-consumers.md - THIS DOCUMENT - Use cases defined
- ⏳ 03-implementation-phases.md - Development timeline (next)
- ⏳ 04-tdd-strategy.md - Test-driven development plan
- ⏳ 05-design-patterns.md - Architectural patterns
- ⏳ 06-linkml-map-schema.md - Data transformation rules
Action Items:
- Validate consumer requirements with sample stakeholder interviews
- Design REST API endpoints matching use case queries
- Create LinkML → EDM transformation for Europeana integration
- Implement content negotiation for RDF/JSON-LD
Document Status: Draft complete, pending review
Review Needed: Stakeholder validation of use cases
Version: 1.0