# UNESCO Data Consumers and Use Cases

**Project**: Global GLAM Dataset - UNESCO World Heritage Sites Extraction
**Document**: 02 - Data Consumers
**Version**: 1.0
**Date**: 2025-11-09
**Status**: Draft

---

## Executive Summary

This document identifies the primary consumers and use cases for UNESCO World Heritage Site data integrated into the Global GLAM Dataset. UNESCO sites represent authoritative (TIER_1) heritage custodian data with global coverage across 167 countries, making them valuable for diverse stakeholders ranging from academic researchers to tourism applications.

---

## Primary Data Consumers

### 1. Heritage Researchers and Academic Institutions

**Profile**: Scholars studying global cultural heritage networks, institutional relationships, and collection preservation patterns.

**Use Cases**:
- **Network Analysis**: Mapping relationships between UNESCO sites and regional/national GLAM institutions
- **Collection Provenance Research**: Tracing cultural objects through institutional history (mergers, relocations, repatriation)
- **Comparative Studies**: Cross-regional analysis of heritage management practices
- **Citation Systems**: Persistent identifiers (GHCID, UUID v5) for academic references

**Key Requirements**:
- High-quality provenance metadata (data_tier: TIER_1_AUTHORITATIVE)
- Complete identifier sets (UNESCO WHC ID, Wikidata Q-numbers, ISIL codes where applicable)
- Temporal data (founding dates, organizational change events)
- Multilingual name support (UNESCO sites have names in multiple languages)

**Integration Points**:
- Export as RDF/Turtle for SPARQL queries
- JSON-LD for web-based discovery
- CSV exports for statistical analysis (R, Python pandas)

**Example Query**:
```sparql
# Find all UNESCO museums in Latin America with digital platforms
PREFIX glam:
PREFIX schema:

SELECT ?museum ?name ?platform_url
WHERE {
  ?museum a glam:HeritageCustodian ;
          glam:institution_type "MUSEUM" ;
          glam:data_source "UNESCO_WORLD_HERITAGE" ;
          glam:location/glam:region "Latin America" ;
          glam:name ?name ;
          glam:digital_platforms/glam:platform_url ?platform_url .
}
```

---

### 2. Cultural Heritage Aggregators

**Profile**: Large-scale aggregation platforms consolidating heritage data from multiple sources.

**Target Platforms**:
- **Europeana** (European cultural heritage)
- **Digital Public Library of America (DPLA)** (US heritage)
- **Trove** (Australian heritage)
- **Collectie Nederland** (Dutch heritage - already integrated)
- **Regional aggregators** (e.g., Biblioteca Digital Hispánica, Brasiliana Digital)

**Use Cases**:
- **Cross-platform Linking**: Connect UNESCO sites to digitized collections in aggregators
- **Authority Control**: Use GHCID/Wikidata Q-numbers to deduplicate institution records
- **Discovery Enhancement**: Enrich search results with UNESCO designation context
- **Geospatial Search**: Locate heritage collections near World Heritage Sites

**Key Requirements**:
- EDM (Europeana Data Model) compatibility via RDF export
- Schema.org/JSON-LD for web harvesters
- Stable persistent identifiers (UUID v5 for deterministic references)
- GeoJSON exports for map-based discovery

**Integration Pattern**:
```yaml
# Europeana EDM mapping
edm:ProvidedCHO:
  - dc:title: "Bibliothèque nationale de France"
    edm:type: "TEXT"  # Collection type
    dcterms:spatial: "https://www.wikidata.org/wiki/Q90"  # Paris
    owl:sameAs:
      - "https://w3id.org/heritage/custodian/fr/bnf"  # GHCID
      - "https://www.wikidata.org/wiki/Q193563"  # Wikidata
      - "https://viaf.org/viaf/137156502737605171529"  # VIAF

edm:WebResource:
  - edm:isShownAt: "https://www.bnf.fr"
    dcterms:isPartOf: "https://whc.unesco.org/en/list/600"  # UNESCO WHC ID
```

---

### 3. Tourism and Cultural Sector Applications

**Profile**: Public-facing applications helping visitors discover heritage sites and their collections.

**Use Cases**:
- **Heritage Tourism Apps**: Guide apps showing museums/archives at UNESCO sites
- **Educational Platforms**: Interactive learning tools for students visiting World Heritage Sites
- **Virtual Tours**: 360° tours linking physical sites to digital collections
- **Event Planning**: Identify exhibition spaces, lecture halls, conservation labs at UNESCO sites

**Key Requirements**:
- Geospatial data (lat/lon, GeoJSON boundaries)
- Operating hours, contact information (when available)
- Digital platform URLs (virtual tours, online exhibitions)
- Accessibility information (wheelchair access, multilingual guides)

**API Integration Example**:
```json
// REST API endpoint for tourism apps
GET /api/v1/heritage/custodians?near_lat=48.8566&near_lon=2.3522&radius=5km&institution_type=MUSEUM

Response:
{
  "results": [
    {
      "id": "https://w3id.org/heritage/custodian/fr/louvre",
      "name": "Musée du Louvre",
      "institution_type": "MUSEUM",
      "location": {
        "city": "Paris",
        "country": "FR",
        "coordinates": [48.8606, 2.3376]
      },
      "unesco_site": {
        "whc_id": 600,
        "name": "Paris, Banks of the Seine"
      },
      "digital_platforms": [
        {
          "platform_name": "Louvre Collections Database",
          "platform_url": "https://collections.louvre.fr"
        }
      ]
    }
  ]
}
```

---

### 4. Linked Open Data (LOD) Community

**Profile**: Semantic web developers building knowledge graphs and linked data applications.

**Use Cases**:
- **Knowledge Graph Construction**: Integrate GLAM data into Wikidata, DBpedia, YAGO
- **Entity Linking**: Connect heritage institutions across datasets using owl:sameAs
- **Ontology Alignment**: Map LinkML schema to CIDOC-CRM, PROV-O, FOAF, Schema.org
- **SPARQL Federation**: Query UNESCO data alongside Wikidata, GeoNames, VIAF

**Key Requirements**:
- W3C-compliant RDF serialization (Turtle, N-Triples, RDF/XML)
- Content negotiation (Accept: text/turtle, application/ld+json)
- Dereferenceable URIs (https://w3id.org/heritage/custodian/{id} resolves)
- Provenance tracking via PROV-O (prov:wasDerivedFrom, prov:generatedAtTime)

**Ontology Alignments**:
```turtle
@prefix glam: .
@prefix cpov: .
@prefix schema: .
@prefix crm: .
@prefix prov: .

# HeritageCustodian maps to multiple ontologies
glam:HeritageCustodian
    rdfs:subClassOf cpov:PublicOrganisation ;  # EU Core Public Org Vocabulary
    rdfs:subClassOf schema:Museum ;            # Schema.org
    rdfs:subClassOf crm:E74_Group ;            # CIDOC-CRM cultural organizations
    rdfs:subClassOf prov:Organization .        # PROV-O provenance

# Specific institution example
    a glam:HeritageCustodian, schema:Museum, crm:E74_Group ;
    owl:sameAs ;  # Wikidata
    owl:sameAs ;  # VIAF
    schema:name "Musée du Louvre"@fr, "Louvre Museum"@en ;
    schema:location ;  # Paris GeoNames
    prov:wasDerivedFrom .
```

---

### 5. Government Heritage Agencies

**Profile**: National and regional heritage authorities managing conservation, funding, and policy.

**Use Cases**:
- **Policy Analysis**: Assess coverage of heritage institutions across regions
- **Funding Allocation**: Identify under-resourced institutions at UNESCO sites
- **Conservation Planning**: Track organizational changes (mergers, closures) affecting site management
- **International Cooperation**: Coordinate transnational heritage projects

**Key Requirements**:
- Administrative metadata (governance structure, parent organizations)
- Change event tracking (mergers, relocations, name changes)
- Collection scope and extent (for resource planning)
- Data quality tiers (TIER_1 authoritative data preferred)

**Example Use Case - Netherlands**:
```yaml
# Dutch government analyzing UNESCO site coverage
Query: "List all GLAM institutions at UNESCO sites in the Netherlands with ISIL codes and connection to Archieven.nl or Collectie Nederland"

Result:
  - name: Nationaal Archief
    location: The Hague
    unesco_site: "Seventeenth-century canal ring area of Amsterdam inside the Singelgracht"
    isil_code: NL-HaNA
    platforms:
      - Archieven.nl
      - Netwerk Digitaal Erfgoed
    data_tier: TIER_1_AUTHORITATIVE
```

---

### 6. Machine Learning and AI Research

**Profile**: Data scientists building AI models for heritage data analysis, image recognition, and NLP.

**Use Cases**:
- **Named Entity Recognition**: Train models to extract heritage institutions from text
- **Image Classification**: Identify institutional logos, building facades from photos
- **Relationship Extraction**: Discover implicit connections between institutions
- **Data Quality Models**: Predict confidence scores for uncertain extractions

**Key Requirements**:
- Structured training data (JSON, Parquet formats)
- Rich provenance metadata (confidence scores, extraction methods)
- Large-scale exports (Parquet for efficient columnar storage)
- Reproducibility (deterministic UUID v5 identifiers)

**Example Training Data Export**:
```python
# Export UNESCO institutions as Parquet for ML training
import pandas as pd  # DataFrame.to_parquet needs a Parquet engine such as pyarrow installed

df = pd.DataFrame([
    {
        'institution_name': 'Bibliothèque nationale de France',
        'institution_type': 'LIBRARY',
        'country': 'FR',
        'unesco_whc_id': 600,
        'wikidata_id': 'Q193563',
        'data_tier': 'TIER_1_AUTHORITATIVE',
        'confidence_score': 1.0
    },
    # ... 1,000+ rows
])

df.to_parquet('unesco_glam_institutions_training.parquet', compression='snappy')
```

---

## Secondary Data Consumers

### 7. Digital Humanities Projects
- **Timeline Visualizations**: Display organizational change events over centuries
- **Network Graphs**: Visualize institutional relationships (parent orgs, partnerships)
- **Geospatial Analysis**: Map heritage institution density vs. population

### 8. Educational Technology Platforms
- **Virtual Field Trips**: Link classrooms to UNESCO site collections
- **Curriculum Development**: Identify heritage institutions for lesson plans
- **Student Research Tools**: Provide authoritative sources for school projects

### 9. Media and Publishing
- **Fact-Checking**: Verify heritage institution information for articles
- **Travel Guides**: Enrich guidebooks with GLAM institution data
- **Documentary Research**: Locate archival collections for film/TV production

### 10. Private Sector Applications
- **Art Market**: Verify provenance of objects in collections
- **Insurance**: Assess collection value at heritage institutions
- **Cultural Consulting**: Advise on heritage site development

---

## Cross-Consumer Integration Scenarios

### Scenario 1: Researcher → Aggregator → Tourism App
1. **Researcher** exports RDF dataset from GLAM project
2. **Europeana** ingests RDF, creates EDM records
3. **Tourism App** harvests Europeana API, displays UNESCO sites on map
4. All three use **GHCID persistent identifier** to maintain referential integrity

### Scenario 2: Government → LOD Community → Academic Citation
1. **Netherlands Heritage Agency** queries Dutch UNESCO institutions
2. **Wikidata Editor** enriches Wikidata records with GHCID identifiers
3. **Academic Paper** cites institution using UUID v5 (stable across systems)
4. Paper DOI links back to w3id.org/heritage/custodian/{id}

### Scenario 3: ML Model → Data Quality → Human Review
1. **AI Model** extracts institutions from conversation text (TIER_4 confidence)
2. **Validation Script** cross-references against UNESCO TIER_1 data
3. **Conflicts Flagged** for manual review by heritage professionals
4. **Updated Records** re-exported with corrected provenance metadata

---

## Data Access Methods

### REST API (Planned)
```
GET /api/v1/heritage/custodians
GET /api/v1/heritage/custodians/{ghcid}
GET /api/v1/heritage/custodians?country=FR&institution_type=MUSEUM
GET /api/v1/heritage/custodians?unesco_whc_id=600
```

### SPARQL Endpoint (Planned)
```
POST https://glam.example.org/sparql
Content-Type: application/sparql-query

SELECT ?custodian ?name
WHERE {
  ?custodian glam:data_source "UNESCO_WORLD_HERITAGE" ;
             glam:name ?name .
}
```

### File Exports
- **JSON-LD**: `exports/unesco_glam_institutions.jsonld`
- **RDF/Turtle**: `exports/unesco_glam_institutions.ttl`
- **CSV**: `exports/unesco_glam_institutions.csv`
- **Parquet**: `exports/unesco_glam_institutions.parquet`
- **SQLite**: `exports/glam_dataset.db`

### Content Negotiation
```bash
# Request Turtle format
curl -H "Accept: text/turtle" https://w3id.org/heritage/custodian/fr/louvre

# Request JSON-LD format
curl -H "Accept: application/ld+json" https://w3id.org/heritage/custodian/fr/louvre
```

---

## Performance Requirements

### Query Performance
- **Single Institution Lookup**: < 50ms (by GHCID or UUID)
- **Geospatial Queries**: < 500ms (5km radius, PostGIS optimized)
- **Full Dataset Export**: < 5 minutes (1,000+ institutions to Parquet)

### Data Freshness
- **UNESCO API Sync**: Weekly (UNESCO updates sites ~1x/year)
- **Wikidata Enrichment**: Monthly (community-driven updates)
- **Provenance Updates**: On-demand (when extraction methods improve)

### Scalability Targets
- **Initial Load**: 1,000+ UNESCO site institutions
- **3-Year Projection**: 5,000+ institutions (including regional sites)
- **Query Load**: 1,000 requests/day (research community usage)

---

## Privacy and Licensing Considerations

### Data Licensing
- **UNESCO Data**: Public domain status assumed (to be confirmed against the UNESCO World Heritage Centre terms of use, since UN works are not automatically free of copyright)
- **Wikidata IDs**: CC0 (public domain dedication)
- **GLAM Project Schema**: CC-BY 4.0 (attribution required)
- **Aggregated Dataset**: CC0 (maximize reusability)

### Privacy Compliance
- **No Personal Data**: Institutional records only (no staff names unless public officials)
- **GDPR Compliance**: Not applicable (organizations, not individuals)
- **Embargo Periods**: Respect institutional requests to delay publication (rare)

### Attribution Requirements

When using GLAM Dataset with UNESCO data:

```
Citation: "Global GLAM Dataset - UNESCO World Heritage Sites.
Retrieved from https://w3id.org/heritage/custodian/.
Data sourced from UNESCO World Heritage Centre (whc.unesco.org).
Licensed under CC0 1.0 Universal."
```

---

## Success Metrics for Consumers

### Adoption Metrics
- **Academic Citations**: 10+ papers citing GHCID identifiers within 1 year
- **Aggregator Integrations**: 3+ platforms (Europeana, DPLA, regional) within 18 months
- **API Usage**: 500+ unique users within 6 months
- **Data Downloads**: 100+ dataset exports per month

### Quality Metrics
- **Identifier Resolution**: 99% of GHCIDs resolve to valid RDF
- **Cross-platform Consistency**: 95%+ match rate when cross-referencing with Wikidata
- **Provenance Completeness**: 100% of records have extraction_date and data_source
- **Error Reports**: < 1% of records flagged for correction by community

### Impact Metrics
- **Wikidata Enrichment**: 200+ new/improved Wikidata entries for heritage institutions
- **Tourism App Integrations**: 5+ apps using geospatial API
- **Research Grants**: 3+ funded projects using GLAM Dataset as infrastructure
- **Policy Citations**: 2+ government reports referencing the dataset

---

## Consumer Feedback Mechanisms

### GitHub Issues
- **Bug Reports**: Schema validation errors, broken identifiers
- **Feature Requests**: New export formats, additional metadata fields
- **Data Corrections**: Incorrect institution types, location errors

### Community Forum (Planned)
- **Use Case Sharing**: Researchers describe how they use the data
- **Best Practices**: Documentation for common integration patterns
- **Office Hours**: Monthly Q&A sessions with maintainers

### API Analytics
- **Usage Tracking**: Monitor which endpoints/filters are most popular
- **Error Logging**: Identify common query mistakes (improve docs)
- **Performance Monitoring**: Detect slow queries, optimize indexes

---

## Next Steps

**Document Dependencies**:
- ✅ `01-dependencies.md` - Technical dependencies identified
- ✅ `02-consumers.md` - **THIS DOCUMENT** - Use cases defined
- ⏳ `03-implementation-phases.md` - Development timeline (next)
- ⏳ `04-tdd-strategy.md` - Test-driven development plan
- ⏳ `05-design-patterns.md` - Architectural patterns
- ⏳ `06-linkml-map-schema.md` - Data transformation rules

**Action Items**:
1. Validate consumer requirements with sample stakeholder interviews
2. Design REST API endpoints matching use case queries
3. Create LinkML → EDM transformation for Europeana integration
4. Implement content negotiation for RDF/JSON-LD

---

**Document Status**: Complete
**Review Needed**: Stakeholder validation of use cases
**Version**: 1.0
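
As a possible starting point for action item 4, the sketch below shows one way a resolver might choose between Turtle and JSON-LD from an `Accept` header. It is illustrative only: the function names `parse_accept` and `negotiate`, the Turtle default, and the two-entry `SUPPORTED` list are assumptions rather than part of the planned API, and the matching is simpler than full HTTP semantics (it ranks by `q` only and ignores media-range specificity).

```python
from typing import List, Optional, Tuple

# Assumption: only the two media types used in this document's curl examples are served.
SUPPORTED = ("text/turtle", "application/ld+json")


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header: str, default: str = "text/turtle") -> Optional[str]:
    """Return the media type to serve, or None to signal 406 Not Acceptable."""
    if not header.strip():
        return default  # no preference expressed
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # explicitly refused by the client
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in SUPPORTED:
                if candidate.startswith(prefix):
                    return candidate
    return None


print(negotiate("application/ld+json;q=0.9, text/html"))
```

Returning `None` rather than silently falling back lets the calling handler decide whether to answer with 406 or with a default representation.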