# Heritage Institution RDF Exports

This directory contains **Linked Open Data** exports of heritage institution datasets in W3C-compliant RDF formats.

## Available Datasets

### Denmark 🇩🇰 - COMPLETE (November 2025)

**Dataset**: `denmark_complete.*`
**Status**: ✅ Production-ready
**Last Updated**: 2025-11-19

| Format | File | Size | Use Case |
|--------|------|------|----------|
| **Turtle** | `denmark_complete.ttl` | 2.27 MB | Human-readable, SPARQL queries |
| **RDF/XML** | `denmark_complete.rdf` | 3.96 MB | Machine processing, legacy systems |
| **JSON-LD** | `denmark_complete.jsonld` | 5.16 MB | Web APIs, JavaScript applications |
| **N-Triples** | `denmark_complete.nt` | 6.24 MB | Line-oriented processing, MapReduce |

#### Statistics

- **Institutions**: 2,348 (555 libraries, 594 archives, 1,199 branches)
- **RDF Triples**: 43,429
- **Ontologies Used**: 9 (CPOV, Schema.org, RICO, ORG, PROV-O, SKOS, Dublin Core, OWL, Heritage)
- **Wikidata Links**: 769 institutions (32.8%)
- **ISIL Codes**: 555 institutions (23.6%)
- **GHCID Identifiers**: 998 institutions (42.5%)

#### Coverage by Institution Type

| Type | Count | ISIL | GHCID | Wikidata |
|------|-------|------|-------|----------|
| **Main Libraries** | 555 | 100% | 78% | High |
| **Archives** | 594 | 0% (by design) | 95% | Moderate |
| **Library Branches** | 1,199 | Inherited | 0% (by design) | Low |

---

## Ontology Alignment

All RDF exports follow these international standards:

### Core Ontologies

1. **CPOV** (Core Public Organisation Vocabulary)
   - Namespace: `http://data.europa.eu/m8g/`
   - Usage: Public sector organization typing
   - Spec: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/core-public-organisation-vocabulary

2. **Schema.org**
   - Namespace: `http://schema.org/`
   - Usage: Names, addresses, descriptions, types
   - Types: `schema:Library`, `schema:ArchiveOrganization`, `schema:Museum`
   - Spec: https://schema.org/
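   As a hedged illustration of how these first two vocabularies combine, a single institution record might look like this in Turtle (the URI is one of the W3ID examples later in this README; the label is illustrative):

   ```turtle
   @prefix schema: <http://schema.org/> .
   @prefix cpov:   <http://data.europa.eu/m8g/> .

   <https://w3id.org/heritage/custodian/dk/710100>
       a cpov:PublicOrganisation, schema:Library ;
       schema:name "Copenhagen Libraries"@en .
   ```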
3. **SKOS** (Simple Knowledge Organization System)
   - Namespace: `http://www.w3.org/2004/02/skos/core#`
   - Usage: Preferred/alternative labels
   - Spec: https://www.w3.org/TR/skos-reference/

### Specialized Ontologies

4. **RICO** (Records in Contexts Ontology)
   - Namespace: `https://www.ica.org/standards/RiC/ontology#`
   - Usage: Archival description (for archives)
   - Spec: https://www.ica.org/standards/RiC/ontology

5. **ORG** (W3C Organization Ontology)
   - Namespace: `http://www.w3.org/ns/org#`
   - Usage: Hierarchical relationships (library branches → main libraries)
   - Spec: https://www.w3.org/TR/vocab-org/

6. **PROV-O** (Provenance Ontology)
   - Namespace: `http://www.w3.org/ns/prov#`
   - Usage: Data provenance tracking
   - Spec: https://www.w3.org/TR/prov-o/

### Linking Ontologies

7. **OWL** (Web Ontology Language)
   - Namespace: `http://www.w3.org/2002/07/owl#`
   - Usage: Semantic equivalence (`owl:sameAs` for Wikidata links)
   - Spec: https://www.w3.org/TR/owl2-primer/

8. **Dublin Core Terms**
   - Namespace: `http://purl.org/dc/terms/`
   - Usage: Identifiers, descriptions, metadata
   - Spec: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/

9. **Heritage (Project-Specific)**
   - Namespace: `https://w3id.org/heritage/custodian/`
   - Usage: GHCID identifiers, UUID properties
   - Spec: See `/docs/PERSISTENT_IDENTIFIERS.md`

---

## SPARQL Query Examples

### Query 1: Find all libraries in a specific city

```sparql
PREFIX schema: <http://schema.org/>
PREFIX cpov: <http://data.europa.eu/m8g/>

SELECT ?library ?name ?address WHERE {
  ?library a cpov:PublicOrganisation, schema:Library .
  ?library schema:name ?name .
  ?library schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}
```

### Query 2: Find all institutions with Wikidata links

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?wikidataID WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
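  # The FILTER below keeps only owl:sameAs targets in the Wikidata
  # entity namespace; BIND then extracts the bare Q-identifier.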
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
  BIND(STRAFTER(STR(?wikidataURI), "http://www.wikidata.org/entity/") AS ?wikidataID)
}
```

### Query 3: Find library hierarchies (parent-child branches)

```sparql
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX schema: <http://schema.org/>

SELECT ?parent ?parentName ?child ?childName WHERE {
  ?child org:subOrganizationOf ?parent .
  ?parent schema:name ?parentName .
  ?child schema:name ?childName .
}
LIMIT 100
```

### Query 4: Count institutions by type

```sparql
PREFIX schema: <http://schema.org/>

SELECT ?type (COUNT(?inst) AS ?count) WHERE {
  ?inst a ?type .
  FILTER(?type IN (schema:Library, schema:ArchiveOrganization, schema:Museum))
}
GROUP BY ?type
```

### Query 5: Find institutions with Danish ISIL codes

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>

SELECT ?inst ?name ?isil WHERE {
  ?inst schema:name ?name .
  ?inst dcterms:identifier ?isil .
  FILTER(STRSTARTS(?isil, "DK-"))
}
```

Note that, per the coverage table above, archives deliberately carry no ISIL codes, so this query matches main libraries (and branches via inherited codes).

### Query 6: Get provenance for all institutions

```sparql
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?source WHERE {
  ?institution schema:name ?name .
  ?institution prov:wasGeneratedBy ?activity .
  ?activity dcterms:source ?source .
}
LIMIT 100
```

---

## Usage Examples

### Loading RDF with Python (rdflib)

```python
from rdflib import Graph

# Load Turtle format
g = Graph()
g.parse("denmark_complete.ttl", format="turtle")
print(f"Loaded {len(g)} triples")

# Query with SPARQL
qres = g.query("""
    PREFIX schema: <http://schema.org/>
    SELECT ?name WHERE {
        ?inst a schema:Library .
        ?inst schema:name ?name .
    }
    LIMIT 10
""")
for row in qres:
    print(row.name)
```

### Loading RDF with Apache Jena (Java)

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.query.*;

// Load RDF/XML format
Model model = ModelFactory.createDefaultModel();
model.read("denmark_complete.rdf");

// Query with SPARQL
String queryString = """
    PREFIX schema: <http://schema.org/>
    SELECT ?name WHERE {
        ?inst a schema:Library .
        ?inst schema:name ?name .
    }
    LIMIT 10
    """;

Query query = QueryFactory.create(queryString);
QueryExecution qexec = QueryExecutionFactory.create(query, model);
ResultSet results = qexec.execSelect();
ResultSetFormatter.out(System.out, results, query);
```

### Loading JSON-LD with JavaScript

```javascript
const jsonld = require('jsonld');
const fs = require('fs');

// Load JSON-LD
const doc = JSON.parse(fs.readFileSync('denmark_complete.jsonld', 'utf8'));

// Serialize to N-Quads (one triple per line)
jsonld.toRDF(doc, {format: 'application/n-quads'}).then(nquads => {
  console.log(`Loaded ${nquads.trim().split('\n').length} triples`);
});
```

---

## Setting Up a SPARQL Endpoint

### Option 1: Apache Jena Fuseki (Open Source)

```bash
# Download Jena Fuseki
wget https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.10.0.tar.gz
tar xzf apache-jena-fuseki-4.10.0.tar.gz
cd apache-jena-fuseki-4.10.0

# Start server with an in-memory, updatable dataset
./fuseki-server --update --mem /denmark

# Load data
curl -X POST http://localhost:3030/denmark/data \
  --data-binary @denmark_complete.ttl \
  -H "Content-Type: text/turtle"

# Query endpoint
curl -X POST http://localhost:3030/denmark/query \
  --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10"
```

### Option 2: GraphDB (Free Edition)

1. Download GraphDB Free from https://www.ontotext.com/products/graphdb/download/
2. Install and start GraphDB
3. Create a new repository named "denmark"
4. Import `denmark_complete.ttl` via the web UI
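   As a hedged alternative to the web UI in step 4, GraphDB also exposes the standard RDF4J REST API, so the file can be loaded from the command line (repository name "denmark" from step 3; port 7200 is the GraphDB default):

   ```bash
   # Load the Turtle file into the "denmark" repository
   curl -X POST http://localhost:7200/repositories/denmark/statements \
     --data-binary @denmark_complete.ttl \
     -H "Content-Type: text/turtle"
   ```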
5. Query via the SPARQL interface at http://localhost:7200/sparql

---

## W3ID Persistent Identifiers

All institutions have persistent URIs following the pattern:

```
https://w3id.org/heritage/custodian/dk/{isil-or-id}
```

**Examples**:

- Royal Library: `https://w3id.org/heritage/custodian/dk/190101`
- Copenhagen Libraries: `https://w3id.org/heritage/custodian/dk/710100`
- Danish National Archives: `https://w3id.org/heritage/custodian/dk/archive/rigsarkivet`

**Content Negotiation** (when w3id.org registration complete):

```bash
# Get HTML representation
curl https://w3id.org/heritage/custodian/dk/710100

# Get Turtle RDF
curl -H "Accept: text/turtle" https://w3id.org/heritage/custodian/dk/710100

# Get JSON-LD
curl -H "Accept: application/ld+json" https://w3id.org/heritage/custodian/dk/710100
```

---

## Data Quality & Provenance

All RDF exports include **complete provenance metadata** using PROV-O:

```turtle
prov:wasGeneratedBy [
    a prov:Activity ;
    dcterms:source "ISIL_REGISTRY" ;
    prov:startedAtTime "2025-11-19T10:00:00Z"^^xsd:dateTime ;
    prov:endedAtTime "2025-11-19T10:30:00Z"^^xsd:dateTime
] .
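# (PROV-O's prov:wasAssociatedWith could additionally link this activity
#  to the software agent that performed the extraction.)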
```

**Data Tier Classification** (see `AGENTS.md`):

- **TIER_1_AUTHORITATIVE**: Official registries (ISIL, national library databases)
- **TIER_2_VERIFIED**: Verified web scraping (Arkiv.dk)
- **TIER_3_CROWD_SOURCED**: Wikidata, OpenStreetMap
- **TIER_4_INFERRED**: NLP-extracted from conversations

**Denmark Dataset**:

- Main libraries (555): TIER_1 (ISIL registry)
- Archives (594): TIER_2 (Arkiv.dk verified scraping)
- Wikidata links (769): TIER_3 (crowd-sourced)

---

## Validation

All RDF files have been validated using:

### Syntax Validation

```bash
# Turtle syntax check
rapper -i turtle -o ntriples denmark_complete.ttl > /dev/null

# RDF/XML syntax check
rapper -i rdfxml -o ntriples denmark_complete.rdf > /dev/null

# JSON-LD context validation
jsonld validate denmark_complete.jsonld
```

### Semantic Validation

- ✅ All URIs resolve to w3id.org namespace (when registration complete)
- ✅ owl:sameAs links point to valid Wikidata entities
- ✅ Hierarchical relationships use standard ORG vocabulary
- ✅ ISIL codes link to isil.org registry
- ✅ GHCID identifiers follow project specification

---

## Citation

If you use this dataset in research, please cite:

```bibtex
@dataset{danish_glam_rdf_2025,
  author  = {GLAM Extractor Project},
  title   = {Danish Heritage Institutions Linked Open Data},
  year    = {2025},
  month   = {November},
  version = {1.0},
  url     = {https://github.com/yourusername/glam-extractor},
  note    = {2,348 institutions (555 libraries, 594 archives, 1,199 branches), 43,429 RDF triples}
}
```

---

## Related Documentation

- **Project README**: `/README.md`
- **LinkML Schema**: `/schemas/heritage_custodian.yaml`
- **Persistent Identifiers**: `/docs/PERSISTENT_IDENTIFIERS.md`
- **Ontology Extensions**: `/docs/ONTOLOGY_EXTENSIONS.md`
- **Denmark Session Summary**: `/SESSION_SUMMARY_20251119_RDF_WIKIDATA_COMPLETE.md`

---

## Contributing

To add new country datasets or improve existing RDF exports:

1. Follow ontology alignment guidelines in `/docs/ONTOLOGY_EXTENSIONS.md`
2. Use the RDF exporter template: `/scripts/export_denmark_rdf.py`
3. Validate with SPARQL queries before publishing
4. Update this README with new dataset statistics

---

## License

This data is published under **CC0 1.0 Universal (Public Domain)**. You may use, modify, and distribute it freely without restriction.

Individual institution data may be subject to different licenses from the source registries. Consult:

- Danish ISIL Registry: https://slks.dk/isil
- Arkiv.dk: https://arkiv.dk
- Wikidata: CC0 (https://www.wikidata.org/wiki/Wikidata:Data_access#Licensing)

---

**Last Updated**: 2025-11-19
**Maintainer**: GLAM Extractor Project
**Contact**: [GitHub Issues](https://github.com/yourusername/glam-extractor/issues)