# Heritage Institution RDF Exports
This directory contains **Linked Open Data** exports of heritage institution datasets in W3C-compliant RDF formats.
## Available Datasets
### Denmark 🇩🇰 - COMPLETE (November 2025)
**Dataset**: `denmark_complete.*`
**Status**: ✅ Production-ready
**Last Updated**: 2025-11-19
| Format | File | Size | Use Case |
|--------|------|------|----------|
| **Turtle** | `denmark_complete.ttl` | 2.27 MB | Human-readable, SPARQL queries |
| **RDF/XML** | `denmark_complete.rdf` | 3.96 MB | Machine processing, legacy systems |
| **JSON-LD** | `denmark_complete.jsonld` | 5.16 MB | Web APIs, JavaScript applications |
| **N-Triples** | `denmark_complete.nt` | 6.24 MB | Line-oriented processing, MapReduce |
#### Statistics
- **Institutions**: 2,348 (555 libraries, 594 archives, 1,199 branches)
- **RDF Triples**: 43,429
- **Ontologies Used**: 9 (CPOV, Schema.org, RICO, ORG, PROV-O, SKOS, Dublin Core, OWL, Heritage)
- **Wikidata Links**: 769 institutions (32.8%)
- **ISIL Codes**: 555 institutions (23.6%)
- **GHCID Identifiers**: 998 institutions (42.5%)
#### Coverage by Institution Type
| Type | Count | ISIL | GHCID | Wikidata |
|------|-------|------|-------|----------|
| **Main Libraries** | 555 | 100% | 78% | High |
| **Archives** | 594 | 0% (by design) | 95% | Moderate |
| **Library Branches** | 1,199 | Inherited | 0% (by design) | Low |
---
## Ontology Alignment
All RDF exports follow these international standards:
### Core Ontologies
1. **CPOV** (Core Public Organisation Vocabulary)
- Namespace: `http://data.europa.eu/m8g/`
- Usage: Public sector organization type
- Spec: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/core-public-organisation-vocabulary
2. **Schema.org**
- Namespace: `http://schema.org/`
- Usage: Names, addresses, descriptions, types
- Types: `schema:Library`, `schema:ArchiveOrganization`, `schema:Museum`
- Spec: https://schema.org/
3. **SKOS** (Simple Knowledge Organization System)
- Namespace: `http://www.w3.org/2004/02/skos/core#`
- Usage: Preferred/alternative labels
- Spec: https://www.w3.org/TR/skos-reference/
### Specialized Ontologies
4. **RICO** (Records in Contexts Ontology)
- Namespace: `https://www.ica.org/standards/RiC/ontology#`
- Usage: Archival description (for archives)
- Spec: https://www.ica.org/standards/RiC/ontology
5. **ORG** (W3C Organization Ontology)
- Namespace: `http://www.w3.org/ns/org#`
- Usage: Hierarchical relationships (library branches → main libraries)
- Spec: https://www.w3.org/TR/vocab-org/
6. **PROV-O** (Provenance Ontology)
- Namespace: `http://www.w3.org/ns/prov#`
- Usage: Data provenance tracking
- Spec: https://www.w3.org/TR/prov-o/
### Linking Ontologies
7. **OWL** (Web Ontology Language)
- Namespace: `http://www.w3.org/2002/07/owl#`
- Usage: Semantic equivalence (`owl:sameAs` for Wikidata links)
- Spec: https://www.w3.org/TR/owl2-primer/
8. **Dublin Core Terms**
- Namespace: `http://purl.org/dc/terms/`
- Usage: Identifiers, descriptions, metadata
- Spec: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
9. **Heritage (Project-Specific)**
- Namespace: `https://w3id.org/heritage/custodian/`
- Usage: GHCID identifiers, UUID properties
- Spec: See `/docs/PERSISTENT_IDENTIFIERS.md`
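Taken together, a single institution record draws on several of these vocabularies at once. The Turtle sketch below is illustrative only: the URI, names, identifier values, and Wikidata QID are hypothetical placeholders, and the prefixes abbreviate the namespaces listed above:

```turtle
@prefix cpov: <http://data.europa.eu/m8g/> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<https://w3id.org/heritage/custodian/dk/000000>
    a cpov:PublicOrganisation, schema:Library ;
    schema:name "Example Bibliotek" ;
    skos:prefLabel "Example Bibliotek"@da ;
    dcterms:identifier "DK-000000" ;
    org:subOrganizationOf <https://w3id.org/heritage/custodian/dk/000001> ;
    owl:sameAs <http://www.wikidata.org/entity/Q12345678> .
```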
---
## SPARQL Query Examples
### Query 1: Find all libraries in a specific city
```sparql
PREFIX schema: <http://schema.org/>
PREFIX cpov: <http://data.europa.eu/m8g/>
SELECT ?library ?name ?address WHERE {
  ?library a cpov:PublicOrganisation, schema:Library .
  ?library schema:name ?name .
  ?library schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}
```
### Query 2: Find all institutions with Wikidata links
```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>
SELECT ?institution ?name ?wikidataID WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
  BIND(STRAFTER(STR(?wikidataURI), "http://www.wikidata.org/entity/") AS ?wikidataID)
}
```
### Query 3: Find library hierarchies (parent-child branches)
```sparql
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX schema: <http://schema.org/>
SELECT ?parent ?parentName ?child ?childName WHERE {
  ?child org:subOrganizationOf ?parent .
  ?parent schema:name ?parentName .
  ?child schema:name ?childName .
}
LIMIT 100
```
### Query 4: Count institutions by type
```sparql
PREFIX schema: <http://schema.org/>
SELECT ?type (COUNT(?inst) AS ?count) WHERE {
  ?inst a ?type .
  FILTER(?type IN (schema:Library, schema:ArchiveOrganization, schema:Museum))
}
GROUP BY ?type
```
### Query 5: Find archives with specific ISIL codes
```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
SELECT ?archive ?name ?isil WHERE {
  ?archive a schema:ArchiveOrganization .
  ?archive schema:name ?name .
  ?archive dcterms:identifier ?isil .
  FILTER(STRSTARTS(?isil, "DK-"))
}
```
### Query 6: Get provenance for all institutions
```sparql
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
SELECT ?institution ?name ?source WHERE {
  ?institution schema:name ?name .
  ?institution prov:wasGeneratedBy ?activity .
  ?activity dcterms:source ?source .
}
LIMIT 100
```
---
## Usage Examples
### Loading RDF with Python (rdflib)
```python
from rdflib import Graph
# Load Turtle format
g = Graph()
g.parse("denmark_complete.ttl", format="turtle")
print(f"Loaded {len(g)} triples")
# Query with SPARQL
qres = g.query("""
    PREFIX schema: <http://schema.org/>
    SELECT ?name WHERE {
      ?inst a schema:Library .
      ?inst schema:name ?name .
    }
    LIMIT 10
""")
for row in qres:
    print(row.name)
```
### Loading RDF with Apache Jena (Java)
```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.query.*;
// Load RDF/XML format
Model model = ModelFactory.createDefaultModel();
model.read("denmark_complete.rdf");
// Query with SPARQL
String queryString = """
    PREFIX schema: <http://schema.org/>
    SELECT ?name WHERE {
      ?inst a schema:Library .
      ?inst schema:name ?name .
    }
    LIMIT 10
    """;
Query query = QueryFactory.create(queryString);
try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
    ResultSet results = qexec.execSelect();
    ResultSetFormatter.out(System.out, results, query);
}
```
### Loading JSON-LD with JavaScript
```javascript
const jsonld = require('jsonld');
const fs = require('fs');
// Load JSON-LD
const doc = JSON.parse(fs.readFileSync('denmark_complete.jsonld', 'utf8'));
// Expand to N-Quads
jsonld.toRDF(doc, {format: 'application/n-quads'}).then(nquads => {
  // Filter out the empty trailing line so the count matches the triple count
  const count = nquads.split('\n').filter(Boolean).length;
  console.log(`Loaded ${count} triples`);
});
```
---
## Setting Up a SPARQL Endpoint
### Option 1: Apache Jena Fuseki (Open Source)
```bash
# Download Jena Fuseki
wget https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.10.0.tar.gz
tar xzf apache-jena-fuseki-4.10.0.tar.gz
cd apache-jena-fuseki-4.10.0
# Start server
./fuseki-server --update --mem /denmark
# Load data
curl -X POST http://localhost:3030/denmark/data \
  --data-binary @denmark_complete.ttl \
  -H "Content-Type: text/turtle"
# Query endpoint
curl -X POST http://localhost:3030/denmark/query \
  --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10"
```
### Option 2: GraphDB (Free Edition)
1. Download GraphDB Free from https://www.ontotext.com/products/graphdb/download/
2. Install and start GraphDB
3. Create new repository "denmark"
4. Import `denmark_complete.ttl` via web UI
5. Query via SPARQL interface at http://localhost:7200/sparql
---
## W3ID Persistent Identifiers
All institutions have persistent URIs following the pattern:
```
https://w3id.org/heritage/custodian/dk/{isil-or-id}
```
**Examples**:
- Royal Library: `https://w3id.org/heritage/custodian/dk/190101`
- Copenhagen Libraries: `https://w3id.org/heritage/custodian/dk/710100`
- Danish National Archives: `https://w3id.org/heritage/custodian/dk/archive/rigsarkivet`
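URIs following this pattern can be built programmatically. The helper below is a sketch, not part of the project API: the function name and the special-casing of archives via an `/archive/` path segment are assumptions inferred from the examples above.

```python
BASE = "https://w3id.org/heritage/custodian"

def custodian_uri(country: str, local_id: str, archive: bool = False) -> str:
    """Build a persistent institution URI (hypothetical helper).

    country  -- ISO 3166-1 alpha-2 code, lowercased before use (e.g. "dk")
    local_id -- ISIL number, slug, or other local identifier
    archive  -- archives get an extra /archive/ path segment, mirroring
                the Danish National Archives example in this README
    """
    segment = f"archive/{local_id}" if archive else local_id
    return f"{BASE}/{country.lower()}/{segment}"

print(custodian_uri("DK", "710100"))
# https://w3id.org/heritage/custodian/dk/710100
print(custodian_uri("dk", "rigsarkivet", archive=True))
# https://w3id.org/heritage/custodian/dk/archive/rigsarkivet
```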
**Content Negotiation** (once the w3id.org registration is complete):
```bash
# Get HTML representation
curl https://w3id.org/heritage/custodian/dk/710100
# Get Turtle RDF
curl -H "Accept: text/turtle" https://w3id.org/heritage/custodian/dk/710100
# Get JSON-LD
curl -H "Accept: application/ld+json" https://w3id.org/heritage/custodian/dk/710100
```
---
## Data Quality & Provenance
All RDF exports include **complete provenance metadata** using PROV-O:
```turtle
prov:wasGeneratedBy [
    a prov:Activity ;
    dcterms:source "ISIL_REGISTRY" ;
    prov:startedAtTime "2025-11-19T10:00:00Z"^^xsd:dateTime ;
    prov:endedAtTime "2025-11-19T10:30:00Z"^^xsd:dateTime
] .
```
**Data Tier Classification** (see `AGENTS.md`):
- **TIER_1_AUTHORITATIVE**: Official registries (ISIL, national library databases)
- **TIER_2_VERIFIED**: Verified web scraping (Arkiv.dk)
- **TIER_3_CROWD_SOURCED**: Wikidata, OpenStreetMap
- **TIER_4_INFERRED**: NLP-extracted from conversations
**Denmark Dataset**:
- Main libraries (555): TIER_1 (ISIL registry)
- Archives (594): TIER_2 (Arkiv.dk verified scraping)
- Wikidata links (769): TIER_3 (crowd-sourced)
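When post-processing the exports, the provenance `dcterms:source` value can be mapped back to a tier. Only `"ISIL_REGISTRY"` appears in this README's provenance example; the other source strings below are assumptions and should be checked against the actual data:

```python
# Map dcterms:source strings to data tiers. "ISIL_REGISTRY" comes from the
# PROV-O example in this README; "ARKIV_DK" and "WIKIDATA" are assumed names.
SOURCE_TIERS = {
    "ISIL_REGISTRY": "TIER_1_AUTHORITATIVE",
    "ARKIV_DK": "TIER_2_VERIFIED",
    "WIKIDATA": "TIER_3_CROWD_SOURCED",
}

def tier_for(source: str) -> str:
    # Unknown sources fall back to the most conservative tier.
    return SOURCE_TIERS.get(source, "TIER_4_INFERRED")

print(tier_for("ISIL_REGISTRY"))  # TIER_1_AUTHORITATIVE
```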
---
## Validation
All RDF files have been validated using:
### Syntax Validation
```bash
# Turtle syntax check
rapper -i turtle -o ntriples denmark_complete.ttl > /dev/null
# RDF/XML syntax check
rapper -i rdfxml -o ntriples denmark_complete.rdf > /dev/null
# JSON-LD context validation
jsonld validate denmark_complete.jsonld
```
### Semantic Validation
- ✅ All URIs use the w3id.org namespace (will resolve once registration is complete)
- ✅ owl:sameAs links point to valid Wikidata entities
- ✅ Hierarchical relationships use standard ORG vocabulary
- ✅ ISIL codes link to isil.org registry
- ✅ GHCID identifiers follow project specification
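The `owl:sameAs` check above can be automated. The sketch below operates on plain `(subject, predicate, object)` string tuples for self-containment; against the real export you would parse the file with rdflib and iterate its triples instead. The example triples, including the QID, are placeholders:

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/Q"

def invalid_sameas_targets(triples):
    """Return owl:sameAs objects that are not Wikidata entity URIs."""
    return [
        o for s, p, o in triples
        if p == OWL_SAME_AS and not o.startswith(WIKIDATA_PREFIX)
    ]

# Tiny demonstration data (placeholder QID, not the real export).
triples = [
    ("https://w3id.org/heritage/custodian/dk/710100", OWL_SAME_AS,
     "http://www.wikidata.org/entity/Q12345678"),
    ("https://w3id.org/heritage/custodian/dk/bad", OWL_SAME_AS,
     "http://example.org/not-wikidata"),
]
print(invalid_sameas_targets(triples))  # ['http://example.org/not-wikidata']
```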
---
## Citation
If you use this dataset in research, please cite:
```bibtex
@dataset{danish_glam_rdf_2025,
author = {GLAM Extractor Project},
title = {Danish Heritage Institutions Linked Open Data},
year = {2025},
month = {November},
version = {1.0},
url = {https://github.com/yourusername/glam-extractor},
note = {2,348 institutions (555 libraries, 594 archives, 1,199 branches), 43,429 RDF triples}
}
```
---
## Related Documentation
- **Project README**: `/README.md`
- **LinkML Schema**: `/schemas/heritage_custodian.yaml`
- **Persistent Identifiers**: `/docs/PERSISTENT_IDENTIFIERS.md`
- **Ontology Extensions**: `/docs/ONTOLOGY_EXTENSIONS.md`
- **Denmark Session Summary**: `/SESSION_SUMMARY_20251119_RDF_WIKIDATA_COMPLETE.md`
---
## Contributing
To add new country datasets or improve existing RDF exports:
1. Follow ontology alignment guidelines in `/docs/ONTOLOGY_EXTENSIONS.md`
2. Use RDF exporter template: `/scripts/export_denmark_rdf.py`
3. Validate with SPARQL queries before publishing
4. Update this README with new dataset statistics
---
## License
This data is published under **CC0 1.0 Universal (Public Domain)**. You may use, modify, and distribute it freely without restrictions.
Individual institution data may be subject to different licenses from source registries. Consult:
- Danish ISIL Registry: https://slks.dk/isil
- Arkiv.dk: https://arkiv.dk
- Wikidata: CC0 (https://www.wikidata.org/wiki/Wikidata:Data_access#Licensing)
---
**Last Updated**: 2025-11-19
**Maintainer**: GLAM Extractor Project
**Contact**: [GitHub Issues](https://github.com/yourusername/glam-extractor/issues)