# Publications Dataset

This directory contains bibliographic metadata for academic publications in LinkML format, demonstrating the project's bibliographic schema (`schemas/bibliographic.yaml`).

## Overview

**Purpose**: Store structured metadata about academic publications, including journal articles, conference papers, books, and their citation relationships.

**Schema**: `/schemas/bibliographic.yaml` (based on the FaBiO, CiTO, BIBO, and FRBR ontologies)

**Current Dataset Size**:
- **20 publications** (6 journal articles, 3 conference papers, 1 data paper, 2 books, 2 book chapters, 2 technical reports, 4 preprints)
- **27 citation relationships** (cross-references between publications)
- **60+ unique authors** with institutional affiliations (universities, heritage institutions)
- **7 journals** (referenced from `/data/instances/journals/`)
- **5 conferences** (referenced from `/data/instances/conferences/`)
- **5 heritage institutions** linked as author affiliations

**Publication Type Distribution**:

| Publication Type | Count | Examples |
|------------------|-------|----------|
| Journal Articles | 6 | 3 semantic web (Knowledge Graphs, Wikidata, LOKG) + 3 heritage-linked (Rembrandt analysis, NHA digital transformation, collection management systems) |
| Conference Papers | 3 | ISWC 2024 (Best Paper), ISWC 2023 (Best Paper), Dutch GLAM Consortium |
| Books | 2 | Linked Data for Museums, Digital Preservation Handbook |
| Book Chapters | 2 | Crowdsourcing Metadata, Archival Appraisal |
| Technical Reports | 2 | KB 3D Digitization, Europeana QA Framework |
| Preprints | 4 | arXiv (GNN provenance), OSF/SocArXiv (LLM cataloging), bioRxiv (Ancient DNA), arXiv (2nd paper TBD) |
| Data Papers | 1 | Brazilian LOKG Subset (TGDK journal) |
| **TOTAL** | **20** | Diverse representation of scholarly output types |

**Citation Network Statistics**:
- **Total citations**: 27 relationships
- **Publications with citations**: 19 (the bioRxiv preprint is unlinked because it falls outside the dataset's scope)
- **Citation density**: 1.42 citations per linked publication (27 / 19)
- **Most cited works**:
  1. Knowledge Graphs (2021) - 8 citations
  2. Wikidata (2018) - 6 citations
  3. LOKG (2024) - 5 citations
- **Citation types used**: 6 distinct CiTO types (CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, USES_DATA_FROM, CITES_AS_METADATA)

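
The citation density quoted in the statistics is the number of citation records divided by the number of publications that appear in at least one of them. A minimal sketch of that arithmetic, assuming citation records are dicts with the `citing_work` and `cited_work` keys described under "Citation Class"; the helper name and the sample IDs are illustrative and not part of the project:

```python
def citation_density(citations):
    """Citations per linked publication (hypothetical helper)."""
    linked = set()
    for c in citations:
        linked.add(c["citing_work"])
        linked.add(c["cited_work"])
    return len(citations) / len(linked) if linked else 0.0


# Illustrative records only; the real data lives in citation_relationships.yaml
sample = [
    {"citing_work": "pub/a", "cited_work": "pub/b"},
    {"citing_work": "pub/c", "cited_work": "pub/b"},
    {"citing_work": "pub/c", "cited_work": "pub/a"},
]
print(citation_density(sample))  # 3 citations over 3 linked publications
print(round(27 / 19, 2))         # the figures reported for this dataset
```

Counting only linked publications is why the denominator is 19 rather than 20.
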
## Files

### 1. `semantic_web_papers.yaml` (379 lines) ✅
**Notable semantic web publications demonstrating schema patterns**

**Publications included**:

| Title | Type | Journal/Conf | Year | Authors | DOI |
|-------|------|-------------|------|---------|-----|
| Knowledge Graphs | Journal Article | Semantic Web Journal | 2021 | 18 authors | 10.3233/SW-222793 |
| Wikidata: A Free Collaborative Knowledgebase | Journal Article | Journal of Web Semantics | 2018 | 2 authors | 10.1016/j.websem.2018.08.002 |
| The LOKG | Journal Article | TGDK | 2024 | 4 authors (synthetic) | 10.4230/TGDK.2.1.3 |
| Relationships are Complicated! | Conference Paper | ISWC 2024 (Best Paper) | 2024 | 3 authors | - |
| Spatial Link Prediction | Conference Paper | ISWC 2023 (Best Paper) | 2023 | 4 authors | 10.1007/978-3-031-47240-4_9 |

**Schema patterns demonstrated**:
- Multi-author publications (up to 18 authors)
- ORCID identifiers for authors
- Institutional affiliations (universities, research institutes, corporations)
- DOI identifiers
- Journal article metadata (volume, issue, page range)
- Conference paper metadata (proceedings, best paper awards)
- Open access status tracking
- Abstract text

### 2. `citation_relationships.yaml` (174 lines) ✅
**Citation relationships between publications using CiTO (Citation Typing Ontology)**

**Citation patterns included**:
- **27 citation relationships** linking 19 publications (5 semantic web + 5 heritage-linked + 9 diverse publications)
- **Citation types**: CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, CITES_AS_METADATA, USES_DATA_FROM
- **Citation context**: Textual excerpts showing how works cite each other
- **Citation intent**: Purpose and reasoning for citations
- **Page numbers**: Specific location of citations in citing work
- **Citation density**: 1.42 citations per publication (27 citations / 19 linked publications)

**Citation network**:
```
Semantic Web Publications:
  Knowledge Graphs (2021) ──cites──> Wikidata (2018)
                          └─self-cites (section reference)
                          [Most cited: 8 citations total]

  LOKG (2024) ──cites──> Knowledge Graphs (2021)
              ──cites──> Wikidata (2018)
              └─cites──> Spatial Link Prediction (2023)
              [Third most cited: 5 citations total]

  ISWC 2024 Paper ──cites──> Knowledge Graphs (2021)
                  └─extends──> ISWC 2023 Paper

Heritage-Linked Publications:
  Brazilian LOKG Subset (2024) ──extends──> LOKG (2024)

  Dutch GLAM Consortium (2023) ──cites──> Knowledge Graphs (2021)
                               └─cites──> Wikidata (2018)

  Rembrandt Analysis (2024) ──uses_data──> Wikidata (2018)

  NHA Digital Transformation (2023) ──discusses──> LOKG (2024)

  Collection Management Systems (2024) ──cites──> Wikidata (2018)
                                       └─cites──> Knowledge Graphs (2021)

Diverse Publications (Books, Reports, Chapters, Preprints):
  Linked Data for Museums (Book) ──cites──> Knowledge Graphs (2021)
                                 └─cites──> Wikidata (2018)

  Digital Preservation Handbook ──cites──> LOKG (2024)

  KB 3D Digitization Report ──discusses──> LOKG (2024)

  Europeana QA Framework ──cites──> Knowledge Graphs (2021)
                         └─cites──> LOKG (2024)

  Crowdsourcing Metadata Chapter ──cites──> Wikidata (2018)

  Archival Appraisal Chapter ──discusses──> Knowledge Graphs (2021)

  arXiv GNN Provenance ──cites──> Knowledge Graphs (2021)
                       └─uses_data──> Wikidata (2018)

  OSF LLM Cataloging ──discusses──> Knowledge Graphs (2021)

  bioRxiv Ancient DNA (unlinked - genomics focus, not heritage knowledge graphs)
```

**Most Cited Publications**:
1. Knowledge Graphs (2021) - 8 citations
2. Wikidata (2018) - 6 citations
3. LOKG (2024) - 5 citations

### 3. `heritage_linked_publications.yaml` (206 lines) ✅
**Publications with authors affiliated at heritage institutions**

**Demonstrates heritage-bibliographic integration patterns**:

| Title | Type | Authors | Heritage Institution | Year |
|-------|------|---------|---------------------|------|
| Digital Analysis of Rembrandt's Brushwork | Journal | Rijksmuseum researcher + UvA | Rijksmuseum | 2024 |
| Democratizing Access: NHA Digital Transformation | Journal | 2 Noord-Hollands Archief archivists | Noord-Hollands Archief | 2023 |
| Brazilian Cultural Heritage in the LOKG | Data Paper | USP researcher + BNB librarian | Biblioteca Nacional do Brasil | 2024 |
| The Dutch GLAM Consortium | Conference | KB director + Rijksmuseum curator + NA archivist | KB, Rijksmuseum, Nationaal Archief | 2023 |
| Comparative Analysis of Collection Management Systems | Journal | Paris-Sorbonne + KB librarian | Koninklijke Bibliotheek | 2024 |

**Integration patterns**:
- **Pattern 1**: Researcher at heritage institution as primary author
- **Pattern 2**: Multiple staff from same heritage institution as co-authors
- **Pattern 3**: Heritage institution staff collaborating with university researcher
- **Pattern 4**: Multi-institutional consortium (3+ heritage institutions)
- **Pattern 5**: International collaboration (foreign researcher + local heritage institution)

## Schema Reference

### Publication Class

**Required fields**:
```yaml
publication_id: https://w3id.org/heritage/publication/[unique-id]
title: "Publication Title"  # NOT publication_title!
publication_type: JOURNAL_ARTICLE  # Enum: JOURNAL_ARTICLE, CONFERENCE_PAPER, BOOK, etc.
```

**Key fields**:
```yaml
authors:  # List of Person objects
  - person_id: https://orcid.org/0000-0002-XXXX-XXXX  # ORCID preferred
    person_name: "Author Name"
    orcid: "0000-0002-XXXX-XXXX"  # Separate field from person_id
    affiliation:  # SINGULAR Organization object (NOT affiliations array!)
      organization_name: "University Name"
      organization_type: "University"

published_in: https://w3id.org/heritage/journal/[journal-id]  # String ID reference, NOT nested object!

volume: "12"
issue: "3"
page_range: "1-94"  # NOT 'pages'!

doi: "10.1234/example.doi"  # Separate field (NOT in identifiers array)
url: "https://..."  # Separate field

abstract: "Full abstract text..."

provenance:  # NO 'notes' field! Use 'description' in parent object instead
  data_source: CONVERSATION_NLP
  data_tier: TIER_2_VERIFIED
  extraction_date: "2025-11-09T21:00:00Z"
```

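
Because `person_id` and `orcid` are separate fields, they can drift apart when records are edited by hand. The sketch below flags authors whose ORCID-based `person_id` disagrees with their `orcid` value. It assumes the intended convention is that the two agree, as they do in the examples in this README; the helper name is hypothetical.

```python
ORCID_PREFIX = "https://orcid.org/"


def orcid_mismatch(author):
    """True when an ORCID-based person_id disagrees with the orcid field."""
    person_id = author.get("person_id", "")
    if not person_id.startswith(ORCID_PREFIX):
        return False  # local IDs such as "librarian-kb-002" are not checked
    return author.get("orcid") != person_id[len(ORCID_PREFIX):]


consistent = {
    "person_id": "https://orcid.org/0000-0002-8888-9999",
    "orcid": "0000-0002-8888-9999",
}
inconsistent = {
    "person_id": "https://orcid.org/0000-0002-8888-9999",
    "orcid": "0000-0003-7777-8888",
}
print(orcid_mismatch(consistent), orcid_mismatch(inconsistent))
```
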
### Citation Class

**Required fields**:
```yaml
citation_id: https://w3id.org/heritage/citation/[unique-id]
citing_work: https://w3id.org/heritage/publication/[citing-pub-id]  # Required
cited_work: https://w3id.org/heritage/publication/[cited-pub-id]  # Required
citation_type: CITES_AS_AUTHORITY  # Required enum
```

**Optional enrichment fields**:
```yaml
citation_intent: "Purpose/reasoning for this citation..."
citation_context: "Textual excerpt showing the citation..."
page_number: "23"  # Page where citation appears
```

### Citation Types (CiTO Ontology)

| Type | Description | Example Use |
|------|-------------|-------------|
| `CITES` | Generic citation | Standard reference |
| `CITES_AS_AUTHORITY` | Cites as authoritative source | Citing foundational theory |
| `CITES_AS_EVIDENCE` | Cites as evidence | Supporting empirical claims |
| `CITES_AS_METADATA` | Cites for metadata/provenance | Dataset documentation |
| `DISCUSSES` | Discusses the cited work | Critical analysis |
| `EXTENDS` | Extends the cited work | Building on prior work |
| `USES_DATA_FROM` | Uses data from the cited work | Reusing a published dataset |
| `SUPPORTS` | Provides support for claims | Corroborating findings |
| `REFUTES` | Refutes or disputes | Contradicting claims |
| `CRITIQUES` | Critiques cited work | Identifying limitations |
| `AGREES_WITH` | Agrees with cited work | Confirming findings |

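
When citation records are generated by script rather than typed by hand, it helps to reject an unknown `citation_type` before the YAML is written. The following is a minimal sketch, not the project's tooling: `CitationRecord` is a hypothetical helper, and the set of allowed values is copied from this README (the table above plus `USES_DATA_FROM`, which the dataset statistics list as in use). The schema file remains the authority on the enum.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Values copied from this README; check schemas/bibliographic.yaml for the real enum
CITATION_TYPES = {
    "CITES", "CITES_AS_AUTHORITY", "CITES_AS_EVIDENCE", "CITES_AS_METADATA",
    "DISCUSSES", "EXTENDS", "USES_DATA_FROM", "SUPPORTS", "REFUTES",
    "CRITIQUES", "AGREES_WITH",
}


@dataclass
class CitationRecord:  # hypothetical helper, not generated from the schema
    citation_id: str
    citing_work: str
    cited_work: str
    citation_type: str
    citation_intent: Optional[str] = None
    citation_context: Optional[str] = None
    page_number: Optional[str] = None

    def __post_init__(self):
        if self.citation_type not in CITATION_TYPES:
            raise ValueError(f"Unknown citation_type: {self.citation_type}")

    def to_dict(self):
        """Drop unset optional fields so the serialized record stays compact."""
        return {k: v for k, v in asdict(self).items() if v is not None}


record = CitationRecord(
    citation_id="https://w3id.org/heritage/citation/example-001",
    citing_work="https://w3id.org/heritage/publication/example-citing",
    cited_work="https://w3id.org/heritage/publication/example-cited",
    citation_type="EXTENDS",
    page_number="15",
)
print(record.to_dict())
```

The resulting dict can be passed to a YAML serializer; the three example URIs are placeholders.
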
## Schema Quirks and Common Errors

### ❌ Common Mistakes

**1. Wrong field names**:
```yaml
# WRONG
publication_title: "Title"  # Field doesn't exist!
pages: "1-94"  # Should be 'page_range'
affiliations: [...]  # Should be singular 'affiliation'

# CORRECT
title: "Title"
page_range: "1-94"
affiliation: {...}
```

**2. Wrong `published_in` structure**:
```yaml
# WRONG - Nested object
published_in:
  journal_id: https://...
  journal_title: "Journal Name"
  volume: "12"

# CORRECT - String ID reference
published_in: https://w3id.org/heritage/journal/semantic-web
volume: "12"  # Volume at Publication level, not nested
```

**3. Wrong identifier handling**:
```yaml
# WRONG - DOI in identifiers array
identifiers:
  - identifier_scheme: DOI
    identifier_value: "10.1234/..."

# CORRECT - DOI as separate field
doi: "10.1234/..."
```

**4. Provenance notes**:
```yaml
# WRONG - Provenance has no 'notes' field
provenance:
  data_source: CONVERSATION_NLP
  notes: "Some observation"  # This will fail validation!

# CORRECT - Use 'description' at Publication level
description: "Notes and remarks about this publication"
provenance:
  data_source: CONVERSATION_NLP
```

### ✅ Schema Validation Checklist

Before committing new publications:

- [ ] `title` field (NOT `publication_title`)
- [ ] `published_in` is a string ID (NOT nested object)
- [ ] `affiliation` is singular object (NOT `affiliations` array)
- [ ] `page_range` (NOT `pages`)
- [ ] `doi` and `url` are separate fields (NOT in `identifiers`)
- [ ] `provenance` has no `notes` field
- [ ] All `publication_id`, `person_id`, `journal_id` use valid URIs
- [ ] `publication_type` is valid enum value
- [ ] Authors have either ORCID or local ID
- [ ] File validates with: `linkml-validate -s schemas/bibliographic.yaml -C Publication <file.yaml>`

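
Several checklist items are purely structural, so they can be caught with a quick pre-check before running the validator. A minimal sketch, assuming each publication has been loaded as a plain dict; `find_schema_mistakes` is a hypothetical helper that covers only the field-name mistakes documented above and does not replace `linkml-validate`:

```python
def find_schema_mistakes(pub):
    """Return messages for the common mistakes documented in this README."""
    problems = []
    renamed = {"publication_title": "title", "pages": "page_range"}
    for wrong, right in renamed.items():
        if wrong in pub:
            problems.append(f"use '{right}' instead of '{wrong}'")
    if not isinstance(pub.get("published_in", ""), str):
        problems.append("'published_in' must be a string ID, not a nested object")
    for ident in pub.get("identifiers", []):
        if ident.get("identifier_scheme") == "DOI":
            problems.append("put the DOI in the separate 'doi' field")
    if "notes" in (pub.get("provenance") or {}):
        problems.append("'provenance' has no 'notes' field; use 'description'")
    for author in pub.get("authors", []):
        if "affiliations" in author:
            problems.append("use singular 'affiliation' on each author")
    return problems


bad = {
    "publication_title": "Title",
    "pages": "1-94",
    "published_in": {"journal_id": "https://..."},
    "provenance": {"data_source": "CONVERSATION_NLP", "notes": "x"},
    "authors": [{"person_name": "A", "affiliations": []}],
}
for message in find_schema_mistakes(bad):
    print("-", message)
```

An empty list means none of these specific mistakes were found; the record may still fail schema validation for other reasons.
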
## Validation Commands

### Validate Publications
```bash
# Run from the repository root (adjust the path to your checkout)
cd /Users/kempersc/apps/glam
linkml-validate -s schemas/bibliographic.yaml -C Publication \
  data/instances/publications/semantic_web_papers.yaml
```

### Validate Citations
```bash
linkml-validate -s schemas/bibliographic.yaml -C Citation \
  data/instances/publications/citation_relationships.yaml
```

### Validate Journals
```bash
linkml-validate -s schemas/bibliographic.yaml -C Journal \
  data/instances/journals/semantic_web_journals.yaml
```

### Validate Conferences
```bash
linkml-validate -s schemas/bibliographic.yaml -C Conference \
  data/instances/conferences/semantic_web_conferences.yaml
```

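
The four commands differ only in the target class and the data file, so they can be driven from one list. The sketch below builds the same argument lists as the commands above and runs them in sequence. The function names are hypothetical, and treating a non-zero exit status as failure is an assumption about `linkml-validate` rather than something this README states.

```python
import subprocess

SCHEMA = "schemas/bibliographic.yaml"

# (target class, data file) pairs taken from the commands above
TARGETS = [
    ("Publication", "data/instances/publications/semantic_web_papers.yaml"),
    ("Citation", "data/instances/publications/citation_relationships.yaml"),
    ("Journal", "data/instances/journals/semantic_web_journals.yaml"),
    ("Conference", "data/instances/conferences/semantic_web_conferences.yaml"),
]


def build_command(target_class, data_file, schema=SCHEMA):
    """Return the argv list for one linkml-validate call."""
    return ["linkml-validate", "-s", schema, "-C", target_class, data_file]


def validate_all(targets=TARGETS):
    """Run every check; return the (class, file) pairs that exited non-zero."""
    failed = []
    for target_class, data_file in targets:
        result = subprocess.run(build_command(target_class, data_file))
        if result.returncode != 0:  # assumed to signal a validation failure
            failed.append((target_class, data_file))
    return failed
```

Calling `validate_all()` from the repository root runs all four checks and returns an empty list when each one exits cleanly.
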
## Adding New Publications

### Step 1: Gather Metadata

**Required information**:
- Title, authors, publication date
- Publication type (journal article, conference paper, etc.)
- Journal or conference (must reference existing entity in `journals/` or `conferences/`)
- DOI (if available)

**Recommended information**:
- Author ORCID identifiers
- Author institutional affiliations
- Abstract text
- Volume, issue, page numbers
- URL to full text
- Open access status

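
The requirement that a journal or conference must already exist is easy to break, because `published_in` is only a string ID. A minimal sketch of a referential check, assuming publications are plain dicts and that the set of known venue IDs has been collected from the files under `journals/` and `conferences/` (how those IDs are collected is left out here); the function name and sample IDs are illustrative:

```python
def find_dangling_venues(publications, known_venue_ids):
    """Return (publication_id, published_in) pairs whose venue is unknown."""
    dangling = []
    for pub in publications:
        venue = pub.get("published_in")
        if venue is not None and venue not in known_venue_ids:
            dangling.append((pub.get("publication_id"), venue))
    return dangling


# Illustrative IDs only
known = {"https://w3id.org/heritage/journal/semantic-web"}
pubs = [
    {"publication_id": "pub-1",
     "published_in": "https://w3id.org/heritage/journal/semantic-web"},
    {"publication_id": "pub-2",
     "published_in": "https://w3id.org/heritage/journal/not-yet-added"},
    {"publication_id": "pub-3"},  # no venue, e.g. a book
]
print(find_dangling_venues(pubs, known))
```

Records without a `published_in` value are skipped rather than reported, since books and reports may not have a venue.
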
### Step 2: Create Publication Record

Follow the schema patterns in `semantic_web_papers.yaml`:

```yaml
- publication_id: https://w3id.org/heritage/publication/[unique-id]
  title: "Your Publication Title"
  publication_type: JOURNAL_ARTICLE  # or CONFERENCE_PAPER, BOOK, etc.
  publication_date: "2024-11-09"

  authors:
    - person_id: https://orcid.org/0000-0002-XXXX-XXXX
      person_name: "First Author"
      orcid: "0000-0002-XXXX-XXXX"
      affiliation:
        organization_name: "University Name"
        organization_type: "University"

  published_in: https://w3id.org/heritage/journal/[journal-id]

  volume: "15"
  issue: "2"
  page_range: "123-145"

  doi: "10.1234/example.doi"
  url: "https://..."

  abstract: "Full abstract text..."

  provenance:
    data_source: MANUAL_CURATION  # or CONVERSATION_NLP, WEB_SCRAPING, etc.
    data_tier: TIER_2_VERIFIED
    extraction_date: "2024-11-09T12:00:00Z"
    extraction_method: "Manual entry from published source"
```

### Step 3: Create Citation Relationships (Optional)

If the new publication cites existing publications (or vice versa):

```yaml
- citation_id: https://w3id.org/heritage/citation/[unique-id]
  citing_work: https://w3id.org/heritage/publication/[new-pub-id]
  cited_work: https://w3id.org/heritage/publication/[existing-pub-id]
  citation_type: CITES_AS_AUTHORITY  # Choose appropriate type
  citation_intent: "Why this citation exists..."
  citation_context: "Textual excerpt around the citation..."
  page_number: "15"
```

### Step 4: Validate

Run validation before committing:

```bash
linkml-validate -s schemas/bibliographic.yaml -C Publication \
  data/instances/publications/your_file.yaml
```

Fix any validation errors (see the "Schema Quirks and Common Errors" section above).

### Step 5: Update This README

Add your publication to the matching table in the "Files" section.

## Integration with Heritage Custodians

Publications link to heritage institutions through **5 integration patterns**, all demonstrated in `heritage_linked_publications.yaml`:

### Pattern 1: Heritage Institution Researcher as Primary Author ✅

**Use case**: Museum curator or archivist publishes research based on institutional collections

**Example**: Rijksmuseum researcher analyzing Rembrandt paintings

```yaml
authors:
  - person_id: researcher-rijks-001
    person_name: "Dr. Maria van der Berg"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum  # ← Heritage institution!
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
```

**Real example**: `rijksmuseum-rembrandt-2024` (Rijksmuseum:125-126)

---

### Pattern 2: Multiple Staff from Same Heritage Institution ✅

**Use case**: Collaborative research by colleagues at the same archive or museum

**Example**: Two archivists from Noord-Hollands Archief co-authoring digital transformation paper

```yaml
authors:
  - person_id: archivist-nha-001
    person_name: "Dr. Saskia de Jong"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
  - person_id: specialist-nha-001
    person_name: "Peter Bakker"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief  # ← Same institution
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
```

**Real example**: `noord-hollands-archief-digital-2023` (Noord-Hollands Archief:49-60)

---

### Pattern 3: Heritage + Academic Collaboration ✅

**Use case**: University researcher collaborates with heritage institution expert

**Example**: USP researcher + Biblioteca Nacional do Brasil librarian creating Linked Open Data resource

```yaml
authors:
  - person_id: https://orcid.org/0000-0002-8888-9999
    person_name: "Dr. Carlos Silva"
    orcid: "0000-0002-8888-9999"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/university-of-sao-paulo
      organization_name: "University of São Paulo"
      organization_type: "University"
  - person_id: librarian-bnb-001
    person_name: "Ana Santos"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil  # ← Heritage institution
      organization_name: "Biblioteca Nacional do Brasil"
      organization_type: "Library"
```

**Real example**: `lokg-brazilian-subset-2024` (Biblioteca Nacional do Brasil:92-103)

---

### Pattern 4: Multi-Institutional Consortium (3+ Heritage Institutions) ✅

**Use case**: Regional or national collaboration between multiple heritage institutions

**Example**: Dutch GLAM Consortium with KB + Rijksmuseum + Nationaal Archief

```yaml
authors:
  - person_id: director-kb-001
    person_name: "Dr. Liesbeth van der Pol"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
  - person_id: curator-rijksmuseum-002
    person_name: "Dr. Thomas de Vries"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum  # ← Second institution
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
  - person_id: archivist-na-001
    person_name: "Dr. Emma Jansen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/nationaal-archief  # ← Third institution
      organization_name: "Nationaal Archief"
      organization_type: "Archive"
```

**Real example**: `dutch-glam-consortium-2023` (KB:132-137, Rijksmuseum:138-142, Nationaal Archief:144-148)

---

### Pattern 5: International Researcher + Local Heritage Institution ✅

**Use case**: Foreign scholar collaborates with local museum/archive/library

**Example**: French scholar + Dutch KB librarian studying European collection management systems

```yaml
authors:
  - person_id: https://orcid.org/0000-0003-7777-8888
    person_name: "Dr. Sophie Laurent"
    orcid: "0000-0003-7777-8888"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/universite-paris-sorbonne
      organization_name: "Université Paris-Sorbonne"
      organization_type: "University"
  - person_id: librarian-kb-002
    person_name: "Martijn Koster"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library  # ← Dutch heritage institution
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
```

**Real example**: `collection-management-systems-2024` (Koninklijke Bibliotheek:181-184)

---

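
In the five patterns above, heritage institutions are identified by an `organization_id` under `https://w3id.org/heritage/custodian/`, while universities sit under `.../organization/`. The sketch below uses that prefix to summarize an author list. Treat the prefix test as a heuristic inferred from these examples rather than a schema rule, and `affiliation_summary` as a hypothetical helper.

```python
CUSTODIAN_PREFIX = "https://w3id.org/heritage/custodian/"  # inferred from the examples


def affiliation_summary(authors):
    """Count distinct heritage institutions and note any other affiliation."""
    custodians = set()
    has_other = False
    for author in authors:
        org_id = (author.get("affiliation") or {}).get("organization_id", "")
        if org_id.startswith(CUSTODIAN_PREFIX):
            custodians.add(org_id)
        else:
            has_other = True
    return {"heritage_institutions": len(custodians),
            "has_other_affiliation": has_other}


# Author list shaped like the Pattern 4 example (names omitted for brevity)
consortium = [
    {"affiliation": {"organization_id": CUSTODIAN_PREFIX + "nl/kb-national-library"}},
    {"affiliation": {"organization_id": CUSTODIAN_PREFIX + "nl/rijksmuseum"}},
    {"affiliation": {"organization_id": CUSTODIAN_PREFIX + "nl/nationaal-archief"}},
]
print(affiliation_summary(consortium))
```

Three or more heritage institutions with no other affiliation corresponds to Pattern 4; one heritage institution plus another affiliation corresponds to Patterns 3 and 5, which this heuristic alone cannot tell apart.
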
### 4. `diverse_heritage_publications.yaml` (10 publications) ✅
**Diverse publication types: books, book chapters, technical reports, preprints**

**Publications included**:

| Title | Type | Authors | Year | Key Features |
|-------|------|---------|------|--------------|
| Linked Data for Museums | Book | Getty Trust researcher | 2020 | Practical GLAM linked data guide |
| Digital Preservation Handbook | Book | DPC staff (3 co-authors) | 2021 | Multi-author handbook from heritage org |
| Crowdsourcing Metadata for Libraries | Book Chapter | Library scholar | 2019 | Chapter within larger volume |
| Archival Appraisal in the Digital Age | Book Chapter | Archival scholar | 2022 | Theory chapter in archival studies |
| 3D Digitization at Koninklijke Bibliotheek | Technical Report | KB technical staff (2 authors) | 2023 | Grey literature from institution |
| Europeana Data Quality Framework | Technical Report | Europeana Foundation (4 authors) | 2022 | Organizational documentation |
| Graph Neural Networks for Provenance | Preprint (arXiv) | CS researcher | 2024 | Machine learning for heritage |
| LLMs for Catalog Enrichment | Preprint (OSF/SocArXiv) | LIS researcher | 2024 | AI applications in libraries |
| Ancient DNA from Museum Collections | Preprint (bioRxiv) | Museum geneticist + lab | 2024 | Scientific heritage use case |

**Preprint Server Patterns**:
- **arXiv.org**: Computer science and machine learning papers (heritage AI applications)
  - Format: `https://arxiv.org/abs/YYMM.NNNNN` (e.g., `2411.12345`)
  - DOI: `10.48550/arXiv.YYMM.NNNNN`

- **OSF/SocArXiv**: Library and information science preprints
  - Format: `https://osf.io/preprints/socarxiv/[alphanumeric]` (e.g., `abc12`)
  - DOI: `10.31235/osf.io/[alphanumeric]`

- **bioRxiv**: Biology and genetics papers (museum genomics, conservation)
  - Format: `https://www.biorxiv.org/content/10.1101/YYYY.MM.DD.NNNNNN`
  - DOI: `10.1101/YYYY.MM.DD.NNNNNN` (date-based)

**Schema patterns demonstrated**:
- Book metadata (`publication_type: BOOK`)
- Book chapter with `is_part_of` relationship to parent volume
- Technical reports as grey literature from heritage organizations
- Preprint metadata with server identifiers (arXiv ID, OSF ID, bioRxiv ID)
- Pre-publication date tracking vs. official publication date

---

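
The preprint server patterns above pair each DOI shape with a landing-page URL, so the server and URL can be derived from the DOI alone. A minimal sketch under that assumption: the regular expressions encode only the three shapes listed in this README, and `classify_preprint` is a hypothetical helper, not part of the schema tooling.

```python
import re

# (server, DOI pattern, URL template) built from the "Format" and "DOI" lines above
PREPRINT_RULES = [
    ("arXiv", re.compile(r"^10\.48550/arXiv\.(\d{4}\.\d{4,5})$"),
     "https://arxiv.org/abs/{0}"),
    ("SocArXiv", re.compile(r"^10\.31235/osf\.io/([A-Za-z0-9]+)$"),
     "https://osf.io/preprints/socarxiv/{0}"),
    ("bioRxiv", re.compile(r"^(10\.1101/\d{4}\.\d{2}\.\d{2}\.\d{6})$"),
     "https://www.biorxiv.org/content/{0}"),
]


def classify_preprint(doi):
    """Return (server, landing_url) for a recognized preprint DOI, else None."""
    for server, pattern, template in PREPRINT_RULES:
        match = pattern.match(doi)
        if match:
            return server, template.format(match.group(1))
    return None


print(classify_preprint("10.48550/arXiv.2411.12345"))
print(classify_preprint("10.1234/example.doi"))
```

A `None` result only means the DOI does not match one of these three shapes; it may still be a preprint from another server.
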
### Integration Patterns from Diverse Publications

**Pattern 6: Books by Heritage Institution Staff** ✅

Example: Digital Preservation Handbook authored by Digital Preservation Coalition staff

```yaml
authors:
  - person_name: "Sarah Jones"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/digital-preservation-coalition
      organization_name: "Digital Preservation Coalition"
      organization_type: "Heritage consortium"
publication_type: BOOK
```

**Pattern 7: Technical Reports as Organizational Documentation** ✅

Example: KB 3D Digitization Report documenting institutional digitization workflows

```yaml
publication_type: TECHNICAL_REPORT
authors:
  - person_name: "Erik Vermeulen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
description: "Grey literature documenting internal digitization practices"
```

**Pattern 8: Preprints Before Formal Publication** ✅

Example: Machine learning research using heritage data published on arXiv

```yaml
publication_type: PREPRINT
preprint_server: arXiv
arxiv_id: "2411.12345"
doi: "10.48550/arXiv.2411.12345"
description: "Early research results, may be updated before journal submission"
```

**Pattern 9: Book Chapters in Edited Volumes** ✅

Example: Crowdsourcing chapter within larger library science anthology

```yaml
publication_type: BOOK_CHAPTER
is_part_of: "Digital Innovations in Libraries"
editors:
  - "Jane Smith"
  - "Robert Brown"
page_range: "145-168"
```

---

### Additional Integration Patterns (Future)

**Pattern 10: Publications About Specific Collections** (not yet implemented)

When a paper describes a heritage collection:

```yaml
# Future schema extension
about_collections:
  - collection_id: https://w3id.org/heritage/collection/rijksmuseum-paintings
    collection_name: "Rijksmuseum Paintings Collection"
    collection_institution: https://w3id.org/heritage/custodian/nl/rijksmuseum
```

**Pattern 11: Data Papers Describing Heritage Datasets** (partially implemented)

When publications document heritage datasets:

```yaml
publication_type: DATASET  # Already used in lokg-brazilian-subset-2024
# Future: Add describes_dataset field
describes_dataset:
  - dataset_id: https://w3id.org/heritage/dataset/brazilian-lokg
    dataset_name: "Brazilian Heritage Institutions Linked Open Data"
    related_institutions:
      - https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil
```

## Citation Analysis Queries

### Find Most Cited Publications

```python
from collections import Counter

import yaml  # PyYAML

def load_yaml(path):
    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)

# Assumes the file's top level is a list of citation records
citations = load_yaml('citation_relationships.yaml')
cited_counts = Counter(c['cited_work'] for c in citations)

print("Most cited publications:")
for pub_id, count in cited_counts.most_common():
    print(f"  {pub_id}: {count} citations")
```

### Build Citation Network

```python
import networkx as nx

# `citations` is the list loaded in the previous example
G = nx.DiGraph()
for citation in citations:
    G.add_edge(citation['citing_work'], citation['cited_work'],
               citation_type=citation['citation_type'])

# Find influential papers (high in-degree)
influential = sorted(G.in_degree(), key=lambda x: x[1], reverse=True)
```

### Analyze Citation Types

```python
# Reuses `Counter` and `citations` from the first example
citation_types = Counter(c['citation_type'] for c in citations)
print("Citation type distribution:")
for ctype, count in citation_types.items():
    print(f"  {ctype}: {count}")
```

## Related Documentation

- **Schema**: `/schemas/bibliographic.yaml` - Full LinkML schema for bibliographic entities
- **Ontologies**:
  - FaBiO (FRBR-aligned Bibliographic Ontology) - Publication modeling
  - CiTO (Citation Typing Ontology) - Citation relationships
  - BIBO (Bibliographic Ontology) - Bibliographic resources
  - FRBR (Functional Requirements for Bibliographic Records) - Work/expression/manifestation
- **Test Fixtures**: `/tests/fixtures/publications/` - Validation examples
- **Schema Documentation**: `/docs/BIBLIOGRAPHIC_SCHEMA.md` (if present)

## Future Enhancements

### Short-term (Next Session)
- [x] ✅ **COMPLETED**: Add publications linked to heritage institutions (5 added)
- [x] ✅ **COMPLETED**: Create citation relationships for heritage-linked pubs (8 citations added)
- [x] ✅ **COMPLETED**: Document 5 integration patterns
- [x] ✅ **COMPLETED**: Add more diverse publication types (books, book chapters, technical reports) - 10 added
- [x] ✅ **COMPLETED**: Add preprints (arXiv, bioRxiv, OSF/SocArXiv) - 4 added
- [x] ✅ **COMPLETED**: Add more cultural heritage domain papers (digital preservation, archival science) - included in diverse set
- [x] ✅ **COMPLETED**: Create 12 additional citation relationships linking diverse publications (27 total citations)
- [x] ✅ **COMPLETED**: Document preprint server patterns (arXiv, SocArXiv, bioRxiv)
- [x] ✅ **COMPLETED**: Document 4 additional integration patterns (6-9: books, technical reports, preprints, chapters)
- [ ] Create author disambiguation examples (same person with multiple IDs/ORCIDs)
- [ ] Add thesis/dissertation examples
- [ ] Add working papers (pre-publication research from institutions)

### Medium-term
- [ ] Author disambiguation (same person, multiple IDs)
- [ ] Keyword/subject term extraction
- [ ] Funding information (grants, sponsors)
- [ ] Publication metrics (citation counts from Crossref, Semantic Scholar)
- [ ] Full-text links (PDFs, preprints)

### Long-term
- [ ] RDF export (Turtle, JSON-LD)
- [ ] SPARQL endpoint for citation queries
- [ ] Bibliometric analysis dashboard
- [ ] Integration with Wikidata (author Q-numbers)
- [ ] Citation recommendation system
- [ ] Co-authorship network analysis

## Questions or Issues?

If you encounter validation errors or are unsure how the schema applies:

1. Check the "Schema Quirks and Common Errors" section above
2. Review validated examples in `semantic_web_papers.yaml`
3. Consult test fixtures in `/tests/fixtures/publications/`
4. Read the inline comments in `/schemas/bibliographic.yaml`
5. File an issue or consult AI agent instructions in `/AGENTS.md`

---

**Last Updated**: 2025-11-09
**Schema Version**: bibliographic.yaml v0.2.0
**Dataset Version**: 0.3.0 (20 publications, 27 citations, 9 integration patterns demonstrated)