Publications Dataset
This directory contains bibliographic metadata for academic publications in LinkML format, demonstrating the project's bibliographic schema (schemas/bibliographic.yaml).
Overview
Purpose: Store structured metadata about academic publications, including journal articles, conference papers, books, and their citation relationships.
Schema: /schemas/bibliographic.yaml (based on FaBiO, CiTO, BIBO, FRBR ontologies)
Current Dataset Size:
- 20 publications (10 journal articles, 2 conference papers, 1 data paper, 2 books, 2 book chapters, 2 technical reports, 4 preprints)
- 27 citation relationships (cross-references between publications)
- 60+ unique authors with institutional affiliations (universities, heritage institutions)
- 7 journals (referenced from /data/instances/journals/)
- 5 conferences (referenced from /data/instances/conferences/)
- 5 heritage institutions linked as author affiliations
Publication Type Distribution:
| Publication Type | Count | Examples |
|---|---|---|
| Journal Articles | 10 | 5 semantic web (Knowledge Graphs, Wikidata, LOKG, etc.) + 5 heritage-linked (Rembrandt analysis, NHA digital, etc.) |
| Conference Papers | 2 | ISWC 2024 (Best Paper), ISWC 2023 (Best Paper) |
| Books | 2 | Linked Data for Museums, Digital Preservation Handbook |
| Book Chapters | 2 | Crowdsourcing Metadata, Archival Appraisal |
| Technical Reports | 2 | KB 3D Digitization, Europeana QA Framework |
| Preprints | 4 | arXiv (GNN provenance), OSF/SocArXiv (LLM cataloging), bioRxiv (Ancient DNA), arXiv (2nd paper TBD) |
| Data Papers | 1 | Brazilian LOKG Subset (TGDK journal) |
| TOTAL | 20 | Diverse representation of scholarly output types |
Citation Network Statistics:
- Total citations: 27 relationships
- Publications with citations: 19 (1 bioRxiv paper unlinked - outside scope)
- Citation density: 1.42 citations per publication
- Most cited works:
  - Knowledge Graphs (2021) - 8 citations
  - Wikidata (2018) - 6 citations
  - LOKG (2024) - 5 citations
- Citation types used: 6 distinct CiTO types (CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, USES_DATA_FROM, CITES_AS_METADATA)
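The density figure above is the citation count divided by the number of linked publications. A minimal sketch of that arithmetic, using only the numbers stated in this README (the variable names are illustrative):

```python
# Figures taken from the statistics listed above
total_citations = 27
linked_publications = 19  # 20 publications minus the unlinked bioRxiv preprint

density = total_citations / linked_publications
print(f"Citation density: {density:.2f} citations per publication")
```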
Files
1. semantic_web_papers.yaml (379 lines) ✅
Notable semantic web publications demonstrating schema patterns
Publications included:
| Title | Type | Journal/Conf | Year | Authors | DOI |
|---|---|---|---|---|---|
| Knowledge Graphs | Journal Article | Semantic Web Journal | 2021 | 18 authors | 10.3233/SW-222793 |
| Wikidata: A Free Collaborative Knowledgebase | Journal Article | Journal of Web Semantics | 2018 | 2 authors | 10.1016/j.websem.2018.08.002 |
| The LOKG | Journal Article | TGDK | 2024 | 4 authors (synthetic) | 10.4230/TGDK.2.1.3 |
| Relationships are Complicated! | Conference Paper | ISWC 2024 (Best Paper) | 2024 | 3 authors | - |
| Spatial Link Prediction | Conference Paper | ISWC 2023 (Best Paper) | 2023 | 4 authors | 10.1007/978-3-031-47240-4_9 |
Schema patterns demonstrated:
- Multi-author publications (up to 18 authors)
- ORCID identifiers for authors
- Institutional affiliations (universities, research institutes, corporations)
- DOI identifiers
- Journal article metadata (volume, issue, page range)
- Conference paper metadata (proceedings, best paper awards)
- Open access status tracking
- Abstract text
2. citation_relationships.yaml (174 lines) ✅
Citation relationships between publications using CiTO (Citation Typing Ontology)
Citation patterns included:
- 27 citation relationships linking 19 publications (5 semantic web + 5 heritage-linked + 9 diverse publications)
- Citation types: CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, CITES_AS_METADATA, USES_DATA_FROM
- Citation context: Textual excerpts showing how works cite each other
- Citation intent: Purpose and reasoning for citations
- Page numbers: Specific location of citations in citing work
- Citation density: 1.42 citations per publication (27 citations / 19 linked publications)
Citation network:
Semantic Web Publications:
Knowledge Graphs (2021) ──cites──> Wikidata (2018)
                        └─self-cites (section reference)
                        [Most cited: 8 citations total]
LOKG (2024) ──cites──> Knowledge Graphs (2021)
            ──cites──> Wikidata (2018)
            └─cites──> Spatial Link Prediction (2023)
            [Third most cited: 5 citations total]
ISWC 2024 Paper ──cites──> Knowledge Graphs (2021)
                └─extends──> ISWC 2023 Paper
Heritage-Linked Publications:
Brazilian LOKG Subset (2024) ──extends──> LOKG (2024)
Dutch GLAM Consortium (2023) ──cites──> Knowledge Graphs (2021)
                             └─cites──> Wikidata (2018)
Rembrandt Analysis (2024) ──uses_data──> Wikidata (2018)
NHA Digital Transformation (2023) ──discusses──> LOKG (2024)
Collection Management Systems (2024) ──cites──> Wikidata (2018)
                                     └─cites──> Knowledge Graphs (2021)
Diverse Publications (Books, Reports, Chapters, Preprints):
Linked Data for Museums (Book) ──cites──> Knowledge Graphs (2021)
                               └─cites──> Wikidata (2018)
Digital Preservation Handbook ──cites──> LOKG (2024)
KB 3D Digitization Report ──discusses──> LOKG (2024)
Europeana QA Framework ──cites──> Knowledge Graphs (2021)
                       └─cites──> LOKG (2024)
Crowdsourcing Metadata Chapter ──cites──> Wikidata (2018)
Archival Appraisal Chapter ──discusses──> Knowledge Graphs (2021)
arXiv GNN Provenance ──cites──> Knowledge Graphs (2021)
                     └─uses_data──> Wikidata (2018)
OSF LLM Cataloging ──discusses──> Knowledge Graphs (2021)
bioRxiv Ancient DNA (unlinked - genomics focus, not heritage knowledge graphs)
Most Cited Publications:
- Knowledge Graphs (2021) - 8 citations
- Wikidata (2018) - 6 citations
- LOKG (2024) - 5 citations
3. heritage_linked_publications.yaml (206 lines) ✅
Publications with authors affiliated at heritage institutions
Demonstrates heritage-bibliographic integration patterns:
| Title | Type | Authors | Heritage Institution | Year |
|---|---|---|---|---|
| Digital Analysis of Rembrandt's Brushwork | Journal | Rijksmuseum researcher + UvA | Rijksmuseum | 2024 |
| Democratizing Access: NHA Digital Transformation | Journal | 2 Noord-Hollands Archief archivists | Noord-Hollands Archief | 2023 |
| Brazilian Cultural Heritage in the LOKG | Data Paper | USP researcher + BNB librarian | Biblioteca Nacional do Brasil | 2024 |
| The Dutch GLAM Consortium | Conference | KB director + Rijksmuseum curator + NA archivist | KB, Rijksmuseum, Nationaal Archief | 2023 |
| Comparative Analysis of Collection Management Systems | Journal | Paris-Sorbonne + KB librarian | Koninklijke Bibliotheek | 2024 |
Integration patterns:
- Pattern 1: Researcher at heritage institution as sole author
- Pattern 2: Multiple staff from same heritage institution as co-authors
- Pattern 3: Heritage institution staff collaborating with university researcher
- Pattern 4: Multi-institutional consortium (3+ heritage institutions)
- Pattern 5: International collaboration (foreign researcher + local heritage institution)
Schema Reference
Publication Class
Required fields:
publication_id: https://w3id.org/heritage/publication/[unique-id]
title: "Publication Title" # NOT publication_title!
publication_type: JOURNAL_ARTICLE # Enum: JOURNAL_ARTICLE, CONFERENCE_PAPER, BOOK, etc.
Key fields:
authors:  # List of Person objects
  - person_id: https://orcid.org/0000-0002-XXXX-XXXX  # ORCID preferred
    person_name: "Author Name"
    orcid: "0000-0002-XXXX-XXXX"  # Separate field from person_id
    affiliation:  # SINGULAR Organization object (NOT affiliations array!)
      organization_name: "University Name"
      organization_type: "University"
published_in: https://w3id.org/heritage/journal/[journal-id]  # String ID reference, NOT nested object!
volume: "12"
issue: "3"
page_range: "1-94"  # NOT 'pages'!
doi: "10.1234/example.doi"  # Separate field (NOT in identifiers array)
url: "https://..."  # Separate field
abstract: "Full abstract text..."
provenance:  # NO 'notes' field! Use 'description' in parent object instead
  data_source: CONVERSATION_NLP
  data_tier: TIER_2_VERIFIED
  extraction_date: "2025-11-09T21:00:00Z"
Citation Class
Required fields:
citation_id: https://w3id.org/heritage/citation/[unique-id]
citing_work: https://w3id.org/heritage/publication/[citing-pub-id] # Required
cited_work: https://w3id.org/heritage/publication/[cited-pub-id] # Required
citation_type: CITES_AS_AUTHORITY # Required enum
Optional enrichment fields:
citation_intent: "Purpose/reasoning for this citation..."
citation_context: "Textual excerpt showing the citation..."
page_number: "23" # Page where citation appears
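Before running the full validator, the required fields of both classes can be checked with a few lines of Python. This is a rough sketch, not part of the project tooling: the helper name `missing_fields` and the example IDs are hypothetical, and the field lists are copied from the "Required fields" blocks above.

```python
# Required keys as listed in this README (the schema itself is authoritative)
PUBLICATION_REQUIRED = ("publication_id", "title", "publication_type")
CITATION_REQUIRED = ("citation_id", "citing_work", "cited_work", "citation_type")


def missing_fields(record: dict, required: tuple) -> list:
    """Return the required keys that are absent or empty in the record."""
    return [key for key in required if not record.get(key)]


# Hypothetical citation record with 'cited_work' left out on purpose
citation = {
    "citation_id": "https://w3id.org/heritage/citation/example-1",
    "citing_work": "https://w3id.org/heritage/publication/example-a",
    "citation_type": "CITES_AS_AUTHORITY",
}
print(missing_fields(citation, CITATION_REQUIRED))  # ['cited_work']
```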
Citation Types (CiTO Ontology)
| Type | Description | Example Use |
|---|---|---|
| CITES | Generic citation | Standard reference |
| CITES_AS_AUTHORITY | Cites as authoritative source | Citing foundational theory |
| CITES_AS_EVIDENCE | Cites as evidence | Supporting empirical claims |
| CITES_AS_METADATA | Cites for metadata/provenance | Dataset documentation |
| DISCUSSES | Discusses the cited work | Critical analysis |
| EXTENDS | Extends the cited work | Building on prior work |
| SUPPORTS | Provides support for claims | Corroborating findings |
| REFUTES | Refutes or disputes | Contradicting claims |
| CRITIQUES | Critiques cited work | Identifying limitations |
| AGREES_WITH | Agrees with cited work | Confirming findings |
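When citation records are generated by a script, a small lookup can catch misspelled type names early. The sketch below is an assumption-laden illustration: the `CitationType` enum and `is_known_citation_type` helper are hypothetical names, the members mirror the table above plus USES_DATA_FROM (listed among the types used in the dataset), and the authoritative list remains the enum in schemas/bibliographic.yaml.

```python
from enum import Enum


class CitationType(Enum):
    """Citation types named in this README; values are the table descriptions."""
    CITES = "Generic citation"
    CITES_AS_AUTHORITY = "Cites as authoritative source"
    CITES_AS_EVIDENCE = "Cites as evidence"
    CITES_AS_METADATA = "Cites for metadata/provenance"
    DISCUSSES = "Discusses the cited work"
    EXTENDS = "Extends the cited work"
    SUPPORTS = "Provides support for claims"
    REFUTES = "Refutes or disputes"
    CRITIQUES = "Critiques cited work"
    AGREES_WITH = "Agrees with cited work"
    # Not in the table, but listed under "Citation types used" in the overview
    USES_DATA_FROM = "Uses data from the cited work"


def is_known_citation_type(name: str) -> bool:
    """True if the name matches one of the enum members exactly."""
    return name in CitationType.__members__


print(is_known_citation_type("EXTENDS"))   # True
print(is_known_citation_type("extends"))   # False: names are upper-case
```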
Schema Quirks and Common Errors
❌ Common Mistakes
1. Wrong field names:
# WRONG
publication_title: "Title" # Field doesn't exist!
pages: "1-94" # Should be 'page_range'
affiliations: [...] # Should be singular 'affiliation'
# CORRECT
title: "Title"
page_range: "1-94"
affiliation: {...}
2. Wrong published_in structure:
# WRONG - Nested object
published_in:
  journal_id: https://...
  journal_title: "Journal Name"
  volume: "12"
# CORRECT - String ID reference
published_in: https://w3id.org/heritage/journal/semantic-web
volume: "12"  # Volume at Publication level, not nested
3. Wrong identifier handling:
# WRONG - DOI in identifiers array
identifiers:
  - identifier_scheme: DOI
    identifier_value: "10.1234/..."
# CORRECT - DOI as separate field
doi: "10.1234/..."
4. Provenance notes:
# WRONG - Provenance has no 'notes' field
provenance:
  data_source: CONVERSATION_NLP
  notes: "Some observation"  # This will fail validation!
# CORRECT - Use 'description' at Publication level
description: "Notes and remarks about this publication"
provenance:
  data_source: CONVERSATION_NLP
✅ Schema Validation Checklist
Before committing new publications:
- title field (NOT publication_title)
- published_in is a string ID (NOT nested object)
- affiliation is singular object (NOT affiliations array)
- page_range (NOT pages)
- doi and url are separate fields (NOT in identifiers)
- provenance has no notes field
- All publication_id, person_id, journal_id use valid URIs
- publication_type is valid enum value
- Authors have either ORCID or local ID
- File validates with: linkml-validate -s schemas/bibliographic.yaml -C Publication <file.yaml>
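Several checklist items can be tested mechanically on a record that has already been loaded into a Python dict. The following is a sketch of such a pre-check, covering only the field-name and structure mistakes described in this section; `find_schema_mistakes` is a hypothetical helper, and it complements linkml-validate rather than replacing it.

```python
def find_schema_mistakes(pub: dict) -> list[str]:
    """Return human-readable warnings for the common mistakes listed above."""
    problems = []
    if "publication_title" in pub:
        problems.append("use 'title', not 'publication_title'")
    if "pages" in pub:
        problems.append("use 'page_range', not 'pages'")
    if isinstance(pub.get("published_in"), dict):
        problems.append("'published_in' must be a string ID, not a nested object")
    for ident in pub.get("identifiers") or []:
        if isinstance(ident, dict) and ident.get("identifier_scheme") == "DOI":
            problems.append("put the DOI in 'doi', not in 'identifiers'")
    provenance = pub.get("provenance")
    if isinstance(provenance, dict) and "notes" in provenance:
        problems.append("'provenance' has no 'notes'; use 'description' on the publication")
    for author in pub.get("authors") or []:
        if isinstance(author, dict) and "affiliations" in author:
            problems.append("use a single 'affiliation' object, not 'affiliations'")
    return problems


# Record that repeats every mistake from the examples above
bad_record = {
    "publication_title": "Title",
    "pages": "1-94",
    "published_in": {"journal_id": "https://..."},
    "identifiers": [{"identifier_scheme": "DOI", "identifier_value": "10.1234/..."}],
    "provenance": {"data_source": "CONVERSATION_NLP", "notes": "Some observation"},
    "authors": [{"person_name": "Author Name", "affiliations": []}],
}
for problem in find_schema_mistakes(bad_record):
    print("-", problem)
```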
Validation Commands
Validate Publications
cd /Users/kempersc/apps/glam  # repository root on the author's machine; adjust to your own checkout
linkml-validate -s schemas/bibliographic.yaml -C Publication \
data/instances/publications/semantic_web_papers.yaml
Validate Citations
linkml-validate -s schemas/bibliographic.yaml -C Citation \
data/instances/publications/citation_relationships.yaml
Validate Journals
linkml-validate -s schemas/bibliographic.yaml -C Journal \
data/instances/journals/semantic_web_journals.yaml
Validate Conferences
linkml-validate -s schemas/bibliographic.yaml -C Conference \
data/instances/conferences/semantic_web_conferences.yaml
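The four commands above can be driven from one script. This sketch only reuses the flags and paths shown in this section; the function names are hypothetical, and it assumes that linkml-validate reports failure through a non-zero exit status and that the script runs from the repository root.

```python
import subprocess

SCHEMA = "schemas/bibliographic.yaml"

# (target class, instance file) pairs copied from the commands above
TARGETS = [
    ("Publication", "data/instances/publications/semantic_web_papers.yaml"),
    ("Citation", "data/instances/publications/citation_relationships.yaml"),
    ("Journal", "data/instances/journals/semantic_web_journals.yaml"),
    ("Conference", "data/instances/conferences/semantic_web_conferences.yaml"),
]


def build_command(target_class: str, path: str) -> list[str]:
    """Assemble the linkml-validate invocation as an argument list."""
    return ["linkml-validate", "-s", SCHEMA, "-C", target_class, path]


def validate_all() -> list[str]:
    """Run every check and return the files whose validation failed."""
    failed = []
    for target_class, path in TARGETS:
        result = subprocess.run(build_command(target_class, path))
        if result.returncode != 0:
            failed.append(path)
    return failed


# Print the commands without running them; call validate_all() to execute
for target_class, path in TARGETS:
    print(" ".join(build_command(target_class, path)))
```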
Adding New Publications
Step 1: Gather Metadata
Required information:
- Title, authors, publication date
- Publication type (journal article, conference paper, etc.)
- Journal or conference (must reference existing entity in journals/ or conferences/)
- DOI (if available)
Recommended information:
- Author ORCID identifiers
- Author institutional affiliations
- Abstract text
- Volume, issue, page numbers
- URL to full text
- Open access status
Step 2: Create Publication Record
Follow the schema patterns in semantic_web_papers.yaml:
- publication_id: https://w3id.org/heritage/publication/[unique-id]
  title: "Your Publication Title"
  publication_type: JOURNAL_ARTICLE  # or CONFERENCE_PAPER, BOOK, etc.
  publication_date: "2024-11-09"
  authors:
    - person_id: https://orcid.org/0000-0002-XXXX-XXXX
      person_name: "First Author"
      orcid: "0000-0002-XXXX-XXXX"
      affiliation:
        organization_name: "University Name"
        organization_type: "University"
  published_in: https://w3id.org/heritage/journal/[journal-id]
  volume: "15"
  issue: "2"
  page_range: "123-145"
  doi: "10.1234/example.doi"
  url: "https://..."
  abstract: "Full abstract text..."
  provenance:
    data_source: MANUAL_CURATION  # or CONVERSATION_NLP, WEB_SCRAPING, etc.
    data_tier: TIER_2_VERIFIED
    extraction_date: "2024-11-09T12:00:00Z"
    extraction_method: "Manual entry from published source"
Step 3: Create Citation Relationships (Optional)
If the new publication cites existing publications (or vice versa):
- citation_id: https://w3id.org/heritage/citation/[unique-id]
  citing_work: https://w3id.org/heritage/publication/[new-pub-id]
  cited_work: https://w3id.org/heritage/publication/[existing-pub-id]
  citation_type: CITES_AS_AUTHORITY  # Choose appropriate type
  citation_intent: "Why this citation exists..."
  citation_context: "Textual excerpt around the citation..."
  page_number: "15"
Step 4: Validate
Run validation before committing:
linkml-validate -s schemas/bibliographic.yaml -C Publication \
data/instances/publications/your_file.yaml
Fix any validation errors (see "Schema Quirks" section above).
Step 5: Update This README
Add your publication to the table in the "Files" section.
Integration with Heritage Custodians
Publications link to heritage institutions through 5 integration patterns, all demonstrated in heritage_linked_publications.yaml:
Pattern 1: Heritage Institution Researcher as Primary Author ✅
Use case: Museum curator or archivist publishes research based on institutional collections
Example: Rijksmuseum researcher analyzing Rembrandt paintings
authors:
  - person_id: researcher-rijks-001
    person_name: "Dr. Maria van der Berg"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum  # ← Heritage institution!
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
Real example: rijksmuseum-rembrandt-2024 (Rijksmuseum:125-126)
Pattern 2: Multiple Staff from Same Heritage Institution ✅
Use case: Collaborative research by colleagues at the same archive or museum
Example: Two archivists from Noord-Hollands Archief co-authoring digital transformation paper
authors:
  - person_id: archivist-nha-001
    person_name: "Dr. Saskia de Jong"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
  - person_id: specialist-nha-001
    person_name: "Peter Bakker"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief  # ← Same institution
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
Real example: noord-hollands-archief-digital-2023 (Noord-Hollands Archief:49-60)
Pattern 3: Heritage + Academic Collaboration ✅
Use case: University researcher collaborates with heritage institution expert
Example: USP researcher + Biblioteca Nacional do Brasil librarian creating Linked Open Data resource
authors:
  - person_id: https://orcid.org/0000-0002-8888-9999
    person_name: "Dr. Carlos Silva"
    orcid: "0000-0002-8888-9999"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/university-of-sao-paulo
      organization_name: "University of São Paulo"
      organization_type: "University"
  - person_id: librarian-bnb-001
    person_name: "Ana Santos"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil  # ← Heritage institution
      organization_name: "Biblioteca Nacional do Brasil"
      organization_type: "Library"
Real example: lokg-brazilian-subset-2024 (Biblioteca Nacional do Brasil:92-103)
Pattern 4: Multi-Institutional Consortium (3+ Heritage Institutions) ✅
Use case: Regional or national collaboration between multiple heritage institutions
Example: Dutch GLAM Consortium with KB + Rijksmuseum + Nationaal Archief
authors:
  - person_id: director-kb-001
    person_name: "Dr. Liesbeth van der Pol"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
  - person_id: curator-rijksmuseum-002
    person_name: "Dr. Thomas de Vries"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum  # ← Second institution
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
  - person_id: archivist-na-001
    person_name: "Dr. Emma Jansen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/nationaal-archief  # ← Third institution
      organization_name: "Nationaal Archief"
      organization_type: "Archive"
Real example: dutch-glam-consortium-2023 (KB:132-137, Rijksmuseum:138-142, Nationaal Archief:144-148)
Pattern 5: International Researcher + Local Heritage Institution ✅
Use case: Foreign scholar collaborates with local museum/archive/library
Example: French scholar + Dutch KB librarian studying European collection management systems
authors:
  - person_id: https://orcid.org/0000-0003-7777-8888
    person_name: "Dr. Sophie Laurent"
    orcid: "0000-0003-7777-8888"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/universite-paris-sorbonne
      organization_name: "Université Paris-Sorbonne"
      organization_type: "University"
  - person_id: librarian-kb-002
    person_name: "Martijn Koster"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library  # ← Dutch heritage institution
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
Real example: collection-management-systems-2024 (Koninklijke Bibliotheek:181-184)
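In the examples for all five patterns, heritage institutions carry an organization_id under https://w3id.org/heritage/custodian/, while universities use https://w3id.org/heritage/organization/. Assuming that convention holds across the dataset (it is inferred from these examples, not confirmed by the schema), the heritage affiliations of a publication can be listed as sketched below; `heritage_institutions` is a hypothetical helper.

```python
# Prefix observed in the examples above; treat it as an assumption
CUSTODIAN_PREFIX = "https://w3id.org/heritage/custodian/"


def heritage_institutions(authors: list) -> list:
    """Return distinct custodian IDs among author affiliations, in first-seen order."""
    seen = []
    for author in authors:
        org_id = (author.get("affiliation") or {}).get("organization_id", "")
        if org_id.startswith(CUSTODIAN_PREFIX) and org_id not in seen:
            seen.append(org_id)
    return seen


# Authors from the Pattern 5 example
authors = [
    {
        "person_name": "Dr. Sophie Laurent",
        "affiliation": {
            "organization_id": "https://w3id.org/heritage/organization/universite-paris-sorbonne",
        },
    },
    {
        "person_name": "Martijn Koster",
        "affiliation": {
            "organization_id": "https://w3id.org/heritage/custodian/nl/kb-national-library",
        },
    },
]
print(heritage_institutions(authors))
```

A result with three or more entries would correspond to the consortium case (Pattern 4); a single entry alongside a non-custodian affiliation corresponds to Patterns 3 and 5.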
4. diverse_heritage_publications.yaml (10 publications) ✅
Diverse publication types: books, book chapters, technical reports, preprints
Publications included:
| Title | Type | Authors | Year | Key Features |
|---|---|---|---|---|
| Linked Data for Museums | Book | Getty Trust researcher | 2020 | Practical GLAM linked data guide |
| Digital Preservation Handbook | Book | DPC staff (3 co-authors) | 2021 | Multi-author handbook from heritage org |
| Crowdsourcing Metadata for Libraries | Book Chapter | Library scholar | 2019 | Chapter within larger volume |
| Archival Appraisal in the Digital Age | Book Chapter | Archival scholar | 2022 | Theory chapter in archival studies |
| 3D Digitization at Koninklijke Bibliotheek | Technical Report | KB technical staff (2 authors) | 2023 | Grey literature from institution |
| Europeana Data Quality Framework | Technical Report | Europeana Foundation (4 authors) | 2022 | Organizational documentation |
| Graph Neural Networks for Provenance | Preprint (arXiv) | CS researcher | 2024 | Machine learning for heritage |
| LLMs for Catalog Enrichment | Preprint (OSF/SocArXiv) | LIS researcher | 2024 | AI applications in libraries |
| Ancient DNA from Museum Collections | Preprint (bioRxiv) | Museum geneticist + lab | 2024 | Scientific heritage use case |
Preprint Server Patterns:
- arXiv.org: Computer science and machine learning papers (heritage AI applications)
  - Format: https://arxiv.org/abs/YYMM.NNNNN (e.g., 2411.12345)
  - DOI: 10.48550/arXiv.YYMM.NNNNN
- OSF/SocArXiv: Library and information science preprints
  - Format: https://osf.io/preprints/socarxiv/[alphanumeric] (e.g., abc12)
  - DOI: 10.31235/osf.io/[alphanumeric]
- bioRxiv: Biology and genetics papers (museum genomics, conservation)
  - Format: https://www.biorxiv.org/content/10.1101/YYYY.MM.DD.NNNNNN
  - DOI: 10.1101/YYYY.MM.DD.NNNNNN (date-based)
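The arXiv and bioRxiv patterns above are regular enough to derive the URL and DOI from a single identifier. The sketch below follows only the formats given in this README (for example, it accepts five-digit arXiv sequence numbers only); the function names are hypothetical.

```python
import re

ARXIV_ID = re.compile(r"^\d{4}\.\d{5}$")                           # YYMM.NNNNN
BIORXIV_DOI = re.compile(r"^10\.1101/\d{4}\.\d{2}\.\d{2}\.\d{6}$")  # date-based


def arxiv_links(arxiv_id: str) -> dict:
    """Build the abstract URL and DOI for an arXiv ID in YYMM.NNNNN form."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not in YYMM.NNNNN form: {arxiv_id!r}")
    return {
        "url": f"https://arxiv.org/abs/{arxiv_id}",
        "doi": f"10.48550/arXiv.{arxiv_id}",
    }


def biorxiv_url(doi: str) -> str:
    """Build the bioRxiv content URL from a date-based 10.1101 DOI."""
    if not BIORXIV_DOI.match(doi):
        raise ValueError(f"not a date-based bioRxiv DOI: {doi!r}")
    return f"https://www.biorxiv.org/content/{doi}"


print(arxiv_links("2411.12345"))
```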
Schema patterns demonstrated:
- Book metadata (publication_type: BOOK)
- Book chapter with is_part_of relationship to parent volume
- Technical reports as grey literature from heritage organizations
- Preprint metadata with server identifiers (arXiv ID, OSF ID, bioRxiv ID)
- Pre-publication date tracking vs. official publication date
Integration Patterns from Diverse Publications
Pattern 6: Books by Heritage Institution Staff ✅
Example: Digital Preservation Handbook authored by Digital Preservation Coalition staff
authors:
  - person_name: "Sarah Jones"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/digital-preservation-coalition
      organization_name: "Digital Preservation Coalition"
      organization_type: "Heritage consortium"
publication_type: BOOK
Pattern 7: Technical Reports as Organizational Documentation ✅
Example: KB 3D Digitization Report documenting institutional digitization workflows
publication_type: TECHNICAL_REPORT
authors:
  - person_name: "Erik Vermeulen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
description: "Grey literature documenting internal digitization practices"
Pattern 8: Preprints Before Formal Publication ✅
Example: Machine learning research using heritage data published on arXiv
publication_type: PREPRINT
preprint_server: arXiv
arxiv_id: "2411.12345"
doi: "10.48550/arXiv.2411.12345"
description: "Early research results, may be updated before journal submission"
Pattern 9: Book Chapters in Edited Volumes ✅
Example: Crowdsourcing chapter within larger library science anthology
publication_type: BOOK_CHAPTER
is_part_of: "Digital Innovations in Libraries"
editors:
  - "Jane Smith"
  - "Robert Brown"
page_range: "145-168"
Additional Integration Patterns (Future)
Pattern 10: Publications About Specific Collections (not yet implemented)
When a paper describes a heritage collection:
# Future schema extension
about_collections:
  - collection_id: https://w3id.org/heritage/collection/rijksmuseum-paintings
    collection_name: "Rijksmuseum Paintings Collection"
    collection_institution: https://w3id.org/heritage/custodian/nl/rijksmuseum
Pattern 11: Data Papers Describing Heritage Datasets (partially implemented)
When publications document heritage datasets:
publication_type: DATASET  # Already used in lokg-brazilian-subset-2024
# Future: Add describes_dataset field
describes_dataset:
  - dataset_id: https://w3id.org/heritage/dataset/brazilian-lokg
    dataset_name: "Brazilian Heritage Institutions Linked Open Data"
    related_institutions:
      - https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil
Citation Analysis Queries
Find Most Cited Publications
from collections import Counter
import yaml  # PyYAML

# Assumes the file holds a top-level list of Citation records
with open('citation_relationships.yaml', encoding='utf-8') as f:
    citations = yaml.safe_load(f)
cited_counts = Counter(c['cited_work'] for c in citations)
print("Most cited publications:")
for pub_id, count in cited_counts.most_common():
    print(f"  {pub_id}: {count} citations")
Build Citation Network
import networkx as nx
# 'citations' is the list loaded in "Find Most Cited Publications"
G = nx.DiGraph()
for citation in citations:
    G.add_edge(citation['citing_work'], citation['cited_work'],
               citation_type=citation['citation_type'])
# Find influential papers (high in-degree)
influential = sorted(G.in_degree(), key=lambda x: x[1], reverse=True)
Analyze Citation Types
from collections import Counter
# 'citations' is the list loaded in "Find Most Cited Publications"
citation_types = Counter(c['citation_type'] for c in citations)
print("Citation type distribution:")
for ctype, count in citation_types.items():
    print(f"  {ctype}: {count}")
Related Documentation
- Schema: /schemas/bibliographic.yaml - Full LinkML schema for bibliographic entities
- Ontologies:
  - FaBiO (FRBR-aligned Bibliographic Ontology) - Publication modeling
  - CiTO (Citation Typing Ontology) - Citation relationships
  - BIBO (Bibliographic Ontology) - Bibliographic resources
  - FRBR (Functional Requirements for Bibliographic Records) - Work/expression/manifestation
- Test Fixtures: /tests/fixtures/publications/ - Validation examples
- Schema Documentation: /docs/BIBLIOGRAPHIC_SCHEMA.md (if exists)
Future Enhancements
Short-term (Next Session)
- ✅ COMPLETED: Add publications linked to heritage institutions (5 added)
- ✅ COMPLETED: Create citation relationships for heritage-linked pubs (8 citations added)
- ✅ COMPLETED: Document 5 integration patterns
- ✅ COMPLETED: Add more diverse publication types (books, book chapters, technical reports) - 10 added
- ✅ COMPLETED: Add preprints (arXiv, bioRxiv, OSF/SocArXiv) - 4 added
- ✅ COMPLETED: Add more cultural heritage domain papers (digital preservation, archival science) - included in diverse set
- ✅ COMPLETED: Create 12 additional citation relationships linking diverse publications (27 total citations)
- ✅ COMPLETED: Document preprint server patterns (arXiv, SocArXiv, bioRxiv)
- ✅ COMPLETED: Document 4 additional integration patterns (6-9: books, technical reports, preprints, chapters)
- Create author disambiguation examples (same person with multiple IDs/ORCIDs)
- Add thesis/dissertation examples
- Add working papers (pre-publication research from institutions)
Medium-term
- Author disambiguation (same person, multiple IDs)
- Keyword/subject term extraction
- Funding information (grants, sponsors)
- Publication metrics (citation counts from Crossref, Semantic Scholar)
- Full-text links (PDFs, preprints)
Long-term
- RDF export (Turtle, JSON-LD)
- SPARQL endpoint for citation queries
- Bibliometric analysis dashboard
- Integration with Wikidata (author Q-numbers)
- Citation recommendation system
- Co-authorship network analysis
Questions or Issues?
If you encounter validation errors or schema confusion:
- Check the "Schema Quirks" section above
- Review validated examples in semantic_web_papers.yaml
- Consult test fixtures in /tests/fixtures/publications/
- Read schema documentation in /schemas/bibliographic.yaml (inline comments)
- File an issue or consult AI agent instructions in /AGENTS.md
Last Updated: 2025-11-09
Schema Version: bibliographic.yaml v0.2.0
Dataset Version: 0.3.0 (20 publications, 27 citations, 9 integration patterns demonstrated)