Publications Dataset

This directory contains bibliographic metadata for academic publications in LinkML format, demonstrating the project's bibliographic schema (schemas/bibliographic.yaml).

Overview

Purpose: Store structured metadata about academic publications, including journal articles, conference papers, books, and their citation relationships.

Schema: /schemas/bibliographic.yaml (based on FaBiO, CiTO, BIBO, FRBR ontologies)

Current Dataset Size:

  • 20 publications (6 journal articles, 3 conference papers, 1 data paper, 2 books, 2 book chapters, 2 technical reports, 4 preprints)
  • 27 citation relationships (cross-references between publications)
  • 60+ unique authors with institutional affiliations (universities, heritage institutions)
  • 7 journals (referenced from /data/instances/journals/)
  • 5 conferences (referenced from /data/instances/conferences/)
  • 5 heritage institutions linked as author affiliations

Publication Type Distribution:

| Publication Type | Count | Examples |
|---|---|---|
| Journal Articles | 6 | 3 semantic web (Knowledge Graphs, Wikidata, LOKG) + 3 heritage-linked (Rembrandt analysis, NHA digital transformation, collection management systems) |
| Conference Papers | 3 | ISWC 2024 (Best Paper), ISWC 2023 (Best Paper), Dutch GLAM Consortium |
| Books | 2 | Linked Data for Museums, Digital Preservation Handbook |
| Book Chapters | 2 | Crowdsourcing Metadata, Archival Appraisal |
| Technical Reports | 2 | KB 3D Digitization, Europeana QA Framework |
| Preprints | 4 | arXiv (GNN provenance), OSF/SocArXiv (LLM cataloging), bioRxiv (Ancient DNA), arXiv (2nd paper TBD) |
| Data Papers | 1 | Brazilian LOKG Subset (TGDK journal) |
| TOTAL | 20 | Diverse representation of scholarly output types |

Citation Network Statistics:

  • Total citations: 27 relationships
  • Publications with citations: 19 (1 bioRxiv paper unlinked - outside scope)
  • Citation density: 1.42 citations per publication
  • Most cited works:
    1. Knowledge Graphs (2021) - 8 citations
    2. Wikidata (2018) - 6 citations
    3. LOKG (2024) - 5 citations
  • Citation types used: 6 distinct CiTO types (CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, USES_DATA_FROM, CITES_AS_METADATA)
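
The density figure follows directly from the two counts in the list above. A minimal sketch of the arithmetic, with the counts hard-coded from this README rather than computed from the YAML files:

```python
# Counts copied from the statistics above (assumption: they match the YAML files).
total_citations = 27
linked_publications = 19  # publications that take part in at least one citation

# Density as defined in this README: citations divided by linked publications.
citation_density = total_citations / linked_publications
print(f"{citation_density:.2f} citations per publication")
```

Note that the denominator is the 19 linked publications, not all 20 records, because the bioRxiv preprint takes part in no citation.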

Files

1. semantic_web_papers.yaml (379 lines)

Notable semantic web publications demonstrating schema patterns

Publications included:

| Title | Type | Journal/Conf | Year | Authors | DOI |
|---|---|---|---|---|---|
| Knowledge Graphs | Journal Article | Semantic Web Journal | 2021 | 18 authors | 10.3233/SW-222793 |
| Wikidata: A Free Collaborative Knowledgebase | Journal Article | Journal of Web Semantics | 2018 | 2 authors | 10.1016/j.websem.2018.08.002 |
| The LOKG | Journal Article | TGDK | 2024 | 4 authors (synthetic) | 10.4230/TGDK.2.1.3 |
| Relationships are Complicated! | Conference Paper | ISWC 2024 (Best Paper) | 2024 | 3 authors | - |
| Spatial Link Prediction | Conference Paper | ISWC 2023 (Best Paper) | 2023 | 4 authors | 10.1007/978-3-031-47240-4_9 |

Schema patterns demonstrated:

  • Multi-author publications (up to 18 authors)
  • ORCID identifiers for authors
  • Institutional affiliations (universities, research institutes, corporations)
  • DOI identifiers
  • Journal article metadata (volume, issue, page range)
  • Conference paper metadata (proceedings, best paper awards)
  • Open access status tracking
  • Abstract text

2. citation_relationships.yaml (174 lines)

Citation relationships between publications using CiTO (Citation Typing Ontology)

Citation patterns included:

  • 27 citation relationships linking 19 publications (5 semantic web + 5 heritage-linked + 9 diverse publications)
  • Citation types: CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, CITES_AS_METADATA, USES_DATA_FROM
  • Citation context: Textual excerpts showing how works cite each other
  • Citation intent: Purpose and reasoning for citations
  • Page numbers: Specific location of citations in citing work
  • Citation density: 1.42 citations per publication (27 citations / 19 linked publications)

Citation network:

Semantic Web Publications:
  Knowledge Graphs (2021) ──cites──> Wikidata (2018)
                           └─self-cites (section reference)
                           [Most cited: 8 citations total]

  LOKG (2024) ──cites──> Knowledge Graphs (2021)
              ──cites──> Wikidata (2018)
              └─cites──> Spatial Link Prediction (2023)
              [Third most cited: 5 citations total]

  ISWC 2024 Paper ──cites──> Knowledge Graphs (2021)
                  └─extends──> ISWC 2023 Paper

Heritage-Linked Publications:
  Brazilian LOKG Subset (2024) ──extends──> LOKG (2024)
  
  Dutch GLAM Consortium (2023) ──cites──> Knowledge Graphs (2021)
                                └─cites──> Wikidata (2018)
  
  Rembrandt Analysis (2024) ──uses_data──> Wikidata (2018)
  
  NHA Digital Transformation (2023) ──discusses──> LOKG (2024)
  
  Collection Management Systems (2024) ──cites──> Wikidata (2018)
                                       └─cites──> Knowledge Graphs (2021)

Diverse Publications (Books, Reports, Chapters, Preprints):
  Linked Data for Museums (Book) ──cites──> Knowledge Graphs (2021)
                                  └─cites──> Wikidata (2018)
  
  Digital Preservation Handbook ──cites──> LOKG (2024)
  
  KB 3D Digitization Report ──discusses──> LOKG (2024)
  
  Europeana QA Framework ──cites──> Knowledge Graphs (2021)
                          └─cites──> LOKG (2024)
  
  Crowdsourcing Metadata Chapter ──cites──> Wikidata (2018)
  
  Archival Appraisal Chapter ──discusses──> Knowledge Graphs (2021)
  
  arXiv GNN Provenance ──cites──> Knowledge Graphs (2021)
                        └─uses_data──> Wikidata (2018)
  
  OSF LLM Cataloging ──discusses──> Knowledge Graphs (2021)
  
  bioRxiv Ancient DNA (unlinked - genomics focus, not heritage knowledge graphs)

Most Cited Publications:

  1. Knowledge Graphs (2021) - 8 citations
  2. Wikidata (2018) - 6 citations
  3. LOKG (2024) - 5 citations

3. heritage_linked_publications.yaml (206 lines)

Publications with authors affiliated at heritage institutions

Demonstrates heritage-bibliographic integration patterns:

| Title | Type | Authors | Heritage Institution | Year |
|---|---|---|---|---|
| Digital Analysis of Rembrandt's Brushwork | Journal | Rijksmuseum researcher + UvA | Rijksmuseum | 2024 |
| Democratizing Access: NHA Digital Transformation | Journal | 2 Noord-Hollands Archief archivists | Noord-Hollands Archief | 2023 |
| Brazilian Cultural Heritage in the LOKG | Data Paper | USP researcher + BNB librarian | Biblioteca Nacional do Brasil | 2024 |
| The Dutch GLAM Consortium | Conference | KB director + Rijksmuseum curator + NA archivist | KB, Rijksmuseum, Nationaal Archief | 2023 |
| Comparative Analysis of Collection Management Systems | Journal | Paris-Sorbonne + KB librarian | Koninklijke Bibliotheek | 2024 |

Integration patterns:

  • Pattern 1: Researcher at heritage institution as primary author
  • Pattern 2: Multiple staff from same heritage institution as co-authors
  • Pattern 3: Heritage institution staff collaborating with university researcher
  • Pattern 4: Multi-institutional consortium (3+ heritage institutions)
  • Pattern 5: International collaboration (foreign researcher + local heritage institution)

Schema Reference

Publication Class

Required fields:

publication_id: https://w3id.org/heritage/publication/[unique-id]
title: "Publication Title"  # NOT publication_title!
publication_type: JOURNAL_ARTICLE  # Enum: JOURNAL_ARTICLE, CONFERENCE_PAPER, BOOK, etc.

Key fields:

authors:  # List of Person objects
  - person_id: https://orcid.org/0000-0002-XXXX-XXXX  # ORCID preferred
    person_name: "Author Name"
    orcid: "0000-0002-XXXX-XXXX"  # Separate field from person_id
    affiliation:  # SINGULAR Organization object (NOT affiliations array!)
      organization_name: "University Name"
      organization_type: "University"

published_in: https://w3id.org/heritage/journal/[journal-id]  # String ID reference, NOT nested object!

volume: "12"
issue: "3"
page_range: "1-94"  # NOT 'pages'!

doi: "10.1234/example.doi"  # Separate field (NOT in identifiers array)
url: "https://..."           # Separate field

abstract: "Full abstract text..."

provenance:  # NO 'notes' field! Use 'description' in parent object instead
  data_source: CONVERSATION_NLP
  data_tier: TIER_2_VERIFIED
  extraction_date: "2025-11-09T21:00:00Z"

Citation Class

Required fields:

citation_id: https://w3id.org/heritage/citation/[unique-id]
citing_work: https://w3id.org/heritage/publication/[citing-pub-id]  # Required
cited_work: https://w3id.org/heritage/publication/[cited-pub-id]    # Required
citation_type: CITES_AS_AUTHORITY  # Required enum

Optional enrichment fields:

citation_intent: "Purpose/reasoning for this citation..."
citation_context: "Textual excerpt showing the citation..."
page_number: "23"  # Page where citation appears
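
Before running the full LinkML validation, a quick check for the four required Citation fields can catch incomplete records early. This is a minimal sketch, not part of the project tooling; the function name and the example record are hypothetical.

```python
# Required Citation fields as listed above.
REQUIRED_CITATION_FIELDS = ("citation_id", "citing_work", "cited_work", "citation_type")


def missing_citation_fields(record: dict) -> list[str]:
    """Return the required Citation fields that are absent or empty."""
    return [name for name in REQUIRED_CITATION_FIELDS if not record.get(name)]


# Hypothetical record for illustration; the IDs are placeholders.
example = {
    "citation_id": "https://w3id.org/heritage/citation/example-001",
    "citing_work": "https://w3id.org/heritage/publication/example-citing",
    "citation_type": "CITES_AS_AUTHORITY",
    "page_number": "23",
}
print(missing_citation_fields(example))  # ['cited_work']
```

This only checks presence. It does not confirm that citation_type is a valid enum value, which linkml-validate does.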

Citation Types (CiTO Ontology)

| Type | Description | Example Use |
|---|---|---|
| CITES | Generic citation | Standard reference |
| CITES_AS_AUTHORITY | Cites as authoritative source | Citing foundational theory |
| CITES_AS_EVIDENCE | Cites as evidence | Supporting empirical claims |
| CITES_AS_METADATA | Cites for metadata/provenance | Dataset documentation |
| DISCUSSES | Discusses the cited work | Critical analysis |
| EXTENDS | Extends the cited work | Building on prior work |
| USES_DATA_FROM | Uses data from the cited work | Reusing a published dataset |
| SUPPORTS | Provides support for claims | Corroborating findings |
| REFUTES | Refutes or disputes | Contradicting claims |
| CRITIQUES | Critiques cited work | Identifying limitations |
| AGREES_WITH | Agrees with cited work | Confirming findings |
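
For RDF-oriented work it can help to see how these enum names might line up with CiTO property IRIs. The mapping below is an assumption made for illustration: the CiTO properties themselves exist in the ontology, but whether bibliographic.yaml maps each enum value to exactly these IRIs should be checked against the schema.

```python
CITO_NAMESPACE = "http://purl.org/spar/cito/"

# Assumed correspondence between this README's enum names and CiTO properties.
# Verify against schemas/bibliographic.yaml before relying on it.
ENUM_TO_CITO = {
    "CITES": "cites",
    "CITES_AS_AUTHORITY": "citesAsAuthority",
    "CITES_AS_EVIDENCE": "citesAsEvidence",
    "CITES_AS_METADATA": "citesAsMetadataDocument",
    "DISCUSSES": "discusses",
    "EXTENDS": "extends",
    "USES_DATA_FROM": "usesDataFrom",
    "SUPPORTS": "supports",
    "REFUTES": "refutes",
    "CRITIQUES": "critiques",
    "AGREES_WITH": "agreesWith",
}


def cito_iri(citation_type: str) -> str:
    """Return the assumed CiTO property IRI for an enum value (KeyError if unknown)."""
    return CITO_NAMESPACE + ENUM_TO_CITO[citation_type]


print(cito_iri("EXTENDS"))
```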

Schema Quirks and Common Errors

Common Mistakes

1. Wrong field names:

# WRONG
publication_title: "Title"  # Field doesn't exist!
pages: "1-94"              # Should be 'page_range'
affiliations: [...]        # Should be singular 'affiliation'

# CORRECT
title: "Title"
page_range: "1-94"
affiliation: {...}

2. Wrong published_in structure:

# WRONG - Nested object
published_in:
  journal_id: https://...
  journal_title: "Journal Name"
  volume: "12"

# CORRECT - String ID reference
published_in: https://w3id.org/heritage/journal/semantic-web
volume: "12"  # Volume at Publication level, not nested

3. Wrong identifier handling:

# WRONG - DOI in identifiers array
identifiers:
  - identifier_scheme: DOI
    identifier_value: "10.1234/..."

# CORRECT - DOI as separate field
doi: "10.1234/..."

4. Provenance notes:

# WRONG - Provenance has no 'notes' field
provenance:
  data_source: CONVERSATION_NLP
  notes: "Some observation"  # This will fail validation!

# CORRECT - Use 'description' at Publication level
description: "Notes and remarks about this publication"
provenance:
  data_source: CONVERSATION_NLP

Schema Validation Checklist

Before committing new publications:

  • title field (NOT publication_title)
  • published_in is a string ID (NOT nested object)
  • affiliation is singular object (NOT affiliations array)
  • page_range (NOT pages)
  • doi and url are separate fields (NOT in identifiers)
  • provenance has no notes field
  • All publication_id, person_id, journal_id use valid URIs
  • publication_type is valid enum value
  • Authors have either ORCID or local ID
  • File validates with: linkml-validate -s schemas/bibliographic.yaml -C Publication <file.yaml>
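
Several checklist items can be checked mechanically before invoking linkml-validate. The sketch below works on an already-parsed publication record (a plain dict). The function name is hypothetical, and it covers only the field-name and structure mistakes listed above, not URI or enum validity.

```python
# Wrong field name -> correct field name, taken from the checklist above.
RENAMED_FIELDS = {"publication_title": "title", "pages": "page_range"}


def precheck_publication(pub: dict) -> list[str]:
    """Return human-readable problems for the common mistakes in the checklist."""
    problems = []
    for wrong, right in RENAMED_FIELDS.items():
        if wrong in pub:
            problems.append(f"'{wrong}' should be '{right}'")
    if isinstance(pub.get("published_in"), dict):
        problems.append("'published_in' must be a string ID, not a nested object")
    for ident in pub.get("identifiers") or []:
        if ident.get("identifier_scheme") == "DOI":
            problems.append("DOI belongs in the 'doi' field, not in 'identifiers'")
    if "notes" in (pub.get("provenance") or {}):
        problems.append("'provenance' has no 'notes' field; use 'description' instead")
    for author in pub.get("authors") or []:
        if "affiliations" in author:
            problems.append(f"author {author.get('person_name')!r}: use singular 'affiliation'")
    return problems


# Hypothetical record containing four of the mistakes.
bad_record = {
    "publication_title": "Title",
    "pages": "1-94",
    "published_in": {"journal_id": "https://w3id.org/heritage/journal/semantic-web"},
    "provenance": {"data_source": "CONVERSATION_NLP", "notes": "Some observation"},
}
for problem in precheck_publication(bad_record):
    print(problem)
```

An empty result does not mean the record is valid. It only means none of these known pitfalls were found.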

Validation Commands

Validate Publications

cd /Users/kempersc/apps/glam  # repository root; adjust to your local checkout
linkml-validate -s schemas/bibliographic.yaml -C Publication \
  data/instances/publications/semantic_web_papers.yaml

Validate Citations

linkml-validate -s schemas/bibliographic.yaml -C Citation \
  data/instances/publications/citation_relationships.yaml

Validate Journals

linkml-validate -s schemas/bibliographic.yaml -C Journal \
  data/instances/journals/semantic_web_journals.yaml

Validate Conferences

linkml-validate -s schemas/bibliographic.yaml -C Conference \
  data/instances/conferences/semantic_web_conferences.yaml
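
The four commands above can also be driven from one script. This is a sketch under two assumptions: it runs from the repository root with linkml-validate on the PATH, and the tool exits with a non-zero status when validation fails. The helper names are hypothetical.

```python
import subprocess

SCHEMA = "schemas/bibliographic.yaml"

# (target class, instance file) pairs taken from the commands above.
TARGETS = [
    ("Publication", "data/instances/publications/semantic_web_papers.yaml"),
    ("Citation", "data/instances/publications/citation_relationships.yaml"),
    ("Journal", "data/instances/journals/semantic_web_journals.yaml"),
    ("Conference", "data/instances/conferences/semantic_web_conferences.yaml"),
]


def validation_commands() -> list[list[str]]:
    """Build one linkml-validate argument list per target."""
    return [["linkml-validate", "-s", SCHEMA, "-C", cls, path] for cls, path in TARGETS]


def run_all() -> bool:
    """Run every command; return True only if all exit with status 0."""
    ok = True
    for argv in validation_commands():
        result = subprocess.run(argv)
        ok = ok and result.returncode == 0
    return ok


# Print the commands without running them; call run_all() to execute.
for argv in validation_commands():
    print(" ".join(argv))
```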

Adding New Publications

Step 1: Gather Metadata

Required information:

  • Title, authors, publication date
  • Publication type (journal article, conference paper, etc.)
  • Journal or conference (must reference existing entity in journals/ or conferences/)
  • DOI (if available)

Recommended information:

  • Author ORCID identifiers
  • Author institutional affiliations
  • Abstract text
  • Volume, issue, page numbers
  • URL to full text
  • Open access status

Step 2: Create Publication Record

Follow the schema patterns in semantic_web_papers.yaml:

- publication_id: https://w3id.org/heritage/publication/[unique-id]
  title: "Your Publication Title"
  publication_type: JOURNAL_ARTICLE  # or CONFERENCE_PAPER, BOOK, etc.
  publication_date: "2024-11-09"
  
  authors:
    - person_id: https://orcid.org/0000-0002-XXXX-XXXX
      person_name: "First Author"
      orcid: "0000-0002-XXXX-XXXX"
      affiliation:
        organization_name: "University Name"
        organization_type: "University"
  
  published_in: https://w3id.org/heritage/journal/[journal-id]
  
  volume: "15"
  issue: "2"
  page_range: "123-145"
  
  doi: "10.1234/example.doi"
  url: "https://..."
  
  abstract: "Full abstract text..."
  
  provenance:
    data_source: MANUAL_CURATION  # or CONVERSATION_NLP, WEB_SCRAPING, etc.
    data_tier: TIER_2_VERIFIED
    extraction_date: "2024-11-09T12:00:00Z"
    extraction_method: "Manual entry from published source"

Step 3: Create Citation Relationships (Optional)

If the new publication cites existing publications (or vice versa):

- citation_id: https://w3id.org/heritage/citation/[unique-id]
  citing_work: https://w3id.org/heritage/publication/[new-pub-id]
  cited_work: https://w3id.org/heritage/publication/[existing-pub-id]
  citation_type: CITES_AS_AUTHORITY  # Choose appropriate type
  citation_intent: "Why this citation exists..."
  citation_context: "Textual excerpt around the citation..."
  page_number: "15"
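
Because citing_work and cited_work are plain ID references, nothing in the record itself guarantees that both ends exist. A minimal sketch of a referential check, assuming the citation and publication records have already been loaded into Python lists of dicts (the function name and sample IDs are hypothetical):

```python
def dangling_citations(citations: list[dict], publication_ids) -> list[str]:
    """Return citation_ids whose citing_work or cited_work is not a known publication."""
    known = set(publication_ids)
    return [
        c["citation_id"]
        for c in citations
        if c["citing_work"] not in known or c["cited_work"] not in known
    ]


# Placeholder IDs for illustration only.
base = "https://w3id.org/heritage/"
publications = [base + "publication/pub-a", base + "publication/pub-b"]
sample = [
    {"citation_id": base + "citation/c1",
     "citing_work": base + "publication/pub-a",
     "cited_work": base + "publication/pub-b"},
    {"citation_id": base + "citation/c2",
     "citing_work": base + "publication/pub-a",
     "cited_work": base + "publication/pub-missing"},
]
print(dangling_citations(sample, publications))
```

Here only the second citation is reported, since its cited_work points at an ID that is not in the publication list.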

Step 4: Validate

Run validation before committing:

linkml-validate -s schemas/bibliographic.yaml -C Publication \
  data/instances/publications/your_file.yaml

Fix any validation errors (see "Schema Quirks" section above).

Step 5: Update This README

Add your publication to the table in the "Files" section.

Integration with Heritage Custodians

Publications link to heritage institutions through 5 integration patterns, all demonstrated in heritage_linked_publications.yaml:

Pattern 1: Heritage Institution Researcher as Primary Author

Use case: Museum curator or archivist publishes research based on institutional collections

Example: Rijksmuseum researcher analyzing Rembrandt paintings

authors:
  - person_id: researcher-rijks-001
    person_name: "Dr. Maria van der Berg"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum  # ← Heritage institution!
      organization_name: "Rijksmuseum"
      organization_type: "Museum"

Real example: rijksmuseum-rembrandt-2024 (Rijksmuseum:125-126)


Pattern 2: Multiple Staff from Same Heritage Institution

Use case: Collaborative research by colleagues at the same archive or museum

Example: Two archivists from Noord-Hollands Archief co-authoring digital transformation paper

authors:
  - person_id: archivist-nha-001
    person_name: "Dr. Saskia de Jong"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
  - person_id: specialist-nha-001
    person_name: "Peter Bakker"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief  # ← Same institution
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"

Real example: noord-hollands-archief-digital-2023 (Noord-Hollands Archief:49-60)


Pattern 3: Heritage + Academic Collaboration

Use case: University researcher collaborates with heritage institution expert

Example: USP researcher + Biblioteca Nacional do Brasil librarian creating Linked Open Data resource

authors:
  - person_id: https://orcid.org/0000-0002-8888-9999
    person_name: "Dr. Carlos Silva"
    orcid: "0000-0002-8888-9999"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/university-of-sao-paulo
      organization_name: "University of São Paulo"
      organization_type: "University"
  - person_id: librarian-bnb-001
    person_name: "Ana Santos"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil  # ← Heritage institution
      organization_name: "Biblioteca Nacional do Brasil"
      organization_type: "Library"

Real example: lokg-brazilian-subset-2024 (Biblioteca Nacional do Brasil:92-103)


Pattern 4: Multi-Institutional Consortium (3+ Heritage Institutions)

Use case: Regional or national collaboration between multiple heritage institutions

Example: Dutch GLAM Consortium with KB + Rijksmuseum + Nationaal Archief

authors:
  - person_id: director-kb-001
    person_name: "Dr. Liesbeth van der Pol"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
  - person_id: curator-rijksmuseum-002
    person_name: "Dr. Thomas de Vries"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum  # ← Second institution
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
  - person_id: archivist-na-001
    person_name: "Dr. Emma Jansen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/nationaal-archief  # ← Third institution
      organization_name: "Nationaal Archief"
      organization_type: "Archive"

Real example: dutch-glam-consortium-2023 (KB:132-137, Rijksmuseum:138-142, Nationaal Archief:144-148)


Pattern 5: International Researcher + Local Heritage Institution

Use case: Foreign scholar collaborates with local museum/archive/library

Example: French scholar + Dutch KB librarian studying European collection management systems

authors:
  - person_id: https://orcid.org/0000-0003-7777-8888
    person_name: "Dr. Sophie Laurent"
    orcid: "0000-0003-7777-8888"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/universite-paris-sorbonne
      organization_name: "Université Paris-Sorbonne"
      organization_type: "University"
  - person_id: librarian-kb-002
    person_name: "Martijn Koster"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library  # ← Dutch heritage institution
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"

Real example: collection-management-systems-2024 (Koninklijke Bibliotheek:181-184)
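
In all five patterns, heritage institutions are recognisable by an organization_id under https://w3id.org/heritage/custodian/, while universities in the examples use the /organization/ path. Assuming that convention holds across the dataset, the custodians behind a publication can be collected as sketched below. The helper names are hypothetical, and the 3+ threshold mirrors Pattern 4.

```python
CUSTODIAN_PREFIX = "https://w3id.org/heritage/custodian/"


def heritage_custodians(authors: list[dict]) -> list[str]:
    """Return the distinct custodian IDs found in the authors' affiliations, sorted."""
    ids = set()
    for author in authors:
        org_id = (author.get("affiliation") or {}).get("organization_id") or ""
        if org_id.startswith(CUSTODIAN_PREFIX):
            ids.add(org_id)
    return sorted(ids)


def is_consortium(authors: list[dict]) -> bool:
    """Pattern 4 heuristic: three or more distinct heritage institutions."""
    return len(heritage_custodians(authors)) >= 3


# Authors from the Pattern 5 example above.
pattern5_authors = [
    {"person_name": "Dr. Sophie Laurent",
     "affiliation": {"organization_id": "https://w3id.org/heritage/organization/universite-paris-sorbonne"}},
    {"person_name": "Martijn Koster",
     "affiliation": {"organization_id": "https://w3id.org/heritage/custodian/nl/kb-national-library"}},
]
print(heritage_custodians(pattern5_authors))
print(is_consortium(pattern5_authors))
```

Authors without an affiliation or organization_id are ignored rather than treated as errors, since the Pattern 6 and 7 snippets show that not every author record carries those fields.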


4. diverse_heritage_publications.yaml (10 publications)

Diverse publication types: books, book chapters, technical reports, preprints

Publications included:

| Title | Type | Authors | Year | Key Features |
|---|---|---|---|---|
| Linked Data for Museums | Book | Getty Trust researcher | 2020 | Practical GLAM linked data guide |
| Digital Preservation Handbook | Book | DPC staff (3 co-authors) | 2021 | Multi-author handbook from heritage org |
| Crowdsourcing Metadata for Libraries | Book Chapter | Library scholar | 2019 | Chapter within larger volume |
| Archival Appraisal in the Digital Age | Book Chapter | Archival scholar | 2022 | Theory chapter in archival studies |
| 3D Digitization at Koninklijke Bibliotheek | Technical Report | KB technical staff (2 authors) | 2023 | Grey literature from institution |
| Europeana Data Quality Framework | Technical Report | Europeana Foundation (4 authors) | 2022 | Organizational documentation |
| Graph Neural Networks for Provenance | Preprint (arXiv) | CS researcher | 2024 | Machine learning for heritage |
| LLMs for Catalog Enrichment | Preprint (OSF/SocArXiv) | LIS researcher | 2024 | AI applications in libraries |
| Ancient DNA from Museum Collections | Preprint (bioRxiv) | Museum geneticist + lab | 2024 | Scientific heritage use case |

Preprint Server Patterns:

  • arXiv.org: Computer science and machine learning papers (heritage AI applications)
    • Format: https://arxiv.org/abs/YYMM.NNNNN (e.g., 2411.12345)
    • DOI: 10.48550/arXiv.YYMM.NNNNN
  • OSF/SocArXiv: Library and information science preprints
    • Format: https://osf.io/preprints/socarxiv/[alphanumeric] (e.g., abc12)
    • DOI: 10.31235/osf.io/[alphanumeric]
  • bioRxiv: Biology and genetics papers (museum genomics, conservation)
    • Format: https://www.biorxiv.org/content/10.1101/YYYY.MM.DD.NNNNNN
    • DOI: 10.1101/YYYY.MM.DD.NNNNNN (date-based)
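
Since each server's URL and DOI follow from one identifier, both can be derived rather than typed by hand. The helpers below are a sketch built from the formats listed above, and the function names are hypothetical. The arXiv pattern also accepts four digits after the dot, because arXiv identifiers issued before 2015 are shorter.

```python
import re

# YYMM.NNNNN; identifiers issued before 2015 have only four digits after the dot.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}$")


def arxiv_links(arxiv_id: str) -> dict:
    """Derive the abstract URL and DOI for a new-style arXiv identifier."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a YYMM.NNNNN arXiv identifier: {arxiv_id!r}")
    return {
        "url": f"https://arxiv.org/abs/{arxiv_id}",
        "doi": f"10.48550/arXiv.{arxiv_id}",
    }


def socarxiv_links(osf_id: str) -> dict:
    """Derive the preprint URL and DOI for an OSF/SocArXiv identifier."""
    return {
        "url": f"https://osf.io/preprints/socarxiv/{osf_id}",
        "doi": f"10.31235/osf.io/{osf_id}",
    }


def biorxiv_url(doi: str) -> str:
    """bioRxiv content URLs embed the full DOI (10.1101/YYYY.MM.DD.NNNNNN)."""
    return f"https://www.biorxiv.org/content/{doi}"


print(arxiv_links("2411.12345"))
print(socarxiv_links("abc12"))
```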

Schema patterns demonstrated:

  • Book metadata (publication_type: BOOK)
  • Book chapter with is_part_of relationship to parent volume
  • Technical reports as grey literature from heritage organizations
  • Preprint metadata with server identifiers (arXiv ID, OSF ID, bioRxiv ID)
  • Pre-publication date tracking vs. official publication date

Integration Patterns from Diverse Publications

Pattern 6: Books by Heritage Institution Staff

Example: Digital Preservation Handbook authored by Digital Preservation Coalition staff

authors:
  - person_name: "Sarah Jones"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/digital-preservation-coalition
      organization_name: "Digital Preservation Coalition"
      organization_type: "Heritage consortium"
publication_type: BOOK

Pattern 7: Technical Reports as Organizational Documentation

Example: KB 3D Digitization Report documenting institutional digitization workflows

publication_type: TECHNICAL_REPORT
authors:
  - person_name: "Erik Vermeulen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
description: "Grey literature documenting internal digitization practices"

Pattern 8: Preprints Before Formal Publication

Example: Machine learning research using heritage data published on arXiv

publication_type: PREPRINT
preprint_server: arXiv
arxiv_id: "2411.12345"
doi: "10.48550/arXiv.2411.12345"
description: "Early research results, may be updated before journal submission"

Pattern 9: Book Chapters in Edited Volumes

Example: Crowdsourcing chapter within larger library science anthology

publication_type: BOOK_CHAPTER
is_part_of: "Digital Innovations in Libraries"
editors:
  - "Jane Smith"
  - "Robert Brown"
page_range: "145-168"

Additional Integration Patterns (Future)

Pattern 10: Publications About Specific Collections (not yet implemented)

When a paper describes a heritage collection:

# Future schema extension
about_collections:
  - collection_id: https://w3id.org/heritage/collection/rijksmuseum-paintings
    collection_name: "Rijksmuseum Paintings Collection"
    collection_institution: https://w3id.org/heritage/custodian/nl/rijksmuseum

Pattern 11: Data Papers Describing Heritage Datasets (partially implemented)

When publications document heritage datasets:

publication_type: DATASET  # Already used in lokg-brazilian-subset-2024
# Future: Add describes_dataset field
describes_dataset:
  - dataset_id: https://w3id.org/heritage/dataset/brazilian-lokg
    dataset_name: "Brazilian Heritage Institutions Linked Open Data"
    related_institutions:
      - https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil

Citation Analysis Queries

Find Most Cited Publications

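from collections import Counter

import yaml  # PyYAML

# Load the citation records (the file is a YAML list of Citation objects)
with open('citation_relationships.yaml', encoding='utf-8') as f:
    citations = yaml.safe_load(f)

# Count how often each publication appears as the cited work
cited_counts = Counter(c['cited_work'] for c in citations)

print("Most cited publications:")
for pub_id, count in cited_counts.most_common():
    print(f"  {pub_id}: {count} citations")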

Build Citation Network

import networkx as nx

G = nx.DiGraph()
for citation in citations:
    G.add_edge(citation['citing_work'], citation['cited_work'],
               citation_type=citation['citation_type'])

# Find influential papers (high in-degree)
influential = sorted(G.in_degree(), key=lambda x: x[1], reverse=True)

Analyze Citation Types

citation_types = Counter(c['citation_type'] for c in citations)
print("Citation type distribution:")
for ctype, count in citation_types.items():
    print(f"  {ctype}: {count}")

Related Resources

  • Schema: /schemas/bibliographic.yaml - Full LinkML schema for bibliographic entities
  • Ontologies:
    • FaBiO (FRBR-aligned Bibliographic Ontology) - Publication modeling
    • CiTO (Citation Typing Ontology) - Citation relationships
    • BIBO (Bibliographic Ontology) - Bibliographic resources
    • FRBR (Functional Requirements for Bibliographic Records) - Work/expression/manifestation
  • Test Fixtures: /tests/fixtures/publications/ - Validation examples
  • Schema Documentation: /docs/BIBLIOGRAPHIC_SCHEMA.md (if exists)

Future Enhancements

Short-term (Next Session)

  • COMPLETED: Add publications linked to heritage institutions (5 added)
  • COMPLETED: Create citation relationships for heritage-linked pubs (8 citations added)
  • COMPLETED: Document 5 integration patterns
  • COMPLETED: Add more diverse publication types (books, book chapters, technical reports) - 10 added
  • COMPLETED: Add preprints (arXiv, bioRxiv, OSF/SocArXiv) - 4 added
  • COMPLETED: Add more cultural heritage domain papers (digital preservation, archival science) - included in diverse set
  • COMPLETED: Create 12 additional citation relationships linking diverse publications (27 total citations)
  • COMPLETED: Document preprint server patterns (arXiv, SocArXiv, bioRxiv)
  • COMPLETED: Document 4 additional integration patterns (6-9: books, technical reports, preprints, chapters)
  • Create author disambiguation examples (same person with multiple IDs/ORCIDs)
  • Add thesis/dissertation examples
  • Add working papers (pre-publication research from institutions)

Medium-term

  • Author disambiguation (same person, multiple IDs)
  • Keyword/subject term extraction
  • Funding information (grants, sponsors)
  • Publication metrics (citation counts from Crossref, Semantic Scholar)
  • Full-text links (PDFs, preprints)

Long-term

  • RDF export (Turtle, JSON-LD)
  • SPARQL endpoint for citation queries
  • Bibliometric analysis dashboard
  • Integration with Wikidata (author Q-numbers)
  • Citation recommendation system
  • Co-authorship network analysis

Questions or Issues?

If you encounter validation errors or schema confusion:

  1. Check the "Schema Quirks" section above
  2. Review validated examples in semantic_web_papers.yaml
  3. Consult test fixtures in /tests/fixtures/publications/
  4. Read schema documentation in /schemas/bibliographic.yaml (inline comments)
  5. File an issue or consult AI agent instructions in /AGENTS.md

Last Updated: 2025-11-09
Schema Version: bibliographic.yaml v0.2.0
Dataset Version: 0.3.0 (20 publications, 27 citations, 9 integration patterns demonstrated)