2025-11-19 23:25:22 +01:00


# Publications Dataset
This directory contains bibliographic metadata for academic publications in LinkML format, demonstrating the project's bibliographic schema (`schemas/bibliographic.yaml`).
## Overview
**Purpose**: Store structured metadata about academic publications, including journal articles, conference papers, books, and their citation relationships.

**Schema**: `/schemas/bibliographic.yaml` (based on the FaBiO, CiTO, BIBO, and FRBR ontologies)

**Current Dataset Size**:

- **20 publications** (6 journal articles, 3 conference papers, 1 data paper, 2 books, 2 book chapters, 2 technical reports, 4 preprints)
- **27 citation relationships** (cross-references between publications)
- **60+ unique authors** with institutional affiliations (universities, heritage institutions)
- **7 journals** (referenced from `/data/instances/journals/`)
- **5 conferences** (referenced from `/data/instances/conferences/`)
- **5 heritage institutions** linked as author affiliations

**Publication Type Distribution**:

| Publication Type | Count | Examples |
|------------------|-------|----------|
| Journal Articles | 6 | 3 semantic web (Knowledge Graphs, Wikidata, LOKG) + 3 heritage-linked (Rembrandt analysis, NHA digital transformation, Collection Management Systems) |
| Conference Papers | 3 | ISWC 2024 (Best Paper), ISWC 2023 (Best Paper), Dutch GLAM Consortium (2023) |
| Books | 2 | Linked Data for Museums, Digital Preservation Handbook |
| Book Chapters | 2 | Crowdsourcing Metadata, Archival Appraisal |
| Technical Reports | 2 | KB 3D Digitization, Europeana QA Framework |
| Preprints | 4 | arXiv (GNN provenance), OSF/SocArXiv (LLM cataloging), bioRxiv (Ancient DNA), arXiv (2nd paper TBD) |
| Data Papers | 1 | Brazilian LOKG Subset (TGDK journal) |
| **TOTAL** | **20** | Diverse representation of scholarly output types |

**Citation Network Statistics**:

- **Total citations**: 27 relationships
- **Publications with citations**: 19 (the bioRxiv ancient-DNA preprint is unlinked, as it falls outside the heritage knowledge-graph scope)
- **Citation density**: 1.42 citations per linked publication (27 / 19)
- **Most cited works**:
  1. Knowledge Graphs (2021) - 8 citations
  2. Wikidata (2018) - 6 citations
  3. LOKG (2024) - 5 citations
- **Citation types used**: 6 distinct CiTO types (CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, USES_DATA_FROM, CITES_AS_METADATA)
## Files
### 1. `semantic_web_papers.yaml` (379 lines) ✅
**Notable semantic web publications demonstrating schema patterns**
**Publications included**:
| Title | Type | Journal/Conf | Year | Authors | DOI |
|-------|------|-------------|------|---------|-----|
| Knowledge Graphs | Journal Article | Semantic Web Journal | 2021 | 18 authors | 10.3233/SW-222793 |
| Wikidata: A Free Collaborative Knowledgebase | Journal Article | Journal of Web Semantics | 2018 | 2 authors | 10.1016/j.websem.2018.08.002 |
| The LOKG | Journal Article | TGDK | 2024 | 4 authors (synthetic) | 10.4230/TGDK.2.1.3 |
| Relationships are Complicated! | Conference Paper | ISWC 2024 (Best Paper) | 2024 | 3 authors | - |
| Spatial Link Prediction | Conference Paper | ISWC 2023 (Best Paper) | 2023 | 4 authors | 10.1007/978-3-031-47240-4_9 |

**Schema patterns demonstrated**:
- Multi-author publications (up to 18 authors)
- ORCID identifiers for authors
- Institutional affiliations (universities, research institutes, corporations)
- DOI identifiers
- Journal article metadata (volume, issue, page range)
- Conference paper metadata (proceedings, best paper awards)
- Open access status tracking
- Abstract text
### 2. `citation_relationships.yaml` (174 lines) ✅
**Citation relationships between publications using CiTO (Citation Typing Ontology)**
**Citation patterns included**:
- **27 citation relationships** linking 19 publications (5 semantic web + 5 heritage-linked + 9 diverse publications)
- **Citation types**: CITES_AS_AUTHORITY, CITES_AS_EVIDENCE, DISCUSSES, EXTENDS, CITES_AS_METADATA, USES_DATA_FROM
- **Citation context**: Textual excerpts showing how works cite each other
- **Citation intent**: Purpose and reasoning for citations
- **Page numbers**: Specific location of citations in citing work
- **Citation density**: 1.42 citations per linked publication (27 citations / 19 linked publications)

**Citation network**:
```
Semantic Web Publications:
  Knowledge Graphs (2021) ──cites──> Wikidata (2018)
                          └─self-cites (section reference)
                          [Most cited: 8 citations total]
  LOKG (2024) ──cites──> Knowledge Graphs (2021)
              ──cites──> Wikidata (2018)
              └─cites──> Spatial Link Prediction (2023)
              [Third most cited: 5 citations total]
  ISWC 2024 Paper ──cites──> Knowledge Graphs (2021)
                  └─extends──> ISWC 2023 Paper

Heritage-Linked Publications:
  Brazilian LOKG Subset (2024) ──extends──> LOKG (2024)
  Dutch GLAM Consortium (2023) ──cites──> Knowledge Graphs (2021)
                               └─cites──> Wikidata (2018)
  Rembrandt Analysis (2024) ──uses_data──> Wikidata (2018)
  NHA Digital Transformation (2023) ──discusses──> LOKG (2024)
  Collection Management Systems (2024) ──cites──> Wikidata (2018)
                                       └─cites──> Knowledge Graphs (2021)

Diverse Publications (Books, Reports, Chapters, Preprints):
  Linked Data for Museums (Book) ──cites──> Knowledge Graphs (2021)
                                 └─cites──> Wikidata (2018)
  Digital Preservation Handbook ──cites──> LOKG (2024)
  KB 3D Digitization Report ──discusses──> LOKG (2024)
  Europeana QA Framework ──cites──> Knowledge Graphs (2021)
                         └─cites──> LOKG (2024)
  Crowdsourcing Metadata Chapter ──cites──> Wikidata (2018)
  Archival Appraisal Chapter ──discusses──> Knowledge Graphs (2021)
  arXiv GNN Provenance ──cites──> Knowledge Graphs (2021)
                       └─uses_data──> Wikidata (2018)
  OSF LLM Cataloging ──discusses──> Knowledge Graphs (2021)
  bioRxiv Ancient DNA (unlinked - genomics focus, not heritage knowledge graphs)
```
**Most Cited Publications**:
1. Knowledge Graphs (2021) - 8 citations
2. Wikidata (2018) - 6 citations
3. LOKG (2024) - 5 citations
### 3. `heritage_linked_publications.yaml` (206 lines) ✅
**Publications with authors affiliated with heritage institutions**
**Demonstrates heritage-bibliographic integration patterns**:
| Title | Type | Authors | Heritage Institution | Year |
|-------|------|---------|---------------------|------|
| Digital Analysis of Rembrandt's Brushwork | Journal | Rijksmuseum researcher + UvA | Rijksmuseum | 2024 |
| Democratizing Access: NHA Digital Transformation | Journal | 2 Noord-Hollands Archief archivists | Noord-Hollands Archief | 2023 |
| Brazilian Cultural Heritage in the LOKG | Data Paper | USP researcher + BNB librarian | Biblioteca Nacional do Brasil | 2024 |
| The Dutch GLAM Consortium | Conference | KB director + Rijksmuseum curator + NA archivist | KB, Rijksmuseum, Nationaal Archief | 2023 |
| Comparative Analysis of Collection Management Systems | Journal | Paris-Sorbonne + KB librarian | Koninklijke Bibliotheek | 2024 |

**Integration patterns**:
- **Pattern 1**: Researcher at heritage institution as primary (first) author
- **Pattern 2**: Multiple staff from same heritage institution as co-authors
- **Pattern 3**: Heritage institution staff collaborating with university researcher
- **Pattern 4**: Multi-institutional consortium (3+ heritage institutions)
- **Pattern 5**: International collaboration (foreign researcher + local heritage institution)
## Schema Reference
### Publication Class
**Required fields**:
```yaml
publication_id: https://w3id.org/heritage/publication/[unique-id]
title: "Publication Title" # NOT publication_title!
publication_type: JOURNAL_ARTICLE # Enum: JOURNAL_ARTICLE, CONFERENCE_PAPER, BOOK, etc.
```
**Key fields**:
```yaml
authors: # List of Person objects
  - person_id: https://orcid.org/0000-0002-XXXX-XXXX # ORCID preferred
    person_name: "Author Name"
    orcid: "0000-0002-XXXX-XXXX" # Separate field from person_id
    affiliation: # SINGULAR Organization object (NOT affiliations array!)
      organization_name: "University Name"
      organization_type: "University"
published_in: https://w3id.org/heritage/journal/[journal-id] # String ID reference, NOT nested object!
volume: "12"
issue: "3"
page_range: "1-94" # NOT 'pages'!
doi: "10.1234/example.doi" # Separate field (NOT in identifiers array)
url: "https://..." # Separate field
abstract: "Full abstract text..."
provenance: # NO 'notes' field! Use 'description' in parent object instead
  data_source: CONVERSATION_NLP
  data_tier: TIER_2_VERIFIED
  extraction_date: "2025-11-09T21:00:00Z"
```
### Citation Class
**Required fields**:
```yaml
citation_id: https://w3id.org/heritage/citation/[unique-id]
citing_work: https://w3id.org/heritage/publication/[citing-pub-id] # Required
cited_work: https://w3id.org/heritage/publication/[cited-pub-id] # Required
citation_type: CITES_AS_AUTHORITY # Required enum
```
**Optional enrichment fields**:
```yaml
citation_intent: "Purpose/reasoning for this citation..."
citation_context: "Textual excerpt showing the citation..."
page_number: "23" # Page where citation appears
```
### Citation Types (CiTO Ontology)
| Type | Description | Example Use |
|------|-------------|-------------|
| `CITES` | Generic citation | Standard reference |
| `CITES_AS_AUTHORITY` | Cites as authoritative source | Citing foundational theory |
| `CITES_AS_EVIDENCE` | Cites as evidence | Supporting empirical claims |
| `CITES_AS_METADATA` | Cites for metadata/provenance | Dataset documentation |
| `DISCUSSES` | Discusses the cited work | Critical analysis |
| `EXTENDS` | Extends the cited work | Building on prior work |
| `SUPPORTS` | Provides support for claims | Corroborating findings |
| `REFUTES` | Refutes or disputes | Contradicting claims |
| `CRITIQUES` | Critiques cited work | Identifying limitations |
| `AGREES_WITH` | Agrees with cited work | Confirming findings |
| `USES_DATA_FROM` | Uses data from the cited work | Reusing a published dataset |
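
If the citation data is ever exported to RDF (listed under long-term enhancements), each `citation_type` value needs a CiTO property. The sketch below assumes a one-to-one mapping onto properties in the `http://purl.org/spar/cito/` namespace; the mapping itself, and the choice of `citesAsMetadataDocument` for `CITES_AS_METADATA`, are assumptions rather than something `bibliographic.yaml` is known to declare. `USES_DATA_FROM`, which the dataset already uses, is included.

```python
# Assumed mapping from the schema's citation_type values to CiTO properties.
CITO = "http://purl.org/spar/cito/"

CITATION_TYPE_TO_CITO = {
    "CITES": "cites",
    "CITES_AS_AUTHORITY": "citesAsAuthority",
    "CITES_AS_EVIDENCE": "citesAsEvidence",
    "CITES_AS_METADATA": "citesAsMetadataDocument",  # closest CiTO term (assumption)
    "DISCUSSES": "discusses",
    "EXTENDS": "extends",
    "SUPPORTS": "supports",
    "REFUTES": "refutes",
    "CRITIQUES": "critiques",
    "AGREES_WITH": "agreesWith",
    "USES_DATA_FROM": "usesDataFrom",
}


def cito_iri(citation_type: str) -> str:
    """Return the full CiTO property IRI for one citation_type value."""
    if citation_type not in CITATION_TYPE_TO_CITO:
        raise ValueError(f"Unknown citation_type: {citation_type!r}")
    return CITO + CITATION_TYPE_TO_CITO[citation_type]


print(cito_iri("CITES_AS_AUTHORITY"))  # → http://purl.org/spar/cito/citesAsAuthority
```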
## Schema Quirks and Common Errors
### ❌ Common Mistakes
**1. Wrong field names**:
```yaml
# WRONG
publication_title: "Title" # Field doesn't exist!
pages: "1-94" # Should be 'page_range'
affiliations: [...] # Should be singular 'affiliation'
# CORRECT
title: "Title"
page_range: "1-94"
affiliation: {...}
```
**2. Wrong `published_in` structure**:
```yaml
# WRONG - Nested object
published_in:
  journal_id: https://...
  journal_title: "Journal Name"
  volume: "12"
# CORRECT - String ID reference
published_in: https://w3id.org/heritage/journal/semantic-web
volume: "12" # Volume at Publication level, not nested
```
**3. Wrong identifier handling**:
```yaml
# WRONG - DOI in identifiers array
identifiers:
  - identifier_scheme: DOI
    identifier_value: "10.1234/..."
# CORRECT - DOI as separate field
doi: "10.1234/..."
```
**4. Provenance notes**:
```yaml
# WRONG - Provenance has no 'notes' field
provenance:
  data_source: CONVERSATION_NLP
  notes: "Some observation" # This will fail validation!
# CORRECT - Use 'description' at Publication level
description: "Notes and remarks about this publication"
provenance:
  data_source: CONVERSATION_NLP
```
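
The four mistakes above are mechanical enough to fix in a script before validation. Below is a minimal sketch, assuming records have already been loaded from YAML into plain dicts; `normalize_publication` is a hypothetical helper, not part of the repository. Keeping only the first entry of a stray `affiliations` list is a judgment call, so review those authors by hand.

```python
import copy


def normalize_publication(record: dict) -> dict:
    """Return a fixed copy of one publication record; the input is not modified."""
    pub = copy.deepcopy(record)

    # 1. Wrong field names: publication_title -> title, pages -> page_range
    for wrong, right in (("publication_title", "title"), ("pages", "page_range")):
        if wrong in pub and right not in pub:
            pub[right] = pub.pop(wrong)

    # 1. affiliations (list) -> affiliation (single object); keeps the FIRST entry only
    for author in pub.get("authors", []):
        if "affiliations" in author and "affiliation" not in author:
            affiliations = author.pop("affiliations")
            if affiliations:
                author["affiliation"] = affiliations[0]

    # 2. Nested published_in -> string ID, with volume/issue moved to Publication level
    venue = pub.get("published_in")
    if isinstance(venue, dict) and "journal_id" in venue:
        pub["published_in"] = venue["journal_id"]
        for key in ("volume", "issue"):
            if key in venue:
                pub.setdefault(key, venue[key])

    # 3. DOI inside identifiers -> top-level doi field
    if "identifiers" in pub:
        kept = []
        for ident in pub["identifiers"]:
            if ident.get("identifier_scheme") == "DOI" and "doi" not in pub:
                pub["doi"] = ident.get("identifier_value")
            else:
                kept.append(ident)
        if kept:
            pub["identifiers"] = kept
        else:
            del pub["identifiers"]

    # 4. provenance.notes -> Publication-level description (appended if one exists)
    provenance = pub.get("provenance")
    if isinstance(provenance, dict) and "notes" in provenance:
        notes = provenance.pop("notes")
        existing = pub.get("description")
        pub["description"] = f"{existing} {notes}" if existing else notes

    return pub
```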
### ✅ Schema Validation Checklist
Before committing new publications:
- [ ] `title` field (NOT `publication_title`)
- [ ] `published_in` is a string ID (NOT nested object)
- [ ] `affiliation` is singular object (NOT `affiliations` array)
- [ ] `page_range` (NOT `pages`)
- [ ] `doi` and `url` are separate fields (NOT in `identifiers`)
- [ ] `provenance` has no `notes` field
- [ ] All `publication_id`, `person_id`, `journal_id` use valid URIs
- [ ] `publication_type` is valid enum value
- [ ] Authors have either ORCID or local ID
- [ ] File validates with: `linkml-validate -s schemas/bibliographic.yaml -C Publication <file.yaml>`
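
Most of this checklist can be checked before running `linkml-validate`. A minimal sketch, assuming records are loaded as dicts; `check_publication` is hypothetical, and `PUBLICATION_TYPES` lists only the types named in this README rather than the schema's authoritative enum. Because the integration-pattern examples in this README use local IDs such as `librarian-kb-002`, the sketch only requires an ORCID or some `person_id` for authors and does not enforce URIs there.

```python
import re

# Assumption: the publication types mentioned in this README; the real list is
# the publication_type enum in schemas/bibliographic.yaml.
PUBLICATION_TYPES = {
    "JOURNAL_ARTICLE", "CONFERENCE_PAPER", "BOOK", "BOOK_CHAPTER",
    "TECHNICAL_REPORT", "PREPRINT", "DATASET",
}
URI = re.compile(r"^https?://\S+$")


def check_publication(pub: dict) -> list[str]:
    """Return human-readable problems for one publication record (empty list = OK)."""
    problems = []
    if "title" not in pub:
        problems.append("missing 'title'")
    for wrong, right in (("publication_title", "title"), ("pages", "page_range")):
        if wrong in pub:
            problems.append(f"use '{right}' instead of '{wrong}'")
    if "published_in" in pub and not isinstance(pub["published_in"], str):
        problems.append("'published_in' must be a string ID, not a nested object")
    for ident in pub.get("identifiers", []):
        if ident.get("identifier_scheme") == "DOI":
            problems.append("DOI belongs in the 'doi' field, not 'identifiers'")
    if "notes" in (pub.get("provenance") or {}):
        problems.append("'provenance' has no 'notes' field; use 'description'")
    if not URI.match(str(pub.get("publication_id", ""))):
        problems.append("'publication_id' is not an http(s) URI")
    if pub.get("publication_type") not in PUBLICATION_TYPES:
        problems.append(f"unexpected publication_type {pub.get('publication_type')!r}")
    for i, author in enumerate(pub.get("authors", [])):
        if "affiliations" in author:
            problems.append(f"author {i}: use singular 'affiliation'")
        if not (author.get("orcid") or author.get("person_id")):
            problems.append(f"author {i}: needs an ORCID or a local person_id")
    return problems
```

This complements, rather than replaces, the `linkml-validate` run in the last checklist item.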
## Validation Commands
### Validate Publications
```bash
cd /path/to/glam  # repository root
linkml-validate -s schemas/bibliographic.yaml -C Publication \
data/instances/publications/semantic_web_papers.yaml
```
### Validate Citations
```bash
linkml-validate -s schemas/bibliographic.yaml -C Citation \
data/instances/publications/citation_relationships.yaml
```
### Validate Journals
```bash
linkml-validate -s schemas/bibliographic.yaml -C Journal \
data/instances/journals/semantic_web_journals.yaml
```
### Validate Conferences
```bash
linkml-validate -s schemas/bibliographic.yaml -C Conference \
data/instances/conferences/semantic_web_conferences.yaml
```
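
To run all of these in one go, a small Python wrapper can loop over (class, file) pairs. This sketch assumes it is run from the repository root with `linkml-validate` on the `PATH`; the two extra publication files are the other data files described in this README, and `build_commands` and `run_all` are hypothetical helper names.

```python
import shlex
import subprocess

SCHEMA = "schemas/bibliographic.yaml"

# (target class, data file) pairs; paths are relative to the repository root.
VALIDATION_TARGETS = [
    ("Publication", "data/instances/publications/semantic_web_papers.yaml"),
    ("Publication", "data/instances/publications/heritage_linked_publications.yaml"),
    ("Publication", "data/instances/publications/diverse_heritage_publications.yaml"),
    ("Citation", "data/instances/publications/citation_relationships.yaml"),
    ("Journal", "data/instances/journals/semantic_web_journals.yaml"),
    ("Conference", "data/instances/conferences/semantic_web_conferences.yaml"),
]


def build_commands(targets=VALIDATION_TARGETS, schema=SCHEMA):
    """Build one linkml-validate argument list per (class, file) pair."""
    return [["linkml-validate", "-s", schema, "-C", cls, path] for cls, path in targets]


def run_all(dry_run=False) -> bool:
    """Print each command; unless dry_run, run it. True only if every run exits 0."""
    ok = True
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            ok &= subprocess.run(cmd).returncode == 0
    return ok
```

Call `run_all()` for a real run, or `run_all(dry_run=True)` to print the commands without executing them.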
## Adding New Publications
### Step 1: Gather Metadata
**Required information**:
- Title, authors, publication date
- Publication type (journal article, conference paper, etc.)
- Journal or conference, for journal articles and conference papers (must reference an existing entity in `journals/` or `conferences/`)
- DOI (if available)

**Recommended information**:
- Author ORCID identifiers
- Author institutional affiliations
- Abstract text
- Volume, issue, page numbers
- URL to full text
- Open access status
### Step 2: Create Publication Record
Follow the schema patterns in `semantic_web_papers.yaml`:
```yaml
- publication_id: https://w3id.org/heritage/publication/[unique-id]
  title: "Your Publication Title"
  publication_type: JOURNAL_ARTICLE # or CONFERENCE_PAPER, BOOK, etc.
  publication_date: "2024-11-09"
  authors:
    - person_id: https://orcid.org/0000-0002-XXXX-XXXX
      person_name: "First Author"
      orcid: "0000-0002-XXXX-XXXX"
      affiliation:
        organization_name: "University Name"
        organization_type: "University"
  published_in: https://w3id.org/heritage/journal/[journal-id]
  volume: "15"
  issue: "2"
  page_range: "123-145"
  doi: "10.1234/example.doi"
  url: "https://..."
  abstract: "Full abstract text..."
  provenance:
    data_source: MANUAL_CURATION # or CONVERSATION_NLP, WEB_SCRAPING, etc.
    data_tier: TIER_2_VERIFIED
    extraction_date: "2024-11-09T12:00:00Z"
    extraction_method: "Manual entry from published source"
```
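
The record above can also be generated in code, which keeps the `provenance.extraction_date` format and the ID style consistent. A sketch under two assumptions: IDs follow the `<short-name>-<year>` shape seen in this dataset (e.g. `rijksmuseum-rembrandt-2024`), and `new_publication` and `slugify` are hypothetical helpers.

```python
import re
import unicodedata
from datetime import datetime, timezone

PUBLICATION_BASE = "https://w3id.org/heritage/publication/"


def slugify(text: str) -> str:
    """Lower-case ASCII slug, e.g. 'São Paulo Library' -> 'sao-paulo-library'."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def new_publication(title, publication_type, publication_date, short_name,
                    data_source="MANUAL_CURATION", **fields):
    """Build a publication dict shaped like the Step 2 template.

    `fields` carries any other top-level keys (authors, published_in, doi, ...).
    The ID convention `<slugify(short_name)>-<year>` is an assumption.
    """
    year = publication_date[:4]
    return {
        "publication_id": f"{PUBLICATION_BASE}{slugify(short_name)}-{year}",
        "title": title,
        "publication_type": publication_type,
        "publication_date": publication_date,
        **fields,
        "provenance": {
            "data_source": data_source,
            "data_tier": "TIER_2_VERIFIED",
            # UTC timestamp in the same "YYYY-MM-DDTHH:MM:SSZ" form as the template
            "extraction_date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "extraction_method": "Manual entry from published source",
        },
    }
```

Serialize the result with PyYAML's `yaml.safe_dump(record, sort_keys=False)` to keep the field order shown in the template.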
### Step 3: Create Citation Relationships (Optional)
If the new publication cites existing publications (or vice versa):
```yaml
- citation_id: https://w3id.org/heritage/citation/[unique-id]
  citing_work: https://w3id.org/heritage/publication/[new-pub-id]
  cited_work: https://w3id.org/heritage/publication/[existing-pub-id]
  citation_type: CITES_AS_AUTHORITY # Choose appropriate type
  citation_intent: "Why this citation exists..."
  citation_context: "Textual excerpt around the citation..."
  page_number: "15"
```
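
Because `citing_work` and `cited_work` are plain ID strings, nothing in the YAML itself guarantees they point at existing publications. A minimal referential-integrity sketch, assuming citations and publication IDs are already loaded into Python; `check_citations` is hypothetical, and `CITATION_TYPES` mirrors the CiTO table above plus `USES_DATA_FROM` rather than the schema enum itself. Self-citations are allowed, since the dataset contains one.

```python
# Assumption: allowed values taken from this README, not read from the schema.
CITATION_TYPES = {
    "CITES", "CITES_AS_AUTHORITY", "CITES_AS_EVIDENCE", "CITES_AS_METADATA",
    "DISCUSSES", "EXTENDS", "SUPPORTS", "REFUTES", "CRITIQUES", "AGREES_WITH",
    "USES_DATA_FROM",
}
REQUIRED = ("citation_id", "citing_work", "cited_work", "citation_type")


def check_citations(citations, publication_ids):
    """Report missing required fields, duplicate IDs, unknown types and dangling references."""
    problems, seen = [], set()
    for c in citations:
        cid = c.get("citation_id", "<no id>")
        missing = [f for f in REQUIRED if f not in c]
        if missing:
            problems.append(f"{cid}: missing {', '.join(missing)}")
            continue
        if cid in seen:
            problems.append(f"{cid}: duplicate citation_id")
        seen.add(cid)
        if c["citation_type"] not in CITATION_TYPES:
            problems.append(f"{cid}: unknown citation_type {c['citation_type']!r}")
        for role in ("citing_work", "cited_work"):
            if c[role] not in publication_ids:
                problems.append(f"{cid}: {role} {c[role]} is not a known publication")
    return problems
```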
### Step 4: Validate
Run validation before committing:
```bash
linkml-validate -s schemas/bibliographic.yaml -C Publication \
data/instances/publications/your_file.yaml
```
Fix any validation errors (see "Schema Quirks" section above).
### Step 5: Update This README
Add your publication to the table in the "Files" section.
## Integration with Heritage Custodians
Publications link to heritage institutions through **5 integration patterns**, all demonstrated in `heritage_linked_publications.yaml`:
### Pattern 1: Heritage Institution Researcher as Primary Author ✅
**Use case**: Museum curator or archivist publishes research based on institutional collections
**Example**: Rijksmuseum researcher analyzing Rembrandt paintings
```yaml
authors:
  - person_id: researcher-rijks-001
    person_name: "Dr. Maria van der Berg"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum # ← Heritage institution!
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
```
**Real example**: `rijksmuseum-rembrandt-2024` (Rijksmuseum:125-126)

---
### Pattern 2: Multiple Staff from Same Heritage Institution ✅
**Use case**: Collaborative research by colleagues at the same archive or museum
**Example**: Two archivists from Noord-Hollands Archief co-authoring digital transformation paper
```yaml
authors:
  - person_id: archivist-nha-001
    person_name: "Dr. Saskia de Jong"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
  - person_id: specialist-nha-001
    person_name: "Peter Bakker"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/noord-hollands-archief # ← Same institution
      organization_name: "Noord-Hollands Archief"
      organization_type: "Archive"
```
**Real example**: `noord-hollands-archief-digital-2023` (Noord-Hollands Archief:49-60)

---
### Pattern 3: Heritage + Academic Collaboration ✅
**Use case**: University researcher collaborates with heritage institution expert
**Example**: USP researcher + Biblioteca Nacional do Brasil librarian creating Linked Open Data resource
```yaml
authors:
  - person_id: https://orcid.org/0000-0002-8888-9999
    person_name: "Dr. Carlos Silva"
    orcid: "0000-0002-8888-9999"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/university-of-sao-paulo
      organization_name: "University of São Paulo"
      organization_type: "University"
  - person_id: librarian-bnb-001
    person_name: "Ana Santos"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil # ← Heritage institution
      organization_name: "Biblioteca Nacional do Brasil"
      organization_type: "Library"
```
**Real example**: `lokg-brazilian-subset-2024` (Biblioteca Nacional do Brasil:92-103)

---
### Pattern 4: Multi-Institutional Consortium (3+ Heritage Institutions) ✅
**Use case**: Regional or national collaboration between multiple heritage institutions
**Example**: Dutch GLAM Consortium with KB + Rijksmuseum + Nationaal Archief
```yaml
authors:
  - person_id: director-kb-001
    person_name: "Dr. Liesbeth van der Pol"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
  - person_id: curator-rijksmuseum-002
    person_name: "Dr. Thomas de Vries"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/rijksmuseum # ← Second institution
      organization_name: "Rijksmuseum"
      organization_type: "Museum"
  - person_id: archivist-na-001
    person_name: "Dr. Emma Jansen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/nationaal-archief # ← Third institution
      organization_name: "Nationaal Archief"
      organization_type: "Archive"
```
**Real example**: `dutch-glam-consortium-2023` (KB:132-137, Rijksmuseum:138-142, Nationaal Archief:144-148)

---
### Pattern 5: International Researcher + Local Heritage Institution ✅
**Use case**: Foreign scholar collaborates with local museum/archive/library
**Example**: French scholar + Dutch KB librarian studying European collection management systems
```yaml
authors:
  - person_id: https://orcid.org/0000-0003-7777-8888
    person_name: "Dr. Sophie Laurent"
    orcid: "0000-0003-7777-8888"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/universite-paris-sorbonne
      organization_name: "Université Paris-Sorbonne"
      organization_type: "University"
  - person_id: librarian-kb-002
    person_name: "Martijn Koster"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library # ← Dutch heritage institution
      organization_name: "Koninklijke Bibliotheek"
      organization_type: "Library"
```
**Real example**: `collection-management-systems-2024` (Koninklijke Bibliotheek:181-184)

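
The five patterns can be recognized from author affiliations alone if heritage institutions are identified by their `https://w3id.org/heritage/custodian/` IRIs, as in every example above, and universities by other namespaces. The sketch below makes that assumption; `integration_patterns` is a hypothetical helper that returns every matching pattern, since one paper can fit several (the consortium example is both Pattern 1 and Pattern 4). Patterns 3 and 5 differ only in the researcher's country, which these records do not carry, so both are reported as 3.

```python
# Assumption: custodian IRIs mark heritage institutions; anything else is non-heritage.
HERITAGE_NS = "https://w3id.org/heritage/custodian/"


def _org_id(author: dict) -> str:
    """organization_id of an author's single affiliation, or '' if absent."""
    return (author.get("affiliation") or {}).get("organization_id", "")


def integration_patterns(pub: dict) -> set[int]:
    """Heuristically tag a publication with integration patterns 1-4."""
    authors = pub.get("authors", [])
    if not authors:
        return set()
    orgs = [_org_id(a) for a in authors]
    heritage = [o for o in orgs if o.startswith(HERITAGE_NS)]
    others = [o for o in orgs if o and not o.startswith(HERITAGE_NS)]
    patterns = set()
    if orgs[0].startswith(HERITAGE_NS):
        patterns.add(1)  # heritage staff as first (primary) author
    if len(heritage) > len(set(heritage)):
        patterns.add(2)  # two or more authors from the same heritage institution
    if heritage and others:
        patterns.add(3)  # heritage + academic mix (also covers Pattern 5)
    if len(set(heritage)) >= 3:
        patterns.add(4)  # consortium of three or more heritage institutions
    return patterns
```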
---
### 4. `diverse_heritage_publications.yaml` (10 publications) ✅
**Diverse publication types: books, book chapters, technical reports, preprints**
**Publications included**:
| Title | Type | Authors | Year | Key Features |
|-------|------|---------|------|--------------|
| Linked Data for Museums | Book | Getty Trust researcher | 2020 | Practical GLAM linked data guide |
| Digital Preservation Handbook | Book | DPC staff (3 co-authors) | 2021 | Multi-author handbook from heritage org |
| Crowdsourcing Metadata for Libraries | Book Chapter | Library scholar | 2019 | Chapter within larger volume |
| Archival Appraisal in the Digital Age | Book Chapter | Archival scholar | 2022 | Theory chapter in archival studies |
| 3D Digitization at Koninklijke Bibliotheek | Technical Report | KB technical staff (2 authors) | 2023 | Grey literature from institution |
| Europeana Data Quality Framework | Technical Report | Europeana Foundation (4 authors) | 2022 | Organizational documentation |
| Graph Neural Networks for Provenance | Preprint (arXiv) | CS researcher | 2024 | Machine learning for heritage |
| LLMs for Catalog Enrichment | Preprint (OSF/SocArXiv) | LIS researcher | 2024 | AI applications in libraries |
| Ancient DNA from Museum Collections | Preprint (bioRxiv) | Museum geneticist + lab | 2024 | Scientific heritage use case |

**Preprint Server Patterns**:
- **arXiv.org**: Computer science and machine learning papers (heritage AI applications)
  - Format: `https://arxiv.org/abs/YYMM.NNNNN` (e.g., `2411.12345`)
  - DOI: `10.48550/arXiv.YYMM.NNNNN`
- **OSF/SocArXiv**: Library and information science preprints
  - Format: `https://osf.io/preprints/socarxiv/[alphanumeric]` (e.g., `abc12`)
  - DOI: `10.31235/osf.io/[alphanumeric]`
- **bioRxiv**: Biology and genetics papers (museum genomics, conservation)
  - Format: `https://www.biorxiv.org/content/10.1101/YYYY.MM.DD.NNNNNN`
  - DOI: `10.1101/YYYY.MM.DD.NNNNNN` (date-based)

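
The URL and DOI for each server follow directly from the preprint ID, so both can be derived rather than typed by hand. A sketch, assuming the server labels `arXiv`, `SocArXiv`, and `bioRxiv` as dictionary keys and IDs in the formats listed above; `preprint_links` is hypothetical. Old-style arXiv IDs (such as `hep-th/9901001`) and version suffixes (`v2`) are deliberately rejected.

```python
import re

# URL and DOI templates for the preprint servers listed above.
PREPRINT_SERVERS = {
    "arXiv": {
        "id_pattern": r"\d{4}\.\d{4,5}",  # YYMM.NNNN or YYMM.NNNNN, no version suffix
        "url": "https://arxiv.org/abs/{id}",
        "doi": "10.48550/arXiv.{id}",
    },
    "SocArXiv": {
        "id_pattern": r"[a-z0-9]+",  # short OSF identifier, e.g. abc12
        "url": "https://osf.io/preprints/socarxiv/{id}",
        "doi": "10.31235/osf.io/{id}",
    },
    "bioRxiv": {
        "id_pattern": r"\d{4}\.\d{2}\.\d{2}\.\d{6}",  # date-based suffix after 10.1101/
        "url": "https://www.biorxiv.org/content/10.1101/{id}",
        "doi": "10.1101/{id}",
    },
}


def preprint_links(server: str, preprint_id: str) -> dict:
    """Return {'url': ..., 'doi': ...} for a preprint ID, rejecting malformed IDs."""
    spec = PREPRINT_SERVERS[server]
    if not re.fullmatch(spec["id_pattern"], preprint_id):
        raise ValueError(f"{preprint_id!r} does not look like a {server} ID")
    return {
        "url": spec["url"].format(id=preprint_id),
        "doi": spec["doi"].format(id=preprint_id),
    }
```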
**Schema patterns demonstrated**:
- Book metadata (`publication_type: BOOK`)
- Book chapter with `is_part_of` relationship to parent volume
- Technical reports as grey literature from heritage organizations
- Preprint metadata with server identifiers (arXiv ID, OSF ID, bioRxiv ID)
- Pre-publication date tracking vs. official publication date
---
### Integration Patterns from Diverse Publications
**Pattern 6: Books by Heritage Institution Staff**
Example: Digital Preservation Handbook authored by Digital Preservation Coalition staff
```yaml
authors:
  - person_name: "Sarah Jones"
    affiliation:
      organization_id: https://w3id.org/heritage/organization/digital-preservation-coalition
      organization_name: "Digital Preservation Coalition"
      organization_type: "Heritage consortium"
publication_type: BOOK
```
**Pattern 7: Technical Reports as Organizational Documentation**
Example: KB 3D Digitization Report documenting institutional digitization workflows
```yaml
publication_type: TECHNICAL_REPORT
authors:
  - person_name: "Erik Vermeulen"
    affiliation:
      organization_id: https://w3id.org/heritage/custodian/nl/kb-national-library
      organization_name: "Koninklijke Bibliotheek"
description: "Grey literature documenting internal digitization practices"
```
**Pattern 8: Preprints Before Formal Publication**
Example: Machine learning research using heritage data published on arXiv
```yaml
publication_type: PREPRINT
preprint_server: arXiv
arxiv_id: "2411.12345"
doi: "10.48550/arXiv.2411.12345"
description: "Early research results, may be updated before journal submission"
```
**Pattern 9: Book Chapters in Edited Volumes**
Example: Crowdsourcing chapter within larger library science anthology
```yaml
publication_type: BOOK_CHAPTER
is_part_of: "Digital Innovations in Libraries"
editors:
  - "Jane Smith"
  - "Robert Brown"
page_range: "145-168"
```
---
### Additional Integration Patterns (Future)
**Pattern 10: Publications About Specific Collections** (not yet implemented)
When a paper describes a heritage collection:
```yaml
# Future schema extension
about_collections:
  - collection_id: https://w3id.org/heritage/collection/rijksmuseum-paintings
    collection_name: "Rijksmuseum Paintings Collection"
    collection_institution: https://w3id.org/heritage/custodian/nl/rijksmuseum
```
**Pattern 11: Data Papers Describing Heritage Datasets** (partially implemented)
When publications document heritage datasets:
```yaml
publication_type: DATASET # Already used in lokg-brazilian-subset-2024
# Future: Add describes_dataset field
describes_dataset:
  - dataset_id: https://w3id.org/heritage/dataset/brazilian-lokg
    dataset_name: "Brazilian Heritage Institutions Linked Open Data"
    related_institutions:
      - https://w3id.org/heritage/custodian/br/biblioteca-nacional-brasil
```
## Citation Analysis Queries
### Find Most Cited Publications
```python
from collections import Counter

import yaml  # PyYAML

# Assumes the file is a top-level YAML list of citation records
with open('citation_relationships.yaml', encoding='utf-8') as f:
    citations = yaml.safe_load(f)

cited_counts = Counter(c['cited_work'] for c in citations)
print("Most cited publications:")
for pub_id, count in cited_counts.most_common():
    print(f"  {pub_id}: {count} citations")
```
### Build Citation Network
```python
import networkx as nx

# `citations` is the list loaded in the previous snippet
G = nx.DiGraph()
for citation in citations:
    G.add_edge(citation['citing_work'], citation['cited_work'],
               citation_type=citation['citation_type'])

# Find influential papers (high in-degree = cited by many distinct works)
influential = sorted(G.in_degree(), key=lambda x: x[1], reverse=True)
print(influential[:3])
```
### Analyze Citation Types
```python
from collections import Counter

# `citations` is the list loaded in the first snippet
citation_types = Counter(c['citation_type'] for c in citations)
print("Citation type distribution:")
for ctype, count in citation_types.most_common():
    print(f"  {ctype}: {count}")
```
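
The overview figures (27 citations, 19 linked publications, 1.42 citations per linked publication) can be recomputed from the same `citations` list, so they can be refreshed whenever new relationships are added. A minimal sketch; `citation_stats` is a hypothetical helper, and a "linked" publication is taken to be any work that appears as `citing_work` or `cited_work`.

```python
from collections import Counter


def citation_stats(citations):
    """Summary figures matching the 'Citation Network Statistics' overview."""
    linked = {c["citing_work"] for c in citations} | {c["cited_work"] for c in citations}
    total = len(citations)
    return {
        "total_citations": total,
        "linked_publications": len(linked),
        # density = citations per publication that takes part in at least one citation
        "density": round(total / len(linked), 2) if linked else 0.0,
        "top_cited": Counter(c["cited_work"] for c in citations).most_common(3),
    }
```

With the list loaded in the first snippet, `print(citation_stats(citations))` prints the current figures.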
## Related Documentation
- **Schema**: `/schemas/bibliographic.yaml` - Full LinkML schema for bibliographic entities
- **Ontologies**:
  - FaBiO (FRBR-aligned Bibliographic Ontology) - Publication modeling
  - CiTO (Citation Typing Ontology) - Citation relationships
  - BIBO (Bibliographic Ontology) - Bibliographic resources
  - FRBR (Functional Requirements for Bibliographic Records) - Work/expression/manifestation
- **Test Fixtures**: `/tests/fixtures/publications/` - Validation examples
- **Schema Documentation**: `/docs/BIBLIOGRAPHIC_SCHEMA.md` (if present)
## Future Enhancements
### Short-term (Next Session)
- [x] **COMPLETED**: Add publications linked to heritage institutions (5 added)
- [x] **COMPLETED**: Create citation relationships for heritage-linked pubs (8 citations added)
- [x] **COMPLETED**: Document 5 integration patterns
- [x] **COMPLETED**: Add more diverse publication types (books, book chapters, technical reports) - 6 added
- [x] **COMPLETED**: Add preprints (arXiv, bioRxiv, OSF/SocArXiv) - 4 added
- [x] **COMPLETED**: Add more cultural heritage domain papers (digital preservation, archival science) - included in diverse set
- [x] **COMPLETED**: Create 12 additional citation relationships linking diverse publications (27 total citations)
- [x] **COMPLETED**: Document preprint server patterns (arXiv, SocArXiv, bioRxiv)
- [x] **COMPLETED**: Document 4 additional integration patterns (6-9: books, technical reports, preprints, chapters)
- [ ] Create author disambiguation examples (same person with multiple IDs/ORCIDs)
- [ ] Add thesis/dissertation examples
- [ ] Add working papers (pre-publication research from institutions)
### Medium-term
- [ ] Author disambiguation (same person, multiple IDs)
- [ ] Keyword/subject term extraction
- [ ] Funding information (grants, sponsors)
- [ ] Publication metrics (citation counts from Crossref, Semantic Scholar)
- [ ] Full-text links (PDFs, preprints)
### Long-term
- [ ] RDF export (Turtle, JSON-LD)
- [ ] SPARQL endpoint for citation queries
- [ ] Bibliometric analysis dashboard
- [ ] Integration with Wikidata (author Q-numbers)
- [ ] Citation recommendation system
- [ ] Co-authorship network analysis
## Questions or Issues?
If you encounter validation errors or schema confusion:
1. Check the "Schema Quirks" section above
2. Review validated examples in `semantic_web_papers.yaml`
3. Consult test fixtures in `/tests/fixtures/publications/`
4. Read schema documentation in `/schemas/bibliographic.yaml` (inline comments)
5. File an issue or consult AI agent instructions in `/AGENTS.md`
---
**Last Updated**: 2025-11-09
**Schema Version**: bibliographic.yaml v0.2.0
**Dataset Version**: 0.3.0 (20 publications, 27 citations, 9 integration patterns demonstrated)