glam/NEXT_SESSION_HANDOFF.md
2025-12-16 20:27:39 +01:00

Next Session Handoff

Last Updated: 2025-12-16
Current Focus: Schema development complete - Video, LinkedIn/Person, SocialMedia schemas committed


📦 Schema Work Completed This Session (2025-12-16)

Commits Made

1. Video Content Schema (10 files, 7,250 insertions)

Commit: 3991751c78

Classes (9 files):

  • VideoPost, VideoComment - Social media video modeling
  • VideoTextContent - Base class for text content extraction
  • VideoTranscript, VideoSubtitle - Text with timing and formatting
  • VideoTimeSegment - Time code handling with ISO 8601 duration
  • VideoAnnotation - Base annotation with W3C Web Annotation alignment
  • VideoAnnotationTypes - Scene, Object, OCR detection annotations
  • VideoChapter, VideoChapterList - Navigation and chapter structure
  • VideoAudioAnnotation - Speaker diarization, music, sound events
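
The ISO 8601 duration handling mentioned for VideoTimeSegment can be illustrated with a small parser. This is a minimal sketch, not the schema's own code: `parse_duration` is a hypothetical helper, and it assumes only the day/time subset (`PnDTnHnMnS`) is needed for video time codes.

```python
import re
from datetime import timedelta

# Hypothetical helper, not part of the committed schema.
# Supports only the PnDTnHnMnS subset; years, months and weeks are rejected.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert an ISO 8601 duration such as 'PT1H2M3.5S' to a timedelta."""
    match = _DURATION.match(text)
    if match is None or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    # Keep only the components that were actually present in the string.
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"empty ISO 8601 duration: {text!r}")
    return timedelta(**parts)


# Length of a segment given start and end offsets from the beginning of a video.
start = parse_duration("PT1M30S")
end = parse_duration("PT4M")
segment_length = end - start  # timedelta of 2 minutes 30 seconds
```

Converting to `timedelta` early keeps arithmetic on segment boundaries (overlap checks, chapter lengths) in the standard library rather than in string handling.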

Enumerations (12):

  • VideoDefinitionEnum, LiveBroadcastStatusEnum
  • TranscriptFormatEnum, SubtitleFormatEnum, SubtitlePositionEnum
  • AnnotationTypeEnum, AnnotationMotivationEnum
  • DetectionLevelEnum, SceneTypeEnum, TransitionTypeEnum, TextTypeEnum
  • ChapterSourceEnum, AudioEventTypeEnum, SoundEventTypeEnum, MusicTypeEnum

Examples (904 lines, 10 heritage-themed examples):

  • Rijksmuseum virtual tour chapters (5 chapters with Wikidata refs)
  • Operation Night Watch documentary chapters (5 chapters)
  • VideoAudioAnnotation: curator interview, exhibition promo, museum lecture

2. LinkedIn Profile & Person Modeling (27 files, 4,369 insertions)

Commit: f30f39d93e

Classes (9):

  • PersonName - Dutch naming conventions (surname_prefix, patronym, etc.)
  • PersonConnection - Professional network with heritage relevance
  • ConnectionNetwork - Network-level analysis and statistics
  • LinkedInProfile - Complete professional profile structure
  • WorkExperience - Employment history with heritage institution detection
  • EducationCredential - Academic background and qualifications
  • LanguageProficiency - Language skills with ISO 639-1 codes
  • ExtractionMetadata - Provenance tracking for extracted data
  • HeritageRelevance - GLAMORCUBESFIXPHDNT type scoring

Slots (17):

  • Name: given_name, base_surname, surname_prefix, patronym, initials
  • Identity: age, birth_date, birth_place, death_place, gender_identity, pronouns
  • Professional: occupation, religion
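
To show why `surname_prefix` is kept apart from `base_surname`, here is a minimal sketch of display and sort forms. The dataclass below is an illustration only: it reuses the slot names listed above, but the methods are hypothetical and the real PersonName class is defined in LinkML. It assumes the common Netherlands convention of sorting on the base surname.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PersonName:
    """Illustrative stand-in for the LinkML PersonName class."""
    given_name: str
    base_surname: str
    surname_prefix: Optional[str] = None

    def display_name(self) -> str:
        # Natural reading order: given name, prefix, base surname.
        parts = [self.given_name, self.surname_prefix, self.base_surname]
        return " ".join(p for p in parts if p)

    def sort_name(self) -> str:
        # Index form: base surname first, prefix moved after the given name.
        tail = " ".join(p for p in (self.given_name, self.surname_prefix) if p)
        return f"{self.base_surname}, {tail}"

    def sort_key(self) -> Tuple[str, str]:
        # Case-insensitive ordering on base surname, then given name.
        return (self.base_surname.casefold(), self.given_name.casefold())


names = [
    PersonName("Jan", "Vries", "de"),
    PersonName("Anna", "Berg", "van den"),
    PersonName("Piet", "Jansen"),
]
ordered = [n.sort_name() for n in sorted(names, key=PersonName.sort_key)]
```

With the prefix stored separately, "Anna van den Berg" files under B rather than V, which a single free-text surname field cannot guarantee.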

3. Social Media Post Schema (4 files, 2,280 insertions)

Commit: 3b05ace16f

Classes:

  • SocialMediaPost - Platform-agnostic post modeling
  • SocialMediaPostType, SocialMediaPostTypes - Post type taxonomy
  • SocialMediaContent - Rich content with media, hashtags, mentions

4. Schema Annotation Fix (51 files, 451 insertions)

Commit: 14e7c13d41

Fixed YAML quoting for custodian_types annotations across all class files:

  • Before: custodian_types: ["A", "G"]
  • After: custodian_types: '["A", "G"]'
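
After this change the annotation value is a plain string that happens to contain JSON-style list syntax, where the unquoted form was a YAML sequence. A minimal sketch of how a consumer could turn the value back into a list follows; `parse_custodian_types` is a hypothetical helper, and it assumes the string is valid JSON, as in the example above.

```python
import json
from typing import List, Union


def parse_custodian_types(raw: Union[str, List[str]]) -> List[str]:
    """Return custodian type codes from either annotation form.

    Hypothetical helper: accepts the quoted string form ('["A", "G"]')
    and, for older files, an already-parsed list.
    """
    codes = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(codes, list) or not all(isinstance(c, str) for c in codes):
        raise ValueError(f"custodian_types must be a list of strings: {raw!r}")
    return codes


quoted_form = '["A", "G"]'       # value as stored after the fix
legacy_form = ["A", "G"]         # value as a YAML parser returned it before
assert parse_custodian_types(quoted_form) == parse_custodian_types(legacy_form)
```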

📊 Remaining Uncommitted Changes

Summary

| Category | Modified | Untracked | Total |
|---|---:|---:|---:|
| data/custodian/ | 380 | 1,379 | 1,759 |
| frontend/ | 66 | 50 | 116 |
| scripts/ | 0 | 9 | 9 |
| Other | 3 | 0 | 3 |
| TOTAL | 449 | 1,438 | 1,887 |
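
The summary above can be regenerated from `git status --short` output. The sketch below is one possible way to do that, not the script used this session: `tally_status` and the category prefixes are assumptions, and it treats every status other than `??` as "modified". Running git with `-uall` lists untracked files individually instead of collapsing them into directories.

```python
from collections import Counter
from typing import Iterable

# Assumed category prefixes, taken from the rows of the summary table.
CATEGORIES = ("data/custodian/", "frontend/", "scripts/")


def tally_status(lines: Iterable[str]) -> Counter:
    """Count modified and untracked paths per category.

    Each line is expected in short format: two status characters,
    a space, then the path (for example ' M path' or '?? path').
    """
    counts: Counter = Counter()
    for line in lines:
        if len(line) < 4:
            continue  # skip blank or malformed lines
        status, path = line[:2], line[3:]
        category = next((c for c in CATEGORIES if path.startswith(c)), "Other")
        kind = "untracked" if status == "??" else "modified"
        counts[(category, kind)] += 1
    return counts


# Sample input; apart from NL-DR-ASS-A-DA.yaml and the script name,
# these paths are made up for illustration.
sample = [
    " M data/custodian/NL-DR-ASS-A-DA.yaml",
    "?? data/custodian/NL-DR-EXAMPLE.yaml",
    "?? scripts/build_linkedin_index.py",
    " M frontend/src/example.tsx",
    " M README.md",
]
counts = tally_status(sample)
```

To run it against the working tree, feed it the lines of `subprocess.run(["git", "status", "--short", "-uall"], capture_output=True, text=True).stdout.splitlines()`.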

Untracked Custodian Files (~1,379)

New Dutch custodian YAML files created by the enrichment pipelines. They need review before committing.

Modified Custodian Files (~380)

Enrichment updates to existing custodian records (digital platforms, web claims, etc.)

New Scripts (9)

LinkedIn integration and enrichment scripts:

  • scripts/build_linkedin_index.py
  • scripts/extract_about_page_data.py
  • scripts/extract_timeline_events.py
  • scripts/generate_linkedin_custodian_yaml.py
  • scripts/match_linkedin_by_name.py
  • scripts/match_linkedin_by_name_fast.py
  • scripts/match_linkedin_names_ultra.py
  • scripts/merge_linkedin_to_custodians.py
  • scripts/verify_website_links.py

🎯 Priority Next Steps

Option 1: Commit Custodian Data Batch

# Review a sample of changes
cd /Users/kempersc/apps/glam
git diff data/custodian/NL-DR-ASS-A-DA.yaml | head -50

# Stage and commit in batches by province
git add data/custodian/NL-DR-*.yaml
git commit -m "data(NL-DR): Enrich Drenthe custodians with digital platforms"
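
With about 1,759 custodian files pending, it may help to list the province batches before staging them. The sketch below assumes file names follow the pattern seen above (country code, province code, then further parts, as in NL-DR-ASS-A-DA.yaml); `group_by_province` and `staging_commands` are hypothetical helpers, and the code only prints commands, it does not run git.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_province(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group custodian files by their 'COUNTRY-PROVINCE' prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).stem.split("-")
        # Files that do not match the assumed pattern are set aside.
        key = "-".join(parts[:2]) if len(parts) >= 3 else "UNKNOWN"
        groups[key].append(path)
    return dict(sorted(groups.items()))


def staging_commands(groups: Dict[str, List[str]]) -> List[str]:
    """Build one 'git add' command per province batch."""
    return [
        f"git add data/custodian/{prefix}-*.yaml"
        for prefix in groups
        if prefix != "UNKNOWN"
    ]


# Only the first path comes from this handoff; the others are made up.
sample = [
    "data/custodian/NL-DR-ASS-A-DA.yaml",
    "data/custodian/NL-GR-XXX-M-EX.yaml",
    "data/custodian/NL-DR-XXX-L-EX.yaml",
]
groups = group_by_province(sample)
for command in staging_commands(groups):
    print(command)
```

Reviewing the group sizes first makes it easier to choose which provinces to commit in one session and which to defer.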

Option 2: Commit LinkedIn Scripts

git add scripts/*linkedin*.py scripts/extract_*.py scripts/verify_*.py
git commit -m "feat(scripts): Add LinkedIn profile extraction and matching"

Option 3: Continue with Country-Specific Work

  • Czech Republic: ISIL code investigation (Task 6)
  • Argentina: IRAM email + LinkML export
  • Netherlands: GHCID generation for new custodians

🔗 Schema Version Status

Current Version: v0.9.10 (after today's commits)

New Schema Components

| Module | Classes | Enums | Slots |
|---|---:|---:|---:|
| Video | 14 | 12 | - |
| LinkedIn/Person | 9 | 1 | 17 |
| SocialMedia | 4 | - | - |

Total Schema (estimated)

  • Classes: ~150+
  • Enums: ~60+
  • Slots: ~200+

🗂️ Key File Locations

Schema Files (committed today)

schemas/20251121/linkml/modules/classes/Video*.yaml (9 files)
schemas/20251121/linkml/modules/classes/PersonName.yaml
schemas/20251121/linkml/modules/classes/LinkedInProfile.yaml
schemas/20251121/linkml/modules/classes/WorkExperience.yaml
schemas/20251121/linkml/modules/classes/SocialMedia*.yaml (4 files)
schemas/20251121/linkml/examples/video_content_examples.yaml

Uncommitted Work

data/custodian/*.yaml (1,759 files)
frontend/public/* (51 files)
frontend/src/* (15 files)
scripts/*linkedin*.py, scripts/extract_*.py, scripts/verify_*.py (9 scripts)

🇨🇿 Czech Republic Status (Previous Session)

Completed

  • ARON Metadata Analysis (no contact data)
  • Wikidata Enrichment (77.3% coverage, 6,719 matches)
  • Dataset ranks #1 globally (8,694 institutions)

Pending

  • 🔲 Task 6: ISIL code investigation

🇦🇷 Argentina Status

Completed

  • CONABIP Libraries (288 scraped + enriched)
  • AGN national archive scraped
  • Email drafts ready

Pending

  • 🔲 Send IRAM email for ISIL registry
  • 🔲 LinkML export of CONABIP data

Quick Commands

# View recent commits
git log --oneline -10

# Check uncommitted changes
git status --short | wc -l

# Review custodian changes
git diff data/custodian/NL-DR-ASS-A-DA.yaml

# Lint a schema module (linkml-validate checks data files against a schema passed with -s)
linkml-lint schemas/20251121/linkml/modules/classes/VideoPost.yaml

Session End: 2025-12-16
Next Action: Choose between committing custodian data, committing the LinkedIn scripts, or continuing country-specific work