Next Session Handoff
Last Updated: 2025-12-16
Current Focus: Schema development complete - Video, LinkedIn/Person, SocialMedia schemas committed
📦 Schema Work Completed This Session (2025-12-16) ✅
Commits Made
1. Video Content Schema (10 files, 7,250 insertions)
Commit: 3991751c78
Classes (9 files):
- VideoPost, VideoComment - Social media video modeling
- VideoTextContent - Base class for text content extraction
- VideoTranscript, VideoSubtitle - Text with timing and formatting
- VideoTimeSegment - Time code handling with ISO 8601 duration
- VideoAnnotation - Base annotation with W3C Web Annotation alignment
- VideoAnnotationTypes - Scene, Object, OCR detection annotations
- VideoChapter, VideoChapterList - Navigation and chapter structure
- VideoAudioAnnotation - Speaker diarization, music, sound events
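As a rough illustration of the ISO 8601 duration handling mentioned for VideoTimeSegment, the sketch below formats a whole number of seconds as a `PTnHnMnS` string. The function name is hypothetical and is not taken from the schema; fractional seconds and day components are deliberately left out.

```python
def seconds_to_iso8601_duration(total_seconds: int) -> str:
    """Format non-negative whole seconds as an ISO 8601 duration (PTnHnMnS).

    Hypothetical helper for illustration; not part of the committed schema.
    """
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if minutes:
        duration += f"{minutes}M"
    # Emit seconds when non-zero, or when nothing else was emitted (zero length).
    if seconds or duration == "PT":
        duration += f"{seconds}S"
    return duration
```

Zero components are omitted, so a one-hour segment is written as `PT1H` rather than `PT1H0M0S`; both spellings are valid ISO 8601 durations.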
Enumerations (12):
- VideoDefinitionEnum, LiveBroadcastStatusEnum
- TranscriptFormatEnum, SubtitleFormatEnum, SubtitlePositionEnum
- AnnotationTypeEnum, AnnotationMotivationEnum
- DetectionLevelEnum, SceneTypeEnum, TransitionTypeEnum, TextTypeEnum
- ChapterSourceEnum, AudioEventTypeEnum, SoundEventTypeEnum, MusicTypeEnum
Examples (904 lines, 10 heritage-themed examples):
- Rijksmuseum virtual tour chapters (5 chapters with Wikidata refs)
- Operation Night Watch documentary chapters (5 chapters)
- VideoAudioAnnotation: curator interview, exhibition promo, museum lecture
2. LinkedIn Profile & Person Modeling (27 files, 4,369 insertions)
Commit: f30f39d93e
Classes (9):
- PersonName - Dutch naming conventions (surname_prefix, patronym, etc.)
- PersonConnection - Professional network with heritage relevance
- ConnectionNetwork - Network-level analysis and statistics
- LinkedInProfile - Complete professional profile structure
- WorkExperience - Employment history with heritage institution detection
- EducationCredential - Academic background and qualifications
- LanguageProficiency - Language skills with ISO 639-1 codes
- ExtractionMetadata - Provenance tracking for extracted data
- HeritageRelevance - GLAMORCUBESFIXPHDNT type scoring
Slots (17):
- Name: given_name, base_surname, surname_prefix, patronym, initials
- Identity: age, birth_date, birth_place, death_place, gender_identity, pronouns
- Professional: occupation, religion
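To make the split between surname_prefix and base_surname concrete, here is a minimal sketch of how a Dutch surname might be separated into those two slots. The prefix list, the `split_surname` helper, and the `sort_key` method are assumptions for illustration only; the committed PersonName class may model this differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical, non-exhaustive list of Dutch surname prefixes (tussenvoegsels).
DUTCH_PREFIXES = ("van der", "van den", "van de", "van", "de", "den", "ter", "ten", "te")


@dataclass
class PersonName:
    """Illustrative subset of the slots listed above."""
    given_name: str
    base_surname: str
    surname_prefix: Optional[str] = None

    def sort_key(self) -> str:
        # Dutch convention: sort on the base surname, with the prefix trailing.
        return f"{self.base_surname}, {self.given_name} {self.surname_prefix or ''}".strip()


def split_surname(surname: str) -> Tuple[Optional[str], str]:
    """Split a surname into (surname_prefix, base_surname), longest prefix first."""
    lowered = surname.lower()
    for prefix in sorted(DUTCH_PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix + " "):
            return prefix, surname[len(prefix) + 1:]
    return None, surname
```

For example, `split_surname("van der Berg")` yields the prefix `van der` and base surname `Berg`, so the name sorts under B rather than V.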
3. Social Media Post Schema (4 files, 2,280 insertions)
Commit: 3b05ace16f
Classes:
- SocialMediaPost - Platform-agnostic post modeling
- SocialMediaPostType, SocialMediaPostTypes - Post type taxonomy
- SocialMediaContent - Rich content with media, hashtags, mentions
4. Schema Annotation Fix (51 files, 451 insertions)
Commit: 14e7c13d41
Fixed YAML quoting for custodian_types annotations across all class files:
- Before: custodian_types: ["A", "G"]
- After: custodian_types: '["A", "G"]'
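In YAML the unquoted form is a flow sequence (a list), while the single-quoted form is a plain string. Any consumer of the annotation therefore has to decode the string itself. The sketch below shows one way to do that, assuming the quoted value stays JSON-compatible; `parse_custodian_types` is a hypothetical helper, not an existing project function.

```python
import json
from typing import List


def parse_custodian_types(annotation_value: str) -> List[str]:
    """Decode a quoted custodian_types annotation such as '["A", "G"]'.

    Assumes the annotation string is valid JSON; raises ValueError otherwise.
    """
    decoded = json.loads(annotation_value)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(decoded, list) or not all(isinstance(item, str) for item in decoded):
        raise ValueError(f"expected a list of strings, got: {annotation_value!r}")
    return decoded
```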
📊 Remaining Uncommitted Changes
Summary
| Category | Modified | Untracked | Total |
|---|---|---|---|
| data/custodian/ | 380 | 1,379 | 1,759 |
| frontend/ | 66 | 50 | 116 |
| scripts/ | 0 | 9 | 9 |
| Other | 3 | 0 | 3 |
| TOTAL | 449 | 1,438 | 1,887 |
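A table like the one above could be rebuilt from `git status --porcelain` output. The sketch below is one possible approach, not the method used to produce these numbers: it treats every status other than `??` as "modified" and ignores renames and quoted paths.

```python
from collections import Counter

# Category prefixes taken from the summary table; anything else counts as "Other".
CATEGORY_PREFIXES = ("data/custodian/", "frontend/", "scripts/")


def summarize_status(porcelain_lines):
    """Count (category, kind) pairs from `git status --porcelain` style lines."""
    counts = Counter()
    for line in porcelain_lines:
        if len(line) < 4:
            continue  # skip blank or malformed lines
        # Porcelain v1 layout: two status characters, a space, then the path.
        status, path = line[:2], line[3:]
        kind = "untracked" if status == "??" else "modified"
        category = next((p for p in CATEGORY_PREFIXES if path.startswith(p)), "Other")
        counts[(category, kind)] += 1
    return counts
```

To feed it real data, the lines could come from `git status --porcelain --untracked-files=all`; without that flag Git reports a wholly untracked directory as a single entry, which would undercount new files.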
Untracked Custodian Files (~1,379)
New Dutch custodian YAML files created from enrichment pipelines - need review before committing.
Modified Custodian Files (~380)
Enrichment updates to existing custodian records (digital platforms, web claims, etc.)
New Scripts (9)
LinkedIn integration and enrichment scripts:
- scripts/build_linkedin_index.py
- scripts/extract_about_page_data.py
- scripts/extract_timeline_events.py
- scripts/generate_linkedin_custodian_yaml.py
- scripts/match_linkedin_by_name.py
- scripts/match_linkedin_by_name_fast.py
- scripts/match_linkedin_names_ultra.py
- scripts/merge_linkedin_to_custodians.py
- scripts/verify_website_links.py
🎯 Priority Next Steps
Option 1: Commit Custodian Data Batch
# Review a sample of changes
cd /Users/kempersc/apps/glam
git diff data/custodian/NL-DR-ASS-A-DA.yaml | head -50
# Stage and commit in batches by province
git add data/custodian/NL-DR-*.yaml
git commit -m "data(NL-DR): Enrich Drenthe custodians with digital platforms"
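To plan the per-province batches, the changed files can be grouped by their filename prefix first. This sketch assumes the first two hyphen-separated segments of a custodian filename are the country and province codes, which is inferred from `NL-DR-ASS-A-DA.yaml` and not confirmed by the source; `group_by_province` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_province(paths):
    """Group custodian YAML paths by their '<country>-<province>' filename prefix."""
    groups = defaultdict(list)
    for path in paths:
        name = PurePosixPath(path).name
        segments = name.split("-")
        if len(segments) < 3:
            continue  # filename does not follow the assumed naming pattern
        groups["-".join(segments[:2])].append(path)
    return dict(groups)
```

Each key (for example `NL-DR`) then maps onto one `git add data/custodian/<key>-*.yaml` and one commit.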
Option 2: Commit LinkedIn Scripts
git add scripts/*linkedin*.py scripts/extract_*.py scripts/verify_*.py
git commit -m "feat(scripts): Add LinkedIn profile extraction and matching"
Option 3: Continue with Country-Specific Work
- Czech Republic: ISIL code investigation (Task 6)
- Argentina: IRAM email + LinkML export
- Netherlands: GHCID generation for new custodians
🔗 Schema Version Status
Current Version: v0.9.10 (post-today's commits)
New Schema Components
| Module | Classes | Enums | Slots |
|---|---|---|---|
| Video | 14 | 12 | - |
| LinkedIn/Person | 9 | 1 | 17 |
| SocialMedia | 4 | - | - |
Total Schema (estimated)
- Classes: ~150+
- Enums: ~60+
- Slots: ~200+
🗂️ Key File Locations
Schema Files (committed today)
schemas/20251121/linkml/modules/classes/Video*.yaml (9 files)
schemas/20251121/linkml/modules/classes/PersonName.yaml
schemas/20251121/linkml/modules/classes/LinkedInProfile.yaml
schemas/20251121/linkml/modules/classes/WorkExperience.yaml
schemas/20251121/linkml/modules/classes/SocialMedia*.yaml (4 files)
schemas/20251121/linkml/examples/video_content_examples.yaml
Uncommitted Work
data/custodian/*.yaml (1,759 files)
frontend/public/* (51 files)
frontend/src/* (15 files)
scripts/*linkedin*.py, scripts/extract_*.py, scripts/verify_*.py (9 scripts)
🇨🇿 Czech Republic Status (Previous Session)
Completed
- ✅ ARON Metadata Analysis (no contact data)
- ✅ Wikidata Enrichment (77.3% coverage, 6,719 matches)
- ✅ Dataset #1 globally (8,694 institutions)
Pending
- 🔲 Task 6: ISIL code investigation
🇦🇷 Argentina Status
Completed
- ✅ CONABIP Libraries (288 scraped + enriched)
- ✅ AGN national archive scraped
- ✅ Email drafts ready
Pending
- 🔲 Send IRAM email for ISIL registry
- 🔲 LinkML export of CONABIP data
Quick Commands
# View recent commits
git log --oneline -10
# Check uncommitted changes
git status --short | wc -l
# Review custodian changes
git diff data/custodian/NL-DR-ASS-A-DA.yaml
# Lint a schema module (linkml-validate checks data files against a schema passed with --schema)
linkml-lint schemas/20251121/linkml/modules/classes/VideoPost.yaml
Session End: 2025-12-16
Next Action: Choose between committing custodian data, committing LinkedIn scripts, or continuing country work