# Next Session Handoff
**Last Updated**: 2025-12-16
**Current Focus**: Schema development complete - Video, LinkedIn/Person, SocialMedia schemas committed
---
## 📦 Schema Work Completed This Session (2025-12-16) ✅
### Commits Made
#### 1. Video Content Schema (10 files, 7,250 insertions)
**Commit**: `3991751c78`
**Classes (9 files)**:
- `VideoPost`, `VideoComment` - Social media video modeling
- `VideoTextContent` - Base class for text content extraction
- `VideoTranscript`, `VideoSubtitle` - Text with timing and formatting
- `VideoTimeSegment` - Time code handling with ISO 8601 duration
- `VideoAnnotation` - Base annotation with W3C Web Annotation alignment
- `VideoAnnotationTypes` - Scene, Object, OCR detection annotations
- `VideoChapter`, `VideoChapterList` - Navigation and chapter structure
- `VideoAudioAnnotation` - Speaker diarization, music, sound events
**Enumerations (12)**:
- `VideoDefinitionEnum`, `LiveBroadcastStatusEnum`
- `TranscriptFormatEnum`, `SubtitleFormatEnum`, `SubtitlePositionEnum`
- `AnnotationTypeEnum`, `AnnotationMotivationEnum`
- `DetectionLevelEnum`, `SceneTypeEnum`, `TransitionTypeEnum`, `TextTypeEnum`
- `ChapterSourceEnum`, `AudioEventTypeEnum`, `SoundEventTypeEnum`, `MusicTypeEnum`
**Examples** (904 lines, 10 heritage-themed examples):
- Rijksmuseum virtual tour chapters (5 chapters with Wikidata refs)
- Operation Night Watch documentary chapters (5 chapters)
- VideoAudioAnnotation: curator interview, exhibition promo, museum lecture
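
As a rough illustration of the ISO 8601 duration handling mentioned for `VideoTimeSegment`, the sketch below formats a time offset in whole seconds as a duration string. The function name `to_iso8601_duration` is hypothetical and is not taken from the committed schema; the actual slot names and accepted formats are defined in the class files.

```python
def to_iso8601_duration(total_seconds: int) -> str:
    """Format a non-negative number of whole seconds as an ISO 8601 duration."""
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    # Emit seconds when they are non-zero, or when nothing else was emitted,
    # so that a zero-length segment becomes "PT0S" rather than a bare "PT".
    if seconds or not parts:
        parts.append(f"{seconds}S")
    return "PT" + "".join(parts)
```

For example, a chapter starting 75 seconds into a video would be written as `PT1M15S` under this scheme.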
#### 2. LinkedIn Profile & Person Modeling (27 files, 4,369 insertions)
**Commit**: `f30f39d93e`
**Classes (9)**:
- `PersonName` - Dutch naming conventions (surname_prefix, patronym, etc.)
- `PersonConnection` - Professional network with heritage relevance
- `ConnectionNetwork` - Network-level analysis and statistics
- `LinkedInProfile` - Complete professional profile structure
- `WorkExperience` - Employment history with heritage institution detection
- `EducationCredential` - Academic background and qualifications
- `LanguageProficiency` - Language skills with ISO 639-1 codes
- `ExtractionMetadata` - Provenance tracking for extracted data
- `HeritageRelevance` - GLAMORCUBESFIXPHDNT type scoring
**Slots (17)**:
- Name: `given_name`, `base_surname`, `surname_prefix`, `patronym`, `initials`
- Identity: `age`, `birth_date`, `birth_place`, `death_place`, `gender_identity`, `pronouns`
- Professional: `occupation`, `religion`
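
To make the Dutch naming slots concrete, here is a minimal sketch of how a display name might be split into `given_name`, `surname_prefix`, and `base_surname`. The `ParsedName` class, the `split_name` helper, and the prefix list are assumptions for illustration only; they are not the committed `PersonName` class, and the sketch assumes the first token is the given name.

```python
from dataclasses import dataclass

# Illustrative subset of Dutch surname prefixes (tussenvoegsels); not exhaustive.
PREFIXES = ("van der", "van den", "van de", "van", "de", "den", "ter", "ten")


@dataclass
class ParsedName:
    given_name: str
    surname_prefix: str
    base_surname: str


def split_name(full_name: str) -> ParsedName:
    """Split 'Given [prefix] Surname' into its three parts."""
    given, _, rest = full_name.strip().partition(" ")
    lowered = rest.lower()
    # Try longer prefixes first so that "van der" wins over "van".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix + " "):
            return ParsedName(given, rest[:len(prefix)], rest[len(prefix) + 1:])
    return ParsedName(given, "", rest)
```

Keeping the prefix separate from the base surname allows sorting on `base_surname`, which is the usual Dutch convention.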
#### 3. Social Media Post Schema (4 files, 2,280 insertions)
**Commit**: `3b05ace16f`
**Classes**:
- `SocialMediaPost` - Platform-agnostic post modeling
- `SocialMediaPostType`, `SocialMediaPostTypes` - Post type taxonomy
- `SocialMediaContent` - Rich content with media, hashtags, mentions
#### 4. Schema Annotation Fix (51 files, 451 insertions)
**Commit**: `14e7c13d41`
Fixed YAML quoting for `custodian_types` annotations across all class files:
- Before: `custodian_types: ["A", "G"]`
- After: `custodian_types: '["A", "G"]'`
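
The difference matters because YAML parses the unquoted form as a sequence and the quoted form as a single string scalar. A rewrite of this kind could be scripted roughly as follows. This is a sketch, not the script used for the commit: the `quote_custodian_types` helper is hypothetical, it expects lines without a trailing newline (for example from `str.splitlines()`), and it assumes the list contains no single quotes.

```python
import re

# Matches a `custodian_types:` line whose value is an unquoted flow list.
UNQUOTED = re.compile(r"^(\s*custodian_types:\s*)(\[.*\])\s*$")


def quote_custodian_types(line: str) -> str:
    """Wrap an unquoted flow-list value in single quotes; leave other lines alone."""
    match = UNQUOTED.match(line)
    if match is None:
        return line  # already quoted, or a different key
    return f"{match.group(1)}'{match.group(2)}'"
```

Because already-quoted lines do not match the pattern, running the helper twice over the same file leaves it unchanged.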
---
## 📊 Remaining Uncommitted Changes
### Summary
| Category | Modified | Untracked | Total |
|----------|----------|-----------|-------|
| **data/custodian/** | 380 | 1,379 | 1,759 |
| **frontend/** | 66 | 50 | 116 |
| **scripts/** | 0 | 9 | 9 |
| **Other** | 3 | 0 | 3 |
| **TOTAL** | 449 | 1,438 | 1,887 |
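
A table like this can be recomputed from `git status --short` output. The sketch below is one possible way to do it; the `categorize` and `tally` helpers are hypothetical, and every status code other than `??` is counted as modified, which ignores renames and deletions.

```python
from collections import Counter


def categorize(path: str) -> str:
    """Map a repository path to one of the summary-table categories."""
    for prefix in ("data/custodian/", "frontend/", "scripts/"):
        if path.startswith(prefix):
            return prefix
    return "Other"


def tally(status_lines):
    """Count (category, kind) pairs from `git status --short` lines."""
    counts = Counter()
    for line in status_lines:
        if len(line) < 4:
            continue  # too short to hold "XY path"
        code, path = line[:2], line[3:]
        # "??" marks untracked paths; every other code is treated as modified here.
        kind = "untracked" if code == "??" else "modified"
        counts[(categorize(path), kind)] += 1
    return counts
```

The input lines should not be stripped before tallying, since a leading space is part of the two-character status code. Untracked directories are reported as a single entry unless `git status` is run with `--untracked-files=all`.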
### Untracked Custodian Files (~1,379)
New Dutch custodian YAML files created by the enrichment pipelines. They need review before committing.
### Modified Custodian Files (~380)
Enrichment updates to existing custodian records (digital platforms, web claims, etc.)
### New Scripts (9)
LinkedIn integration and enrichment scripts:
- `scripts/build_linkedin_index.py`
- `scripts/extract_about_page_data.py`
- `scripts/extract_timeline_events.py`
- `scripts/generate_linkedin_custodian_yaml.py`
- `scripts/match_linkedin_by_name.py`
- `scripts/match_linkedin_by_name_fast.py`
- `scripts/match_linkedin_names_ultra.py`
- `scripts/merge_linkedin_to_custodians.py`
- `scripts/verify_website_links.py`
---
## 🎯 Priority Next Steps
### Option 1: Commit Custodian Data Batch
```bash
# Review a sample of changes
cd /Users/kempersc/apps/glam
git diff data/custodian/NL-DR-ASS-A-DA.yaml | head -50
# Stage and commit in batches by province
git add data/custodian/NL-DR-*.yaml
git commit -m "data(NL-DR): Enrich Drenthe custodians with digital platforms"
```
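
To plan the per-province batches before staging, the file names can be grouped by prefix. This is a minimal sketch that assumes the first two hyphen-separated fields of a custodian file name are the country and province codes, as `NL-DR-ASS-A-DA.yaml` suggests; `group_by_province` is a hypothetical helper, not one of the repository scripts.

```python
from collections import defaultdict


def group_by_province(filenames):
    """Group custodian file names by country-province prefix, e.g. 'NL-DR'."""
    batches = defaultdict(list)
    for name in filenames:
        parts = name.split("-")
        if len(parts) < 3:
            continue  # does not follow the assumed naming pattern
        batches["-".join(parts[:2])].append(name)
    return dict(batches)
```

Each key then corresponds to one `git add data/custodian/<prefix>-*.yaml` call and one commit.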
### Option 2: Commit LinkedIn Scripts
```bash
git add scripts/*linkedin*.py scripts/extract_*.py scripts/verify_*.py
git commit -m "feat(scripts): Add LinkedIn profile extraction and matching"
```
### Option 3: Continue with Country-Specific Work
- Czech Republic: ISIL code investigation (Task 6)
- Argentina: IRAM email + LinkML export
- Netherlands: GHCID generation for new custodians
---
## 🔗 Schema Version Status
**Current Version**: v0.9.10 (after today's commits)
### New Schema Components
| Module | Classes | Enums | Slots |
|--------|---------|-------|-------|
| **Video** | 14 | 12 | - |
| **LinkedIn/Person** | 9 | 1 | 17 |
| **SocialMedia** | 4 | - | - |
### Total Schema (estimated)
- Classes: ~150+
- Enums: ~60+
- Slots: ~200+
---
## 🗂️ Key File Locations
### Schema Files (committed today)
```
schemas/20251121/linkml/modules/classes/Video*.yaml (9 files)
schemas/20251121/linkml/modules/classes/PersonName.yaml
schemas/20251121/linkml/modules/classes/LinkedInProfile.yaml
schemas/20251121/linkml/modules/classes/WorkExperience.yaml
schemas/20251121/linkml/modules/classes/SocialMedia*.yaml (4 files)
schemas/20251121/linkml/examples/video_content_examples.yaml
```
### Uncommitted Work
```
data/custodian/*.yaml (1,759 files)
frontend/public/* (51 files)
frontend/src/* (15 files)
scripts/*linkedin*.py, scripts/extract_*.py, scripts/verify_*.py (9 scripts)
```
---
## 🇨🇿 Czech Republic Status (Previous Session)
### Completed
- ✅ ARON Metadata Analysis (no contact data)
- ✅ Wikidata Enrichment (77.3% coverage, 6,719 matches)
- ✅ Dataset ranks #1 globally (8,694 institutions)
### Pending
- 🔲 Task 6: ISIL code investigation
---
## 🇦🇷 Argentina Status
### Completed
- ✅ CONABIP Libraries (288 scraped + enriched)
- ✅ AGN national archive scraped
- ✅ Email drafts ready
### Pending
- 🔲 Send IRAM email for ISIL registry
- 🔲 LinkML export of CONABIP data
---
## Quick Commands
```bash
# View recent commits
git log --oneline -10
# Count uncommitted changes (one line per changed or untracked path)
git status --short | wc -l
# Review custodian changes
git diff data/custodian/NL-DR-ASS-A-DA.yaml
# Lint a schema module (linkml-validate checks data files against a schema passed with --schema)
linkml-lint schemas/20251121/linkml/modules/classes/VideoPost.yaml
```
---
**Session End**: 2025-12-16
**Next Action**: Choose between committing custodian data, LinkedIn scripts, or continuing country work