# Next Session Handoff

**Last Updated**: 2025-12-16

**Current Focus**: Schema development complete - Video, LinkedIn/Person, and SocialMedia schemas committed

---

## 📦 Schema Work Completed This Session (2025-12-16) ✅

### Commits Made

#### 1. Video Content Schema (10 files, 7,250 insertions)

**Commit**: `3991751c78`

**Classes (9 files)**:

- `VideoPost`, `VideoComment` - Social media video modeling
- `VideoTextContent` - Base class for text content extraction
- `VideoTranscript`, `VideoSubtitle` - Text with timing and formatting
- `VideoTimeSegment` - Time code handling with ISO 8601 durations
- `VideoAnnotation` - Base annotation aligned with the W3C Web Annotation model
- `VideoAnnotationTypes` - Scene, Object, OCR detection annotations
- `VideoChapter`, `VideoChapterList` - Navigation and chapter structure
- `VideoAudioAnnotation` - Speaker diarization, music, sound events
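
`VideoTimeSegment` expresses time codes as ISO 8601 durations. When spot-checking example data by hand (for instance, that chapter start times increase), it can help to turn a duration into seconds. The shell sketch below assumes time-only durations of the form `PT#H#M#S` with optional fractional seconds; `iso_duration_to_seconds` is a hypothetical helper, not part of the repository or the schema.

```bash
# Hypothetical helper (not in the repo): convert a time-only ISO 8601 duration
# such as PT1H2M3.5S into seconds. Day/month/year parts (P1DT2H, ...) are not
# handled; input that does not match PT[nH][nM][n[.n]S] exits with status 1.
iso_duration_to_seconds() {
  printf '%s\n' "$1" | awk '
    $0 != "PT" && /^PT([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?$/ {
      rest = substr($0, 3)   # drop the leading "PT"
      total = 0; num = ""
      for (i = 1; i <= length(rest); i++) {
        c = substr(rest, i, 1)
        if (c == "H")      { total += num * 3600; num = "" }
        else if (c == "M") { total += num * 60;   num = "" }
        else if (c == "S") { total += num;        num = "" }
        else               { num = num c }   # accumulate digits and "."
      }
      printf "%.10g\n", total
      ok = 1
    }
    END { if (!ok) exit 1 }
  '
}
```

For example, `iso_duration_to_seconds PT1H2M3.5S` prints `3723.5`. Durations with day or calendar parts are rejected rather than guessed at.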

**Enumerations (12)**:

- `VideoDefinitionEnum`, `LiveBroadcastStatusEnum`
- `TranscriptFormatEnum`, `SubtitleFormatEnum`, `SubtitlePositionEnum`
- `AnnotationTypeEnum`, `AnnotationMotivationEnum`
- `DetectionLevelEnum`, `SceneTypeEnum`, `TransitionTypeEnum`, `TextTypeEnum`
- `ChapterSourceEnum`, `AudioEventTypeEnum`, `SoundEventTypeEnum`, `MusicTypeEnum`

**Examples** (904 lines, 10 heritage-themed examples):

- Rijksmuseum virtual tour chapters (5 chapters with Wikidata refs)
- Operation Night Watch documentary chapters (5 chapters)
- `VideoAudioAnnotation`: curator interview, exhibition promo, museum lecture

#### 2. LinkedIn Profile & Person Modeling (27 files, 4,369 insertions)

**Commit**: `f30f39d93e`

**Classes (9)**:

- `PersonName` - Dutch naming conventions (`surname_prefix`, `patronym`, etc.)
- `PersonConnection` - Professional network with heritage relevance
- `ConnectionNetwork` - Network-level analysis and statistics
- `LinkedInProfile` - Complete professional profile structure
- `WorkExperience` - Employment history with heritage institution detection
- `EducationCredential` - Academic background and qualifications
- `LanguageProficiency` - Language skills with ISO 639-1 codes
- `ExtractionMetadata` - Provenance tracking for extracted data
- `HeritageRelevance` - GLAMORCUBESFIXPHDNT type scoring

**Slots (17)**:

- Name: `given_name`, `base_surname`, `surname_prefix`, `patronym`, `initials`
- Identity: `age`, `birth_date`, `birth_place`, `death_place`, `gender_identity`, `pronouns`
- Other: `occupation`, `religion`
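
`PersonName` keeps the Dutch surname prefix (tussenvoegsel) separate from `base_surname`. Under the usual convention in the Netherlands, names are alphabetised on the base surname, and the prefix is capitalised only when no given name or initials precede it. The sketch below illustrates those two rules for reviewing extracted records; both functions are hypothetical, assume ASCII input, ignore `patronym` and `initials`, and are not the schema's own logic.

```bash
# Hypothetical helpers (not in the repo) illustrating Netherlands conventions
# for the PersonName slots given_name, surname_prefix and base_surname.

# Sort key: base surname first, then given name and prefix ("Berg, Jan van der").
dutch_sort_name() {  # $1=given_name  $2=surname_prefix (may be empty)  $3=base_surname
  local rest
  rest=$(printf '%s %s' "$1" "$2" | sed -E 's/^ +//; s/ +$//')
  if [ -n "$rest" ]; then
    printf '%s, %s\n' "$3" "$rest"
  else
    printf '%s\n' "$3"
  fi
}

# Surname with no given name in front: capitalise the prefix's first letter
# ("van der" + "Berg" -> "Van der Berg").
standalone_surname() {  # $1=surname_prefix (may be empty)  $2=base_surname
  local head tail
  if [ -z "$1" ]; then
    printf '%s\n' "$2"
    return
  fi
  head=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
  tail=$(printf '%s' "$1" | cut -c2-)
  printf '%s%s %s\n' "$head" "$tail" "$2"
}
```

For example, `dutch_sort_name Jan "van der" Berg` prints `Berg, Jan van der`, and `standalone_surname "van der" Berg` prints `Van der Berg`.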

#### 3. Social Media Post Schema (4 files, 2,280 insertions)

**Commit**: `3b05ace16f`

**Classes**:

- `SocialMediaPost` - Platform-agnostic post modeling
- `SocialMediaPostType`, `SocialMediaPostTypes` - Post type taxonomy
- `SocialMediaContent` - Rich content with media, hashtags, mentions

#### 4. Schema Annotation Fix (51 files, 451 insertions)

**Commit**: `14e7c13d41`

Fixed YAML quoting for `custodian_types` annotations across all class files, so the value is stored as a quoted string instead of being parsed as a YAML list:

- Before: `custodian_types: ["A", "G"]`
- After: `custodian_types: '["A", "G"]'`
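
To confirm that no class file still carries the old unquoted form, a recursive grep is enough. A minimal sketch, assuming the annotation sits on a single line and the class modules live under `schemas/20251121/linkml/modules/classes` (the path used under Key File Locations); `find_unquoted_custodian_types` is a hypothetical name:

```bash
# Hypothetical check (not in the repo): print every YAML line that still has an
# unquoted flow sequence such as `custodian_types: ["A", "G"]`.
# The quoted form `custodian_types: '["A", "G"]'` does not match, because a
# quote character follows the colon. Exit status is 0 if anything is found.
find_unquoted_custodian_types() {
  grep -rnE --include='*.yaml' \
    'custodian_types:[[:space:]]*\[' \
    "${1:-schemas/20251121/linkml/modules/classes}"
}
```

No output (exit status 1) means no unquoted `custodian_types` lists remain in that directory.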

---

## 📊 Remaining Uncommitted Changes

### Summary

| Category | Modified | Untracked | Total |
|----------|----------|-----------|-------|
| **data/custodian/** | 380 | 1,379 | 1,759 |
| **frontend/** | 66 | 50 | 116 |
| **scripts/** | 0 | 9 | 9 |
| **Other** | 3 | 0 | 3 |
| **TOTAL** | 449 | 1,438 | 1,887 |
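
These counts can be regenerated from `git status`. A minimal sketch, assuming porcelain v1 output; `summarize_status` is a hypothetical helper that groups by the first path component, so files under `data/custodian/` are reported as `data/` and root-level files as `Other`:

```bash
# Hypothetical helper (not in the repo): read `git status --porcelain` output on
# stdin and print "<area>\t<modified>\t<untracked>" per top-level directory.
# Every status other than "??" (untracked) is counted as modified.
summarize_status() {
  awk '
    {
      code = substr($0, 1, 2)   # XY status code, e.g. " M" or "??"
      path = substr($0, 4)      # path starts after "XY "
      if (index(path, "/") > 0) area = substr(path, 1, index(path, "/"))
      else area = "Other"
      if (code == "??") untracked[area]++
      else modified[area]++
      seen[area] = 1
    }
    END {
      for (a in seen) printf "%s\t%d\t%d\n", a, modified[a], untracked[a]
    }
  ' | LC_ALL=C sort
}
```

Usage: `git status --porcelain --untracked-files=all | summarize_status`. The `--untracked-files=all` flag lists every untracked file individually; without it, a directory that is entirely untracked is reported as a single entry.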

### Untracked Custodian Files (~1,379)

New Dutch custodian YAML files created by the enrichment pipelines; they need review before committing.

### Modified Custodian Files (~380)

Enrichment updates to existing custodian records (digital platforms, web claims, etc.).

### New Scripts (9)

LinkedIn integration and enrichment scripts:

- `scripts/build_linkedin_index.py`
- `scripts/extract_about_page_data.py`
- `scripts/extract_timeline_events.py`
- `scripts/generate_linkedin_custodian_yaml.py`
- `scripts/match_linkedin_by_name.py`
- `scripts/match_linkedin_by_name_fast.py`
- `scripts/match_linkedin_names_ultra.py`
- `scripts/merge_linkedin_to_custodians.py`
- `scripts/verify_website_links.py`

---

## 🎯 Priority Next Steps

### Option 1: Commit Custodian Data Batch

```bash
# Review a sample of changes
cd /Users/kempersc/apps/glam
git diff data/custodian/NL-DR-ASS-A-DA.yaml | head -50

# Stage and commit in batches by province (here: Drenthe)
git add data/custodian/NL-DR-*.yaml
git commit -m "data(NL-DR): Enrich Drenthe custodians with digital platforms"
```
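
The same batching can be repeated for every province. A sketch, assuming custodian files follow the `NL-<province>-...yaml` naming seen in `NL-DR-ASS-A-DA.yaml`; `province_codes` and `stage_province` are hypothetical helpers. Staging and committing are kept separate so each batch can still be reviewed before it is committed.

```bash
# Hypothetical helpers (not in the repo).

# Read file paths on stdin (one per line) and print the distinct
# NL province codes they start with, e.g. NL-DR, NL-NH.
province_codes() {
  sed -nE 's#^(.*/)?(NL-[A-Z]{2})-[^/]*\.yaml$#\2#p' | LC_ALL=C sort -u
}

# Stage one province's custodian files and summarise the staged changes,
# leaving the commit itself (and its message) to the reviewer.
stage_province() {  # $1 = province code, e.g. NL-DR
  git add -- "data/custodian/$1-"*.yaml
  git diff --cached --stat
}
```

List the provinces with pending changes via `git status --porcelain --untracked-files=all -- data/custodian | cut -c4- | province_codes`, then run `stage_province NL-DR` and commit with a message like the one above.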

### Option 2: Commit LinkedIn Scripts

```bash
git add scripts/*linkedin*.py scripts/extract_*.py scripts/verify_*.py
git commit -m "feat(scripts): Add LinkedIn profile extraction and matching"
```
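
Before committing, it is worth confirming that the three globs stage exactly the nine scripts listed under New Scripts and nothing else from `scripts/`. A minimal sketch; `check_staged_scripts` is a hypothetical helper that reads staged paths on stdin:

```bash
# Hypothetical check (not in the repo): compare the staged scripts/ paths
# (read on stdin) against the nine new scripts listed in this handoff.
check_staged_scripts() {
  local expected actual
  expected=$(printf 'scripts/%s\n' \
    build_linkedin_index.py extract_about_page_data.py extract_timeline_events.py \
    generate_linkedin_custodian_yaml.py match_linkedin_by_name.py \
    match_linkedin_by_name_fast.py match_linkedin_names_ultra.py \
    merge_linkedin_to_custodians.py verify_website_links.py | LC_ALL=C sort)
  actual=$(grep '^scripts/' | LC_ALL=C sort)
  if [ "$expected" = "$actual" ]; then
    echo "OK: exactly the 9 new scripts are staged"
  else
    echo "Staged scripts differ from the expected list" >&2
    return 1
  fi
}
```

Usage, after the `git add` line: `git diff --cached --name-only | check_staged_scripts`.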

### Option 3: Continue with Country-Specific Work

- Czech Republic: ISIL code investigation (Task 6)
- Argentina: IRAM email + LinkML export
- Netherlands: GHCID generation for new custodians

---

## 🔗 Schema Version Status

**Current Version**: v0.9.10 (after today's commits)

### New Schema Components

| Module | Classes | Enums | Slots |
|--------|---------|-------|-------|
| **Video** | 14 | 12 | - |
| **LinkedIn/Person** | 9 | 1 | 17 |
| **SocialMedia** | 4 | - | - |

### Total Schema (estimated)

- Classes: ~150+
- Enums: ~60+
- Slots: ~200+
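
These totals are estimates. A rough way to firm them up is to count top-level definitions in the module files. The sketch assumes the usual LinkML layout (top-level `classes:`, `enums:` and `slots:` keys, each definition name indented two spaces) under `schemas/20251121/linkml/modules`, and that every file ends with a newline; `count_schema_components` is hypothetical, and a definition repeated in several files is counted each time.

```bash
# Hypothetical helper (not in the repo): count top-level class, enum and slot
# definitions across all module YAML files. Only names indented exactly two
# spaces are counted; deeper keys (attributes, permissible values) are ignored.
count_schema_components() {
  find "${1:-schemas/20251121/linkml/modules}" -name '*.yaml' -exec cat {} + | awk '
    /^[A-Za-z_]+:/ {          # a new top-level key starts a section
      section = $1
      sub(/:.*/, "", section)
      next
    }
    (section == "classes" || section == "enums" || section == "slots") && /^  [A-Za-z0-9_]+:[[:space:]]*$/ { n[section]++ }
    END {
      printf "classes=%d enums=%d slots=%d\n", n["classes"], n["enums"], n["slots"]
    }
  '
}
```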

---

## 🗂️ Key File Locations

### Schema Files (committed today)

```
schemas/20251121/linkml/modules/classes/Video*.yaml (9 files)
schemas/20251121/linkml/modules/classes/PersonName.yaml
schemas/20251121/linkml/modules/classes/LinkedInProfile.yaml
schemas/20251121/linkml/modules/classes/WorkExperience.yaml
schemas/20251121/linkml/modules/classes/SocialMedia*.yaml (4 files)
schemas/20251121/linkml/examples/video_content_examples.yaml
```

### Uncommitted Work

```
data/custodian/*.yaml (1,759 files)
frontend/public/* (51 files)
frontend/src/* (15 files)
scripts/ (9 new scripts)
```

---

## 🇨🇿 Czech Republic Status (Previous Session)

### Completed

- ✅ ARON Metadata Analysis (no contact data)
- ✅ Wikidata Enrichment (77.3% coverage, 6,719 matches)
- ✅ Dataset #1 globally (8,694 institutions)

### Pending

- 🔲 Task 6: ISIL code investigation

---

## 🇦🇷 Argentina Status

### Completed

- ✅ CONABIP Libraries (288 scraped + enriched)
- ✅ AGN national archive scraped
- ✅ Email drafts ready

### Pending

- 🔲 Send IRAM email for ISIL registry
- 🔲 LinkML export of CONABIP data

---

## Quick Commands

```bash
# View recent commits
git log --oneline -10

# Count uncommitted changes
git status --short | wc -l

# Review custodian changes
git diff data/custodian/NL-DR-ASS-A-DA.yaml

# Lint a schema module (linkml-validate checks data files against a schema, not the schema itself)
linkml-lint schemas/20251121/linkml/modules/classes/VideoPost.yaml
```

---

**Session End**: 2025-12-16

**Next Action**: Choose between committing the custodian data, committing the LinkedIn scripts, or continuing country-specific work