# V5 Extraction - Quick Reference

**Status:** ✅ 75% PRECISION ACHIEVED
## Architecture

```
Conversation Text → Subagent NER → V5 Validation → Clean Institutions (75% precision)
```
## What Works
- ✅ Subagent NER: Clean, accurate names (no mangling)
- ✅ V5 Validation: 3 filters (country, organization, proper name)
- ✅ 75% precision: 3/4 correct (up from V4's 50%)
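The three V5 filters above (country, organization, proper name) can be sketched as simple predicate checks. This is a minimal illustration only; the keyword lists, ISO subset, and the `is_valid_institution` helper are assumptions, not the actual V5 implementation:

```python
# Sketch of the three V5 validation filters. Keyword lists and function
# names are illustrative assumptions, not the real V5 code.

ORG_KEYWORDS = {"network", "association", "consortium", "foundation"}
ISO_COUNTRIES = {"NL", "DE", "FR", "GB", "US"}  # abbreviated example set

def passes_country_filter(inst: dict) -> bool:
    # Reject candidates whose country is not a known 2-letter ISO code.
    return inst.get("country") in ISO_COUNTRIES

def passes_organization_filter(inst: dict) -> bool:
    # Reject networks/associations, which are not heritage institutions.
    name = inst.get("name", "").lower()
    return not any(kw in name for kw in ORG_KEYWORDS)

def passes_proper_name_filter(inst: dict) -> bool:
    # Require a capitalized name of plausible length.
    name = inst.get("name", "")
    return len(name) > 2 and name[0].isupper()

def is_valid_institution(inst: dict) -> bool:
    # A candidate survives only if all three filters pass.
    return (passes_country_filter(inst)
            and passes_organization_filter(inst)
            and passes_proper_name_filter(inst))

print(is_valid_institution({"name": "Van Abbemuseum", "country": "NL"}))  # → True
print(is_valid_institution({"name": "heritage network", "country": "NL"}))  # → False
```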
## What Doesn't Work
- ❌ Pattern-based extraction: 0% precision (names mangled)
## Commands

Run V5 demonstration:

```bash
bash /Users/kempersc/apps/glam/scripts/demo_v5_success.sh
```

Test subagent + V5 integration:

```bash
python /Users/kempersc/apps/glam/scripts/test_subagent_v5_integration.py
```
## Subagent NER Prompt Template

```
Extract ALL heritage institutions from the following text.

Return JSON array with:
{
  "name": "Full institution name",
  "institution_type": "MUSEUM | ARCHIVE | LIBRARY | GALLERY",
  "city": "City name",
  "country": "2-letter ISO code",
  "isil_code": "ISIL code if mentioned",
  "confidence": 0.0-1.0
}

Rules:
1. Preserve full names (e.g., "Van Abbemuseum", not "Abbemuseum")
2. Classify by primary function
3. Determine country from city names or context
4. Exclude: organizations, networks, generic descriptors
```
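A subagent reply following this schema can be parsed and sanity-checked before validation. The sketch below is illustrative: `parse_ner_response` and the sample payload are assumptions, not code from the project.

```python
import json

# Sketch: check a subagent NER response against the schema above.
# parse_ner_response and the sample payload are illustrative assumptions.

REQUIRED_KEYS = {"name", "institution_type", "city", "country", "confidence"}
VALID_TYPES = {"MUSEUM", "ARCHIVE", "LIBRARY", "GALLERY"}

def parse_ner_response(raw: str) -> list[dict]:
    records = json.loads(raw)
    valid = []
    for rec in records:
        if not REQUIRED_KEYS <= rec.keys():
            continue  # skip records missing required fields
        if rec["institution_type"] not in VALID_TYPES:
            continue  # enforce the enum from the prompt
        if not 0.0 <= rec["confidence"] <= 1.0:
            continue  # confidence must be in [0, 1]
        valid.append(rec)
    return valid

raw = ('[{"name": "Van Abbemuseum", "institution_type": "MUSEUM", '
       '"city": "Eindhoven", "country": "NL", "isil_code": null, '
       '"confidence": 0.95}]')
print(parse_ner_response(raw))  # one valid record survives
```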
## Next Steps for Production

- Implement `extract_from_text_subagent()` in `InstitutionExtractor`
- Update batch extraction scripts
- Process 139 conversation files
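The first step above might look roughly like this. It is a sketch under stated assumptions: `call_subagent` is an assumed helper that runs the NER prompt and returns raw JSON, and the class internals and confidence cutoff are hypothetical, not the project's actual `InstitutionExtractor`.

```python
import json

class InstitutionExtractor:
    """Sketch of the planned subagent-backed extraction path.

    call_subagent is an assumed callable that sends the NER prompt for a
    text to a subagent and returns its raw JSON reply.
    """

    def __init__(self, call_subagent):
        self.call_subagent = call_subagent

    def extract_from_text_subagent(self, text: str) -> list[dict]:
        raw = self.call_subagent(text)   # subagent NER step
        candidates = json.loads(raw)     # schema from the prompt template
        # The V5 validation filters would run here; as a stand-in, keep
        # only candidates above a hypothetical confidence cutoff.
        return [c for c in candidates if c.get("confidence", 0.0) >= 0.5]

# Usage with a stubbed subagent:
stub = lambda text: '[{"name": "Van Abbemuseum", "confidence": 0.95}]'
extractor = InstitutionExtractor(stub)
print(extractor.extract_from_text_subagent("some conversation text"))
```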
## Files

- Documentation: `output/V5_VALIDATION_SUMMARY.md`
- Session Summary: `SESSION_SUMMARY_V5.md`
- Test Scripts: `scripts/test_subagent_v5_integration.py`
- Demo: `scripts/demo_v5_success.sh`
**Result:** V5 achieves 75% precision via subagent NER + validation filters