# V5 Extraction - Quick Reference

## Status: ✅ 75% PRECISION ACHIEVED

### Architecture

```
Conversation Text → Subagent NER → V5 Validation → Clean Institutions (75% precision)
```

### What Works

- ✅ **Subagent NER**: clean, accurate names (no mangling)
- ✅ **V5 Validation**: three filters (country, organization, proper name)
- ✅ **75% precision**: 3/4 correct (up from V4's 50%)

### What Doesn't Work

- ❌ **Pattern-based extraction**: 0% precision (names mangled)

### Commands

**Run the V5 demonstration:**

```bash
bash /Users/kempersc/apps/glam/scripts/demo_v5_success.sh
```

**Test the subagent + V5 integration:**

```bash
python /Users/kempersc/apps/glam/scripts/test_subagent_v5_integration.py
```

### Subagent NER Prompt Template

```
Extract ALL heritage institutions from the following text.

Return a JSON array of objects with:
{
  "name": "Full institution name",
  "institution_type": "MUSEUM | ARCHIVE | LIBRARY | GALLERY",
  "city": "City name",
  "country": "2-letter ISO code",
  "isil_code": "ISIL code if mentioned",
  "confidence": 0.0-1.0
}

Rules:
1. Preserve full names (e.g., "Van Abbemuseum", not "Abbemuseum")
2. Classify by primary function
3. Determine country from city names or context
4. Exclude: organizations, networks, generic descriptors
```

### Next Steps for Production

1. Implement `extract_from_text_subagent()` in `InstitutionExtractor`
2. Update the batch extraction scripts
3. Process the 139 conversation files

### Files

- **Documentation**: `output/V5_VALIDATION_SUMMARY.md`
- **Session Summary**: `SESSION_SUMMARY_V5.md`
- **Test Script**: `scripts/test_subagent_v5_integration.py`
- **Demo**: `scripts/demo_v5_success.sh`

---

**Result:** V5 achieves 75% precision via subagent NER + validation filters.
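
### Validation Sketch

The V5 stage consumes the subagent's JSON array and applies the three filters (country, organization, proper name) before institutions are accepted. The sketch below illustrates that flow under stated assumptions: the filter logic, keyword lists, and country set are hypothetical placeholders, not the actual `InstitutionExtractor` implementation.

```python
import json

# Illustrative stand-ins; the real V5 filters may use different criteria.
VALID_COUNTRIES = {"NL", "DE", "BE", "FR", "GB"}  # example ISO 3166-1 codes
ORG_KEYWORDS = {"network", "association", "consortium", "foundation"}

def passes_country_filter(inst: dict) -> bool:
    # Filter 1: country must be a recognized 2-letter ISO code.
    return inst.get("country") in VALID_COUNTRIES

def passes_organization_filter(inst: dict) -> bool:
    # Filter 2: reject networks/umbrella organizations, per the prompt's
    # exclusion rule ("organizations, networks, generic descriptors").
    name = inst.get("name", "").lower()
    return not any(kw in name for kw in ORG_KEYWORDS)

def passes_proper_name_filter(inst: dict) -> bool:
    # Filter 3: require a capitalized, non-trivial proper name.
    name = inst.get("name", "")
    return len(name) > 2 and name[0].isupper()

def validate_v5(subagent_json: str) -> list[dict]:
    """Keep only institutions that pass all three V5 filters."""
    institutions = json.loads(subagent_json)
    return [
        inst for inst in institutions
        if passes_country_filter(inst)
        and passes_organization_filter(inst)
        and passes_proper_name_filter(inst)
    ]

# Example subagent output: one genuine museum, one network that
# the organization filter should reject.
raw = json.dumps([
    {"name": "Van Abbemuseum", "institution_type": "MUSEUM",
     "city": "Eindhoven", "country": "NL", "confidence": 0.95},
    {"name": "Europeana Network", "institution_type": "GALLERY",
     "city": "", "country": "NL", "confidence": 0.4},
])
print([inst["name"] for inst in validate_v5(raw)])  # → ['Van Abbemuseum']
```

Keeping each filter as its own predicate makes it easy to log which filter rejected a candidate, which is useful when tuning precision.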