# Quick Reference: Wikidata Validation Workflow

**Status**: 73/185 automatically validated | 75 need manual review
**Time saved**: 5.2 hours (67.6% of the estimated 7.7 hours) | **Remaining**: 2.5 hours

---

## 🚀 Quick Start (Manual Review)

### Option A: Streamlined Review (Recommended)

```bash
# Open this file in spreadsheet software (`open` is macOS; use xdg-open on Linux)
open data/review/denmark_wikidata_fuzzy_matches_needs_review.csv
```

**Contains**: Only the 75 ambiguous rows that require your judgment
**Guide**: `docs/PREFILLED_VALIDATION_GUIDE.md`

### Option B: Full Review

```bash
# Open the full dataset (all 185 rows)
open data/review/denmark_wikidata_fuzzy_matches_prefilled.csv
```

**Contains**: All 185 rows (73 already validated); filter for an empty `validation_status` to see the rows still to review

---

## ✍️ Fill These Columns

For each row, add:

| Column | Values | Example |
|--------|--------|---------|
| `validation_status` | CORRECT / INCORRECT / UNCERTAIN | INCORRECT |
| `validation_notes` | Reason and supporting evidence | "City mismatch: Viborg vs Aalborg, checked Q21107842" |

---

## 🎯 Decision Guide

### Mark CORRECT when:
- ✅ Branch library → Main library match
- ✅ Name variation (same institution)
- ✅ ISIL codes match

### Mark INCORRECT when:
- ❌ Different cities (already auto-marked: 71 cases)
- ❌ Different types (library vs museum)
- ❌ School library ≠ public library
- ❌ Very different names (no relationship)

### Mark UNCERTAIN when:
- ⁉️ Possible merger (needs expert input)
- ⁉️ Missing information (needs research)

---

## ⚡ After Review

```bash
# Step 1: Apply your validation decisions
python scripts/apply_wikidata_validation.py

# Step 2: Check progress and results
python scripts/check_validation_progress.py
```

---

## 📂 Files at a Glance

```
data/review/
├── denmark_wikidata_fuzzy_matches_needs_review.csv   ← Review this (75 rows)
├── denmark_wikidata_fuzzy_matches_prefilled.csv      ← Or this (185 rows)
└── README.md                                         ← Quick reference

docs/
├── PREFILLED_VALIDATION_GUIDE.md                     ← Complete workflow (read first!)
├── AUTOMATED_SPOT_CHECK_RESULTS.md                   ← How automation works
└── WIKIDATA_VALIDATION_CHECKLIST.md                  ← Original detailed guide

scripts/
├── prefill_obvious_errors.py                         ← Already run ✅
├── apply_wikidata_validation.py                      ← Run after review
└── check_validation_progress.py                      ← Check results
```

---

## 🔍 Common Patterns in Needs Review

| Pattern | Count | Typical Decision | Time |
|---------|-------|------------------|------|
| Name similarity issues | 11 | 60% INCORRECT | 22 min |
| Gymnasium libraries | 7 | 90% INCORRECT | 14 min |
| Branch suffix | 10 | 70% CORRECT | 20 min |
| Low confidence | 8 | 50/50 | 16 min |
| Priority 1-2 check | 19 | 95% CORRECT | 38 min |
| Other (not in a listed pattern) | 20 | - | 40 min |
| **Total** | **75** | - | **150 min** |

Time estimates assume about 2 minutes per row.

---

## 💡 Pro Tips

1. **Start with Priority 1** - Most important matches first
2. **Click the Wikidata URLs** - Verify addresses, dates, and locations
3. **Use batch validation** - Same error pattern → same decision
4. **Document well** - Future you will thank you
5. **When uncertain** - Mark UNCERTAIN; don't guess

---

## 🆘 Need Help?

- **Workflow unclear?** → Read `docs/PREFILLED_VALIDATION_GUIDE.md`
- **Decision uncertain?** → Check the decision guide (page 15)
- **Found an automation error?** → Override it! Change the status and add a note
- **Need examples?** → See guide pages 20-25

---

## 📊 Progress Tracker

```
┌─────────────────────────────────────┐
│ Wikidata Validation Progress        │
├─────────────────────────────────────┤
│ ✅ Automated:      73/185 (39.5%)   │
│ 📝 Needs review:   75/185 (40.5%)   │
│ ⏸️ Lower priority: 37/185 (20.0%)   │
├─────────────────────────────────────┤
│ Time saved: 5.2h / 7.7h (67.6%)     │
│ Remaining:  2.5h (manual review)    │
└─────────────────────────────────────┘
```

---

## ✅ Checklist

Before running the apply script:
- [ ] Opened needs_review.csv or prefilled.csv
- [ ] Reviewed all 75 rows (or all rows with an empty `validation_status`)
- [ ] Filled `validation_status` for each row
- [ ] Filled `validation_notes` with reasoning
- [ ] Checked for any blank validation cells
- [ ] Saved the CSV file

Ready to apply:

```bash
python scripts/apply_wikidata_validation.py
```

---

**Last Updated**: November 19, 2025
**Next**: Manual review → Apply validation → Verify results
**Goal**: ~95%+ accurate Wikidata links (769 → ~680-700 high-quality links)
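
Before running `apply_wikidata_validation.py`, the checklist's "no blank validation cells" step can be scripted. The following is a minimal sketch, assuming the CSV has the `validation_status` and `validation_notes` columns from the "Fill These Columns" table and that only the exact uppercase values CORRECT, INCORRECT, and UNCERTAIN are accepted. The file name `check_review_csv.py` and the functions `check_rows` and `check_csv` are hypothetical and are not part of `scripts/`.

```python
"""Hypothetical pre-apply sanity check for the review CSV (not in scripts/)."""
import csv
import sys
from collections import Counter

# Assumed allowed values, taken from the "Fill These Columns" table
ALLOWED = {"CORRECT", "INCORRECT", "UNCERTAIN"}


def check_rows(rows):
    """Return (status counts, problems) for an iterable of CSV row dicts.

    Each problem is a (row_number, message) tuple. Row numbers start at 2 so
    they match spreadsheet rows (row 1 is the header), assuming no cell
    contains a line break.
    """
    counts = Counter()
    problems = []
    for row_no, row in enumerate(rows, start=2):
        # csv.DictReader yields None for missing trailing fields
        status = (row.get("validation_status") or "").strip()
        notes = (row.get("validation_notes") or "").strip()
        if not status:
            counts["BLANK"] += 1
            problems.append((row_no, "blank validation_status"))
        elif status not in ALLOWED:
            # Case is not normalized: "correct" is flagged, not accepted
            counts["INVALID"] += 1
            problems.append((row_no, f"unexpected status {status!r}"))
        else:
            counts[status] += 1
            if not notes:
                problems.append((row_no, "missing validation_notes"))
    return counts, problems


def check_csv(path):
    # newline="" as the csv docs recommend; utf-8-sig strips a leading BOM
    with open(path, newline="", encoding="utf-8-sig") as f:
        return check_rows(csv.DictReader(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    counts, problems = check_csv(sys.argv[1])
    total = sum(counts.values())
    for status, n in sorted(counts.items()):
        share = f"{n / total:.1%}" if total else "n/a"
        print(f"{status:10} {n:4} ({share})")
    for row_no, message in problems:
        print(f"row {row_no}: {message}")
    sys.exit(1 if problems else 0)
```

Usage: `python check_review_csv.py data/review/denmark_wikidata_fuzzy_matches_needs_review.csv`. The script exits with a non-zero status when it finds problems. When run on the prefilled file, the BLANK count also includes rows that were intentionally left for later review.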