Quick Reference: Wikidata Validation Workflow
Status: 73/185 automatically validated | 75 need manual review
Time saved: 5.2 hours (67.6%) | Remaining: 2.5 hours
🚀 Quick Start (Manual Review)
Option A: Streamlined Review (Recommended)
# Open this file in spreadsheet software
open data/review/denmark_wikidata_fuzzy_matches_needs_review.csv
Contains: Only 75 ambiguous rows requiring your judgment
Guide: docs/PREFILLED_VALIDATION_GUIDE.md
Option B: Full Review
# Open full dataset (all 185 rows)
open data/review/denmark_wikidata_fuzzy_matches_prefilled.csv
Contains: All 185 rows, 73 of them already validated. Filter for an empty validation_status to find the rows that still need a decision.
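If you prefer to filter outside the spreadsheet, the empty-status filter can be sketched in a few lines of Python. This is only an illustration: the `rows_needing_review` helper and the `name` column in the sample are hypothetical, and only the `validation_status` and `validation_notes` column names come from this guide.

```python
import csv
import io


def rows_needing_review(csv_file):
    """Return rows whose validation_status is empty or whitespace-only."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if not (row.get("validation_status") or "").strip()
    ]


# Tiny in-memory sample standing in for the prefilled CSV.
# The `name` column is a made-up placeholder, not a documented column.
sample = io.StringIO(
    "name,validation_status,validation_notes\n"
    "Library A,INCORRECT,City mismatch\n"
    "Library B,,\n"
    "Library C,  ,\n"
)

pending = rows_needing_review(sample)
print(len(pending))  # 2
```

To run it on the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption about how the CSV was exported.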
✍️ Fill These Columns
For each row, add:
| Column | Values | Example |
|---|---|---|
| validation_status | CORRECT / INCORRECT / UNCERTAIN | INCORRECT |
| validation_notes | Why? Evidence? | "City mismatch: Viborg vs Aalborg, checked Q21107842" |
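A typo in validation_status is easy to miss in a spreadsheet. The sketch below flags any non-empty value that is not one of the three allowed statuses. It assumes the values are expected in uppercase exactly as listed; whether the apply script is case-sensitive is not stated here, and `invalid_status_rows` is a hypothetical helper, not part of the repo's scripts.

```python
ALLOWED_STATUSES = {"CORRECT", "INCORRECT", "UNCERTAIN"}


def invalid_status_rows(rows):
    """Return (spreadsheet_row, value) for each non-empty, unrecognized status."""
    problems = []
    # Start at 2 because spreadsheet row 1 holds the header.
    for number, row in enumerate(rows, start=2):
        value = (row.get("validation_status") or "").strip()
        if value and value not in ALLOWED_STATUSES:
            problems.append((number, value))
    return problems


rows = [
    {"validation_status": "CORRECT"},
    {"validation_status": "incorrect"},  # wrong case (assumed to be invalid)
    {"validation_status": ""},           # blank: not reviewed yet, not a typo
    {"validation_status": "MAYBE"},      # not an allowed value
]
print(invalid_status_rows(rows))  # [(3, 'incorrect'), (5, 'MAYBE')]
```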
🎯 Decision Guide
Mark CORRECT when:
- ✅ Branch library → Main library match
- ✅ Name variation (same institution)
- ✅ ISIL codes match
Mark INCORRECT when:
- ❌ Different cities (already auto-marked: 71 cases)
- ❌ Different types (library vs museum)
- ❌ School library ≠ public library
- ❌ Very different names (no relationship)
Mark UNCERTAIN when:
- ⁉️ Possible merger (need expert)
- ⁉️ Missing info (need research)
⚡ After Review
# Step 1: Apply your validation decisions
python scripts/apply_wikidata_validation.py
# Step 2: Check progress and results
python scripts/check_validation_progress.py
📂 Files at a Glance
data/review/
├── denmark_wikidata_fuzzy_matches_needs_review.csv ← Review this (75 rows)
├── denmark_wikidata_fuzzy_matches_prefilled.csv ← Or this (185 rows)
└── README.md ← Quick reference
docs/
├── PREFILLED_VALIDATION_GUIDE.md ← Complete workflow (read first!)
├── AUTOMATED_SPOT_CHECK_RESULTS.md ← How automation works
└── WIKIDATA_VALIDATION_CHECKLIST.md ← Original detailed guide
scripts/
├── prefill_obvious_errors.py ← Already run ✅
├── apply_wikidata_validation.py ← Run after review
└── check_validation_progress.py ← Check results
🔍 Common Patterns in Needs Review
| Pattern | Count | Typical Decision | Time |
|---|---|---|---|
| Name similarity issues | 11 | 60% INCORRECT | 22 min |
| Gymnasium libraries | 7 | 90% INCORRECT | 14 min |
| Branch suffix | 10 | 70% CORRECT | 20 min |
| Low confidence | 8 | 50/50 | 16 min |
| Priority 1-2 check | 19 | 95% CORRECT | 38 min |
| Total | 75 | - | 150 min |
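Every time estimate in the table works out to 2 minutes per row, and the same rate gives the 150-minute total for all 75 rows. The sketch below reproduces that arithmetic so you can re-estimate as rows get done. The dictionary keys are shortened labels invented for this example. The five named patterns add up to 55 rows, while the Total row covers all 75 rows in the review file.

```python
# Row counts copied from the table above; keys are shortened, made-up labels.
pattern_counts = {
    "name_similarity": 11,
    "gymnasium_libraries": 7,
    "branch_suffix": 10,
    "low_confidence": 8,
    "priority_1_2_check": 19,
}

MINUTES_PER_ROW = 2  # implied by each table row, e.g. 22 min / 11 rows


def estimate_minutes(row_count, minutes_per_row=MINUTES_PER_ROW):
    """Estimated review time in minutes for a given number of rows."""
    return row_count * minutes_per_row


listed_rows = sum(pattern_counts.values())
print(listed_rows, estimate_minutes(listed_rows))  # 55 110
print(estimate_minutes(75))                        # 150
```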
💡 Pro Tips
- Start with Priority 1 - Most important matches first
- Click Wikidata URLs - Verify addresses, dates, locations
- Use batch validation - Same error pattern → Same decision
- Document well - Future you will thank you
- When uncertain - Mark UNCERTAIN, don't guess
🆘 Need Help?
- Workflow unclear? → Read docs/PREFILLED_VALIDATION_GUIDE.md
- Decision uncertain? → Check decision guide (page 15)
- Found automation error? → Override it! Change status, add note
- Need examples? → See guide pages 20-25
📊 Progress Tracker
┌─────────────────────────────────────┐
│ Wikidata Validation Progress │
├─────────────────────────────────────┤
│ ✅ Automated: 73/185 (39.5%) │
│ 📝 Needs review: 75/185 (40.5%) │
│ ⏸️ Lower priority: 37/185 (20.0%) │
├─────────────────────────────────────┤
│ Time saved: 5.2h / 7.7h (67.6%) │
│ Remaining: 2.5h (manual review) │
└─────────────────────────────────────┘
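The percentages in the tracker follow directly from the three row counts. A minimal sketch of that calculation, using only the numbers shown above (the bucket names and the `share` helper are illustrative):

```python
def share(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


total = 185
buckets = {"automated": 73, "needs_review": 75, "lower_priority": 37}

# The three buckets should account for every row in the dataset.
assert sum(buckets.values()) == total

for name, count in buckets.items():
    print(f"{name}: {count}/{total} ({share(count, total)}%)")
# automated: 73/185 (39.5%)
# needs_review: 75/185 (40.5%)
# lower_priority: 37/185 (20.0%)
```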
✅ Checklist
Before running apply script:
- Opened needs_review.csv or prefilled.csv
- Reviewed all 75 rows (or filtered empty validation_status)
- Filled validation_status for each row
- Filled validation_notes with reasoning
- Checked for any blank validation cells
- Saved CSV file
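The blank-cell items on this checklist can also be checked mechanically before you apply anything. The sketch below is a hypothetical pre-flight helper, not one of the repo's scripts; it assumes only the two column names used in this guide and reports spreadsheet row numbers (row 1 being the header).

```python
import csv
import io


def checklist_problems(csv_file):
    """List rows that still have a blank validation_status or validation_notes."""
    problems = []
    for row_no, row in enumerate(csv.DictReader(csv_file), start=2):
        status = (row.get("validation_status") or "").strip()
        notes = (row.get("validation_notes") or "").strip()
        if not status:
            problems.append(f"row {row_no}: validation_status is blank")
        elif not notes:
            problems.append(f"row {row_no}: validation_notes is blank")
    return problems


# In-memory sample standing in for the reviewed CSV.
sample = io.StringIO(
    "validation_status,validation_notes\n"
    "CORRECT,Branch of main library\n"
    ",\n"
    "INCORRECT,\n"
)

for problem in checklist_problems(sample):
    print(problem)
# row 3: validation_status is blank
# row 4: validation_notes is blank
```

An empty result means both columns are filled for every row in the file you checked.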
Ready to apply:
python scripts/apply_wikidata_validation.py
Last Updated: November 19, 2025
Next: Manual review → Apply validation → Verify results
Goal: ~95%+ accurate Wikidata links (769 → ~680-700 high-quality links)