
# Quick Reference: Wikidata Validation Workflow

**Status:** 73/185 automatically validated | 75 need manual review
**Time saved:** 5.2 hours (67.6%) | **Remaining:** 2.5 hours


## 🚀 Quick Start (Manual Review)

**Option A: Focused Review (recommended)**

```bash
# Open this file in spreadsheet software
open data/review/denmark_wikidata_fuzzy_matches_needs_review.csv
```

**Contains:** only the 75 ambiguous rows requiring your judgment
**Guide:** docs/PREFILLED_VALIDATION_GUIDE.md

**Option B: Full Review**

```bash
# Open the full dataset (all 185 rows)
open data/review/denmark_wikidata_fuzzy_matches_prefilled.csv
```

**Contains:** all 185 rows (73 already validated); filter for rows with an empty `validation_status`
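If you prefer the command line to a spreadsheet filter, the still-unvalidated rows can be pulled out with a short script. This is a sketch using the stdlib `csv` module; the `validation_status` column name comes from this workflow, and the function name is illustrative:

```python
import csv

def rows_needing_review(path):
    """Return the rows whose validation_status is still empty."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if not row.get("validation_status", "").strip()]

# pending = rows_needing_review("data/review/denmark_wikidata_fuzzy_matches_prefilled.csv")
# print(len(pending), "rows still need a decision")
```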


## ✍️ Fill These Columns

For each row, add:

| Column | Values | Example |
|---|---|---|
| `validation_status` | CORRECT / INCORRECT / UNCERTAIN | INCORRECT |
| `validation_notes` | Why? What evidence? | "City mismatch: Viborg vs Aalborg, checked Q21107842" |

## 🎯 Decision Guide

**Mark CORRECT when:**

- Branch library → main library match
- Name variation (same institution)
- ISIL codes match

**Mark INCORRECT when:**

- Different cities (already auto-marked: 71 cases)
- Different types (library vs museum)
- School library ≠ public library
- Very different names (no relationship)

**Mark UNCERTAIN when:**

- ⁉️ Possible merger (needs expert input)
- ⁉️ Missing info (needs research)
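The rules above can be sketched as a first-pass triage function. This is purely illustrative: the field names `city`, `institution_type`, and `isil` are assumptions, and anything the rules don't cover falls through to UNCERTAIN for a human to decide:

```python
def triage(local, wikidata):
    """First-pass decision for one fuzzy match, mirroring the guide above.

    `local` and `wikidata` are dicts with assumed keys:
    city, institution_type, isil.
    """
    if local.get("isil") and local.get("isil") == wikidata.get("isil"):
        return "CORRECT"    # ISIL codes match
    if local.get("city") and wikidata.get("city") and local["city"] != wikidata["city"]:
        return "INCORRECT"  # different cities
    if local.get("institution_type") != wikidata.get("institution_type"):
        return "INCORRECT"  # e.g. library vs museum, school vs public
    return "UNCERTAIN"      # possible merger / missing info: human call
```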

## After Review

```bash
# Step 1: Apply your validation decisions
python scripts/apply_wikidata_validation.py

# Step 2: Check progress and results
python scripts/check_validation_progress.py
```

## 📂 Files at a Glance

```text
data/review/
├── denmark_wikidata_fuzzy_matches_needs_review.csv  ← Review this (75 rows)
├── denmark_wikidata_fuzzy_matches_prefilled.csv     ← Or this (185 rows)
└── README.md                                        ← Quick reference

docs/
├── PREFILLED_VALIDATION_GUIDE.md    ← Complete workflow (read first!)
├── AUTOMATED_SPOT_CHECK_RESULTS.md  ← How automation works
└── WIKIDATA_VALIDATION_CHECKLIST.md ← Original detailed guide

scripts/
├── prefill_obvious_errors.py        ← Already run ✅
├── apply_wikidata_validation.py     ← Run after review
└── check_validation_progress.py     ← Check results
```

## 🔍 Common Patterns in Needs Review

| Pattern | Count | Typical Decision | Time |
|---|---|---|---|
| Name similarity issues | 11 | 60% INCORRECT | 22 min |
| Gymnasium libraries | 7 | 90% INCORRECT | 14 min |
| Branch suffix | 10 | 70% CORRECT | 20 min |
| Low confidence | 8 | 50/50 | 16 min |
| Priority 1-2 check | 19 | 95% CORRECT | 38 min |
| **Total** | **75** | - | **150 min** |

## 💡 Pro Tips

1. **Start with Priority 1** - most important matches first
2. **Click Wikidata URLs** - verify addresses, dates, locations
3. **Use batch validation** - same error pattern → same decision
4. **Document well** - future you will thank you
5. **When uncertain** - mark UNCERTAIN, don't guess
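Batch validation (tip 3) can also be scripted once you spot a repeating pattern. A sketch: `validation_status` and `validation_notes` come from this workflow, while `wikidata_label` is an assumed name for the matched-name column:

```python
def batch_mark(rows, pattern, status, note):
    """Apply one decision to every unvalidated row whose matched name
    contains `pattern` (case-insensitive). Returns the number of rows
    changed; rows that already have a decision are never overwritten.
    """
    changed = 0
    for row in rows:
        if row.get("validation_status", "").strip():
            continue  # never overwrite an existing decision
        if pattern.lower() in row.get("wikidata_label", "").lower():
            row["validation_status"] = status
            row["validation_notes"] = note
            changed += 1
    return changed

# Example: gymnasium libraries are ~90% INCORRECT per the table above
# batch_mark(rows, "gymnasium", "INCORRECT", "School library, not public library")
```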

## 🆘 Need Help?

- **Workflow unclear?** → Read docs/PREFILLED_VALIDATION_GUIDE.md
- **Decision uncertain?** → Check the decision guide (page 15)
- **Found an automation error?** → Override it: change the status and add a note
- **Need examples?** → See guide pages 20-25

## 📊 Progress Tracker

```text
┌─────────────────────────────────────┐
│ Wikidata Validation Progress        │
├─────────────────────────────────────┤
│ ✅ Automated:       73/185 (39.5%)  │
│ 📝 Needs review:    75/185 (40.5%)  │
│ ⏸️ Lower priority:  37/185 (20.0%)  │
├─────────────────────────────────────┤
│ Time saved:   5.2h / 7.7h (67.6%)   │
│ Remaining:    2.5h (manual review)  │
└─────────────────────────────────────┘
```

## Checklist

Before running the apply script:

- [ ] Opened needs_review.csv or prefilled.csv
- [ ] Reviewed all 75 rows (or filtered for empty validation_status)
- [ ] Filled validation_status for each row
- [ ] Filled validation_notes with reasoning
- [ ] Checked for blank validation cells
- [ ] Saved the CSV file
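The last checks are easy to automate: before applying, scan for blanks and typos in `validation_status`. A sketch using the stdlib `csv` module; the three allowed values come from this guide, everything else is illustrative:

```python
import csv

ALLOWED = {"CORRECT", "INCORRECT", "UNCERTAIN"}

def check_csv(path):
    """Return (line_number, value) pairs where validation_status is blank
    or not one of the three allowed values. Line 1 is the header."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            value = row.get("validation_status", "").strip()
            if value not in ALLOWED:
                problems.append((line_no, value))
    return problems

# problems = check_csv("data/review/denmark_wikidata_fuzzy_matches_needs_review.csv")
# An empty list means the file is ready for the apply script.
```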

**Ready to apply:**

```bash
python scripts/apply_wikidata_validation.py
```

**Last Updated:** November 19, 2025
**Next:** Manual review → Apply validation → Verify results
**Goal:** ~95%+ accurate Wikidata links (769 → ~680-700 high-quality links)