# Quick Reference: Wikidata Validation Workflow

**Status**: 73/185 automatically validated | 75 need manual review
**Time saved**: 5.2 hours (67.6%) | **Remaining**: 2.5 hours

---
## 🚀 Quick Start (Manual Review)

### Option A: Streamlined Review (Recommended)

```bash
# Open this file in spreadsheet software (`open` is the macOS command; use `xdg-open` on Linux)
open data/review/denmark_wikidata_fuzzy_matches_needs_review.csv
```

**Contains**: Only the 75 ambiguous rows that require your judgment
**Guide**: docs/PREFILLED_VALIDATION_GUIDE.md

### Option B: Full Review

```bash
# Open full dataset (all 185 rows)
open data/review/denmark_wikidata_fuzzy_matches_prefilled.csv
```

**Contains**: All 185 rows, 73 of them already validated. Filter for rows with an empty `validation_status` to find the ones still open.

---
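The Option B filter (rows with an empty `validation_status`) can also be done outside a spreadsheet. The following is a minimal sketch, not part of the project's scripts: the function name `rows_needing_review` and the `name` column are hypothetical, and only `validation_status` and `validation_notes` are column names taken from this guide.

```python
import csv
import io


def rows_needing_review(csv_file):
    """Return the rows whose validation_status is missing or blank."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if not (row.get("validation_status") or "").strip()
    ]


# Tiny in-memory stand-in for the prefilled CSV (made-up rows, hypothetical `name` column).
sample = io.StringIO(
    "name,validation_status,validation_notes\n"
    "Library A,INCORRECT,City mismatch\n"
    "Library B,,\n"
    "Library C,  ,\n"
)
pending = rows_needing_review(sample)
print([row["name"] for row in pending])
```

To try it on the real file, pass a handle such as `open(path, newline="", encoding="utf-8")` instead of the in-memory sample; the UTF-8 encoding is an assumption about how the CSV was written.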
## ✍️ Fill These Columns

For each row, add:

| Column | Values | Example |
|--------|--------|---------|
| `validation_status` | CORRECT / INCORRECT / UNCERTAIN | INCORRECT |
| `validation_notes` | Why? Evidence? | "City mismatch: Viborg vs Aalborg, checked Q21107842" |

---
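A quick way to catch typos in the two columns above before applying them is a small per-row check. This is a sketch under assumptions: `check_row` is a hypothetical helper, the three status values are treated as case-sensitive, and a blank note is flagged because the checklist asks for reasoning on every row. The real `apply_wikidata_validation.py` may be stricter or more lenient.

```python
# Allowed values as listed in the "Fill These Columns" table.
ALLOWED_STATUSES = {"CORRECT", "INCORRECT", "UNCERTAIN"}


def check_row(row):
    """Return a list of problems found in one reviewed row (empty list means OK)."""
    problems = []
    status = (row.get("validation_status") or "").strip()
    notes = (row.get("validation_notes") or "").strip()
    if not status:
        problems.append("validation_status is blank")
    elif status not in ALLOWED_STATUSES:
        problems.append(f"unexpected validation_status: {status!r}")
    if not notes:
        problems.append("validation_notes is blank")
    return problems


good = {
    "validation_status": "INCORRECT",
    "validation_notes": "City mismatch: Viborg vs Aalborg, checked Q21107842",
}
typo = {"validation_status": "INCORECT", "validation_notes": ""}
print(check_row(good))
print(check_row(typo))
```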
## 🎯 Decision Guide

### Mark CORRECT when:
- ✅ Branch library → Main library match
- ✅ Name variation (same institution)
- ✅ ISIL codes match

### Mark INCORRECT when:
- ❌ Different cities (already auto-marked: 71 cases)
- ❌ Different types (library vs museum)
- ❌ School library ≠ public library
- ❌ Very different names (no relationship)

### Mark UNCERTAIN when:
- ⁉️ Possible merger (need expert)
- ⁉️ Missing info (need research)

---
## ⚡ After Review

```bash
# Step 1: Apply your validation decisions
python scripts/apply_wikidata_validation.py

# Step 2: Check progress and results
python scripts/check_validation_progress.py
```

---
## 📂 Files at a Glance

```
data/review/
├── denmark_wikidata_fuzzy_matches_needs_review.csv  ← Review this (75 rows)
├── denmark_wikidata_fuzzy_matches_prefilled.csv     ← Or this (185 rows)
└── README.md                                        ← Quick reference

docs/
├── PREFILLED_VALIDATION_GUIDE.md      ← Complete workflow (read first!)
├── AUTOMATED_SPOT_CHECK_RESULTS.md    ← How automation works
└── WIKIDATA_VALIDATION_CHECKLIST.md   ← Original detailed guide

scripts/
├── prefill_obvious_errors.py          ← Already run ✅
├── apply_wikidata_validation.py       ← Run after review
└── check_validation_progress.py       ← Check results
```

---
## 🔍 Common Patterns in Needs Review

| Pattern | Count | Typical Decision | Time |
|---------|-------|------------------|------|
| Name similarity issues | 11 | 60% INCORRECT | 22 min |
| Gymnasium libraries | 7 | 90% INCORRECT | 14 min |
| Branch suffix | 10 | 70% CORRECT | 20 min |
| Low confidence | 8 | 50/50 | 16 min |
| Priority 1-2 check | 19 | 95% CORRECT | 38 min |
| **Total** | **75** | - | **150 min** |

The five patterns listed account for 55 of the 75 rows (110 of the 150 minutes); the other 20 rows are not broken down here.

---
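The time column in the Common Patterns table works out to a flat 2 minutes per row (22 min / 11 rows, 14 min / 7 rows, and so on). The sketch below reproduces that arithmetic so the estimate can be redone if the row counts change; the per-row rate is inferred from the table rather than stated by the guide, and `estimate_minutes` is a hypothetical helper.

```python
# Inferred from the table: every pattern is budgeted at 2 minutes per row.
MINUTES_PER_ROW = 2

# Counts copied from the Common Patterns table.
pattern_counts = {
    "Name similarity issues": 11,
    "Gymnasium libraries": 7,
    "Branch suffix": 10,
    "Low confidence": 8,
    "Priority 1-2 check": 19,
}


def estimate_minutes(row_count, minutes_per_row=MINUTES_PER_ROW):
    """Estimated review time in minutes for a given number of rows."""
    return row_count * minutes_per_row


listed_rows = sum(pattern_counts.values())
for pattern, count in pattern_counts.items():
    print(f"{pattern}: {estimate_minutes(count)} min")
print(f"Listed patterns: {listed_rows} rows, {estimate_minutes(listed_rows)} min")
print(f"All 75 rows: {estimate_minutes(75)} min")
```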
## 💡 Pro Tips

1. **Start with Priority 1** - Most important matches first
2. **Click Wikidata URLs** - Verify addresses, dates, locations
3. **Use batch validation** - Same error pattern → Same decision
4. **Document well** - Future you will thank you
5. **When uncertain** - Mark UNCERTAIN, don't guess

---
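Pro tip 3 (same error pattern → same decision) can be scripted when a pattern is easy to detect. This is only a sketch: `apply_batch_decision` is a hypothetical helper, the `name` column and the sample rows are made up for illustration, and rows that already carry a decision are deliberately left alone so earlier manual or automated judgments are not overwritten.

```python
def apply_batch_decision(rows, matches_pattern, status, note):
    """Fill status and notes on every undecided row that matches; return how many changed."""
    changed = 0
    for row in rows:
        already_decided = (row.get("validation_status") or "").strip()
        if not already_decided and matches_pattern(row):
            row["validation_status"] = status
            row["validation_notes"] = note
            changed += 1
    return changed


# Made-up rows; only the two validation_* column names come from this guide.
rows = [
    {"name": "Example Gymnasium Library", "validation_status": "", "validation_notes": ""},
    {"name": "Example Public Library", "validation_status": "", "validation_notes": ""},
    {"name": "Other Gymnasium Library", "validation_status": "CORRECT",
     "validation_notes": "ISIL codes match"},
]
changed = apply_batch_decision(
    rows,
    matches_pattern=lambda row: "gymnasium" in row["name"].lower(),
    status="INCORRECT",
    note="School library matched to public library",
)
print(changed)
```

Spot-check a few of the changed rows afterwards; a batch rule is only as good as the pattern it matches.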
## 🆘 Need Help?

- **Workflow unclear?** → Read `docs/PREFILLED_VALIDATION_GUIDE.md`
- **Decision uncertain?** → Check decision guide (page 15)
- **Found automation error?** → Override it! Change status, add note
- **Need examples?** → See guide pages 20-25

---
## 📊 Progress Tracker

```
┌─────────────────────────────────────┐
│ Wikidata Validation Progress │
├─────────────────────────────────────┤
│ ✅ Automated: 73/185 (39.5%) │
│ 📝 Needs review: 75/185 (40.5%) │
│ ⏸️ Lower priority: 37/185 (20.0%) │
├─────────────────────────────────────┤
│ Time saved: 5.2h / 7.7h (67.6%) │
│ Remaining: 2.5h (manual review) │
└─────────────────────────────────────┘
```

---
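The row percentages in the Progress Tracker follow directly from the three bucket counts. A minimal sketch of that calculation, useful for refreshing the box as review proceeds (the `percent` helper and variable names are illustrative, not taken from `check_validation_progress.py`):

```python
TOTAL_ROWS = 185

# Bucket counts copied from the Progress Tracker.
buckets = {"Automated": 73, "Needs review": 75, "Lower priority": 37}


def percent(part, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The three buckets should cover every row exactly once.
assert sum(buckets.values()) == TOTAL_ROWS

for label, count in buckets.items():
    print(f"{label}: {count}/{TOTAL_ROWS} ({percent(count, TOTAL_ROWS)}%)")
```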
## ✅ Checklist

Before running the apply script:

- [ ] Opened needs_review.csv or prefilled.csv
- [ ] Reviewed all 75 rows (or filtered for empty `validation_status`)
- [ ] Filled `validation_status` for each row
- [ ] Filled `validation_notes` with reasoning
- [ ] Checked for any blank validation cells
- [ ] Saved CSV file

Ready to apply:

```bash
python scripts/apply_wikidata_validation.py
```

---
**Last Updated**: November 19, 2025
**Next**: Manual review → Apply validation → Verify results
**Goal**: ~95%+ accurate Wikidata links (769 → ~680-700 high-quality links)