glam/data/instances/japan/validation_report.txt
2025-11-19 23:25:22 +01:00

================================================================================
JAPAN ISIL DATASET VALIDATION REPORT
================================================================================
Generated: 2025-11-07T10:08:41.129274
Dataset: data/instances/japan/jp_institutions.yaml

VALIDATION SUMMARY
--------------------------------------------------------------------------------
Total Records: 12,065
Valid Records: 12,064 (99.99%)
Invalid Records: 1

INSTITUTION TYPE BREAKDOWN
--------------------------------------------------------------------------------
LIBRARY 7,607 (63.05%)
MUSEUM 4,356 (36.10%)
ARCHIVE 101 ( 0.84%)

GEOGRAPHIC COVERAGE
--------------------------------------------------------------------------------
Total Prefectures: 31
Top 10 Prefectures by Institution Count:
TO 2,080
KA 820
NA 780
FU 679
HO 641
AI 592
SH 591
MI 485
YA 468
SA 467

DATA QUALITY METRICS
--------------------------------------------------------------------------------
GHCID Coverage: 12,064 / 12,065 (99.99%)
Website URLs: 10,799 / 12,065 (89.51%)
Street Addresses: 12,045 / 12,065 (99.83%)
Postal Codes: 12,044 / 12,065 (99.83%)
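
Each coverage line above is a count over the total record count, shown as a percentage with two decimals. A minimal sketch of that arithmetic follows. The function names and the record field name `website` are assumptions for illustration, not taken from the script that generated this report.

```python
def coverage(present: int, total: int) -> str:
    """Format a coverage metric as 'present / total (pct%)'."""
    if total <= 0:
        raise ValueError("total must be positive")
    pct = 100.0 * present / total
    return f"{present:,} / {total:,} ({pct:.2f}%)"


def field_coverage(records: list, field: str) -> str:
    """Count records whose `field` is non-empty and format the result.

    `field` is a hypothetical key name; the real dataset schema may differ.
    """
    present = sum(1 for r in records if str(r.get(field) or "").strip())
    return coverage(present, len(records))


# Reproduces the "Website URLs" line from the counts in this report.
print(coverage(10_799, 12_065))  # 10,799 / 12,065 (89.51%)
```

With the other counts reported above, the same helper gives 99.99% for 12,064 records and 99.83% for both 12,045 and 12,044 records.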

VALIDATION ERRORS
--------------------------------------------------------------------------------
[ 1] Missing required fields: name
Sample Error Records (first 10):
1. ID: JP-1003853
Error: Missing required fields: name
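
A required-field check that yields an error of this shape could look like the sketch below. It assumes records are plain dictionaries. The report confirms only that `name` is required, so the rest of `REQUIRED_FIELDS` and the second record ID are hypothetical.

```python
# Assumption: only `name` is confirmed as required by this report;
# `id` is included for illustration.
REQUIRED_FIELDS = ("id", "name")


def missing_required(record: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required fields that are absent or blank in `record`."""
    return [f for f in required if not str(record.get(f) or "").strip()]


def validate(records: list) -> list:
    """Collect one error entry per record that lacks a required field."""
    errors = []
    for record in records:
        missing = missing_required(record)
        if missing:
            errors.append({
                "id": record.get("id", "<unknown>"),
                "error": "Missing required fields: " + ", ".join(missing),
            })
    return errors


# JP-0000001 is a made-up record used only to show a passing case.
sample = [
    {"id": "JP-1003853"},
    {"id": "JP-0000001", "name": "Example Library"},
]
for err in validate(sample):
    print(f"ID: {err['id']}\nError: {err['error']}")
```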

SCHEMA COMPLIANCE
--------------------------------------------------------------------------------
⚠️ WARN - 12,064 of 12,065 records valid; 1 record (JP-1003853) is missing the required field: name

RECOMMENDATIONS
--------------------------------------------------------------------------------
⚠️ Website coverage at 89.5% (1,266 records lack a URL) - consider enriching from web sources
✅ Address coverage excellent (99.8%)
⚠️ Only 31/47 prefectures represented - dataset may be incomplete

NEXT STEPS
--------------------------------------------------------------------------------
1. Review sample error records if validation < 100%
2. Consider geocoding addresses to add lat/lon coordinates
3. Map prefecture codes to ISO 3166-2 (JP-01 through JP-47)
4. Merge with global dataset (NL, EUR, Latin America)
5. Export to GeoJSON for geographic visualization
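
Step 5 could be sketched as below, assuming step 2 has already added coordinates to each record. The key names `lat`, `lon`, `id` and `name` and the sample record are hypothetical. GeoJSON (RFC 7946) orders point coordinates as longitude, then latitude.

```python
import json


def to_geojson(records: list) -> dict:
    """Build a GeoJSON FeatureCollection of Point features.

    Records without coordinates are skipped rather than exported
    with a null geometry.
    """
    features = []
    for record in records:
        lat, lon = record.get("lat"), record.get("lon")
        if lat is None or lon is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": record.get("id"), "name": record.get("name")},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up record and coordinates, for illustration only.
example = [
    {"id": "JP-0000001", "name": "Example Library", "lat": 35.0, "lon": 139.0},
    {"id": "JP-0000002", "name": "Example Museum"},  # not yet geocoded
]
print(json.dumps(to_geojson(example), ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps Japanese institution names readable in the output file instead of escaping them.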
================================================================================
END OF REPORT
================================================================================