glam/reports/kb_libraries_enrichment_report.md
kempersc 30162e6526 Add script to validate KB library entries and generate enrichment report
- Implemented a Python script to validate KB library YAML files for required fields and data quality.
- Analyzed enrichment coverage from Wikidata and Google Maps, generating statistics.
- Created a comprehensive markdown report summarizing validation results and enrichment quality.
- Included error handling for file loading and validation processes.
- Generated JSON statistics for further analysis.
2025-11-28 14:48:33 +01:00


# KB Netherlands Public Libraries - Enrichment Report
**Generated**: 2025-11-28 12:28:14 UTC

**Total Entries**: 149
## Executive Summary
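The KB Netherlands library ISIL data has been integrated and enriched with two external data sources, Wikidata and Google Maps. All 149 entries passed validation; Wikidata matches were found for 114 of them (76.5%), and Google Maps data was found for all 149.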
| Metric | Count | Percentage |
|--------|-------|------------|
| Total KB Library Entries | 149 | 100.0% |
| Valid Entries | 149 | 100.0% |
| Wikidata Enriched | 114 | 76.5% |
| Google Maps Enriched | 149 | 100.0% |
---
## Wikidata Enrichment
### Coverage
| Status | Count | Percentage |
|--------|-------|------------|
| Successfully enriched | 114 | 76.5% |
| Not found in Wikidata | 35 | 23.5% |
| Not attempted | 0 | 0.0% |
### Match Methods
| Method | Count |
|--------|-------|
| isil_code_match | 64 |
| fuzzy_name_match | 50 |
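
The two match methods in the table can be read as a cascade: try an exact ISIL code match first, and fall back to fuzzy name matching only when that fails. The following is a minimal sketch of such a cascade, not the report script's actual implementation. The `isil` and `name` keys, the `match_entry` helper, and the 0.85 similarity threshold are all assumptions made for illustration; the fuzzy step uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever similarity measure was really used.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_entry(entry, candidates, threshold=0.85):
    """Return (candidate, method), or (None, None) when nothing matches.

    `entry` and each candidate are dicts with hypothetical keys
    "isil" and "name"; `threshold` is an assumed cut-off.
    """
    # Step 1: an exact ISIL code match takes priority.
    isil = entry.get("isil")
    if isil:
        for cand in candidates:
            if cand.get("isil") == isil:
                return cand, "isil_code_match"

    # Step 2: fall back to the most similar name above the threshold.
    best, best_score = None, 0.0
    target = normalize(entry["name"])
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand["name"])).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy_name_match"
    return None, None
```

Returning the method name alongside the match makes it straightforward to tally counts per method afterwards, as in the table above.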
### Data Completeness (of 114 enriched)
| Field | Count | Percentage |
|-------|-------|------------|
| Coordinates | 68 | 59.6% |
| Inception Date | 11 | 9.6% |
| VIAF ID | 3 | 2.6% |
| Website | 114 | 100.0% |
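
The percentages in this table are each field's count divided by the number of enriched entries (114), rounded to one decimal. A small sketch of that calculation follows; the `field_completeness` helper and the field names passed to it are hypothetical, and the real script may define "present" differently.

```python
def field_completeness(entries, fields):
    """Map each field to (count, percentage) over the given entries.

    A field counts as present when its value is not None and not an
    empty string, list, or dict. Field names are hypothetical.
    """
    total = len(entries)
    result = {}
    for field in fields:
        count = sum(
            1 for e in entries if e.get(field) not in (None, "", [], {})
        )
        pct = round(100.0 * count / total, 1) if total else 0.0
        result[field] = (count, pct)
    return result
```

For example, 68 of 114 entries with coordinates gives 59.6%, matching the first row.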
---
## Google Maps Enrichment
### Coverage
| Status | Count | Percentage |
|--------|-------|------------|
| Successfully enriched | 149 | 100.0% |
| Not found | 0 | 0.0% |
| Not attempted | 0 | 0.0% |
### Data Completeness (of 149 enriched)
| Field | Count | Percentage |
|-------|-------|------------|
| Coordinates | 149 | 100.0% |
| Full Address | 149 | 100.0% |
| Phone Number | 146 | 98.0% |
| Website | 143 | 96.0% |
| Opening Hours | 145 | 97.3% |
| Rating | 147 | 98.7% |
### Business Status
| Status | Count |
|--------|-------|
| OPERATIONAL | 147 |
| CLOSED_TEMPORARILY | 1 |
| CLOSED_PERMANENTLY | 1 |
### Geographic Distribution by Province
| Province / special municipality | Count |
|----------|-------|
| Zuid-Holland | 25 |
| Overijssel | 23 |
| Noord-Brabant | 18 |
| Gelderland | 18 |
| Noord-Holland | 16 |
| Limburg | 13 |
| Utrecht | 9 |
| Friesland | 6 |
| Drenthe | 5 |
| Zeeland | 4 |
| Groningen | 3 |
| Flevoland | 3 |
| Sint Eustatius | 1 |
| Saba | 1 |
| Bonaire | 1 |
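
A distribution table like the one above can be produced by counting one field across all entries and sorting by frequency. The sketch below assumes each entry is a dict with a hypothetical `province` key; the `count_by` helper and the `(unknown)` placeholder label are illustrative only.

```python
from collections import Counter


def count_by(entries, field, missing="(unknown)"):
    """Return (value, count) pairs sorted by descending count.

    Entries lacking the field are grouped under `missing`.
    """
    counts = Counter(e.get(field) or missing for e in entries)
    return counts.most_common()
```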
---
## Geographic Distribution by City
The 20 cities with the most library entries:
| City | Count |
|------|-------|
| Deventer | 5 |
| Den Haag | 4 |
| Groningen | 3 |
| Assen | 3 |
| Middelburg | 2 |
| Leeuwarden | 2 |
| Heerlen | 2 |
| Hoofddorp | 2 |
| Lelystad | 2 |
| Rotterdam | 2 |
| Amsterdam | 1 |
| Tilburg | 1 |
| Houten | 1 |
| Utrecht | 1 |
| Grave | 1 |
| Schiedam | 1 |
| Maastricht | 1 |
| Haarlem | 1 |
| Eindhoven | 1 |
| Enschede | 1 |
---
## Validation Results
### Summary
- **Valid entries**: 149 (100.0%)
- **Entries with issues**: 0
- **Entries with warnings**: 0
- **File parsing errors**: 0
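
The summary figures can be derived by checking every entry for required fields and counting the entries that produce at least one issue. The following is a rough sketch under stated assumptions: the `REQUIRED_FIELDS` tuple is hypothetical (the actual required fields are not listed in this report), and the entries are assumed to be already loaded from YAML into Python dicts.

```python
# Hypothetical required fields; the real script's list is not shown here.
REQUIRED_FIELDS = ("isil", "name", "city")


def validate_entry(entry):
    """Return a list of issue strings; an empty list means valid."""
    if not isinstance(entry, dict):
        return ["entry is not a mapping"]
    issues = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing required field: {field}")
    return issues


def summarize(entries):
    """Aggregate per-entry validation into summary counts."""
    total = len(entries)
    with_issues = sum(1 for e in entries if validate_entry(e))
    valid = total - with_issues
    return {
        "total": total,
        "valid": valid,
        "with_issues": with_issues,
        "valid_pct": round(100.0 * valid / total, 1) if total else 0.0,
    }
```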
---
## Data Sources
1. **KB Netherlands Library Network** (Primary)
   - Source file: `KB_Netherlands_ISIL_2025-04-01.xlsx`
   - URL: https://www.bibliotheeknetwerk.nl/
   - 149 library entries with ISIL codes
2. **Wikidata** (Enrichment)
   - SPARQL endpoint: https://query.wikidata.org/sparql
   - Match methods: ISIL code lookup, fuzzy name matching
   - Coverage: 114/149 (76.5%)
3. **Google Maps Places API** (Enrichment)
   - API: Places API (New)
   - Coverage: 149/149 (100.0%)
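
For the Wikidata ISIL code lookup, one plausible approach is a SPARQL query over Wikidata's ISIL property (`P791`) sent to the endpoint listed above. The sketch below only builds the query text and the request URL; it performs no network call. The function names, the `nl,en` label languages, and the overall query shape are assumptions and may differ from what the enrichment script actually sends.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_isil_query(isil_codes):
    """Build a SPARQL query that finds items by ISIL code (wdt:P791)."""
    # Quote each code as a SPARQL string literal, escaping \ and ".
    values = " ".join(
        '"{}"'.format(code.replace("\\", "\\\\").replace('"', '\\"'))
        for code in isil_codes
    )
    return (
        "SELECT ?item ?itemLabel ?isil WHERE {\n"
        f"  VALUES ?isil {{ {values} }}\n"
        "  ?item wdt:P791 ?isil .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }\n'
        "}"
    )


def build_request_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"
```

Batching several ISIL codes into one `VALUES` clause reduces the number of requests compared with one query per library.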
---
## Files Generated
- Entry files: `data/nde/enriched/entries/{index}_kb_isil.yaml` (149 files)
- This report: `reports/kb_libraries_enrichment_report.md`
- Statistics JSON: `reports/kb_libraries_enrichment_stats.json`
---
*Report generated by validate_kb_libraries_report.py*