Czech ISIL Investigation - Wikidata Extraction Results
Date: 2025-11-20
Task: Priority 2, Task 6 - Extract ISIL codes from Wikidata
Status: ✅ COMPLETE (minimal results)
Executive Summary
Wikidata ISIL Coverage for Czech Institutions: MINIMAL
Queried all 6,719 Czech institutions with Wikidata Q-numbers for ISIL codes (P791 property).
Results:
- ISIL codes found: 2 (0.03% of the 6,719 institutions queried)
- In our dataset: 1 (Národní technická knihovna - CZ-PrSTK)
- Not in our dataset: 1 (Jihočeská vědecká knihovna - OCLC-CVK)
Coverage: 1 of 8,694 institutions (0.01%) - essentially zero
Conclusion: Wikidata is NOT a viable source for Czech ISIL codes. Must contact NK ČR directly.
Detailed Findings
SPARQL Query Results
Query: Czech heritage institutions (museums, libraries, archives, galleries) with ISIL codes
SELECT ?item ?itemLabel ?isil WHERE {
  VALUES ?type { wd:Q33506 wd:Q7075 wd:Q166118 wd:Q1007870 }
  ?item wdt:P31/wdt:P279* ?type .
  ?item wdt:P17 wd:Q213 .
  ?item wdt:P791 ?isil .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "cs,en" }
}
Results: 2 institutions
| Institution | Wikidata | ISIL Code | In Dataset |
|---|---|---|---|
| Národní technická knihovna | Q630893 | CZ-PrSTK | ✅ Yes |
| Jihočeská vědecká knihovna | Q20057526 | OCLC-CVK | ❌ No |
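A minimal sketch of how a query like the one above could be sent to the Wikidata Query Service and flattened into rows. It is not taken from scripts/extract_isil_from_wikidata.py; the function names and the User-Agent string are assumptions made for illustration. The offline sample mirrors the two rows in the table so the parsing step can be checked without a network call.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def parse_bindings(payload: dict) -> list[dict]:
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def run_query(sparql: str, user_agent: str = "czech-isil-check/0.1 (placeholder)") -> list[dict]:
    """Send one query to the endpoint and return flattened rows (network call)."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},  # placeholder value, replace with a real contact
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))


# Offline sample shaped like the endpoint's JSON response (values from the table above)
sample = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q630893"},
                "itemLabel": {"type": "literal", "value": "Národní technická knihovna"},
                "isil": {"type": "literal", "value": "CZ-PrSTK"},
            },
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q20057526"},
                "itemLabel": {"type": "literal", "value": "Jihočeská vědecká knihovna"},
                "isil": {"type": "literal", "value": "OCLC-CVK"},
            },
        ]
    }
}

rows = parse_bindings(sample)
for row in rows:
    print(row["itemLabel"], row["isil"])
```

Keeping the parser separate from the network call makes it possible to test the extraction logic against saved responses.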
Why So Few ISIL Codes in Wikidata?
Hypothesis 1: ISIL codes rarely added to Wikidata
- ISIL property (P791) has been available on Wikidata since 2013
- Low priority for Wikidata editors (not visible to end users)
- Most Wikidata focus on: website, location, founding date
Hypothesis 2: Czech institutions lack ISIL codes entirely
- ISIL system in Czech Republic may be underdeveloped
- NK ČR may not have assigned codes to most institutions
- Small libraries/archives may not qualify for ISIL codes
Hypothesis 3: Data entry gap
- NK ČR has ISIL registry, but not published on Wikidata
- No bulk import project (unlike VIAF, which has better Wikidata coverage)
Comparison to Original Expectation
Expected: 306 ISIL codes (from initial enrichment script comment)
Actual: 2 ISIL codes
Discrepancy: 99.3% fewer than expected
Root cause: The original enrichment script's SPARQL query wrapped the ISIL pattern in OPTIONAL { ?item wdt:P791 ?isil }, so it returned a row for every matching institution even when no ISIL value was bound. The figure of 306 counted VIAF IDs, not ISIL codes.
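The miscount can be illustrated with a small sketch. In the SPARQL JSON results format, a variable left unbound by an OPTIONAL pattern is simply absent from that row, so counting rows overstates the number of ISIL codes. The helper name and the sample rows below are made up for illustration and are not real query output.

```python
def count_bound(bindings, variable):
    """Count result rows in which the given variable actually has a value."""
    return sum(1 for row in bindings if variable in row)


# Illustrative rows only: three institutions with VIAF IDs, one of which also has an ISIL.
sample_bindings = [
    {"item": {"value": "Q_A"}, "viaf": {"value": "viaf-a"}},
    {"item": {"value": "Q_B"}, "viaf": {"value": "viaf-b"}, "isil": {"value": "isil-b"}},
    {"item": {"value": "Q_C"}, "viaf": {"value": "viaf-c"}},
]

total_rows = len(sample_bindings)                # what a naive row count reports
with_viaf = count_bound(sample_bindings, "viaf")
with_isil = count_bound(sample_bindings, "isil")  # what should have been reported

print(f"rows={total_rows} viaf={with_viaf} isil={with_isil}")
```

Counting per variable, rather than per row, would have separated the VIAF and ISIL totals in the original report.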
Script Performance
Batch Query Efficiency
Processed: 6,719 Q-numbers in 135 batches (50 Q-numbers per batch)
Time: ~2 minutes
Rate: ~56 Q-numbers/second
API calls: 135 SPARQL queries
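The batch arithmetic above can be sketched as follows. The query template is an assumption about what a per-batch query might look like, not the text used by the actual script, and the Q-numbers are placeholders.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_query(qids):
    """Build one SPARQL query that asks for the ISIL (P791) of each Q-number in the batch."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?isil WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item wdt:P791 ?isil .\n"
        "}"
    )


# Placeholder Q-numbers standing in for the 6,719 real ones
qids = [f"Q{n}" for n in range(1, 6720)]
batches = list(chunked(qids, 50))

print(len(batches))       # number of SPARQL queries needed
print(len(batches[-1]))   # size of the final, partial batch
print(build_batch_query(batches[0][:2]))
```

With 6,719 items and a batch size of 50, this yields 134 full batches plus one partial batch of 19, matching the 135 queries reported.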
Code Quality
✅ Batch processing: Avoids URL length limits
✅ Error handling: Graceful fallback on API errors
✅ Provenance tracking: Records enrichment history
✅ Statistics: Clear before/after reporting
Current Czech Dataset Status
Identifier Coverage (After Wikidata ISIL Extraction)
| Identifier | Count | Coverage |
|---|---|---|
| Wikidata Q-numbers | 6,719 | 77.3% |
| GPS coordinates | 6,623 | 76.2% |
| VIAF IDs | 306 | 3.5% |
| ISIL codes | 1 | 0.01% ← CRITICAL GAP |
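The percentages in this report can be reproduced from the raw counts. The sketch below only restates figures already given in this document; the helper name is an assumption.

```python
TOTAL_INSTITUTIONS = 8694

counts = {
    "Wikidata Q-numbers": 6719,
    "GPS coordinates": 6623,
    "VIAF IDs": 306,
    "ISIL codes": 1,
}


def coverage(count, total, digits=1):
    """Return count/total as a percentage rounded to `digits` decimals."""
    return round(100 * count / total, digits)


for name, count in counts.items():
    # ISIL coverage is too small to show with one decimal place
    digits = 2 if name == "ISIL codes" else 1
    print(f"{name}: {coverage(count, TOTAL_INSTITUTIONS, digits)}%")

# Hit rate of the Wikidata lookup and the shortfall against the expected 306 codes
hit_rate = coverage(2, 6719, 2)
shortfall = coverage(306 - 2, 306, 1)
print(hit_rate, shortfall)
```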
Implication
ISIL codes are the bottleneck for:
- ✅ GHCID generation (requires ISIL or Q-number for collision resolution)
- ✅ Library community interoperability
- ✅ Cross-system citation
- ✅ Cataloging standards compliance
Without ISIL codes, Czech institutions are:
- ❌ Not discoverable via ISIL.org search
- ❌ Not linkable to international library systems
- ❌ Missing standardized identifiers for archival citation
Next Steps - Contact NK ČR
Option 1: Email NK ČR ISIL Registry Team ⭐ RECOMMENDED
Contact: Národní knihovna České republiky (Czech National Library)
Department: ISIL agency (likely part of cataloging/standards division)
Email draft:
Subject: Žádost o přístup k registru ISIL kódů pro výzkumný projekt
Dobrý den,
Jsem výzkumný pracovník na projektu mapování paměťových institucí po celém světě.
Chtěl bych požádat o přístup k registru ISIL kódů pro české knihovny, archivy a muzea.
V současné době máme dataset 8 694 českých institucí (knihovny z ADR, archivy z ARON),
ale chybí nám ISIL kódy pro většinu z nich.
Bylo by možné získat:
1. Úplný export registru ISIL kódů (CSV, Excel, nebo jiný formát)
2. Pokyny, jak požádat o ISIL kódy pro instituce, které je ještě nemají
Dataset bude publikován jako otevřená data (CC0 licence) pro výzkumnou komunitu.
Děkuji za pomoc,
[Vaše jméno]
---
Subject: Request for access to ISIL code registry for research project
Dear Sir/Madam,
I am a researcher working on a project to map heritage institutions globally.
I would like to request access to the ISIL code registry for Czech libraries,
archives, and museums.
Currently, we have a dataset of 8,694 Czech institutions (libraries from ADR,
archives from ARON), but ISIL codes are missing for most of them.
Would it be possible to obtain:
1. A complete export of the ISIL code registry (CSV, Excel, or other format)
2. Instructions on how to request ISIL codes for institutions that don't have them yet
The dataset will be published as open data (CC0 license) for the research community.
Thank you for your assistance,
[Your name]
Contact info:
- Website: https://www.nkp.cz
- Email: info@nkp.cz (general inquiries)
- ISIL page: https://www.nkp.cz/sluzby/sluzby-knihovnam/isil (check for contact)
Expected outcome:
- 60% chance of response
- Possible export of 500-2,000 ISIL codes
- Coverage: 0.01% → 5-20%
Option 2: Query ISIL.org Directly
Website: https://isil.org
Search strategy:
- Manual search by country (Czech Republic)
- Filter by institution type
- Scrape results (with rate limiting)
Expected outcome:
- 100-500 ISIL codes
- Coverage: 0.01% → 1-5%
- Time: 2-3 hours manual work
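If the lookup is automated, the rate limiting mentioned above could be sketched like this. The fetch function is passed in by the caller because no particular site API is assumed here; the one-second interval and all names are illustrative.

```python
import time


def fetch_all(keys, fetch, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(key) for each key, leaving at least min_interval seconds between calls."""
    results = {}
    last_call = None
    for key in keys:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[key] = fetch(key)
    return results


# Usage with a stand-in fetcher; a real one would perform the HTTP request
def fake_fetch(name):
    return {"name": name, "isil": None}


found = fetch_all(["library-a", "library-b"], fake_fetch, min_interval=0.01)
print(found)
```

Injecting `sleep` and `clock` keeps the throttle testable without real delays.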
Option 3: Generate Provisional ISIL Codes
Format: CZ-[City][InstitutionCode]
Process:
- Check NK ČR ISIL format conventions
- Generate codes for institutions without them
- Mark as provisional_isil: true in metadata
- Submit to NK ČR for official registration
Risk: Codes may conflict with existing assignments
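A minimal sketch of the process above, assuming the CZ-[City][InstitutionCode] format and the general ISIL constraints (at most 16 characters drawn from letters, digits, hyphen, colon, and slash). NK ČR's own conventions have not been checked, the case-insensitive conflict test is an assumption, and the "XYZ" institution code is hypothetical.

```python
import re

# General ISIL shape; NK ČR may apply stricter national rules (unverified assumption)
ISIL_CHARS = re.compile(r"[A-Za-z0-9:/-]{1,16}")


def make_provisional_isil(city_code, institution_code, existing):
    """Build a provisional code and refuse it if it is malformed or already assigned."""
    candidate = f"CZ-{city_code}{institution_code}"
    if not ISIL_CHARS.fullmatch(candidate):
        raise ValueError(f"not a valid ISIL shape: {candidate!r}")
    taken = {code.lower() for code in existing}
    if candidate.lower() in taken:
        raise ValueError(f"conflicts with an existing code: {candidate!r}")
    # The flag marks the code as unofficial until NK ČR registers it
    return {"isil": candidate, "provisional_isil": True}


known_codes = {"CZ-PrSTK"}  # the one official code currently in the dataset
record = make_provisional_isil("Pr", "XYZ", known_codes)
print(record)
```

The conflict check only covers codes already in the dataset, so it does not remove the risk noted above; a full registry export would still be needed.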
Recommendation
Priority order:
1. Email NK ČR ⭐ (15 min, high value)
2. Wait 1-2 weeks for response
3. Query ISIL.org (2-3 hours, medium value)
4. Generate provisional codes (last resort, requires NK ČR approval)
Parallel work while waiting:
- Generate GHCIDs using Wikidata Q-numbers (6,719 institutions)
- GHCID will use Q-numbers as collision resolution (not ISIL codes)
- Can add ISIL codes to GHCID later when available
Files
Scripts
- scripts/extract_isil_from_wikidata.py - Wikidata ISIL extraction (complete)
- scripts/enrich_czech_wikidata.py - Wikidata enrichment (complete)
Data
- data/instances/czech_unified.yaml - 8,694 institutions (unchanged, 1 ISIL code)
- data/instances/czech_unified_isil.yaml - Same as above (no new ISIL codes)
Documentation
- CZECH_ISIL_WIKIDATA_EXTRACTION.md (this file)
Decision Point
Question: Should we continue with Task 6 (contact NK ČR), or move to GHCID generation?
Option A: Contact NK ČR now (requires manual email action by user)
Option B: Skip to GHCID generation using Wikidata Q-numbers (automated)
Recommendation: Option A - Email NK ČR, then proceed with Option B while waiting for response.
Status: Task 6 investigation complete. Awaiting decision on next action.