Session Summary: Global ISIL Registry Mapping
Date: 2025-11-17
Focus: Mapping all global ISIL registries and preparing scraper inventory
Accomplishments Today
1. Completed Extractions ✅
Belarus ISIL Registry:
- 167 institutions extracted from National Library of Belarus
- Regional distribution: HO (29), MI (26), VI (25), MA (25), HM (23), BR (20), HR (19)
- Files: /data/isil/BY/belarus_isil_all.csv and .json
- Source: https://nlb.by/en/for-librarians/international-standard-identifier-for-libraries-and-related-organizations-isil/list-of-libraries-organizations-of-the-republic-of-belarus-and-their-isil-codes
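As a quick consistency check, the regional counts listed above can be summed and compared with the reported total of 167. This is a minimal sketch that uses only the figures from this summary; the variable names are illustrative.

```python
# Regional counts as listed in the Belarus summary above.
regional_counts = {
    "HO": 29, "MI": 26, "VI": 25, "MA": 25,
    "HM": 23, "BR": 20, "HR": 19,
}
expected_total = 167  # institutions reported for the Belarus registry

total = sum(regional_counts.values())
if total != expected_total:
    raise ValueError(f"regional counts sum to {total}, expected {expected_total}")
print(f"Belarus: {total} institutions across {len(regional_counts)} regions")
```

The same kind of check could be run against the extracted CSV after each re-scrape to catch silently dropped rows.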
KB Netherlands Public Libraries:
- 153 institutions extracted from Excel file (dated 2025-04-01)
- Source: https://www.bibliotheeknetwerk.nl/
- Parser: scripts/scrapers/parse_kb_netherlands_isil.py
- Files: /data/isil/NL/kb_netherlands_public_libraries.csv and .json
Totals to date (including the previous session):
- Belgium (KBR): 438 institutions
- Netherlands: 153 public libraries
- Belarus: 167 institutions
Major Discovery: Official ISIL Registration Authority Directory
Found the authoritative global ISIL directory maintained by Denmark (ISIL Registration Authority):
Source: https://slks.dk/english/work-areas/libraries-and-literature/library-standards/isil
Summary Statistics:
- 41 national agencies listed
- 30+ countries (73%) with searchable/accessible registries
- 8 countries without public access (AR, GL, IR, KZ, MD, NP, partial KR)
- 5 non-national agencies (EUR, GTB, OCLC, ZDB)
Documentation Created: /data/isil/GLOBAL_ISIL_AGENCIES_OFFICIAL.md
Key Findings
Denmark Registry Structure
- Direct CSV/Excel exports available at https://vip.dbc.dk/librarylists/
- Categories: Public libraries (main + branches), Research libraries (main + branches)
- Public interface documented but extraction not yet performed
Japan Registry Structure
- Comprehensive Excel/CSV downloads with English translations
- 4 institutional types separated:
  - Public libraries (~4,000 estimated)
  - NDL + university + special libraries (~2,000 estimated)
  - Museums (newly added Oct 2022)
  - Archives (newly added Oct 2022)
- Files dated 2025-11-05 (very recent!)
- Source: http://www.ndl.go.jp/en/library/isil/index.html
Registry Access Status
Tier 1: Direct CSV/Excel downloads (highest priority):
- 🇩🇰 Denmark - https://vip.dbc.dk/librarylists/
- 🇯🇵 Japan - http://www.ndl.go.jp/en/library/isil/index.html#isillist
- 🇳🇱 Netherlands (KB) - https://www.bibliotheeknetwerk.nl/ (✅ extracted)
- 🇧🇾 Belarus - https://nlb.by/en/... (✅ extracted)
- 🇧🇪 Belgium (KBR) - http://isil.kbr.be/ (✅ extracted)
Tier 2: Searchable interfaces (scraping required):
- 🇺🇸 USA - Library of Congress MARC Organizations (thousands)
- 🇩🇪 Germany - Staatsbibliothek zu Berlin Sigel (~10,000)
- 🇫🇷 France - SUDOC
- 🇮🇹 Italy - ICCU Anagrafe
- 🇨🇦 Canada - Library and Archives Canada
- 🇦🇺 Australia - National Library
- 🇨🇭 Switzerland - Swiss National Library
- 🇦🇹 Austria - Austrian Library Network
- Plus 15+ more countries
Tier 3: Unknown/Contact required:
- 🇦🇷 Argentina, 🇬🇱 Greenland, 🇮🇷 Iran, 🇰🇿 Kazakhstan, 🇲🇩 Moldova, 🇳🇵 Nepal
Blocked:
- 🇬🇧 UK - British Library offline due to cyber attack (Oct 2023)
Next Steps (Priority Order)
Priority 1: Easy Wins (Direct Downloads)
- Denmark - Execute scraper for https://vip.dbc.dk/librarylists/
  - Expected: 500+ public libraries, 200+ research libraries
  - Method: Direct CSV download + parsing
- Japan - Download all 4 Excel files
  - Expected: 6,000+ institutions total
  - Method: Direct Excel download + parsing (similar to KB Netherlands)
  - Files available: Libraries (2 files), Museums, Archives, Deleted ISIL
Priority 2: Large Datasets (Scraping Required)
- USA - Library of Congress
  - URL: http://www.loc.gov/marc/organizations/
  - Expected: 10,000+ institutions
  - Method: Web scraping (HTML table parsing)
- Germany - Staatsbibliothek zu Berlin
  - URL: https://sigel.staatsbibliothek-berlin.de/suche
  - Expected: ~10,000 institutions
  - Method: Search interface scraping with pagination
- France - SUDOC
  - URL: http://www.sudoc.abes.fr/
  - Expected: 3,000+ university/research libraries
  - Method: Catalog scraping
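For the paginated search interfaces in this tier, the page-walking logic can be kept separate from the HTTP details. The sketch below is one possible shape, not a tested scraper: `iter_paginated` and `fetch_page` are hypothetical names, and the real query parameters of each registry's search form are not known here, so the page fetcher is passed in as a callable.

```python
import time
from typing import Callable, Iterator, List


def iter_paginated(
    fetch_page: Callable[[int], List[dict]],
    start: int = 1,
    max_pages: int = 1000,
    delay: float = 0.0,
) -> Iterator[dict]:
    """Yield records page by page until a page comes back empty.

    fetch_page is a hypothetical callable that takes a page number and
    returns the records parsed from that page of results.
    """
    for page in range(start, start + max_pages):
        records = fetch_page(page)
        if not records:
            return  # an empty page is treated as the end of the results
        yield from records
        if delay:
            time.sleep(delay)  # be polite to the registry server


# Stand-in for a real fetcher, so the loop can be exercised offline.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
results = list(iter_paginated(lambda page: fake_pages.get(page, [])))
print(len(results), "records")
```

The `max_pages` cap guards against an interface that never returns an empty page.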
Priority 3: Medium Datasets
- Canada - https://sigles-symbols.bac-lac.gc.ca/eng/Search
- Australia - http://www.nla.gov.au/apps/ilrs
- Switzerland - https://www.isil.nb.admin.ch/en/
- Austria - https://www.isil.at/
- Czech Republic - https://aleph.nkp.cz/...
Files Created/Modified
New Documents:
- /data/isil/GLOBAL_ISIL_AGENCIES_OFFICIAL.md - Comprehensive directory of 41 national agencies
- /docs/sessions/SESSION_2025-11-17_ISIL_GLOBAL_MAPPING.md - This summary
New Parsers:
- /scripts/scrapers/parse_kb_netherlands_isil.py - Excel parser for KB Netherlands
New Data Files:
- /data/isil/BY/belarus_isil_all.csv + .json (167 institutions)
- /data/isil/NL/kb_netherlands_public_libraries.csv + .json (153 institutions)
- /data/isil/KB_Netherlands_ISIL_2025-04-01.xlsx (source file)
Technical Notes
Parser Design Patterns
Excel Parsing (KB Netherlands, Japan):
import openpyxl

# excel_file, header_row and data_start_row vary per registry
workbook = openpyxl.load_workbook(excel_file)
sheet = workbook.active
# Handle multi-row headers: read the row that holds the column names
headers = [cell.value for cell in sheet[header_row]]
# Parse data rows
for row in sheet.iter_rows(min_row=data_start_row, values_only=True):
    institution = standardize_fields(row, headers)
HTML Table Scraping (Belarus, Denmark):
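from bs4 import BeautifulSoup
import requests

response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(response.content, 'html.parser')
table = soup.find('table')  # None if the page contains no table
rows = table.find_all('tr')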
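Turning table rows into records follows the same idea regardless of the HTML library. The sketch below is a dependency-free variant that swaps BeautifulSoup for the standard library's `html.parser`, so it can run without extra packages. `TableRowParser`, `table_to_records`, and the sample HTML (including the `XX-…` codes) are made up for illustration and do not reflect any real registry page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Treat the first row as the header and return one dict per data row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


sample_html = """
<table>
  <tr><th>ISIL</th><th>Name</th></tr>
  <tr><td>XX-0001</td><td>Example Library</td></tr>
  <tr><td>XX-0002</td><td>Another Library</td></tr>
</table>
"""
records = table_to_records(sample_html)
print(records)
```

This assumes a single flat table per page; nested tables or merged cells would need extra handling.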
CSV Direct Download (Denmark, Japan):
import pandas as pd
df = pd.read_csv(url_or_file, encoding='utf-8')
# or use csv.DictReader for pure stdlib
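The pure-stdlib route mentioned in the comment above could look roughly like this. It is a sketch under assumptions: `read_registry_csv` is a hypothetical helper, and the sample text (with the made-up code `XX-0001`) stands in for a downloaded file.

```python
import csv
import io


def read_registry_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        # Strip stray whitespace; missing trailing cells come back as None.
        {key.strip(): (value or "").strip()
         for key, value in row.items() if key is not None}
        for row in reader
    ]


sample = "isil_code,name,city\nXX-0001, Example Library ,Example City\n"
rows = read_registry_csv(sample)
print(rows)
```

For a file on disk, opening it with `newline=""` and an explicit `encoding` and passing the handle to `csv.DictReader` gives the same result without loading the whole text first.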
Standardized Output Schema
All parsers output consistent fields:
- isil_code (required)
- name (required)
- city (optional)
- region/province (optional)
- country (required)
- postal_code (optional)
- street_address (optional)
- registry (data source)
- source_url (registry homepage)
- notes (additional info)
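One way to pin this schema down in code is a dataclass plus a mapping step. The sketch below is an assumption about how `standardize_fields` might work, not the existing implementation: the extra `column_map` and `defaults` parameters, the `Institution` class, the single `region` field standing in for region/province, and the sample headers and values are all hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Institution:
    isil_code: str
    name: str
    country: str
    city: Optional[str] = None
    region: Optional[str] = None  # covers region/province
    postal_code: Optional[str] = None
    street_address: Optional[str] = None
    registry: Optional[str] = None
    source_url: Optional[str] = None
    notes: Optional[str] = None


REQUIRED = ("isil_code", "name", "country")


def standardize_fields(row, headers, column_map, defaults=None):
    """Map one raw row onto the schema.

    column_map is {source header: schema field}; defaults supplies values
    that are constant for a whole registry, such as country.
    """
    values = dict(defaults or {})
    for header, cell in zip(headers, row):
        target = column_map.get(header)
        if target is None or cell is None:
            continue  # unmapped column or empty cell
        text = str(cell).strip()
        if text:
            values[target] = text
    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return Institution(**values)


headers = ["Code", "Name", "City"]
column_map = {"Code": "isil_code", "Name": "name", "City": "city"}
inst = standardize_fields(
    ("XX-0001", " Example Library ", None),
    headers,
    column_map,
    defaults={"country": "XX"},
)
print(asdict(inst))
```

Raising on a missing required field keeps incomplete rows out of the output; logging and skipping would be an equally reasonable choice for noisy registries.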
Statistics Summary
Institutions Extracted to Date:
- Belgium: 438
- Belarus: 167
- Netherlands: 153
- Total: 758 institutions
Potential Dataset Size (if all Tier 1 extracted):
- Denmark: ~700
- Japan: ~6,000
- Netherlands (complete): ~1,500 (with research libraries)
- Belgium: 438
- Belarus: 167
- Total potential: ~9,000 institutions from 5 countries
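The two totals above can be reproduced from the per-country figures. This is plain arithmetic on the numbers in this summary; the Denmark, Japan, and complete-Netherlands values are the estimates quoted above, not measured counts.

```python
# Counts already extracted (from this summary).
extracted = {"Belgium": 438, "Belarus": 167, "Netherlands": 153}

# Tier 1 figures; Denmark, Japan and Netherlands (complete) are estimates.
tier1_potential = {
    "Denmark": 700,
    "Japan": 6000,
    "Netherlands": 1500,
    "Belgium": 438,
    "Belarus": 167,
}

extracted_total = sum(extracted.values())
tier1_total = sum(tier1_potential.values())
print(f"Extracted to date: {extracted_total}")
print(f"Tier 1 potential:  {tier1_total}")
```

The Tier 1 sum comes to 8,805, which the summary rounds to ~9,000.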
Global Potential (if all 41 national agencies extracted):
- Estimated 50,000-100,000+ institutions worldwide
- USA alone: 10,000+
- Germany alone: 10,000+
- France: 3,000+
- Italy: 2,000+
References
Primary Sources:
- ISIL Registration Authority (Denmark): https://slks.dk/english/work-areas/libraries-and-literature/library-standards/isil
- ISO 15511:2019 Standard: https://www.iso.org/standard/77849.html
Registries Documented:
- 41 national agencies
- 5 non-national agencies
- 46 total ISIL allocation agencies
Tools Used:
- requests + BeautifulSoup for web scraping
- openpyxl for Excel parsing
- csv module for CSV exports
- json module for JSON exports
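The paired CSV and JSON exports can be produced from the same list of records with the two stdlib modules. The following is a minimal sketch, assuming each record is a plain dict; `export_records`, the `FIELDS` order, and the sample record are illustrative rather than taken from the existing parsers.

```python
import csv
import json
import tempfile
from pathlib import Path

# Assumed column order for the CSV export.
FIELDS = [
    "isil_code", "name", "city", "region", "country",
    "postal_code", "street_address", "registry", "source_url", "notes",
]


def export_records(records, csv_path, json_path):
    """Write the same records to a CSV file and a JSON file."""
    with Path(csv_path).open("w", newline="", encoding="utf-8") as handle:
        # restval fills absent optional fields; unknown keys are dropped.
        writer = csv.DictWriter(
            handle, fieldnames=FIELDS, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(records)
    with Path(json_path).open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII names readable in the file.
        json.dump(records, handle, ensure_ascii=False, indent=2)


sample = [{"isil_code": "XX-0001", "name": "Bibliothèque Exemple", "country": "XX"}]

with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "example.csv"
    json_file = Path(tmp) / "example.json"
    export_records(sample, csv_file, json_file)
    csv_text = csv_file.read_text(encoding="utf-8")
    json_back = json.loads(json_file.read_text(encoding="utf-8"))

print(csv_text)
```

Writing both files from one in-memory list keeps the .csv and .json outputs in step with each other.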
Next Session Priorities
- ⏳ Extract Denmark ISIL registry (direct CSV download)
- ⏳ Extract Japan ISIL registry (4 Excel files)
- ⏳ Create scraper for USA Library of Congress
- ⏳ Create scraper for Germany Staatsbibliothek Berlin
- ⏳ Investigate Rijksarchief Belgium (archives) - 0 records from previous attempt
Goal: Achieve 15,000+ institutions from 10 countries by end of week
Session End: 2025-11-17 14:02 UTC