Canadian ISIL Database Extraction - Session Summary
Date: November 18, 2025
Source: Library and Archives Canada - Canadian Library Directory
URL: https://sigles-symbols.bac-lac.gc.ca/eng/Search
Summary
Successfully extracted the complete Canadian ISIL database containing all heritage institutions with Canadian Library Symbols.
Statistics
- Total records extracted: 9,566
- Active libraries: 6,520
- Closed/Superseded: 3,046
Extraction Method
Created a Python scraper using Playwright that:
- Navigates through paginated search results (100 records per page)
- Extracts basic library information from list pages
- Generates ISIL codes in Canadian format (CA-[SYMBOL])
- Saves data in three JSON files
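The record-building and ISIL-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: `make_isil`, `build_record`, and `DETAIL_URL` are hypothetical names, and the detail URL pattern is inferred from the sample record shown under Data Structure.

```python
# Hypothetical sketch: turn one row of list-page data into a record.
# The URL pattern is inferred from the sample record, not from the site's docs.
DETAIL_URL = "https://sigles-symbols.bac-lac.gc.ca/eng/Search/Details?Id={library_id}"


def make_isil(symbol: str) -> str:
    """Prefix a Canadian Library Symbol with the CA country code."""
    return f"CA-{symbol.strip()}"


def build_record(symbol, name, city, province, library_id, status="active"):
    """Assemble a record with the nine fields used in the output files."""
    symbol = symbol.strip()
    return {
        "isil_code": make_isil(symbol),
        "library_symbol": symbol,
        "name": name.strip(),
        "city": city.strip(),
        "province": province.strip(),
        "country": "CA",
        "library_id": str(library_id),
        "detail_url": DETAIL_URL.format(library_id=library_id),
        "status": status,
    }


record = build_record("AA", "Andrew Municipal Library", "Andrew", "Alberta", 3000)
print(record["isil_code"])  # CA-AA
```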
Output Files
All files saved to: /Users/kempersc/apps/glam/data/isil/canada/
- canadian_libraries_active.json (2.2 MB)
  - 6,520 active libraries
  - Current, operational institutions
- canadian_libraries_closed.json (1.1 MB)
  - 3,046 closed/superseded libraries
  - Historical records, merged institutions, closed facilities
- canadian_libraries_all.json (3.3 MB)
  - Combined dataset
  - All 9,566 records
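A quick consistency check across the three files might look like the sketch below. It works on plain lists of record dicts, because the top-level layout of the JSON files is not documented here; `check_split` is a hypothetical helper, and the duplicate check assumes ISIL codes are meant to be unique across the combined dataset.

```python
from collections import Counter


def check_split(active, closed, combined):
    """Return a list of problems; an empty list means the split is consistent."""
    problems = []
    if len(active) + len(closed) != len(combined):
        problems.append(
            f"count mismatch: {len(active)} + {len(closed)} != {len(combined)}"
        )
    # Every record should carry the status of the file it was saved in.
    for label, records in (("active", active), ("closed", closed)):
        wrong = sum(1 for r in records if r.get("status") != label)
        if wrong:
            problems.append(f"{wrong} record(s) in the {label} file have a different status")
    # Assumption: an ISIL code should appear only once in the combined file.
    counts = Counter(r["isil_code"] for r in combined)
    dupes = sorted(code for code, n in counts.items() if n > 1)
    if dupes:
        problems.append(f"duplicate ISIL codes: {', '.join(dupes)}")
    return problems


active = [{"isil_code": "CA-AA", "status": "active"}]
closed = [{"isil_code": "CA-ZZ", "status": "closed"}]
print(check_split(active, closed, active + closed))  # []
```

For the real files, the expected totals are 6,520 + 3,046 = 9,566.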
Data Structure
Each record contains:
{
  "isil_code": "CA-AA",
  "library_symbol": "AA",
  "name": "Andrew Municipal Library",
  "city": "Andrew",
  "province": "Alberta",
  "country": "CA",
  "library_id": "3000",
  "detail_url": "https://sigles-symbols.bac-lac.gc.ca/eng/Search/Details?Id=3000",
  "status": "active"
}
Fields Extracted
- isil_code: Canadian ISIL code (format: CA-[symbol])
- library_symbol: Official Canadian Library Symbol (unique identifier)
- name: Institution name
- city: City location
- province: Province/territory
- country: Country code (CA)
- library_id: Internal database ID
- detail_url: Link to full detail page
- status: "active" or "closed"
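When loading these records downstream, a small typed wrapper can catch missing fields or unexpected status values early. The sketch below is one possible shape; `LibraryRecord` and `from_dict` are hypothetical names and are not part of the existing scripts.

```python
from dataclasses import dataclass, fields

VALID_STATUSES = ("active", "closed")


@dataclass(frozen=True)
class LibraryRecord:
    isil_code: str
    library_symbol: str
    name: str
    city: str
    province: str
    country: str
    library_id: str
    detail_url: str
    status: str

    @classmethod
    def from_dict(cls, data: dict) -> "LibraryRecord":
        """Build a record from a parsed JSON object, validating as it goes."""
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if n not in data]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
        if data["status"] not in VALID_STATUSES:
            raise ValueError(f"unexpected status: {data['status']!r}")
        # Extra keys in the input are ignored rather than rejected.
        return cls(**{n: data[n] for n in names})


sample = {
    "isil_code": "CA-AA",
    "library_symbol": "AA",
    "name": "Andrew Municipal Library",
    "city": "Andrew",
    "province": "Alberta",
    "country": "CA",
    "library_id": "3000",
    "detail_url": "https://sigles-symbols.bac-lac.gc.ca/eng/Search/Details?Id=3000",
    "status": "active",
}
print(LibraryRecord.from_dict(sample).name)  # Andrew Municipal Library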
Coverage by Province
The database covers all Canadian provinces and territories:
- Alberta (AB)
- British Columbia (BC)
- Manitoba (MB)
- New Brunswick (NB)
- Newfoundland and Labrador (NL)
- Northwest Territories (NT)
- Nova Scotia (NS)
- Nunavut (NU)
- Ontario (ON)
- Prince Edward Island (PE)
- Quebec (QC)
- Saskatchewan (SK)
- Yukon (YT)
Institution Types Included
Based on observation of the data, the database includes:
- Public libraries (municipal, regional)
- School libraries (elementary, secondary)
- Academic libraries (colleges, universities)
- Special libraries (government, corporate, research)
- Archives
- Museum libraries
- Religious institution libraries
Performance
- Extraction time: ~4 minutes (total)
- Active libraries (66 pages): ~2.5 minutes
- Closed libraries (31 pages): ~1.5 minutes
- Request pacing: ~0.5-second delay between page requests (polite scraping)
- Success rate: 100% (all 97 pages successfully scraped)
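The page counts follow directly from the record totals and the page size of 100. The short calculation below reproduces them and shows how much of the runtime is accounted for by the fixed delay alone; the variable names are illustrative, and the 240-second figure is just the "~4 minutes" above expressed in seconds.

```python
import math

PAGE_SIZE = 100      # records per results page
DELAY_S = 0.5        # approximate pause between page requests
TOTAL_RUNTIME_S = 4 * 60  # "~4 minutes" expressed in seconds


def pages_needed(records: int, page_size: int = PAGE_SIZE) -> int:
    """Number of result pages required to list the given number of records."""
    return math.ceil(records / page_size)


active_pages = pages_needed(6520)              # 66
closed_pages = pages_needed(3046)              # 31
total_pages = active_pages + closed_pages      # 97
delay_floor_s = total_pages * DELAY_S          # 48.5 seconds spent waiting
avg_s_per_page = TOTAL_RUNTIME_S / total_pages # roughly 2.5 seconds per page

print(active_pages, closed_pages, total_pages, delay_floor_s, round(avg_s_per_page, 1))
```

Under these figures, the fixed delay accounts for under a minute of the total; the rest is page loading and parsing.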
Scripts Created
- scrape_canadian_isil_fast.py
  - Fast scraper that extracts list-level data only
  - Located at: /Users/kempersc/apps/glam/scripts/scrapers/scrape_canadian_isil_fast.py
  - Usage: python3 scrape_canadian_isil_fast.py (full dataset)
  - Usage: python3 scrape_canadian_isil_fast.py --test (first 2 pages only)
- scrape_canadian_isil.py (slower, detailed version)
  - Includes fetching individual detail pages for each library
  - Located at: /Users/kempersc/apps/glam/scripts/scrapers/scrape_canadian_isil.py
  - Not used for full dataset due to time constraints (~1.2 sec per detail page)
  - Could be used later to enrich data with additional fields
Additional Data Available (Not Yet Extracted)
The detail pages contain much more information that could be extracted in a future pass:
- Full address (street, postal code)
- Telephone number
- Fax number
- Email address(es)
- Library type classification
- OCLC symbol
- Lending policies (monographs, serials)
- Photocopy policies
- ILL (Interlibrary Loan) policies
- Request methods (email, fax, web form)
- Library system membership
- Website URL
- Notes/comments
Next Steps (Optional)
- Detail enrichment: Run the detailed scraper to fetch all additional fields from detail pages
  - Estimated time: ~2.7 to 3.2 hours (9,566 records × 1.0 to 1.2 seconds per record)
  - Would add contact info, policies, and other metadata
- Convert to LinkML format: Transform JSON data into the project's LinkML schema
  - Map to HeritageCustodian class
  - Assign institution types (LIBRARY, ARCHIVE, MUSEUM, etc.)
  - Add provenance metadata
  - Generate GHCIDs
- Geocoding: Add latitude/longitude coordinates for each institution
  - Use Nominatim API or Google Maps API
  - Would enable geographic visualization
- Cross-reference: Link with other datasets
  - Wikidata IDs
  - Library websites
  - Other Canadian heritage databases
- Export: Convert to other formats
  - CSV for spreadsheet analysis
  - RDF/Turtle for semantic web
  - Parquet for data warehousing
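Of the export options, CSV is the simplest to sketch with the standard library alone. The example below is a minimal illustration: `records_to_csv` and `FIELDNAMES` are hypothetical names, and the function expects an already-loaded list of record dicts rather than a file path, since the top-level layout of the JSON files is not shown here.

```python
import csv
import io

FIELDNAMES = [
    "isil_code", "library_symbol", "name", "city", "province",
    "country", "library_id", "detail_url", "status",
]


def records_to_csv(records, fieldnames=FIELDNAMES) -> str:
    """Serialize record dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in fieldnames;
    # missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


sample = {
    "isil_code": "CA-AA",
    "library_symbol": "AA",
    "name": "Andrew Municipal Library",
    "city": "Andrew",
    "province": "Alberta",
    "country": "CA",
    "library_id": "3000",
    "detail_url": "https://sigles-symbols.bac-lac.gc.ca/eng/Search/Details?Id=3000",
    "status": "active",
}
print(records_to_csv([sample]))
```

To write the result to disk, open the target file with `newline=""` so the csv module's own line endings are preserved.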
Data Quality Notes
Strengths:
- ✅ Complete dataset (all 9,566 records)
- ✅ Authoritative source (Library and Archives Canada)
- ✅ Well-structured data
- ✅ Includes historical records (closed/superseded)
- ✅ ISIL codes generated for all institutions (CA-[SYMBOL])
- ✅ Covers all Canadian provinces/territories
Limitations:
- ⚠️ Basic data only (name, city, province, symbol)
- ⚠️ No contact information in current extract
- ⚠️ No geographic coordinates
- ⚠️ No library type classification
- ⚠️ No cross-references to Wikidata or other databases
- ⚠️ City names not fully standardized (some all-caps, some mixed case)
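The inconsistent city-name casing could be smoothed with a conservative normalizer that only touches all-caps values and leaves mixed-case names alone. The sketch below is one possible approach, not part of the existing scripts; `normalize_city` is a hypothetical helper, and it capitalizes every word, so lowercase particles in some place names would still need manual review.

```python
import re


def normalize_city(name: str) -> str:
    """Collapse whitespace and convert ALL-CAPS names to capitalized words."""
    name = " ".join(name.split())
    if not name.isupper():
        # Mixed-case values are assumed to be correct already.
        return name
    # Uppercase the first letter of the string and of each word that
    # follows whitespace or a hyphen; everything else stays lowercase.
    return re.sub(
        r"(^|[\s-])(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        name.lower(),
    )


print(normalize_city("ST. JOHN'S"))      # St. John's
print(normalize_city("TROIS-RIVIÈRES"))  # Trois-Rivières
print(normalize_city("Andrew"))          # Andrew
```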
Data Tier Classification
According to the project's data quality framework:
- Data Source: CSV_REGISTRY (authoritative government source)
- Data Tier: TIER_1_AUTHORITATIVE
- Confidence Score: 0.95-1.0 (government registry data)
Canadian ISIL Format
Canadian ISIL codes follow the pattern: CA-[SYMBOL]
Examples:
- CA-AA - Andrew Municipal Library
- CA-OONL - National Library of Canada
- CA-OTU - University of Toronto
- CA-QMM - McGill University
The symbol portion is assigned by Library and Archives Canada and is unique within the Canadian system.
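A simple parser for this pattern can help separate Canadian codes from other ISIL prefixes when datasets are merged. The sketch below assumes symbols consist only of ASCII letters and digits, which holds for the examples above but has not been verified against the full dataset; `parse_canadian_isil` is a hypothetical helper.

```python
import re

# Assumption: the symbol part contains only ASCII letters and digits.
CA_ISIL_RE = re.compile(r"CA-([A-Za-z0-9]+)")


def parse_canadian_isil(code: str):
    """Return the library symbol from a CA-[SYMBOL] code, or None if it does not match."""
    match = CA_ISIL_RE.fullmatch(code.strip())
    return match.group(1) if match else None


print(parse_canadian_isil("CA-OONL"))    # OONL
print(parse_canadian_isil("NL-AsdAAA"))  # None
```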
Integration with Global GLAM Project
This Canadian dataset should be integrated into the global heritage custodian database alongside:
- Dutch ISIL registry (364 institutions)
- Dutch organizations CSV (1,351 institutions)
- Conversation-extracted institutions from 139 JSON files
Total heritage custodians after integration: 11,281+ institutions worldwide
Commands Used
# Make scraper executable
chmod +x /Users/kempersc/apps/glam/scripts/scrapers/scrape_canadian_isil_fast.py
# Run full extraction
cd /Users/kempersc/apps/glam
python3 scripts/scrapers/scrape_canadian_isil_fast.py
# Test mode (first 2 pages only)
python3 scripts/scrapers/scrape_canadian_isil_fast.py --test
# Check output
ls -lh data/isil/canada/*.json
jq '.record_count // .total_records' data/isil/canada/*.json
Repository Location
- Data files: /Users/kempersc/apps/glam/data/isil/canada/
- Scripts: /Users/kempersc/apps/glam/scripts/scrapers/
- Project root: /Users/kempersc/apps/glam/
Status: ✅ COMPLETE
Quality: HIGH (authoritative source, complete dataset)
Extraction Date: 2025-11-18T20:23:09
Last Updated: 2024-11-05 (source database)