Bosnia ISIL Automation Script - Created
Date: November 18, 2025
Status: Script ready, awaiting execution
Script Created
Location: /Users/kempersc/apps/glam/scripts/bosnia_isil_scraper.py
What It Does
Automates the manual fallback process of checking all 80 COBISS.BH libraries for ISIL codes.
Strategy
For each of the 80 libraries:
- Check COBISS Library Pages
  - Try multiple COBISS URL patterns
  - Search page content for ISIL code patterns
- Check Institutional Websites
  - Navigate to library homepage (if available)
  - Check main page and "About"/"Contact" sections
  - Search for ISIL codes, ISO 15511 mentions
- Pattern Matching
  - BA-* codes (ISO 3166-1 alpha-2 format)
  - BO-* codes (legacy format from Danish registry)
  - "ISIL: XX-XXXXX" mentions
  - ISO 15511 standard references
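The pattern-matching step can be sketched with regular expressions. This is a minimal illustration, not the script's actual code: the expressions, the 11-character limit on the identifier part, and the helper `mentions_iso_15511` are assumptions; only the function name `search_for_isil` comes from the script overview.

```python
import re

# Assumed shape: country prefix, hyphen, then up to 11 identifier characters.
_IDENT = r"[A-Z0-9](?:[A-Z0-9/:-]{0,9}[A-Z0-9])?"

# Bare BA-* / BO-* codes that are not embedded in a longer token.
CODE_RE = re.compile(rf"(?<![A-Za-z0-9-])(?:BA|BO)-{_IDENT}(?![A-Za-z0-9-])")

# Labelled mentions such as "ISIL: XX-XXXXX", with any two-letter prefix.
LABELLED_RE = re.compile(rf"ISIL\s*:?\s*([A-Z]{{2}}-{_IDENT})")

# References to the standard itself.
STANDARD_RE = re.compile(r"ISO\s*15511", re.IGNORECASE)


def search_for_isil(text: str) -> list[str]:
    """Return the sorted, de-duplicated ISIL-like codes found in text."""
    found = set(CODE_RE.findall(text))
    found.update(LABELLED_RE.findall(text))
    return sorted(found)


def mentions_iso_15511(text: str) -> bool:
    """Report whether the text refers to the ISO 15511 standard."""
    return STANDARD_RE.search(text) is not None
```

Patterns this loose can produce false positives (any uppercase token starting with `BA-`), so matches are best treated as candidates to review rather than confirmed codes.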
Output
File: data/isil/bosnia/bosnia_isil_codes_found.json
Format:
[
{
"number": 1,
"name": "Library Name",
"city": "City",
"acronym": "ACRONYM",
"homepage": "www.example.ba",
"isil_found": true/false,
"isil_codes": ["BA-SA-CODE", "BO-CODE"],
"sources_checked": ["COBISS library pages", "Website: www.example.ba"],
"notes": ["Found in COBISS: https://...", "Found on website: https://..."]
}
]
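One way to build records in this shape is a small dataclass that derives `isil_found` from the collected codes. The class name `LibraryResult` and the method `add_codes` are hypothetical; only the field names are taken from the format above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LibraryResult:
    """One entry of the output file (hypothetical helper class)."""
    number: int
    name: str
    city: str
    acronym: str
    homepage: str
    isil_found: bool = False
    isil_codes: list[str] = field(default_factory=list)
    sources_checked: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)

    def add_codes(self, codes: list[str], note: str) -> None:
        """Record newly found codes without duplicates and keep the flag in sync."""
        for code in codes:
            if code not in self.isil_codes:
                self.isil_codes.append(code)
        if codes:
            self.notes.append(note)
        self.isil_found = bool(self.isil_codes)


result = LibraryResult(1, "Library Name", "City", "ACRONYM", "www.example.ba")
result.sources_checked.append("COBISS library pages")
result.add_codes(["BA-SA-CODE"], "Found in COBISS: https://...")
print(json.dumps([asdict(result)], ensure_ascii=False, indent=2))
```

Deriving the flag inside `add_codes` avoids records where `isil_found` and `isil_codes` disagree.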
Performance
- Manual Estimate: 80 libraries × 5 min/library = 400 min (~6.7 hours)
- Automated Estimate: ~10-20 minutes (including wait times, retries)
- Intermediate Saves: Every 10 libraries (fault tolerance)
- Logging: Real-time progress to scraper_log.txt
How to Run
Option 1: Run Directly
cd /Users/kempersc/apps/glam
python scripts/bosnia_isil_scraper.py
Option 2: Run in Background
cd /Users/kempersc/apps/glam
nohup python scripts/bosnia_isil_scraper.py > data/isil/bosnia/scraper_output.txt 2>&1 &
# Monitor progress
tail -f data/isil/bosnia/scraper_log.txt
Option 3: Test on First 5 Libraries
# Edit script to limit to 5 libraries for testing
python scripts/bosnia_isil_scraper.py
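Instead of editing the script by hand for a test run, a command-line limit could be added. The sketch below assumes a hypothetical `--limit` flag and helper names; the existing script may not have either.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (the --limit flag is an assumed addition)."""
    parser = argparse.ArgumentParser(description="Bosnia ISIL scraper")
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="process only the first N libraries (for test runs)",
    )
    return parser.parse_args(argv)


def select_libraries(libraries, limit):
    """Return the first `limit` libraries, or all of them when limit is None."""
    return libraries if limit is None else libraries[:limit]
```

With such a flag, a test run would look like `python scripts/bosnia_isil_scraper.py --limit 5`, and omitting the flag would process all 80 libraries.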
Expected Outcomes
Scenario 1: ISIL Codes Found ✅
If ISIL codes ARE included in COBISS records or institutional websites:
- Script extracts 50-80 ISIL codes
- Validates country code format (BA- vs. BO-)
- Creates complete mapping: COBISS acronym → ISIL code
Scenario 2: No ISIL Codes Found ❌
If ISIL codes are NOT publicly accessible:
- Confirms exhaustive search (COBISS + websites)
- Validates that ISIL codes require direct contact with NUBBiH
- Provides evidence for email request to Registration Authority
Scenario 3: Partial Results ⚠️
If some libraries have ISIL codes but not all:
- Identifies which libraries publish their ISIL codes
- Reveals inconsistencies in COBISS data entry
- Prioritizes which libraries to contact directly
Dependencies
Python Packages:
pip install playwright
playwright install chromium
Already Installed (based on project structure):
- Python 3.11+
- Playwright (used earlier in session)
Monitoring Progress
Real-Time Log
tail -f data/isil/bosnia/scraper_log.txt
Example Output:
2025-11-18 15:30:00 - Starting Bosnia ISIL scraper...
2025-11-18 15:30:01 - Loaded 80 libraries
2025-11-18 15:30:05 - [1/80] Checking: Agronomski i prehrambeno-tehnološki fakultet, Mostar (APFMO)
2025-11-18 15:30:15 - ✓ Found ISIL codes in COBISS: ['BA-MO-APFMO']
2025-11-18 15:30:20 - [2/80] Checking: Akademija likovnih umjetnosti (ALU)
...
Intermediate Results
Check progress every 10 libraries:
jq 'length' data/isil/bosnia/bosnia_isil_codes_found.json
Risk Mitigation
Fault Tolerance
- Intermediate Saves: Results saved every 10 libraries
- Error Handling: Script continues if individual pages fail
- Logging: All errors logged to scraper_log.txt
- Timeout Protection: 10-15 second timeouts per page
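The save-and-continue behaviour described above might look roughly like this. It is a sketch under assumptions: `process_all`, `save_results`, and the `check_one` callback are hypothetical names, and the write-then-rename step is one possible way to avoid a half-written JSON file, not necessarily what the script does. Page timeouts are left to whatever `check_one` uses to load pages.

```python
import json
import logging
import os
from pathlib import Path

SAVE_EVERY = 10  # intermediate save interval, as described above


def save_results(results, path):
    """Write results to a temp file first, then rename it over the target."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def process_all(libraries, check_one, output_path):
    """Check each library, logging failures instead of aborting the run."""
    results = []
    for index, library in enumerate(libraries, start=1):
        try:
            results.append(check_one(library))
        except Exception as exc:  # keep going if a single library fails
            logging.error("Failed on %s: %s", library.get("name"), exc)
            results.append({**library, "isil_found": False,
                            "isil_codes": [], "notes": [f"Error: {exc}"]})
        if index % SAVE_EVERY == 0:
            save_results(results, output_path)
    save_results(results, output_path)  # final save covers the remainder
    return results
```

Recording the error in `notes` keeps failed libraries visible in the output, so they can be retried or checked by hand.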
Rate Limiting
- 2-second delay between libraries
- Prevents overwhelming COBISS servers
- Respects website terms of service
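A fixed delay between libraries can be expressed as a small generator. The name `rate_limited` and its parameters are illustrative assumptions; the injectable `sleep` argument exists only so the behaviour can be checked without actually waiting.

```python
import time

DELAY_SECONDS = 2.0  # delay between libraries, as described above


def rate_limited(items, delay=DELAY_SECONDS, sleep=time.sleep):
    """Yield items one by one, pausing `delay` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # no pause before the first item
        yield item


# Usage: the loop body runs at most once every two seconds.
# for library in rate_limited(libraries):
#     check(library)
```

For 80 libraries this adds 79 × 2 s, a little over two and a half minutes, to the total run time.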
After Completion
Analyze Results
# Count how many ISIL codes were found
jq '[.[] | select(.isil_found == true)] | length' data/isil/bosnia/bosnia_isil_codes_found.json
# List all unique ISIL codes
jq '[.[].isil_codes[]] | unique' data/isil/bosnia/bosnia_isil_codes_found.json
# Find libraries without ISIL codes
jq '[.[] | select(.isil_found == false) | .name]' data/isil/bosnia/bosnia_isil_codes_found.json
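The same three queries can be run from Python when `jq` is not available. The function names are hypothetical; the field names follow the output format. The extra count by prefix is an assumed convenience for the BA- vs. BO- format check.

```python
import json
from collections import Counter


def load_results(path="data/isil/bosnia/bosnia_isil_codes_found.json"):
    """Load the scraper's JSON output."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(results):
    """Mirror the jq queries: found count, unique codes, libraries without codes."""
    unique_codes = sorted({code for r in results for code in r.get("isil_codes", [])})
    return {
        "libraries_with_codes": sum(1 for r in results if r.get("isil_found")),
        "unique_codes": unique_codes,
        # Count unique codes per country prefix (e.g. BA vs. BO).
        "codes_by_prefix": dict(Counter(c.split("-", 1)[0] for c in unique_codes)),
        "libraries_without_codes": [r["name"] for r in results if not r.get("isil_found")],
    }
```

Calling `summarize(load_results())` after a run returns all four values in one dictionary.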
Next Steps Based on Results
If ISIL Codes Found:
- Validate code format (BA- vs. BO-)
- Create LinkML instance files
- Update investigation report with findings
If No ISIL Codes Found:
- Confirm exhaustive search completed
- Send email to NUBBiH (template in FINAL_REPORT.md)
- Document that ISIL codes require direct contact
Comparison: Manual vs. Automated
| Aspect | Manual | Automated Script |
|---|---|---|
| Time | ~6.7 hours | ~10-20 minutes |
| Coverage | 80 libraries | 80 libraries |
| Accuracy | Human error possible | Consistent pattern matching |
| Documentation | Manual notes | Structured JSON + logs |
| Reproducibility | Low (fatigue) | High (repeatable) |
| Intermediate Saves | Manual | Every 10 libraries |
| Error Recovery | Start over | Resume from last save |
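Resuming from the last save could work along these lines. This is a sketch only: whether the script already does this is not stated here, and `load_previous` and `pending_libraries` are hypothetical names. It assumes each library carries the `number` field from the output format.

```python
import json
from pathlib import Path


def load_previous(path):
    """Return previously saved results, or an empty list if none are usable."""
    path = Path(path)
    if not path.exists():
        return []
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return []  # unreadable file: start from scratch


def pending_libraries(libraries, previous):
    """Keep only the libraries whose number is not in the saved results."""
    done = {entry["number"] for entry in previous}
    return [lib for lib in libraries if lib["number"] not in done]
```

New results would then be appended to the loaded list, so a restart after an interruption skips libraries that were already saved.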
Script Code Overview
# Key functions:
- search_for_isil(text): Pattern matching for ISIL codes
- check_cobiss_library_page(page, acronym): Check COBISS pages
- check_institution_website(page, homepage): Check library websites
- scrape_all_libraries(): Main orchestration loop
# Output:
- bosnia_isil_codes_found.json: Structured results
- scraper_log.txt: Real-time progress log
Decision Point
You have three options:
- Run the script now → Complete automation (10-20 min)
- Test on 5 libraries → Validate approach before full run
- Skip automation → Proceed with email contact strategy
Recommendation: Run the script. Even if ISIL codes aren't found, it provides conclusive evidence for the email request to NUBBiH, demonstrating due diligence.
Status: ⏳ AWAITING USER DECISION TO EXECUTE
Next Command (if executing):
cd /Users/kempersc/apps/glam && python scripts/bosnia_isil_scraper.py