# Bosnia ISIL Automation Script - Created

**Date**: November 18, 2025

**Status**: Script ready, awaiting execution

---

## Script Created

**Location**: `/Users/kempersc/apps/glam/scripts/bosnia_isil_scraper.py`

### What It Does

Automates the manual fallback process of checking all 80 COBISS.BH libraries for ISIL codes.

### Strategy

For each of the 80 libraries:

1. **Check COBISS Library Pages**
   - Try multiple COBISS URL patterns
   - Search page content for ISIL code patterns
2. **Check Institutional Websites**
   - Navigate to the library homepage (if available)
   - Check the main page and "About"/"Contact" sections
   - Search for ISIL codes and ISO 15511 mentions
3. **Pattern Matching**
   - `BA-*` codes (`BA` is the ISO 3166-1 alpha-2 code for Bosnia and Herzegovina)
   - `BO-*` codes (legacy format from the Danish registry)
   - "ISIL: XX-XXXXX" mentions
   - ISO 15511 standard references

### Output

**File**: `data/isil/bosnia/bosnia_isil_codes_found.json`

**Format** (`isil_found` is `true` when at least one code was matched, otherwise `false`):

```json
[
  {
    "number": 1,
    "name": "Library Name",
    "city": "City",
    "acronym": "ACRONYM",
    "homepage": "www.example.ba",
    "isil_found": true,
    "isil_codes": ["BA-SA-CODE", "BO-CODE"],
    "sources_checked": ["COBISS library pages", "Website: www.example.ba"],
    "notes": ["Found in COBISS: https://...", "Found on website: https://..."]
  }
]
```

### Performance

- **Manual Estimate**: 80 libraries × 5 min/library = 400 min ≈ **6.7 hours**
- **Automated Estimate**: **~10-20 minutes** (including wait times and retries)
- **Intermediate Saves**: Every 10 libraries (fault tolerance)
- **Logging**: Real-time progress to `scraper_log.txt`

---

## How to Run

### Option 1: Run Directly

```bash
cd /Users/kempersc/apps/glam
python scripts/bosnia_isil_scraper.py
```

### Option 2: Run in Background

```bash
cd /Users/kempersc/apps/glam
nohup python scripts/bosnia_isil_scraper.py > data/isil/bosnia/scraper_output.txt 2>&1 &

# Monitor progress
tail -f data/isil/bosnia/scraper_log.txt
```

### Option 3: Test on First 5 Libraries

```bash
# Edit the script to limit the run to 5 libraries, then:
python scripts/bosnia_isil_scraper.py
```

---

## Expected Outcomes

### Scenario 1: ISIL Codes Found ✅

If ISIL codes ARE included in COBISS records or on institutional websites:

- Script extracts 50-80 ISIL codes
- Validates country code format (BA- vs. BO-)
- Creates a complete mapping: COBISS acronym → ISIL code

### Scenario 2: No ISIL Codes Found ❌

If ISIL codes are NOT publicly accessible:

- Documents that an exhaustive search (COBISS + websites) was carried out
- Indicates that ISIL codes must be obtained directly from NUBBiH
- Provides evidence for the email request to the Registration Authority

### Scenario 3: Partial Results ⚠️

If some libraries publish ISIL codes but not all:

- Identifies which libraries publish their ISIL codes
- Reveals inconsistencies in COBISS data entry
- Prioritizes which libraries to contact directly

---

## Dependencies

**Python Packages**:

```bash
pip install playwright
playwright install chromium
```

**Already Installed** (based on project structure):

- Python 3.11+
- Playwright (already used earlier in this investigation)

---

## Monitoring Progress

### Real-Time Log

```bash
tail -f data/isil/bosnia/scraper_log.txt
```

**Example Output** (illustrative; the script has not been run yet):

```
2025-11-18 15:30:00 - Starting Bosnia ISIL scraper...
2025-11-18 15:30:01 - Loaded 80 libraries
2025-11-18 15:30:05 - [1/80] Checking: Agronomski i prehrambeno-tehnološki fakultet, Mostar (APFMO)
2025-11-18 15:30:15 - ✓ Found ISIL codes in COBISS: ['BA-MO-APFMO']
2025-11-18 15:30:20 - [2/80] Checking: Akademija likovnih umjetnosti (ALU)
...
```

### Intermediate Results

Check progress every 10 libraries (prints the number of library records saved so far):

```bash
jq 'length' data/isil/bosnia/bosnia_isil_codes_found.json
```

---

## Risk Mitigation

### Fault Tolerance

1. **Intermediate Saves**: Results saved every 10 libraries
2. **Error Handling**: Script continues if individual pages fail
3. **Logging**: All errors logged to `scraper_log.txt`
4. **Timeout Protection**: 10-15 second timeouts per page

### Rate Limiting

- 2-second delay between libraries
- Prevents overwhelming COBISS servers
- Keeps the load on institutional websites low

---

## After Completion

### Analyze Results

```bash
# Count how many libraries have at least one ISIL code
jq '[.[] | select(.isil_found == true)] | length' data/isil/bosnia/bosnia_isil_codes_found.json

# List all unique ISIL codes
jq '[.[].isil_codes[]] | unique' data/isil/bosnia/bosnia_isil_codes_found.json

# Find libraries without ISIL codes
jq '[.[] | select(.isil_found == false) | .name]' data/isil/bosnia/bosnia_isil_codes_found.json
```

### Next Steps Based on Results

**If ISIL Codes Found**:

1. Validate code format (BA- vs. BO-)
2. Create LinkML instance files
3. Update the investigation report with the findings

**If No ISIL Codes Found**:

1. Confirm the exhaustive search was completed
2. Send email to NUBBiH (template in FINAL_REPORT.md)
3. Document that ISIL codes require direct contact

---

## Comparison: Manual vs. Automated

| Aspect | Manual | Automated Script |
|--------|--------|------------------|
| **Time** | ~6.7 hours | ~10-20 minutes |
| **Coverage** | 80 libraries | 80 libraries |
| **Accuracy** | Human error possible | Consistent pattern matching |
| **Documentation** | Manual notes | Structured JSON + logs |
| **Reproducibility** | Low (fatigue) | High (repeatable) |
| **Intermediate Saves** | Manual | Every 10 libraries |
| **Error Recovery** | Start over | Resume from last save |

---

## Script Code Overview

```python
# Key functions:
#   search_for_isil(text)                      - pattern matching for ISIL codes
#   check_cobiss_library_page(page, acronym)   - check COBISS pages
#   check_institution_website(page, homepage)  - check library websites
#   scrape_all_libraries()                     - main orchestration loop
#
# Output:
#   bosnia_isil_codes_found.json  - structured results
#   scraper_log.txt               - real-time progress log
```

---

## Decision Point

**You have three options**:

1. **Run the script now** → Complete automation (10-20 min)
2. **Test on 5 libraries** → Validate approach before full run
3. **Skip automation** → Proceed with email contact strategy

**Recommendation**: Run the script. Even if no ISIL codes are found, the run produces documented evidence for the email request to NUBBiH, demonstrating due diligence.

---

**Status**: ⏳ AWAITING USER DECISION TO EXECUTE

**Next Command** (if executing):

```bash
cd /Users/kempersc/apps/glam && python scripts/bosnia_isil_scraper.py
```
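
## Sketch: ISIL Pattern Matching

The **Pattern Matching** step under Strategy could look roughly like the following. This is a hypothetical reimplementation of `search_for_isil(text)`: the real script's regexes are not reproduced in this report, so the patterns below are assumptions. Only the 16-character cap and the allowed characters (Latin letters, digits, `-`, `/`, `:`) come from ISO 15511; the rest is heuristic.

```python
import re

# Hypothetical patterns; the actual script's regexes are not shown in this report.
# Codes with a BA- (ISO 3166-1 alpha-2) or BO- (legacy) prefix anywhere in the text.
_PREFIXED = re.compile(r"\b(?:BA|BO)-[A-Za-z0-9/:-]*[A-Za-z0-9]")
# Explicit "ISIL: XX-XXXXX" mentions with any 1-4 letter prefix.
_LABELLED = re.compile(r"ISIL\s*:?\s*([A-Z]{1,4}-[A-Za-z0-9/:-]*[A-Za-z0-9])")
# References to the ISIL standard itself.
_STANDARD = re.compile(r"ISO\s*15511")
# ISO 15511 limits an ISIL to at most 16 characters.
MAX_ISIL_LENGTH = 16


def search_for_isil(text: str) -> dict:
    """Return sorted candidate ISIL codes and whether ISO 15511 is mentioned."""
    # Union of both patterns; the set removes codes matched twice.
    candidates = set(_PREFIXED.findall(text)) | set(_LABELLED.findall(text))
    codes = sorted(c for c in candidates if len(c) <= MAX_ISIL_LENGTH)
    return {
        "isil_codes": codes,
        "mentions_iso_15511": bool(_STANDARD.search(text)),
    }
```

Matches are candidates, not verified ISILs: uppercase page text such as `COBISS.BA-...` would also satisfy the `BA-` pattern, so hits should be reviewed before they feed into LinkML instance files.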
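
## Sketch: Checking Pages with Playwright

`check_cobiss_library_page` and `check_institution_website` both come down to "visit a few URLs, read the text, search it". Below is a minimal sketch of that shared step, assuming Playwright's synchronous API (`page.goto(url, timeout=..., wait_until=...)` and `page.inner_text("body")`). The function name `check_pages`, the `PageLike` protocol and the `wait_until="domcontentloaded"` choice are illustrative assumptions. The COBISS URL patterns are not listed in this report, so the caller supplies the URLs. The 15-second default is the top of the 10-15 second range given under Fault Tolerance.

```python
import re
from typing import Iterable, Protocol

# Simplified BA-/BO- pattern (an assumption, not the script's actual regex).
ISIL_RE = re.compile(r"\b(?:BA|BO)-[A-Za-z0-9/:-]*[A-Za-z0-9]")


class PageLike(Protocol):
    """The two methods used here; Playwright's sync Page provides both."""

    def goto(self, url: str, *, timeout: float, wait_until: str) -> object: ...

    def inner_text(self, selector: str) -> str: ...


def check_pages(page: PageLike, urls: Iterable[str], timeout_ms: float = 15_000) -> dict:
    """Visit each URL and collect ISIL-like codes; failures become notes, not exceptions."""
    found: set[str] = set()
    notes: list[str] = []
    for url in urls:
        try:
            page.goto(url, timeout=timeout_ms, wait_until="domcontentloaded")
            codes = ISIL_RE.findall(page.inner_text("body"))
        except Exception as exc:  # timeouts, DNS errors, missing <body>, ...
            notes.append(f"Failed {url}: {type(exc).__name__}")
            continue  # keep going with the next URL, as the script is meant to
        if codes:
            found.update(codes)
            notes.append(f"Found on {url}: {sorted(set(codes))}")
    return {"isil_codes": sorted(found), "notes": notes}
```

With Playwright installed, a `Page` obtained from `browser.new_page()` inside a `sync_playwright()` block can be passed in directly; the protocol only exists so the function can be exercised without launching a browser.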
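
## Sketch: Main Loop with Rate Limiting, Checkpoints and Resume

The Fault Tolerance, Rate Limiting and Intermediate Saves points can be combined in a loop like the one below. This is a sketch under assumptions, not the script's actual `scrape_all_libraries()`: `scrape_all`, `configure_logging`, `_save` and the `check` callback (which would wrap the COBISS and website checks) are hypothetical names. Resuming from the saved JSON is one possible way to provide the "Resume from last save" behavior listed in the comparison table.

```python
import json
import logging
import time
from pathlib import Path
from typing import Callable

log = logging.getLogger("bosnia_isil")


def configure_logging(log_path: Path) -> None:
    """Write lines like '2025-11-18 15:30:00 - message' to the log file."""
    logging.basicConfig(
        filename=log_path,
        encoding="utf-8",  # library names contain non-ASCII letters
        level=logging.INFO,
        format="%(asctime)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )


def _save(results: list[dict], out_path: Path) -> None:
    """Write via a temp file so an interrupted write never truncates the checkpoint."""
    tmp = out_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(out_path)  # atomic rename on POSIX file systems


def scrape_all(
    libraries: list[dict],
    check: Callable[[dict], dict],
    out_path: Path,
    delay_s: float = 2.0,
    save_every: int = 10,
) -> list[dict]:
    """Check every library, saving a checkpoint every `save_every` records."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Resume: keep records saved by an earlier, interrupted run and skip them.
    results = json.loads(out_path.read_text(encoding="utf-8")) if out_path.exists() else []
    done = {r["number"] for r in results}
    total = len(libraries)
    for i, lib in enumerate(libraries, start=1):
        if lib["number"] in done:
            continue
        log.info("[%d/%d] Checking: %s (%s)", i, total, lib["name"], lib["acronym"])
        try:
            found = check(lib)
        except Exception:  # one failing library must not stop the run
            log.exception("Check failed for %s", lib["acronym"])
            found = {"isil_codes": [], "sources_checked": [], "notes": ["check failed"]}
        results.append({**lib, "isil_found": bool(found["isil_codes"]), **found})
        if len(results) % save_every == 0:
            _save(results, out_path)
        time.sleep(delay_s)  # rate limit between libraries
    _save(results, out_path)  # final save covers the last partial batch
    return results
```

A second run against the same output file skips every library already saved, so an interrupted background run (Option 2) can simply be restarted; at most the libraries checked since the last checkpoint are repeated.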
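
## Sketch: Validating Found Codes

For the "Validate code format (BA- vs. BO-)" step under Next Steps, a small post-processing script could complement the jq queries. `classify_code` and `summarize` are hypothetical helpers; the format rules encode only the ISO 15511 length and character constraints plus the BA-/BO- split described in this report.

```python
import string
from collections import Counter

# ISO 15511: at most 16 characters drawn from Latin letters, digits, '-', '/' and ':'.
ISIL_CHARS = set(string.ascii_letters + string.digits + "-/:")
MAX_ISIL_LENGTH = 16


def classify_code(code: str) -> str:
    """Sort a found code into 'BA', 'BO (legacy)', 'other prefix' or 'invalid'."""
    prefix, sep, local_id = code.partition("-")
    if not sep or not prefix or not local_id:
        return "invalid"  # no prefix/identifier split at all
    if len(code) > MAX_ISIL_LENGTH or not set(code) <= ISIL_CHARS:
        return "invalid"
    if prefix == "BA":
        return "BA"
    if prefix == "BO":
        return "BO (legacy)"
    return "other prefix"


def summarize(records: list[dict]) -> dict:
    """Python counterpart of the jq queries, plus a per-format count of unique codes."""
    codes = sorted({c for r in records for c in r.get("isil_codes", [])})
    return {
        "libraries": len(records),
        "with_isil": sum(1 for r in records if r.get("isil_found")),
        "unique_codes": codes,
        "by_format": dict(Counter(classify_code(c) for c in codes)),
        "without_isil": [r["name"] for r in records if not r.get("isil_found")],
    }
```

For example, `summarize(json.loads(Path("data/isil/bosnia/bosnia_isil_codes_found.json").read_text(encoding="utf-8")))` returns the same three answers as the jq commands (library count with codes, unique codes, libraries without codes) plus a BA-/BO- tally.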