# ISIL Enrichment Strategy for Latin American Institutions

**Date**: 2025-11-06
**Dataset**: 304 Latin American institutions (Brazil: 97, Chile: 90, Mexico: 117)
**Current ISIL Coverage**: 0% (0/304 institutions)

## Research Findings

### ISIL System Architecture

**Decentralized Model**: ISIL operates through national registration agencies, not a global database.

**Key Authorities**:
- **Denmark**: Slots- og Kulturstyrelsen (international registration authority for ISO 15511)
- **USA**: Library of Congress (US national agency, uses MARC org codes)
- **Europe**: data.europa.eu, data.overheid.nl (Dutch ISIL registry - we have this!)
- **National**: Each country's national library/archive maintains its own registry

**Important Discovery**: No single global ISIL database exists. Each country maintains its own.

### Country-Specific Findings

#### Brazil (BR-)
- **Expected Agency**: Biblioteca Nacional do Brasil or IBICT
- **Status**: No public ISIL registry found via web search
- **Challenge**: Search results dominated by ISIL (terrorism) references, not library codes
- **Recommendation**: Direct contact with Biblioteca Nacional do Brasil required

#### Mexico (MX-)
- **Expected Agency**: Biblioteca Nacional de México (under UNAM)
- **Alternate Contact**: Asociación Mexicana de Bibliotecarios (AMBAC)
- **Status**: No public ISIL registry found
- **Search Strategy**: Try .gob.mx or .unam.mx domain-specific searches for "código ISIL"
- **Recommendation**: Contact BNM/UNAM directly

#### Chile (CL-)
- **Expected Agency**: Biblioteca Nacional de Chile or DIBAM (now Servicio Nacional del Patrimonio Cultural)
- **Status**: No public ISIL registry found
- **Recommendation**: Contact Biblioteca Nacional de Chile

## Alternative Enrichment Strategy

### Phase 1: Leverage Existing Identifiers (IMMEDIATE - This Week)

Since ISIL codes are unavailable, we'll enrich using identifiers we CAN access:

#### 1.1 Wikidata Enrichment

**What We Have**: Some institutions already have Wikidata IDs in the dataset.

**What We Can Get**:
- ISIL codes (Wikidata Property P791) - if they exist
- Additional identifiers (GND, VIAF, Library of Congress)
- Geographic coordinates
- Parent organizations
- Website URLs

**Action**: Run Wikidata SPARQL queries

```python
# Script: scripts/enrich_from_wikidata.py

from SPARQLWrapper import SPARQLWrapper, JSON

def query_wikidata_isil(country_qid):
    """
    Query Wikidata for GLAM institutions in a country, including ISIL codes where present

    Args:
        country_qid: Wikidata QID for country (Q155=Brazil, Q96=Mexico, Q298=Chile)
    """
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

    query = f"""
    SELECT DISTINCT ?org ?orgLabel ?isil ?viaf ?website ?coords
    WHERE {{
      ?org wdt:P17 wd:{country_qid} .              # Country
      OPTIONAL {{ ?org wdt:P791 ?isil . }}         # ISIL code
      OPTIONAL {{ ?org wdt:P214 ?viaf . }}         # VIAF ID
      OPTIONAL {{ ?org wdt:P856 ?website . }}      # Official website
      OPTIONAL {{ ?org wdt:P625 ?coords . }}       # Coordinates

      # Filter for GLAM institutions
      VALUES ?type {{
        wd:Q7075      # Library
        wd:Q166118    # Archive
        wd:Q33506     # Museum
        wd:Q1007870   # Art gallery
      }}
      ?org wdt:P31/wdt:P279* ?type .

      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,pt,es". }}
    }}
    LIMIT 1000
    """

    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    return results
```

**Expected Output**:
- Brazil: ~20-30 institutions with Wikidata records
- Mexico: ~15-25 institutions
- Chile: ~10-20 institutions
- ISIL codes: <10 total (realistic estimate)

#### 1.2 VIAF Enrichment

**What VIAF Provides**:
- Authority records for organizations
- ISIL codes (when available)
- Cross-references to national authority files

**Action**: For each institution with a VIAF ID, fetch the full VIAF record

```python
# Script: scripts/enrich_from_viaf.py

import requests
import xml.etree.ElementTree as ET

def fetch_viaf_record(viaf_id):
    """
    Fetch VIAF record and extract ISIL code if present

    Args:
        viaf_id: VIAF identifier (e.g., "123556639")

    Returns:
        dict with extracted data including ISIL if available,
        or None if the request fails
    """
    url = f"https://viaf.org/viaf/{viaf_id}/viaf.xml"
    # Timeout so a stalled request cannot hang the enrichment run
    response = requests.get(url, timeout=30)

    if response.status_code == 200:
        root = ET.fromstring(response.content)

        # Look for ISIL in various VIAF fields
        # VIAF may include ISIL in organizational identifiers

        data = {
            'viaf_id': viaf_id,
            'isil': None,  # Extract if found
            'alt_names': [],
            'related_ids': {}
        }

        # Parse XML (root) for ISIL references
        # Format varies - may be in ... or ...

        return data

    return None
```

**Current Dataset Status**: Check how many institutions already have VIAF IDs

```bash
grep -c "VIAF" data/instances/latin_american_institutions.yaml
```

#### 1.3 OpenStreetMap (OSM) Enrichment

**What OSM Provides**:
- Building coordinates (better than city-level)
- Opening hours, contact info
- Sometimes website URLs

**Action**: For institutions with addresses, query Nominatim/Overpass API

```python
# Script: scripts/enrich_from_osm.py

import requests

def search_osm_by_name_and_location(institution_name, city, country):
    """
    Search OpenStreetMap for institution details

    Args:
        institution_name: used as a case-insensitive regex on the OSM name tag;
            names containing regex metacharacters or quotes need escaping first
        city: currently unused by the query below
        country: matched against the OSM "name" tag, which is usually in the
            local language (e.g. "Brasil"), so the local spelling may be needed

    Returns:
        - Precise coordinates
        - OSM tags (amenity=library, tourism=museum, etc.)
        - Website URL (if tagged)
    """
    overpass_url = "https://overpass-api.de/api/interpreter"

    # Overpass QL query
    query = f"""
    [out:json];
    area["name"="{country}"]->.country;
    (
      node(area.country)["name"~"{institution_name}",i];
      way(area.country)["name"~"{institution_name}",i];
      relation(area.country)["name"~"{institution_name}",i];
    );
    out body;
    """

    # Country-wide regex searches can be slow, so allow a generous timeout
    response = requests.post(overpass_url, data={'data': query}, timeout=120)

    if response.status_code == 200:
        return response.json()

    return None
```

### Phase 2: Document the Gap (IMMEDIATE)

**Action**: Add provenance notes to all 304 institutions documenting ISIL absence

```yaml
# Example update to each institution record
provenance:
  data_source: CONVERSATION_NLP
  data_tier: TIER_4_INFERRED
  extraction_date: "2025-11-05T14:30:00Z"
  extraction_method: "AI agent comprehensive extraction"
  confidence_score: 0.85
  conversation_id: "..."
  notes: >-
    ISIL code not available. Brazil lacks publicly accessible ISIL registry
    as of 2025-11-06. National registration agency (Biblioteca Nacional do Brasil
    or IBICT) has not published ISIL directory. Alternative identifiers used:
    Wikidata QID, website URL. See docs/isil_enrichment_strategy.md for details.
```

**Script**: `scripts/add_isil_gap_notes.py`

```python
import yaml

def add_isil_gap_documentation(input_file, output_file):
    """
    Add standardized notes about ISIL unavailability to all institutions
    """
    with open(input_file, 'r', encoding='utf-8') as f:
        institutions = yaml.safe_load(f)

    gap_notes = {
        'BR': "ISIL code not available. Brazil lacks publicly accessible ISIL registry as of 2025-11-06.",
        'MX': "ISIL code not available. Mexico lacks publicly accessible ISIL registry as of 2025-11-06.",
        'CL': "ISIL code not available. Chile lacks publicly accessible ISIL registry as of 2025-11-06."
    }

    updated = 0
    for inst in institutions:
        # Guard against a missing or empty locations list
        locations = inst.get('locations') or [{}]
        country = locations[0].get('country', 'Unknown')

        if country in gap_notes and inst.get('provenance'):
            existing_notes = inst['provenance'].get('notes') or ''

            # Append ISIL gap note if not already present
            if 'ISIL code not available' not in existing_notes:
                inst['provenance']['notes'] = (
                    existing_notes + '\n' + gap_notes[country]
                ).strip()
                updated += 1

    with open(output_file, 'w', encoding='utf-8') as f:
        yaml.dump(institutions, f, allow_unicode=True, sort_keys=False)

    # Report records actually changed, not just records read
    print(f"Updated {updated} of {len(institutions)} institutions with ISIL gap documentation")
```

### Phase 3: National Agency Outreach (2-4 Weeks)

**Action**: Contact national libraries directly

#### Outreach Email Template

```
Subject: ISIL Registry Access Request - Global GLAM Dataset Research Project

Dear [National Library/Archive],

I am writing from the Global GLAM Dataset project, an open research initiative
documenting heritage institutions worldwide using LinkML schema and Linked Open Data.

We have extracted and geocoded data for [NUMBER] [COUNTRY] institutions from
archival research, including:
- [Institution examples]

To enhance data quality, we seek access to official ISIL (ISO 15511) codes.

Questions:
1. Is [INSTITUTION NAME] the designated ISIL national registration agency for [COUNTRY]?
2. Does a public ISIL registry or directory exist for [COUNTRY] institutions?
3. If available, can we access it for academic research purposes?
4. What is the process for obtaining ISIL codes for institutions lacking them?

Our dataset is published under CC-BY 4.0 and will credit all data sources.

Project repository: https://github.com/[your-repo]
Schema documentation: [link to LinkML schema]

Thank you for your assistance in advancing global cultural heritage data.

Best regards,
[Your Name]
Global GLAM Dataset Project
```

**Targets**:

**Brazil**:
- Biblioteca Nacional do Brasil (https://www.bn.gov.br/)
  - Email: Contact form on website
- IBICT (https://www.ibict.br/)
  - Email: ibict@ibict.br

**Mexico**:
- Biblioteca Nacional de México (https://www.bnm.unam.mx/)
  - Email: Through UNAM contact
- AMBAC (Asociación Mexicana de Bibliotecarios)

**Chile**:
- Biblioteca Nacional de Chile (https://www.bibliotecanacional.gob.cl/)
  - Email: Contact form on website
- Servicio Nacional del Patrimonio Cultural (https://www.patrimoniocultural.gob.cl/)

**Timeline**: Send emails by 2025-11-13, follow up after 2 weeks

### Phase 4: Generate Unofficial ISIL-Like Codes (Optional)

**Only if**: National agencies confirm no registry exists OR no response after 4 weeks

**Purpose**: Internal linking and future-proofing

**Format**: `{COUNTRY}-Unofficial-{CityCode}{InstitutionType}{Sequence}`

**Example**:

```yaml
# Museu de Arte do Rio, Brazil
identifiers:
  - identifier_scheme: "ISIL-Unofficial"
    identifier_value: "BR-Unofficial-RioM001"
    identifier_url: null
    notes: >-
      Unofficial ISIL-like code generated for internal dataset use.
      Format: BR (country) + Rio (city code) + M (museum) + 001 (sequence).
      Not an official ISO 15511 ISIL code. Created 2025-11-06 due to absence
      of public Brazilian ISIL registry.
```

**Provenance Tracking**:

```yaml
provenance:
  data_tier: TIER_4_INFERRED
  notes: >-
    Unofficial ISIL-like identifier created for internal use only.
    Official ISIL code unavailable as Brazil lacks public registry.
    Code imitates the ISIL country-prefix layout but is NOT officially
    registered (and exceeds the 16-character ISIL length limit).
```

**Warning**: Clearly document that these are NOT official ISIL codes

### Phase 5: Publish Enriched Dataset (4-6 Weeks)

**Deliverables**:

1. Updated `latin_american_institutions.yaml` with:
   - Wikidata-sourced ISIL codes (if any found)
   - VIAF-sourced ISIL codes (if any found)
   - OSM-enriched coordinates and URLs
   - Documentation of ISIL gap in provenance notes
2. Report: `docs/latin_american_isil_findings.md`
   - Summary of enrichment results
   - National agency responses (if any)
   - Recommendations for future work
3. Statistics:
   - ISIL codes found: [number]/304
   - Wikidata matches: [number]/304
   - VIAF enrichments: [number]/304
   - OSM coordinate improvements: [number]/304

## Implementation Checklist

**Week 1 (Nov 6-12)**:
- [x] Research ISIL availability (COMPLETED)
- [ ] Run Wikidata SPARQL queries for BR, MX, CL
- [ ] Extract ISIL codes from Wikidata results
- [ ] Count existing VIAF IDs in dataset
- [ ] Fetch VIAF records for institutions with VIAF IDs

**Week 2 (Nov 13-19)**:
- [ ] Send outreach emails to national libraries
- [ ] Run OSM enrichment for institutions with addresses
- [ ] Add ISIL gap documentation to all 304 provenance records
- [ ] Update PROGRESS.md with findings

**Week 3-4 (Nov 20-Dec 3)**:
- [ ] Process responses from national libraries (if any)
- [ ] Compile enrichment results
- [ ] Generate updated exports (JSON-LD, CSV, GeoJSON)
- [ ] Write final report

**Optional (if no registries found)**:
- [ ] Decide: Generate unofficial ISIL-like codes? (Discuss with stakeholders)
- [ ] If yes: Implement unofficial code generator
- [ ] Document clearly in schema and exports

## Expected Outcomes

### Realistic Scenario
- **ISIL codes found**: 5-15 (via Wikidata/VIAF, ~2-5% coverage)
- **Wikidata matches**: 50-80 institutions (~16-26%)
- **VIAF enrichments**: 30-50 institutions (~10-16%)
- **OSM improvements**: 100-150 institutions (~33-49%)
- **National agency responses**: 0-2 (low response rate expected)

### Success Metrics
- ✅ Document ISIL gap comprehensively
- ✅ Enrich with alternative authoritative identifiers
- ✅ Contact national agencies (attempt made)
- ✅ Provide clear provenance for all data
- ❓ Obtain official ISIL codes (low probability but worth trying)

## Lessons for Future Datasets

**Key Insights**:
1. ISIL is decentralized - no global database exists
2. Public registries vary by country (Netherlands: excellent, Brazil/Mexico/Chile: none found)
3. Alternative identifiers (Wikidata, VIAF) are more globally accessible
4. Direct outreach to national agencies is required for ISIL access

**Recommendations**:
- Prioritize Wikidata/VIAF as primary identifiers for global datasets
- Use ISIL when available (e.g., Netherlands, USA, some EU countries)
- Document identifier gaps transparently
- Consider unofficial codes only as last resort with clear labeling

---

**Status**: Strategy defined, implementation starting
**Next Update**: After Week 1 enrichment scripts complete
**Owner**: Global GLAM Dataset Project Team
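
The Phase 4 format `{COUNTRY}-Unofficial-{CityCode}{InstitutionType}{Sequence}` could be generated along the following lines. This is only a sketch of the optional code generator listed in the checklist, not an agreed design: the class name, the `TYPE_CODES` mapping (only `M` for museum appears in the strategy's own example), and the three-digit sequence padding are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical type letters; only "M" (museum) comes from the strategy's example.
TYPE_CODES = {"museum": "M", "library": "L", "archive": "A", "gallery": "G"}


class UnofficialCodeGenerator:
    """Issue sequential ISIL-like codes per (country, city code, type)."""

    def __init__(self):
        # One running counter for each (country, city code, type letter)
        self._counters = defaultdict(int)

    def next_code(self, country, city_code, institution_type):
        type_code = TYPE_CODES[institution_type]
        key = (country, city_code, type_code)
        self._counters[key] += 1
        # Zero-pad the sequence to three digits, as in "RioM001"
        return f"{country}-Unofficial-{city_code}{type_code}{self._counters[key]:03d}"


gen = UnofficialCodeGenerator()
print(gen.next_code("BR", "Rio", "museum"))   # BR-Unofficial-RioM001
print(gen.next_code("BR", "Rio", "museum"))   # BR-Unofficial-RioM002
print(gen.next_code("BR", "Rio", "library"))  # BR-Unofficial-RioL001
```

Keeping the literal `Unofficial` segment in every code means these values cannot be mistaken for registered ISO 15511 identifiers when the dataset is exported.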