ISIL Enrichment Strategy for Latin American Institutions
Date: 2025-11-06
Dataset: 304 Latin American institutions (Brazil: 97, Chile: 90, Mexico: 117)
Current ISIL Coverage: 0% (0/304 institutions)
Research Findings
ISIL System Architecture
Decentralized Model: ISIL operates through national registration agencies, not a global database.
Key Authorities:
- Denmark: Slots- og Kulturstyrelsen (international registration authority for ISO 15511)
- USA: Library of Congress (US national agency, uses MARC org codes)
- Europe: data.europa.eu, data.overheid.nl (Dutch ISIL registry, which we already hold)
- National: Each country's national library/archive maintains its own registry
Important Discovery: Because no single global ISIL database exists, codes have to be looked up country by country.
Country-Specific Findings
Brazil (BR-)
- Expected Agency: Biblioteca Nacional do Brasil or IBICT
- Status: No public ISIL registry found via web search
- Challenge: Search results dominated by ISIL (terrorism) references, not library codes
- Recommendation: Direct contact with Biblioteca Nacional do Brasil required
Mexico (MX-)
- Expected Agency: Biblioteca Nacional de México (under UNAM)
- Alternate Contact: Asociación Mexicana de Bibliotecarios (AMBAC)
- Status: No public ISIL registry found
- Search Strategy: Try .gob.mx or .unam.mx domain-specific searches for "código ISIL"
- Recommendation: Contact BNM/UNAM directly
Chile (CL-)
- Expected Agency: Biblioteca Nacional de Chile or DIBAM (now Servicio Nacional del Patrimonio Cultural)
- Status: No public ISIL registry found
- Recommendation: Contact Biblioteca Nacional de Chile
Alternative Enrichment Strategy
Phase 1: Leverage Existing Identifiers (IMMEDIATE - This Week)
Since ISIL codes are unavailable, we'll enrich using identifiers we CAN access:
1.1 Wikidata Enrichment
What We Have: Some institutions already have Wikidata IDs in the dataset.
What We Can Get:
- ISIL codes (Wikidata Property P791) - if they exist
- Additional identifiers (GND, VIAF, Library of Congress)
- Geographic coordinates
- Parent organizations
- Website URLs
Action: Run Wikidata SPARQL queries
# Script: scripts/enrich_from_wikidata.py
from SPARQLWrapper import SPARQLWrapper, JSON

def query_wikidata_isil(country_qid):
    """
    Query Wikidata for GLAM institutions in a country, with ISIL codes where present

    Args:
        country_qid: Wikidata QID for country (Q155=Brazil, Q96=Mexico, Q298=Chile)
    Returns:
        dict: SPARQL JSON results (rows are under ["results"]["bindings"])
    """
    # Wikimedia asks clients for a descriptive User-Agent; this value is a placeholder.
    sparql = SPARQLWrapper(
        "https://query.wikidata.org/sparql",
        agent="GlobalGLAMDataset-enrichment/0.1",
    )
    query = f"""
    SELECT DISTINCT ?org ?orgLabel ?isil ?viaf ?website ?coords WHERE {{
      ?org wdt:P17 wd:{country_qid} .              # Country
      OPTIONAL {{ ?org wdt:P791 ?isil . }}         # ISIL code
      OPTIONAL {{ ?org wdt:P214 ?viaf . }}         # VIAF ID
      OPTIONAL {{ ?org wdt:P856 ?website . }}      # Official website
      OPTIONAL {{ ?org wdt:P625 ?coords . }}       # Coordinates
      # Filter for GLAM institutions
      VALUES ?type {{
        wd:Q7075      # Library
        wd:Q166118    # Archive
        wd:Q33506     # Museum
        wd:Q1007870   # Art gallery
      }}
      ?org wdt:P31/wdt:P279* ?type .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,pt,es". }}
    }}
    LIMIT 1000
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return results
Expected Output:
- Brazil: ~20-30 institutions with Wikidata records
- Mexico: ~15-25 institutions
- Chile: ~10-20 institutions
- ISIL codes: <10 total (realistic estimate)
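The checklist step "Extract ISIL codes from Wikidata results" could look roughly like the sketch below. It assumes the standard SPARQL JSON result layout and groups rows by institution, since several OPTIONAL values on one item produce several rows. The function name, the output structure, and the choice to set aside codes without the expected BR-/MX-/CL- prefix are illustrative assumptions, not part of the existing scripts.

```python
from collections import defaultdict

# Country prefixes as used in this document's headings (BR-, MX-, CL-).
ISIL_PREFIXES = {"Q155": "BR-", "Q96": "MX-", "Q298": "CL-"}


def summarize_wikidata_results(results, country_qid):
    """Group SPARQL JSON bindings by institution and collect identifiers."""
    prefix = ISIL_PREFIXES[country_qid]
    orgs = defaultdict(
        lambda: {"label": None, "isil": set(), "isil_unexpected": set(), "viaf": set()}
    )
    for row in results["results"]["bindings"]:
        # ?org is a full entity URI; the QID is its last path segment.
        qid = row["org"]["value"].rsplit("/", 1)[-1]
        entry = orgs[qid]
        entry["label"] = row.get("orgLabel", {}).get("value", entry["label"])
        # OPTIONAL variables are absent from a binding when nothing matched.
        isil = row.get("isil", {}).get("value")
        if isil:
            # Codes without the expected prefix are kept apart for manual review.
            key = "isil" if isil.startswith(prefix) else "isil_unexpected"
            entry[key].add(isil)
        viaf = row.get("viaf", {}).get("value")
        if viaf:
            entry["viaf"].add(viaf)
    return dict(orgs)
```

Matching the returned QIDs against the Wikidata IDs already present in the dataset would then give the per-country match counts.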
1.2 VIAF Enrichment
What VIAF Provides:
- Authority records for organizations
- ISIL codes (when available)
- Cross-references to national authority files
Action: For each institution with a VIAF ID, fetch the full VIAF record
# Script: scripts/enrich_from_viaf.py
import requests
import xml.etree.ElementTree as ET

def fetch_viaf_record(viaf_id):
    """
    Fetch VIAF record and extract ISIL code if present

    Args:
        viaf_id: VIAF identifier (e.g., "123556639")
    Returns:
        dict with extracted data including ISIL if available,
        or None if the request fails
    """
    url = f"https://viaf.org/viaf/{viaf_id}/viaf.xml"
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        root = ET.fromstring(response.content)
        # Look for ISIL in various VIAF fields
        # VIAF may include ISIL in organizational identifiers
        data = {
            'viaf_id': viaf_id,
            'isil': None,  # Extract if found
            'alt_names': [],
            'related_ids': {}
        }
        # Parse XML for ISIL references (not implemented yet; `root` is unused)
        # Format varies - may be in <ns1:sources> or <ns1:mainHeadings>
        return data
    return None
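Because the location of ISIL values inside a VIAF record varies, one possible stopgap is a heuristic scan of all text and attribute values for strings shaped like an ISIL with a BR, MX, or CL prefix. The sketch below is only that: the helper name is hypothetical, and the pattern (prefix, hyphen, then 1 to 11 letters, digits, or the marks `:`, `/`, `-`) reflects our reading of ISO 15511 rather than a verified rule. Any hit is a candidate for manual checking, not a confirmed code.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic: country prefix, hyphen, then 1-11 allowed characters,
# not embedded in a longer token (assumed ISIL shape, see lead-in).
ISIL_CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9:/-])(?:BR|MX|CL)-[A-Za-z0-9:/-]{1,11}(?![A-Za-z0-9:/-])"
)


def find_isil_candidates(xml_bytes):
    """Return sorted ISIL-like strings found anywhere in an XML document."""
    root = ET.fromstring(xml_bytes)
    found = set()
    for elem in root.iter():
        # Check element text, trailing text, and every attribute value.
        texts = [elem.text or "", elem.tail or ""] + list(elem.attrib.values())
        for text in texts:
            found.update(ISIL_CANDIDATE.findall(text))
    return sorted(found)
```

The result could populate the `isil` field of the dict returned by `fetch_viaf_record` once a candidate has been reviewed.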
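Current Dataset Status: Check how many institutions already have VIAF IDs. Note that grep -c counts matching lines, not institutions, so treat the result as an approximation.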
grep -c "VIAF" data/instances/latin_american_institutions.yaml
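For a per-institution count, the loaded records can be inspected directly. This is a minimal sketch assuming each record has an `identifiers` list whose entries use the `identifier_scheme` field shown elsewhere in this document; the scheme label "VIAF" and the function name are assumptions.

```python
def count_with_scheme(institutions, scheme="VIAF"):
    """Count institutions that carry at least one identifier of `scheme`."""
    count = 0
    for inst in institutions:
        # Treat a missing or null identifiers field as an empty list.
        identifiers = inst.get("identifiers") or []
        if any(i.get("identifier_scheme") == scheme for i in identifiers):
            count += 1
    return count
```

With PyYAML available, the input list could come from `yaml.safe_load` on data/instances/latin_american_institutions.yaml.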
1.3 OpenStreetMap (OSM) Enrichment
What OSM Provides:
- Building coordinates (better than city-level)
- Opening hours, contact info
- Sometimes website URLs
Action: For institutions with addresses, query Nominatim/Overpass API
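# Script: scripts/enrich_from_osm.py
import requests

def search_osm_by_name_and_location(institution_name, city, country):
    """
    Search OpenStreetMap for institution details

    Returns:
        Parsed Overpass JSON (or None if the request fails) whose elements may include:
        - Precise coordinates
        - OSM tags (amenity=library, tourism=museum, etc.)
        - Website URL (if tagged)
    """
    overpass_url = "https://overpass-api.de/api/interpreter"
    # Overpass QL query. `city` is not used yet: the search is scoped to the
    # country area only, and `country` must match the OSM "name" tag.
    # `out center;` gives ways and relations a coordinate as well as nodes.
    # Names containing quotes or regex characters need escaping before use.
    query = f"""
    [out:json][timeout:60];
    area["name"="{country}"]->.country;
    (
      node(area.country)["name"~"{institution_name}",i];
      way(area.country)["name"~"{institution_name}",i];
      relation(area.country)["name"~"{institution_name}",i];
    );
    out center;
    """
    response = requests.post(overpass_url, data={'data': query}, timeout=90)
    if response.status_code == 200:
        return response.json()
    return None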
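The Overpass response still has to be reduced to the fields we want to store. The sketch below assumes the usual Overpass JSON layout: nodes carry `lat`/`lon` directly, while ways and relations only carry a `center` object when the query requests one. The function name and the output keys are illustrative.

```python
def extract_osm_candidates(overpass_json):
    """Flatten an Overpass JSON response into simple candidate records."""
    candidates = []
    for element in overpass_json.get("elements", []):
        tags = element.get("tags", {})
        # Nodes have lat/lon at the top level; ways/relations may have "center".
        point = element if "lat" in element else element.get("center", {})
        candidates.append({
            "osm_type": element.get("type"),
            "osm_id": element.get("id"),
            "name": tags.get("name"),
            "lat": point.get("lat"),
            "lon": point.get("lon"),
            "website": tags.get("website") or tags.get("contact:website"),
            "kind": tags.get("amenity") or tags.get("tourism"),
        })
    return candidates
```

Since a name search can return several elements, the candidates would still need to be matched against the institution's city before any coordinates are written back.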
Phase 2: Document the Gap (IMMEDIATE)
Action: Add provenance notes to all 304 institutions documenting ISIL absence
# Example update to each institution record
provenance:
  data_source: CONVERSATION_NLP
  data_tier: TIER_4_INFERRED
  extraction_date: "2025-11-05T14:30:00Z"
  extraction_method: "AI agent comprehensive extraction"
  confidence_score: 0.85
  conversation_id: "..."
  notes: >-
    ISIL code not available. Brazil lacks publicly accessible ISIL registry
    as of 2025-11-06. National registration agency (Biblioteca Nacional do
    Brasil or IBICT) has not published ISIL directory. Alternative identifiers
    used: Wikidata QID, website URL. See docs/isil_enrichment_strategy.md
    for details.
Script: scripts/add_isil_gap_notes.py
import yaml

def add_isil_gap_documentation(input_file, output_file):
    """
    Add standardized notes about ISIL unavailability to all institutions
    """
    with open(input_file, 'r', encoding='utf-8') as f:
        institutions = yaml.safe_load(f)

    gap_notes = {
        'BR': "ISIL code not available. Brazil lacks publicly accessible ISIL registry as of 2025-11-06.",
        'MX': "ISIL code not available. Mexico lacks publicly accessible ISIL registry as of 2025-11-06.",
        'CL': "ISIL code not available. Chile lacks publicly accessible ISIL registry as of 2025-11-06."
    }

    updated = 0
    for inst in institutions:
        # `or [{}]` also covers records whose locations list is empty or null
        locations = inst.get('locations') or [{}]
        country = locations[0].get('country', 'Unknown')
        if country in gap_notes and inst.get('provenance'):
            existing_notes = inst['provenance'].get('notes') or ''
            # Append ISIL gap note if not already present
            if 'ISIL code not available' not in existing_notes:
                inst['provenance']['notes'] = (
                    existing_notes + '\n' + gap_notes[country]
                ).strip()
                updated += 1

    with open(output_file, 'w', encoding='utf-8') as f:
        yaml.dump(institutions, f, allow_unicode=True, sort_keys=False)
    print(f"Updated {updated} of {len(institutions)} institutions with ISIL gap documentation")
Phase 3: National Agency Outreach (2-4 Weeks)
Action: Contact national libraries directly
Outreach Email Template
Subject: ISIL Registry Access Request - Global GLAM Dataset Research Project
Dear [National Library/Archive],
I am writing from the Global GLAM Dataset project, an open research initiative
documenting heritage institutions worldwide using LinkML schema and Linked Open Data.
We have extracted and geocoded data for [NUMBER] [COUNTRY] institutions from
archival research, including:
- [Institution examples]
To enhance data quality, we seek access to official ISIL (ISO 15511) codes.
Questions:
1. Is [INSTITUTION NAME] the designated ISIL national registration agency for [COUNTRY]?
2. Does a public ISIL registry or directory exist for [COUNTRY] institutions?
3. If available, can we access it for academic research purposes?
4. What is the process for obtaining ISIL codes for institutions lacking them?
Our dataset is published under CC-BY 4.0 and will credit all data sources.
Project repository: https://github.com/[your-repo]
Schema documentation: [link to LinkML schema]
Thank you for your assistance in advancing global cultural heritage data.
Best regards,
[Your Name]
Global GLAM Dataset Project
Targets:
Brazil:
- Biblioteca Nacional do Brasil (https://www.bn.gov.br/)
- Email: Contact form on website
- IBICT (https://www.ibict.br/)
- Email: ibict@ibict.br
Mexico:
- Biblioteca Nacional de México (https://www.bnm.unam.mx/)
- Email: Through UNAM contact
- AMBAC (Asociación Mexicana de Bibliotecarios)
Chile:
- Biblioteca Nacional de Chile (https://www.bibliotecanacional.gob.cl/)
- Email: Contact form on website
- Servicio Nacional del Patrimonio Cultural (https://www.patrimoniocultural.gob.cl/)
Timeline: Send emails by 2025-11-13, follow up after 2 weeks
Phase 4: Generate Unofficial ISIL-Like Codes (Optional)
Only if: National agencies confirm no registry exists OR no response after 4 weeks
Purpose: Internal linking and future-proofing
Format: {COUNTRY}-Unofficial-{CityCode}{InstitutionType}{Sequence}
Example:
# Museu de Arte do Rio, Brazil
identifiers:
  - identifier_scheme: "ISIL-Unofficial"
    identifier_value: "BR-Unofficial-RioM001"
    identifier_url: null
    notes: >-
      Unofficial ISIL-like code generated for internal dataset use.
      Format: BR (country) + Rio (city code) + M (museum) + 001 (sequence).
      Not an official ISO 15511 ISIL code. Created 2025-11-06 due to
      absence of public Brazilian ISIL registry.
Provenance Tracking:
provenance:
  data_tier: TIER_4_INFERRED
  notes: >-
    Unofficial ISIL-like identifier created for internal use only.
    Official ISIL code unavailable as Brazil lacks public registry.
    Code borrows the ISIL country-prefix convention but is NOT a valid
    or officially registered ISO 15511 identifier.
Warning: Clearly document these are NOT official ISIL codes
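If the stakeholders approve this phase, the generator could be as small as the sketch below. It follows the format string and the BR-Unofficial-RioM001 example given in this phase; the function names, the three-digit zero padding (inferred from "001"), and the 1-999 sequence limit are assumptions.

```python
def make_unofficial_code(country, city_code, type_code, sequence):
    """Build {COUNTRY}-Unofficial-{CityCode}{InstitutionType}{Sequence}."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits (1-999)")
    return f"{country.upper()}-Unofficial-{city_code}{type_code.upper()}{sequence:03d}"


def unofficial_identifier_record(country, city_code, type_code, sequence):
    """Wrap the code in the identifier fields used by the dataset example."""
    return {
        "identifier_scheme": "ISIL-Unofficial",
        "identifier_value": make_unofficial_code(
            country, city_code, type_code, sequence
        ),
        "identifier_url": None,  # unofficial codes resolve nowhere
    }
```

For example, `make_unofficial_code("BR", "Rio", "M", 1)` yields the value used for Museu de Arte do Rio. Keeping the literal "Unofficial" segment in every code makes it hard to mistake one for a registered ISIL in downstream exports.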
Phase 5: Publish Enriched Dataset (4-6 Weeks)
Deliverables:
- Updated latin_american_institutions.yaml with:
  - Wikidata-sourced ISIL codes (if any found)
  - VIAF-sourced ISIL codes (if any found)
  - OSM-enriched coordinates and URLs
  - Documentation of ISIL gap in provenance notes
- Report: docs/latin_american_isil_findings.md
  - Summary of enrichment results
  - National agency responses (if any)
  - Recommendations for future work
- Statistics:
  - ISIL codes found: [number]/304
  - Wikidata matches: [number]/304
  - VIAF enrichments: [number]/304
  - OSM coordinate improvements: [number]/304
Implementation Checklist
Week 1 (Nov 6-12):
- Research ISIL availability (COMPLETED)
- Run Wikidata SPARQL queries for BR, MX, CL
- Extract ISIL codes from Wikidata results
- Count existing VIAF IDs in dataset
- Fetch VIAF records for institutions with VIAF IDs
Week 2 (Nov 13-19):
- Send outreach emails to national libraries
- Run OSM enrichment for institutions with addresses
- Add ISIL gap documentation to all 304 provenance records
- Update PROGRESS.md with findings
Week 3-4 (Nov 20-Dec 3):
- Process responses from national libraries (if any)
- Compile enrichment results
- Generate updated exports (JSON-LD, CSV, GeoJSON)
- Write final report
Optional (if no registries found):
- Decide: Generate unofficial ISIL-like codes? (Discuss with stakeholders)
- If yes: Implement unofficial code generator
- Document clearly in schema and exports
Expected Outcomes
Realistic Scenario
- ISIL codes found: 5-15 (via Wikidata/VIAF, ~2-5% coverage)
- Wikidata matches: 50-80 institutions (~16-26%)
- VIAF enrichments: 30-50 institutions (~10-16%)
- OSM improvements: 100-150 institutions (~33-49%)
- National agency responses: 0-2 (low response rate expected)
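The percentages above are the count ranges divided by the 304 institutions in the dataset, rounded to whole percent. A small sketch of that arithmetic, with a hypothetical helper name, makes it easy to recompute them if the dataset size changes:

```python
def pct_range(low, high, total=304):
    """Express a count range as a rounded percentage range of `total`."""
    lo_pct = round(100 * low / total)
    hi_pct = round(100 * high / total)
    return f"~{lo_pct}-{hi_pct}%"


# Count ranges taken from the realistic scenario in this section.
scenario = {
    "ISIL codes found": (5, 15),
    "Wikidata matches": (50, 80),
    "VIAF enrichments": (30, 50),
    "OSM improvements": (100, 150),
}

for label, (low, high) in scenario.items():
    print(f"{label}: {low}-{high} ({pct_range(low, high)})")
```

The same helper can be applied to the actual counts once the enrichment scripts have run.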
Success Metrics
- ✅ Document ISIL gap comprehensively
- ✅ Enrich with alternative authoritative identifiers
- ✅ Contact national agencies (attempt made)
- ✅ Provide clear provenance for all data
- ❓ Obtain official ISIL codes (low probability but worth trying)
Lessons for Future Datasets
Key Insights:
- ISIL is decentralized - no global database exists
- Public registries vary by country (Netherlands: excellent, Brazil/Mexico/Chile: none found)
- Alternative identifiers (Wikidata, VIAF) are more globally accessible
- Direct outreach to national agencies is required for ISIL access
Recommendations:
- Prioritize Wikidata/VIAF as primary identifiers for global datasets
- Use ISIL when available (e.g., Netherlands, USA, some EU countries)
- Document identifier gaps transparently
- Consider unofficial codes only as last resort with clear labeling
Status: Strategy defined, implementation starting
Next Update: After Week 1 enrichment scripts complete
Owner: Global GLAM Dataset Project Team