glam/SESSION_SUMMARY_ARGENTINA_Z3950_INVESTIGATION.md
2025-11-19 23:25:22 +01:00

Session Summary: Argentina Z39.50 Investigation & Path Forward

Date: 2025-11-18
Status: Investigation Complete - Recommendation Ready

What We Accomplished

1. AGN Scraper Executed Successfully

File: scripts/scrapers/scrape_agn_argentina.py
Output: data/isil/AR/agn_argentina_archives.json

Results:

  • 1 institution: Archivo General de la Nación (National Archive)
  • 2 collections: Main archive + document collections
  • Koha catalog URLs were not accessible (expected)

Data Extracted:

{
  "name": "Archivo General de la Nación",
  "name_en": "National Archive of Argentina",
  "type": "ARCHIVE",
  "country": "AR",
  "city": "Buenos Aires",
  "province": "Ciudad Autónoma de Buenos Aires",
  "url": "https://argentina.gob.ar/interior/archivo-general-de-la-nacion"
}
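A quick sanity check on records of this shape can catch scraper regressions early. The following is a minimal sketch, not part of the scraper itself: the function name `validate_institution` is hypothetical, and treating every key in the record above as required is an assumption.

```python
import json

# Keys taken from the record shown above; treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = {"name", "name_en", "type", "country", "city", "province", "url"}


def validate_institution(record: dict) -> list[str]:
    """Return a list of problems found in one institution record."""
    problems = []
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    country = record.get("country", "")
    if not (len(country) == 2 and country.isalpha() and country.isupper()):
        problems.append(f"country is not a two-letter uppercase code: {country!r}")
    if not record.get("url", "").startswith("https://"):
        problems.append("url does not start with https://")
    return problems


agn = json.loads("""
{
  "name": "Archivo General de la Nación",
  "name_en": "National Archive of Argentina",
  "type": "ARCHIVE",
  "country": "AR",
  "city": "Buenos Aires",
  "province": "Ciudad Autónoma de Buenos Aires",
  "url": "https://argentina.gob.ar/interior/archivo-general-de-la-nacion"
}
""")
print(validate_institution(agn))
```

An empty list means the record passed all three checks.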

2. Z39.50 Client Framework Created

File: scripts/query_biblioteca_nacional_z3950.py

Confirmed:

  • Z39.50 server is accessible: 200.123.191.9:9991
  • Database: BNA10 (Authority records)
  • Authentication: Username Z39.50 / Password Z39.50
  • Format: MARC21, UTF-8 encoding
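The parameters above can be kept in one place and the server checked at the TCP level without any Z39.50 library. This is a sketch under stated assumptions: the names `Z3950Target` and `is_reachable` are hypothetical and are not taken from `query_biblioteca_nacional_z3950.py`, and a successful TCP connection says nothing about the Z39.50 handshake or the credentials.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Z3950Target:
    """Connection parameters as listed above (class name is hypothetical)."""
    host: str = "200.123.191.9"
    port: int = 9991
    database: str = "BNA10"
    user: str = "Z39.50"
    password: str = "Z39.50"
    record_syntax: str = "MARC21"
    encoding: str = "utf-8"


def is_reachable(target: Z3950Target, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that the port accepts connections; it does not
    perform the Z39.50 Init exchange or verify the credentials.
    """
    try:
        with socket.create_connection((target.host, target.port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable(Z3950Target())` probes the Biblioteca Nacional server with the defaults listed above.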

3. Biblioteca Nacional Web Catalog Investigated

URL: https://catalogo.bn.gov.ar/

Findings:

  • Modern web interface with "Catálogo de Autoridades" (Authority Catalog)
  • Contains MARC21 authority records accessible via web browser
  • Z39.50 connection details prominently displayed
  • However, most authority records appear to describe foreign institutions (e.g., Spanish archives)
  • A search for "archivo" returned mainly foreign bodies: Archivo General de Indias (Spain), Korean Film Archive, etc.
  • Few Argentine institutions found in authority catalog browsing

Critical Discovery: Authority Catalog Limitation

Problem Identified

The Biblioteca Nacional's authority catalog (BNA10 database) appears to contain primarily:

  • Foreign institutions referenced in Argentine bibliographic records
  • International archives and libraries (Spanish, Korean, etc.)

It is not a comprehensive registry of Argentine institutions.

Why This Matters

The investigation document (data/isil/AR/ARGENTINA_ISIL_INVESTIGATION.md) suggested:

"MARC field 024 (Standard Identifier) in authority records contains ISIL codes"

However, the authority catalog is designed for bibliographic authority control (standardizing how foreign institutions are cited), not as a directory of Argentine heritage institutions.
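Had the 024 approach been pursued, the filtering step might have looked like the sketch below. It rests on several assumptions: the plain-dict record shape is hypothetical (a real client would first map its MARC parser's output to it), the function name `argentine_isils` is invented for illustration, the regular expression reflects my reading of the ISIL format (country prefix, hyphen, unit identifier of up to 11 characters) and should be checked against ISO 15511, and the sample codes are placeholders, not real ISIL assignments.

```python
import re

# Assumed ISIL shape: "AR-" plus a unit identifier of 1 to 11 characters.
# The exact character class is an assumption; verify it against ISO 15511.
AR_ISIL = re.compile(r"^AR-[A-Za-z0-9:/-]{1,11}$")


def argentine_isils(record: dict) -> list[str]:
    """Collect Argentine ISIL codes from the 024 $a subfields of one record.

    `record` is a hypothetical plain-dict view of a MARC authority record:
    {"024": [{"a": "..."}, ...]}.
    """
    codes = []
    for field in record.get("024", []):
        value = field.get("a", "").strip()
        if AR_ISIL.match(value):
            codes.append(value)
    return codes


# Placeholder values for illustration only, not real ISIL codes.
sample = {"024": [{"a": "AR-EXAMPLE01"}, {"a": "XX-EXAMPLE02"}]}
print(argentine_isils(sample))
```

Given the catalog's mostly foreign content, a filter like this would be expected to discard the bulk of the records, which is the core of the yield problem described here.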

Current Argentina Data Status

| Dataset | Count | Type | Coverage | Status |
|---|---|---|---|---|
| CONABIP Libraries | 288 | Popular libraries | Nationwide | Scraped + Wikidata enriched |
| AGN | 1 | National archive | Buenos Aires | Scraped (this session) |
| BN Authority Catalog | ~10-50? | Mixed (mostly foreign) | Limited Argentine | ⚠️ Not suitable for bulk extraction |

Recommendation: Pivot Strategy

Do NOT Invest in Z39.50 Implementation

Reasons:

  1. Authority catalog contains mainly foreign institutions, not Argentine ones
  2. Estimated yield of Argentine ISIL codes: < 50 institutions (not 200-500 as hoped)
  3. High implementation effort (2-3 hours for PyZ3950 client) for minimal return
  4. Web scraping the authority catalog record by record through the OPAC interface would be tedious

INSTEAD: Contact IRAM Directly

Best approach identified in investigation document:

Email: iram-iso@iram.org.ar
Subject: Solicitud de acceso al registro ISIL de Argentina

Rationale:

  • IRAM is the official ISIL agency for Argentina
  • They maintain the authoritative registry, estimated at 500-1,000 institutions
  • Direct data export is the most efficient path
  • Precedent: Other countries (Netherlands, Belgium) provide ISIL registries

Email Template (from investigation document):

Estimados,

Soy investigador trabajando en un proyecto de patrimonio cultural global 
y estoy recopilando datos sobre instituciones GLAM en Argentina.

¿Sería posible acceder al registro completo de códigos ISIL asignados 
en Argentina? Cualquier formato (CSV, Excel, PDF) sería útil.

Muchas gracias,
[Your name]

Alternative: University Library Networks

If IRAM doesn't respond, pursue:

  1. SISBI-UBA (University of Buenos Aires library system)

  2. JUBIUNA (National Universities Library Network)

    • Consortium of Argentine university libraries
    • Potential ISIL code aggregation

Technical Artifacts Created

Scripts Created This Session

  1. scripts/scrapers/scrape_agn_argentina.py (264 lines)
  2. scripts/query_biblioteca_nacional_z3950.py (285 lines - framework only)

Data Files Created

  1. data/isil/AR/agn_argentina_archives.json (1 institution + 2 collections)

Z39.50 Client Status

  • Framework: Complete (connection, data structures, MARC parsing)
  • Implementation: Incomplete (requires PyZ3950 library)
  • Recommendation: Do not complete - authority catalog doesn't contain target data

Next Steps (Priority Order)

Immediate (Today)

  1. Send email to IRAM requesting ISIL registry export
  2. Send email to Biblioteca Nacional (dpt@bn.gov.ar) asking for guidance on accessing Argentine ISIL codes

Short-term (This Week)

  1. Complete CONABIP pipeline - Export 288 libraries to LinkML YAML
  2. Add AGN to instances - Convert AGN JSON to LinkML YAML
  3. Document Argentina coverage in main PROGRESS.md
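For the AGN conversion step, the serialization itself is simple as long as the record stays a flat mapping of strings. The sketch below covers only that part and is not the project's exporter: the function name `to_flat_yaml` is hypothetical, and the mapping from these JSON keys to the actual LinkML class and slot names is not given in this summary, so it is not attempted here.

```python
import json


def to_flat_yaml(record: dict) -> str:
    """Serialize a flat dict of strings as a YAML mapping.

    Values are written as JSON strings, which are also valid YAML
    double-quoted scalars, so no YAML library is needed for this case.
    Keys are assumed to be plain identifiers that need no quoting.
    """
    lines = [
        f"{key}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in record.items()
    ]
    return "\n".join(lines) + "\n"


print(to_flat_yaml({"name": "Archivo General de la Nación", "country": "AR"}))
```

Nested structures such as the two AGN collections would need a real YAML library or a recursive emitter.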

If IRAM Responds Positively

  1. Parse ISIL registry CSV/Excel
  2. Cross-reference with CONABIP + AGN data
  3. Geocode addresses
  4. Export to LinkML YAML
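The first two steps above (parse the registry, cross-reference with existing data) could be sketched as follows. Everything about the registry format is an assumption, since no export exists yet: the CSV headers `isil` and `name` are hypothetical, the sample code is a placeholder rather than a real ISIL, and exact matching on normalized names is only a starting point that would miss abbreviations and variant spellings.

```python
import csv
import io
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def cross_reference(registry_csv: str, known: list[dict]) -> dict[str, str]:
    """Map known institution names to ISIL codes found in the registry.

    Assumes hypothetical CSV columns "isil" and "name"; the real export
    may use different headers.
    """
    by_name = {}
    for row in csv.DictReader(io.StringIO(registry_csv)):
        by_name[normalize(row["name"])] = row["isil"]
    matches = {}
    for record in known:
        code = by_name.get(normalize(record["name"]))
        if code is not None:
            matches[record["name"]] = code
    return matches


# Placeholder registry content for illustration only.
registry = "isil,name\nAR-EXAMPLE01,ARCHIVO GENERAL DE LA NACION\n"
known = [{"name": "Archivo General de la Nación"}, {"name": "Biblioteca Popular X"}]
print(cross_reference(registry, known))
```

Names that stay unmatched after this pass are the candidates for manual review or fuzzy matching.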

If IRAM Doesn't Respond (2-week timeout)

  1. Investigate SISBI-UBA library directory
  2. Investigate JUBIUNA network
  3. Consider manual extraction from ministerial websites

Files for Next Session

Ready to Process

  • data/isil/AR/conabip_libraries_wikidata_enriched.json (288 libraries)
  • data/isil/AR/agn_argentina_archives.json (1 archive)

Parser Available

  • src/glam_extractor/parsers/argentina_conabip.py

Investigation Complete

  • data/isil/AR/ARGENTINA_ISIL_INVESTIGATION.md (comprehensive research)

Lessons Learned

Authority Catalogs ≠ Institutional Directories

Key Insight: Library authority catalogs are designed for bibliographic control (standardizing citations), not as comprehensive directories of heritage institutions.

Implication: Z39.50 access to authority records is useful for international citation standardization, not for discovering domestic institutions with ISIL codes.

Best Data Sources for ISIL Codes (in priority order)

  1. Official ISIL agency registries (IRAM for Argentina)
  2. National library consortia (SISBI-UBA, JUBIUNA)
  3. Ministry of Culture directories
  4. Web scraping institutional websites
  5. Authority catalogs (foreign institutions only)

Summary

AGN Scraper: Success - 1 institution extracted
Z39.50 Client: ⚠️ Framework created, query implementation not completed
Authority Catalog: Not suitable for Argentine ISIL extraction
Recommended Action: ✉️ Contact IRAM directly for ISIL registry

Estimated Argentina Coverage:

  • Current: 289 institutions (288 CONABIP + 1 AGN)
  • Potential with IRAM registry: 500-1,000 institutions
  • Data quality: TIER_1_AUTHORITATIVE (if IRAM responds)

Next session: Send IRAM email, complete CONABIP LinkML export, add AGN to instances