# Argentina CONABIP Popular Libraries Dataset
## Overview
- **Source**: CONABIP (Comisión Nacional de Bibliotecas Populares)
- **URL**: https://www.conabip.gob.ar/buscador_bp
- **Date Scraped**: November 17, 2025
- **Total Institutions**: 288
- **Coverage**: 22 provinces, 220 cities
## Files
### Basic Dataset (Complete)
- **conabip_libraries.csv** (47 KB) - Basic institution data
- **conabip_libraries.json** (115 KB) - JSON with metadata

**Fields**:
- Registration number (REG)
- Institution name
- Province
- City/Locality
- Neighborhood
- Street address
- Profile URL
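
A minimal sketch of reading the basic CSV and tallying institutions per province. The column name `province` and the UTF-8 encoding are assumptions, not confirmed by this README, so check the actual header row first. The sample rows are invented placeholders standing in for `conabip_libraries.csv`.

```python
import csv
import io
from collections import Counter

# Hypothetical header name; adjust to match the real CSV header.
PROVINCE_COLUMN = "province"


def count_by_province(csv_file):
    """Count rows per province from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[PROVINCE_COLUMN].strip() for row in reader)


# Invented in-memory sample, not real CONABIP records.
sample = io.StringIO(
    "name,province,city\n"
    "Biblioteca A,Santa Fe,Rosario\n"
    "Biblioteca B,Santa Fe,Rafaela\n"
    "Biblioteca C,Jujuy,Tilcara\n"
)
counts = count_by_province(sample)
```

For the real file, pass `open("conabip_libraries.csv", newline="", encoding="utf-8")` instead of the in-memory sample.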
### Enhanced Dataset (Sample Only)
- **conabip_libraries_with_profiles_test.csv** (12 KB) - 32 institutions
- **conabip_libraries_with_profiles_test.json** (24 KB) - 32 institutions

**Additional Fields**:
- Latitude/longitude (from Google Maps)
- Google Maps URL
- Services offered (WiFi, computers, workshops, etc.)

**Note**: The full enhanced dataset (all 288 institutions) is not yet complete because profile scraping ran into timeout constraints. See the session summary (`SESSION_SUMMARY_ARGENTINA_CONABIP.md`) for details.
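
One common way to make a long profile scrape more tolerant of timeouts is to retry each request with exponential backoff. The helper below is a sketch of that idea, not the logic of the actual scraper: `fetch_with_retry` and its parameters are hypothetical, and which exception signals a timeout depends on the HTTP client in use (here the built-in `TimeoutError` is assumed).

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on TimeoutError with exponential backoff.

    `fetch` is any callable that takes a URL and returns the response body.
    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except TimeoutError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Injecting `sleep` keeps the helper easy to test without real waiting.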
## Geographic Distribution
### Top 10 Provinces
1. Buenos Aires: 82 institutions (28.5%)
2. Santa Fe: 61 institutions (21.2%)
3. Entre Ríos: 27 institutions (9.4%)
4. Córdoba: 18 institutions (6.3%)
5. Corrientes: 13 institutions (4.5%)
6. La Pampa: 12 institutions (4.2%)
7. Ciudad Autónoma de Buenos Aires: 10 institutions (3.5%)
8. Jujuy: 8 institutions (2.8%)
9. Santiago del Estero: 7 institutions (2.4%)
10. San Juan: 6 institutions (2.1%)
### Geographic Spread
- **220 unique cities** represented
- **Average**: 13.1 institutions per province
- **Concentration**: ~50% of institutions in Buenos Aires and Santa Fe provinces
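
The summary figures can be rechecked from the counts listed in this README. The short calculation below uses only those numbers; the variable names are illustrative.

```python
TOTAL_INSTITUTIONS = 288
PROVINCES_COVERED = 22
BUENOS_AIRES = 82
SANTA_FE = 61


def share(count, total=TOTAL_INSTITUTIONS):
    """Percentage of all institutions, rounded to one decimal place."""
    return round(100 * count / total, 1)


average_per_province = round(TOTAL_INSTITUTIONS / PROVINCES_COVERED, 1)
top_two_share = share(BUENOS_AIRES + SANTA_FE)

print(average_per_province)  # 13.1
print(top_two_share)         # 49.7
```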
## Most Common Institution Names
Popular library names honor Argentine historical figures:
1. **Domingo Faustino Sarmiento**: 41 institutions
   - 7th President of Argentina (1868-1874)
   - Champion of public education and libraries
2. **Bernardino Rivadavia**: 21 institutions
   - 1st President of Argentina (1826-1827)
   - Founding father, education reformer
3. **Juan Bautista Alberdi**: 14 institutions
   - Political theorist whose *Bases* served as the foundation for the 1853 Argentine Constitution
4. **Mariano Moreno**: 11 institutions
   - Revolutionary leader, journalist
5. **Florentino Ameghino**: 7 institutions
   - Naturalist, paleontologist, anthropologist
6. **Bartolomé Mitre**: 7 institutions
   - President (1862-1868), historian, writer
## Data Quality
### Strengths
- ✅ Official government source (authoritative)
- ✅ Clean, structured data
- ✅ Consistent formatting
- ✅ Zero parsing errors
- ✅ Geographic coordinates available (via profile pages)
### Limitations
- ⚠️ Only 288 institutions (may not be comprehensive)
- ⚠️ Profile data requires additional scraping (slow)
- ⚠️ Registration numbers not consistently extracted
- ⚠️ Some fields may be incomplete or empty
## Data Tier Classification
Per GLAM project schema:
- **Data Source**: `WEB_SCRAPING`
- **Data Tier**: `TIER_2_VERIFIED` (official government website)
- **Institution Type**: `LIBRARY` (popular libraries)
- **Country Code**: `AR` (Argentina, ISO 3166-1 alpha-2)
## Usage Notes
### For GLAM Project Integration
1. Parse CSV/JSON into LinkML `HeritageCustodian` instances
2. Set `institution_type: LIBRARY`
3. Map provinces to ISO 3166-2 region codes (AR-B, AR-X, etc.)
4. Geocode addresses using Nominatim API (if coordinates not available)
5. Generate GHCIDs: `AR-{ProvinceCode}-{CityCode}-L-{Abbrev}`
6. Enrich with Wikidata Q-numbers where available
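
Steps 3 and 5 above can be sketched as follows. The province letters are the ISO 3166-2:AR codes for the ten provinces listed in this README. The rules for `{CityCode}` and `{Abbrev}` are not specified here, so `city_code` (first three letters) and `name_abbrev` (word initials) are hypothetical placeholders to be replaced by the project's actual GHCID rules. The example input is illustrative, not a record from the dataset.

```python
import unicodedata

# ISO 3166-2:AR subdivision letters for the provinces named in this README.
PROVINCE_CODES = {
    "Buenos Aires": "B",
    "Santa Fe": "S",
    "Entre Ríos": "E",
    "Córdoba": "X",
    "Corrientes": "W",
    "La Pampa": "L",
    "Ciudad Autónoma de Buenos Aires": "C",
    "Jujuy": "Y",
    "Santiago del Estero": "G",
    "San Juan": "J",
}


def ascii_upper(text):
    """Strip diacritics and uppercase, e.g. 'Ríos' -> 'RIOS'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def city_code(city):
    """Hypothetical rule: first three letters of the folded city name."""
    letters = [c for c in ascii_upper(city) if c.isalpha()]
    return "".join(letters[:3])


def name_abbrev(name):
    """Hypothetical rule: initial letter of each word in the name."""
    words = ascii_upper(name).split()
    return "".join(w[0] for w in words if w[0].isalpha())


def make_ghcid(name, province, city):
    """Assemble AR-{ProvinceCode}-{CityCode}-L-{Abbrev}."""
    return f"AR-{PROVINCE_CODES[province]}-{city_code(city)}-L-{name_abbrev(name)}"


ghcid = make_ghcid(
    "Biblioteca Popular Domingo Faustino Sarmiento", "Córdoba", "Río Cuarto"
)
print(ghcid)  # AR-X-RIO-L-BPDFS
```

A province missing from `PROVINCE_CODES` raises `KeyError`, which surfaces unmapped names early instead of producing malformed identifiers.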
### Known Issues
- Full enhanced dataset incomplete (requires long-running background scrape)
- Server occasionally slow/unreliable (4 timeouts in 288 requests)
- Some duplicate names across different cities (need location-based deduplication)
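
Location-based deduplication can be sketched by keying each record on name plus province plus city, so that identically named libraries in different cities are kept apart. The field names `name`, `province`, and `city` and the sample records are assumptions for illustration only.

```python
def location_key(record):
    """Build a case-insensitive (name, province, city) key for one record.

    Whitespace runs are collapsed so trivial formatting differences
    do not produce distinct keys.
    """
    fields = ("name", "province", "city")
    return tuple(" ".join(record[f].split()).casefold() for f in fields)


def deduplicate(records):
    """Keep the first record seen for each location key."""
    seen = set()
    unique = []
    for record in records:
        key = location_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented sample records, not taken from the dataset.
records = [
    {"name": "Biblioteca Popular Mariano Moreno", "province": "Santa Fe", "city": "Rosario"},
    {"name": "Biblioteca Popular Mariano Moreno", "province": "Jujuy", "city": "Tilcara"},
    {"name": "biblioteca popular  Mariano Moreno", "province": "Santa Fe", "city": "Rosario"},
]
unique = deduplicate(records)
```

Here the first and third records collapse into one, while the Jujuy library with the same name is preserved.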
## Next Steps
1. Complete profile scraping for all 288 institutions (run in background)
2. Parse into LinkML heritage_custodian instances
3. Enrich with Wikidata/VIAF identifiers
4. Integrate with global GLAM dataset
5. Export to RDF/JSON-LD for semantic web
## Related Documents
- `SESSION_SUMMARY_ARGENTINA_CONABIP.md` - Detailed session notes
- `ARGENTINA_ISIL_INVESTIGATION.md` - Argentina ISIL registry investigation
- `/scripts/scrapers/scrape_conabip_argentina.py` - Web scraper source code
- `/tests/scrapers/test_conabip_scraper.py` - Test suite (19 tests, 100% pass)
---
- **Scraper**: OpenCODE AI Agent
- **License**: Dataset public domain (Argentine government data)
- **Last Updated**: 2025-11-17