# Argentina CONABIP Popular Libraries Dataset

## Overview

- **Source**: CONABIP (Comisión Nacional de Bibliotecas Populares)
- **URL**: https://www.conabip.gob.ar/buscador_bp
- **Date Scraped**: November 17, 2025
- **Total Institutions**: 288
- **Coverage**: 22 provinces, 220 cities

## Files

### Basic Dataset (Complete)

- **conabip_libraries.csv** (47 KB) - Basic institution data
- **conabip_libraries.json** (115 KB) - JSON with metadata

**Fields**:

- Registration number (REG)
- Institution name
- Province
- City/Locality
- Neighborhood
- Street address
- Profile URL

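
A quick way to check the basic CSV against the field list above is to read its header row and count its records. This is a minimal sketch, not part of the project's scraper: the helper name `summarize_csv` is hypothetical, UTF-8 encoding is an assumption, and the header names are read from the file rather than guessed, since this README does not list the exact column labels.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (header_names, row_count) for a CSV file.

    Assumes the file is UTF-8 encoded and that its first row is a header.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Iterating consumes the data rows; the header row is not counted.
        row_count = sum(1 for _ in reader)
        headers = list(reader.fieldnames or [])
    return headers, row_count
```

Calling `summarize_csv("conabip_libraries.csv")` should report one row per institution, so a count other than 288 would suggest a truncated or duplicated export.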
### Enhanced Dataset (Sample Only)

- **conabip_libraries_with_profiles_test.csv** (12 KB) - 32 institutions
- **conabip_libraries_with_profiles_test.json** (24 KB) - 32 institutions

**Additional Fields**:

- Latitude/longitude (from Google Maps)
- Google Maps URL
- Services offered (WiFi, computers, workshops, etc.)

**Note**: The full enhanced dataset (all 288 institutions) is not yet complete because of scraping timeout constraints. See `SESSION_SUMMARY_ARGENTINA_CONABIP.md` for details.

## Geographic Distribution

### Top 10 Provinces

1. Buenos Aires: 82 institutions (28.5%)
2. Santa Fe: 61 institutions (21.2%)
3. Entre Ríos: 27 institutions (9.4%)
4. Córdoba: 18 institutions (6.3%)
5. Corrientes: 13 institutions (4.5%)
6. La Pampa: 12 institutions (4.2%)
7. Ciudad Autónoma de Buenos Aires: 10 institutions (3.5%)
8. Jujuy: 8 institutions (2.8%)
9. Santiago del Estero: 7 institutions (2.4%)
10. San Juan: 6 institutions (2.1%)

### Geographic Spread

- **220 unique cities** represented
- **Average**: 13.1 institutions per province (288 / 22)
- **Concentration**: 49.7% of institutions (143 of 288) are in Buenos Aires and Santa Fe provinces

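
The summary figures in this section follow directly from the province counts. The sketch below recomputes them from the numbers listed in this README; the variable and function names are illustrative only and do not come from the project code.

```python
# Province counts copied from the "Top 10 Provinces" list in this README.
TOP_PROVINCES = {
    "Buenos Aires": 82,
    "Santa Fe": 61,
    "Entre Ríos": 27,
    "Córdoba": 18,
    "Corrientes": 13,
    "La Pampa": 12,
    "Ciudad Autónoma de Buenos Aires": 10,
    "Jujuy": 8,
    "Santiago del Estero": 7,
    "San Juan": 6,
}
TOTAL_INSTITUTIONS = 288
TOTAL_PROVINCES = 22


def share(count, total=TOTAL_INSTITUTIONS):
    """Return count as a percentage of total."""
    return 100 * count / total


# Mean number of institutions per province.
average_per_province = TOTAL_INSTITUTIONS / TOTAL_PROVINCES

# Combined share of the two largest provinces.
top_two_share = share(TOP_PROVINCES["Buenos Aires"] + TOP_PROVINCES["Santa Fe"])

# Institutions located outside the ten provinces listed above.
remaining = TOTAL_INSTITUTIONS - sum(TOP_PROVINCES.values())
```

With these inputs the average rounds to 13.1, the top two provinces hold about 49.7% of all institutions, and 44 institutions fall in the 12 provinces outside the top ten.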
## Most Common Institution Names

Popular libraries are most often named after Argentine historical figures:

1. **Domingo Faustino Sarmiento**: 41 institutions
   - 7th President of Argentina (1868-1874)
   - Champion of public education and libraries

2. **Bernardino Rivadavia**: 21 institutions
   - 1st President of Argentina (1826-1827)
   - Founding father, education reformer

3. **Juan Bautista Alberdi**: 14 institutions
   - Political theorist whose *Bases* shaped the 1853 Argentine Constitution

4. **Mariano Moreno**: 11 institutions
   - Revolutionary leader, journalist

5. **Florentino Ameghino**: 7 institutions
   - Naturalist, paleontologist, anthropologist

6. **Bartolomé Mitre**: 7 institutions
   - President (1862-1868), historian, writer

## Data Quality

### Strengths

- ✅ Official government source (authoritative)
- ✅ Clean, structured data
- ✅ Consistent formatting
- ✅ Zero parsing errors
- ✅ Geographic coordinates available (via profile pages)

### Limitations

- ⚠️ Only 288 institutions (may not be comprehensive)
- ⚠️ Profile data requires additional scraping (slow)
- ⚠️ Registration numbers not consistently extracted
- ⚠️ Some fields may be incomplete or empty

## Data Tier Classification

Per GLAM project schema:

- **Data Source**: `WEB_SCRAPING`
- **Data Tier**: `TIER_2_VERIFIED` (official government website)
- **Institution Type**: `LIBRARY` (popular libraries)
- **Country Code**: `AR` (Argentina, ISO 3166-1 alpha-2)

## Usage Notes

### For GLAM Project Integration

1. Parse CSV/JSON into LinkML `HeritageCustodian` instances
2. Set `institution_type: LIBRARY`
3. Map provinces to ISO 3166-2 region codes (AR-B, AR-X, etc.)
4. Geocode addresses with the Nominatim API when coordinates are not available
5. Generate GHCIDs: `AR-{ProvinceCode}-{CityCode}-L-{Abbrev}`
6. Enrich with Wikidata Q-numbers where available

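
Steps 3 and 5 can be sketched as a lookup table plus a string template. This is a hedged illustration, not the project's implementation: it covers only the ten provinces listed in this README, it assumes `{ProvinceCode}` is the ISO 3166-2 subdivision letter, and it takes `city_code` and `abbrev` as ready-made inputs because the rules for deriving them are not described here. The function names are hypothetical.

```python
# ISO 3166-2:AR subdivision letters for the ten provinces listed in this README.
PROVINCE_CODES = {
    "Buenos Aires": "B",
    "Santa Fe": "S",
    "Entre Ríos": "E",
    "Córdoba": "X",
    "Corrientes": "W",
    "La Pampa": "L",
    "Ciudad Autónoma de Buenos Aires": "C",
    "Jujuy": "Y",
    "Santiago del Estero": "G",
    "San Juan": "J",
}


def iso_region_code(province):
    """Return the full ISO 3166-2 code, e.g. 'AR-B' for Buenos Aires."""
    return f"AR-{PROVINCE_CODES[province]}"


def make_ghcid(province, city_code, abbrev):
    """Fill the AR-{ProvinceCode}-{CityCode}-L-{Abbrev} template.

    Assumption: ProvinceCode is the ISO 3166-2 subdivision letter.
    city_code and abbrev are hypothetical inputs used as given.
    """
    return f"AR-{PROVINCE_CODES[province]}-{city_code}-L-{abbrev}"
```

An unknown province name raises `KeyError`, which is deliberate in this sketch: a silent fallback would produce identifiers that look valid but point at the wrong region.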
### Known Issues

- The full enhanced dataset is incomplete (it requires a long-running background scrape)
- The server is occasionally slow or unreliable (4 timeouts in 288 requests, about 1.4%)
- Some names repeat across different cities, so deduplication must take location into account

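
Because many libraries share a name, a deduplication key needs to combine the name with its location. The sketch below shows one possible approach, assuming each record is a dict with hypothetical `name`, `city`, and `province` keys; the actual column names in the dataset may differ.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def dedup_key(record):
    """Build a location-aware key: same name in another city is a new key."""
    return (
        normalize(record["name"]),
        normalize(record["city"]),
        normalize(record["province"]),
    )


def deduplicate(records):
    """Keep the first record seen for each (name, city, province) key."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Normalizing accents and case means that "Córdoba" and "cordoba" compare equal, while two libraries named after Sarmiento in different cities remain separate entries.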
## Next Steps

1. Complete profile scraping for all 288 institutions (run in background)
2. Parse into LinkML heritage_custodian instances
3. Enrich with Wikidata/VIAF identifiers
4. Integrate with global GLAM dataset
5. Export to RDF/JSON-LD for semantic web

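
For step 1, a long-running scrape benefits from retrying requests that time out. The following is a minimal sketch of retry with exponential backoff, not the behavior of the existing scraper: `fetch_with_retry` is a hypothetical helper, `fetch` stands for whatever callable downloads a profile page, and the exception types to retry on must be set to match the HTTP library in use.

```python
import time


def fetch_with_retry(fetch, url, attempts=4, base_delay=1.0,
                     retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fetch(url), retrying on the given exceptions with backoff.

    The wait doubles after each failure: base_delay, 2x, 4x, ...
    The last failure is re-raised so the caller can log the URL.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so that tests can substitute a recorder for the real delay. Re-raising after the final attempt keeps failed URLs visible, so they can be queued for a later pass.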
## Related Documents

- `SESSION_SUMMARY_ARGENTINA_CONABIP.md` - Detailed session notes
- `ARGENTINA_ISIL_INVESTIGATION.md` - Argentina ISIL registry investigation
- `/scripts/scrapers/scrape_conabip_argentina.py` - Web scraper source code
- `/tests/scrapers/test_conabip_scraper.py` - Test suite (19 tests, 100% pass)

---

- **Scraper**: OpenCODE AI Agent
- **License**: Dataset public domain (Argentine government data)
- **Last Updated**: 2025-11-17