2025-11-19 23:25:22 +01:00


# Belgian ISIL Dataset Findings
**Date**: November 17, 2025

**Status**: ✅ REGISTRIES FOUND - NO BULK DOWNLOAD AVAILABLE

---
## Executive Summary
Belgium has **TWO separate ISIL registries** managed by different authorities, but **NEITHER offers a public bulk CSV/Excel download**. However, both have web search interfaces that can be systematically queried.

---
## Belgian ISIL Registry Structure
### Authority Split
Belgium uses a **dual delegation model**:

| Sector | Authority | Website | ISIL Prefix |
|--------|-----------|---------|-------------|
| **Libraries** (bibliotheken/bibliothèques) | KBR - Koninklijke Bibliotheek / Bibliothèque royale de Belgique | https://isil.kbr.be/ | BE-[A-Z]{3}[0-9]{2} |
| **Archives** (archieven/archives) | Rijksarchief / Archives de l'État | http://isil.arch.be/ | BE-A[0-9]{4} |

**Note**: Museums can register with either authority depending on their primary function.

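
The two prefix patterns in the table do not overlap, so the format of a code is enough to tell which registry issued it. This identifies the issuing registry, not the institution type, because museums can appear in either registry. The following is a minimal Python sketch. It assumes these two patterns cover all current Belgian codes, which neither registry documents explicitly. The function name `registry_for` is illustrative.

```python
import re
from typing import Optional

# Prefix patterns from the registry table above, matched against the whole code
KBR_LIBRARY = re.compile(r"BE-[A-Z]{3}[0-9]{2}")
RIJKSARCHIEF_ARCHIVE = re.compile(r"BE-A[0-9]{4}")


def registry_for(isil: str) -> Optional[str]:
    """Return the registry whose code format matches, or None if neither does."""
    code = isil.strip().upper()
    if RIJKSARCHIEF_ARCHIVE.fullmatch(code):
        return "Rijksarchief"
    if KBR_LIBRARY.fullmatch(code):
        return "KBR"
    return None


for sample in ["BE-BRL07", "BE-A0510", "BE-BRL7"]:
    print(sample, registry_for(sample))
# BE-BRL07 KBR
# BE-A0510 Rijksarchief
# BE-BRL7 None
```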
---
## Dataset 1: KBR ISIL Registry (Libraries)
### Overview
- **Managing Authority**: Koninklijke Bibliotheek van België / Bibliothèque royale de Belgique (KBR)
- **Primary Coverage**: Public libraries, research libraries, museum libraries, specialized libraries
- **Format**: Web database (no bulk export)
- **Access**: https://isil.kbr.be/search.php?lang=en
### Institution Types Covered
- Public libraries (openbare bibliotheken, bibliothèques publiques)
- University libraries (universiteitsbibliotheken, bibliothèques universitaires)
- Research center libraries
- Museum libraries and documentation centers
- Specialized libraries (law, medicine, art, etc.)
### Sample ISIL Codes (Extracted from Search)
```
BE-ANN00 - Antwerp (various institutions)
BE-ANN01 through BE-ANN24 - Antwerp sub-institutions
BE-BRL00 - Brussels institutions
BE-BRL07 - Archives & Bibliothèques de l'Université libre de Bruxelles
BE-BRL10 - Agentschap Kunsten en Erfgoed - Collectie van de Vlaamse Gemeenschap
BE-BRL13 - Archives-Musée du CPAS de Bruxelles
BE-GET08 - Arteveldehogeschool (AHS)
BE-TOI01 - Bibliothèque d'architecture (Tournai)
BE-EKN00 - Antwerp International School (AIS)
BE-TEN00 - Koninklijk Museum voor Midden-Afrika (Tervuren)
```
### Data Fields Available
Based on search results, each ISIL record includes:
- ISIL code (BE-XXXXX format)
- Institution name (official name)
- Alternative names/acronyms
- Walk-up address (street address)
- City/location
- Telephone (area code + number)
- Institution type
- Communication details
### Access Methods
**✅ Available**:
- **Web Search Interface**: https://isil.kbr.be/search.php?lang=en
  - Search by: ISIL code, name, location
  - Results displayed in HTML tables
  - Individual record detail pages

**❌ NOT Available**:
- Bulk CSV/Excel export
- API endpoint
- RSS/Atom feed
- SPARQL endpoint

**🔧 Possible Approach**:
- Systematic web scraping of search results
- An empty search query returns all records: `https://isil.kbr.be/search.php?lang=en&query=`
- Estimated records: 500-1,000 institutions
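
The search and detail URLs can be built rather than hard-coded. The sketch below assumes that detail pages take the code without its `BE-` prefix as the `id` parameter. This follows the `data.php?lang=en&id=BRL07` URL used in the schema mapping later in this document. KBR has not documented that pattern; it has only been observed. The function names are illustrative.

```python
from urllib.parse import urlencode

KBR_BASE = "https://isil.kbr.be"


def kbr_search_url(query: str = "") -> str:
    """Search URL; an empty query is reported to list every record."""
    return f"{KBR_BASE}/search.php?{urlencode({'lang': 'en', 'query': query})}"


def kbr_detail_url(isil: str) -> str:
    """Detail URL, assuming `id` is the code without its 'BE-' prefix (observed, not documented)."""
    return f"{KBR_BASE}/data.php?{urlencode({'lang': 'en', 'id': isil.removeprefix('BE-')})}"


print(kbr_search_url())            # https://isil.kbr.be/search.php?lang=en&query=
print(kbr_detail_url("BE-BRL07"))  # https://isil.kbr.be/data.php?lang=en&id=BRL07
```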
---
## Dataset 2: Rijksarchief ISIL Registry (Archives)
### Overview
- **Managing Authority**: Rijksarchief in België / Archives de l'État en Belgique
- **Primary Coverage**: Archives (government, corporate, ecclesiastical, private)
- **Format**: Web database (no bulk export)
- **Access**: http://isil.arch.be/
### Institution Types Covered
- National archives (Algemeen Rijksarchief / Archives générales du Royaume)
- Regional/provincial state archives (Rijksarchief in de Provinciën)
- Municipal archives (Stadsarchieven / Archives communales)
- Ecclesiastical archives (church, abbey, seminary archives)
- Corporate archives (bedrijfsarchieven / archives d'entreprise)
- Private archives (particuliere collecties / archives privées)
### Sample ISIL Codes (Extracted from Search)
```
BE-A0510 - Algemeen Rijksarchief (ARA) - National Archives
BE-A2000 through BE-A2003 - Regional state archives
BE-A3000 through BE-A3006 - Provincial state archives
BE-A4000 - Archief en Documentatiecentrum voor het Vlaams-nationalisme (ADVN)
BE-A4001 - Archive (unspecified)
BE-A4002 - Archive (unspecified)
BE-A4003 - Archief voor Hedendaagse Kunst in België, KMSKB (Contemporary Art Archive)
BE-A5000 - Archives de l'Evêché de Liège (Liège Diocesan Archives)
BE-A5001 - Archief Grootseminarie Brugge (AGSB) - Bruges Seminary Archives
BE-A5002 - Aartsbisschoppelijk Archief Mechelen (AAM) - Archdiocese of Mechelen Archives
```
### Data Fields Available
Based on web interface analysis:
- ISIL code (BE-AXXXX format)
- Institution name (Naam/Nom)
- Location (Plaats/Localité)
- Institution type (archive designation)
- Contact information (login required)
### Access Methods
**✅ Available**:
- **Web Search Interface**: http://isil.arch.be/?view=searchisil
  - Public search by ISIL code or name
- Login system for institutions to manage their records
- "Already have an ISIL code?" page: http://isil.arch.be/?view=bestaande

**❌ NOT Available**:
- Bulk CSV/Excel export
- API endpoint
- RSS/Atom feed
- Full directory listing

**🔧 Possible Approach**:
- Web scraping of public search results
- Institution login required for full details
- Estimated records: 100-300 archives
---
## ISIL Code Format Patterns
### KBR Library Codes
**Format**: `BE-[CITY_CODE][INST_NUM]`

Examples:
- `BE-ANN00` - Antwerp (ANN = Antwerpen) base institution
- `BE-BRL07` - Brussels (BRL = Bruxelles) institution #7
- `BE-GET08` - Ghent (GET = Gent) institution #8
- `BE-TEN00` - Tervuren (TEN = Tervuren) base institution

**Pattern**:
- 3-letter city code (geographic identifier)
- 2-digit institution number (00 = primary institution, 01+ = additional)
### Rijksarchief Archive Codes
**Format**: `BE-A[TYPE][NUM]`

Examples:
- `BE-A0510` - National Archives (05 = national level, 10 = sequence)
- `BE-A2000` - Regional state archives (20 = regional level)
- `BE-A3000` - Provincial state archives (30 = provincial level)
- `BE-A4000` - Specialized/private archives (40 = non-governmental)
- `BE-A5000` - Ecclesiastical archives (50 = church institutions)

**Pattern**:
- Letter 'A' = Archive
- First two digits = archive type/level (inferred from the samples above, not from registry documentation)
- Last two digits = sequential number
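
A parser can recover these parts. However, the archive-code breakdown is inferred from a small set of samples. It is therefore safer to keep the raw digit groups than to hard-code meanings such as "05 = national level". The following is a hedged Python sketch; the names are illustrative.

```python
import re
from typing import NamedTuple


class IsilParts(NamedTuple):
    registry: str  # "KBR" or "Rijksarchief"
    group: str     # KBR: 3-letter place code; Rijksarchief: 2-digit type/level (inferred)
    number: int    # institution/sequence number within the group


KBR_CODE = re.compile(r"BE-([A-Z]{3})([0-9]{2})")
ARCHIVE_CODE = re.compile(r"BE-A([0-9]{2})([0-9]{2})")


def split_isil(isil: str) -> IsilParts:
    """Split a Belgian ISIL code into the parts described above."""
    if m := ARCHIVE_CODE.fullmatch(isil):
        return IsilParts("Rijksarchief", m.group(1), int(m.group(2)))
    if m := KBR_CODE.fullmatch(isil):
        return IsilParts("KBR", m.group(1), int(m.group(2)))
    raise ValueError(f"not a recognised Belgian ISIL code: {isil!r}")


print(split_isil("BE-GET08"))  # IsilParts(registry='KBR', group='GET', number=8)
print(split_isil("BE-A0510"))  # IsilParts(registry='Rijksarchief', group='05', number=10)
```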
---
## Comparison with Dutch ISIL System
| Feature | Netherlands | Belgium |
|---------|-------------|---------|
| **Managing Authority** | Nationaal Archief (archives), KB (libraries) | Rijksarchief (archives), KBR (libraries) |
| **ISIL Format** | NL-[CityCode][InstType] (alphanumeric) | BE-[A-Z]{3}[0-9]{2} (libraries), BE-A[0-9]{4} (archives) |
| **Bulk Export** | ✅ CSV available (archives) | ❌ NOT AVAILABLE |
| **Public Search** | ✅ Available | ✅ Available |
| **Documentation** | ✅ Good | ⚠️ Limited |
---
## Data Acquisition Strategy
### Option 1: Contact Authorities (RECOMMENDED)
**KBR Contact**:
- Email: isil@kbr.be
- Contact person: Imke Hansen
- Address: Keizerslaan 4, 1000 Brussel
- Phone: +32 (0)2 519 57 41

**Request**:
> Subject: Request for Belgian ISIL Registry Dataset
>
> Dear KBR ISIL Office,
>
> I am working on a global heritage custodian data project (https://github.com/kempersc/glam)
> that aims to create a comprehensive, open dataset of GLAM institutions worldwide using
> LinkML and Linked Data standards.
>
> We have successfully integrated ISIL datasets from the Netherlands (Nationaal Archief)
> and are now seeking to include Belgian heritage institutions.
>
> Could you provide a bulk export (CSV, Excel, or JSON) of the Belgian ISIL registry
> maintained by KBR? We would also appreciate any similar export from the Rijksarchief
> archive registry.
>
> If bulk export is not available, would you permit systematic web scraping of the
> public search interface at https://isil.kbr.be/search.php for research purposes?
>
> The resulting dataset will be published under an open license with full attribution
> to KBR and Rijksarchief as authoritative sources.
>
> Thank you for your consideration.

**Rijksarchief Contact**:
- Website: http://arch.arch.be
- General inquiry: http://arch.arch.be/index.php?l=en&m=contact
- Same request as above, adapted for archives
### Option 2: Web Scraping (Fallback)
**Technical Approach**:
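1. **KBR Library Registry**:
   ```python
   import re
   from urllib.parse import parse_qs, urljoin, urlparse

   import requests
   from bs4 import BeautifulSoup

   # Scrape all ISIL records from KBR; an empty query returns all records
   base_url = "https://isil.kbr.be/search.php"
   response = requests.get(base_url, params={"lang": "en", "query": ""}, timeout=30)
   response.raise_for_status()
   # Parse the HTML result tables
   soup = BeautifulSoup(response.text, "html.parser")
   # Collect ISIL codes and detail-page URLs; detail pages are fetched later, one at a time
   records = []
   for link in soup.find_all("a", href=re.compile(r"data\.php")):
       # Assumption: detail links carry the code as ?id=BRL07 (without the "BE-" prefix)
       local_id = parse_qs(urlparse(link["href"]).query).get("id", [""])[0]
       if local_id:
           records.append({"isil_code": f"BE-{local_id}", "detail_url": urljoin(response.url, link["href"])})
   ```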
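2. **Rijksarchief Archive Registry**:
   ```python
   import re
   import requests

   # Search form field names are undocumented: inspect the form HTML before scripting submissions
   base_url = "http://isil.arch.be/?view=searchisil"
   html = requests.get(base_url, timeout=30).text
   # Extract BE-A#### codes from the returned page
   archive_codes = sorted(set(re.findall(r"\bBE-A\d{4}\b", html)))
   ```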
3. **Rate Limiting**:
   - Respectful delay: 2-3 seconds between requests
   - User-Agent header identifying the project
   - Log all requests for reproducibility
4. **Data Validation**:
   - Cross-check with Wikidata (P791 ISIL code property)
   - Geocode addresses using Nominatim
   - Verify institution types
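
The rate-limiting rules above (a 2-3 second delay, an identifying User-Agent, logged requests) can be wrapped in a small fetcher. The following is a standard-library sketch. The User-Agent string is a placeholder, and the 2.5 s default is an assumption picked from the stated range. The class name `PoliteFetcher` is illustrative.

```python
import logging
import time
import urllib.request
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("be_isil_scraper")

# Placeholder identifying User-Agent; replace with the project's real contact details
USER_AGENT = "GLAM-heritage-custodian-project/0.1 (+https://github.com/kempersc/glam)"


class PoliteFetcher:
    """Fetches URLs at most once per `min_delay` seconds and logs every request."""

    def __init__(self, min_delay: float = 2.5):
        self.min_delay = min_delay  # assumption: midpoint of the 2-3 s range above
        self._last: Optional[float] = None

    def seconds_to_wait(self, now: float) -> float:
        """Sleep needed before the next request so requests stay `min_delay` apart."""
        if self._last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - self._last))

    def get(self, url: str) -> bytes:
        time.sleep(self.seconds_to_wait(time.monotonic()))
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                body = response.read()
                log.info("GET %s -> %s (%d bytes)", url, response.status, len(body))
                return body
        finally:
            # Record the request time even on failure, so errors are also rate-limited
            self._last = time.monotonic()
```

Usage: `PoliteFetcher().get("https://isil.kbr.be/search.php?lang=en&query=")` returns the raw HTML bytes. Reuse one fetcher instance for all requests to the same registry so that its requests stay spaced.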
### Option 3: Wikidata Enrichment
**Query Wikidata for Belgian institutions with ISIL codes**:
```sparql
SELECT ?item ?itemLabel ?isil ?viaf ?location WHERE {
?item wdt:P791 ?isil . # Has ISIL code
FILTER(STRSTARTS(?isil, "BE-")) # Belgian institutions
OPTIONAL { ?item wdt:P214 ?viaf } # VIAF ID
OPTIONAL { ?item wdt:P131 ?location } # Located in
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,nl,fr" }
}
```
**Current Wikidata Coverage** (as of Nov 2025):
- Estimated: 50-100 Belgian heritage institutions with ISIL codes
- Quality: Variable (some records incomplete)
- Usefulness: Good starting point, but not comprehensive
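
The query above can be run against the public Wikidata Query Service endpoint (`https://query.wikidata.org/sparql`). The endpoint returns SPARQL JSON results, with each value under `results.bindings[*].<variable>.value`. The sketch below fetches the results with the standard library. It then performs the Wikidata cross-check listed under Data Validation. The User-Agent string is a placeholder assumption, and the function names are illustrative. Items with several VIAF IDs or locations come back as several rows, which the set comparison collapses.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder identifying User-Agent (Wikimedia's User-Agent policy asks clients to send one)
USER_AGENT = "GLAM-heritage-custodian-project/0.1 (+https://github.com/kempersc/glam)"
QUERY = """
SELECT ?item ?itemLabel ?isil ?viaf ?location WHERE {
  ?item wdt:P791 ?isil .
  FILTER(STRSTARTS(?isil, "BE-"))
  OPTIONAL { ?item wdt:P214 ?viaf }
  OPTIONAL { ?item wdt:P131 ?location }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,nl,fr" }
}
"""


def fetch_results(query: str = QUERY) -> dict:
    """Run a SPARQL query on the Wikidata Query Service and return the decoded JSON."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON bindings into {variable: value} dicts (unbound OPTIONALs are absent)."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def compare_with_registry(rows: list[dict[str, str]], registry_codes: set[str]) -> dict[str, set[str]]:
    """Split ISIL codes into Wikidata-only, registry-only, and shared sets."""
    wikidata_codes = {row["isil"] for row in rows}
    return {
        "wikidata_only": wikidata_codes - registry_codes,
        "registry_only": registry_codes - wikidata_codes,
        "both": wikidata_codes & registry_codes,
    }


# Usage (network access required):
# rows = to_rows(fetch_results())
# report = compare_with_registry(rows, scraped_codes)  # scraped_codes: set of registry ISIL codes
```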
---
## Estimated Dataset Size
| Registry | Estimated Institutions | Completeness |
|----------|----------------------|--------------|
| KBR Libraries | 500-1,000 | High (required for public libraries) |
| Rijksarchief Archives | 100-300 | Medium (many archives unregistered) |
| **Total** | **600-1,300** | **Medium-High** |

**Comparison**:
- Netherlands ISIL registry: ~364 institutions (Nationaal Archief archives only)
- Belgium's estimated 600-1,300 institutions would be roughly 1.6 to 3.6 times that number, a significant addition to European coverage
---
## Integration with GLAM Project
### Schema Mapping
Belgian ISIL records map to the GLAM project's LinkML schema as follows:

**Core HeritageCustodian Fields**:
```yaml
- id: https://w3id.org/heritage/custodian/be/brl07
  name: "Institution Name"  # from ISIL registry
  institution_type: LIBRARY  # or ARCHIVE, from registry sector
  alternative_names: ["Acronym", "Alternative Name"]
  locations:
    - city: "Brussels"  # from ISIL Plaats/Localité field
      street_address: "Walk-up address"  # from ISIL registry
      country: "BE"
      geonames_id: null  # To be geocoded
  identifiers:
    - identifier_scheme: "ISIL"
      identifier_value: "BE-BRL07"
      identifier_url: "https://isil.kbr.be/data.php?lang=en&id=BRL07"
  provenance:
    data_source: CSV_REGISTRY
    data_tier: TIER_1_AUTHORITATIVE
    extraction_date: "2025-11-17T..."
    source_url: "https://isil.kbr.be/"
    verified_by: "KBR ISIL Office"
```
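
A scraper's output can be mapped onto these fields with a plain function. The sketch below assumes a hypothetical scraped KBR record with `name`, `alt_names`, `city` and `address` keys. It derives the `id` slug from the lowercased local part of the ISIL code, as in the example above. It emits plain dicts rather than LinkML-generated classes.

```python
from typing import Any


def kbr_to_heritage_custodian(record: dict[str, Any], isil: str, extraction_date: str) -> dict[str, Any]:
    """Map one hypothetical scraped KBR record onto the HeritageCustodian fields shown above."""
    local_id = isil.removeprefix("BE-")
    return {
        "id": f"https://w3id.org/heritage/custodian/be/{local_id.lower()}",
        "name": record["name"],
        "institution_type": "LIBRARY",  # KBR registry; Rijksarchief records would map to ARCHIVE
        "alternative_names": list(record.get("alt_names", [])),
        "locations": [{
            "city": record.get("city"),
            "street_address": record.get("address"),
            "country": "BE",
            "geonames_id": None,  # to be geocoded
        }],
        "identifiers": [{
            "identifier_scheme": "ISIL",
            "identifier_value": isil,
            "identifier_url": f"https://isil.kbr.be/data.php?lang=en&id={local_id}",
        }],
        "provenance": {
            "data_source": "CSV_REGISTRY",
            "data_tier": "TIER_1_AUTHORITATIVE",
            "extraction_date": extraction_date,
            "source_url": "https://isil.kbr.be/",
            "verified_by": "KBR ISIL Office",
        },
    }
```

Keeping the mapping as a pure function makes it testable without network access and reusable whether records arrive from a bulk export or from scraping.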
### GHCID Generation
Belgian GHCID format: `BE-[REGION]-[CITY]-[TYPE]-[ABBREV]`

Examples:
- `BE-BRU-BRX-L-KBR` - KBR, Brussels (Library)
- `BE-VLG-ANT-M-MAF` - Museum aan de Stroom, Antwerp (Museum)
- `BE-WAL-LIE-A-AEL` - Archives de l'Evêché de Liège (Archive)

**Region Codes**:
- BRU = Brussels-Capital Region (Brussels Hoofdstedelijk Gewest)
- VLG = Flemish Region (Vlaams Gewest)
- WAL = Walloon Region (Région wallonne)
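
Once the region, city code and abbreviation have been chosen, assembling the identifier is mechanical. The sketch below only joins and checks components. This document does not specify how city codes (e.g. `BRX`, `ANT`, `LIE`) and abbreviations are derived, so they are passed in as given. The type-letter mapping is taken from the three examples above (`L`, `M`, `A`). The function name is illustrative.

```python
# Region codes and type letters as used in the examples above
REGION_CODES = {
    "BRU": "Brussels-Capital Region",
    "VLG": "Flemish Region",
    "WAL": "Walloon Region",
}
TYPE_LETTERS = {"LIBRARY": "L", "MUSEUM": "M", "ARCHIVE": "A"}


def belgian_ghcid(region: str, city: str, institution_type: str, abbrev: str) -> str:
    """Assemble BE-[REGION]-[CITY]-[TYPE]-[ABBREV] from pre-chosen components."""
    if region not in REGION_CODES:
        raise ValueError(f"unknown region code: {region!r}")
    if institution_type not in TYPE_LETTERS:
        raise ValueError(f"unsupported institution type: {institution_type!r}")
    return "-".join(["BE", region, city.upper(), TYPE_LETTERS[institution_type], abbrev.upper()])


print(belgian_ghcid("BRU", "BRX", "LIBRARY", "KBR"))  # BE-BRU-BRX-L-KBR
```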
---
## Timeline and Milestones
### Phase 1: Data Request (Week 1)
- ⏳ Contact KBR ISIL Office (isil@kbr.be): draft request ready, not yet sent
- ⏳ Contact Rijksarchief: request to be adapted from the KBR draft
- ⏳ Await response (1-2 weeks)
### Phase 2: Acquisition (Weeks 2-3)
- If bulk export provided: Process CSV/Excel
- If scraping required: Implement scraper scripts
- Fallback: Enhance Wikidata SPARQL query
### Phase 3: Processing (Week 4)
- Parse Belgian ISIL records
- Geocode addresses
- Generate GHCIDs
- Cross-link with Wikidata
### Phase 4: Validation (Week 5)
- Validate against schema
- Check for duplicates
- Compare with OpenStreetMap data
- Manual review of uncertain records
### Phase 5: Integration (Week 6)
- Merge with existing GLAM dataset
- Export to RDF/JSON-LD
- Update documentation
- Create Belgium-specific report
---
## Related Resources
### Belgian Heritage Ecosystem
**National Aggregators**:
- **MetaBelgica**: Linked Data platform for Federal Scientific Institutes (https://www.kbr.be/en/projects/metabelgica/)
- **Erfgoedkaart**: Heritage map of Belgium (archives, museums, libraries)
- **Archives Portal Europe**: Includes Belgian archives

**Museum Networks**:
- FARO - Vlaams steunpunt voor cultureel erfgoed (Flemish cultural heritage)
- Fédération Wallonie-Bruxelles heritage services

**University Libraries**:
- LIBIS (Flemish university libraries consortium)
- CIUF (Walloon-Brussels university libraries)
### Identifier Systems
| System | Coverage | Belgian Usage |
|--------|----------|---------------|
| ISIL | Libraries, archives, museums | ✅ Widely adopted |
| VIAF | Libraries, archives | ✅ Used by KBR, university libraries |
| Wikidata | All heritage institutions | ⚠️ Partial coverage |
| GRID | Research institutions | ⚠️ Universities covered, but GRID was retired in 2021 (superseded by ROR) |
| ROR | Research organizations | ✅ Growing adoption |
---
## Key Findings Summary
### ✅ What We Found
1. **Two Official ISIL Registries**:
   - KBR (libraries): https://isil.kbr.be/
   - Rijksarchief (archives): http://isil.arch.be/
2. **Web Search Interfaces Available**:
   - Both registries have public search
   - Individual record detail pages
   - Estimated 600-1,300 total institutions
3. **Clear Authority Split**:
   - Libraries → KBR
   - Archives → Rijksarchief
   - Museums → Either registry (depends on function)
4. **Multilingual Data**:
   - Dutch (Flemish), French, German, English
   - Institution names in multiple languages
### ❌ What's Missing
1. **No Bulk Export**:
   - No CSV/Excel download available
   - No public API endpoint
   - No SPARQL/Linked Data access
2. **Limited Public Access**:
   - Full details may require institution login
   - Some contact information restricted
   - No data license specified
3. **Incomplete Coverage**:
   - Not all Belgian archives registered
   - Small museums may lack ISIL codes
   - Private collections underrepresented
### 🔧 Recommended Next Steps
1. **Immediate**: Send data request emails to KBR and Rijksarchief
2. **Parallel**: Query Wikidata for existing Belgian ISIL codes
3. **Fallback**: Prepare web scraping scripts if bulk export denied
4. **Long-term**: Contribute enhanced data back to Wikidata
---
## Questions for User
1. **Should I send the data request emails to KBR and Rijksarchief now?**
   - The KBR draft is ready above; the Rijksarchief version is the same text adapted for archives
   - I can send them by automated email, or you can send them manually
2. **Should I implement the web scraping fallback?**
   - Respectful scraping with rate limiting
   - Only if the bulk export request is denied
   - Estimated ~2-3 hours of development time
3. **Should I query Wikidata for Belgian institutions now?**
   - Quick SPARQL query to get existing data
   - Registry ISIL codes can be matched against these results later
   - Good baseline for comparison
4. **Which Belgian regions should we prioritize?**
   - Brussels-Capital (central, multilingual)
   - Flanders (largest population, most institutions)
   - Wallonia (French-speaking region)
   - All regions (comprehensive coverage)
---
**Next Steps**: Awaiting your decision on data acquisition strategy.