glam/scripts/scrapers
kempersc 38354539a6 feat: Add comprehensive harvester for Thüringen archives
- Implemented a new script to extract full metadata from 149 archive detail pages on archive-in-thueringen.de.
- Extracted data includes addresses, emails, phone numbers, directors, collection sizes, opening hours, histories, and more.
- Introduced structured data parsing and error handling for robust data extraction.
- Added rate limiting to avoid overloading the server.
- Results are saved as JSON, together with detailed metadata about the extraction process.
2025-11-20 00:25:45 +01:00
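The commit message describes a harvester that fetches archive detail pages with rate limiting, tolerates per-page errors, and saves JSON with extraction metadata. Below is a minimal sketch of that loop, assuming a fixed delay between requests is the rate-limiting strategy. `harvest`, `parse_detail_page`, and the output field names are hypothetical and are not taken from `harvest_thueringen_archives_comprehensive.py`. The fetch function is injected so the sketch runs without network access, and because the real page markup is unknown, the parser only extracts a `mailto:` address as an illustration.

```python
import json
import re
import time
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical parser: the real detail-page markup is not known here,
# so this only pulls the first mailto: address as an example field.
EMAIL_RE = re.compile(r"mailto:([^\s\"'>?]+)")


def parse_detail_page(html: str) -> Dict:
    match = EMAIL_RE.search(html)
    return {"email": match.group(1) if match else None}


def harvest(urls: List[str],
            fetch: Callable[[str], str],
            parse: Callable[[str], Dict],
            delay_s: float = 1.0) -> Dict:
    """Fetch and parse each page, sleeping delay_s seconds between requests."""
    records: List[Dict] = []
    errors: List[Dict] = []
    started = datetime.now(timezone.utc)
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_s)  # fixed pause between requests (assumed strategy)
        try:
            record = parse(fetch(url))
            record["source_url"] = url
            records.append(record)
        except Exception as exc:  # record the failure and continue with the next page
            errors.append({"url": url, "error": repr(exc)})
    return {
        "extraction_metadata": {  # hypothetical field names
            "started_at": started.isoformat(),
            "finished_at": datetime.now(timezone.utc).isoformat(),
            "pages_requested": len(urls),
            "pages_parsed": len(records),
            "errors": errors,
        },
        "archives": records,
    }
```

With real HTTP, `fetch` could wrap `urllib.request.urlopen`. Writing the result with `json.dump(result, f, ensure_ascii=False, indent=2)` keeps umlauts such as "Thüringen" readable in the output file.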
batch_scrape_conabip.sh add isil entries 2025-11-19 23:25:22 +01:00
bulgarian_isil_scraper.py add isil entries 2025-11-19 23:25:22 +01:00
consolidate_austrian_data.py add isil entries 2025-11-19 23:25:22 +01:00
create_german_unified_dataset.py add isil entries 2025-11-19 23:25:22 +01:00
crossreference_german_data.py add isil entries 2025-11-19 23:25:22 +01:00
geocode_json_harvest.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_archivportal_d.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_archivportal_d_api.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_ddb_institutions.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_german_isil.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_german_isil_sru.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_nrw_archives.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_nrw_archives_complete.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_nrw_archives_fast.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_swiss_isil.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_thueringen_archives.py add isil entries 2025-11-19 23:25:22 +01:00
harvest_thueringen_archives_comprehensive.py feat: Add comprehensive harvester for Thüringen archives 2025-11-20 00:25:45 +01:00
merge_archivportal_isil.py add isil entries 2025-11-19 23:25:22 +01:00
merge_nrw_to_german_dataset.py add isil entries 2025-11-19 23:25:22 +01:00
merge_thueringen_to_german_dataset.py add isil entries 2025-11-19 23:25:22 +01:00
parse_kb_netherlands_isil.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_agn_argentina.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_belarus_isil.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_belgian_isil.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_belgian_isil_detailed.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_belgian_isil_fast.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_canadian_isil.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_canadian_isil_fast.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_conabip_argentina.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_conabip_resume.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_czech_archives_aron.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_danish_archives_arkivdk.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_danish_archives_playwright.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_switzerland_isil.py add isil entries 2025-11-19 23:25:22 +01:00
scrape_switzerland_isil_resume.py add isil entries 2025-11-19 23:25:22 +01:00