glam/data/isil
kempersc 38354539a6 feat: Add comprehensive harvester for Thüringen archives
- Implemented a new script to extract full metadata from 149 archive detail pages on archive-in-thueringen.de.
- Extracted data includes addresses, emails, phones, directors, collection sizes, opening hours, histories, and more.
- Introduced structured data parsing and error handling for robust data extraction.
- Added rate limiting to respect server load and improve scraping efficiency.
- Results are saved in a JSON format with detailed metadata about the extraction process.
2025-11-20 00:25:45 +01:00
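The commit message above describes a harvester that visits detail pages with rate limiting and error handling, then saves the results as JSON together with metadata about the run. The harvester script itself is not shown on this page, so the following is only a minimal sketch of that pattern. The names `harvest`, `save_json`, `fetch`, and `delay_seconds`, and the layout of the output dictionary, are assumptions made for illustration, not the repository's actual code.

```python
import json
import time
from datetime import datetime, timezone
from typing import Callable, Iterable


def harvest(urls: Iterable[str],
            fetch: Callable[[str], dict],
            delay_seconds: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> dict:
    """Fetch each URL in turn, pausing between requests.

    `fetch` is a hypothetical callable that downloads and parses one
    detail page into a dict; it is injected so the loop can be tested
    without network access.
    """
    records, errors = [], []
    started_at = datetime.now(timezone.utc).isoformat()

    for index, url in enumerate(urls):
        if index > 0:
            # Rate limiting: wait before every request except the first.
            sleep(delay_seconds)
        try:
            records.append({"url": url, **fetch(url)})
        except Exception as exc:
            # Record the failure and keep going with the next page.
            errors.append({"url": url, "error": str(exc)})

    return {
        "metadata": {
            "started_at": started_at,
            "finished_at": datetime.now(timezone.utc).isoformat(),
            "total": len(records) + len(errors),
            "succeeded": len(records),
            "failed": len(errors),
        },
        "archives": records,
        "errors": errors,
    }


def save_json(result: dict, path: str) -> None:
    """Write the harvest result as UTF-8 JSON, keeping umlauts readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
```

Passing `ensure_ascii=False` keeps names such as "Thüringen" readable in the output file instead of escaping them, and collecting errors per URL lets a single failed page be retried later without repeating the whole run.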
AR add isil entries 2025-11-19 23:25:22 +01:00
austria add isil entries 2025-11-19 23:25:22 +01:00
belgium add isil entries 2025-11-19 23:25:22 +01:00
bosnia add isil entries 2025-11-19 23:25:22 +01:00
bulgaria add isil entries 2025-11-19 23:25:22 +01:00
BY add isil entries 2025-11-19 23:25:22 +01:00
czech_republic add isil entries 2025-11-19 23:25:22 +01:00
denmark add isil entries 2025-11-19 23:25:22 +01:00
EUR add isil entries 2025-11-19 23:25:22 +01:00
germany feat: Add comprehensive harvester for Thüringen archives 2025-11-20 00:25:45 +01:00
japan add isil entries 2025-11-19 23:25:22 +01:00
JP add isil entries 2025-11-19 23:25:22 +01:00
nl add isil entries 2025-11-19 23:25:22 +01:00
switzerland add isil entries 2025-11-19 23:25:22 +01:00
ARGENTINA_ENRICHMENT_COMPLETE.md add isil entries 2025-11-19 23:25:22 +01:00
argentina_enrichments.json add isil entries 2025-11-19 23:25:22 +01:00
argentina_wikidata_institutions.json add isil entries 2025-11-19 23:25:22 +01:00
austria_isil_scraped.csv add isil entries 2025-11-19 23:25:22 +01:00
austria_isil_scraped.json add isil entries 2025-11-19 23:25:22 +01:00
austria_isils_extracted.txt add isil entries 2025-11-19 23:25:22 +01:00
austria_search_page.html add isil entries 2025-11-19 23:25:22 +01:00
BELARUS_ENRICHMENT_SUMMARY.md add isil entries 2025-11-19 23:25:22 +01:00
belarus_enrichments.json add isil entries 2025-11-19 23:25:22 +01:00
BELARUS_FINAL_REPORT.md add isil entries 2025-11-19 23:25:22 +01:00
belarus_isil_complete_dataset.md add isil entries 2025-11-19 23:25:22 +01:00
BELARUS_NEXT_SESSION.md add isil entries 2025-11-19 23:25:22 +01:00
belarus_osm_libraries.json add isil entries 2025-11-19 23:25:22 +01:00
belgian_isil_combined.csv add isil entries 2025-11-19 23:25:22 +01:00
belgian_isil_combined.json add isil entries 2025-11-19 23:25:22 +01:00
belgian_isil_detailed.csv add isil entries 2025-11-19 23:25:22 +01:00
belgian_isil_detailed.json add isil entries 2025-11-19 23:25:22 +01:00
belgian_isil_kbr_libraries.csv add isil entries 2025-11-19 23:25:22 +01:00
belgian_isil_kbr_libraries.json add isil entries 2025-11-19 23:25:22 +01:00
bulgarian_isil_registry.csv add isil entries 2025-11-19 23:25:22 +01:00
bulgarian_isil_registry.json add isil entries 2025-11-19 23:25:22 +01:00
GLOBAL_ISIL_AGENCIES_OFFICIAL.md add isil entries 2025-11-19 23:25:22 +01:00
global_isil_agencies_raw.txt add isil entries 2025-11-19 23:25:22 +01:00
HARVEST_PROGRESS_SUMMARY.md add isil entries 2025-11-19 23:25:22 +01:00
KB_Netherlands_ISIL_2025-04-01.xlsx add isil entries 2025-11-19 23:25:22 +01:00
MASTER_HARVEST_PLAN.md add isil entries 2025-11-19 23:25:22 +01:00
metadata.json add isil entries 2025-11-19 23:25:22 +01:00
NETHERLANDS_ENRICHMENT_COMPLETE.md add isil entries 2025-11-19 23:25:22 +01:00
netherlands_enrichments.json add isil entries 2025-11-19 23:25:22 +01:00
netherlands_wikidata_institutions.json add isil entries 2025-11-19 23:25:22 +01:00
SCRAPER_INVENTORY.md add isil entries 2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_HARVEST_CONTINUATION.md add isil entries 2025-11-19 23:25:22 +01:00
WHAT_WE_DID_TODAY.md add isil entries 2025-11-19 23:25:22 +01:00