glam/data
kempersc 38354539a6 feat: Add comprehensive harvester for Thüringen archives
- Implemented a new script to extract full metadata from 149 archive detail pages on archive-in-thueringen.de.
- Extracted data includes addresses, emails, phones, directors, collection sizes, opening hours, histories, and more.
- Introduced structured data parsing and error handling for robust data extraction.
- Added rate limiting to respect server load and improve scraping efficiency.
- Results are saved in a JSON format with detailed metadata about the extraction process.
2025-11-20 00:25:45 +01:00
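
The harvester script itself is not shown in this listing. As a rough illustration of what the commit message describes (rate-limited fetching, per-page error handling, JSON output with metadata about the run), a minimal Python sketch might look like the following. All names here (`harvest`, `save_results`, `delay_seconds`, the `fetch` and `parse` callables, and the output keys) are hypothetical and not taken from the repository.

```python
import json
import time
from datetime import datetime, timezone
from typing import Callable, Iterable


def harvest(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], dict],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Fetch and parse each URL with a pause between requests.

    Hypothetical sketch: `fetch` returns page text, `parse` turns it into a
    dict of fields. Failures are recorded per page instead of aborting the run.
    """
    records, errors = [], []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # rate limit: wait before every request after the first
        try:
            records.append({"url": url, **parse(fetch(url))})
        except Exception as exc:  # one bad page should not stop the whole run
            errors.append({"url": url, "error": str(exc)})
    return {
        "metadata": {
            "harvested_at": datetime.now(timezone.utc).isoformat(),
            "total_pages": len(records) + len(errors),
            "successful": len(records),
            "failed": len(errors),
        },
        "records": records,
        "errors": errors,
    }


def save_results(results: dict, path: str) -> None:
    """Write results as UTF-8 JSON, keeping characters such as 'ü' unescaped."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, ensure_ascii=False, indent=2)
```

Passing `fetch` and `sleep` in as parameters is a design choice for this sketch only: it lets the loop be exercised without network access or real delays.
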
examples add isil entries 2025-11-19 23:25:22 +01:00
instances add isil entries 2025-11-19 23:25:22 +01:00
isil feat: Add comprehensive harvester for Thüringen archives 2025-11-20 00:25:45 +01:00
jsonld add isil entries 2025-11-19 23:25:22 +01:00
manual_enrichment add isil entries 2025-11-19 23:25:22 +01:00
nde add isil entries 2025-11-19 23:25:22 +01:00
ontology add isil entries 2025-11-19 23:25:22 +01:00
raw add isil entries 2025-11-19 23:25:22 +01:00
rdf add isil entries 2025-11-19 23:25:22 +01:00
reference add isil entries 2025-11-19 23:25:22 +01:00
review add isil entries 2025-11-19 23:25:22 +01:00
wikidata add isil entries 2025-11-19 23:25:22 +01:00
collision_edge_case_analysis.md add isil entries 2025-11-19 23:25:22 +01:00
deduplication_improvement_summary.md add isil entries 2025-11-19 23:25:22 +01:00
dutch_collision_report.txt add isil entries 2025-11-19 23:25:22 +01:00
dutch_collision_stats.json add isil entries 2025-11-19 23:25:22 +01:00
dutch_deduplication_report.txt add isil entries 2025-11-19 23:25:22 +01:00
dutch_institutions_with_ghcids.yaml add isil entries 2025-11-19 23:25:22 +01:00
ISIL-codes_2025-08-01.csv add isil entries 2025-11-19 23:25:22 +01:00
mexican_geography_analysis.yaml add isil entries 2025-11-19 23:25:22 +01:00
temp_conv1_artifact2.md add isil entries 2025-11-19 23:25:22 +01:00
temp_conv2_artifact1.md add isil entries 2025-11-19 23:25:22 +01:00
temp_mexican_conv1.json add isil entries 2025-11-19 23:25:22 +01:00
temp_mexican_conv2.json add isil entries 2025-11-19 23:25:22 +01:00