GLAM heritage institution data extraction and management

kempersc 55e2cd2340 feat: implement LLM-based extraction for Archives Lab content	2025-12-05 23:16:21 +01:00
- Introduced `llm_extract_archiveslab.py` script for entity and relationship extraction using LLMAnnotator with GLAM-NER v1.7.0.
- Replaced regex-based extraction with generative LLM inference.
- Added functions for loading markdown content, converting annotation sessions to dictionaries, and generating extraction statistics.
- Implemented comprehensive logging of extraction results, including counts of entities, relationships, and specific types like heritage institutions and persons.
- Results and statistics are saved in JSON format for further analysis.
.github/workflows	update entries	2025-11-30 23:30:29 +01:00
.opencode	update enriched entries	2025-12-03 17:38:46 +01:00
archive	annotation standards added	2025-12-05 15:30:23 +01:00
data	feat: implement LLM-based extraction for Archives Lab content	2025-12-05 23:16:21 +01:00
docs	improve annotator	2025-12-05 16:25:39 +01:00
exa-mcp-server-source@4aeb0543f9	enrich entries	2025-12-01 00:37:24 +01:00
examples	add isil entries	2025-11-19 23:25:22 +01:00
frontend	feat: implement LLM-based extraction for Archives Lab content	2025-12-05 23:16:21 +01:00
infrastructure	validate enrichments	2025-12-02 14:36:01 +01:00
mcp-wikidata@230e0456d2	add isil entries	2025-11-19 23:25:22 +01:00
mcp_servers	feat: Add script to generate Mermaid ER diagrams with instance data from LinkML schemas	2025-12-01 16:58:03 +01:00
ontology	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
package	add isil entries	2025-11-19 23:25:22 +01:00
reports	improve annotation prompt	2025-12-05 15:51:39 +01:00
schemas	annotation standards added	2025-12-05 15:30:23 +01:00
scripts	feat: implement LLM-based extraction for Archives Lab content	2025-12-05 23:16:21 +01:00
src/glam_extractor	improve annotator	2025-12-05 16:25:39 +01:00
tests	improve annotator	2025-12-05 16:25:39 +01:00
.gitignore	update enriched entries	2025-12-03 17:38:46 +01:00
.ignore	add pids	2025-12-01 23:55:55 +01:00
ADVANCED_LAYOUT_OPTIONS_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
AGENTS.md	update enriched entries	2025-12-03 17:38:46 +01:00
analyze_brazil_batch13_candidates.py	add isil entries	2025-11-19 23:25:22 +01:00
APPELLATION_IDENTIFIER_REFACTORING_20251122.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
APPELLATION_REFACTORING_PHASE2_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
archive_log.txt	update entries	2025-11-30 23:30:29 +01:00
AUSTRIAN_ISIL_DEDUPLICATION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
AUSTRIAN_ISIL_QUICK_START.md	add isil entries	2025-11-19 23:25:22 +01:00
AUSTRIAN_ISIL_SESSION_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
AUSTRIAN_ISIL_SESSION_COMPLETE_BATCH1.md	add isil entries	2025-11-19 23:25:22 +01:00
AUSTRIAN_ISIL_SESSION_CONTINUED_20251118.md	add isil entries	2025-11-19 23:25:22 +01:00
AUSTRIAN_ISIL_SESSION_HANDOFF_20251118.md	add isil entries	2025-11-19 23:25:22 +01:00
AUSTRIAN_ISIL_SESSION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
AUXILIARY_CLASSES_COMPLETE.md	feat: Add platform_description slot for detailed auxiliary platform information	2025-11-26 10:18:16 +01:00
BATCH12_ENRICHMENT_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
BATCH13_ENRICHMENT_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
BATCH14_ENRICHMENT_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
BEFORE_AFTER_MERMAID_COMPARISON.md	add classes	2025-11-25 12:48:07 +01:00
BELGIAN_ISIL_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
BRAZILIAN_CURATION_SESSION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
BULGARIAN_ISIL_EXTRACTION_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
CANADIAN_ENRICHMENT_GUIDE.md	add isil entries	2025-11-19 23:25:22 +01:00
CANADIAN_GEOCODING_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
CANADIAN_INTEGRATION_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
CANADIAN_ISIL_SUCCESS.md	add isil entries	2025-11-19 23:25:22 +01:00
CHANGES_SUMMARY_20251122.txt	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
check_geocoding_progress.py	add isil entries	2025-11-19 23:25:22 +01:00
check_scraper_status.sh	add isil entries	2025-11-19 23:25:22 +01:00
CHILEAN_BATCH1_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
CLEANUP_MERMAID_FILES.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
COLLECTION_DEPARTMENT_INTEGRATION_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
compare_dutch_datasets.py	add isil entries	2025-11-19 23:25:22 +01:00
COMPLETE_SCHEMA_DIAGRAM_SESSION_SUMMARY.md	add classes	2025-11-25 12:48:07 +01:00
COMPLETE_SCHEMA_MERMAID_GENERATION.md	add classes	2025-11-25 12:48:07 +01:00
COMPLETE_SESSION_OVERVIEW_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
CONTRIBUTING.md	add isil entries	2025-11-19 23:25:22 +01:00
convert_canadian_to_linkml.py	add isil entries	2025-11-19 23:25:22 +01:00
COUNTRY_CLASS_IMPLEMENTATION_COMPLETE.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
COUNTRY_RESTRICTION_IMPLEMENTATION.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
COUNTRY_RESTRICTION_QUICKSTART.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
CRITICAL_ARCHITECTURAL_FIX_PROV.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
CRITICAL_FIX_TYPED_RANGES.md	updated schemata	2025-11-21 22:12:33 +01:00
crosslink_dutch_datasets.py	add isil entries	2025-11-19 23:25:22 +01:00
curate_brazilian_institutions.py	add isil entries	2025-11-19 23:25:22 +01:00
curate_chilean_institutions.md	add isil entries	2025-11-19 23:25:22 +01:00
CURATION_STATUS.md	add isil entries	2025-11-19 23:25:22 +01:00
CUSTODIAN_COLLECTION_ADDITION_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
CUSTODIAN_MULTI_ASPECT_REFACTORING.md	feat: Refactor Heritage Custodian Ontology to Multi-Aspect Model	2025-11-22 15:40:17 +01:00
CUSTODIAN_TYPE_ONTOLOGY_ALIGNMENT.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_ONTOLOGY_ALIGNMENT_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_PROGRESS_20251123.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_SESSION3_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_SESSION4_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_SESSION5_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_SESSION5_EXTENDED_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CUSTODIAN_TYPE_PHASE2_SESSION_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
CZECH_ARCHIVES_INVESTIGATION.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_ARCHIVES_NEXT_ACTIONS.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_ARON_API_INVESTIGATION.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_CROSSLINK_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_ISIL_COMPLETE_REPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_ISIL_HARVEST_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_ISIL_NEXT_STEPS.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_ISIL_WIKIDATA_EXTRACTION.md	updated schemata	2025-11-21 22:12:33 +01:00
CZECH_PRIORITY1_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
CZECH_WIKIDATA_ENRICHMENT_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
D3JS_UML_VISUALIZATION_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
DAGRE_GRID_LAYOUT_IMPLEMENTATION.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
DAGRE_RANKER_EXPLAINED.md	add classes	2025-11-25 12:48:07 +01:00
deduplicate_brazilian_institutions.py	Deduplicate Brazilian institutions (212→121)	2025-11-11 22:08:34 +01:00
DENMARK_QUICK_REFERENCE.md	add isil entries	2025-11-19 23:25:22 +01:00
DIGITAL_PLATFORM_CLASS_COMPLETE.md	add classes	2025-11-25 12:48:07 +01:00
DIGITAL_PLATFORM_CLASS_COMPLETE_v1.md	add classes	2025-11-25 12:48:07 +01:00
EDGE_DIRECTIONALITY_IMPLEMENTATION.md	add classes	2025-11-25 12:48:07 +01:00
EDGE_DIRECTIONALITY_QUICK_GUIDE.md	add classes	2025-11-25 12:48:07 +01:00
EDGE_DIRECTIONALITY_SESSION_COMPLETE.md	add classes	2025-11-25 12:48:07 +01:00
EDGE_TESTING_MCP_ANALYSIS.md	add classes	2025-11-25 12:48:07 +01:00
EDGE_TESTING_MCP_ANALYSIS_SUMMARY.md	add classes	2025-11-25 12:48:07 +01:00
ENCOMPASSING_BODY_FIXES_COMPLETE.md	add classes	2025-11-25 12:48:07 +01:00
ENCOMPASSING_BODY_IMPLEMENTATION_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
ENCOMPASSING_BODY_INTEGRATION_STATUS.md	add classes	2025-11-25 12:48:07 +01:00
ENCOMPASSING_BODY_RDF_UML_GENERATION.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
enrich_brazil_batch11.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_brazil_batch12.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_brazil_batch13.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_brazil_batch17.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_bulgaria_isil.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_geocoding.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_japan_fast.py	add isil entries	2025-11-19 23:25:22 +01:00
enrich_japan_isil.py	add isil entries	2025-11-19 23:25:22 +01:00
enrichment_force_log.txt	add pids	2025-12-01 23:55:55 +01:00
enrichment_log.txt	Add new enums for PersonalCollectionType, ResearchCenterType, and TasteScentHeritage classifications; implement validation script for custodian names against authoritative sources	2025-12-01 18:39:22 +01:00
enrichment_log_fixed.txt	annotation standards added	2025-12-05 15:30:23 +01:00
EXA_BUG_FIX.md	add isil entries	2025-11-19 23:25:22 +01:00
EXECUTIVE_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
EXECUTIVE_SUMMARY_UML_EDGE_DIRECTIONALITY.md	add classes	2025-11-25 12:48:07 +01:00
export_bulgaria_rdf.py	add isil entries	2025-11-19 23:25:22 +01:00
EXPORT_FUNCTIONALITY_IMPLEMENTATION.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
extract_brazilian_institutions.py	add isil entries	2025-11-19 23:25:22 +01:00
extract_brazilian_institutions_v2.py	add isil entries	2025-11-19 23:25:22 +01:00
extract_conversations_batch.py	add isil entries	2025-11-19 23:25:22 +01:00
extract_mexican_glams.py	add isil entries	2025-11-19 23:25:22 +01:00
extract_mexican_glams_v2.py	add isil entries	2025-11-19 23:25:22 +01:00
extraction_log.txt	validate enrichments	2025-12-02 14:36:01 +01:00
extraction_log_session3.txt	validate enrichments	2025-12-02 14:36:01 +01:00
FEATUREPLACE_IMPLEMENTATION_COMPLETE.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
FEATUREPLACE_ONTOLOGY_MAPPING_COMPLETE.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
FEATUREPLACE_ONTOLOGY_MAPPING_STRATEGY.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
FINAL_CLARIFICATION_MERMAID_OUTPUTS.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
FINAL_SESSION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
find_brazil_bonus.py	add isil entries	2025-11-19 23:25:22 +01:00
find_brazil_institutions.py	add isil entries	2025-11-19 23:25:22 +01:00
fix_heritage_linked_pubs.py	add isil entries	2025-11-19 23:25:22 +01:00
FOUR_ASPECT_ARCHITECTURE_QUICK_REF.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
generate_comparison_report.py	add isil entries	2025-11-19 23:25:22 +01:00
generate_geocoding_report.py	add isil entries	2025-11-19 23:25:22 +01:00
GEOCODING_SESSION_2025-11-07.md	add isil entries	2025-11-19 23:25:22 +01:00
GEOCODING_SESSION_2025-11-07_RESUMED.md	add isil entries	2025-11-19 23:25:22 +01:00
GEOGRAPHIC_RESTRICTION_COMPLETE.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
GEOGRAPHIC_RESTRICTION_QUICK_STATUS.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
GEOGRAPHIC_RESTRICTION_SESSION_COMPLETE.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
GERMAN_HARVEST_STATUS.md	updated schemata	2025-11-21 22:12:33 +01:00
GERMAN_REGIONAL_ARCHIVE_PORTALS_DISCOVERY.md	add isil entries	2025-11-19 23:25:22 +01:00
GERMAN_STATE_EXTRACTION_PATTERN.md	updated schemata	2025-11-21 22:12:33 +01:00
HUB_ARCHITECTURE_DIAGRAM.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
HYPERNYMS_REMOVAL_COMPLETE.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
IMPLEMENTATION_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
ISIL_HARVEST_STATUS_20251119.md	add isil entries	2025-11-19 23:25:22 +01:00
JAPAN_WIKIDATA_ENRICHMENT_STRATEGY.md	updated schemata	2025-11-21 22:12:33 +01:00
LAYOUT_OPTIONS_QUICK_REFERENCE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
LEGAL_RESPONSIBILITY_COLLECTION_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
LEGAL_RESPONSIBILITY_COLLECTION_QUICKSTART.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
LIBYA_ENRICHMENT_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
LIBYA_WIKIDATA_CLEANUP_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
LIBYA_WIKIDATA_CREATION_STATUS.md	add isil entries	2025-11-19 23:25:22 +01:00
LIBYA_WIKIDATA_ENRICHMENT_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
LICENSE	add isil entries	2025-11-19 23:25:22 +01:00
LINKML_CONSTRAINTS_COMPLETE_20251122.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
LINKML_VISUALIZATION_SESSION_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
MAIN_SCHEMA_RDF_GENERATION_COMPLETE.md	add classes	2025-11-25 12:48:07 +01:00
MANUAL_TESTING_RESULTS.md	add classes	2025-11-25 12:48:07 +01:00
merge_batch13_corrected.py	add isil entries	2025-11-19 23:25:22 +01:00
merge_batch14.py	add isil entries	2025-11-19 23:25:22 +01:00
merge_batch15.py	add isil entries	2025-11-19 23:25:22 +01:00
merge_brazil_batch13.py	add isil entries	2025-11-19 23:25:22 +01:00
MERMAID_GENERATORS_EXPLAINED.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
mexican_glam_1.json	add isil entries	2025-11-19 23:25:22 +01:00
mexican_glam_2.json	add isil entries	2025-11-19 23:25:22 +01:00
mexican_glam_extracted.json	add isil entries	2025-11-19 23:25:22 +01:00
MIGRATION_CHECKLIST_ISO20275.md	updated schemata	2025-11-21 22:12:33 +01:00
MIGRATION_COMPLETED_v0.2.2.md	add isil entries	2025-11-19 23:25:22 +01:00
MNEMONIC_CORRECTION.md	add isil entries	2025-11-19 23:25:22 +01:00
NEXT_AGENT_HANDOFF_NRW_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
NEXT_AGENT_HANDOFF_SAXONY_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
NEXT_SESSION_HANDOFF.md	updated schemata	2025-11-21 22:12:33 +01:00
NEXT_STEPS.md	add isil entries	2025-11-19 23:25:22 +01:00
NEXT_STEPS_Mexican_Geocoding.md	add isil entries	2025-11-19 23:25:22 +01:00
NRW_HARVEST_COMPLETE_20251119.md	add isil entries	2025-11-19 23:25:22 +01:00
ONTOLOGY_CONSULTATION_REPORT_CUSTODIAN_TYPE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
ONTOLOGY_ENRICHMENT_PLAN.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
ONTOLOGY_RULES_SUMMARY.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
ORGANIZATIONAL_CHANGE_EVENT_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
ORGANIZATIONAL_STRUCTURE_ADDITION_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
ORGANIZATIONAL_STRUCTURE_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
ORGANIZATIONAL_STRUCTURE_EXAMPLES.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
osm_resume_log.txt	add isil entries	2025-11-19 23:25:22 +01:00
parse_eu_isil.py	add isil entries	2025-11-19 23:25:22 +01:00
parse_japan_isil.py	add isil entries	2025-11-19 23:25:22 +01:00
PHASE1_QUICK_WINS_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
PICO_STAFF_ROLES_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
process_chilean_institutions.py	add isil entries	2025-11-19 23:25:22 +01:00
process_mexican_institutions.py	add isil entries	2025-11-19 23:25:22 +01:00
PROGRESS.md	update entries	2025-11-30 23:30:29 +01:00
pyproject.toml	annotation standards added	2025-12-05 15:30:23 +01:00
QUERY_BUILDER_LAYOUT_FIX.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
QUICK_ACTION_PLAN_GERMAN_REGIONAL_HARVESTS.md	add isil entries	2025-11-19 23:25:22 +01:00
QUICK_ACTION_PLAN_UML_TESTING.md	add classes	2025-11-25 12:48:07 +01:00
QUICK_REFERENCE_SESSION_COMPLETE.md	add classes	2025-11-25 12:48:07 +01:00
QUICK_REFERENCE_VALIDATION.md	updated schemata	2025-11-21 22:12:33 +01:00
QUICK_START_AUSTRALIA.md	add isil entries	2025-11-19 23:25:22 +01:00
QUICK_START_DAGRE_TESTING.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
QUICK_STATUS_20251119.md	add isil entries	2025-11-19 23:25:22 +01:00
QUICK_STATUS_20251119_POST_NRW.md	add isil entries	2025-11-19 23:25:22 +01:00
QUICK_STATUS_APPELLATION_IDENTIFIER_COMPLETE.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
QUICK_STATUS_BAVARIA_DECISION.md	updated schemata	2025-11-21 22:12:33 +01:00
QUICK_STATUS_COMPLETE_MERMAID_GENERATION.md	add classes	2025-11-25 12:48:07 +01:00
QUICK_STATUS_COUNTRY_CLASS_20251122.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
QUICK_STATUS_CUSTODIAN_SCHEMA_20251121.md	updated schemata	2025-11-21 22:12:33 +01:00
QUICK_STATUS_CUSTODIAN_SCHEMA_MOD	updated schemata	2025-11-21 22:12:33 +01:00
QUICK_STATUS_CUSTODIAN_SCHEMA_MOD_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
QUICK_STATUS_CUSTODIAN_TYPE_20251123.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
QUICK_STATUS_EDGE_TESTING.md	add classes	2025-11-25 12:48:07 +01:00
QUICK_STATUS_EXPORT_COMPLETE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
QUICK_STATUS_FEATUREPLACE_COMPLETE.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
QUICK_STATUS_HYPERNYMS_REMOVAL_20251122.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
QUICK_STATUS_LEGAL_ENTITY_20251122.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
QUICK_STATUS_MAIN_SCHEMA_RDF_20251124.md	add classes	2025-11-25 12:48:07 +01:00
QUICK_STATUS_ORGANIZATIONAL_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
QUICK_STATUS_ORGANIZATIONAL_STRUCTURE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
QUICK_STATUS_SCHEMA_MODULARIZATION_DONE_20251121.md	updated schemata	2025-11-21 22:12:33 +01:00
QUICK_STATUS_SLOT_USAGE_COMPLETE_20251121.md	updated schemata	2025-11-21 22:12:33 +01:00
QUICK_STATUS_TOOIONT_20251121.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
QUICK_STATUS_UML_GENERATION_20251123.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
RDF_GENERATION_SUMMARY.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
RDF_UML_GENERATION_COMPLETE_20251122_155319.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
RDF_UML_GENERATION_COMPLETE_20251122_old.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
README.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
RECORD_COMPARISON.md	add isil entries	2025-11-19 23:25:22 +01:00
RESUME_CHILEAN_ENRICHMENT.md	add isil entries	2025-11-19 23:25:22 +01:00
run_scraper_background.sh	add isil entries	2025-11-19 23:25:22 +01:00
RUNNING_THE_APPLICATION.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
SACHSEN_ANHALT_96_PERCENT_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
sachsen_anhalt_100percent_log.txt	updated schemata	2025-11-21 22:12:33 +01:00
SACHSEN_ANHALT_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
sachsen_anhalt_enrichment_v2_log.txt	updated schemata	2025-11-21 22:12:33 +01:00
SAXONY_HARVEST_STRATEGY.md	updated schemata	2025-11-21 22:12:33 +01:00
SCHEMA_AUTHORITY_CHECKLIST.md	updated schemata	2025-11-21 22:12:33 +01:00
SCRAPER_COMPLETION_INSTRUCTIONS.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION-RESUME.md	add isil entries	2025-11-19 23:25:22 +01:00
session-ses_52a6.md	update entries	2025-11-30 23:30:29 +01:00
session-ses_52ff.md	archive websites	2025-11-29 18:05:16 +01:00
SESSION_2025-11-09_SCHEMA_ONTOLOGY_UPDATE.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_COMPLETE.md	update entries	2025-11-30 23:30:29 +01:00
SESSION_COMPLETE_20251122_APPELLATION_PHASE2.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
SESSION_COMPLETE_20251122_COLLECTION.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
SESSION_COMPLETE_ARGENTINA_ENRICHMENT.txt	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_COMPLETE_COMPLETE_MERMAID_EXTENSION.md	add classes	2025-11-25 12:48:07 +01:00
SESSION_COMPLETE_ENCOMPASSING_BODY.md	add classes	2025-11-25 12:48:07 +01:00
SESSION_COMPLETE_ENCOMPASSING_BODY_MAIN_SCHEMA.md	add classes	2025-11-25 12:48:07 +01:00
SESSION_COMPLETION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_CONTINUATION_SUMMARY_20251119.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_2025-11-05.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_2025-11-05_batch_processing.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_2025-11-06_Chilean_Geocoding.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_2025-11-07.md	update entries	2025-11-30 23:30:29 +01:00
SESSION_SUMMARY_2025-11-08.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_2025-11-08_LATAM.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_2025-11-09.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251111_BRAZIL_MERGE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251112_BRAZIL_DOCUMENTATION.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251113_MEXICO_BATCH2.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251113_MEXICO_RECONCILIATION.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251118_ARGENTINA_LINKML_EXPORT.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251118_AUSTRALIA_TROVE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251118_ISIL_PROCESSING.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_ARCHIVES_DISCOVERY.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_AUSTRIAN_CONSOLIDATION.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_AUTOMATED_SPOT_CHECKS.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251119_CANADIAN_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_CZECH_ARCHIVES_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_CZECH_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_CZECH_WIKIDATA_ENRICHMENT_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251119_DDB_HARVEST_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_DENMARK_ARCHIVES_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_DENMARK_COMPLETE.md	update entries	2025-11-30 23:30:29 +01:00
SESSION_SUMMARY_20251119_DENMARK_ISIL_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_NRW_MERGE_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_PREFILL_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251119_PRIORITY1_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_RDF_WIKIDATA_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_UNIFICATION_COMPLETE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251119_WIKIDATA_VALIDATION_PACKAGE.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_20251120_BAVARIA_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_BAVARIA_ENRICHMENT.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_BAVARIA_ENRICHMENT_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_FINLAND_UNIFIED.md	update entries	2025-11-30 23:30:29 +01:00
SESSION_SUMMARY_20251120_JAPAN_SYNTHETIC_QNUMBER_CLEANUP.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_JAPAN_WIKIDATA_ENRICHMENT_COMPLETION.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_PHASE2_CRITICAL_FIXES.md	update entries	2025-11-30 23:30:29 +01:00
SESSION_SUMMARY_20251120_SACHSEN_ANHALT_STARTED.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_SACHSEN_ARCHIVES.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_SAXONY_FOUNDATION.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_SAXONY_MUSEUMS_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251120_THUERINGEN_100_PERCENT.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_CUSTODIAN_RENAMING.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251121_DBPEDIA_INTEGRATION_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_ENUM_SLOT_USAGE_MAPPINGS.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_ISO20275_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_LINKML_HUB_ARCHITECTURE_COMPLETE.md	Add UML diagrams for Custodian Hub v2 in Mermaid and PlantUML formats	2025-11-21 22:30:07 +01:00
SESSION_SUMMARY_20251121_NAME_ENTITY_FOUNDATION_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_NARROW_MAPPINGS_EXTENSION.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_OBSERVATION_RECONSTRUCTION_CONTINUATION.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_OBSERVATION_RECONSTRUCTION_PATTERN.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251121_PLANTUML_BUG_FIX.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_SCHEMA_AUTHORITY_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_SCHEMA_CONSOLIDATION.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_SCHEMA_METADATA_REFINEMENT.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251121_SCHEMA_MODULARIZATION_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_SLOT_URI_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_SLOT_USAGE_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_TIMESPAN_INTEGRATION.md	updated schemata	2025-11-21 22:12:33 +01:00
SESSION_SUMMARY_20251121_TOOIONT_INTEGRATION.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251122_APPELLATION_IDENTIFIER_REFACTORING.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
SESSION_SUMMARY_20251122_CUSTODIAN_MULTI_ASPECT.md	feat: Refactor Heritage Custodian Ontology to Multi-Aspect Model	2025-11-22 15:40:17 +01:00
SESSION_SUMMARY_20251122_LEGAL_ENTITY_IMPLEMENTATION.md	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
SESSION_SUMMARY_20251122_LEGAL_ENTITY_REFACTORING.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251122_LEGAL_ENTITY_REFACTORING_COMPLETE.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251122_SOURCEDOCUMENT_ONTOLOGY_ENRICHMENT.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
SESSION_SUMMARY_20251123.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
SESSION_SUMMARY_20251125_UML_EDGE_DIRECTIONALITY.md	add classes	2025-11-25 12:48:07 +01:00
SESSION_SUMMARY_ARGENTINA_CONABIP.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_ARGENTINA_Z3950_INVESTIGATION.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_BATCH7.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_COLLECTION_DEPT_PHASE4_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
SESSION_SUMMARY_DAGRE_IMPLEMENTATION.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
SESSION_SUMMARY_LINKML_PHASE8_20251122.md	feat: Complete Country Class Implementation and Hypernyms Removal	2025-11-23 13:09:38 +01:00
SESSION_SUMMARY_NETHERLANDS_ARGENTINA.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_NOV7_DUTCH_VALIDATION.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_ORGANIZATIONAL_MODELING_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
SESSION_SUMMARY_PICO_PHASE3_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
SESSION_SUMMARY_RDF_PARTNERSHIPS.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_SHACL_PHASE7_20251122.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
SESSION_SUMMARY_SPARQL_PHASE6_20251122.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
SESSION_SUMMARY_SWITZERLAND_ISIL.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_v3_geocoding.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_V5.md	add isil entries	2025-11-19 23:25:22 +01:00
SESSION_SUMMARY_VALIDATION_PHASE5_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
SHACL_SHAPES_COMPLETE_20251122.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
SPARQL_QUERY_LIBRARY_COMPLETE_20251122.md	Add SHACL validation shapes and validation script for Heritage Custodian Ontology	2025-11-22 23:22:10 +01:00
TASTE_SMELL_CLASS_ADDITION.md	add isil entries	2025-11-19 23:25:22 +01:00
TAXONOMY_UPDATE_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
test-edge-directionality.sh	add classes	2025-11-25 12:48:07 +01:00
test_canadian_parser.py	add isil entries	2025-11-19 23:25:22 +01:00
TEST_EDGE_DIRECTIONALITY.md	add classes	2025-11-25 12:48:07 +01:00
test_real_dutch_orgs.py	add isil entries	2025-11-19 23:25:22 +01:00
test_real_isil.py	add isil entries	2025-11-19 23:25:22 +01:00
TESTING_SUMMARY.md	add classes	2025-11-25 12:48:07 +01:00
THUERINGEN_100_PERCENT_EXTRACTION_ACHIEVED.md	updated schemata	2025-11-21 22:12:33 +01:00
THUERINGEN_COMPREHENSIVE_HARVEST_SESSION_20251120.md	updated schemata	2025-11-21 22:12:33 +01:00
THUERINGEN_HARVEST_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
THUERINGEN_V4_ENRICHMENT_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
THUERINGEN_V4_MERGE_COMPLETE.md	updated schemata	2025-11-21 22:12:33 +01:00
UML_GENERATION_COMPLETE.md	add classes	2025-11-25 12:48:07 +01:00
UML_GENERATION_COMPLETE_20251123.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
UML_GENERATION_LINKML_AUTO.md	add classes	2025-11-25 12:48:07 +01:00
UML_VIEWER_FIX.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
UML_VIEWER_VS_MERMAID_ANALYSIS.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00
UNIFICATION_SUMMARY.md	add isil entries	2025-11-19 23:25:22 +01:00
V5_QUICK_REFERENCE.md	add isil entries	2025-11-19 23:25:22 +01:00
validate_curated.py	add isil entries	2025-11-19 23:25:22 +01:00
validate_instances.py	add isil entries	2025-11-19 23:25:22 +01:00
VALIDATION_FRAMEWORK_COMPLETE_20251122.md	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams	2025-11-22 23:01:13 +01:00
validation_output.txt	add isil entries	2025-11-19 23:25:22 +01:00
VERIFICATION_CHECKLIST_20251122.md	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats	2025-11-22 14:33:51 +01:00
verify_batch13_ids.py	add isil entries	2025-11-19 23:25:22 +01:00
wget-log	archive websites	2025-11-29 18:05:16 +01:00
wget-log.1	archive websites	2025-11-29 18:05:16 +01:00
wget-log.2	archive websites	2025-11-29 18:05:16 +01:00
wget-log.3	archive websites	2025-11-29 18:05:16 +01:00
wget-log.4	archive websites	2025-11-29 18:05:16 +01:00
wget-log.5	archive websites	2025-11-29 18:05:16 +01:00
wget-log.6	archive websites	2025-11-29 18:05:16 +01:00
wget-log.7	archive websites	2025-11-29 18:05:16 +01:00
wget-log.8	archive websites	2025-11-29 18:05:16 +01:00
wget-log.9	archive websites	2025-11-29 18:05:16 +01:00
wget-log.10	archive websites	2025-11-29 18:05:16 +01:00
wget-log.11	archive websites	2025-11-29 18:05:16 +01:00
wget-log.12	archive websites	2025-11-29 18:05:16 +01:00
wget-log.13	archive websites	2025-11-29 18:05:16 +01:00
wget-log.14	archive websites	2025-11-29 18:05:16 +01:00
wget-log.15	archive websites	2025-11-29 18:05:16 +01:00
wget-log.16	archive websites	2025-11-29 18:05:16 +01:00
wget-log.17	archive websites	2025-11-29 18:05:16 +01:00
wget-log.18	archive websites	2025-11-29 18:05:16 +01:00
wget-log.19	archive websites	2025-11-29 18:05:16 +01:00
wget-log.20	archive websites	2025-11-29 18:05:16 +01:00
wget-log.21	archive websites	2025-11-29 18:05:16 +01:00
wget-log.22	archive websites	2025-11-29 18:05:16 +01:00
wget-log.23	archive websites	2025-11-29 18:05:16 +01:00
wget-log.24	archive websites	2025-11-29 18:05:16 +01:00
wget-log.25	archive websites	2025-11-29 18:05:16 +01:00
wget-log.26	archive websites	2025-11-29 18:05:16 +01:00
wget-log.27	archive websites	2025-11-29 18:05:16 +01:00
wget-log.28	archive websites	2025-11-29 18:05:16 +01:00
wget-log.29	archive websites	2025-11-29 18:05:16 +01:00
wget-log.30	archive websites	2025-11-29 18:05:16 +01:00
wget-log.31	archive websites	2025-11-29 18:05:16 +01:00
wget-log.32	archive websites	2025-11-29 18:05:16 +01:00
wget-log.33	archive websites	2025-11-29 18:05:16 +01:00
wget-log.34	archive websites	2025-11-29 18:05:16 +01:00
wget-log.35	archive websites	2025-11-29 18:05:16 +01:00
wget-log.36	archive websites	2025-11-29 18:05:16 +01:00
wget-log.37	archive websites	2025-11-29 18:05:16 +01:00
wget-log.38	archive websites	2025-11-29 18:05:16 +01:00
wget-log.39	archive websites	2025-11-29 18:05:16 +01:00
wget-log.40	archive websites	2025-11-29 18:05:16 +01:00
wget-log.41	archive websites	2025-11-29 18:05:16 +01:00
wget-log.42	archive websites	2025-11-29 18:05:16 +01:00
wget-log.43	archive websites	2025-11-29 18:05:16 +01:00
wget-log.44	archive websites	2025-11-29 18:05:16 +01:00
wget-log.45	archive websites	2025-11-29 18:05:16 +01:00
wget-log.46	archive websites	2025-11-29 18:05:16 +01:00
wget-log.47	archive websites	2025-11-29 18:05:16 +01:00
wget-log.48	archive websites	2025-11-29 18:05:16 +01:00
wget-log.49	archive websites	2025-11-29 18:05:16 +01:00
wget-log.50	archive websites	2025-11-29 18:05:16 +01:00
wget-log.51	archive websites	2025-11-29 18:05:16 +01:00
wget-log.52	archive websites	2025-11-29 18:05:16 +01:00
wget-log.53	archive websites	2025-11-29 18:05:16 +01:00
wget-log.54	archive websites	2025-11-29 18:05:16 +01:00
wget-log.55	archive websites	2025-11-29 18:05:16 +01:00
wget-log.56	archive websites	2025-11-29 18:05:16 +01:00
wget-log.57	archive websites	2025-11-29 18:05:16 +01:00
wget-log.58	archive websites	2025-11-29 18:05:16 +01:00
wget-log.59	archive websites	2025-11-29 18:05:16 +01:00
wget-log.60	archive websites	2025-11-29 18:05:16 +01:00
wget-log.61	archive websites	2025-11-29 18:05:16 +01:00
wget-log.62	archive websites	2025-11-29 18:05:16 +01:00
wget-log.63	archive websites	2025-11-29 18:05:16 +01:00
wget-log.64	archive websites	2025-11-29 18:05:16 +01:00
wget-log.65	archive websites	2025-11-29 18:05:16 +01:00
wget-log.66	archive websites	2025-11-29 18:05:16 +01:00
wget-log.67	archive websites	2025-11-29 18:05:16 +01:00
wget-log.68	archive websites	2025-11-29 18:05:16 +01:00
wget-log.69	archive websites	2025-11-29 18:05:16 +01:00
wget-log.70	archive websites	2025-11-29 18:05:16 +01:00
wget-log.71	archive websites	2025-11-29 18:05:16 +01:00
wget-log.72	archive websites	2025-11-29 18:05:16 +01:00
wget-log.73	archive websites	2025-11-29 18:05:16 +01:00
wget-log.74	archive websites	2025-11-29 18:05:16 +01:00
wget-log.75	archive websites	2025-11-29 18:05:16 +01:00
wget-log.76	archive websites	2025-11-29 18:05:16 +01:00
wget-log.77	archive websites	2025-11-29 18:05:16 +01:00
wget-log.78	archive websites	2025-11-29 18:05:16 +01:00
wget-log.79	archive websites	2025-11-29 18:05:16 +01:00
wget-log.80	archive websites	2025-11-29 18:05:16 +01:00
wget-log.81	archive websites	2025-11-29 18:05:16 +01:00
wget-log.82	archive websites	2025-11-29 18:05:16 +01:00
wget-log.83	archive websites	2025-11-29 18:05:16 +01:00
wget-log.84	archive websites	2025-11-29 18:05:16 +01:00
wget-log.85	archive websites	2025-11-29 18:05:16 +01:00
wget-log.86	archive websites	2025-11-29 18:05:16 +01:00
wget-log.87	archive websites	2025-11-29 18:05:16 +01:00
wget-log.88	archive websites	2025-11-29 18:05:16 +01:00
wget-log.89	archive websites	2025-11-29 18:05:16 +01:00
wget-log.90	archive websites	2025-11-29 18:05:16 +01:00
wget-log.91	archive websites	2025-11-29 18:05:16 +01:00
wget-log.92	archive websites	2025-11-29 18:05:16 +01:00
wget-log.93	archive websites	2025-11-29 18:05:16 +01:00
wget-log.94	archive websites	2025-11-29 18:05:16 +01:00
wget-log.95	archive websites	2025-11-29 18:05:16 +01:00
wget-log.96	archive websites	2025-11-29 18:05:16 +01:00
wget-log.97	archive websites	2025-11-29 18:05:16 +01:00
wget-log.98	archive websites	2025-11-29 18:05:16 +01:00
wget-log.99	archive websites	2025-11-29 18:05:16 +01:00
wget-log.100	archive websites	2025-11-29 18:05:16 +01:00
wget-log.101	archive websites	2025-11-29 18:05:16 +01:00
wget-log.102	archive websites	2025-11-29 18:05:16 +01:00
wget-log.103	archive websites	2025-11-29 18:05:16 +01:00
wget-log.104	archive websites	2025-11-29 18:05:16 +01:00
wget-log.105	archive websites	2025-11-29 18:05:16 +01:00
wget-log.106	archive websites	2025-11-29 18:05:16 +01:00
wget-log.107	archive websites	2025-11-29 18:05:16 +01:00
wget-log.108	archive websites	2025-11-29 18:05:16 +01:00
wget-log.109	archive websites	2025-11-29 18:05:16 +01:00
wget-log.110	archive websites	2025-11-29 18:05:16 +01:00
wget-log.111	archive websites	2025-11-29 18:05:16 +01:00
wget-log.112	archive websites	2025-11-29 18:05:16 +01:00
wget-log.113	archive websites	2025-11-29 18:05:16 +01:00
wget-log.114	archive websites	2025-11-29 18:05:16 +01:00
wget-log.115	archive websites	2025-11-29 18:05:16 +01:00
wget-log.116	archive websites	2025-11-29 18:05:16 +01:00
wget-log.117	archive websites	2025-11-29 18:05:16 +01:00
wget-log.118	archive websites	2025-11-29 18:05:16 +01:00
wget-log.119	archive websites	2025-11-29 18:05:16 +01:00
wget-log.120	archive websites	2025-11-29 18:05:16 +01:00
wget-log.121	archive websites	2025-11-29 18:05:16 +01:00
wget-log.122	archive websites	2025-11-29 18:05:16 +01:00
wget-log.123	archive websites	2025-11-29 18:05:16 +01:00
wget-log.124	archive websites	2025-11-29 18:05:16 +01:00
wget-log.125	archive websites	2025-11-29 18:05:16 +01:00
wget-log.126	archive websites	2025-11-29 18:05:16 +01:00
wget-log.127	archive websites	2025-11-29 18:05:16 +01:00
wget-log.128	archive websites	2025-11-29 18:05:16 +01:00
wget-log.129	archive websites	2025-11-29 18:05:16 +01:00
wget-log.130	archive websites	2025-11-29 18:05:16 +01:00
wget-log.131	archive websites	2025-11-29 18:05:16 +01:00
wget-log.132	archive websites	2025-11-29 18:05:16 +01:00
wget-log.133	archive websites	2025-11-29 18:05:16 +01:00
wget-log.134	archive websites	2025-11-29 18:05:16 +01:00
wget-log.135	archive websites	2025-11-29 18:05:16 +01:00
wget-log.136	archive websites	2025-11-29 18:05:16 +01:00
wget-log.137	archive websites	2025-11-29 18:05:16 +01:00
wget-log.138	archive websites	2025-11-29 18:05:16 +01:00
WIKIDATA_CREATION_PLAN.md	add isil entries	2025-11-19 23:25:22 +01:00
WIKIDATA_MANUAL_CREATION_GUIDE.md	add isil entries	2025-11-19 23:25:22 +01:00
youtube_enrichment_log.txt	add pids	2025-12-01 23:55:55 +01:00
youtube_enrichment_log_v5.txt	validate enrichments	2025-12-02 14:36:01 +01:00
youtube_enrichment_log_v6.txt	validate enrichments	2025-12-02 14:36:01 +01:00
youtube_enrichment_log_v7.txt	validate enrichments	2025-12-02 14:36:01 +01:00
youtube_enrichment_log_v8.txt	validate enrichments	2025-12-02 14:36:01 +01:00
ZOOM_CAMERA_PERSISTENCE.md	Add UML diagrams and scripts for custodian schema	2025-11-23 23:05:33 +01:00

README.md

GLAM Extractor

Extract and standardize global GLAM (Galleries, Libraries, Archives, Museums) institutional data from conversation transcripts and authoritative registries.

🚀 How to Run the Application - Complete guide for starting frontend, backend, and servers.

Overview

This project extracts structured heritage institution data from 139+ Claude conversation JSON files covering worldwide GLAM research, integrates with authoritative CSV datasets (Dutch ISIL registry, Dutch heritage organizations), validates against a comprehensive LinkML schema, and exports to multiple formats (RDF/Turtle, JSON-LD, CSV, Parquet, SQLite).

Features

Multi-source data integration: Conversation transcripts, CSV registries, web crawling, Wikidata
NLP extraction: spaCy NER, transformers-based classification, pattern matching
LinkML validation: Comprehensive schema with TOOI, Schema.org, CPOV, ISIL, RiC-O, BIBFRAME
Provenance tracking: Every data point tracks source, confidence, and verification status
Multi-format export: RDF/Turtle, JSON-LD, CSV, Parquet, SQLite
Geocoding: Nominatim integration for location enrichment
Multilingual support: Handles institutions from 60+ countries in multiple languages

Interactive Frontend (React + TypeScript + D3.js)

UML Viewer 🎨 - Interactive D3.js visualization of heritage custodian ontology diagrams (docs)
- Mermaid class diagrams, ER diagrams, PlantUML, GraphViz
- Zoom, pan, drag nodes, click for details
- 14 schema diagrams from schemas/20251121/uml/
Query Builder 🔍 - Visual SPARQL query constructor
- Add variables, triple patterns, filters
- Live SPARQL generation
- Execute against endpoints
Graph Visualizer 🕸️ - RDF graph exploration with D3.js
- Upload RDF/Turtle files
- Interactive force-directed layout
- SPARQL queries
- Node metadata inspection
Database 🗄️ - TypeDB integration (optional)
NDE House Style 🎨 - Netwerk Digitaal Erfgoed branding throughout

Start the frontend: cd frontend && npm run dev

Quick Start

Installation

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# From the root of the cloned repository, install dependencies
cd glam-extractor
poetry install

# Download spaCy models
poetry run python -m spacy download en_core_web_trf
poetry run python -m spacy download nl_core_news_lg
poetry run python -m spacy download xx_ent_wiki_sm

Basic Usage

# Extract from conversation JSON
poetry run glam extract conversations/Brazilian_GLAM.json -o output.jsonld

# Extract from Dutch CSV
poetry run glam extract data/ISIL-codes_2025-08-01.csv --csv -o dutch_isil.jsonld

# Validate extracted data
poetry run glam validate output.jsonld -s schemas/heritage_custodian.yaml

# Export to RDF
poetry run glam export output.jsonld -o output.ttl -f rdf

# Crawl institutional website
poetry run glam crawl https://www.rijksmuseum.nl -o rijksmuseum.jsonld

Linked Open Data

The project publishes heritage institution data as W3C-compliant RDF aligned with international ontologies.

Schema RDF Formats (8 Serializations)

The LinkML schema is available in 8 RDF formats (generated from schemas/20251121/linkml/01_custodian_name_modular.yaml):

Format	File	Size	Use Case
Turtle	`01_custodian_name.owl.ttl`	77KB	Human-readable, Git-friendly
N-Triples	`01_custodian_name.nt`	233KB	Line-oriented processing
JSON-LD	`01_custodian_name.jsonld`	191KB	Web APIs, JavaScript
RDF/XML	`01_custodian_name.rdf`	165KB	Legacy systems, Java
Notation3	`01_custodian_name.n3`	77KB	Logic rules, reasoning
TriG	`01_custodian_name.trig`	103KB	Named graphs, datasets
TriX	`01_custodian_name.trix`	348KB	XML with named graphs
N-Quads	`01_custodian_name.nq`	288KB	Quad-based processing

All formats located in schemas/20251121/rdf/
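
As a minimal sketch of how a consumer might pick a parser for these files, the table above can be expressed as a suffix lookup. The format strings on the right are the names rdflib uses for its parser and serializer plugins, as far as we know; verify them against your installed rdflib version. The helper name `rdf_format_for` is hypothetical and not part of this project.

```python
from pathlib import PurePosixPath

# File suffix -> rdflib format name (assumed plugin names; check your rdflib version).
RDF_FORMATS = {
    ".ttl": "turtle",
    ".nt": "nt",
    ".jsonld": "json-ld",
    ".rdf": "xml",
    ".n3": "n3",
    ".trig": "trig",
    ".trix": "trix",
    ".nq": "nquads",
}


def rdf_format_for(filename: str) -> str:
    """Return the format name for a schema file, based on its last suffix."""
    suffix = PurePosixPath(filename).suffix.lower()
    try:
        return RDF_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized RDF file suffix: {suffix!r}") from None
```

Only the last suffix is used, so `01_custodian_name.owl.ttl` resolves through `.ttl`.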

Published Datasets

Denmark 🇩🇰 - ✅ COMPLETE (November 2025)

2,348 institutions (555 libraries, 594 archives, 1,199 branches)
43,429 RDF triples across 9 ontologies
769 Wikidata links (32.8% coverage)
Formats: Turtle, RDF/XML, JSON-LD, N-Triples

See data/rdf/README.md for SPARQL examples and usage.
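
The Denmark figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted in this README, not output from the project's own tooling.

```python
# Counts as stated in the Denmark dataset summary.
libraries, archives, branches = 555, 594, 1_199
wikidata_links = 769

total = libraries + archives + branches
coverage = round(100 * wikidata_links / total, 1)

print(f"{total} institutions, {coverage}% with Wikidata links")
```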

Ontology Alignment

Ontology	Purpose	Coverage
CPOV (Core Public Organisation Vocabulary)	EU public sector standard	All institutions
Schema.org	Web semantics (Library, ArchiveOrganization)	All institutions
RiC-O (Records in Contexts Ontology)	Archival description	Archives
ORG (W3C Organization Ontology)	Hierarchical relationships	Branches
PROV-O (Provenance Ontology)	Data provenance tracking	All institutions
OWL	Semantic equivalence (Wikidata links)	32.8% Denmark

SPARQL Examples

# Find all libraries in Copenhagen
PREFIX schema: <http://schema.org/>
PREFIX cpov: <http://data.europa.eu/m8g/>

SELECT ?library ?name ?address WHERE {
  ?library a cpov:PublicOrganisation, schema:Library .
  ?library schema:name ?name .
  ?library schema:address ?addrNode .
  ?addrNode schema:addressLocality "København K" .
  ?addrNode schema:streetAddress ?address .
}

# Find all institutions with Wikidata links
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name ?wikidataID WHERE {
  ?institution schema:name ?name .
  ?institution owl:sameAs ?wikidataURI .
  FILTER(STRSTARTS(STR(?wikidataURI), "http://www.wikidata.org/entity/Q"))
  BIND(STRAFTER(STR(?wikidataURI), "http://www.wikidata.org/entity/") AS ?wikidataID)
}

See data/rdf/README.md for more examples.
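
When post-processing query results outside the triple store, the `STRSTARTS`/`STRAFTER` logic of the second query can be mirrored in plain Python. The sketch below is illustrative only; the function name `wikidata_id` is hypothetical.

```python
from typing import Optional

WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def wikidata_id(uri: str) -> Optional[str]:
    """Return the Q-identifier for a Wikidata entity URI, or None for other URIs."""
    # Same test as FILTER(STRSTARTS(STR(?wikidataURI), ".../entity/Q"))
    if uri.startswith(WIKIDATA_ENTITY_PREFIX + "Q"):
        # Same result as STRAFTER(STR(?wikidataURI), ".../entity/")
        return uri[len(WIKIDATA_ENTITY_PREFIX):]
    return None
```

URIs that use the `https` scheme or point to properties (`P...`) return `None`, matching the filter in the query.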

Project Structure

glam-extractor/
├── pyproject.toml           # Poetry configuration
├── README.md                # This file
├── AGENTS.md                # AI agent instructions
├── .opencode/               # AI agent documentation
│   ├── HYPER_MODULAR_STRUCTURE.md
│   └── SLOT_NAMING_CONVENTIONS.md
├── src/glam_extractor/      # Main package
│   ├── __init__.py
│   ├── cli.py               # Command-line interface
│   ├── parsers/             # Conversation & CSV parsers
│   ├── extractors/          # NLP extraction engines
│   ├── crawlers/            # Web crawling (crawl4ai)
│   ├── validators/          # LinkML validation
│   ├── exporters/           # Multi-format export
│   ├── geocoding/           # Nominatim geocoding
│   └── utils/               # Utilities
├── schemas/20251121/        # LinkML schemas
│   ├── linkml/              # Hyper-modular schema (78 files)
│   │   ├── 01_custodian_name_modular.yaml
│   │   └── modules/
│   │       ├── metadata.yaml
│   │       ├── classes/     # 12 class modules
│   │       ├── enums/       # 5 enum modules
│   │       └── slots/       # 59 slot modules
│   ├── rdf/                 # 8 RDF serialization formats
│   │   ├── 01_custodian_name.owl.ttl
│   │   ├── 01_custodian_name.nt
│   │   ├── 01_custodian_name.jsonld
│   │   ├── 01_custodian_name.rdf
│   │   ├── 01_custodian_name.n3
│   │   ├── 01_custodian_name.trig
│   │   ├── 01_custodian_name.trix
│   │   └── 01_custodian_name.nq
│   └── examples/            # LinkML instance examples
├── tests/                   # Test suite
│   ├── unit/
│   ├── integration/
│   └── fixtures/
├── docs/                    # Documentation
│   ├── plan/global_glam/    # Planning documents
│   ├── api/                 # API documentation
│   ├── tutorials/           # User tutorials
│   └── examples/            # Usage examples
└── data/                    # Reference data
    ├── ISIL-codes_2025-08-01.csv
    ├── voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv
    └── ontology/            # Base ontologies (TOOI, CPOV, Schema.org, etc.)

Data Sources

Conversation JSON Files

139+ conversation files covering global GLAM research:

Geographic coverage: 60+ countries across all continents
Content: Institution names, locations, collections, digital platforms, partnerships
Languages: Multilingual (English, Dutch, Portuguese, Spanish, Vietnamese, Japanese, Arabic, etc.)

CSV Datasets

Dutch ISIL Registry (ISIL-codes_2025-08-01.csv): ~300 Dutch heritage institutions with authoritative ISIL codes
Dutch Organizations (voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv): Comprehensive metadata including systems, partnerships, collection platforms

External Sources (Optional Enrichment)

Wikidata: SPARQL queries for additional metadata
VIAF: Authority file linking
GeoNames: Geographic name authority
Nominatim: Geocoding service

Data Quality & Provenance

Every extracted record includes provenance metadata:

provenance:
  data_source: CONVERSATION_NLP | ISIL_REGISTRY | DUTCH_ORG_CSV | WEB_CRAWL | WIKIDATA
  data_tier: TIER_1_AUTHORITATIVE | TIER_2_VERIFIED | TIER_3_CROWD_SOURCED | TIER_4_INFERRED
  extraction_date: 2025-11-05T...
  extraction_method: "spaCy NER + GPT-4 classification"
  confidence_score: 0.0-1.0
  conversation_id: "uuid"
  source_url: "https://..."
  verified_date: null
  verified_by: null

Data Tiers:

Tier 1: Official registries (ISIL, national registers) - highest authority
Tier 2: Verified institutional data (official websites)
Tier 3: Community-sourced data (Wikidata, OpenStreetMap)
Tier 4: NLP-extracted or inferred data - requires verification
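
The provenance block and tiers above could be modeled roughly as follows. This is a sketch under stated assumptions, not the project's generated LinkML classes: the field and enum names are taken from the YAML above, while the class names, the range check, and the `needs_verification` property are illustrative additions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataSource(Enum):
    CONVERSATION_NLP = "CONVERSATION_NLP"
    ISIL_REGISTRY = "ISIL_REGISTRY"
    DUTCH_ORG_CSV = "DUTCH_ORG_CSV"
    WEB_CRAWL = "WEB_CRAWL"
    WIKIDATA = "WIKIDATA"


class DataTier(Enum):
    TIER_1_AUTHORITATIVE = "TIER_1_AUTHORITATIVE"
    TIER_2_VERIFIED = "TIER_2_VERIFIED"
    TIER_3_CROWD_SOURCED = "TIER_3_CROWD_SOURCED"
    TIER_4_INFERRED = "TIER_4_INFERRED"


@dataclass
class Provenance:
    data_source: DataSource
    data_tier: DataTier
    extraction_date: str
    extraction_method: str
    confidence_score: float
    conversation_id: Optional[str] = None
    source_url: Optional[str] = None
    verified_date: Optional[str] = None
    verified_by: Optional[str] = None

    def __post_init__(self) -> None:
        # confidence_score is documented as a value in the range 0.0-1.0
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be between 0.0 and 1.0")

    @property
    def needs_verification(self) -> bool:
        # Assumption: a Tier 4 record counts as unverified until verified_date is set.
        return self.data_tier is DataTier.TIER_4_INFERRED and self.verified_date is None
```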

LinkML Schema

Hyper-Modular Architecture

The project uses a hyper-modular LinkML schema (schemas/20251121/linkml/01_custodian_name_modular.yaml) where every class, enum, and slot is defined in its own individual file for maximum maintainability and version control granularity.

Schema Structure:

78 YAML files total
- 12 class modules (modules/classes/)
- 5 enum modules (modules/enums/)
- 59 slot modules (modules/slots/)
- 1 metadata module (modules/metadata.yaml)
- 1 main schema (01_custodian_name_modular.yaml)
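
The file counts above (12 + 5 + 59 + 1 + 1 = 78) can be checked against a working copy with a short script. This sketch assumes the directory layout shown in the project structure, and that the only YAML file directly inside the `linkml/` directory is the main schema; `count_schema_files` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Dict


def count_schema_files(linkml_dir: Path) -> Dict[str, int]:
    """Count YAML files per module kind under a hyper-modular LinkML directory."""
    modules = linkml_dir / "modules"
    counts = {
        kind: len(list((modules / kind).glob("*.yaml")))
        for kind in ("classes", "enums", "slots")
    }
    counts["metadata"] = int((modules / "metadata.yaml").is_file())
    # Assumption: top-level YAML files in linkml_dir are main schemas only.
    counts["main"] = len(list(linkml_dir.glob("*.yaml")))
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    print(count_schema_files(Path("schemas/20251121/linkml")))
```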

Direct Import Pattern:

imports:
  - linkml:types
  - modules/metadata
  - modules/enums/AgentTypeEnum
  - modules/slots/observed_name
  - modules/classes/CustodianObservation
  # ... 76 total individual module imports

Benefits:

✅ Complete transparency - all dependencies visible
✅ Granular version control - one file per concept
✅ Parallel development - no merge conflicts
✅ Selective imports - customize schemas easily

See .opencode/HYPER_MODULAR_STRUCTURE.md for complete architecture documentation.

Ontology Alignment

The schema integrates multiple international standards:

CPOV: Core Public Organisation Vocabulary (EU public sector)
TOOI: Dutch organizational ontology
Schema.org: General web semantics
CIDOC-CRM: Cultural heritage domain model
RiC-O: Records in Contexts Ontology
PROV-O: Provenance tracking
PiCo: Person observations pattern

Key Classes:

CustodianObservation: Source-based references (emic/etic perspectives)
CustodianName: Standardized emic names
CustodianReconstruction: Formal legal entities
ReconstructionActivity: Entity derivation from observations
Agent: People responsible for observations/reconstructions
SourceDocument: Documentary evidence
Identifier: External identifiers (ISIL, VIAF, Wikidata)
TimeSpan: Temporal extents with fuzzy boundaries
ConfidenceMeasure: Data quality metrics

Observation → Reconstruction Pattern:

SourceDocument → CustodianObservation → ReconstructionActivity → CustodianReconstruction
     (text)        (what source says)    (synthesis method)      (formal entity)

This pattern distinguishes between source-based references and scholar-derived formal entities, inspired by the PiCo (Persons in Context) ontology.
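
To make the flow concrete, here is a toy sketch of the observation-to-reconstruction chain using plain dataclasses. The class names follow the key classes listed above, but the fields (apart from `observed_name`, which appears as a slot module) and the "most frequent observed name" synthesis rule are assumptions chosen for illustration; the actual schema and reconstruction methods may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceDocument:
    title: str  # hypothetical field


@dataclass(frozen=True)
class CustodianObservation:
    observed_name: str       # what the source says
    source: SourceDocument   # where it says it


@dataclass
class CustodianReconstruction:
    name: str  # hypothetical field for the formal entity's name


@dataclass
class ReconstructionActivity:
    method: str
    used: List[CustodianObservation]
    generated: CustodianReconstruction


def reconstruct(observations: List[CustodianObservation]) -> ReconstructionActivity:
    """Derive a formal entity from observations, keeping the derivation explicit."""
    if not observations:
        raise ValueError("at least one observation is required")
    # Illustrative synthesis rule: pick the most frequently observed name.
    name, _ = Counter(o.observed_name for o in observations).most_common(1)[0]
    return ReconstructionActivity(
        method="most frequent observed name",
        used=list(observations),
        generated=CustodianReconstruction(name=name),
    )
```

Keeping the activity as its own object means every reconstructed entity can be traced back to the observations and method that produced it.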

Development

Run Tests

poetry run pytest                    # All tests
poetry run pytest -m unit           # Unit tests only
poetry run pytest -m integration    # Integration tests only
poetry run pytest --cov             # With coverage report

Code Quality

poetry run black src/ tests/        # Format code
poetry run ruff check src/ tests/   # Lint code
poetry run mypy src/                # Type checking

Pre-commit Hooks

poetry run pre-commit install
poetry run pre-commit run --all-files

Documentation

poetry run mkdocs serve    # Serve docs locally
poetry run mkdocs build    # Build static docs

Examples

Extract Brazilian Institutions

from glam_extractor import ConversationParser, InstitutionExtractor

# Parse conversation
parser = ConversationParser()
conversation = parser.load("Brazilian_GLAM_collection_inventories.json")

# Extract institutions
extractor = InstitutionExtractor()
institutions = extractor.extract(conversation)

# Print results
for inst in institutions:
    print(f"{inst.name} ({inst.institution_type})")
    # Guard against records without a location to avoid an IndexError
    if inst.locations:
        print(f"  Location: {inst.locations[0].city}, {inst.locations[0].country}")
    print(f"  Confidence: {inst.provenance.confidence_score}")

Cross-link Dutch Data

from glam_extractor import CSVParser, InstitutionExtractor
from glam_extractor.validators import LinkMLValidator

# Load Dutch ISIL registry
csv_parser = CSVParser()
dutch_institutions = csv_parser.load_isil_registry("ISIL-codes_2025-08-01.csv")

# Load Dutch organizations
dutch_orgs = csv_parser.load_dutch_organizations("voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv")

# Cross-link and merge
extractor = InstitutionExtractor()
merged = extractor.merge_dutch_data(dutch_institutions, dutch_orgs)

# Validate
validator = LinkMLValidator(schema="schemas/heritage_custodian.yaml")
results = validator.validate_batch(merged)
print(f"Valid: {results.valid_count}, Invalid: {results.invalid_count}")

Export to Multiple Formats

from glam_extractor.exporters import JSONLDExporter, RDFExporter, CSVExporter

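# Load extracted data
# NOTE: load_institutions is not defined or imported in this snippet; it stands in
# for whichever loader returns the institutions previously written to output.jsonld.
institutions = load_institutions("output.jsonld")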

# Export to RDF/Turtle
rdf_exporter = RDFExporter()
rdf_exporter.export(institutions, "output.ttl")

# Export to CSV
csv_exporter = CSVExporter()
csv_exporter.export(institutions, "output.csv")

# Export to Parquet
csv_exporter.export_parquet(institutions, "output.parquet")

Documentation

Planning Docs: docs/plan/global_glam/
- 01-implementation-phases.md: 7-phase implementation plan
- 02-architecture.md: System architecture and data flow
- 03-dependencies.md: Technology stack and dependencies
- 04-data-standardization.md: Data integration strategies
- 05-design-patterns.md: Software design patterns
- 06-consumers-use-cases.md: User segments and applications
AI Agent Instructions: AGENTS.md
- NLP extraction guidelines
- Data quality protocols
- Agent workflow examples
API Documentation: Generated from docstrings with mkdocstrings

Contributing

This is a research project. Contributions welcome!

Fork the repository
Create feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open Pull Request

License

MIT License - see LICENSE file for details

Acknowledgments

LinkML: Schema framework
spaCy: NLP processing
crawl4ai: Web crawling
RDFLib: RDF processing
Dutch ISIL Registry: Authoritative institution data
Claude AI: Conversation data source

Contact

For questions or collaboration inquiries, please open an issue on GitHub.

Version: 0.1.0
Status: Alpha - Implementation in progress
Last Updated: 2025-11-05