glam/scripts/fix_bce_dates_final.py
kempersc e5a532a8bc Add comprehensive tests for NLP institution extraction and RDF partnership integration
- Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive).
- Added tests for extracted entities and result handling to validate the extraction process.
- Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format.
- Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns.
- Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.
2025-11-19 23:20:47 +01:00

#!/usr/bin/env python3
"""
Fix BCE dates by using the placeholder date '0001-01-01' for ancient events.

For archaeological sites with BCE founding dates, we use:
- event_date: '0001-01-01' (year 1 CE, the earliest date Python's datetime.date can represent)
- event_description: contains the actual BCE date context
"""

import yaml
from pathlib import Path

# Paths
LIBYA_YAML = Path(__file__).parent.parent / "data/instances/libya/libyan_institutions.yaml"


def main():
    print("Fixing BCE date events with placeholder dates...\n")

    # Load YAML
    with open(LIBYA_YAML, 'r', encoding='utf-8') as f:
        data = yaml.safe_load(f) or []  # an empty file loads as None

    fixes = 0
    for inst in data:
        name = inst.get('name', '')
        # `or []` covers both a missing key and an empty or null change_history
        for event in inst.get('change_history') or []:
            # An event with no event_date had it removed because it was BCE
            if 'event_date' not in event:
                # Add placeholder date for ancient events
                event['event_date'] = '0001-01-01'
                fixes += 1
                print(f"{name}: Added placeholder date for ancient event")

    # Save only if something changed
    if fixes > 0:
        with open(LIBYA_YAML, 'w', encoding='utf-8') as f:
            yaml.dump(data, f, allow_unicode=True, sort_keys=False, default_flow_style=False)
        print(f"\n💾 Saved {fixes} fixes to {LIBYA_YAML}")
    else:
        print("No fixes needed!")


if __name__ == '__main__':
    main()
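
The placeholder logic in `main()` can also be sketched as a pure function that works on already-loaded data, which makes it easy to check without touching the YAML file. This is a minimal sketch, not part of the original script: the helper name `add_placeholder_dates`, the constant `PLACEHOLDER_DATE`, and the sample records are hypothetical and exist only for illustration. It also shows that the placeholder string parses to `datetime.date.min`.

```python
from datetime import date

# Same placeholder string the script writes (constant name is hypothetical)
PLACEHOLDER_DATE = "0001-01-01"


def add_placeholder_dates(institutions):
    """Fill in missing event_date fields in place and return how many were added."""
    fixes = 0
    for inst in institutions or []:
        # `or []` covers a missing key as well as an empty or null change_history
        for event in inst.get("change_history") or []:
            if "event_date" not in event:
                event["event_date"] = PLACEHOLDER_DATE
                fixes += 1
    return fixes


# Hypothetical sample records, not taken from libyan_institutions.yaml
sample = [
    {
        "name": "Example Archaeological Site",
        "change_history": [
            {"event_description": "Founded in the 7th century BCE"},
            {"event_date": "1982-01-01", "event_description": "Later event"},
        ],
    },
    {"name": "Example Library"},  # no change_history at all
]

fixed = add_placeholder_dates(sample)

# The placeholder is a valid ISO-format string for Python's earliest date
placeholder_is_date_min = date.fromisoformat(PLACEHOLDER_DATE) == date.min
```

Running the helper a second time on the same data adds nothing, since every event then has an `event_date`. Note that the same check would also fill in a date that is missing for reasons other than being BCE, so the real BCE context still has to live in `event_description`.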