glam/fix_financial_statement_duplicate_key.py
kempersc f7bf1cc5ae Refactor schema slots and classes
- Deleted obsolete slot definitions: statement_summary, statement_text, statement_type, status_name, supersede_articles, supersede_condition, supersede_name, temporal_dynamics, total_amount, typical_contents, use_cases, was_acquired_through, was_fetched_at, was_retrieved_at.
- Updated existing slot definitions for states_or_stated to enhance clarity and structure.
- Introduced new classes: Article, ConditionofAccess, FinancialStatementType, MaximumQuantity, Series, Summary, Type, and their respective slots to improve schema organization and usability.
- Added new slots: changes_or_changed_through, has_or_had_condition_of_access, has_or_had_heritage_type, is_or_was_part_of_series, is_or_was_retrieved_at, maximum_of_maximum to capture additional metadata and relationships.
2026-01-30 00:29:31 +01:00


# Check FinancialStatement.yaml for a duplicate has_or_had_format key under slot_usage
filepath = 'schemas/20251121/linkml/modules/classes/FinancialStatement.yaml'

with open(filepath, 'r') as f:
    lines = f.readlines()

new_lines = []
seen_keys = set()
in_slot_usage = False
slot_usage_indent = -1
skipping = False  # True while inside the block of a duplicate key

for line in lines:
    stripped = line.strip()
    indent = len(line) - len(line.lstrip())

    if stripped.startswith('slot_usage:'):
        in_slot_usage = True
        slot_usage_indent = indent
        skipping = False
        seen_keys.clear()
        new_lines.append(line)
        continue

    if in_slot_usage and stripped:
        if indent <= slot_usage_indent:
            # Dedented to the level of slot_usage or above: the section has ended
            in_slot_usage = False
            skipping = False
            seen_keys.clear()
        elif indent == slot_usage_indent + 2 and not stripped.startswith('#'):
            # Keys directly under slot_usage (the slot usage items)
            key = stripped.split(':')[0]
            skipping = key in seen_keys
            if skipping:
                print(f"Skipping duplicate key {key} in slot_usage")
            seen_keys.add(key)

    # Drop the duplicate key line and every line of its nested block
    if skipping:
        continue
    new_lines.append(line)

# new_lines is built but not written back here; the original note defers the
# actual rewrite to the fix_duplicate_keys.py approach.