# Entity Linking for Heritage Custodians
## Overview
This document defines entity linking strategies for resolving extracted heritage institution mentions to canonical knowledge bases (Wikidata, VIAF, ISIL registry) and the local Heritage Custodian Ontology knowledge graph.
## Entity Linking Architecture
```
┌────────────────────────────────────────────────────────────────────┐
│                      Entity Linking Pipeline                       │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  Extracted Entity ──► Candidate Generation ──► Candidate Ranking   │
│       (NER)              (Multi-source)         (Features + ML)    │
│                                                                    │
│           │                                          │             │
│           ▼                                          ▼             │
│  ┌─────────────────┐                        ┌─────────────────┐    │
│  │   Knowledge     │                        │ Disambiguation  │    │
│  │     Bases       │                        │     Module      │    │
│  ├─────────────────┤                        └────────┬────────┘    │
│  │ • Wikidata      │                                 │             │
│  │ • VIAF          │                                 ▼             │
│  │ • ISIL Registry │                        ┌─────────────────┐    │
│  │ • GeoNames      │                        │  NIL Detection  │    │
│  │ • Local KG      │                        │  (No KB Entry)  │    │
│  └─────────────────┘                        └────────┬────────┘    │
│                                                      │             │
│                                                      ▼             │
│                                        Linked Entity (or NIL)      │
└────────────────────────────────────────────────────────────────────┘
```
## Knowledge Bases
### Primary Knowledge Bases
| KB | Property | Use Case | Lookup Method |
|----|----------|----------|---------------|
| **Wikidata** | Q-entities | Primary reference KB | SPARQL + API |
| **VIAF** | Authority IDs | Organization authorities | SRU API |
| **ISIL** | Library/archive codes | Unique institution IDs | Direct lookup |
| **GeoNames** | Place IDs | Location disambiguation | API + DB |
| **Local KG** | GHCID | Internal entity resolution | TypeDB query |
### Identifier Cross-Reference Table
```python
IDENTIFIER_PROPERTIES = {
    "wikidata": {
        "isil": "P791",   # ISIL identifier
        "viaf": "P214",   # VIAF ID
        "isni": "P213",   # ISNI
        "ror": "P6782",   # ROR ID
        "gnd": "P227",    # GND ID (German)
        "loc": "P244",    # Library of Congress
        "bnf": "P268",    # BnF (French)
        "nta": "P1006",   # Nationale Thesaurus voor Auteurs (Dutch)
    }
}
```
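One use of this mapping is generating the `OPTIONAL` clauses that pull cross-identifiers in a Wikidata SPARQL lookup. A minimal sketch with a three-identifier subset; the `optional_clauses` helper is illustrative, not part of the pipeline:

```python
# Subset of the identifier -> Wikidata property mapping above
IDENTIFIER_PROPERTIES = {
    "isil": "P791",
    "viaf": "P214",
    "ror": "P6782",
}

def optional_clauses(var: str = "?item") -> str:
    """Build OPTIONAL blocks that pull each cross-identifier, if present."""
    return "\n".join(
        f"OPTIONAL {{ {var} wdt:{pid} ?{name} }}"
        for name, pid in IDENTIFIER_PROPERTIES.items()
    )

print(optional_clauses())
# OPTIONAL { ?item wdt:P791 ?isil }
# OPTIONAL { ?item wdt:P214 ?viaf }
# OPTIONAL { ?item wdt:P6782 ?ror }
```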
## DSPy Entity Linker Module
### EntityLinker Signature
```python
import dspy
from typing import List, Optional
from pydantic import BaseModel, Field


class LinkedEntity(BaseModel):
    """A linked entity with KB reference."""

    mention_text: str = Field(description="Original mention text")
    canonical_name: str = Field(description="Canonical name from KB")
    kb_id: str = Field(description="Knowledge base identifier")
    kb_source: str = Field(description="KB source: wikidata, viaf, isil, geonames, local")
    confidence: float = Field(ge=0.0, le=1.0)

    # Additional identifiers discovered
    wikidata_id: Optional[str] = None
    viaf_id: Optional[str] = None
    isil_code: Optional[str] = None
    ghcid: Optional[str] = None

    # Disambiguation features
    type_match: bool = Field(default=False, description="KB type matches expected type")
    location_match: bool = Field(default=False, description="Location context matches")


class EntityLinkerOutput(BaseModel):
    linked_entities: List[LinkedEntity]
    nil_entities: List[str] = Field(description="Mentions with no KB match (NIL)")


class EntityLinker(dspy.Signature):
    """Link extracted heritage institution mentions to knowledge bases.

    Linking strategy:
    1. Generate candidates from multiple KBs (Wikidata, VIAF, ISIL, local KG)
    2. Score candidates using name similarity, type matching, location context
    3. Apply disambiguation for ambiguous cases
    4. Detect NIL entities (no KB entry exists)

    Priority:
    - ISIL code match → highest confidence (unique identifier)
    - Wikidata exact match → high confidence
    - VIAF authority match → high confidence
    - Local KG GHCID match → medium confidence
    - Fuzzy name match → lower confidence, requires verification
    """

    entities: List[str] = dspy.InputField(desc="Extracted entity mentions to link")
    entity_types: List[str] = dspy.InputField(desc="Expected types (GLAMORCUBESFIXPHDNT)")
    context: str = dspy.InputField(desc="Surrounding text for disambiguation")
    country_hint: Optional[str] = dspy.InputField(default=None, desc="Country context")
    linked: EntityLinkerOutput = dspy.OutputField(desc="Linked entities")
```
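Stripped of the DSPy and pydantic machinery, the signature's output contract is a partition of mentions into linked entities and NIL mentions. A plain-Python sketch of that contract; the stand-in lookup table, the second mention, and the Rijksmuseum QID are illustrative:

```python
# Stand-in KB: mention -> (kb_id, kb_source, canonical_name)
KB = {
    "Rijksmuseum": ("Q190804", "wikidata", "Rijksmuseum"),
}

def link_mentions(mentions):
    """Partition mentions into linked entities and NIL, mirroring EntityLinkerOutput."""
    linked, nil = [], []
    for m in mentions:
        hit = KB.get(m)
        if hit:
            linked.append({
                "mention_text": m,
                "kb_id": hit[0],
                "kb_source": hit[1],
                "canonical_name": hit[2],
            })
        else:
            nil.append(m)
    return {"linked_entities": linked, "nil_entities": nil}

out = link_mentions(["Rijksmuseum", "Dorpsarchief Elsloo"])
print(out["nil_entities"])  # ['Dorpsarchief Elsloo']
```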
## Candidate Generation
### Multi-Source Candidate Generator
```python
class CandidateGenerator:
    """Generate entity candidates from multiple knowledge bases."""

    def __init__(self):
        self.wikidata_client = WikidataClient()
        self.viaf_client = VIAFClient()
        self.isil_registry = ISILRegistry()
        self.geonames_client = GeoNamesClient()
        self.local_kg = TypeDBClient()

    def generate_candidates(
        self,
        mention: str,
        entity_type: str,
        country_hint: str = None,
        max_candidates: int = 10,
    ) -> List[Candidate]:
        """Generate candidates from all sources."""
        candidates = []

        # 1. ISIL Registry (exact match for known codes)
        if self._looks_like_isil(mention):
            isil_candidate = self.isil_registry.lookup(mention)
            if isil_candidate:
                candidates.append(Candidate(
                    kb_id=mention,
                    kb_source="isil",
                    name=isil_candidate["name"],
                    score=1.0,  # Exact match
                ))

        # 2. Wikidata (label search + type filter)
        wd_candidates = self.wikidata_client.search_entities(
            query=mention,
            instance_of=self._type_to_wikidata_class(entity_type),
            country=country_hint,
            limit=max_candidates,
        )
        candidates.extend(wd_candidates)

        # 3. VIAF (organization search)
        if entity_type in ["A", "L", "M", "O", "R"]:  # Formal organizations
            viaf_candidates = self.viaf_client.search_organizations(
                query=mention,
                limit=max_candidates // 2,
            )
            candidates.extend(viaf_candidates)

        # 4. Local KG (GHCID lookup)
        local_candidates = self.local_kg.search_custodians(
            name_query=mention,
            custodian_type=entity_type,
            country=country_hint,
            limit=max_candidates // 2,
        )
        candidates.extend(local_candidates)

        return self._deduplicate(candidates)

    def _type_to_wikidata_class(self, glamor_type: str) -> str:
        """Map GLAMORCUBESFIXPHDNT type to Wikidata class."""
        TYPE_MAP = {
            "G": "Q1007870",  # art gallery
            "L": "Q7075",     # library
            "A": "Q166118",   # archive
            "M": "Q33506",    # museum
            "O": "Q2659904",  # government agency
            "R": "Q31855",    # research institute
            "B": "Q167346",   # botanical garden
            "E": "Q3918",     # university
            "S": "Q988108",   # historical society
            "H": "Q16970",    # church (with collections)
            "D": "Q35127",    # website / digital platform
        }
        return TYPE_MAP.get(glamor_type, "Q43229")  # Default: organization

    def _looks_like_isil(self, text: str) -> bool:
        import re
        return bool(re.match(r"^[A-Z]{2}-[A-Za-z0-9]+$", text))
```
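The `_looks_like_isil` heuristic accepts strings shaped like a two-letter country prefix, a hyphen, and an alphanumeric local code. A standalone sketch of the same pattern (the example codes are illustrative):

```python
import re

# ISIL shape as assumed above: two uppercase letters, hyphen, alphanumeric code
ISIL_PATTERN = re.compile(r"^[A-Z]{2}-[A-Za-z0-9]+$")

def looks_like_isil(text: str) -> bool:
    """True when the string has the assumed ISIL shape."""
    return bool(ISIL_PATTERN.match(text))

print(looks_like_isil("NL-AsdRM"))     # True  - prefix + local code
print(looks_like_isil("Rijksmuseum"))  # False - plain institution name
print(looks_like_isil("nl-123"))       # False - prefix must be uppercase
```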
### Wikidata Candidate Search
```python
class WikidataClient:
    """Wikidata entity search and lookup."""

    ENDPOINT = "https://query.wikidata.org/sparql"

    def search_entities(
        self,
        query: str,
        instance_of: str = None,
        country: str = None,
        limit: int = 10,
    ) -> List[Candidate]:
        """Search Wikidata entities by label."""
        # Build SPARQL query with filters
        filters = []
        if instance_of:
            filters.append(f"?item wdt:P31/wdt:P279* wd:{instance_of} .")
        if country:
            country_qid = self._country_to_qid(country)
            if country_qid:
                filters.append(f"?item wdt:P17 wd:{country_qid} .")
        filter_clause = "\n".join(filters)

        sparql = f"""
        SELECT ?item ?itemLabel ?itemDescription ?isil ?viaf WHERE {{
          SERVICE wikibase:mwapi {{
            bd:serviceParam wikibase:api "EntitySearch" .
            bd:serviceParam wikibase:endpoint "www.wikidata.org" .
            bd:serviceParam mwapi:search "{query}" .
            bd:serviceParam mwapi:language "en,nl,de,fr" .
            ?item wikibase:apiOutputItem mwapi:item .
          }}
          {filter_clause}
          OPTIONAL {{ ?item wdt:P791 ?isil }}
          OPTIONAL {{ ?item wdt:P214 ?viaf }}
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,nl,de,fr" }}
        }}
        LIMIT {limit}
        """
        results = self._execute_sparql(sparql)
        return [
            Candidate(
                kb_id=r["item"]["value"].split("/")[-1],
                kb_source="wikidata",
                name=r.get("itemLabel", {}).get("value", ""),
                description=r.get("itemDescription", {}).get("value", ""),
                isil=r.get("isil", {}).get("value"),
                viaf=r.get("viaf", {}).get("value"),
                score=0.0,  # Score computed later
            )
            for r in results
        ]

    def get_entity_details(self, qid: str) -> dict:
        """Get full entity details from Wikidata."""
        sparql = f"""
        SELECT ?prop ?propLabel ?value ?valueLabel WHERE {{
          wd:{qid} ?prop ?value .
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,nl" }}
        }}
        """
        return self._execute_sparql(sparql)
```
### VIAF Authority Search
```python
import requests


class VIAFClient:
    """VIAF (Virtual International Authority File) client."""

    SRU_ENDPOINT = "https://viaf.org/viaf/search"

    def search_organizations(
        self,
        query: str,
        limit: int = 10,
    ) -> List[Candidate]:
        """Search VIAF for corporate bodies."""
        # SRU CQL query
        cql_query = f'local.corporateNames all "{query}"'
        params = {
            "query": cql_query,
            "maximumRecords": limit,
            "httpAccept": "application/json",
            "recordSchema": "BriefVIAF",
        }
        response = requests.get(self.SRU_ENDPOINT, params=params)
        data = response.json()

        candidates = []
        for record in data.get("records", []):
            viaf_id = record.get("viafID")
            main_heading = record.get("mainHeadingEl", {}).get("datafield", {})
            name = self._extract_name(main_heading)
            candidates.append(Candidate(
                kb_id=viaf_id,
                kb_source="viaf",
                name=name,
                score=0.0,
            ))
        return candidates

    def get_authority_cluster(self, viaf_id: str) -> dict:
        """Get all authority records linked to a VIAF cluster."""
        url = f"https://viaf.org/viaf/{viaf_id}/viaf.json"
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        return {}
```
### ISIL Registry Lookup
```python
class ISILRegistry:
    """ISIL (International Standard Identifier for Libraries) registry."""

    def __init__(self, db_path: str = "data/reference/isil_registry.db"):
        self.db_path = db_path

    def lookup(self, isil_code: str) -> Optional[dict]:
        """Look up an institution by its ISIL code."""
        import sqlite3
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("""
            SELECT name, city, country, institution_type, notes
            FROM isil_registry
            WHERE isil_code = ?
        """, (isil_code,))
        row = cursor.fetchone()
        conn.close()
        if row:
            return {
                "isil_code": isil_code,
                "name": row[0],
                "city": row[1],
                "country": row[2],
                "institution_type": row[3],
                "notes": row[4],
            }
        return None

    def search_by_name(
        self,
        name: str,
        country: str = None,
        limit: int = 10,
    ) -> List[dict]:
        """Search the ISIL registry by institution name."""
        import sqlite3
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        query = """
            SELECT isil_code, name, city, country, institution_type
            FROM isil_registry
            WHERE name LIKE ?
        """
        params = [f"%{name}%"]
        if country:
            query += " AND country = ?"
            params.append(country)
        # Bind LIMIT as a parameter rather than interpolating it
        query += " LIMIT ?"
        params.append(limit)
        cursor.execute(query, params)
        rows = cursor.fetchall()
        conn.close()
        return [
            {
                "isil_code": row[0],
                "name": row[1],
                "city": row[2],
                "country": row[3],
                "institution_type": row[4],
            }
            for row in rows
        ]
```
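The two query shapes above (exact-code lookup and `LIKE`-based name search) can be exercised against an in-memory SQLite database. The schema mirrors the assumed `isil_registry` layout; the sample row is illustrative:

```python
import sqlite3

# Tiny in-memory registry to demonstrate both lookup queries
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE isil_registry (
        isil_code TEXT PRIMARY KEY,
        name TEXT, city TEXT, country TEXT,
        institution_type TEXT, notes TEXT
    )
""")
conn.execute(
    "INSERT INTO isil_registry VALUES (?, ?, ?, ?, ?, ?)",
    ("NL-HaNA", "Nationaal Archief", "Den Haag", "NL", "archive", None),
)

# Exact-code lookup (parameterised, as in ISILRegistry.lookup)
row = conn.execute(
    "SELECT name, country FROM isil_registry WHERE isil_code = ?",
    ("NL-HaNA",),
).fetchone()
print(row)  # ('Nationaal Archief', 'NL')

# Name search with LIKE plus a country filter (as in search_by_name)
rows = conn.execute(
    "SELECT isil_code FROM isil_registry WHERE name LIKE ? AND country = ?",
    ("%Archief%", "NL"),
).fetchall()
print(rows)  # [('NL-HaNA',)]
conn.close()
```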
## Candidate Ranking
### Feature-Based Ranking
```python
class CandidateRanker:
    """Rank entity candidates using multiple features."""

    def __init__(self):
        self.name_matcher = NameMatcher()
        self.type_checker = TypeChecker()
        self.location_matcher = LocationMatcher()
        # Sentence embedder for context similarity (model choice is illustrative)
        from sentence_transformers import SentenceTransformer
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def rank_candidates(
        self,
        mention: str,
        candidates: List[Candidate],
        context: str,
        expected_type: str,
        location_context: str = None,
    ) -> List[Candidate]:
        """Rank candidates by combined feature score."""
        for candidate in candidates:
            # Feature 1: Name similarity
            name_score = self.name_matcher.similarity(mention, candidate.name)

            # Feature 2: Type match
            type_score = self.type_checker.type_match_score(
                candidate.kb_source,
                candidate.kb_id,
                expected_type,
            )

            # Feature 3: Location context
            location_score = 0.0
            if location_context:
                location_score = self.location_matcher.location_match_score(
                    candidate,
                    location_context,
                )

            # Feature 4: Context similarity
            context_score = self._context_similarity(candidate, context)

            # Feature 5: Source priority
            source_score = self._source_priority(candidate.kb_source)

            # Combine scores (weighted)
            candidate.score = (
                0.35 * name_score
                + 0.25 * type_score
                + 0.15 * location_score
                + 0.15 * context_score
                + 0.10 * source_score
            )

        # Sort by score descending
        candidates.sort(key=lambda c: c.score, reverse=True)
        return candidates

    def _source_priority(self, source: str) -> float:
        """Priority score for KB source (ISIL > Wikidata > VIAF > local)."""
        PRIORITIES = {
            "isil": 1.0,      # Unique identifier
            "wikidata": 0.9,  # Rich entity data
            "viaf": 0.8,      # Authority file
            "local": 0.7,     # Local KG
            "geonames": 0.6,  # Place data
        }
        return PRIORITIES.get(source, 0.5)

    def _context_similarity(self, candidate: Candidate, context: str) -> float:
        """Semantic similarity between candidate description and context."""
        if not candidate.description:
            return 0.5
        # Use sentence embeddings
        from sentence_transformers import util
        context_emb = self.embedder.encode(context)
        desc_emb = self.embedder.encode(candidate.description)
        return float(util.cos_sim(context_emb, desc_emb)[0][0])
```
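The weighted combination in `rank_candidates` is a plain linear blend of the five feature scores. A minimal sketch with hypothetical feature values:

```python
# Feature weights as used in rank_candidates above
WEIGHTS = {"name": 0.35, "type": 0.25, "location": 0.15, "context": 0.15, "source": 0.10}

def combined_score(features: dict) -> float:
    """Linear blend of feature scores; missing features count as 0.0."""
    return sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)

# Hypothetical candidate: exact name + type, decent location/context, Wikidata source
features = {"name": 1.0, "type": 1.0, "location": 0.8, "context": 0.6, "source": 0.9}
print(round(combined_score(features), 3))  # 0.9
```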
### Name Matching
```python
class NameMatcher:
    """Fuzzy name matching for entity linking."""

    def __init__(self):
        self.normalizer = NameNormalizer()

    def similarity(self, mention: str, candidate_name: str) -> float:
        """Compute a name similarity score in [0, 1]."""
        # Normalize both names
        norm_mention = self.normalizer.normalize(mention)
        norm_candidate = self.normalizer.normalize(candidate_name)

        # Exact match
        if norm_mention == norm_candidate:
            return 1.0

        # Token overlap (Jaccard), guarding against empty token sets
        mention_tokens = set(norm_mention.split())
        candidate_tokens = set(norm_candidate.split())
        union = mention_tokens | candidate_tokens
        jaccard = len(mention_tokens & candidate_tokens) / len(union) if union else 0.0

        # Levenshtein ratio
        from rapidfuzz import fuzz
        levenshtein = fuzz.ratio(norm_mention, norm_candidate) / 100.0

        # Token sort ratio (order-independent)
        token_sort = fuzz.token_sort_ratio(norm_mention, norm_candidate) / 100.0

        # Combine scores
        return 0.4 * jaccard + 0.3 * levenshtein + 0.3 * token_sort


class NameNormalizer:
    """Normalize institution names for matching."""

    # Skip words by language (legal forms, articles)
    SKIP_WORDS = {
        "nl": ["stichting", "de", "het", "van", "voor", "en", "te"],
        "en": ["the", "of", "and", "for", "foundation", "trust", "inc"],
        "de": ["der", "die", "das", "und", "für", "stiftung", "e.v."],
        "fr": ["le", "la", "les", "de", "du", "et", "fondation"],
    }

    def normalize(self, name: str, language: str = "nl") -> str:
        """Normalize an institution name."""
        import re
        import unicodedata

        # Lowercase
        name = name.lower()
        # Remove diacritics
        name = unicodedata.normalize("NFD", name)
        name = "".join(c for c in name if unicodedata.category(c) != "Mn")
        # Remove punctuation
        name = re.sub(r"[^\w\s]", " ", name)
        # Remove skip words
        skip = set(self.SKIP_WORDS.get(language, []))
        tokens = [t for t in name.split() if t not in skip]
        # Collapse whitespace
        return " ".join(tokens)
```
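The normalization steps (lowercase, strip diacritics, drop punctuation, remove skip words) compose as follows. This standalone sketch uses a reduced Dutch skip-word set and an invented institution name:

```python
import re
import unicodedata

# Reduced Dutch skip-word set (legal forms, articles) for illustration
SKIP_WORDS = {"de", "het", "van", "voor", "en", "stichting"}

def normalize(name: str) -> str:
    """Normalize an institution name for fuzzy matching."""
    name = name.lower()
    # Strip diacritics: decompose, then drop combining marks
    name = unicodedata.normalize("NFD", name)
    name = "".join(c for c in name if unicodedata.category(c) != "Mn")
    # Replace punctuation with spaces
    name = re.sub(r"[^\w\s]", " ", name)
    # Drop skip words and collapse whitespace
    tokens = [t for t in name.split() if t not in SKIP_WORDS]
    return " ".join(tokens)

print(normalize("Stichting Musée van Oudheden"))  # musee oudheden
print(normalize("Het Nationaal Archief!"))        # nationaal archief
```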
### Type Checking
```python
class TypeChecker:
    """Check if a candidate's type matches the expected type."""

    # Wikidata class mappings for GLAMORCUBESFIXPHDNT
    WIKIDATA_TYPE_MAP = {
        "G": ["Q1007870", "Q207694"],   # art gallery, museum of art
        "L": ["Q7075", "Q856234"],      # library, national library
        "A": ["Q166118", "Q2860091"],   # archive, national archive
        "M": ["Q33506", "Q17431399"],   # museum, museum building
        "O": ["Q2659904", "Q327333"],   # government agency, public body
        "R": ["Q31855", "Q7315155"],    # research institute, research center
        "B": ["Q167346", "Q43501"],     # botanical garden, zoo
        "E": ["Q3918", "Q875538"],      # university, public university
        "S": ["Q988108", "Q15911314"],  # historical society, heritage organization
        "H": ["Q16970", "Q839954"],     # church, religious institute
        "D": ["Q35127", "Q856584"],     # website, digital library
    }

    def type_match_score(
        self,
        kb_source: str,
        kb_id: str,
        expected_type: str,
    ) -> float:
        """Score type compatibility."""
        if kb_source == "wikidata":
            return self._wikidata_type_match(kb_id, expected_type)
        elif kb_source == "isil":
            return 0.9  # ISIL implies library/archive type
        elif kb_source == "viaf":
            return 0.8  # VIAF implies organization
        return 0.5  # Unknown

    def _wikidata_type_match(self, qid: str, expected_type: str) -> float:
        """Check if the Wikidata entity's type matches the expected type."""
        expected_classes = self.WIKIDATA_TYPE_MAP.get(expected_type, [])
        if not expected_classes:
            return 0.5

        # Query Wikidata for instance_of
        sparql = f"""
        SELECT ?class WHERE {{
          wd:{qid} wdt:P31/wdt:P279* ?class .
          VALUES ?class {{ {' '.join(f'wd:{c}' for c in expected_classes)} }}
        }}
        LIMIT 1
        """
        results = wikidata_execute_sparql(sparql)
        if results:
            return 1.0  # Direct type match

        # Check for a broader match
        sparql_broad = f"""
        SELECT ?class WHERE {{
          wd:{qid} wdt:P31 ?class .
        }}
        LIMIT 5
        """
        results_broad = wikidata_execute_sparql(sparql_broad)
        if results_broad:
            return 0.6  # Has some type, but not an exact match
        return 0.3  # No type information
```
## Disambiguation Strategies
### Context-Based Disambiguation
```python
class DisambiguationModule(dspy.Module):
    """Disambiguate between multiple candidate matches."""

    def __init__(self):
        super().__init__()
        self.disambiguator = dspy.ChainOfThought(DisambiguationSignature)

    def forward(
        self,
        mention: str,
        candidates: List[Candidate],
        context: str,
    ) -> Candidate:
        # Format candidates for the LLM
        candidate_descriptions = "\n".join([
            f"- {c.kb_source}:{c.kb_id} - {c.name}: {c.description or 'No description'}"
            for c in candidates[:5]  # Top 5
        ])
        result = self.disambiguator(
            mention=mention,
            candidates=candidate_descriptions,
            context=context,
        )

        # Parse the result and find the matching candidate
        selected_id = result.selected_id
        for candidate in candidates:
            if f"{candidate.kb_source}:{candidate.kb_id}" == selected_id:
                return candidate

        # Fall back to the top candidate if parsing fails
        return candidates[0] if candidates else None


class DisambiguationSignature(dspy.Signature):
    """Select the correct entity from candidates.

    Given a mention, multiple candidate matches, and surrounding context,
    determine which candidate is the correct entity reference.

    Consider:
    - Name similarity (exact vs partial match)
    - Type compatibility (is it the right kind of institution?)
    - Location context (does the location match?)
    - Contextual clues (other entities, topics mentioned)
    """

    mention: str = dspy.InputField(desc="Entity mention text")
    candidates: str = dspy.InputField(desc="Formatted candidate list")
    context: str = dspy.InputField(desc="Surrounding text context")
    selected_id: str = dspy.OutputField(desc="Selected candidate ID (format: source:id)")
    reasoning: str = dspy.OutputField(desc="Explanation for selection")
```
### Geographic Disambiguation
```python
class LocationMatcher:
    """Disambiguate entities using location context."""

    def __init__(self):
        self.geonames = GeoNamesClient()

    def location_match_score(
        self,
        candidate: Candidate,
        location_context: str,
    ) -> float:
        """Score location compatibility."""
        # Extract locations from the context
        context_locations = self._extract_locations(location_context)
        if not context_locations:
            return 0.5  # No location to match

        # Get the candidate's location
        candidate_location = self._get_candidate_location(candidate)
        if not candidate_location:
            return 0.5  # No candidate location

        # Compare locations
        for context_loc in context_locations:
            # Same city
            if self._same_city(context_loc, candidate_location):
                return 1.0
            # Same region
            if self._same_region(context_loc, candidate_location):
                return 0.8
            # Same country
            if self._same_country(context_loc, candidate_location):
                return 0.6
        return 0.2  # No location match

    def _get_candidate_location(self, candidate: Candidate) -> Optional[dict]:
        """Get the candidate's location from its KB."""
        if candidate.kb_source == "wikidata":
            sparql = f"""
            SELECT ?city ?country ?coords WHERE {{
              OPTIONAL {{ wd:{candidate.kb_id} wdt:P131 ?city }}
              OPTIONAL {{ wd:{candidate.kb_id} wdt:P17 ?country }}
              OPTIONAL {{ wd:{candidate.kb_id} wdt:P625 ?coords }}
            }}
            LIMIT 1
            """
            results = wikidata_execute_sparql(sparql)
            if results:
                return {
                    "city": results[0].get("city", {}).get("value"),
                    "country": results[0].get("country", {}).get("value"),
                    "coords": results[0].get("coords", {}).get("value"),
                }
        elif candidate.kb_source == "isil":
            # ISIL country from the code prefix
            country_code = candidate.kb_id.split("-")[0]
            return {"country_code": country_code}
        return None
```
## NIL Detection
### NIL Entity Classifier
```python
from datetime import datetime
from typing import Optional, Tuple


class NILDetector:
    """Detect entities with no knowledge base entry (NIL)."""

    def __init__(self, nil_threshold: float = 0.4):
        self.nil_threshold = nil_threshold

    def is_nil(
        self,
        mention: str,
        top_candidate: Optional[Candidate],
        context: str,
    ) -> Tuple[bool, str]:
        """Determine whether a mention refers to a NIL entity.

        Returns:
            (is_nil, reason)
        """
        # No candidates found
        if top_candidate is None:
            return True, "no_candidates_found"

        # Top candidate score below threshold
        if top_candidate.score < self.nil_threshold:
            return True, f"low_confidence_score_{top_candidate.score:.2f}"

        # Name too dissimilar
        name_sim = NameMatcher().similarity(mention, top_candidate.name)
        if name_sim < 0.5:
            return True, f"name_mismatch_{name_sim:.2f}"

        # Type mismatch (if type info available)
        # ...

        return False, "valid_match"

    def create_nil_entity(
        self,
        mention: str,
        entity_type: str,
        context: str,
        provenance: dict,
    ) -> dict:
        """Create a NIL entity record for later KB population."""
        return {
            "mention_text": mention,
            "entity_type": entity_type,
            "context_snippet": context[:500],
            "nil_reason": "no_kb_match",
            "provenance": provenance,
            "created_date": datetime.now().isoformat(),
            "status": "pending_verification",
        }
```
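The decision logic in `is_nil` reduces to a small cascade of threshold checks. A self-contained sketch with hypothetical scores (the pre-computed name similarity stands in for the `NameMatcher` call):

```python
NIL_THRESHOLD = 0.4

def nil_decision(top_score, name_sim):
    """Cascade of NIL checks: no candidate, low score, name mismatch."""
    if top_score is None:
        return True, "no_candidates_found"
    if top_score < NIL_THRESHOLD:
        return True, f"low_confidence_score_{top_score:.2f}"
    if name_sim < 0.5:
        return True, f"name_mismatch_{name_sim:.2f}"
    return False, "valid_match"

print(nil_decision(0.35, 0.9))  # (True, 'low_confidence_score_0.35')
print(nil_decision(0.82, 0.7))  # (False, 'valid_match')
print(nil_decision(None, 0.0))  # (True, 'no_candidates_found')
```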
## Full Entity Linking Pipeline
```python
class EntityLinkingPipeline(dspy.Module):
    """Complete entity linking pipeline."""

    def __init__(self):
        super().__init__()
        self.candidate_generator = CandidateGenerator()
        self.candidate_ranker = CandidateRanker()
        self.disambiguator = DisambiguationModule()
        self.nil_detector = NILDetector()

    def forward(
        self,
        entities: List[dict],  # [{mention, type, context}]
        country_hint: str = None,
    ) -> EntityLinkerOutput:
        linked_entities = []
        nil_entities = []

        for entity in entities:
            mention = entity["mention"]
            entity_type = entity["type"]
            context = entity["context"]

            # 1. Generate candidates
            candidates = self.candidate_generator.generate_candidates(
                mention=mention,
                entity_type=entity_type,
                country_hint=country_hint,
            )
            if not candidates:
                nil_entities.append(mention)
                continue

            # 2. Rank candidates
            ranked = self.candidate_ranker.rank_candidates(
                mention=mention,
                candidates=candidates,
                context=context,
                expected_type=entity_type,
                location_context=country_hint,
            )

            # 3. Disambiguate if needed
            if len(ranked) > 1 and ranked[0].score - ranked[1].score < 0.1:
                # Close scores - need disambiguation
                selected = self.disambiguator(
                    mention=mention,
                    candidates=ranked[:5],
                    context=context,
                )
            else:
                selected = ranked[0]

            # 4. NIL detection
            is_nil, nil_reason = self.nil_detector.is_nil(
                mention=mention,
                top_candidate=selected,
                context=context,
            )
            if is_nil:
                nil_entities.append(mention)
                continue

            # 5. Create the linked entity
            linked_entities.append(LinkedEntity(
                mention_text=mention,
                canonical_name=selected.name,
                kb_id=selected.kb_id,
                kb_source=selected.kb_source,
                confidence=selected.score,
                wikidata_id=selected.kb_id if selected.kb_source == "wikidata" else None,
                viaf_id=selected.viaf,
                isil_code=selected.isil,
                type_match=selected.score > 0.7,
            ))

        return EntityLinkerOutput(
            linked_entities=linked_entities,
            nil_entities=nil_entities,
        )
```
## Confidence Thresholds
| Scenario | Threshold | Action |
|----------|-----------|--------|
| **Exact ISIL match** | 1.0 | Auto-link |
| **Wikidata exact name + type** | ≥0.9 | Auto-link |
| **Fuzzy match, high context** | ≥0.7 | Auto-link |
| **Fuzzy match, low context** | 0.5-0.7 | Flag for review |
| **Low score** | <0.5 | Mark as NIL |
| **No candidates** | 0.0 | Create NIL record |
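The threshold table can be mirrored as a small routing function; the action names are illustrative labels, not identifiers used by the pipeline above:

```python
def routing_action(score: float, exact_isil: bool = False) -> str:
    """Map a confidence score to the action tiers in the table above."""
    if exact_isil:
        return "auto_link"          # Exact ISIL match
    if score >= 0.7:
        return "auto_link"          # High-confidence match
    if score >= 0.5:
        return "flag_for_review"    # Fuzzy match, low context
    if score > 0.0:
        return "mark_nil"           # Below NIL threshold
    return "create_nil_record"      # No candidates at all

print(routing_action(0.92))  # auto_link
print(routing_action(0.55))  # flag_for_review
print(routing_action(0.3))   # mark_nil
```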
## See Also
- [04-entity-extraction.md](./04-entity-extraction.md) - NER patterns and extraction
- [07-sparql-templates.md](./07-sparql-templates.md) - Wikidata SPARQL queries
- [06-retrieval-patterns.md](./06-retrieval-patterns.md) - KG retrieval strategies
- [AGENTS.md](../../AGENTS.md) - Rule 1 (Ontology consultation), Rule 10 (CH-Annotator)