Implement a function to verify if a generated statement contains only entities present in a given knowledge base. Optimize for time and space complexity, and explain how your approach reduces hallucinations.
Interview
How to structure your answer
The approach converts the knowledge base (KB) into a hash set for average O(1) lookups, extracts entities from the statement with named-entity recognition (NER), and validates each entity against the set. This reduces hallucinations by rejecting any statement that mentions an entity absent from the KB. Building the set is a one-time O(k) cost for k KB entries. Each verification is then O(n + m), where n is the text length and m is the number of extracted entities, assuming the NER pass is roughly linear in n. Space complexity is O(k) for the KB set.
Sample answer
The solution stores the knowledge base in a set to enable average constant-time lookups. Entities in the generated statement are extracted using spaCy's NER, which identifies named entities such as PERSON, ORG, and LOC. Each extracted entity is checked against the KB set; if any entity is not found, the statement is rejected as invalid. This reduces hallucinations by filtering out statements that mention entities not in the KB. It is an entity-level check: it confirms that every mentioned entity is known, not that the relationships asserted between those entities are correct. Time complexity is O(n + m), where n is the text length and m is the entity count, assuming the NER pass is roughly linear in n. Space complexity is O(k) for a KB set of k entries. The method avoids scanning the KB for each entity, and it can stop at the first unknown entity, which keeps both time and memory usage low.
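The steps in the sample answer can be sketched as follows. This is a minimal illustration, not a definitive implementation: `naive_extract_entities` is a hypothetical regex stand-in for the NER step (it treats runs of capitalized words as entities), and the function names are assumptions made for this example. In practice the `extract` argument would wrap a trained NER model such as spaCy's.

```python
import re
from typing import Callable, Iterable, List, Set

# Hypothetical stand-in for NER: a run of capitalized words counts as one entity.
# It will also pick up ordinary capitalized words (e.g. "The" at the start of a
# sentence), so a real system should replace it with a trained NER model.
_CAPITALIZED_RUN = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def naive_extract_entities(text: str) -> List[str]:
    """Return candidate entity strings found in the text."""
    return _CAPITALIZED_RUN.findall(text)


def find_unsupported_entities(
    statement: str,
    kb: Set[str],
    extract: Callable[[str], Iterable[str]] = naive_extract_entities,
) -> List[str]:
    """List every extracted entity that is missing from the KB set."""
    return [entity for entity in extract(statement) if entity not in kb]


def is_grounded(
    statement: str,
    kb: Set[str],
    extract: Callable[[str], Iterable[str]] = naive_extract_entities,
) -> bool:
    """True if every extracted entity is in the KB; stops at the first miss."""
    return all(entity in kb for entity in extract(statement))


# Build the set once (O(k)) and reuse it for every statement.
kb = {"Marie Curie", "Paris"}
print(is_grounded("Marie Curie worked in Paris.", kb))
print(find_unsupported_entities("Marie Curie worked in Berlin.", kb))
```

Passing the extractor as a parameter keeps the set-lookup logic testable without a model download, and `find_unsupported_entities` is useful when the caller wants to report which entities triggered the rejection rather than just a boolean.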
Key points to mention
- set data structure optimization
- knowledge base preprocessing
- hallucination reduction through strict entity validation
Common mistakes to avoid
- ✗ using a linear search over the KB instead of a hash-based set
- ✗ skipping case normalization before lookup
- ✗ not handling entity synonyms or aliases
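One way to avoid the last two mistakes is to normalize names before lookup and to index known aliases alongside canonical names. The sketch below is illustrative only: `normalize`, `build_kb_index`, and `resolve` are hypothetical helper names, and the alias table is assumed to be supplied by whoever maintains the KB.

```python
import unicodedata
from typing import Dict, Iterable, Mapping, Optional


def normalize(name: str) -> str:
    """Unicode-normalize, casefold, and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def build_kb_index(
    canonical_names: Iterable[str],
    aliases: Mapping[str, str],
) -> Dict[str, str]:
    """Map each normalized surface form to its canonical KB name.

    `aliases` maps an alias to a canonical name. Aliases that point to a
    name outside the KB are skipped so the check stays strict.
    """
    names = list(canonical_names)
    index = {normalize(name): name for name in names}
    known = set(names)
    for alias, canonical in aliases.items():
        if canonical in known:
            index[normalize(alias)] = canonical
    return index


def resolve(entity: str, index: Mapping[str, str]) -> Optional[str]:
    """Return the canonical name for an entity, or None if it is unknown."""
    return index.get(normalize(entity))


index = build_kb_index(
    ["International Business Machines"],
    {"IBM": "International Business Machines"},
)
print(resolve("ibm", index))
print(resolve("Acme", index))
```

Lookups remain average O(1) because the index is a dictionary. The cost is extra memory proportional to the number of aliases, on top of the O(k) needed for the canonical names.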