LexisNexis Pushes Beyond Standard RAG for Legal AI: Prioritizing Completeness and Reliability Over Accuracy Alone in High-Stakes Domains
In complex fields like law, optimizing AI for accuracy alone is insufficient, according to Min Chen, SVP and chief AI officer at LexisNexis. High-stakes industries demand higher standards: relevancy, authority, citation accuracy, and low hallucination rates. LexisNexis has evolved its approach beyond standard retrieval-augmented generation (RAG) to graph RAG and agentic graphs, and the company has built "planner" and "reflection" AI agents that critically assess their own outputs. Chen holds that "perfect AI" is unattainable, especially in legal domains; the goal is to manage uncertainty well enough to deliver consistent customer value and high-quality AI outcomes. The company evaluates its models with sub-metrics for "usefulness" and "comprehensiveness," checking that responses address every aspect of multi-faceted legal questions, because partial answers, even accurate ones, can be misleading and insufficient.
When developing, training, and deploying AI, enterprises typically prioritize accuracy. Accuracy is undoubtedly important, but in highly complex and nuanced industries such as law it is insufficient on its own. Higher stakes demand higher standards: model outputs must also be evaluated for relevancy, authority, citation accuracy, and hallucination rates. To meet this challenge, LexisNexis has moved beyond standard retrieval-augmented generation (RAG) to graph RAG and agentic graphs. The company has also built "planner" and "reflection" AI agents designed to parse requests and critically assess their own outputs.
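To make that architecture concrete, here is a minimal sketch of how a planner agent, graph-based retrieval, and a reflection agent could fit together. It is illustrative only: every name (plan_steps, retrieve_subgraph, critique) and all of the toy logic are assumptions for this sketch, not LexisNexis's actual implementation.

```python
# Minimal sketch of a planner/reflection loop over graph-based retrieval.
# All names and logic here are hypothetical stand-ins for illustration.

def plan_steps(question: str) -> list[str]:
    """Planner agent: decompose a multi-faceted question into sub-questions.
    A production system would use an LLM; this stub returns a fixed plan."""
    return [f"{question} [aspect {i}]" for i in range(1, 4)]

def retrieve_subgraph(step: str) -> list[str]:
    """Graph RAG: traverse a citation/entity graph around the sub-question
    rather than running a flat chunk-similarity search. Stubbed here."""
    return [f"passage supporting: {step}"]

def critique(evidence: list[str], steps: list[str]) -> list[str]:
    """Reflection agent: return the planned aspects with no supporting
    evidence, i.e. the gaps a complete answer must still cover."""
    return [s for s in steps if not any(s in p for p in evidence)]

def answer_with_reflection(question: str, max_rounds: int = 3) -> str:
    steps = plan_steps(question)
    evidence: list[str] = []
    for _ in range(max_rounds):
        for step in steps:
            evidence.extend(retrieve_subgraph(step))
        gaps = critique(evidence, steps)
        if not gaps:   # reflection confirms every planned aspect is covered
            break
        steps = gaps   # re-plan only the uncovered aspects
    return f"Answer grounded in {len(evidence)} passages: {question}"

print(answer_with_reflection("What are the elements of negligence?"))
```

The design point is the loop: the reflection step measures the draft against the planner's own decomposition, so uncovered aspects trigger targeted re-retrieval rather than a blind retry.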
Min Chen, LexisNexis' SVP and chief AI officer, acknowledged in a recent VentureBeat Beyond the Pilot podcast that "There’s no such [thing] as ‘perfect AI’ because you never get 100% accuracy or 100% relevancy, especially in complex, high stake domains like legal." The primary objective is to manage this inherent uncertainty as effectively as possible and translate it into consistent customer value. Chen stated, "At the end of the day, what matters most for us is the quality of the AI outcome, and that is a continuous journey of experimentation, iteration and improvement."
To achieve "complete" answers to multi-faceted questions, Chen’s team has established more than half a dozen "sub-metrics" that measure "usefulness" based on factors such as authority, citation accuracy, and hallucination rates. A further metric, "comprehensiveness," evaluates whether a generative AI response addresses every aspect of a user's legal question. Chen emphasized, "So it's not just about relevancy. Completeness speaks directly to legal reliability."
For example, a user might ask a question whose answer must cover five distinct legal considerations, and a generative AI system might accurately address only three of them. The response is relevant, but it is incomplete and, from the user's perspective, insufficient; a partial answer of this kind can mislead and create risk.
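Chen's distinction can be made concrete with a toy scorer: "usefulness" rolled up from sub-metrics, and "comprehensiveness" computed separately as coverage of the question's considerations. The metric names come from the article; the equal weighting, the sample scores, and the five considerations are invented for illustration.

```python
# Toy scorer for the evaluation idea described above. Metric names follow
# the article; weights, sample scores, and considerations are invented.

REQUIRED_CONSIDERATIONS = {
    "duty of care", "breach", "causation", "damages",
    "statute of limitations",
}

def comprehensiveness(answer_topics: set[str]) -> float:
    """Fraction of the question's distinct legal considerations addressed."""
    covered = answer_topics & REQUIRED_CONSIDERATIONS
    return len(covered) / len(REQUIRED_CONSIDERATIONS)

def usefulness(sub_metrics: dict[str, float]) -> float:
    """Roll sub-metrics into one score. Equal weighting is a hypothetical
    choice here; hallucination rate counts as a penalty, not a score."""
    return (
        sub_metrics["relevancy"]
        + sub_metrics["authority"]
        + sub_metrics["citation_accuracy"]
        + (1.0 - sub_metrics["hallucination_rate"])
    ) / 4

# A response that accurately covers three of the five considerations:
covered = {"duty of care", "breach", "causation"}
scores = {"relevancy": 0.9, "authority": 0.85,
          "citation_accuracy": 0.95, "hallucination_rate": 0.02}

c = comprehensiveness(covered)   # 0.6 -- relevant but incomplete
u = usefulness(scores)           # high usefulness taken on its own
print(f"usefulness={u:.2f}, comprehensiveness={c:.2f}")
if c < 1.0:
    print("Flag: partial answer -- accurate but potentially misleading.")
```

Note how the two axes diverge: the response scores high on usefulness yet only 0.6 on comprehensiveness, which is exactly the failure mode a relevancy-only evaluation would miss.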