
Generative AI – sifting the opportunity from the hype

A wave of hype is washing over the evidence management technology industry. Every vendor is spruiking its artificial intelligence (AI) credentials and capabilities, reflecting the huge public interest in generative AI, large language models (LLMs), natural language processing, and machine learning.

At EDT, we’ve been using various kinds of AI for more than a decade and we believe this new generation of AI has immense potential. We think these technologies will take some time to deliver real productivity gains, but we’re already working on several promising projects and we envisage a variety of real-world use cases. The successful solutions will be the ones that can be integrated with clients’ evidence management workflows to solve real issues facing the industry.

Broadly, we believe there are three areas where these new technologies may be helpful:

  • Improving our current processes
  • Providing better ways to solve difficult problems
  • Solving problems where older technologies have failed

[Image: a man shouting into a megaphone]
There’s no shortage of shouting about AI but the useful applications may be the quiet achievers. Photo by Sora Shimazaki

Improving our current processes

Evidence management practitioners have used machine learning for many years to help with document review and quality control – under the banner of predictive coding, technology assisted review (TAR), or continuous active learning (CAL). These technologies are well established and accepted (although we sometimes wish our clients used them more often!).

The question is whether LLMs – or other deep learning technologies – might deliver better results than our current approaches or deliver the same results using fewer resources or less time. For example, LLMs may already provide richer, more contextually sensitive representations of text than traditional methods such as Word2vec or term frequency–inverse document frequency (tf–idf). With the right training and prompting, LLMs might become more accurate than TAR or CAL at separating responsive documents from nonresponsive ones or privileged from nonprivileged. They could allow us to train models from just a handful of examples (few-shot learning) that are at least as accurate as supervised learning models trained on thousands of documents.

Based on our research to date, it seems likely LLMs can at least be helpful to supplement traditional approaches in these areas.
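To give a flavour of the kind of comparison involved – and this is a minimal, illustrative sketch rather than EDT’s production pipeline – the snippet below builds a tiny, few-shot responsiveness classifier twice: once on tf–idf features and once on dense embeddings from a small open-source language model. The documents, labels, model name, and the scikit-learn and sentence-transformers packages are all stand-ins chosen for illustration.

```python
# A minimal sketch, not EDT's production pipeline: compare a tf-idf baseline
# with dense embeddings from a small pre-trained language model on a tiny,
# few-shot "responsiveness" task. Documents and labels are invented; the
# scikit-learn and sentence-transformers packages are open-source stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer

# A handful of labelled examples (1 = responsive, 0 = not responsive).
train_docs = [
    "Attached is the signed supply agreement for the Acme project.",
    "Lunch menu for the office party next Friday.",
    "Invoice dispute: Acme claims the delivery was short by 40 units.",
    "Reminder: the car park will be closed for resurfacing.",
]
train_labels = [1, 0, 1, 0]
new_docs = ["Acme has threatened to terminate the supply agreement."]

# Baseline: sparse tf-idf features, no context beyond word counts.
tfidf = TfidfVectorizer().fit(train_docs)
baseline = LogisticRegression().fit(tfidf.transform(train_docs), train_labels)
print("tf-idf prediction:", baseline.predict(tfidf.transform(new_docs)))

# Alternative: dense embeddings carry pre-trained knowledge of language,
# so the classifier has more to work with from the same four examples.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embedding_clf = LogisticRegression().fit(encoder.encode(train_docs), train_labels)
print("embedding prediction:", embedding_clf.predict(encoder.encode(new_docs)))
```

With only four labelled examples the tf–idf baseline has very little to go on, whereas the embedding model brings pre-trained knowledge of language to the task – which is exactly the few-shot advantage described above.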

Providing better ways to solve difficult problems

Next, let’s consider a range of tasks where evidence management technologies provide a solution, but not a very good one. These include:

  • Extracting facts and topics from documents. This streamlines early case assessment by allowing lawyers and investigators to understand the key facts of a case before they move on to costly document review. This capability may also be useful for quality control, helping to check whether anything has been missed during review.
  • Using context to recognise named entities, private information, and aliases. This could provide greater accuracy and reduce the number of false positives compared with the traditional approach of matching letter-and-number patterns with regular expressions.
  • Detecting boilerplate text and email signatures based on their content rather than pre-defined blocks of text.

Our testing has shown LLMs and few-shot learning approaches deliver very promising results for all these tasks and potentially many more.
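As a toy illustration of the second point in the list above – patterns versus context – the sketch below matches anything shaped like a US Social Security number with a regular expression, then runs a small contextual model over the same invented text. spaCy is used here purely as a stand-in for a context-aware model; it isn’t what EDT ships, and the same idea scales up with LLMs.

```python
# A toy contrast between pattern matching and context-aware recognition,
# using invented text. The regex flags anything shaped like a US Social
# Security number, including false positives; the entity pass shows how a
# model uses surrounding words to work out what a span actually refers to.
# spaCy stands in for a contextual model here and assumes its small English
# model has been downloaded: python -m spacy download en_core_web_sm
import re
import spacy

text = (
    "Invoice 123-45-6789 was paid by Jane Smith of Initech on 3 March. "
    "Her SSN 987-65-4321 should be redacted before production."
)

# Pattern-only approach: both numbers match, but only one is an SSN.
ssn_pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
print("regex hits:", ssn_pattern.findall(text))

# Context-aware approach: people, organisations, and dates recognised from
# the words around them rather than from their shape.
nlp = spacy.load("en_core_web_sm")
for ent in nlp(text).ents:
    print(ent.text, "->", ent.label_)
```

The regex flags both numbers even though only one is sensitive; using the surrounding words to work out what each span refers to is what cuts down the false positives.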

Solving problems where older technologies have failed

There are several evidence management tasks that pre-AI technologies have done very badly – in some cases we may not even have considered them tractable. These include:

  • Identifying complex entities such as clauses in contracts or individuals using multiple contact methods and aliases
  • Summarising documents and creating chronologies or briefs of evidence from those summaries
  • Interrogating data sets by asking natural-language questions rather than using search terms and syntax.

Many of these tasks can be solved using LLMs.
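To make the last item on that list concrete, here’s a minimal sketch of the idea behind natural-language interrogation: rather than crafting search terms and operators, rank documents by how semantically close they are to a plain-English question. It reuses the same open-source embedding model as the earlier sketches with invented example data, and is illustrative only.

```python
# A minimal illustration of natural-language interrogation: instead of search
# terms and operators, rank documents by how semantically close they are to a
# plain-English question. The documents and question are invented; a real
# system would typically hand the top-ranked passages to an LLM to draft an
# answer, with citations back to the source documents.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Board minutes, 12 June: directors approved the asset sale to Initech.",
    "Email from the CFO: the Q3 forecast assumes the Initech deal closes in August.",
    "Facilities memo: new keycards will be issued to all staff next week.",
]
question = "When was the sale to Initech expected to complete?"

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = encoder.encode(documents, convert_to_tensor=True)
query_vector = encoder.encode(question, convert_to_tensor=True)

# Cosine similarity between the question and every document, highest first.
scores = util.cos_sim(query_vector, doc_vectors)[0]
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.2f}  {doc}")
```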

Several of these capabilities are emergent properties of LLMs: they weren’t foreshadowed by earlier generations of models and were surprising when they appeared. LLMs have repeatedly revealed powerful new abilities that couldn’t reasonably have been predicted.

Added to this, the technology is evolving rapidly as the largest tech organisations spend eye-watering amounts of money on developing it. Just because one technology is dominant now doesn’t mean it will still be the leader in 12 months.

Where to from here?

EDT is actively researching new AI capabilities and, as we go, we see more potential uses emerging for these technologies. We believe the key to success is ensuring we stay agile and able to adapt to this ever-changing landscape.

We’ll also need to reassure our clients that their sensitive and confidential data stays that way. Many of them would be uncomfortable using documents from an upcoming court case or current investigation as prompts for a public generative AI model, despite the vendors’ assurances that they don’t retain prompts or use them to train their models. But will the cost of developing and training a private AI model be justified by the ultimate benefit?

Like the rest of the world, we’re extremely excited about the capabilities generative AI and LLMs could bring to our industry. We’re even more excited about what these technologies mean for our clients and how they’ll use EDT in the future – stay tuned for more news!


Dr Paul Hunter, Chief Data Scientist, EDT


Phil Smith, Head of Client Experience, EDT

Paul drives EDT’s product innovation, pushing the envelope towards best practice in machine learning, artificial intelligence, and natural language processing. He is a PhD-qualified mathematician who played a key role in the design, development, and popularisation of predictive coding in Australia and the United States.

Phil’s multifaceted role at EDT brings our clients’ perspectives to product management, data science, consulting, and business process innovation. Phil has managed complex multijurisdictional data challenges for investigations, data breaches, and regulatory responses.