3/04/23 2:00 PM

Can statistics make you a better regulatory investigator?

Dr Paul Hunter, Chief Data Scientist, EDT

A lot of regulatory investigators are lawyers or come from a legal background. I would argue that a knowledge of mathematics, specifically statistics, can be as valuable as legal expertise. As the stereotype goes, however, lawyers are terrible at maths.

“I think there are a lot of people who go to law school because they are not good at math and can’t think of anything else to do,” US Chief Justice John Roberts told an audience of law students at Rice University. On the other hand, a young legal clerk named Abraham Lincoln reportedly found the study of mathematical reasoning helpful in understanding legal reasoning.

[Image: computer screen showing a dashboard with various metrics in charts. Caption: Managing data sets is all about the numbers. Photo by Stephen Dawson]

Maths makes data sets manageable

There are many different techniques you can use to make large volumes of digital evidence more manageable – from simple keyword searches and date ranges to sophisticated machine learning models. If you know how statistical sampling works, it can help you make the most of these techniques. Sampling also gives you an arsenal of new methods to manage and make decisions about digital evidence.

If you are, stereotypically, bad at maths, you may find this challenging. Some aspects of statistics are counterintuitive, such as the fact that the same sample size can be equally useful whether you’re dealing with 100,000 items or 100 million.
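To see why population size matters so little, here is a minimal sketch of one common textbook approach: the sample size formula for estimating a proportion, with a finite population correction. The function name, the 2% margin of error, the 95% confidence level and the worst-case proportion of 0.5 are all illustrative assumptions, not figures from the white paper.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin=0.02, confidence=0.95, p=0.5):
    """Items to sample to estimate a proportion within +/- margin.

    Hypothetical helper for illustration. p=0.5 is the worst case
    (it gives the largest sample), used when the true proportion is unknown.
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction: shrinks the sample slightly for small sets
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for population in (100_000, 100_000_000):
    size = required_sample_size(population)
    print(f"{population:>11,} items -> sample of {size:,}")
```

Under these assumptions the two sample sizes differ by only a few dozen items (roughly 2,345 versus 2,401), even though one data set is a thousand times larger than the other.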

You may also be uncomfortable with the idea of uncertainty – because statistical sampling can only ever estimate the proportions of a larger set of items. That means sometimes you will be wrong. However, the advantage of sampling is you can calculate how likely it is that you’re wrong and how wrong you might be.
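As a rough illustration of both ideas, the sketch below estimates a proportion from a random sample and attaches a margin of error. It uses the simple normal approximation, which is only one of several possible methods and works best when the sample is large and the proportion is not close to 0 or 1. The function name and the example counts (480 relevant items in a sample of 2,401) are made up for the purpose of the example.

```python
import math
from statistics import NormalDist


def estimate_proportion(sample_size, hits, confidence=0.95):
    """Return (estimated proportion, margin of error) for a random sample.

    Hypothetical helper for illustration, using the normal approximation.
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = hits / sample_size
    # "How wrong you might be": the half-width of the confidence interval
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin


# Made-up example: 480 of 2,401 sampled items were judged relevant
p_hat, margin = estimate_proportion(sample_size=2401, hits=480)
low, high = max(0.0, p_hat - margin), min(1.0, p_hat + margin)
print(f"Estimated proportion: {p_hat:.1%} (+/- {margin:.1%})")
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```

Here the confidence level answers "how likely is it that you're wrong" (at 95%, about one sample in twenty will produce an interval that misses the true proportion), while the margin of error answers "how wrong might you be".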

Making informed decisions about evidence

Our white paper, The art of elusion: How statistical sampling techniques can help regulators manage a deluge of data, gives you a primer on statistical sampling. We start with the basics of sampling techniques and progress to the complexities of calculating error margins and balancing the sample size with your appetite for uncertainty. There’s a bit of maths involved but we explain it with coloured tennis balls and diagrams to make it easier to get your head around.

Once you understand the basics, we’ll show you how you can apply that knowledge to:

Understand the composition of a data set, such as the proportion of responsive or relevant items in the produced data and the coverage of responsive material
Decide how much data or how many items you need to review before making an informed decision
Use machine learning techniques such as technology assisted review (TAR) and continuous active learning (CAL) to minimise the data you need to review
Know when to adjust your criteria (for example, if a large proportion of the material you receive isn’t relevant to your investigation) and when to challenge the completeness of a production (for instance, if you believe potentially responsive data was left behind).