Fraud Detection By The Numbers
Mary mentioned our new fraud detection capabilities in her last post. Our primary fraud detection mechanism uses what is known as Benford’s law. Benford’s law, also known as the first-digit law, is an observation about the leading digits of numbers in many real-world data sets: they are not evenly distributed, and small digits such as 1 lead far more often than large ones. A fraud check based on it tests whether the leading digits in a large group of numbers, or in a randomly selected subset of that group, match the expected probability for each digit.
Benford’s law is powerful, but you have to be careful that your problem really fits within its constraints. It works best on:
- Highly variable numeric data that spans several orders of magnitude (such as stock prices, global sales figures, or tax returns, but not IQs, body weights, or most things that follow a normal distribution)
- Data that is truly numeric and not an identifier (for example, a price versus a Social Security number)
- Large data sets (if sampling a larger set, make sure to use a truly random sample)
Also, even if your data fits these criteria, you need to remember that a mismatch with Benford’s law is only an indication that there might be fraud, not proof that there is fraud.
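The criteria above can be turned into a rough pre-check. The following is a minimal sketch, not how PAF does it: the function name `benford_suitability_problems` and both thresholds (`min_count`, `min_orders_of_magnitude`) are hypothetical and chosen only for illustration. It cannot tell whether a column is an identifier rather than a true quantity, so that judgment stays with you.

```python
import math

def benford_suitability_problems(values, min_count=500, min_orders_of_magnitude=2.0):
    """Return a list of reasons the data may be a poor fit for Benford's law.

    An empty list means no red flags were found. Both thresholds are
    illustrative assumptions, not established cut-offs.
    """
    magnitudes = [abs(v) for v in values if v != 0]
    problems = []
    # Criterion: large data sets.
    if len(magnitudes) < min_count:
        problems.append(f"only {len(magnitudes)} non-zero values (wanted {min_count}+)")
    # Criterion: highly variable data, measured here as the number of
    # orders of magnitude between the smallest and largest value.
    if magnitudes:
        span = math.log10(max(magnitudes) / min(magnitudes))
        if span < min_orders_of_magnitude:
            problems.append(f"values span only {span:.2f} orders of magnitude")
    return problems

# IQ-like scores: few values, narrow range, so both checks complain.
iq_like = list(range(85, 116))
# Synthetic values spread evenly across roughly six orders of magnitude.
wide = [10 ** (k / 100) for k in range(600)]

print(benford_suitability_problems(iq_like))
print(benford_suitability_problems(wide))
```

Passing this check does not make the data a good candidate; it only rules out the most obvious mismatches.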
How It Works
For a detailed explanation, see this Wikipedia article or this great book from O’Reilly: Statistics Hacks. The short version is that, for data meeting the above requirements, it has been experimentally determined that the first non-zero digit takes each value with the following probability:
First-digit probabilities under Benford’s law
|First digit|Probability according to Benford’s law|
|1|30.1%|
|2|17.6%|
|3|12.5%|
|4|9.7%|
|5|7.9%|
|6|6.7%|
|7|5.8%|
|8|5.1%|
|9|4.6%|
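The probabilities in this table are also captured by a closed-form expression, which is how Benford’s law is usually stated. For a first digit d:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d \in \{1, 2, \dots, 9\}
```

For example, P(1) = log10(2) ≈ 0.301 and P(9) = log10(10/9) ≈ 0.046, so a leading 1 is expected about six and a half times as often as a leading 9.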
To use Benford’s law for fraud detection, you calculate the relative frequency of each first digit across the numbers in your data set and compare those frequencies to the table above. Large discrepancies mean that your data should be viewed with some skepticism. Benford’s law has been accepted as evidence in US courts of law and has become popular lately with the IRS, SEC, and forensic accountants. Accountants tend to refer to Benford’s law as digital frequency analysis. Here is an example using U.S. tax data from author T.P. Hill.
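The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not PAF’s implementation: the helper names are made up for this example, and the Pearson chi-square statistic is just one reasonable way (my choice, not something the sources above prescribe) to put a number on "large discrepancies".

```python
import math
from collections import Counter

# Expected first-digit probabilities: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Return the first non-zero digit of x, or None for zero or non-finite input."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation always starts with the leading significant digit,
    # e.g. 0.0042 -> '4.200000000000000e-03'.
    return int(f"{x:.15e}"[0])

def first_digit_frequencies(values):
    """Return ({digit: relative frequency}, number of usable values)."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    n = len(digits)
    return {d: counts[d] / n for d in range(1, 10)}, n

def chi_square(freqs, n):
    """Pearson chi-square statistic of observed vs. Benford first-digit counts."""
    return sum(
        (freqs[d] * n - BENFORD[d] * n) ** 2 / (BENFORD[d] * n)
        for d in range(1, 10)
    )

# Powers of 2 are a classic Benford-like sequence; the integers 100..999
# have perfectly even first digits and so look nothing like the table.
powers = [2 ** k for k in range(1, 501)]
uniform = list(range(100, 1000))

for name, data in (("powers of 2", powers), ("100..999", uniform)):
    freqs, n = first_digit_frequencies(data)
    print(name, round(chi_square(freqs, n), 1))
```

With nine digit categories the statistic has 8 degrees of freedom, so a value above roughly 15.5 would be unusual at the 5% level if the data really followed Benford’s law. As with the law itself, a high value is a reason to look closer, not a verdict.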
To give you an idea of how universally Benford’s law applies, take a look at this great graphic from this ANU research paper, which shows how closely non-fraudulent, naturally occurring data matches the expected Benford values.
In the next version of PAF, you can apply Benford’s law to any time series data that we track, with a single click, for the first two non-zero digits. PAF will also warn you if the data set is not a good candidate for Benford’s law. We will be putting up some videos of this and some other new features once we finish tweaking the UI based on some of our beta feedback. It is pretty cool: for example, it spotted that some of our unit test data was fake. Happy fraud detection!
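For the curious, here is a rough sketch of what a first-two-digits check involves. It is not PAF’s code, and the function names are hypothetical. It relies on the standard generalization of Benford’s law: the probability that a number starts with the two-digit block n (from 10 to 99) is log10(1 + 1/n).

```python
import math

def first_two_digits(x):
    """Return the first two significant digits of x as an int in 10..99.

    Returns None for zero or non-finite input. A single-digit value such
    as 7 is treated as 7.0 and yields 70.
    """
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # '4.200000000000000e-03' -> characters 0 and 2 are the two leading digits.
    s = f"{x:.15e}"
    return int(s[0] + s[2])

def benford_two_digit_expected():
    """Expected probability for each leading two-digit block 10..99."""
    return {n: math.log10(1 + 1 / n) for n in range(10, 100)}

expected = benford_two_digit_expected()
print(first_two_digits(0.0042), first_two_digits(1999))
print(round(expected[10], 4), round(expected[99], 4))
```

With 90 categories instead of 9, the two-digit version needs considerably more data before observed frequencies can be compared meaningfully.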