Fraud Detection By The Numbers

February 21, 2011 at 7:39 am

By Terence Craig


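Mary mentioned our new fraud detection capabilities in her last post. Our primary fraud detection mechanism uses what is known as Benford’s law. Benford’s law, also known as the first-digit law, is a neat little observation about many large collections of real-world numbers: their leading digits are not evenly distributed, and small digits such as 1 show up in first position far more often than large digits such as 9. A fraud check based on it tests whether the leading digits in a data set (or in a randomly selected subset of it) match the expected probability for each digit.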

While Benford’s law is powerful, you have to be careful that your problem really fits within its constraints. It works best on:

  • Highly variable numeric data that spans several orders of magnitude, such as stock prices, global sales figures, and tax returns (but not IQs, body weights, or most things that follow a normal distribution)
  • Data that is truly numeric and not an identifier (for example, a price versus a Social Security number)
  • Large data sets (if sampling a larger set, make sure to use a truly random sample)

Also, even if your data fits these criteria, remember that a mismatch with Benford’s law is only an indication that there might be fraud, not proof that there is fraud.
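The criteria above can be partly approximated with a rough pre-check. The sketch below is only an illustration: the function name and both thresholds (`min_count`, `min_orders_of_magnitude`) are assumptions made up for this example, not established cutoffs and not how PAF does it. It also cannot tell a real quantity from an identifier, so that judgment stays with you.

```python
import math


def looks_like_benford_candidate(values, min_count=1000, min_orders_of_magnitude=2.0):
    """Rough heuristic: is the data set large and spread out enough?

    Both thresholds are illustrative assumptions, not standard values.
    """
    # Zeros have no leading non-zero digit, so they are ignored here.
    magnitudes = [abs(v) for v in values if v != 0]
    if len(magnitudes) < min_count:
        return False
    # Number of orders of magnitude between the smallest and largest value.
    spread = math.log10(max(magnitudes) / min(magnitudes))
    return spread >= min_orders_of_magnitude


# Hypothetical data: values spread from 1 up to 900,000.
wide = [10 ** (k % 6) * (1 + k % 9) for k in range(2000)]
# Hypothetical data: tightly clustered values, similar to IQ scores.
narrow = [85 + k % 30 for k in range(2000)]

print(looks_like_benford_candidate(wide))    # True
print(looks_like_benford_candidate(narrow))  # False
```

Passing this check says nothing about fraud. It only suggests that a first-digit comparison is worth running.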

How It Works

For a detailed explanation, see this Wikipedia article or this great book from O’Reilly: Statistics Hacks. The shorthand version is that, for data meeting the above requirements, the probability that the first non-zero digit takes a given value was first observed empirically and follows a logarithmic distribution:

First-digit probabilities under Benford’s law

First digit   Probability
1             0.301
2             0.176
3             0.125
4             0.097
5             0.079
6             0.067
7             0.058
8             0.051
9             0.046
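These values do not have to be memorized. They follow from the standard closed form for Benford’s law, P(d) = log10(1 + 1/d), for a first digit d from 1 to 9. A minimal sketch that reproduces the table (the function name is just for illustration):

```python
import math


def benford_probability(digit):
    """Expected probability that the first non-zero digit equals `digit`."""
    if digit not in range(1, 10):
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

The nine probabilities sum to 1, since the terms telescope to log10(10).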

To use Benford’s law for fraud detection, you simply calculate the relative frequency of each first digit across your data set and compare those frequencies to the table above. Large discrepancies mean that your data should be viewed with some skepticism. Benford’s law has been accepted as evidence in US courts of law and has become popular lately with the IRS, SEC, and forensic accountants. Accountants tend to refer to Benford’s law as digital frequency analysis. Here is an example using U.S. tax data from author T.P. Hill.



To give you an idea of how widely Benford’s law applies, take a look at this great graphic from this ANU research paper, which shows how closely non-fraudulent, naturally occurring data matches the expected Benford values.
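The first-digit comparison described earlier can be sketched in a few lines. This is not taken from the Hill example or the ANU paper, and it is not PAF’s implementation. The function names are hypothetical, and how big a deviation counts as "large" is left open, because that threshold is a judgment call.

```python
import math
from collections import Counter


def first_digit(value):
    """Return the first non-zero digit of a number, or None if it has none."""
    # repr() handles ints, decimals like 0.0042, and exponents like 1e-05;
    # the first character in 1-9 is always the leading significant digit.
    for ch in repr(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_frequencies(values):
    """Relative frequency of each first digit (1-9) in `values`."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyze")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_deviation(values):
    """Observed minus expected frequency for each first digit."""
    observed = first_digit_frequencies(values)
    return {d: observed[d] - math.log10(1 + 1 / d) for d in range(1, 10)}


# Powers of two are a classic sequence that tracks Benford's law closely.
natural = [2 ** k for k in range(1, 1001)]
# Every three-digit number once: first digits are evenly spread, unlike Benford.
suspicious = list(range(100, 1000))

for name, data in (("natural", natural), ("suspicious", suspicious)):
    worst = max(abs(x) for x in benford_deviation(data).values())
    print(name, round(worst, 3))
```

The evenly spread data shows a far larger worst-case deviation than the powers of two, which is the kind of gap that should prompt a closer look.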

In the next version of PAF, you will be able to apply Benford’s law, with a single click, to the first two non-zero digits of any time series data that we track. PAF will also warn you if the data set is not a good candidate for Benford’s law. We will be putting up some videos of this and some other new features once we finish tweaking the UI based on some of our beta feedback. It is pretty cool: for example, it spotted that some of our unit test data was fake. Happy fraud detection!
