Take a look at a natural data set, e.g. the populations of all the counties in the United States.

*Figure 1 – County Map*

There are over 3,000 counties. Among their population counts, the first digit is 1 about 30% of the time, 2 about 18% of the time, and 9 only about 5% of the time.

The digit 1 shows up as a first digit about 6 times as often as 9. Amazing! This phenomenon is called Benford’s law.

*Figure 3 – Benford’s Law*
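Benford's law comes with an explicit formula: the predicted probability that the first digit is d is log10(1 + 1/d), for d = 1, 2, ..., 9. As a quick check against the county percentages above, here is a minimal Python sketch that tabulates these predictions (the function name `benford_probabilities` is just for illustration):

```python
import math


def benford_probabilities() -> dict[int, float]:
    """Predicted first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, p in benford_probabilities().items():
        print(f"{digit}: {p:.1%}")
```

It prints 30.1% for 1, 17.6% for 2 and 4.6% for 9, close to the county figures. The nine probabilities add up to 1, since log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10).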

This phenomenon shows up in many other data sets. Besides population data, it appears in geographical data (e.g. areas of rivers), baseball statistics, numbers in magazine articles, street addresses, house prices and stock market trading data. Benford’s law also has practical applications in business because it shows up in corporate financial statements.

Let’s take a look at the annual financial statements of Google from 2013 to 2016. This is the summary of the first digits in these financial statements.

*Figure 4 – First Digits in Google’s Financial Statements*

Looks very similar to Benford’s law, right? Here is a side-by-side comparison with Benford’s law.

*Figure 5 – Side by Side Comparison of Google and Benford*

Overall, the first digits in Google’s financial statements agree well with the percentages predicted by Benford’s law. There are some noticeable differences. For example, the first digit 1 shows up about 35% of the time in Google’s statements versus 30% according to Benford’s law. In this case, however, the differences are not statistically significant, according to a statistical test called the chi-squared test.
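The post does not show the computation, so here is a minimal sketch of a chi-squared goodness-of-fit test against Benford's law in plain Python, assuming you already have the number of line items for each first digit. With nine possible first digits there are 9 − 1 = 8 degrees of freedom, and the sketch hard-codes the standard 5% critical value for 8 degrees of freedom (about 15.51). The 5% level and the function names are assumptions for illustration, not necessarily what was used for the Google comparison.

```python
import math

# Assumption: 5% significance level. 15.507 is the standard table value of the
# 95th percentile of the chi-squared distribution with 8 degrees of freedom (9 digits - 1).
CRITICAL_VALUE_8DF_5PCT = 15.507


def benford_chi_squared(counts: dict[int, int]) -> float:
    """Chi-squared statistic comparing observed first-digit counts with Benford's law."""
    n = sum(counts.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("need at least one observation")
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford's predicted count for digit d
        observed = counts.get(d, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


def consistent_with_benford(counts: dict[int, int]) -> bool:
    """True if the deviation from Benford's law is not significant at the 5% level."""
    return benford_chi_squared(counts) <= CRITICAL_VALUE_8DF_5PCT
```

A statistic above the critical value means the gap from Benford's law is too large to blame on chance at the 5% level. The test is only reliable when every expected count is reasonably large; a common rule of thumb is at least 5 per digit.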

That real financial statements such as Google’s follow Benford’s law this closely is what makes Benford’s law a powerful tool for detecting financial fraud.

How do you use Benford’s law? Compare the actual frequencies of the first digits in a set of financial statements with the frequencies predicted by Benford’s law. Someone who fakes financial data at random is unlikely to produce data that look convincing. Even if the faked first digits are not spread out uniformly, too large a discrepancy between the actual first digits and Benford’s law (e.g. too few 1’s or too many 7’s, 8’s and 9’s) raises a giant red flag, at which point the investigator can turn to more sophisticated tests for further evaluation.
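To illustrate why random fakes stand out, here is a minimal Python sketch that extracts first digits and reports how far their shares fall from Benford's predictions. It assumes the amounts are integers (e.g. line items in thousands of dollars), counts negative amounts by their absolute value, and skips zeros; the function names and the simulated "fabricated" data are hypothetical, for illustration only.

```python
import math
import random
from collections import Counter


def first_digit_shares(amounts: list[int]) -> dict[int, float]:
    """Share of nonzero amounts whose leading digit is d, for d = 1..9."""
    # The sign is ignored and zeros are skipped (zero has no leading nonzero digit).
    digits = [int(str(abs(a))[0]) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def deviations_from_benford(amounts: list[int]) -> dict[int, float]:
    """Observed share minus Benford's predicted share log10(1 + 1/d), per leading digit."""
    observed = first_digit_shares(amounts)
    return {d: observed[d] - math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical fabricated amounts drawn uniformly from 1..9,999. Exactly 1,111 of
    # these integers start with each digit, so every leading digit is equally likely.
    rng = random.Random(42)
    fake = [rng.randint(1, 9999) for _ in range(5000)]
    for digit, diff in deviations_from_benford(fake).items():
        print(f"{digit}: {diff:+.1%}")
```

With amounts made up uniformly between 1 and 9,999, each first digit shows up about 11.1% of the time, so the output shows roughly 19 percentage points too few 1’s and roughly 5 to 7 points too many 7’s, 8’s and 9’s each: the kind of discrepancy that raises a red flag.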

Remember this guy?

*Figure 6 – Bernie Madoff’s Mug Shot*

Bernie Madoff ran what was most likely the largest Ponzi scheme in history. His operation had a constant need to make up numbers in order to keep up the appearance of legitimate investing. One study showed that the first digits of the monthly returns over a 215-month period did not conform to Benford’s law. So Bernie Madoff could have been caught a lot sooner if auditors and regulators had been willing to look more closely.

Whether or not the financial data in question are close to Benford’s law does not, by itself, prove anything. But too large a discrepancy should raise suspicion, and the investigator can then dig further using more sophisticated methods.

There are other applications in addition to fraud detection and forensic accounting. Benford’s law can be used to detect changes in natural processes (e.g. earthquake detection) and as a tool to assess the appropriateness of mathematical models (e.g. assessing whether projected population counts make sense).

This post is an abbreviated version of an article on Benford’s law in an affiliated blog.

________________________________________________

2017 – Dan Ma