Demystifying Benford’s Law

While Benford’s law (a.k.a. the first-digit law) is old hat for those in the mathematics posse, and has long been demystified, it’s seeing increasingly frequent references in the online world: from blog subscriber counts, to advice for tax cheats (“make sure to distribute those numbers appropriately!”), to claims that it’s a magic technique for detecting real or fake sequences of dice rolls (dubious). It’s being portrayed as an infallible method of numerical omniscience, applicable anywhere that sets of numbers can be found.

There’s a lot of truth out there, but there’s also a lot of mistruth. So after seeing yet another incorrect application of Benford’s law (where again it was presumed to magically apply to all number sets), I thought it worth throwing a quick entry together, adding in a little scripting goodness to demonstrate the point (the scripted section may not work in some aggregators and readers). This doesn’t really relate to the normal subject matter of this blog, but hopefully it’s interesting to people regardless.

I should add the warning that I am not a mathematician, and my interest in this subject arose only in passing several years back. It was then that I caught a television program featuring a pundit describing a technique he was advocating to catch fraudulent tax returns. By analyzing the distribution of leading digits on tax return rows, he claimed, they could accurately predict where numbers were artificially generated, and conversely where they were real.

The argument that he was proposing, and the cursory information I then found about this law, struck me as remarkably unintuitive (at the time, though now it seems embarrassingly obvious), so I spent a little time thinking about how this sort of numeric distribution comes about.

Leading Digit Chart

What I learned then was that the “law” predicts that roughly 30% of the numbers in certain sets of data (in particular those with a logarithmic distribution, which will be discussed later) begin with the digit 1, with decreasing frequency for each subsequent digit. Numbers beginning with a “9”, for example, make up only about 4.6% of a set conforming to the law. Of course, this is all in regard to base-10 numbers.
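For reference, the usual statement of the law gives the expected share of leading digit d as log10(1 + 1/d). The formula is not spelled out in this entry, so treat the snippet below as a small sketch for producing the predicted table; the function name `benford_share` is just an illustrative choice.

```python
import math

def benford_share(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (base 10)."""
    return math.log10(1 + 1 / digit)

# Print the predicted proportion for each leading digit, 1 through 9.
for digit in range(1, 10):
    print(f"{digit}: {benford_share(digit):.1%}")
```

The nine shares add up to 1, since every positive number has exactly one leading digit.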

Purportedly the first known inklings of the law were described when Simon Newcomb, a Nova Scotia-born astronomer, noticed that certain pages of a logarithm book had far more wear than other pages, indicating that certain values were looked up more often than others.

The reason for the uneven lookup wear became evident on further analysis: if one were to accumulate a vast reservoir of data on the populations of cities, the prices of menu items, and so on, the eerie presence of Benford’s law would become evident, seemingly against common wisdom. Where one would expect the leading digits to be spread evenly across the spectrum, the predicted leading-digit distribution held true instead.

The following is a demonstration of Benford’s Law materialized, with zero magic or alien intervention. Simply choose the settings (the defaults should be fine) and then click on “Initialize Random Set”. This will give you a set of randomly distributed numbers between 0 and the max random number chosen. The table will display the prevalence of leading digits.

Thus far the numbers should be randomly distributed, risking a ticket from a Benford’s law enforcement officer. Of course, random or linearly distributed numbers aren’t expected to conform to Benford’s law, so that’s no surprise.
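For readers whose aggregator strips the script, here is a rough offline stand-in for the “Initialize Random Set” step. It is a sketch under my own assumptions (floating-point values, a random max of 1000, 10,000 values), not the script behind the demo, and the helper names are made up for illustration.

```python
import random
from collections import Counter

def leading_digit(value: float) -> int:
    """First non-zero digit of a positive number."""
    # Scientific notation always starts with the leading digit,
    # e.g. 0.0042 is formatted as "4.2...e-03".
    return int(f"{value:.16e}"[0])

def digit_proportions(values):
    """Map each leading digit 1..9 to its share of the positive values."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

random.seed(1)  # fixed seed so repeated runs give the same set
values = [random.uniform(0, 1000) for _ in range(10_000)]
proportions = digit_proportions(values)
for digit, share in proportions.items():
    print(f"{digit}: {share:.1%}")
```

With a random max that is a power of ten, each digit should come out at roughly one ninth of the set, which is nowhere near the Benford proportions.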

Now click on the “Inflation / Deflation!” button, which will randomly scale each value in the set to anywhere from 25% to 225% of its original value on each press.

Almost immediately the distribution will start to mirror Benford’s Law. At most you might require two or three iterations until it accurately conforms.

Try it with a random starting max of 5 (thereby making the initial set only possibly contain the starting digits from 1-5) and then start scaling. Does Benford’s Law appear?
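The scaling experiment can also be approximated in a few lines of Python. This is a minimal sketch that assumes each press multiplies every value by its own uniformly random factor between 0.25 and 2.25; the function names, the set size, and the choice of five presses are my assumptions, not details of the demo script.

```python
import math
import random
from collections import Counter

def leading_digit(value: float) -> int:
    """First non-zero digit of a positive number."""
    return int(f"{value:.16e}"[0])

def digit_proportions(values):
    """Map each leading digit 1..9 to its share of the positive values."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

def inflate_deflate(values, low=0.25, high=2.25):
    """Scale every value by its own random factor between low and high."""
    return [v * random.uniform(low, high) for v in values]

def simulate(random_max, count=10_000, rounds=5):
    """Build a uniform random set, then apply several scaling rounds."""
    values = [random.uniform(0, random_max) for _ in range(count)]
    for _ in range(rounds):
        values = inflate_deflate(values)
    return digit_proportions(values)

random.seed(2)
for random_max in (1000, 5):
    shares = simulate(random_max)
    print(f"random max {random_max}")
    for d in range(1, 10):
        print(f"  {d}: {shares[d]:.1%} (Benford: {math.log10(1 + 1 / d):.1%})")
```

Changing `rounds` to 1, 2, or 3 gives a feel for how quickly the proportions settle toward the predicted values.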

Benford’s Law Demonstration

[Interactive demonstration: inputs for “Number of Random Values” and “Random Max”, and a results table listing the Count and Proportion for each leading digit from 1 to 9.]

The explanation is simple and obvious once described: to go from 1 to 2, a number has to appreciate by 100%, whereas to go from 2 to 3 it would only have to appreciate by 50%. To go from 3 to 4 requires only a 33% increase.

This might seem irrelevant, as from a purely additive sense each step is the same +1 linear increase. In a logarithmic distribution, however (e.g. funding that increases or decreases 15% a year), increases and decreases are proportionate to the underlying value.

The same friction-of-appreciation also holds true going from 10 to 20, or 100 to 200, or 100000 to 200000, each representing a much more significant proportionate increase than the following 20 to 30 or 200 to 300 or 200000 to 300000.
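To see that friction at work, consider a value that grows by a steady 15% a year and count how many years it spends on each leading digit. This is only an illustrative sketch: the starting value of 1, the 1,000-year horizon, and the function names are arbitrary assumptions.

```python
from collections import Counter

def leading_digit(value: float) -> int:
    """First non-zero digit of a positive number."""
    return int(f"{value:.16e}"[0])

def years_per_leading_digit(start=1.0, growth=0.15, years=1000):
    """Count the years a steadily growing value spends on each leading digit."""
    counts = Counter()
    value = start
    for _ in range(years):
        counts[leading_digit(value)] += 1
        value *= 1 + growth  # same proportionate increase every year
    return counts

counts = years_per_leading_digit()
total = sum(counts.values())
for d in range(1, 10):
    print(f"{d}: {counts[d] / total:.1%}")
```

The value lingers on a leading 1 because doubling takes the longest, then passes through the higher digits progressively faster before rolling over to the next power of ten.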

For this particular sample, this materializes because random proportionate deltas have a higher probability of “skipping” the higher leading digits while sticking to the lower ones. If an existing value is 50, for instance, and it’s going to randomly increase anywhere from 0 to 200%, the result lands somewhere between 50 and 150. Half of that range (100 to 150) starts with a 1, which yields a 50% probability that the resulting value will have a leading digit of 1.
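That 50% figure is easy to check empirically. The sketch below assumes the increase is drawn uniformly between 0% and 200%; the function name and trial count are arbitrary.

```python
import random

def share_leading_one(start=50.0, max_increase=2.0, trials=100_000):
    """Estimate how often a random increase leaves a leading digit of 1."""
    hits = 0
    for _ in range(trials):
        # An increase of 0% to 200% puts the result between 50 and 150.
        result = start * (1 + random.uniform(0.0, max_increase))
        if 100 <= result < 200:  # three-digit values starting with 1
            hits += 1
    return hits / trials

random.seed(7)
print(f"{share_leading_one():.1%}")
```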

Think of the population increase or decrease of a city: it generally scales with the city. A large city might grow or shrink by 50,000 people year over year, even though that represents only a small percentage of its total population, while a small city might grow by 500 people. Yet as a percentage of population, the two changes might be the same.

Similarly, an item at $10.00 will have to see a lot of inflation until it costs $20, but then it’s a short ride to $30, and an even shorter hop to $40, proportionately speaking, of course.

And units of measure don’t actually matter. After Benford’s Law has appeared in the set, click on the Multiply Set button. This will multiply every set member by 3.75 (a completely arbitrary value), yet the pattern remains.
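The unit-independence claim can be checked offline as well. In this sketch I build a stand-in set whose logarithms are uniformly spread over six orders of magnitude (my assumption for what a “logarithmic distribution” looks like, not the demo’s own data), then compare the leading digits before and after multiplying by 3.75.

```python
import math
import random
from collections import Counter

def leading_digit(value: float) -> int:
    """First non-zero digit of a positive number."""
    return int(f"{value:.16e}"[0])

def digit_proportions(values):
    """Map each leading digit 1..9 to its share of the positive values."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

random.seed(4)
# Hypothetical stand-in data: values from 1 to 1,000,000, uniform in log10.
values = [10 ** random.uniform(0, 6) for _ in range(50_000)]

before = digit_proportions(values)
after = digit_proportions([v * 3.75 for v in values])  # e.g. a change of units
for d in range(1, 10):
    print(f"{d}: before {before[d]:.1%}, after {after[d]:.1%}")
```

Swapping 3.75 for any other positive constant should leave the two columns in close agreement.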

Hopefully this has delivered a bit of food for thought about the applicability (or inapplicability) of Benford’s law. It generally only fits larger sets of logarithmically distributed values, although that happens to describe many of the values found in society and in nature.