On Abnormal Distributions, Pseudostatistics and Modern Management Fads

Note: I originally published this nearly 10 years ago in a previous incarnation of The Reticulum - mrv 

Dr. Frankenstein: “Would you mind telling me whose brain I did put in?”
Igor: “And you won’t be angry?”
Dr. F: “I will NOT-be-angry.”
Igor: “Abby…someone.”
Dr. F: “Abby Someone..?”

Igor: [Nods with an enthusiastic positive manner while looking up as if he is recollecting]
Dr. F: “Abby Who?”
Igor: “Abby Normal”
Dr. F: “Abby Normal?”
Igor: “I’m almost sure that was the name.”
Dr. F: “Are you saying that I put an abnormal brain into a seven and a half foot long, 54 inch wide… GORILLA!? IS THAT WHAT YOU’RE TELLING ME!?”
— Dialogue from Young Frankenstein

As the movie Young Frankenstein demonstrates so hilariously, abnormal stuff just happens. We need to get better at dealing with it.

A case in point occurred when we were analyzing a large dataset for a survey project. The findings were rich and interesting, but there was one small hitch: the responses we got for one of the more important questions were rather skewed in one direction. In other words, the data was distributed in a way that looked nothing like the conventional bell curve, or Gaussian distribution, that represents normality in statistics.

This kind of thing can give researchers a touch of heartburn. After all, inferential statistical theory usually boils down to deviations from normal bell-curve distributions. It’s all related to the so-called Central Limit Theorem, which states that if you repeatedly sample almost any measurement (e.g., size of snowflakes, heights of people, lifetimes of light bulbs) and average each sample, the distribution of those averages will, if each sample has enough data points, wind up in something pretty close to a bell-curve shape, even when the individual values don’t.
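One way to see the theorem at work is a small simulation. The sketch below is an illustration only, not part of the survey analysis: it assumes an exponential distribution as a stand-in for skewed data, and the function names, seed and sample sizes are arbitrary choices. It compares the skewness of the raw values with the skewness of the averages of many repeated samples.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(n_samples, sample_size, rng):
    """Means of repeated samples drawn from a right-skewed exponential distribution."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
raw = [rng.expovariate(1.0) for _ in range(20_000)]
means = sample_means(n_samples=2_000, sample_size=100, rng=rng)

print(f"skewness of raw values:   {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
```

The raw values should stay strongly skewed no matter how many are collected (the theoretical skewness of an exponential distribution is 2), while the skewness of the averages should land much closer to zero. It is the averages, not the raw data, that settle into the bell shape.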

That normal shape is handy dandy because the mean (aka, average) of all the data is equivalent to the median (aka, midpoint) and the mode (aka, number that appears most often). If your data looks like it has a normal distribution, then standard deviations are a piece of cake and it’s easier to analyze using statistical techniques such as conventional regressions.

Figure: Visual representation of the Empirical (68-95-99.7) Rule based on the normal distribution, by Dan Kernler
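The 68-95-99.7 figures in the caption can be checked with Python’s standard library, and the same few lines can show what happens to the mean and median once the data is skewed. This is a minimal sketch: the lognormal sample is an assumed example of skewed data, and the seed and sample size are arbitrary.

```python
import random
import statistics
from statistics import NormalDist

# Share of a normal distribution within 1, 2 and 3 standard deviations of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} sd: {share:.1%}")

# For a right-skewed sample the mean and the median no longer coincide.
rng = random.Random(7)
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
print(f"skewed mean:   {statistics.fmean(skewed):.2f}")
print(f"skewed median: {statistics.median(skewed):.2f}")
```

The first loop should print shares of roughly 68%, 95% and 99.7%. In the skewed sample, the long right tail should pull the mean noticeably above the median, which is exactly the kind of mismatch the bell curve lets you ignore.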

So, we had non-normal, or what I’ll call abnormal, distributions in one data set. It wasn’t really a serious problem, of course. There are lots of ways of coping. Maybe you can’t run a t-test but you can conduct a Mann-Whitney test. Maybe an ANOVA is no good, but you can drum up a Kruskal-Wallis test. A conventional correlation may not work but a Spearman’s correlation just might. You get the idea (for more on this, see “Dealing with Non-normal Data”).
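What these alternatives have in common is that they work on ranks rather than raw values. As a rough illustration, Spearman’s correlation can be sketched as a conventional (Pearson) correlation applied to the ranks. The helper names and the data below are hypothetical; in practice you would more likely reach for a library routine such as SciPy’s `scipy.stats.spearmanr`.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Conventional (Pearson) correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y rises with x every time, but far from linearly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [math.exp(x) for x in xs]

print(f"Pearson:  {pearson(xs, ys):.3f}")
print(f"Spearman: {spearman(xs, ys):.3f}")
```

Because the ranks of the two variables line up perfectly, the Spearman value should come out at 1, while the Pearson value is dragged down by the curve in the relationship.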

Over the years, statisticians have come up with quite a few methods for coping with abnormal data because, well, the world isn’t nearly as normal as we normally assume. That fact should be remembered not only by statisticians but by everyone who has been, consciously or not, sucked into the world of what could be called normal-distribution pseudostatistics.

One example of such pseudostatistics is so-called forced or stacked ranking, which is when companies adopt employee performance evaluation systems that require set percentages of employees to be ranked in specific categories. It’s controversial, in part because it can force managers to give unrealistically low evaluations to members of all-around strong teams.

Aside from the fact that it can be a lousy system, it bugs me because it’s inspired by, if not based on, the notion that people, even pretty small and non-arbitrary selections of them, fall into normal distributions of talent and performance. In my book, that’s a dangerous form of pseudostatistics. The world is just too abby-normal, as Igor might say, to bet the professional lives of employees on such a shady notion.
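To make the forced-ranking complaint concrete, here is a minimal sketch of how such a scheme behaves on a strong team. Everything in it is hypothetical: the names, the scores, and the 20/70/10 split, which is an assumed set of quotas chosen for illustration rather than any particular company’s policy.

```python
def forced_ranking(scores, top_share=0.2, bottom_share=0.1):
    """Label people by rank position alone, ignoring how good the scores are."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    n_bottom = max(1, round(len(ranked) * bottom_share))
    labels = {}
    for position, name in enumerate(ranked):
        if position < n_top:
            labels[name] = "top"
        elif position >= len(ranked) - n_bottom:
            labels[name] = "bottom"
        else:
            labels[name] = "middle"
    return labels


# Hypothetical all-around strong team: everyone scores at least 4.4 out of 5.
team = {"Ana": 4.9, "Ben": 4.8, "Cy": 4.7, "Dee": 4.7, "Eli": 4.6,
        "Fay": 4.6, "Gus": 4.5, "Hal": 4.5, "Ida": 4.4, "Jo": 4.4}

for name, label in forced_ranking(team).items():
    print(f"{name}: {team[name]} -> {label}")
```

The quota guarantees that someone lands in the bottom bucket even though nobody on this made-up team scored below 4.4. In this sketch, Ida and Jo have identical scores, yet only one of them gets the bottom label, purely because of where the cutoff happens to fall.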

There are plenty of other examples of pseudostatistics biting us in our bimodal rumps. The bell-curve meme messes with our heads all the time. For example, we are conditioned into wondering and worrying if, in any given area of our lives, we are, like the children of Lake Wobegon, above average. Or maybe far above average, at the 95th percentile? The 99th?

And it’s not just ourselves we place somewhere along the bell curves of our imaginations. Men start assigning numbers to women walking by on the streets based on some creepy central limit theorem of beauty. Parents start worrying about which side of some infernal bell curve their kids’ grammar school test scores fall on.

I could go on, coming up with hundreds of data points along this warped line of reasoning. And, I fear, so could you. That’s because we are victims as well as beneficiaries of our powerful statistical paradigms. And these paradigms will only grow more powerful in our increasingly digitized, quantified, big-data world that encourages us to view everyone, including ourselves, as abstracted volumes of variables and vectors. So, amid the measurement mania, we should strive to remember that we are all, in the end, an abnormal sample of one. Vive la différence!

PS: Lately, there’s been another stats-related meme focusing on the idea that employee performance follows a Paretian (aka, Pareto or Power-Law) distribution rather than a normal distribution. Therefore, in theory, a sliver of the employee population is able to produce the majority of positive impact in an organization. The Pareto Principle has been around since management guru Joseph Juran coined the phrase, but, from what I can tell, the notion that this can legitimately explain employee skill and performance levels stems largely from a 2012 article in Personnel Psychology called “The Best and the Rest: Revisiting the Norm of Normality of Individual Performance.”
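To get a feel for how different the two shapes are, here is a small simulation of what each one implies about the share of output coming from the top 20% of people. It is a sketch under loud assumptions: the parameters are invented for illustration (a Pareto shape of about 1.16 is the value that corresponds to an 80/20 split in theory), none of the numbers come from the article, and the simulation says nothing about which shape real employees follow.

```python
import random


def top_share(values, fraction=0.2):
    """Fraction of the total contributed by the highest-scoring `fraction` of people."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


rng = random.Random(2012)  # fixed seed so the run is repeatable
n_people = 10_000

# Assumed parameters, chosen only for illustration.
normal_output = [max(0.0, rng.gauss(100, 15)) for _ in range(n_people)]
pareto_output = [rng.paretovariate(1.16) for _ in range(n_people)]

print(f"normal: top 20% produce {top_share(normal_output):.0%} of output")
print(f"Pareto: top 20% produce {top_share(pareto_output):.0%} of output")
```

Under the normal assumption the top fifth should account for only a little more than a fifth of the total, while under the Pareto assumption it should account for well over half. Heavy-tailed samples are jumpy, so the Pareto figure will move around with the seed.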

In it, the authors looked at factors such as citation reports in academic journals and awards given to entertainers. I’m sure the article is a legitimate attempt to shed light on the elusive subject of performance within professional fields. But the findings strike me as far from conclusive. The authors themselves, for example, allude to the Matthew effect. So, are the patterns they describe truly about elite performance, or are they more about network effects and preferential attachments?

Another way of stating this is, “Are perceived elite performers actually much better than others in their fields or are they just better connected and able to leverage a more polished public image?” These things are often tough to tease apart. Perhaps time will tell. In the meantime, I recommend maintaining a modicum of skepticism in the face of sweeping sociological assertions linked to simple statistical equations. Human behavior is tricky stuff and seldom boils down to single lines, however curvy and lovely, of mathematical abstraction.