
How to Detect an Anomaly


When we hear the word “anomaly” in our day-to-day lives, we probably liken the term to “something out of the ordinary.” From a statistical, process monitoring perspective, this is a great definition to start with!


Anomalies, like all observed phenomena, are a function of their context. What is normal in one context (an NBA player dunking a basketball) may be abnormal in another context (a toddler dunking a basketball). So before we can get into the business of detecting anomalies, we must first be very clear about what “normal” looks like.

In MLB, for example, there are many, many quantitative performance metrics we can categorize into levels like “below average,” “average,” and “above average.” So if I know what average play looks like based on a given metric or metrics, I can use this as a point of comparison for evaluating a specific player.


Once we’ve established what “normal” looks like, we can now sequentially and systematically collect information from our monitored process. In the MLB example, this would mean collecting our specific performance metric for our player(s) on a game-to-game basis.

Once we have a data point, we can compare it to our normal, expected value, perhaps by taking a simple difference. In some instances, our data point may be close to the expected value. In others, it may not be.

So the question then becomes, if we observe a non-zero difference between our observed data point and our expected value, how big does the difference have to be before we classify the data point as an anomaly?


Here is where statistics come into play. In the late 19th century, a Russian mathematician named Pafnuty Chebyshev (Dr. C for short) proved a theorem that states (in part) that for any random process with a finite mean and standard deviation (like observing performance metrics from MLB players), the probability of observing a difference between an observation and its mean (average) that is greater than k standard deviations is at most 1/k².
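That 1/k² bound is simple enough to compute directly. Here is a minimal sketch (the function name is my own, not a standard library routine):

```python
def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|X - mean| >= k standard deviations), per Chebyshev.

    The bound 1/k**2 holds for any distribution with a finite mean and
    standard deviation. Probabilities are capped at 1, since 1/k**2
    exceeds 1 when k < 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)

# At most 25% of observations fall 2 or more standard deviations from the mean,
# at most ~11% fall 3 or more out, and at most 6.25% fall 4 or more out.
print(chebyshev_bound(2))  # 0.25
print(chebyshev_bound(4))  # 0.0625
```

Note that the bound makes no assumption about the shape of the distribution, which is what makes it so broadly applicable.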

Quick sidebar about mean and standard deviation before we continue. Suppose I want to know the mean (also called “average”) age of everyone on a particular flight. To calculate this, we would determine everyone’s age, sum them up, and divide that sum by the total number of people on the flight. But how do we interpret it? Let’s say everyone on the flight had written their age on a little slip of paper and put it into a hat. I shake the hat to scramble up the slips, randomly pull one out, and hold it clasped in my hand.

If I asked you to guess what value is written on the slip of paper, your best guess would be the mean we calculated previously. In other words, the mean is the expected value of a randomly selected observation from a random process.

Now what about standard deviation? Standard deviation is a measure of variability (or spread) among a set of observations. For instance, on this flight, suppose everyone is between the ages of 75 and 77. Not much variability there! Conversely, maybe we have a wide variety of ages ranging from newborns to people in their 90s. It is fair to say we would have a good deal of variability in age in such a case!

Standard deviation is a way of quantifying variation with respect to the mean we calculated previously. So once again, suppose I pull a slip of paper out of my hat and hold it clasped in my hand. Rather than asking you what you think the specific value is, I instead ask you how far away from the mean you think the value is. Roughly speaking, your best guess is the standard deviation. Larger values of standard deviation imply more variation in the group of values we've collected, and vice versa. Its value is strictly non-negative (and in practice almost always positive) based on its calculation.
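To make the flight example concrete, here is a short sketch computing both quantities from scratch (the ages are made up for illustration):

```python
import math

# Hypothetical ages of everyone on the flight
ages = [23, 35, 41, 29, 52, 38, 44, 31]

# Mean: sum the ages and divide by the number of people
mean = sum(ages) / len(ages)

# Standard deviation: the square root of the average squared
# deviation from the mean (population form)
variance = sum((a - mean) ** 2 for a in ages) / len(ages)
std_dev = math.sqrt(variance)

print(mean)     # 36.625
print(std_dev)  # roughly 8.6 years of "typical" deviation from the mean
```

If every passenger were between 75 and 77, the same calculation would produce a standard deviation below 1, reflecting the much tighter spread.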


Back to Chebyshev! If we divide the absolute difference we calculated before by the standard deviation, we get a standardized (read, “unitless”) measure whose value tells us how many standard deviations our observed value is from our expected value. If this standardized value is greater than 4, Chebyshev’s theorem tells us that the probability of observing such a value is at most 1/16, or about 6% (so pretty unlikely!). Similarly, if the value is greater than 3, the probability of observing this value is at most 1/9, or about 11% (also fairly unlikely!).

In other words, if we observe a standardized value greater than 3 or 4, it is reasonable to conclude that we’ve likely observed an anomaly!
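Putting the pieces together, the whole detection rule fits in a few lines. This is a simplified sketch, with a made-up batting-average-style metric for illustration:

```python
def is_anomaly(observation: float, mean: float, std_dev: float,
               k: float = 3.0) -> bool:
    """Flag an observation whose standardized distance from the mean exceeds k.

    The standardized value |observation - mean| / std_dev counts how many
    standard deviations the observation sits from the expected value.
    """
    z = abs(observation - mean) / std_dev
    return z > k

# Suppose a hypothetical performance metric averages 0.250 across the
# league with a standard deviation of 0.030.
print(is_anomaly(0.400, mean=0.250, std_dev=0.030))  # True: 5 std devs out
print(is_anomaly(0.280, mean=0.250, std_dev=0.030))  # False: 1 std dev out
```

The choice of k trades off sensitivity against false alarms: a lower threshold catches more anomalies but also flags more ordinary variation.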


While the specific methods we use to model phenomena can be highly sophisticated and complex (think credit card fraud detection, for instance!), the core concepts are rooted in the general ideas described here! And in fact, the modeling we do at RJ Sports Analytics for evaluating re-injury likelihood, at its core, follows these same principles!
