Sunday, May 17, 2020

What Is the Interquartile Range Rule?

The interquartile range rule is useful for detecting outliers. Outliers are individual values that fall outside the overall pattern of the rest of the data. This definition is somewhat vague and subjective, so it helps to have a rule for deciding whether a data point truly is an outlier.

The Interquartile Range

Any set of data can be described by its five-number summary. These five numbers, in ascending order, are:

1. The minimum, or lowest value of the data set
2. The first quartile Q1, which lies a quarter of the way through the ordered list of all the data
3. The median of the data set, which is the midpoint of the ordered list of all the data
4. The third quartile Q3, which lies three quarters of the way through the ordered list of all the data
5. The maximum, or highest value of the data set

These five numbers tell us quite a bit about our data. For example, the range, which is just the minimum subtracted from the maximum, is one indicator of how spread out the data set is. Similar to the range, but less sensitive to outliers, is the interquartile range. It is calculated in much the same way as the range. All that we do is subtract the first quartile from the third quartile:

IQR = Q3 – Q1

The interquartile range shows how the middle half of the data is spread about the median. Because it ignores the lowest and highest quarters of the data, it is less susceptible than the range to outliers.

Interquartile Rule for Outliers

The interquartile range can be used to help detect outliers. All that we need to do is the following:

1. Calculate the interquartile range for our data.
2. Multiply the interquartile range (IQR) by the number 1.5.
3. Add 1.5 x (IQR) to the third quartile. Any number greater than this is a suspected outlier.
4. Subtract 1.5 x (IQR) from the first quartile. Any number less than this is a suspected outlier.

It is important to remember that this is a rule of thumb: it generally holds, but not always. A value flagged by the rule is only a suspect, so we should follow up in our analysis.
Any potential outlier obtained by this method should be examined in the context of the entire set of data.

Example

We will see this interquartile range rule at work with a numerical example. Suppose we have the following set of data: 1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17. The five-number summary for this data set is minimum = 1, first quartile = 4, median = 7, third quartile = 10 and maximum = 17. We may look at the data and say that 17 is an outlier. But what does our interquartile range rule say?

We calculate the interquartile range to be

IQR = Q3 – Q1 = 10 – 4 = 6

We now multiply by 1.5 and have 1.5 x 6 = 9. Nine less than the first quartile is 4 – 9 = -5. No data is less than this. Nine more than the third quartile is 10 + 9 = 19. No data is greater than this. Despite the maximum value being five more than the nearest data point, the interquartile range rule shows that it should probably not be considered an outlier for this data set.
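
For readers who would like to check the arithmetic by machine, here is a minimal Python sketch of the rule applied to the data set above. The function names (quartiles, iqr_fences, suspected_outliers) are made up for this illustration. The sketch also assumes that the quartiles are found as the medians of the lower and upper halves of the sorted data, leaving out the middle value when the number of data points is odd. That convention reproduces the five-number summary used in the example, but other software may use a different quartile convention and give slightly different cutoffs.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: for an odd number of values the middle value is left
    out of both halves. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(values, k=1.5):
    """Return the values that fall strictly outside the cutoffs."""
    low, high = iqr_fences(values, k)
    return [x for x in values if x < low or x > high]


data = [1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17]
print(quartiles(data))           # (4, 7, 10)
print(iqr_fences(data))          # (-5.0, 19.0)
print(suspected_outliers(data))  # []
```

The empty list agrees with the conclusion above: 17 sits inside the upper cutoff of 19, so the rule does not flag it. The multiplier k is left as a parameter only so that the 1.5 in the rule is visible in one place.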
