👩🏫 2.2 Displaying and Summarizing a Numeric Variable #2 (Tabbed)
Section 2: Displaying and Summarizing a Numeric Variable
In this section, we'll explore how to look at the displays we created previously to describe the distribution of our variable.
This content should take about 🐢⌚ 20 minutes if you work through all of it (read, watch at 1x speed, attempt problems, then watch or read the feedback). Expand the tabs below to explore.
The distribution of a numeric variable refers to the nature or shape of our data values. For quantitative data, there are three important aspects to the distribution:
Center – What outcomes are the most common?
Spread – What is the range of outcomes that occur in the data set?
Shape – Are there any unusual observations? Does it appear that more of the values fall on the lower end, higher end or middle of the range?
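If you like to see ideas in code, here is a minimal sketch of how the center and spread questions could be answered numerically. The scores are made up for illustration (they are not the Spring 2018 Exam 1 data), and the helper name `summarize` is our own invention. Shape is left out on purpose, since it is best judged by looking at the histogram.

```python
import statistics

# Hypothetical exam scores, made up for illustration only.
scores = [52, 68, 71, 74, 77, 79, 80, 82, 83, 85, 86, 88, 90, 91, 95]


def summarize(values):
    """Return simple center and spread summaries for a list of numbers."""
    return {
        "mean": statistics.mean(values),      # center: the average
        "median": statistics.median(values),  # center: the middle value
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),   # spread: largest minus smallest
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Notice that the mean (about 80.1) sits a little below the median (82) here, because the single low score of 52 pulls the average down.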
Consider the definitions above when looking at the histogram of Exam 1 scores from Spring 2018. How would you describe the distribution’s center, spread, and shape?
Be sure to consider this on your own, then watch the video below.
There are three important features to the shape of a distribution: modality, symmetry, and unusual features.
Modality: How many “bumps” or “peaks” does the data have?
There are three ways we might describe the modality of a distribution. Most numeric variables follow a unimodal distribution, meaning they have a single peak. The Exam 1 scores and the wind speeds presented earlier would both be considered unimodal.
Bimodal data is harder to come by and typically happens when you mix two groups with centers that are 'far enough away from each other.' Don't worry, we'll tackle what 'far enough' means later in this part of the lecture notes. Thankfully, a blog post with 5 examples of bimodal distributions links to an excellent one: the histogram of book prices to the right is bimodal because paperback and hardback books are priced so differently.
Lastly, we might call data multimodal if there are more than two modes or if the data appears to be uniform. An example of a distribution that may be uniform - evenly distributed across all outcomes - would be the final digit of phone numbers. You can see a histogram for the final digit of 100 sampled phone numbers below. Admittedly, this would be a meaningless variable and not really quantitative as there is no quantity measured, but let's go with it...
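To make "counting the bumps" concrete, here is a rough sketch that bins some data the way a histogram would and then counts the bins that are taller than both of their neighbors. The book prices are made up (a cheaper paperback group mixed with a pricier hardback group), and `bin_counts` and `count_peaks` are hypothetical helper names. Treat this as an illustration only: with real, noisy data a small wiggle in the counts would be tallied as a peak, and a flat (uniform-looking) histogram has no strictly taller bin at all, so your eyes are still the better judge.

```python
from collections import Counter


def bin_counts(values, bin_width, start):
    """Count the values in each bin [start + k*width, start + (k+1)*width).

    Assumes every value is at least `start`.
    """
    counts = Counter((v - start) // bin_width for v in values)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]


def count_peaks(counts):
    """Count bins strictly taller than both neighbors (outside edges count as 0)."""
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical book prices in dollars: paperbacks near $10, hardbacks near $25.
prices = [7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 13,
          22, 23, 24, 25, 25, 25, 26, 26, 27, 28]

counts = bin_counts(prices, bin_width=5, start=5)

# Print a quick text histogram, one row per bin.
for k, count in enumerate(counts):
    low = 5 + 5 * k
    print(f"${low:>2} to ${low + 5:>2}: {'#' * count}")

print("peaks found:", count_peaks(counts))  # 2 for this made-up data
```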
Understanding Check
For the 17 quiz scores graphed in the histogram, determine the modality of the distribution.
Symmetry: Does the data appear to be evenly spread out away from the mode?
When looking at a unimodal distribution, is the data evenly spread away from the mode or does it have one side that trails out further than the other (i.e. skewed)?
⚠ Caution ⚠
Students often want the skew to describe where the mode of the distribution is, but it describes where the longer tail is!
Our previous examples of Exam 1 scores and wind speeds were both skewed to the left, while our number of states visited data would be considered skewed to the right.
If the upper tail is much longer than the lower tail, then the data is positively (or right) skewed.
If the lower tail is much longer than the upper tail, then the data is negatively (or left) skewed.
If the tails are similar lengths, we can call our data fairly symmetric.
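The three rules above can also be sketched numerically. One common way to put a number on skew (a technique we are bringing in here, not one these notes have defined) is the average cubed z-score: a long upper tail makes it positive and a long lower tail makes it negative. The data sets below are made up, and the cutoff of 0.5 for "fairly symmetric" is only a rule-of-thumb assumption for this sketch.

```python
import statistics


def skewness(values):
    """Moment-based skewness: the average cubed z-score (population version)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def describe_skew(values, threshold=0.5):
    """Label the skew by its sign; `threshold` is an assumed rule-of-thumb cutoff."""
    g = skewness(values)
    if g > threshold:
        return "right (positively) skewed"
    if g < -threshold:
        return "left (negatively) skewed"
    return "fairly symmetric"


# Hypothetical data, made up for illustration.
states_visited = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 35]  # long upper tail
exam_scores = [45, 62, 70, 78, 82, 85, 87, 88, 90, 91, 92, 93, 94, 95, 96]  # long lower tail

print("states visited:", describe_skew(states_visited))
print("exam scores:", describe_skew(exam_scores))
print("1 through 5:", describe_skew([1, 2, 3, 4, 5]))
```

Look at where the values pile up in each list: the exam scores cluster at the high end, yet the label is left skewed, because the name follows the long tail and not the peak.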
Understanding Check
Unusual Features: Does the data have any noteworthy features?
These include gaps (spaces separating data) and outliers (data values that are set far apart from the rest of the body of the distribution). Each of our previous examples has both a gap and potential outliers.
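As a rough sketch of how gaps could be spotted without a picture, we can sort the data and look for neighboring values that sit unusually far apart. The values below are made up, the helper name `find_gaps` is hypothetical, and the choice of 20 as "far apart" is an assumption that depends entirely on the scale of your data.

```python
def find_gaps(values, min_gap):
    """Return (low, high) pairs of neighboring sorted values at least min_gap apart."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Hypothetical lengths in minutes, made up for illustration.
lengths = [42, 45, 47, 48, 50, 51, 52, 55, 56, 58, 62, 64, 98, 139]

for low, high in find_gaps(lengths, min_gap=20):
    print(f"gap between {low} and {high}")
```

Here the values 98 and 139 each sit beyond a gap at the upper edge of the data, which is what makes them candidates for outliers. Whether to call them outliers is still a judgment made in context.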
Note: When a distribution is unimodal, symmetric and bell-shaped, we say it is approximately normally distributed.
Example
The histogram to the right shows the lengths of the episodes in the first 4 seasons of Stranger Things.
Take a minute to describe the distribution of episode lengths including center, spread, and shape.
Be sure to consider this on your own, then watch the video below.