Section 2: Displaying and Summarizing a Numeric Variable

In this section, we'll explore how to use the displays we created previously to describe the distribution of our variable.

This content should take about 🐢⌚ 20 minutes if you work through all of it (read, watch at 1x speed, attempt problems, then watch/read feedback). Expand the tabs below to explore.

Describing the Distribution of a Numeric Variable
Modality
Symmetry
Unusual Features

Describing the Distribution of a Numeric Variable

A histogram showing Exam 1 scores from Spring 2018 provided for students to be able to look for center, spread, and shape.

The distribution of a numeric variable refers to the nature or shape of our data values. For quantitative data, there are three important aspects to the distribution:

Center – What outcomes are the most common?
Spread – What is the range of outcomes that occur in the data set?
Shape – Are there any unusual observations? Does it appear like more people have values on the lower end, higher end or middle of the range?
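To make center and spread concrete, here is a minimal Python sketch. The scores are invented for illustration (they are not the Spring 2018 Exam 1 data), and using the mean and median for center and the range for spread is an assumption about which summaries to report, not the only reasonable choice.

```python
from statistics import mean, median

# Hypothetical exam scores, invented for illustration only.
scores = [52, 61, 68, 70, 74, 75, 78, 80, 81, 83, 85, 86, 88, 90, 94]

def summarize(values):
    """Return simple numeric companions to center and spread."""
    return {
        "mean": mean(values),      # one measure of center
        "median": median(values),  # another measure of center
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),  # a simple measure of spread
    }

summary = summarize(scores)
print(summary)
```

Numbers like these support a description of the distribution, but they do not replace looking at the histogram, since shape is easiest to judge visually.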

Consider the definitions above when looking at the histogram of Exam 1 scores from Spring 2018. How would you describe the distribution’s center, spread, and shape?

Be sure to consider this on your own, then watch the video below.

If you'd like to see another example, check out this Oscar for Best Actress example.

More on Shape

A histogram of book prices from a blogger's Amazon wishlist that were under $30 showing a bimodal distribution with a peak around $10 and another around $17.
Source: A real-life example of a bimodal distribution

There are three important features to the shape of a distribution: modality, symmetry, and unusual features.

Modality: How many “bumps” or “peaks” does the data have?

There are three ways we might describe the modality of a distribution. Most numeric variables follow a unimodal distribution, meaning they have a single peak. The Exam 1 scores and the wind speeds presented earlier would both be considered unimodal.

Bimodal data is harder to come by and typically arises when you mix two groups whose centers are ‘far enough away from each other.’ Don’t worry, we’ll tackle what 'far enough' means later in this part of the lecture notes. Thankfully, a blog post with 5 examples of bimodal distributions links to an excellent example: the difference between paperback and hardback book prices produces the bimodal distribution seen in the histogram of book prices to the right.
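The idea of mixing two groups can be sketched in Python. The prices below are made up (they are not the blogger's wishlist data), and the helper names `bin_counts` and `count_peaks` are hypothetical. Counting a bin as a peak when it is taller than both neighbors is a simplifying assumption; with noisy data or narrow bins it would report spurious peaks.

```python
# Hypothetical book prices: one group near $10 and one near $17.
paperbacks = [7.5, 8.9, 9.2, 9.9, 10.1, 10.5, 10.8, 11.4, 12.6]
hardbacks = [14.2, 15.8, 16.3, 16.9, 17.2, 17.5, 17.9, 18.8, 19.5, 21.0]
prices = paperbacks + hardbacks  # mixing the two groups

def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

def count_peaks(counts):
    """Count bins that are strictly taller than both neighboring bins."""
    padded = [0] + list(counts) + [0]  # treat the area outside the histogram as empty
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

counts = bin_counts(prices, start=6, width=2, n_bins=8)
print(counts)               # bar heights for $6-8, $8-10, ..., $20-22
print(count_peaks(counts))  # number of peaks found with these bins
```

Changing the bin width can change the number of peaks found, which is one reason we judge modality by the overall pattern rather than by every small bump.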

Lastly, we might call data multimodal if there are more than two modes or if the data appears to be uniform. An example of a distribution that may be uniform - evenly distributed across all outcomes - would be the final digit of phone numbers. You can see a histogram for the final digit of 100 sampled phone numbers below. Admittedly, this would be a meaningless variable and not really quantitative as there is no quantity measured, but let's go with it...

A histogram of the last digit of 100 phone numbers showing a uniform distribution.
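As a rough sketch of how such a tally could be made, the Python below counts final digits. The phone numbers are generated placeholders (not the 100 sampled numbers), deliberately built so every digit appears equally often; a real sample would be only roughly even.

```python
from collections import Counter

# Hypothetical phone numbers generated for illustration: 555-0100 through 555-0119.
phone_numbers = [f"555-01{i:02d}" for i in range(20)]

def last_digit_counts(numbers):
    """Count how often each final digit 0-9 appears."""
    counts = Counter(int(number[-1]) for number in numbers)
    return [counts.get(digit, 0) for digit in range(10)]

digit_counts = last_digit_counts(phone_numbers)

# A crude text histogram: one '#' per phone number ending in that digit.
for digit, count in enumerate(digit_counts):
    print(digit, "#" * count)
```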

Understanding Check

For the 17 quiz scores graphed in the histogram, determine the modality of the distribution.

Symmetry: Does the data appear to be evenly spread out away from the mode?

When looking at a unimodal distribution, is the data evenly spread away from the mode, or does one side trail out further than the other (i.e., is it skewed)?

⚠ Caution ⚠

Students often want the skew to describe where the mode of the distribution is, but it describes where the longer tail is!

Our previous examples of Exam 1 scores and wind speeds were both skewed to the left, while our number of states visited data would be considered skewed to the right.

Two histograms. On the left, wind speeds showing a left skew, and on the right, number of states visited showing a right skew.

If the upper tail is much longer than the lower tail, then the data is positively (or right) skewed.
If the lower tail is much longer than the upper tail, then the data is negatively (or left) skewed.
If the tails are similar lengths, we can call our data fairly symmetric.
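The three cases above can also be checked numerically. One common summary is the moment coefficient of skewness, which is positive when the upper tail is longer and negative when the lower tail is longer. The sketch below is an illustration, not the method used in these notes: the data are invented, the helper names are hypothetical, and the cutoff of 0.5 for "fairly symmetric" is an arbitrary assumption.

```python
from statistics import mean

def skewness(values):
    """Moment coefficient of skewness: third central moment over variance^1.5."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def skew_label(values, cutoff=0.5):
    """Label the skew direction; the cutoff is an arbitrary illustrative choice."""
    g = skewness(values)
    if g > cutoff:
        return "right skewed"   # longer upper tail
    if g < -cutoff:
        return "left skewed"    # longer lower tail
    return "fairly symmetric"

# Hypothetical data, invented for illustration only.
states_visited = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 35]
mirrored = [40 - x for x in states_visited]  # same shape, flipped left to right
balanced = [1, 2, 3, 4, 5]

for name, data in [("states", states_visited), ("mirrored", mirrored), ("balanced", balanced)]:
    print(name, round(skewness(data), 2), skew_label(data))
```

Notice that the label follows the long tail, not the location of the peak, which matches the caution above.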

Understanding Check

Unusual Features: Does the data have any noteworthy features?

These include gaps (spaces separating data) and outliers (data values that are set far apart from the rest of the body of the distribution). Each of our two previous examples has both a gap and potential outliers.

Same histograms as before, showing wind speed and number of states visited. Wind speed has a low outlier and number of states visited has two high outliers.
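Gaps and potential outliers can be flagged with a short Python sketch. The wind speeds are invented (not the data in the histogram), the helper names are hypothetical, and both thresholds are assumptions: `min_gap` is an arbitrary choice, and the 1.5 × IQR fences are a common convention that these notes have not introduced.

```python
from statistics import quantiles

# Hypothetical wind speeds, invented for illustration only.
wind_speeds = [2, 9, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14]

def find_gaps(values, min_gap):
    """Return pairs of neighboring sorted values separated by at least min_gap."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]

def find_outliers(values, k=1.5):
    """Flag values beyond k * IQR from the quartiles (a convention, assumed here)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]

print(find_gaps(wind_speeds, min_gap=3))
print(find_outliers(wind_speeds))
```

A flagged value is only a potential outlier; whether it is truly unusual depends on the context of the data.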

Note: When a distribution is unimodal, symmetric and bell-shaped, we say it is approximately normally distributed.

Example

A histogram showing the length of episodes (in minutes) for the first four seasons of Stranger Things.

The histogram to the right shows the lengths of the episodes in the first 4 seasons of Stranger Things.

Take a minute to describe the distribution of episode lengths including center, spread, and shape.

Be sure to consider this on your own, then watch the video below.

Some content and questions can be found in Lumen Learning's Concepts of Statistics. The original copyright is provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC BY: Attribution.

Before you click "Next," please read through all of the tabbed pages.