👩🏫 2.1 Displaying and Summarizing a Categorical Variable (Tabbed)
As a reminder, categorical variables place an observational unit into one of several groups using category labels. This section should be mostly review, but it covers the ways to display and summarize a single categorical variable and gives you your first chance to tinker in some statistical software: StatCrunch. This content should take about 🐢⌚ 40 minutes if you work through all of it (read, watch at 1x speed, attempt problems, then watch/read feedback). Expand the tabs below to explore.
- Summarizing a Categorical Variable
- Displaying a Categorical Variable
- Communication Considerations
- Misleading Graphs
Summarizing a Categorical Variable with Tables
⚠ Caution ⚠
The summaries of these variables may be numbers, but the underlying variable is categorical - categorizing students by major.
For categorical data, the most common numerical measures are percentages or proportions (though tallies or counts may also be used). In statistics, tallies or counts may be referred to as frequencies while percentages or proportions may be referred to as relative frequencies.
Frequency Tables
The measures mentioned above can be displayed in a table like the one below, which shows the student majors from one of my sections of Math 119 in terms of frequency (#) and relative frequency (proportion and %).
Major | # | Rel Freq | % of Total
---|---|---|---
Business | 17 | 0.165 | 16.5%
Other | 12 | 0.117 | 11.7%
Biology | 11 | 0.107 | 10.7%
Nursing | 10 | 0.097 | 9.7%
Allied Health | 6 | 0.058 | 5.8%
Computer Science | 5 | 0.049 | 4.9%
Kinesiology | 5 | 0.049 | 4.9%
Social Science | 5 | 0.049 | 4.9%
Undecided | 5 | 0.049 | 4.9%
This table shows the top majors in all of my Math 119 sections in Fall 2021. The most common major for Math 119 students that semester (and in previous semesters) was Business: 17 of the 103 students who shared their major, or 16.5%, fell in that category.
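If you are curious how the relative frequency and percentage columns come from the counts, here is a minimal Python sketch (the course itself uses StatCrunch, so this is only an illustration). The counts are copied from the table above, and the total of 103 comes from the text, since the table lists only the top majors. The function name `frequency_table` is made up for this example.

```python
# Counts copied from the majors table above. The total (103) comes from the
# text, because the table lists only the most common majors.
counts = {
    "Business": 17,
    "Other": 12,
    "Biology": 11,
    "Nursing": 10,
    "Allied Health": 6,
    "Computer Science": 5,
    "Kinesiology": 5,
    "Social Science": 5,
    "Undecided": 5,
}
TOTAL_STUDENTS = 103


def frequency_table(counts, total):
    """Return rows of (category, count, relative frequency, percent)."""
    rows = []
    for category, count in counts.items():
        rel_freq = count / total  # proportion of all students in this category
        rows.append((category, count, round(rel_freq, 3), round(100 * rel_freq, 1)))
    return rows


for category, count, rel_freq, percent in frequency_table(counts, TOTAL_STUDENTS):
    print(f"{category:<18}{count:>4}{rel_freq:>8.3f}{percent:>7.1f}%")
```

Each relative frequency is just the count divided by the total, and each percentage is that proportion multiplied by 100.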
Understanding Check
Suppose you found the following table, but someone had dropped a bottle of ink on it, obscuring some of the original data! Fill in the missing values based on your knowledge of frequencies, relative frequencies, and percents. There are 127 students represented in the table.
Frequency Table for Favorite Movie Genre (Fall 2021 Math 119 Survey)
n = 127
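One way to think about filling in a missing cell is that counts, relative frequencies, and percentages are all tied together by n. The Python sketch below shows the idea with a made-up relative frequency of 0.252, which is not a value from the inked table. The function names are hypothetical as well.

```python
def count_from_relative_frequency(rel_freq, n):
    """Recover a whole-number count from a (rounded) relative frequency."""
    return round(rel_freq * n)


def percent_from_count(count, n):
    """Percentage of the n observational units in a category, to one decimal."""
    return round(100 * count / n, 1)


N_STUDENTS = 127  # from the table caption
HYPOTHETICAL_REL_FREQ = 0.252  # made up for illustration, not from the table

count = count_from_relative_frequency(HYPOTHETICAL_REL_FREQ, N_STUDENTS)
print(count)                                  # 32
print(percent_from_count(count, N_STUDENTS))  # 25.2
```

Because the relative frequency in a table is usually rounded, multiplying by n rarely lands exactly on a whole number, so the sketch rounds to the nearest count.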
Want a video walkthrough? Click below.
💡 Understanding Check Walkthrough
Creating Frequency Tables
💡 If you need the StatCrunch details, be sure to visit the 🔍 StatCrunch Help page in our Resources for Students Module!
Now you get to play with the Fall 2021 Survey data we've been using in the examples. Open the Fall 2021 Survey data and create a frequency table for FirstGen using StatCrunch. Fill in the table below based on your output. Round the relative frequency values to 3 decimal places and the percentages to a single decimal place, like the table in the previous question.
Understanding Check
Most likely, if you got marked wrong on this, it was for a rounding error. If we're rounding 0.29365079 to the thousandths place (three decimal places), we look at the fourth decimal place, which is 6 here, to decide how to round. If that digit is 5 or more, we round the digit before it up! So our relative frequency for the No category is 0.294 rather than 0.293. If you'd prefer a video walkthrough, you can find it here: Creating Frequency Table Walkthrough
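If you ever want to check this kind of rounding outside of StatCrunch, here is a small Python sketch of the "5 or more rounds up" rule. It uses the standard `decimal` module, because Python's built-in `round` works on binary floats and breaks exact ties toward the even digit. The helper name `round_half_up` is made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round a decimal string to `places` decimals, rounding 5 or more up."""
    quantum = Decimal(10) ** -places  # places=3 gives Decimal("0.001")
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# The relative frequency from the feedback above: the fourth decimal is 6,
# so the third decimal (3) rounds up to 4.
print(round_half_up("0.29365079", 3))  # 0.294
```

Passing the number as a string keeps every digit exactly as you typed it.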
Displaying a Categorical Variable with Graphs
Graphical displays like bar charts and pie charts are typically used to display a categorical variable. Below, you'll see a bar chart and a pie chart for Favorite Social Media App from our Fall 2021 Stats Class Survey.
⚠ Caution ⚠
Pie charts can’t be used for data where more than one outcome is possible for each observational unit, because the slices must sum to 100%.
What do you notice and wonder about the displays above? Look at them with a critical eye and then watch the video below.
Pie Charts
Pie charts are gross. I mean, they are a fine way to display categorical data, but they can be much harder to read than the G.O.A.T. (greatest of all time), bar charts. In the pie chart below, the relative frequency of each category is printed next to its segment to make the values clear. Part of the reason the chart needs those labels is that we don't immediately know what 6% of a circle looks like. I will note that pie charts are also the default for the summaries in Google Forms when there are not too many levels of our categorical variable.
Entertainment Software Association, “Essential Facts About the Video Game Industry”
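To see why "6% of a circle" is hard to eyeball, it can help to turn a percentage into a slice angle. The Python sketch below does that and also checks whether a set of percentages sums to 100%, which a pie chart requires. Both lists of percentages are hypothetical (they are not the values in the chart above), and the function names and the 0.5 tolerance are assumptions made for this example.

```python
def slice_angle(percent):
    """Central angle, in degrees, of a pie slice covering `percent` of the circle."""
    return round(percent / 100 * 360, 1)


def is_valid_for_pie(percents, tolerance=0.5):
    """True when the percentages sum to 100%, allowing a little label rounding."""
    return abs(sum(percents) - 100) <= tolerance


print(slice_angle(6))  # 21.6 degrees for a 6% slice

# Hypothetical data: each person picks exactly one category.
single_choice = [40.0, 35.0, 19.0, 6.0]
# Hypothetical "select all that apply" data: one person can count in several bars.
multi_choice = [70.0, 55.0, 30.0]

print(is_valid_for_pie(single_choice))  # True
print(is_valid_for_pie(multi_choice))   # False
```

A 6% slice is only about 21.6 degrees wide, which is why the labels next to each segment do so much of the work.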
Understanding Check
Looking at the pie chart to the right, identify the observational units and variable.
The answer to this can be ambiguous without knowing how they collected data! So watch the video explanation if you're interested in that discussion.
💡 Understanding Check Walkthrough
Bar Charts
Bar charts are the G.O.A.T. when it comes to displaying categorical variables.
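The idea behind a bar chart is simply that each category gets a bar whose length matches its count. Here is a bare-bones Python sketch that draws the bars with text characters. It is not how StatCrunch builds its charts, and the function name `text_bar_chart` is made up. The counts are four of the majors from the frequency table in this section.

```python
def text_bar_chart(counts, symbol="#"):
    """Return one line per category, sorted from most to least common."""
    width = max(len(category) for category in counts)
    lines = []
    for category, count in sorted(counts.items(), key=lambda item: -item[1]):
        # One symbol per observational unit, so bar length equals the count.
        lines.append(f"{category:<{width}} | {symbol * count} {count}")
    return lines


# Four of the majors from the frequency table in this section.
majors = {"Biology": 11, "Business": 17, "Nursing": 10, "Other": 12}

for line in text_bar_chart(majors):
    print(line)
```

Because the bars share a common baseline, it is easy to compare them, even when two counts are close together.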
Understanding Check
“How Other Nations Pay for Child Care. The U.S. Is an Outlier.” (The New York Times, October 6, 2021)
Looking at the bar chart to the right, identify the observational units and variable.
⚠ Caution ⚠
Not all bar charts and pie charts summarize categorical data! Depending on what you call the observational units above, that bar chart may be summarizing a numeric variable. In that case, however, the graph isn't as useful as the graphs we will see in Section 2.2 for telling us about the distribution of a numeric variable.
Communication Considerations
💡 Accessible links do not include HTML and instead use simple, meaningful text for the link, which you'll learn more about in Section 2.3.
When you include a graph in any work, you should include a short description of what information the graph is conveying. The previous examples all had titles, but no description. If you had to add a description for them, what would it be?
A strong description includes why the visual is useful and how it is related to the content being presented as well as any information about patterns or trends that are illustrated. If possible, a link to your raw data should also be included in your description to provide transparency.
Additionally, the graph itself should have a title, axis labels, and a legend, if appropriate.
Misleading Graphs
Perhaps because of their simplicity, bar charts and pie charts are used a lot in the news, often with unintentionally (or intentionally) misleading design choices.
The two ‘infographics’ below are both misleading. Why are these graphs misleading? Do you think this was likely an intentional choice by their creator or an unintentional oversight?
Once you've thought about it, watch this quick video explaining the discrepancy.
Note: The first infographic also demonstrates the issue brought up in Last Week Tonight’s Scientific Studies about the disconnect between media and science reports. The infographic was on a Facebook page that cited India’s Merchandise Trade press release as its source, and that release doesn’t contain the same violation of the area principle. The second was posted on the DataIsUgly Subreddit, which is a fun place to see bad data visualizations.
More Bad Graphs
Here are some websites with bad graphs, if you want a laugh:
- Skew the Script's Lesson 1-1
- WTF Visualizations
- Worst Graphs of 2018
- DataIsUgly Subreddit
And one for good measure that made me giggle from Twitter:
Before you click "Next", please read through all of the tabbed pages.