Lab: Avengers’ Peril

Overview

In this activity we are going to

  1. reshape a data set, and then

  2. use the data to fact-check some statistics in a published report.

Data Background

The data was collected by FiveThirtyEight. This time we are interested in a review of life and death among the Avengers, a group of super heroes in the Marvel universe; the accompanying article is published here.

The description of the data is here.

In a universe where time travel, alternate lifelines and portals with all kinds of powers exist, dying might be less permanent than in the world that we are used to.

In 2015 FiveThirtyEight has conducted a comprehensive review of all Avengers. We can access the resulting data using the command

av <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/avengers/avengers.csv", stringsAsFactors = FALSE)
head(av)
##                                                       URL
## 1           http://marvel.wikia.com/Henry_Pym_(Earth-616)
## 2      http://marvel.wikia.com/Janet_van_Dyne_(Earth-616)
## 3       http://marvel.wikia.com/Anthony_Stark_(Earth-616)
## 4 http://marvel.wikia.com/Robert_Bruce_Banner_(Earth-616)
## 5        http://marvel.wikia.com/Thor_Odinson_(Earth-616)
## 6       http://marvel.wikia.com/Richard_Jones_(Earth-616)
##                    Name.Alias Appearances Current. Gender Probationary.Introl
## 1   Henry Jonathan "Hank" Pym        1269      YES   MALE                    
## 2              Janet van Dyne        1165      YES FEMALE                    
## 3 Anthony Edward "Tony" Stark        3068      YES   MALE                    
## 4         Robert Bruce Banner        2089      YES   MALE                    
## 5                Thor Odinson        2402      YES   MALE                    
## 6      Richard Milhouse Jones         612      YES   MALE                    
##   Full.Reserve.Avengers.Intro Year Years.since.joining Honorary Death1 Return1
## 1                      Sep-63 1963                  52     Full    YES      NO
## 2                      Sep-63 1963                  52     Full    YES     YES
## 3                      Sep-63 1963                  52     Full    YES     YES
## 4                      Sep-63 1963                  52     Full    YES     YES
## 5                      Sep-63 1963                  52     Full    YES     YES
## 6                      Sep-63 1963                  52 Honorary     NO        
##   Death2 Return2 Death3 Return3 Death4 Return4 Death5 Return5
## 1                                                            
## 2                                                            
## 3                                                            
## 4                                                            
## 5    YES      NO                                             
## 6                                                            
##                                                                                                                                                                              Notes
## 1                                                                                                                Merged with Ultron in Rage of Ultron Vol. 1. A funeral was held. 
## 2                                                                                                  Dies in Secret Invasion V1:I8. Actually was sent tto Microverse later recovered
## 3 Death: "Later while under the influence of Immortus Stark committed a number of horrible acts and was killed.'  This set up young Tony. Franklin Richards later brought him back
## 4                                                                               Dies in Ghosts of the Future arc. However "he had actually used a hidden Pantheon base to survive"
## 5                                                      Dies in Fear Itself brought back because that's kind of the whole point. Second death in Time Runs Out has not yet returned
## 6                                                                                                                                                                             <NA>

Data Tidying

The avenger data consists of ten columns for each avenger called Death[1-5] to Return[1-5]

Get the data into a format where the five columns for Death[1-5] are replaced by two columns: Time, and Death. Time should be a number between 1 and 5 (look into the function parse_number()); Death is a categorical variables with values “yes”, “no” and ““. Call the resulting data set deaths.

Work here to reshape the data set.

Similarly, deal with the returns of characters. Call the resulting data set `returns``

Work here to reshape the data set.

Answer the question: how many deaths on average, does an Avenger suffer?

Answer here, and include all work and code done to get the answer.

Fact Checking

Pick one of the statements in the FiveThirtyEight analysis and fact check it based on the data. Use dplyr functionality whenever possible.

Answer here, and include all work and code done to get the answer.

Homework Assignment (Part 1)

Flying etiquette

FiveThirtyEight is a website founded by Statistician and writer Nate Silver to publish results from opinion poll analysis, politics, economics, and sports blogging. One of the featured articles considers flying etiquette. This article is based on data collected by FiveThirtyEight and publicly available on github. Use the code below to read in the data from the survey:

fly <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/flying-etiquette-survey/flying-etiquette.csv")

The following couple of lines of code provide a bit of cleanup of the demographic information by reordering the levels of the corresponding factor variables. Run this code in your session.

fly$Age <- factor(fly$Age, levels=c("18-29", "30-44", "45-60", "> 60", ""))
fly$Household.Income <- factor(fly$Household.Income, levels = c("$0 - $24,999","$25,000 - $49,999", "$50,000 - $99,999", "$100,000 - $149,999", "150000", ""))
fly$Education <- factor(fly$Education, levels = c("Less than high school degree", "High school degree", "Some college or Associate degree", "Bachelor degree",  "Graduate degree", ""))
  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. Some people do not travel often by plane. Provide a (visual) breakdown of travel frequency (use variable How.often.do.you.travel.by.plane.). Reorder the levels in the variable by travel frequency from least frequent travel to most frequent. Draw a barchart of travel frequency and comment on it. Exclude all respondents who never fly from the remainder of the analysis. How many records does the data set have now?

Answer here.

  1. In the demographic variables (Education, Age, and Houshold.Income), replace all occurrences of the empty string “” by a missing value NA. How many responses in each variable do not have any missing values? How many responses have no missing values in any of the three variables? (Hint: think of the function is.na)

Answer here.

  1. Run the command below and interpret the output. What potential purpose can you see for the chart? What might be a problem with the chart? Find at least one purpose and one problem.
library(ggplot2)
fly$Education = with(fly, factor(Education, levels = rev(levels(Education))))

ggplot(data = fly, aes(x = 1)) + 
  geom_bar(aes(fill=Education), position="fill") + 
  coord_flip() +
  theme(legend.position="bottom") +
  scale_fill_brewer() + 
  xlab("Ratio") 

Answer here.

  1. Rename the variable In.general..is.itrude.to.bring.a.baby.on.a.plane. to baby.on.plane.. How many levels does the variable baby.on.plane have, and what are these levels? Rename the level labeled “” to “Not answered”. Bring the levels of baby.on.plane in an order from least rude to most rude. Put the level “Not answered” last. Draw a barchart of variable baby.on.plane. Interpret the result. (This question is very similar to question 2, but preps the data for the next question)

Answer here.

  1. Investigate the relationship between gender and the variables Do.you.have.any.children.under.18. and baby.on.plane. How is the attitude towards babies on planes shaped by gender and own children under 18? Find a plot that summarises your findings (use ggplot2).

Answer here.

Homework Assignment (Part 2)

Liquor Sales in Iowa

This dataset contains the spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase from January 1, 2017 to current. The dataset can be used to analyze total spirits sales in Iowa of individual products at the store level.

For all of the questions use functionality from the tidyverse whenever possible.

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. The code below downloads the data from the Iowa Portal and saves a local copy to your machine. The first time you run this code, make sure you have a good internet connection and enough space on your machine (~50 MB). If the local copy exists, re-knitting the file will load the (binary) file from disk and be much faster.

if (!file.exists("ames-liquor.rds")) {
  url <- "https://github.com/ds202-at-ISU/materials/blob/master/03_tidyverse/data/ames-liquor.rds?raw=TRUE"
  download.file(url, "ames-liquor.rds", mode="wb")
}
data <- readRDS("ames-liquor.rds")
  1. Quick fact overview:
  • how many observations are in data?

Answer here.

  • how many different cities are in the data? (Variable City, careful, trick question!)

Answer here.

  • different stores: how many different stores are in the data? Check first with Store Name, then with Store Number. Discuss differences (give an example), and then answer the question of how many different stores are in the data set.

Answer here.

Note: Your submission is supposed to be fully reproducible, i.e. the TA and I will ‘knit’ your submission in RStudio.

For the submission: submit your solution in an R Markdown file and (just for insurance) submit the corresponding html (or Word) file with it.

(Optional but encouraged):
If you’d like to practice using GitHub, feel free to push your .Rmd and knitted .html file to a public GitHub repository under your own account. If you do, paste the link to your GitHub repo below:

GitHub repo link (optional): __________