class: center, middle, inverse, title-slide

.title[
# DS 202 - lab #3: Avengers’ Perils
]
.author[
### Will Ju
]

---

![](https://upload.wikimedia.org/wikipedia/en/2/2b/Avengers_%28Marvel_Comics%29_vol_3_num_38.jpg)

# Overview

In this activity we are going to

1. reshape a data set, and then
2. use the data to fact-check some statistics in a published report.

The deliverable is, again, a team-edited report.

---

# Getting Ready

1. Identify your team! Go to Canvas and find out which team you are in for Lab 3.
2. If you are participating remotely, find your teammates via Slack or email. You can use any collaboration platform that works for all of you (Microsoft Teams, Discord, WebEx, Zoom, etc.). If you are in the classroom, find the other members of your team and sit with them.
3. Introduce yourselves to each other.
4. Go to https://ds202-at-isu.github.io/labs/lab03.html and follow the instructions.

---

# Step-by-Step

1. Accept the link to GitHub Classroom shared in the announcement/chat.
    - This link will ask you to log in to GitHub. Select your name from the list by clicking on it.
    <!-- https://classroom.github.com/a/pfl2YPZY-->
    - Check whether your team number already exists. If it does, join the team with that number; if it doesn't exist yet, create it yourself.
2. Overall aim of this lab: tidy a data set, then use it to fact-check a report.

---

# Data Background

The data was collected by FiveThirtyEight. This time we are interested in a review of life and death among the Avengers, a group of superheroes in the Marvel universe. The accompanying article is published [here](https://fivethirtyeight.com/features/avengers-death-comics-age-of-ultron/).

The description of the data is [here](https://github.com/fivethirtyeight/data/tree/master/avengers).

In a universe where time travel, alternate lifelines and portals with all kinds of powers exist, dying might be less permanent than in the world that we are used to.
---

# Data Background (2)

In 2015, FiveThirtyEight conducted a comprehensive review of all Avengers. We can access the resulting data using the command

```r
av <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/avengers/avengers.csv",
               stringsAsFactors = FALSE)
head(av)
```

```
##                                                       URL
## 1           http://marvel.wikia.com/Henry_Pym_(Earth-616)
## 2      http://marvel.wikia.com/Janet_van_Dyne_(Earth-616)
## 3       http://marvel.wikia.com/Anthony_Stark_(Earth-616)
## 4 http://marvel.wikia.com/Robert_Bruce_Banner_(Earth-616)
## 5        http://marvel.wikia.com/Thor_Odinson_(Earth-616)
## 6       http://marvel.wikia.com/Richard_Jones_(Earth-616)
##                      Name.Alias Appearances Current. Gender Probationary.Introl
## 1   Henry Jonathan "Hank" Pym        1269      YES   MALE
## 2              Janet van Dyne        1165      YES FEMALE
## 3 Anthony Edward "Tony" Stark        3068      YES   MALE
## 4         Robert Bruce Banner        2089      YES   MALE
## 5                Thor Odinson        2402      YES   MALE
## 6      Richard Milhouse Jones         612      YES   MALE
##   Full.Reserve.Avengers.Intro Year Years.since.joining Honorary Death1 Return1
## 1                      Sep-63 1963                  52     Full    YES      NO
## 2                      Sep-63 1963                  52     Full    YES     YES
## 3                      Sep-63 1963                  52     Full    YES     YES
## 4                      Sep-63 1963                  52     Full    YES     YES
## 5                      Sep-63 1963                  52     Full    YES     YES
## 6                      Sep-63 1963                  52 Honorary     NO
##   Death2 Return2 Death3 Return3 Death4 Return4 Death5 Return5
## 1
## 2
## 3
## 4
## 5    YES      NO
## 6
##   Notes
## 1 Merged with Ultron in Rage of Ultron Vol. 1. A funeral was held.
## 2 Dies in Secret Invasion V1:I8. Actually was sent tto Microverse later recovered
## 3 Death: "Later while under the influence of Immortus Stark committed a number of horrible acts and was killed.' This set up young Tony. Franklin Richards later brought him back
## 4 Dies in Ghosts of the Future arc. However "he had actually used a hidden Pantheon base to survive"
## 5 Dies in Fear Itself brought back because that's kind of the whole point.
Second death in Time Runs Out has not yet returned
## 6 <NA>
```

---

# As a team: Data tidying

For each Avenger, the data has ten columns, Death1 to Death5 and Return1 to Return5.

Get the data into a format where the five columns Death1 to Death5 are replaced by two columns: Time and Death. Time should be a number between 1 and 5 (look into the function `parse_number`). Death is a categorical variable with the values "YES", "NO" and "". Call the resulting data set `deaths`.

Similarly, deal with the returns of characters.

Answer the question: on average, how many deaths does an Avenger suffer?

---

# Individually: fact-check

Each team member picks one of the statements in the FiveThirtyEight [analysis](https://fivethirtyeight.com/features/avengers-death-comics-age-of-ultron/) and fact-checks it based on the data. Use dplyr functionality whenever possible.

Upload your answers and the code to the repository. Discuss and refine answers as a team.

---

# Submission

Knit the Rmd document and upload both the md and the Rmd file to the team's repository.

Due date: You have until Monday at 11:59 pm to submit the final R Markdown file.

One team member: upload the team's repo link to Canvas (just to signal to the instructor that you are done).