Reshaping data with tidyr - working with separate and unite

Will Ju


Outline

  • cleaning data (first run)

  • functions separate and unite


parse_number

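The tidyverse package readr provides the function parse_number, which extracts the first number from a string and ignores surrounding characters such as %, $, and grouping commas.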

library(readr)
x <- c("3.14", "5.2%", "$10", "5,321.00")
x
## [1] "3.14" "5.2%" "$10" "5,321.00"
parse_number(x)
## [1] 3.14 5.20 10.00 5321.00
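
A minimal sketch of how parse_number treats text in front of a number, assuming labels of the form "Death1" or "Return2" (hypothetical values chosen for illustration). It keeps only the first number it finds in each string.

```r
library(readr)

# Hypothetical labels; parse_number drops the text before the first number
labels <- c("Death1", "Death2", "Return1")
times <- parse_number(labels)
times
```

The result is a numeric vector, so it can be compared, sorted, and summarised like any other number.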

Your Turn

Work on the previously created data frame deaths (Avengers). A description of the data is available at https://github.com/fivethirtyeight/data/tree/master/avengers

  • Use parse_number to extract a number from the variable Time. Inspect the result.

  • Group by Avenger (use URL) and Died. Find the maximum of Time. Call the result maxdeaths. Interpret the resulting data set.

## `summarise()` has grouped output by 'URL'. You can override using the `.groups`
## argument.
## # A tibble: 10 × 3
## # Groups:   URL [10]
##    URL                                                        Died   Time
##    <chr>                                                      <chr> <dbl>
##  1 http://marvel.wikia.com/2ZP45-9-X-51_(Earth-616)#          YES       1
##  2 http://marvel.wikia.com/Abyss_(Ex_Nihilo%27s)_(Earth-616)# YES       1
##  3 http://marvel.wikia.com/Adam_Brashear_(Earth-616)#         NO        1
##  4 http://marvel.wikia.com/Alani_Ryan_(Earth-616)#            NO        1
##  5 http://marvel.wikia.com/Alexander_Summers_(Earth-616)#     NO        1
##  6 http://marvel.wikia.com/Alexis_(Earth-616)#                NO        1
##  7 http://marvel.wikia.com/Amadeus_Cho_(Earth-616)#           NO        1
##  8 http://marvel.wikia.com/America_Chavez_(Earth-616)#        NO        1
##  9 http://marvel.wikia.com/Angelica_Jones_(Earth-616)#        NO        1
## 10 http://marvel.wikia.com/Anthony_Druid_(Earth-616)#         YES       2
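
The grouping step can be sketched on a small stand-in for the deaths data. The data frame toy and its values are hypothetical; only the column names URL, Time and Died follow the output above. Setting .groups = "drop" is one way to avoid the grouping message shown there.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for the deaths data frame
toy <- tibble(
  URL  = c("hero_a", "hero_a", "hero_b", "hero_c", "hero_c"),
  Time = c("Death1", "Death2", "Death1", "Death1", "Death2"),
  Died = c("YES", "YES", "NO", "YES", "NO")
)

toy_max <- toy %>%
  mutate(Time = parse_number(Time)) %>%          # "Death2" becomes 2
  group_by(URL, Died) %>%                        # one group per avenger and outcome
  summarise(Time = max(Time), .groups = "drop")  # largest Time within each group
toy_max
```

Each row of the result gives, for one avenger and one value of Died, the highest value of Time at which that outcome was recorded.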

Your Turn

Work with the maxdeaths dataset. Find the frequency breakdown by Time and Died (group_by, tally).

## # A tibble: 6 × 3
## # Groups:   Died [2]
##   Died   Time     n
##   <chr> <dbl> <int>
## 1 NO        1   104
## 2 NO        2     1
## 3 YES       1    53
## 4 YES       2    14
## 5 YES       3     1
## 6 YES       5     1

Repeat the same steps for the Avengers' returns, and you have the basic information needed for all statements in lab #3.
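
A minimal sketch of the group_by and tally pattern, using a small hypothetical data frame in place of maxdeaths (the values are made up for illustration):

```r
library(dplyr)

# Hypothetical stand-in for maxdeaths
toy_max <- tibble(
  Died = c("NO", "NO", "YES", "YES", "YES"),
  Time = c(1, 1, 1, 2, 2)
)

freq <- toy_max %>%
  group_by(Died, Time) %>%  # one group per combination of Died and Time
  tally()                   # counts the rows in each group, stored in column n
freq
```

The shortcut count(toy_max, Died, Time) gives the same counts in a single call.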


separate

Messy (2): Multiple variables are stored in one column

library(tidyverse)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
df
##      x
## 1 <NA>
## 2  a.b
## 3  a.d
## 4  b.c
df %>% separate(x, into = c("A", "B"))
##      A    B
## 1 <NA> <NA>
## 2    a    b
## 3    a    d
## 4    b    c
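
By default separate splits at any run of non-alphanumeric characters. The sketch below makes the separator explicit and then reverses the step with unite, the counterpart named in the outline. The object names parts and back are chosen here for illustration.

```r
library(tidyr)

df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))

# sep is a regular expression, so the dot has to be escaped
parts <- separate(df, x, into = c("A", "B"), sep = "\\.")

# unite pastes the columns back together; with the default na.rm = FALSE,
# missing values are pasted as the text "NA"
back <- unite(parts, "x", A, B, sep = ".")
back
```

Rows 2 to 4 of back match the original column x. The first row shows why it is worth checking for missing values before uniting.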

Your Turn (5 min)

The Iowa Data Portal is a wealth of information on and about the State of Iowa.

The website Liquor Sales provides data on every liquor sale in a licensed store in Iowa. The code below reads (part of) the data into an R session.

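# Download the Ames liquor sales file; mode = "wb" keeps the binary .rds file intact
url <- "https://github.com/ds202-at-ISU/materials/blob/master/03_tidyverse/data/ames-liquor.rds?raw=TRUE"
download.file(url, "ames-liquor.rds", mode = "wb")
# Read the downloaded file into the R session
ames <- readRDS("ames-liquor.rds")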

Problems with the data

  • Date is text, in the format of Month/Day/Year (Messy 2)

  • Store location is a text string of the form POINT (...) that holds both geographic latitude and longitude. (Messy 2)

No Messy (1)? Problems of type Messy (1) are typically hard to detect, and whether they count as problems often depends on interpretation and on the analysis to be done.
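
For the date problem, one possible approach is sketched below on made-up values. The data frame sales and its dates are hypothetical; only the column name Date and the Month/Day/Year format come from the list above.

```r
library(tidyr)

# Hypothetical dates in Month/Day/Year format
sales <- data.frame(Date = c("01/15/2019", "12/03/2018"))

# Option 1: split the text into three columns; convert = TRUE turns them into integers
sales_split <- separate(sales, Date, into = c("Month", "Day", "Year"),
                        sep = "/", convert = TRUE, remove = FALSE)

# Option 2: convert the text into a proper Date with base R
sales$Date_parsed <- as.Date(sales$Date, format = "%m/%d/%Y")
```

Option 1 is handy for grouping by month or year, while option 2 is the better fit for plotting along a time axis.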


Your Turn (5 min)

  • Check the help for the function parse_number in the readr package and use it on the store location. What result do you get?

  • Use separate to split the variable for store location into longitude and latitude. (Hint - you might need several steps of separate)

  • Make sure that latitude and longitude are numeric variables.
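
One possible sequence of steps, sketched on made-up values. The column name location, the coordinates, and the assumption that longitude comes first inside the parentheses are all hypothetical; check them against the actual data.

```r
library(tidyr)
library(readr)

# Hypothetical store locations
stores <- data.frame(location = c("POINT (-93.61 42.02)", "POINT (-93.65 42.03)"))

# parse_number alone keeps only the first number in each string
first_only <- parse_number(stores$location)

# Step 1: split off the "POINT" label at the opening parenthesis
step1 <- separate(stores, location, into = c("type", "xy"), sep = " \\(")

# Step 2: split the remaining text at the space between the two numbers
step2 <- separate(step1, xy, into = c("longitude", "latitude"), sep = " ")

# Step 3: convert to numeric; parse_number also drops the trailing ")"
step2$longitude <- parse_number(step2$longitude)
step2$latitude <- parse_number(step2$latitude)
step2
```

Because parse_number stops after the first number, it cannot recover both coordinates on its own, which is why the separate steps are needed.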


Your Turn - exploration (5 min)

Use dplyr functionality to answer the following questions:

  1. What is the total amount spent on Liquor Sales?

  2. What is the single largest sale (in volume/in dollar amount)?

  3. Plot geographic longitude and latitude. Where are liquor sales in Ames happening?
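
The first two questions follow a common dplyr pattern, sketched here on a small hypothetical data frame. The column names sale_dollars and volume_liters are assumptions; the actual names in ames differ and should be looked up first.

```r
library(dplyr)

# Hypothetical sales records
toy_sales <- tibble(
  sale_dollars  = c(10.5, 200, 35.25),
  volume_liters = c(0.75, 9, 1.5)
)

# Total amount spent across all sales
total <- toy_sales %>% summarise(total_dollars = sum(sale_dollars, na.rm = TRUE))

# The single largest sale by dollar amount
largest <- toy_sales %>% slice_max(sale_dollars, n = 1)
```

Replacing sale_dollars with volume_liters in slice_max gives the largest sale by volume.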

