dplyr examples: happiness

class: center, middle, inverse, title-slide

.title[
# dplyr examples: happiness
]
.author[
### Will Ju
]

---

# The Happy data from GSS

The General Social Survey (GSS) has been run by NORC since 1972 (annually in most years through 1994, and every other year since then) to keep track of current opinions across the United States.

An excerpt of the GSS data called `happy` is available from the `classdata` package:

```r
remotes::install_github("heike/classdata")
```

```r
library(classdata)
head(happy)
```

```
##   year age         degree       finrela         happy    health       marital
## 1 1972  23       bachelor       average not too happy      good never married
## 2 1972  70 lt high school above average not too happy      fair       married
## 3 1972  48    high school       average  pretty happy excellent       married
## 4 1972  27       bachelor       average not too happy      good       married
## 5 1972  61    high school above average  pretty happy      good       married
## 6 1972  26    high school above average  pretty happy      good never married
##      sex polviews          partyid wtssall wtssnr
## 1 female     <NA>     ind,near dem       7   1147
## 2   male     <NA> not str democrat      54   1147
## 3 female     <NA>      independent      54   1147
## 4 female     <NA> not str democrat      54   1147
## 5 female     <NA>  strong democrat      54   1147
## 6   male     <NA>     ind,near dem       7   1147
```

You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/

---
class: inverse
# Your Turn

Load the `happy` data from the `classdata` package.

- How many variables and how many observations does the data have? What do the variables mean?

- Plot the variable `happy`. Introduce a new variable `nhappy` that has values 1 for `not too happy`, 2 for `pretty happy`, 3 for `very happy` and `NA` for missing values. There are multiple ways to do this. Avoid `for` loops.

- Based on the newly introduced numeric scores, what is the average happiness of respondents?
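
As a hint for the recoding step, here is a minimal sketch on a small made-up vector (the `responses` values are hypothetical and only stand in for the `happy` column). It assumes the level order is set explicitly so that the integer codes come out as 1, 2, 3; this is one of several possible approaches, not the only one.

```r
# Hypothetical toy responses standing in for the `happy` column
responses <- factor(c("pretty happy", "very happy", NA,
                      "not too happy", "pretty happy"))

# Fix the level order explicitly so the integer codes follow the scale
ordered_levels <- c("not too happy", "pretty happy", "very happy")
scores <- as.integer(factor(responses, levels = ordered_levels))

scores                       # missing responses stay NA
mean(scores, na.rm = TRUE)   # average score, ignoring missing values
```

The same idea carries over to a data frame column inside `mutate()`.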

---
class: inverse
# Your Turn

- Are people now happier than previously? How does happiness evolve over time? Is this relationship different for men and women? Draw plots.

- How does average happiness change over the course of a lifetime? Is this relationship different for men and women? Draw plots.

---
class: inverse
# Your Turn

- Are Republicans or Democrats happier? Compare average happiness levels over `partyid`.

- How is relative financial status (`finrela`) associated with average happiness levels? Is this association different for men and women?

- For each of these summaries, find a plot that shows the differences.

---
class: inverse
# Your Turn: asking questions

- What other variable(s) might be associated with happiness? Plot it.

- Submit your code in Canvas for one point of extra credit.

---

# Helper functions (1)

- `n()` provides the number of rows of a subset:

```r
library(dplyr)
happy %>% group_by(sex) %>% summarise(n = n())
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   30350
## 2 female 38404
## 3 <NA>      92
```

- `tally()` is a combination of `summarise()` and `n()`:

```r
happy %>% group_by(sex) %>% tally()
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   30350
## 2 female 38404
## 3 <NA>      92
```

---

# Helper functions (2)

- `count()` is a further shortcut for `group_by()` followed by `tally()`:

```r
happy %>% count(sex, degree)
```

```
##       sex         degree     n
## 1    male lt high school  6021
## 2    male    high school 14828
## 3    male junior college  1637
## 4    male       bachelor  4951
## 5    male       graduate  2828
## 6    male           <NA>    85
## 7  female lt high school  7766
## 8  female    high school 19942
## 9  female junior college  2400
## 10 female       bachelor  5551
## 11 female       graduate  2641
## 12 female           <NA>   104
## 13   <NA> lt high school    46
## 14   <NA>    high school    22
## 15   <NA> junior college     1
## 16   <NA>       bachelor     9
## 17   <NA>       graduate     7
## 18   <NA>           <NA>     7
```

- Unlike `group_by()`, `count()` doesn't add any grouping to its result
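
To see the difference in grouping, the following sketch compares the two routes on a small made-up data frame (`toy` is hypothetical and only mimics two columns of `happy`). It uses `group_vars()` from `dplyr` to list the grouping variables of each result.

```r
library(dplyr)

# Hypothetical toy data standing in for two columns of `happy`
toy <- data.frame(
  sex    = c("male", "female", "female", "male"),
  degree = c("bachelor", "bachelor", "graduate", "graduate")
)

# Route 1: count() on ungrouped data
counted <- toy %>% count(sex, degree)

# Route 2: explicit group_by() followed by tally()
tallied <- toy %>% group_by(sex, degree) %>% tally()

group_vars(counted)   # no grouping variables
group_vars(tallied)   # still grouped by sex (only the last level is dropped)
```

Both results contain the same counts; they differ only in whether a grouping structure is still attached.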

---

# Grouping and Ungrouping

- `ungroup()` removes the grouping structure from a data set

- This is necessary before making changes to a grouping variable (such as re-ordering or re-labelling)
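
A minimal sketch of this pattern, again on a made-up data frame (`toy` and the labels `"M"` and `"F"` are hypothetical): the data is grouped by `sex`, the grouping is removed, and only then is the variable re-labelled.

```r
library(dplyr)

# Hypothetical toy data standing in for `happy`
toy <- data.frame(
  sex = c("male", "female", "female", "male"),
  age = c(23, 70, 48, 27)
)

by_sex <- toy %>% group_by(sex)

# Drop the grouping first, then re-label the former grouping variable
relabelled <- by_sex %>%
  ungroup() %>%
  mutate(sex = factor(sex, levels = c("male", "female"),
                      labels = c("M", "F")))

group_vars(by_sex)       # grouped by sex
group_vars(relabelled)   # no grouping left
levels(relabelled$sex)   # new labels in the chosen order
```

If the grouping is still needed afterwards, it can be re-applied with another `group_by()` call.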