class: center, middle, inverse, title-slide

.title[
# dplyr examples: happiness
]
.author[
### Will Ju
]

---

# The Happy data from GSS

The General Social Survey (GSS) has been run by NORC since 1972 (every other year since 1994) to keep track of current opinions across the United States.

An excerpt of the GSS data called `happy` is available from the `classdata` package:

```r
remotes::install_github("heike/classdata")
```

```r
library(classdata)
head(happy)
```

```
##   year age         degree       finrela         happy    health       marital
## 1 1972  23       bachelor       average not too happy      good never married
## 2 1972  70 lt high school above average not too happy      fair       married
## 3 1972  48    high school       average  pretty happy excellent       married
## 4 1972  27       bachelor       average not too happy      good       married
## 5 1972  61    high school above average  pretty happy      good       married
## 6 1972  26    high school above average  pretty happy      good never married
##      sex polviews          partyid wtssall wtssnr
## 1 female     <NA>     ind,near dem       7   1147
## 2   male     <NA> not str democrat      54   1147
## 3 female     <NA>      independent      54   1147
## 4 female     <NA> not str democrat      54   1147
## 5 female     <NA>  strong democrat      54   1147
## 6   male     <NA>     ind,near dem       7   1147
```

You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/

---
class: inverse

# Your Turn

Load the `happy` data from the `classdata` package.

- How many variables and how many observations does the data have? What do the variables mean?
- Plot the variable `happy`. Introduce a new variable `nhappy` that has values 1 for `not too happy`, 2 for `pretty happy`, 3 for `very happy` and `NA` for missing values. There are multiple ways to do this; avoid `for` loops.
- Based on the newly introduced numeric scores, what is the average happiness of respondents?

---
class: inverse

# Your Turn

- Are people happier now than previously? How does happiness evolve over time? Is this relationship different for men and women? Draw plots.
- How does average happiness change over the course of a lifetime? Is this relationship different for men and women? Draw plots.

---
class: inverse

# Your Turn

- Are Republicans or Democrats happier? Compare average happiness levels over `partyid`.
- How is relative financial standing (`finrela`) associated with average happiness levels? Is this association different for men and women?
- Find a plot that shows the differences for each one of the summaries.

---
class: inverse

# Your Turn: asking questions

- What other variable(s) might be associated with happiness? Plot it.
- Submit your code in Canvas for one point of extra credit.

---

# Helper functions (1)

- `n()` provides the number of rows of a subset:

```r
library(dplyr)
happy %>%
  group_by(sex) %>%
  summarise(n = n())
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   30350
## 2 female 38404
## 3 <NA>      92
```

- `tally()` is a combination of `summarise` and `n`:

```r
happy %>%
  group_by(sex) %>%
  tally()
```

```
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   30350
## 2 female 38404
## 3 <NA>      92
```

---

# Helper functions (2)

- `count()` is a further shortcut for `group_by` and `tally`:

```r
happy %>% count(sex, degree)
```

```
##       sex         degree     n
## 1    male lt high school  6021
## 2    male    high school 14828
## 3    male junior college  1637
## 4    male       bachelor  4951
## 5    male       graduate  2828
## 6    male           <NA>    85
## 7  female lt high school  7766
## 8  female    high school 19942
## 9  female junior college  2400
## 10 female       bachelor  5551
## 11 female       graduate  2641
## 12 female           <NA>   104
## 13   <NA> lt high school    46
## 14   <NA>    high school    22
## 15   <NA> junior college     1
## 16   <NA>       bachelor     9
## 17   <NA>       graduate     7
## 18   <NA>           <NA>     7
```

- `count()` doesn't introduce any grouping

---

# Grouping and Ungrouping

- `ungroup` removes a grouping structure from a data set
- this is necessary before making changes to a grouping variable (such as re-ordering or re-labelling)
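
A minimal sketch of that workflow, assuming a small made-up data frame `toy` in place of `happy` (the values and the labels `Men`/`Women` are illustrative only, not taken from the GSS):

```r
library(dplyr)

# Hypothetical toy data standing in for `happy`; the values are made up.
toy <- tibble(
  sex    = factor(c("male", "female", "female", "male", "female")),
  nhappy = c(2, 3, 1, 2, 3)
)

# Grouped mutate: add each group's average score to every row.
grouped <- toy %>%
  group_by(sex) %>%
  mutate(avg = mean(nhappy))

group_vars(grouped)      # the result is still grouped by `sex`

# Drop the grouping first, then re-order and re-label
# the former grouping variable.
relabelled <- grouped %>%
  ungroup() %>%
  mutate(sex = factor(sex,
                      levels = c("male", "female"),
                      labels = c("Men", "Women")))

group_vars(relabelled)   # no grouping variables left
levels(relabelled$sex)   # levels in the new order, with the new labels
```

Calling `ungroup()` right after the grouped step also keeps later `mutate()` or `summarise()` calls from silently operating within groups.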