dplyr examples: happiness

Will Ju

1 / 9

The Happy data from GSS

The General Social Survey (GSS) has been run by NORC since 1972 (in most years through 1993, and every other year since 1994) to keep track of current opinions across the United States.

An excerpt of the GSS data called happy is available from the classdata package:

remotes::install_github("heike/classdata")
library(classdata)
head(happy)
##   year age         degree       finrela         happy    health       marital
## 1 1972  23       bachelor       average not too happy      good never married
## 2 1972  70 lt high school above average not too happy      fair       married
## 3 1972  48    high school       average  pretty happy excellent       married
## 4 1972  27       bachelor       average not too happy      good       married
## 5 1972  61    high school above average  pretty happy      good       married
## 6 1972  26    high school above average  pretty happy      good never married
##      sex polviews          partyid wtssall wtssnr
## 1 female     <NA>     ind,near dem       7   1147
## 2   male     <NA> not str democrat      54   1147
## 3 female     <NA>      independent      54   1147
## 4 female     <NA> not str democrat      54   1147
## 5 female     <NA>  strong democrat      54   1147
## 6   male     <NA>     ind,near dem       7   1147

You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/

2 / 9

Your Turn

Load the happy data from the classdata package.

  • How many variables and how many observations does the data set have? What do the variables mean?

  • Plot the variable happy. Introduce a new variable nhappy that has values 1 for not too happy, 2 for pretty happy, 3 for very happy and NA for missing values. There are multiple ways to get to that. Avoid for loops.

  • Based on the newly introduced numeric scores, what is the average happiness of respondents?
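
One possible way to approach the recoding, sketched on a small made-up data frame. The object `toy` is hypothetical and only mimics the `happy` column; its values are not GSS data. The second option assumes the factor levels are ordered "not too happy", "pretty happy", "very happy", so check `levels(happy$happy)` before relying on it.

```r
library(dplyr)

# Hypothetical stand-in for the happy data; values are made up.
toy <- tibble(
  happy = factor(
    c("not too happy", "pretty happy", "very happy", NA, "pretty happy"),
    levels = c("not too happy", "pretty happy", "very happy")
  )
)

# Option 1: explicit recoding; anything unmatched (including NA) becomes NA.
toy <- toy %>%
  mutate(nhappy = case_when(
    happy == "not too happy" ~ 1,
    happy == "pretty happy"  ~ 2,
    happy == "very happy"    ~ 3
  ))

# Option 2: integer codes of the factor.
# Only valid if the levels are in the order assumed above.
toy <- toy %>% mutate(nhappy2 = as.integer(happy))

# Average happiness score, ignoring missing values.
avg_happy <- mean(toy$nhappy, na.rm = TRUE)
```

Both options are vectorised, so no for loop is needed. The explicit version is longer but does not depend on the level order.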

3 / 9

Your Turn

  • Are people now happier than previously? How does happiness evolve over time? Is this relationship different for men and women? Draw plots.
  • How does average happiness change over the course of a lifetime? Is this relationship different for men and women? Draw plots.
4 / 9

Your Turn

  • Are Republicans or Democrats happier? Compare average happiness levels over partyid.

  • How is relative financial standing (finrela) associated with average happiness levels? Is this association different for men and women?

  • Find a plot that shows the differences for each one of the summaries.

5 / 9

Your Turn: asking questions

  • What other variable(s) might be associated with happiness? Plot it.

  • Submit your code in Canvas for one point of extra credit.

6 / 9

Helper functions (1)

  • n() returns the number of rows in the current group:
library(dplyr)
happy %>% group_by(sex) %>% summarise(n = n())
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   30350
## 2 female 38404
## 3 <NA>      92
  • tally() combines summarise() and n():
happy %>% group_by(sex) %>% tally()
## # A tibble: 3 × 2
##   sex        n
##   <fct>  <int>
## 1 male   30350
## 2 female 38404
## 3 <NA>      92
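
A minimal sketch that checks the equivalence of the two calls on a tiny made-up data frame (`toy` is hypothetical, not part of classdata):

```r
library(dplyr)

# Hypothetical stand-in for the happy data; values are made up.
toy <- tibble(sex = factor(c("male", "female", "female", NA, "female")))

# Count rows per group in two ways.
by_summarise <- toy %>% group_by(sex) %>% summarise(n = n())
by_tally     <- toy %>% group_by(sex) %>% tally()

# Both results have one row per value of sex, with NA as its own group.
same_result <- isTRUE(all.equal(by_summarise, by_tally))
```

Note that missing values form a separate group in both versions, just as in the output for the real data.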
7 / 9

Helper functions (2)

  • count() is a further shortcut that combines group_by() and tally():
happy %>% count(sex, degree)
##       sex         degree     n
## 1    male lt high school  6021
## 2    male    high school 14828
## 3    male junior college  1637
## 4    male       bachelor  4951
## 5    male       graduate  2828
## 6    male           <NA>    85
## 7  female lt high school  7766
## 8  female    high school 19942
## 9  female junior college  2400
## 10 female       bachelor  5551
## 11 female       graduate  2641
## 12 female           <NA>   104
## 13   <NA> lt high school    46
## 14   <NA>    high school    22
## 15   <NA> junior college     1
## 16   <NA>       bachelor     9
## 17   <NA>       graduate     7
## 18   <NA>           <NA>     7
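  • count() doesn't add any grouping to the result, whereas group_by() followed by tally() leaves the result grouped by all but the last grouping variable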
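
The difference in grouping can be checked directly. This is a sketch on a small made-up data frame (`toy` is hypothetical); `is_grouped_df()` and `group_vars()` are dplyr helpers for inspecting the grouping structure.

```r
library(dplyr)

# Hypothetical stand-in for the happy data; values are made up.
toy <- tibble(
  sex    = c("male", "female", "female", "male", "female"),
  degree = c("bachelor", "bachelor", "high school", "bachelor", "bachelor")
)

counted <- toy %>% count(sex, degree)
tallied <- toy %>% group_by(sex, degree) %>% tally()

# count() on ungrouped input returns an ungrouped result.
counted_is_grouped <- is_grouped_df(counted)

# group_by() + tally() drops only the last grouping variable (degree),
# so the result is still grouped by sex.
tallied_is_grouped <- is_grouped_df(tallied)
remaining_groups   <- group_vars(tallied)
```

The counts themselves are identical; only the grouping attached to the result differs.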
8 / 9

Grouping and Ungrouping

  • ungroup() removes the grouping structure from a data set

  • ungrouping is necessary before making changes to a grouping variable (such as re-ordering or re-labelling)
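
A minimal sketch of the pattern, using a small made-up data frame (`toy` and the column `dev` are hypothetical names chosen for illustration): do the grouped computation first, then ungroup before re-labelling the grouping variable.

```r
library(dplyr)

# Hypothetical stand-in for the happy data; values are made up.
toy <- tibble(
  sex    = c("male", "female", "female", "male"),
  nhappy = c(2, 3, 1, 2)
)

# Grouped step: deviation of each score from the mean of its group.
grouped <- toy %>%
  group_by(sex) %>%
  mutate(dev = nhappy - mean(nhappy))

# Remove the grouping first, then re-label the former grouping variable.
relabelled <- grouped %>%
  ungroup() %>%
  mutate(sex = toupper(sex))
```

After `ungroup()`, `sex` is an ordinary column again and can be modified like any other.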

9 / 9
