+ - 0:00:00
Notes for current slide
Notes for next slide

DS 2020 - Working in teams

1 / 11
2 / 11

A test case

The dataset brfss_iowa.csv (linked from website) contains 6227 records from the Behavioral Risk Factor Surveillance System (BRFSS) for Iowans.

The Behavioral Risk Factor Surveillance System (BRFSS) is the nation's premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors.

It is conducted annually by the Center for Disease Control and Prevention (CDC).

Codebook with detailed explanations of variables is https://www.cdc.gov/brfss/annual_data/2015/pdf/codebook15_llcp.pdf.

3 / 11

Your turn

For this "Your turn", a data set is provided, and you are asked to answer several questions based on the dataset.

  • Imagine that you are in a collaborative environment, and you want to make sure that your colleague can reproduce your work.

  • It's OKAY if you can NOT figure some of the questions out. Just try your best and document what you did. Remember: your "colleague" will read your code and what you write.

The goal of this exercise is to

  • make you think about how to make your work reproducible
  • give you a taste of doing data analysis with R
4 / 11

Your turn

You are asked to investigate the relationship between height and weight of Iowans based on the 2015 BRFSS records.

In the next 20 mins:

  1. Introduce yourself to your neighbor(s).

  2. Complete the following tasks and write instructions / documentation for a collaborator to be able to reproduce your findings.

  3. Verify that there are 6227 cases (= number of interviews) in the data.

  4. Verify that there are variables WEIGHT2 and HEIGHT3 in the data and read the description of the variables in the codebook.

  5. How are height and weight related? Find correlations of weight and height by gender (SEX). How many values are the correlations based on for each gender?

  6. Write a short report of your findings. Address potential problems in the data.

5 / 11

Your turn

In the next 10 mins:

  1. Introduce yourself to another team.

  2. Swap the report and the instructions / documentation with your collaborator - BUT DO NOT TALK ABOUT THE DATA OR THE ANALYSIS (yet)

  3. Try to reproduce the other team's findings.

  4. Only in case things are not working - talk to the other team and walk them through your report.

6 / 11

How did that go?

  • Where did things get problematic?
7 / 11

Reflection

  • What tools did you use?

  • Were you successful in reproducing each others' work?

  • What would happen if your collaborator is no longer available to walk you through their analysis?

  • What made it easy / hard for reproducing your partners' work?

  • What would have to happen if

    • you had to swap out the dataset or extend the analysis further?
    • you caught further data errors and had to re-create the analysis with corrections?
    • you had to revert back to the original dataset?
8 / 11

Summary

  • Everyone struggles with reproducibility and it is a hindrance to moving science forward

  • Even with a fairly simple analysis challenges were faced in four main areas: organization, documentation, automation, and dissemination

9 / 11

Facets of reproducibility

10 / 11

Four facets of reproducibility

  1. Documentation: difference between binary files (e.g. docx) and text files and why text files are preferred for documentation

    • Protip: Use markdown to document your workflow so that anyone can pick up your data and follow what you are doing
    • Protip: Use literate programming so that your analysis and your results are tightly connected, or better yet, unseparable
  2. Organization: tools to organize your projects so that you don't have a single folder with hundreds of files

  3. Automation: the power of scripting to create automated data analyses

  4. Dissemination: publishing is not the end of your analysis, rather it is a way station towards your future research and the future research of others

11 / 11
2 / 11
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow