class: center, middle, inverse, title-slide

.title[
# R Basics
]
.author[
### Will Ju
]

---

# Outline

- R is a calculator
- vectors and indices
- data as a data.frame object
- extracting pieces
- five commands to look at objects

---

## R is a calculator

- Basic algebra works the same as on a calculator or in mathematics
- Operators must be explicit: `2*x` not `2x`, `2^p` instead of `\(2^p\)`
- Applying a function is similar to other programming languages
- To create a variable, use `<-` (using `=` is NOT encouraged)
- Everything in R is a vector (even a single number is a vector of length 1)

---

## Examples

| Math | Type | Code |
|:--- |:-------------:|-------------:|
| `$$x = \frac{2}{3}$$` | Assignments | `x <- 2/3` |
| `$$\sqrt{x}$$` | Functions | `sqrt(x)` |
| `$$y = \left( \begin{array}{c} 1\\4\\5\\2\end{array}\right)$$` | Vectors | `y <- c(1, 4, 5, 2)` |
| `$$y_2$$` | Indices | `y[2]` |

---

## More Examples

| Math | Type | Code |
|:--- |:-------------:|-------------:|
| `$$\sum_{i=1}^{4} y_i$$` | Mathematical Operators | `sum(y)` |
| `$$2y$$` | | `2*y` |

---
class: inverse

## Your Turn (5 min)

- Introduce vector `\(x\)` defined as
`$$x = \left( \begin{array}{c} 4\\1\\3\\9\end{array}\right)$$`

- Introduce vector `\(y\)` defined as
`$$y = \left( \begin{array}{c} 1\\2\\3\\5\end{array}\right)$$`

- Calculate the Euclidean distance between the two vectors `\(x\)` and `\(y\)`, defined as
`$$d = \sqrt{\sum_{i=1}^4 (x_i - y_i)^2}$$`

---

## Vectors and Indices

Vectors are the most basic building blocks of `R`

+ There are different types of vectors

```r
num_vec <- c(1, 2, 3)                        # numeric
char_vec <- c('apple', 'banana', 'cherry')   # character
bool_vec <- c(TRUE, FALSE, TRUE)             # logical
```

+ In `R`, indices start at 1 instead of 0

---

## data as a data.frame object

What does a typical dataset look like?
```r
head(mtcars)
```

```
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

- each row represents one observation
- each column represents one variable (feature)

---

## data as a data.frame object

How do we create a dataset?

- the values of a variable are stored in a vector
- vectors are collected into a dataset by `data.frame()`

```r
mydata <- data.frame(
  num_vec,
  char_vec,
  bool_vec
)
mydata
```

```
##   num_vec char_vec bool_vec
## 1       1    apple     TRUE
## 2       2   banana    FALSE
## 3       3   cherry     TRUE
```

---

## Getting help within R

If you want to know what a specific `command` is doing:

```r
?command                 # open the help page for `command`
help("command")          # same as ?command
help.search("command")   # search all help pages for "command"
??command                # same as help.search("command")
```

---

## Loading class data

- Some R packages have built-in datasets
- For this class, there is an R package available on GitHub
- We will use the function `install_github` from the `remotes` package. If you haven't installed the `remotes` package yet, please do so first

```r
install.packages("remotes")
```

- Installing/updating the `classdata` package (once every so often):

```r
remotes::install_github("heike/classdata")
```

- Make the data available (every time you start R):

```r
library(classdata)
```

---
class: inverse

## Your Turn (5 min)

- Install the package `classdata` on your machine <br>

- Make the package active in your current R session:

```r
library(classdata)
```

- Check the R help on the dataset `fbi`<br>

- What happens if you just type in the name of the dataset?
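---

## Checking what is already installed

The installation steps from "Loading class data" only need to run when a package is missing. The sketch below shows one way to check first. `is_installed()` is a hypothetical helper written for this slide, not part of `remotes` or `classdata`; it relies on base R's `requireNamespace()`, which returns `TRUE` or `FALSE` without attaching the package.

```r
# Hypothetical helper (an assumption for illustration, not from the class
# package): TRUE if `pkg` can be found in the library, FALSE otherwise.
# requireNamespace() loads the namespace if available but does not attach it.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")    # a package that ships with every R installation
is_installed("remotes")  # FALSE means install.packages("remotes") is needed

# Uncomment to install only what is missing:
# if (!is_installed("remotes")) install.packages("remotes")
# if (!is_installed("classdata")) remotes::install_github("heike/classdata")
```

The install lines are left commented out so that running the slide code never downloads anything by accident.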
---

## Extracting parts of objects

- Use indices

```
x[row_index, col_index]
x[1:5, 2:3]
x[c(1,5,6), c(3,4,5)]
```

- For a 2D object (a typical dataset) with column (variable) names

```
x$variable
x[, var_names]
x[, c("state", "year")]
```

<font color="darkblue">Try these commands out for yourself on the `fbi` data.</font>

---

## five commands to look at objects

For an object `x`, we can try out the following commands:

- `x`
- `head(x)`
- `summary(x)`
- `str(x)`
- `dim(x)`

<br><br><br>
<font color="darkblue">Try these commands out for yourself on the `fbi` data.</font>

---

## `str` stands for *structure*

- `str` shows us the **str**ucture of an object
- `fbi` is a data frame with columns (variables) and rows (records)

```r
str(fbi)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame':    19476 obs. of  8 variables:
##  $ state        : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ state_id     : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ state_abbr   : chr  "AL" "AL" "AL" "AL" ...
##  $ year         : int  1983 1983 1983 1983 1983 1983 1983 1983 1983 1985 ...
##  $ population   : int  3959000 3959000 3959000 3959000 3959000 3959000 3959000 3959000 3959000 4021000 ...
##  $ type         : chr  "homicide" "rape_legacy" "rape_revised" "robbery" ...
##  $ count        : int  364 931 NA 3895 11281 42485 94279 9126 981 396 ...
##  $ violent_crime: logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
```

---

## Statistical summaries

Elements of the five-number summary, plus the mean: <br>
`mean, median, min, max, quantile`

Other summary statistics:<br>
`range, sd, var`

Summaries of two variables:<br>
`cor, cov`

---
class: inverse

## Your turn

- Look at the first 10 data records of the `fbi` data

- Compute the mean and standard deviation of the `count` variable. Why do you get `NA`s? (read `?NA`)

- Advanced: Read `?mean` and `?sd`, and fix the missing-value problem

---

## More about R

- lists
- control structures
- user-defined functions

---

## lists

`list` is also a fundamental building block of `R`.
A vector stores values of the same type, but a `list` can store values of different types.

```r
will <- list(
  name = 'Will',
  age = 28,
  lives_in_Ames = TRUE
)
will
```

```
## $name
## [1] "Will"
## 
## $age
## [1] 28
## 
## $lives_in_Ames
## [1] TRUE
```

---

# control structures

+ if statements
+ for statements
+ while statements
+ repeat statements
+ break and next statements
+ switch statements

---

# control structures

+ if statements

```r
if (will$lives_in_Ames) {
  print("Ames is good")
}
```

```
## [1] "Ames is good"
```

+ for statements

```r
for (num in num_vec) {
  print(num)
}
```

```
## [1] 1
## [1] 2
## [1] 3
```

---

# user-defined functions

```r
# Define a function with two arguments; the last evaluated
# expression is the return value.
plus_a_b <- function(arg1, arg2) {
  arg1 + arg2
}
plus_a_b(3, 10)
```

```
## [1] 13
```
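---

# control structures inside a function

The pieces above can be combined. The sketch below wraps a `for` loop, an `if` statement, and `next` in a user-defined function. `sum_skip_na()` is a hypothetical name chosen for this slide, and `counts` just reuses the first four `count` values printed by `str(fbi)`, one of which is missing.

```r
# Hypothetical helper (an illustration, not part of the original slides):
# add up a numeric vector while skipping missing values.
sum_skip_na <- function(values) {
  total <- 0
  for (value in values) {
    if (is.na(value)) {
      next  # jump to the next element without adding anything
    }
    total <- total + value
  }
  total  # the last evaluated expression is returned
}

counts <- c(364, 931, NA, 3895)
sum_skip_na(counts)
sum(counts, na.rm = TRUE)  # built-in way to get the same result
```

Both calls return 5190. In practice the built-in `sum(counts, na.rm = TRUE)` is the better choice; the loop version is only meant to show how `for`, `if`, and `next` work together.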