R Basics

class: center, middle, inverse, title-slide

.title[
# R Basics
]
.author[
### Will Ju
]

---

# Outline

- R is a calculator

- vectors and indices

- data as a data.frame object

- extracting pieces

- five commands to look at objects

---

## R is a calculator

- Basic arithmetic works the same as on a calculator or in mathematics

- Operators must be explicit: `2*x` not `$2x$`, `2^p` instead of `$2^p$`

- Applying a function is similar to other programming languages

- To create a variable, use `<-` (using `=` is discouraged)

- Nearly everything in R is a vector (a single number is a vector of length 1)
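
A minimal sketch of the points above. The names `x` and `p` and their values are made up for illustration:

```r
# Arithmetic works as on a calculator, but operators must be written out
x <- 2 / 3        # assignment with <-
p <- 3
2 * x             # explicit multiplication: 1.333333
2^p               # exponentiation: 8
sqrt(16)          # applying a function: 4

# Even a single number is a vector of length 1
length(x)         # 1
is.vector(x)      # TRUE
```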

---

## Examples

| Math | Type | Code | 
|:--- |:-------------:|-------------:|
| `$$x = \frac{2}{3}$$` | Assignments | `x <- 2/3` |
| `$$\sqrt{x}$$` | Functions | `sqrt(x)` |
| `$$y = \left( \begin{array}{c} 1\\4\\5\\2\end{array}\right)$$` | Vectors | `y <- c(1, 4, 5, 2) ` |
| `$$y_2$$` | Indices | `y[2]`|

---

## More Examples

| Math   | Type          | Code           | 
|:--- |:-------------:|-------------:|
| `$$\sum_{i=1}^{4} y_i$$` | Mathematical Operators | `sum(y)` |
| `$$2y$$` |  | `2*y` |
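
Arithmetic on a vector is applied element by element, which is why `2*y` needs no loop. A short sketch using the vector `y` from the previous slide:

```r
y <- c(1, 4, 5, 2)

sum(y)     # adds up all elements: 12
2 * y      # multiplies every element by 2: 2 8 10 4
y + y      # elementwise addition: 2 8 10 4
y^2        # elementwise square: 1 16 25 4
```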

---
class: inverse

## Your Turn (5 min)

- Introduce vector `$x$` defined as `$$x = \left( \begin{array}{c} 4\\1\\3\\9\end{array}\right)$$`

- Introduce vector `$y$` defined as  `$$y = \left( \begin{array}{c} 1\\2\\3\\5\end{array}\right)$$`

- Calculate the Euclidean distance between the two vectors `$x$` and `$y$`, defined as

`$$d = \sqrt{\sum_{i=1}^4 (x_i - y_i)^2}$$`

---

## Vectors and Indices

Vectors are the most basic building blocks of `R`

+ There are different types of vectors

```r
num_vec <- c(1, 2, 3)

char_vec <- c('apple', 'banana', 'cherry')

bool_vec <- c(TRUE, FALSE, TRUE)
```

+ In `R`, indices start at 1 instead of 0
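
A brief sketch of how the three vectors above can be inspected and indexed. The specific index choices are only for illustration:

```r
num_vec <- c(1, 2, 3)
char_vec <- c('apple', 'banana', 'cherry')
bool_vec <- c(TRUE, FALSE, TRUE)

class(num_vec)      # "numeric"
class(char_vec)     # "character"
class(bool_vec)     # "logical"

char_vec[1]         # first element has index 1: "apple"
char_vec[2:3]       # a range of indices: "banana" "cherry"
num_vec[bool_vec]   # a logical vector keeps the TRUE positions: 1 3
num_vec[-1]         # a negative index drops that element: 2 3
```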

---

## data as a data.frame object

What does a typical dataset look like?

```r
head(mtcars)
```

```
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

- each row represents one observation
- each column represents one variable (feature)

---

## data as a data.frame object

How do we create a dataset?

- the values of a variable are stored in a vector
- vectors are combined into a dataset with `data.frame()`

```r
mydata <- data.frame(
 num_vec,
 char_vec,
 bool_vec
)
mydata
```

```
##   num_vec char_vec bool_vec
## 1       1    apple     TRUE
## 2       2   banana    FALSE
## 3       3   cherry     TRUE
```
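
Columns can also be given explicit names when the data frame is created. A small sketch, where the object name `fruits` and the column names `id`, `fruit`, and `in_stock` are hypothetical choices for illustration:

```r
num_vec <- c(1, 2, 3)
char_vec <- c('apple', 'banana', 'cherry')
bool_vec <- c(TRUE, FALSE, TRUE)

# name = vector pairs set the column names (names here are made up)
fruits <- data.frame(
  id = num_vec,
  fruit = char_vec,
  in_stock = bool_vec
)

names(fruits)   # "id" "fruit" "in_stock"
nrow(fruits)    # number of observations: 3
ncol(fruits)    # number of variables: 3
fruits$fruit    # one column, returned as a vector
```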

---

## Getting help within R

If you want to know what a specific `command` is doing:

```r
?command

help("command")

help.search("command")

??command
```

---

## Loading class data

- Some R packages come with built-in datasets

- For this class, there is an R package available on GitHub

- We will use the function `install_github()` from the `remotes` package. If you haven't installed the `remotes` package yet, please do so first:

```r
install.packages("remotes")
```

- Installing/Updating `classdata` package (once every so often):

```r
remotes::install_github("heike/classdata")
```

- Make the data available (every time you start R):

```r
library(classdata)
```

---
class: inverse
## Your Turn (5 min)

- Install the package `classdata` on your machine

- Make the package active in your current R session:

```r
library(classdata)
```

- Check the R help on the dataset `fbi`

- What happens if you just type in the name of the dataset?

---

## Extracting parts of objects

- use indices

```
x[row_index, col_index]
x[1:5, 2:3]
x[c(1,5,6), c(3,4,5)]
```

- For a 2D object (a typical dataset) with column (variable) names

```
x$variable
x[, var_names]
x[, c("State", "Year")]
```

Try these commands out for yourself on the `fbi` data.
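
The same patterns can be sketched on the built-in `mtcars` data shown earlier, which needs no extra package. The chosen rows and columns are arbitrary:

```r
# Rows 1 to 3, columns 2 to 3 (cyl and disp)
mtcars[1:3, 2:3]

# Selected rows, columns picked by name
mtcars[c(1, 5, 6), c("mpg", "hp")]

# A single variable as a vector
head(mtcars$mpg)      # 21.0 21.0 22.8 21.4 18.7 18.1

# Leaving the row index empty keeps all rows
dim(mtcars[, c("mpg", "cyl")])   # 32 2
```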

---

## five commands to look at objects

For an object `x`, we can try out the following commands:

- `x`

- `head(x)`

- `summary(x)`

- `str(x)`

- `dim(x)`

Try these commands out for yourself on the `fbi` data.
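
As a quick sketch of what these commands return, here they are applied to the built-in `mtcars` data instead of `fbi`:

```r
dim(mtcars)           # rows and columns: 32 11
head(mtcars, n = 3)   # the first 3 rows (the default is 6)
summary(mtcars$mpg)   # numeric summary of one variable
str(mtcars$mpg)       # type and first few values
```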

---

## `str` stands for *structure*

- `str` shows us the **str**ucture of an object

- `fbi` is a data frame with columns (variables) and rows (records)

```r
str(fbi)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame':	19476 obs. of  8 variables:
##  $ state        : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ state_id     : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ state_abbr   : chr  "AL" "AL" "AL" "AL" ...
##  $ year         : int  1983 1983 1983 1983 1983 1983 1983 1983 1983 1985 ...
##  $ population   : int  3959000 3959000 3959000 3959000 3959000 3959000 3959000 3959000 3959000 4021000 ...
##  $ type         : chr  "homicide" "rape_legacy" "rape_revised" "robbery" ...
##  $ count        : int  364 931 NA 3895 11281 42485 94279 9126 981 396 ...
##  $ violent_crime: logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
```

---

## Statistical summaries

Elements of the five-number summary (minimum, quartiles, median, maximum), plus the mean: 
`mean, median, min, max, quantile`

Other summary statistics: 
`range, sd, var`

Summaries of two variables: 
`cor, cov`
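
A minimal sketch of these functions on two columns of the built-in `mtcars` data. The choice of `mpg` and `hp` is arbitrary:

```r
mpg <- mtcars$mpg
hp <- mtcars$hp

median(mpg)       # 19.2
min(mpg)          # 10.4
max(mpg)          # 33.9
mean(mpg)
quantile(mpg)     # minimum, quartiles, and maximum in one call

range(mpg)        # minimum and maximum together: 10.4 33.9
sd(mpg)           # standard deviation
var(mpg)          # variance, the square of sd(mpg)

cor(mpg, hp)      # negative: more horsepower goes with lower mpg
cov(mpg, hp)
```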

---
class: inverse
## Your turn

- Look at the first 10 data records of the `fbi` data

- Compute the mean and standard deviation of the `count` variable. Why do you get NAs? (read `?NA`)

- Advanced:  Read `?mean` and `?sd`, and fix the missing value problem

---

## More about R

- lists

- control structures

- user-defined functions

---

## lists

A `list` is also a fundamental building block of `R`. A vector stores values of the same type, but a `list` can store values of different types.

```r
will <- list(
 name = 'Will',
 age = 28,
 lives_in_Ames = TRUE
)

will
```

```
## $name
## [1] "Will"
## 
## $age
## [1] 28
## 
## $lives_in_Ames
## [1] TRUE
```
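
Elements of a list can be extracted by name. A short sketch using the `will` list from above:

```r
will <- list(
 name = 'Will',
 age = 28,
 lives_in_Ames = TRUE
)

names(will)          # "name" "age" "lives_in_Ames"
will$name            # "Will"
will[["age"]]        # double brackets return the element itself: 28
class(will["age"])   # single brackets return a smaller list: "list"
will$age + 1         # extracted elements behave like ordinary vectors: 29
```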

---

# control structures

+ if statements
+ for statements
+ while statements
+ repeat statements
+ break and next statements
+ switch statements

---

# control structures

+ if statements

```r
if (will$lives_in_Ames) {
  print("Ames is good")
}
```

```
## [1] "Ames is good"
```

+ for statements

```r
for (num in num_vec) {
  print(num)
}
```

```
## [1] 1
## [1] 2
## [1] 3
```
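
The remaining control structures follow the same pattern. A minimal sketch, where the variable names `i`, `seen`, `fruit`, and `colour` and their values are made up for illustration:

```r
# while: repeat as long as the condition is TRUE
i <- 1
while (i <= 3) {
  i <- i + 1
}
i                       # 4

# next skips to the following iteration, break leaves the loop
seen <- c()
for (num in 1:5) {
  if (num == 2) next    # skip 2
  if (num == 4) break   # stop before 4
  seen <- c(seen, num)
}
seen                    # 1 3

# switch picks a value by name; the unnamed last value is the fallback
fruit <- "banana"
colour <- switch(fruit,
  apple = "red",
  banana = "yellow",
  "unknown"
)
colour                  # "yellow"
```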

---

# user-defined functions

```r
plus_a_b <- function(arg1, arg2) {
 arg1 + arg2
}

plus_a_b(3, 10)
```

```
## [1] 13
```
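
Arguments can have default values and can be passed by name. A small sketch, where `plus_default` is a hypothetical variant of the function above:

```r
# arg2 defaults to 1 when it is not supplied
plus_default <- function(arg1, arg2 = 1) {
  arg1 + arg2           # the last evaluated expression is the return value
}

plus_default(3)                     # 4
plus_default(3, 10)                 # 13
plus_default(arg2 = 5, arg1 = 2)    # named arguments can come in any order: 7
plus_default(c(1, 2, 3), 10)        # works elementwise on vectors: 11 12 13
```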