class: title-slide, nobar

## NRI 7350
# Getting started with R

.footnote[[@allison_horst](https://github.com/allisonhorst/stats-illustrations)]

---
# Check-in

- Everyone getting emails?
- Email about these slides?
- Everyone have access to these slides? https://steffilazerte.ca/NRI_7350/slides.html

---
# About these Labs

## Format
- I will provide you with tools and a workflow to get started with R
- I will go over specific statistical functions
  - How to run them
  - How to interpret the results
- We'll have hands-on, lecture, and demonstrations

.spacer[]

## R is hard: But have no fear!
- Don't expect to remember everything!
- Copy/Paste is your friend (never apologize for using it!)
- Consider these labs a resource to return to

---
# About these Labs

## Format
- I will provide you with tools and a workflow to get started with R
- I will go over specific statistical functions
  - How to run them
  - How to interpret the results
- We'll have hands-on, lecture, and demonstrations

.spacer[]

## R is hard: But have no fear!
- **Don't expect to remember everything!**
- Copy/Paste is your friend (never apologize for using it!)
- Consider these labs a resource to return to

---
background-image: url(figures/impostR_en.png)
background-position: center center
background-size: 70%

# Impost**R** Syndrome

---
background-image: url(figures/impostR_en.png)
background-position: right 75px top 25%
background-size: 30%

# Impost**R** Syndrome

--

---
class: nobar

.footnote[[@allison_horst](https://github.com/allisonhorst/stats-illustrations)]

---
class: section

# About R

---
layout: true
# Why R?

---
background-image: url(figures/R_hard.png)
background-position: right 15% bottom 10%
background-size: 70%

## R is hard

---
background-image: url(figures/R_powerful2_edit.png)
background-position: center bottom 40%
background-size: 70%

## But R is powerful (and reproducible)!
--

.footnote[(I made these slides with **R**markdown)]

---
background-image: url(figures/spatial.png)
background-position: center bottom 10%
background-size: 40%

## R is also beautiful

---
background-image: url(figures/R_free.png)
background-position: center bottom 40%
background-size: 70%

## R is affordable (i.e., free!)

---
layout: false
class: section

# What is R?

---
# R is a Programming language

> A programming **language** is a way to give instructions in order to get a computer to do something

- You need to know the language (i.e., the code)
- Computers don't know what you mean, only what you type (unfortunately)
- Spelling, punctuation, and capitalization all matter!

## For example

**R, what is 56 times 5.8?**

```r
56 * 5.8
```

```
## [1] 324.8
```

---
# Use code to tell R what to do

**R, what is the average of numbers 1, 2, 3, 4?**

```r
mean(c(1, 2, 3, 4))
```

```
## [1] 2.5
```

--

**R, save this value for later**

```r
steffis_mean <- mean(c(1, 2, 3, 4))
```

--

**R, multiply this value by 6**

```r
steffis_mean * 6
```

```
## [1] 15
```

---
class: split-50

# Code, Output, Scripts

.columnl[
## Code
- The actual commands

## Output
- The result of running code or a script

## Script
- A text file full of code that you want to run
- You should always keep your code in a script
]

--

.columnr[
## For example:

```r
mean(c(1, 2, 3, 4))
```

```
## [1] 2.5
```
]

---
# RStudio vs. R

![:spacer 175px]()

- **RStudio** is not **R**
- RStudio is a User Interface or IDE (integrated development environment)
- (i.e., Makes coding simpler)
- But sometimes tries to be **too** helpful

---
# RStudio Features

### Changing Options: Tools > Global Options
- General > Restore .RData into workspace at startup (NO!)
- General > Save workspace to .RData on exit (NEVER!)
- Code > Insert matching parens/quotes (Personal preference)

## Projects
- Handles working directories
- Organizes your work

## Packages
- Can use the package manager to install packages
- Can use the manager to load them as well, but not recommended
- Load packages in your script so you remember which ones you used!

---
class: section

# Let's take a look at RStudio

## Set up a Project for this course

---
class: section

# Your first *real* code!

---
# First Code

<code class ='r hljs remark-code'># First load the package<br>library(tidyverse)<br><br># Now create the figure<br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) +<br> geom_point()</code>

.spacer[ ]

- Copy/paste or type this into the script window in RStudio
- You may have to go to File > New File > R Script
- Click anywhere on the first line of code
- Use the 'Run' button to run this code, **or** use the short-cut `Ctrl-Enter`
- Repeat until all the code has run

---
layout: true
# First Code

---
<code class ='r hljs remark-code'># First load the package<br>library(tidyverse)<br><br># Now create the figure<br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) +<br> geom_point()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-q9qg8ys-1.png" width="60%" style="display: block; margin: auto;" />

---
<code class ='r hljs remark-code'># First load the package<br>library(<span style="background-color:#ffff7f">tidyverse</span>)<br><br># Now create the figure<br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) +<br> geom_point()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-fea0vqm-1.png" width="60%" style="display: block; margin: auto;" />

---
<code class ='r hljs remark-code'># First load the package<br><span style="background-color:#ffff7f">library</span>(tidyverse)<br><br># Now create the figure<br><span style="background-color:#ffff7f">ggplot</span>(data = msleep, <span style="background-color:#ffff7f">aes</span>(x = sleep_total, y = sleep_rem, colour = vore)) +<br> <span style="background-color:#ffff7f">geom_point</span>()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-b5tewze-1.png" width="60%" style="display: block; margin: auto;" />

<code>library()</code>, <code>ggplot()</code><br><code>aes()</code>, and <code>geom_point()</code>

---
<code class ='r hljs remark-code'># First load the package<br>library(tidyverse)<br><br># Now create the figure<br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) <span style="background-color:#ffff7f">+</span><br> geom_point()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-51e6bgr-1.png" width="60%" style="display: block; margin: auto;" />

---
<code class ='r hljs remark-code'># First load the package<br>library(tidyverse)<br><br># Now create the figure<br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) +<br> geom_point()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-xilb4s0-1.png" width="60%" style="display: block; margin: auto;" />

---
<code class ='r hljs remark-code'># First load the package<br>library(tidyverse)<br><br># Now create the figure<br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) +<br> geom_point()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-xm4ygyn-1.png" width="60%" style="display: block; margin: auto;" />

---
<code class ='r hljs remark-code'><span style="background-color:#ffff7f"># First load the package</span><br>library(tidyverse)<br><br><span style="background-color:#ffff7f"># Now create the figure</span><br>ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem, colour = vore)) +<br> geom_point()</code>

```
## Warning: Removed 22 rows containing missing values (geom_point).
```

<img src="1 Introduction to R_files/figure-html/first_code-flaired-alp2x00-1.png" width="60%" style="display: block; margin: auto;" />

---
layout:false
class: section

# R Basics: Objects

Objects are *things* in the environment

(Check out the **Environment** pane in RStudio)

---
# `functions()`

## Do things, Return things

### Does something but returns nothing

e.g., `write_csv()` - Saves the `mtcars` data frame as a csv file

```r
write_csv(mtcars, file = "mtcars.csv")
```

.spacer[ ]

### Does something and returns something

e.g., `sd()` - returns the standard deviation of a vector

```r
sd(c(4, 10, 21, 55))
```

```
## [1] 22.78157
```

---
class: split-50

# `functions()`

- Functions can take **arguments** (think 'options')
- `data`, `x`, `y`, `colour`

<code class ='r hljs remark-code'>ggplot(<span style="background-color:#ffff7f">data</span> = msleep, aes(<span style="background-color:#ffff7f">x</span> = sleep_total, <span style="background-color:#ffff7f">y</span> = sleep_rem, <span style="background-color:#ffff7f">colour</span> 
= vore)) +<br> geom_point()</code>

--

![:spacer 15px]()

- Arguments defined by **name** or by **position**
- With correct position, do not need to specify by name

![:spacer 10px]()

.columnl[
### By name:
<code class ='r hljs remark-code'>mean(<span style="background-color:#ffff7f">x = </span>c(1, 5, 10))</code>

```
## [1] 5.333333
```
]

.columnr[
### By order:
<code class ='r hljs remark-code'>mean(c(1, 5, 10))</code>

```
## [1] 5.333333
```
]

--

![:spacer 70px]()

> Note that `c()` is also a function: combine or concatenate

---
class: split-40

# `functions()`

## Watch out for 'hidden' arguments

.columnl[
### By name:
```r
mean(x = c(1, 5, 10, NA), na.rm = TRUE)
```

```
## [1] 5.333333
```
]

--

.columnr[
### By order:
```r
mean(c(1, 5, 10, NA), TRUE)
```

```
## Error in mean.default(c(1, 5, 10, NA), TRUE): 'trim' must be numeric of length one
```
]

--

![:spacer 130px]()

.center[This error states that we've given the argument `trim` a non-valid value]

.spacer[ ]

.center[Where did **`trim`** come from?]

---
# R documentation

```r
?mean
```

![:spacer 25px]()

.center[**Your Turn:** Run this, what happens? Do you see the `trim` argument?]
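Another way to see where `trim` comes from is to list the arguments in order. This is a minimal sketch, assuming the plain numeric case where `mean()` hands off to `mean.default()` (as the error message above suggests); `arg_names` and `safe_mean` are illustrative names only.

```r
# Argument names of the default mean() method, in positional order
arg_names <- names(formals(mean.default))
arg_names
# returns "x", "trim", "na.rm", "..."

# `trim` sits in position 2, so name `na.rm` to skip over it
safe_mean <- mean(c(1, 5, 10, NA), na.rm = TRUE)
safe_mean
# [1] 5.333333
```

Naming any argument after the first is usually the safer habit, because it keeps working even if you misremember the order.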
--

---
class: split-40

# Data

Generally kept in `vectors` or `data.frames`/`tibbles`

- These are objects with names (like functions)
- We can use `<-` to assign values to objects (assignment)

![:spacer 10px]()

.columnl[
## Vector (1 dimension)

```r
a <- c("a", "b", "c")
a
```

```
## [1] "a" "b" "c"
```
]

.columnr[
## Data frame (2 dimensions)

```r
d <- data.frame(letters = c("a", "b", "c"),
                numbers = c(1, 2, 3),
                treat = c("control", "control", "control"))
d
```

```
##   letters numbers   treat
## 1       a       1 control
## 2       b       2 control
## 3       c       3 control
```
]

---
class: space

# Vectors

### Use `c()` to create a vector
```r
a <- c("apples", 12, "bananas")
```

### Use `x[index]` to access part of a vector
```r
a[3]
# [1] "bananas"
```

### Vectors contain one type of variable (Even if you try to make it with more)
```r
class(a)
# [1] "character"
```

---
class: split-50

# Data frames (also tibbles)

.columnl[
### Create with `data.frame()`/`tibble()`
```r
my_data <- tibble(x = c("s1", "s2", "s3", "s4"),
                  y = c(101, 102, 103, 104),
                  z = c("a", "b", "c", "d"))
my_data
```

```
## # A tibble: 4 × 3
##   x         y z
##   <chr> <dbl> <chr>
## 1 s1      101 a
## 2 s2      102 b
## 3 s3      103 c
## 4 s4      104 d
```

.small[(`dbl` = "Double" = Computer talk for non-integer number)]
]

--

.columnr[
### Cols have different types of variables
```r
str(my_data)
```

```
## tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
##  $ x: chr [1:4] "s1" "s2" "s3" "s4"
##  $ y: num [1:4] 101 102 103 104
##  $ z: chr [1:4] "a" "b" "c" "d"
```
]

---
class: split-50

# Data frames (also tibbles)

.columnl[
### `x$colname` to pull out column
```r
my_data$x
```

```
## [1] "s1" "s2" "s3" "s4"
```

### Or use `pull()` .small[(from `tidyverse`)]
```r
pull(my_data, x)
```

```
## [1] "s1" "s2" "s3" "s4"
```
]

--

.columnr[
`x[row, col]` to access rows and columns of a data frame

```r
my_data[1:2, 2:3]
```

```
## # A tibble: 2 × 2
##       y z
##   <dbl> <chr>
## 1   101 a
## 2   102 b
```
]

---
class: split-40

# Your Turn: Vectors and Data frames

### 1) Create a vector with 5 numbers and look at it
- Find it in the "Global Environment" pane (upper right)
- Type its name in the console and hit enter

<code class ='r hljs remark-code'><span style="background-color:#ffff7f"> </span> <- c(<span style="background-color:#ffff7f"> </span>, <span style="background-color:#ffff7f"> </span>, <span style="background-color:#ffff7f"> </span>, <span style="background-color:#ffff7f"> </span>, <span style="background-color:#ffff7f"> </span>)<br><span style="background-color:#ffff7f"> </span></code>

![:spacer 15px]()

### 2) Create a data frame with `data.frame()` or `tibble()`
- Click on its name in the "Global Environment"
- Type its name in the console and hit enter

<code class ='r hljs remark-code'><span style="background-color:#ffff7f"> </span> <- <span style="background-color:#ffff7f"> </span>(<span style="background-color:#ffff7f"> </span> = c("<span style="background-color:#ffff7f"> </span>", "<span style="background-color:#ffff7f"> </span>", "<span style="background-color:#ffff7f"> </span>"),<br> <span style="background-color:#ffff7f"> </span> = c(<span style="background-color:#ffff7f"> </span>, <span style="background-color:#ffff7f"> </span>, <span style="background-color:#ffff7f"> </span>))<br><span style="background-color:#ffff7f"> </span></code>

---
exclude: TRUE
class: split-40

# Your Turn: Vectors and Data frames

### 1) Create a vector with 5 numbers and look at it
- Find it in the "Global Environment" pane (upper right)
- Type its name in the console and hit enter

<code class ='r hljs remark-code'>wings <- c(10, 42, 18, 12, 54)<br>wings</code>

![:spacer 15px]()

### 2) Create a data frame with `data.frame()` or `tibble()`
- Click on its name in the "Global Environment"
- Type its name in the console and hit enter

<code class ='r hljs remark-code'>sites <- data.frame(site = c("A1", "A2", "A3"),<br> vals = c(10, 51, 92))<br>sites</code>

---
class: section

# Miscellaneous

---
# R has spelling and punctuation

- R cares about spelling
- R is also case sensitive! (`Apple` is not the same as `apple`)
- Commas are used to separate arguments in functions

## For example

This is correct:
```r
mean(c(5, 7, 10))
# [1] 7.333333
```

This is **not** correct:
```r
mean(c(5 7 10))
```

```
## Error: <text>:1:10: unexpected numeric constant
## 1: mean(c(5 7
##              ^
```

--

.box-r[\>80% of learning R is learning to **troubleshoot**]

---
class: space

# R has spelling and punctuation

### Spaces usually don't matter unless they change meanings
```r
5>=6
# [1] FALSE
5 >=6
# [1] FALSE
5 >= 6
# [1] FALSE
5 > = 6
# Error: unexpected '=' in "5 > ="
```

### Periods don't matter either, but can be used in the same way as letters
.small[(But for complex programming reasons... don't)]

```r
apple.oranges <- "fruit"
```

---
class: space

# Assignments and Equal signs

### Use `<-` to assign values to objects
```r
a <- "hello"
```

### Use `=` to set function arguments
```r
mean(x = c(4, 9, 10))
```

### Use `==` to determine equivalence (logical)
```r
10 == 10
# [1] TRUE
10 == 9
# [1] FALSE
```

---
layout:true
# Braces/Brackets

---
## Round brackets: `()`

- Run functions (even if there are no arguments)

```r
Sys.Date() # Get the Current Date
```

```
## [1] "2021-09-27"
```

--

- Without the `()`, R spits out information on the function:

```r
Sys.Date
```

```
## function ()
## as.Date(as.POSIXlt(Sys.time()))
## <bytecode: 0x55b94f5be1a0>
## <environment: namespace:base>
```

--

.box-br[
`()` must be associated with a **function** .small[(Well, _almost_ always)]
]

---
## Square brackets: `[]`

- Extract parts of objects

```r
LETTERS
```

```
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
```

```r
LETTERS[1]
```

```
## [1] "A"
```

```r
LETTERS[26]
```

```
## [1] "Z"
```

--

.box-br[
`[]` have to be associated with an **object** that has dimensions .small[(Always)]
]

---
layout: false
# Improving code readability

### Use spaces like you would in sentences:
```r
a <- mean(c(4, 10, 13))
```

is easier to read than

```r
a<-mean(c(4,10,13))
```

(But they are equivalent, coding-wise)

---
# Improving code readability

### Don't be afraid to use line breaks ('Enters') to make the code more readable

```r
a <- data.frame(exp = c("A", "B", "A", "B", "A", "B"),
                sub = c("A1", "A1", "A2", "A2", "A3", "A3"),
                res = c(10, 12, 45, 12, 12, 13))
```

vs.

```r
a <- data.frame(exp = c("A", "B", "A", "B", "A", "B"), sub = c("A1", "A1", "A2", "A2", "A3", "A3"), res = c(10, 12, 45, 12, 12, 13))
```

---
class: section
layout: false

# Reproducible research

---
# What is reproducible research?

## Remembering what you've done (and sharing)

- Keep scripts
- Annotate scripts (use comments)
- Date scripts!
- Compile scripts into reports or notebooks
- Include version information
  - `devtools::session_info()`

.box-b[You can use the "Compile Report" button in RStudio to create an HTML report of your work]

---
class: section
layout: false

# tidyverse?

---
# R base vs. tidyverse

## R base
- R base is basic R
- Most packages used are installed and loaded by default

--

## `tidyverse`
- Collection of 'new' packages developed by a team closely affiliated with RStudio
- Packages designed to work well together
- Use a slightly different syntax
- Among others, includes packages used for data transformations and visualizations:
  - e.g., `ggplot2`, `dplyr`, `tidyr`

--

> Can be helpful to understand whether functions are `tidyverse` or R base functions

---
# Wrapping up: Further reading

- http://www.cookbook-r.com
- [R for Data Science](http://r4ds.had.co.nz)
- [R base cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/05/base-r.pdf)
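
---
# Wrapping up: A reproducible script

To tie the reproducible-research checklist together, here is a minimal sketch of a script that is annotated, dated, and records version information. The object names (`run_date`, `wings`, `wing_mean`) and the values are illustrative assumptions only, and base R's `sessionInfo()` stands in for `devtools::session_info()` so the example runs without extra packages.

```r
# Script purpose: summarise an example vector (illustrative values only)

# Record the date the script was run
run_date <- Sys.Date()
run_date

# Analysis step, annotated so future-you knows what happened
wings <- c(10, 42, 18, 12, 54)  # example measurements
wing_mean <- mean(wings)        # average of the five values
wing_mean
# [1] 27.2

# Record R and package versions
# (base R's sessionInfo(); devtools::session_info() gives a similar report)
sessionInfo()
```

Saving this in a script and compiling it to a report keeps the code, its output, and the version details together in one place.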