5 Set Up Your R Document

Make sure you have the latest versions of R and RStudio installed, then open RStudio. Create a new RStudio project, give it a name, and open a new R document (either an R script or an R Markdown file) to follow along with the tutorial.

The following packages (often called libraries or dependencies) are needed to run this tutorial in R. Before you write any code, make sure they are all installed and up to date. Copy the code below into your R document before any other code, because the rest of the tutorial relies on the tools these packages provide. Throughout the code chunks in this tutorial, I have written comments that explain what each step of the data process does.

library(tidyverse) # for data wrangling
library(knitr) # for showing you pretty pictures and making tables
library(here) # for keeping track of where to get files needed for this tutorial
library(janitor) # for examining and cleaning data
library(ggpubr) # to make Q-Q plots for testing normality
library(car) # for making component-plus-residual (CR) plots to check for linearity
library(mlr) # for doing machine learning in R!
library(parallelMap) # for running mlr code in parallel, using all of the available processing power on your computer
library(parallel) # base R tools for parallel computing, such as detecting how many processor cores your computer has
library(randomForest) # for running the random forest algorithm
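If any of the library() calls above fails with "there is no package called ...", that package still needs to be installed. Below is a minimal sketch, not part of the original tutorial, that lists which of the packages above are missing. It relies on base R's system.file(), which returns an empty string when a package is not installed. The names pkgs, is_installed, and missing_pkgs are illustrative choices.

```r
# Packages loaded in the library() calls above ("parallel" ships with base R)
pkgs <- c("tidyverse", "knitr", "here", "janitor", "ggpubr", "car",
          "mlr", "parallelMap", "parallel", "randomForest")

# system.file(package = p) returns "" when package p is not installed
is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                       logical(1))

# Packages you still need to install (character(0) if none are missing)
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)
```

Running install.packages(missing_pkgs) then installs anything that was listed (this needs an internet connection), and update.packages() updates the packages you already have.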

5.0.1 Open and View Data

Once you have downloaded the data files from the GitHub repo and loaded all the necessary packages, make sure the data file manydogs_etal_2024_data.csv is in the same folder as the R file you are working in (the simplest setup is to keep both at the top level of your RStudio project folder, which is R's working directory while the project is open). With the files in place, the following code reads the data into R so you can begin working!

manydogs_data <- read.csv("manydogs_etal_2024_data.csv") # Read in the data file I will be using
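The tutorial loads here for keeping track of file locations, while the line above uses a relative path. As an alternative sketch, assuming the CSV sits at the top level of your project folder, here() builds the full path from the project root (typically the folder containing your .Rproj file), so it resolves the same way wherever your R file lives. The names data_path and file_found are illustrative.

```r
library(here) # for building file paths from the project root

# Full path to the CSV, assumed to be at the top level of the project folder
data_path <- here("manydogs_etal_2024_data.csv")
print(data_path)

# TRUE if the CSV is where here() expects it, FALSE otherwise
file_found <- file.exists(data_path)
print(file_found)
```

If file_found is TRUE, manydogs_data <- read.csv(data_path) reads the data from that location.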

head(colnames(manydogs_data), n = 20) # show the first 20 column names in the data
 [1] "date"                     "site"                    
 [3] "subject_id"               "experiment_status"       
 [5] "owned_status"             "birthdate"               
 [7] "sex"                      "age"                     
 [9] "desexed"                  "purebred"                
[11] "breed"                    "breed_registry"          
[13] "mixed_breed"              "communication_method"    
[15] "gesture_frequency"        "gaze_follow"             
[17] "training_type"            "training_freq_puppy"     
[19] "training_freq_neighbor"   "training_freq_obedience1"