Beginner's Guide to Machine Learning Binary Classification Problems in R

Author

London Wolff

Published

2024-10-31

1 Overview & Scope of Tutorial

The purpose of this tutorial is to give a brief example of how to apply machine learning classification techniques to an example dataset from the animal behavior field. In this tutorial I will cover 1) the five main types of classification algorithms and when to use them; 2) how to set up and clean data for use in classification machine learning; and 3) how to run models and compare the outcomes. I propose three different questions that could be answered with machine learning classification to showcase how the types of predictor and outcome variables you are working with affect which models to run and with which hyperparameters. (Throughout the tutorial, you’ll see highlighted vocab terms like hyperparameters; click a term to see the linked content, then click the back arrow to return.) The data used in this tutorial come from a team-science, open-access dataset from the Many Dogs Project, and the number and types of variables make it a good example dataset for classification machine learning. For a more extensive description of the dataset, please see the data description section. A beginner's level of knowledge of R, RStudio, and GitHub is recommended.

This tutorial is not meant to be an exhaustive list of what can be done with machine learning. I cover only supervised learning for binary classification problems. I do not cover any inductive processes like deep learning (aka neural nets), semi-supervised learning, unsupervised learning, or how to deal with continuous dependent variables. See the links within the text for more detail on these applications. This tutorial also assumes you have a basic understanding of simple statistical concepts, such as distributions and categorical vs. continuous variables, and some exposure to basic models like linear regression. If the terms in that last sentence weren’t familiar to you, check out the resources at PsyTeachR for an introduction to statistics.

1.0.1 Attribution and Licensing

This guide is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Please share and distribute this tutorial to people who would find it useful! This guide may not be used for commercial purposes, and you must give appropriate credit and indicate what changes, if any, were made.