4  Data Description

The data used in this tutorial come from the ManyDogs Project, a large multi-lab collaboration in which dogs from across the world completed the same two behavioral tasks, and information about each dog was collected from owners via a survey. The two behavioral tasks were variations on a two-alternative object choice task: simply put, dogs were asked to choose between two cups. In both tasks, an unfamiliar human pointed at the “correct” cup, which had a treat under it. In the ostensive condition, pointing was accompanied by eye gaze and calling the dog’s name. In the non-ostensive condition, the unfamiliar human simply pointed without other cues. There were also warm-up trials to familiarize the dog with the task, as well as an odor control condition. I will not use the odor control condition or the warm-up trials in this tutorial, but they are available in the dataset to explore later as possible predictors of performance. The survey data included the dog’s breed, age, and sex, training frequency, and scores from the Canine Behavioral Assessment and Research Questionnaire (C-BARQ), a widely used tool in canine behavioral research that assesses aspects of dog behavior, temperament, and personality. For a complete list of all variables in the dataset, see the README file in the ManyDogs GitHub repository.

This dataset is a great example for exploring machine learning classification models, as it offers many possible predictors alongside a binary dependent variable (i.e., whether the dog chose correctly). Furthermore, all the data from this project are available for anyone to share or adapt with attribution, which makes them ideal as a learning tool. To work through this tutorial with me, you will first need to download the data from the GitHub repository associated with the project, or create a local clone of the project repo on your computer. I recommend the GitHub Desktop application, which lets you pull from and push to a GitHub repository with a point-and-click interface instead of the command line.
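Once the data are on your computer, a quick sanity check is to confirm that the outcome you plan to predict really is binary. The snippet below is a minimal sketch of that check in Python, using only the standard library. The column names (`dog_id`, `condition`, `correct`) and the rows are made-up placeholders for illustration, not the real ManyDogs variable names or data; check the repository README for the actual names.

```python
import csv
import io
from collections import Counter


def outcome_counts(csv_file, outcome_col):
    """Count how often each value of `outcome_col` occurs in an open CSV file."""
    reader = csv.DictReader(csv_file)
    if outcome_col not in (reader.fieldnames or []):
        raise KeyError(f"column {outcome_col!r} not found in {reader.fieldnames}")
    return Counter(row[outcome_col] for row in reader)


# Made-up placeholder rows, NOT real ManyDogs data; column names are hypothetical.
toy_csv = io.StringIO(
    "dog_id,condition,correct\n"
    "d01,ostensive,1\n"
    "d01,nonostensive,0\n"
    "d02,ostensive,1\n"
    "d02,nonostensive,1\n"
)

counts = outcome_counts(toy_csv, "correct")

# A binary outcome should contain only the two expected labels.
is_binary = set(counts) <= {"0", "1"}
print(counts, is_binary)
```

To run the same check on the real file, open it from your local clone with `open(your_path, newline="")` and pass the file object and the actual outcome column name to `outcome_counts`. The counts also show how balanced the two classes are, which is worth knowing before fitting a classifier.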