16  Step 5: Compare Model Outcomes

So far, we know how the models performed on the training data, but we haven’t fed our models any of our held-out testing data. To compare the models against each other, we need to look at how each one does on new, unseen data. We will do this by looking at prediction accuracy and (poorly named!) confusion matrices. A confusion matrix is far from confusing: for a binary outcome it is simply a 2 by 2 table showing the counts of false positives, false negatives, true positives, and true negatives produced by a specific model.
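As a quick illustration (toy, made-up labels, not from our dataset), base R’s table() will cross-tabulate true versus predicted classes into exactly this kind of 2 by 2 layout:

#toy example: eight made-up observations, not from our data
true      <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))
predicted <- factor(c(0, 0, 0, 1, 0, 1, 1, 1))
table(true, predicted)
    predicted
true 0 1
   0 3 1
   1 1 3

The 3s on the diagonal are correct classifications (true negatives on the top left, true positives on the bottom right), and the off-diagonal 1s are the errors (one false positive and one false negative).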

Now that we know how to read a confusion matrix, let’s make the testing data frames we need to feed into the models.

#Research Question #1
#extract only the two columns from the large pre-processed testing dataset we made in step 1 that are needed for the research question: training score and the ostensive binary outcome variable
testing_data_RQ_1 <- testing_data |> 
  select(training_score, ostensive_binary) |> 
  mutate(training_score = as.numeric(training_score),
         ostensive_binary = as.factor(ostensive_binary))

#Research Question #2
#extract only the two columns from the large pre-processed dataset that are needed for the research question: training score and the nonostensive binary outcome variable
testing_data_RQ_2 <- testing_data |> 
  select(training_score, nonostensive_binary) |> 
  mutate(training_score = as.numeric(training_score),
         nonostensive_binary = as.factor(nonostensive_binary))

#Research Question #3
#take out the two outcome variables that we don't want to use in this analysis but leave every other predictor
testing_data_RQ_3_factor <- testing_data |> 
  select(c(sex:miscellaneous_score, nonos_best)) |> 
  mutate(across(c(sex, desexed, purebred, gaze_follow, nonos_best), as.factor))

Now that we have our testing data for each of the three research questions, we can use the mlr functions that help us look at model outcomes: predict(), performance(), and calculateConfusionMatrix(). predict() takes a fitted model and new data and returns the predicted classification for each observation in the testing data. You then feed that prediction object into performance() to get the mean misclassification error (mmce) and into calculateConfusionMatrix() to get the confusion matrix. Cool, right? Let’s do it!

#predicting new values and getting performance metrics for all 3 models run with RQ 1
#KNN model
knn_predictions_RQ1 <- predict(KNN_model_RQ_1, newdata = testing_data_RQ_1)

performance(knn_predictions_RQ1)
     mmce 
0.2553191 
calculateConfusionMatrix(knn_predictions_RQ1)
        predicted
true       0 1 -err.-
  0      105 0      0
  1       36 0     36
  -err.-  36 0     36

The KNN model is classifying about 74% of the cases correctly (1 minus the mmce of 0.255), which sounds okay. However, when we look at the confusion matrix, we can see that the algorithm is classifying every observation as 0 (i.e., predicting that every dog performed below chance at the ostensive task, which we know isn’t correct). So this algorithm is probably not a good one to use for understanding the true relationship between the predictor and the outcome.
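As a side note, if you would rather have mlr report accuracy (or sensitivity) directly instead of converting mmce by hand, performance() accepts a list of measure objects. This is just a sketch of that option: acc, mmce, and tpr are measures exported by mlr, and tpr is calculated relative to whichever level was set as the positive class when the task was created.

#optional: ask for accuracy and true positive rate alongside mmce
performance(knn_predictions_RQ1, measures = list(acc, mmce, tpr))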

#Decision Tree
decision_tree_predictions_RQ1 <- predict(decision_tree_model_RQ_1, newdata = testing_data_RQ_1)

performance(decision_tree_predictions_RQ1)
     mmce 
0.2553191 
calculateConfusionMatrix(decision_tree_predictions_RQ1)
        predicted
true       0 1 -err.-
  0      105 0      0
  1       36 0     36
  -err.-  36 0     36

The decision tree model did just as poorly as the KNN model: it also predicted that every observation would be 0. Again, not very usable.

#SVM
SVM_predictions_RQ_1 <- predict(SVM_model_RQ_1, newdata = testing_data_RQ_1)

performance(SVM_predictions_RQ_1)
    mmce 
0.248227 
calculateConfusionMatrix(SVM_predictions_RQ_1)
        predicted
true       0 1 -err.-
  0      105 0      0
  1       35 1     35
  -err.-  35 0     35

The SVM algorithm produced the same results as our decision tree except for a single data point, even though it took roughly ten times as long to train. Again, not very usable.
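Before moving on, here is one optional way to line up the three RQ1 test-set error rates side by side rather than scrolling between outputs. It is purely a convenience sketch built from the prediction objects we just made; nothing new is being computed.

#collect the RQ1 test-set error rates in one small data frame
RQ1_comparison <- data.frame(
  model = c("KNN", "Decision tree", "SVM"),
  mmce  = c(performance(knn_predictions_RQ1),
            performance(decision_tree_predictions_RQ1),
            performance(SVM_predictions_RQ_1))
)
RQ1_comparison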

Now let’s do the same thing for research question 2.

#predicting new values and getting performance metrics for all 3 models run with RQ 2
#KNN model
knn_predictions_RQ2 <- predict(KNN_model_RQ_2, newdata = testing_data_RQ_2)

performance(knn_predictions_RQ2)
     mmce 
0.3546099 
calculateConfusionMatrix(knn_predictions_RQ2)
        predicted
true      0 1 -err.-
  0      91 0      0
  1      50 0     50
  -err.- 50 0     50

This KNN model had the same issue: it predicted that everything would be 0. (These models really don’t think much of our dogs’ intelligence!)

#Decision Tree
decision_tree_predictions_RQ2 <- predict(decision_tree_model_RQ_2, newdata = testing_data_RQ_2)

performance(decision_tree_predictions_RQ2)
     mmce 
0.3546099 
calculateConfusionMatrix(decision_tree_predictions_RQ2)
        predicted
true      0 1 -err.-
  0      91 0      0
  1      50 0     50
  -err.- 50 0     50

This decision tree did just as badly as the KNN.

#SVM
SVM_predictions_RQ2 <- predict(SVM_model_RQ_2, newdata = testing_data_RQ_2)

performance(SVM_predictions_RQ2)
     mmce 
0.3546099 
calculateConfusionMatrix(SVM_predictions_RQ2)
        predicted
true      0 1 -err.-
  0      91 0      0
  1      50 0     50
  -err.- 50 0     50

The SVM had the same issue as all of our other models.

Now let’s do the same thing for research question 3!

#predicting new values and getting performance metrics for all 3 models run with RQ 3
#KNN model
knn_predictions_RQ3 <- predict(KNN_model_RQ_3, newdata = testing_data_RQ_3_factor)

performance(knn_predictions_RQ3)
     mmce 
0.3049645 
#Dig deeper into predictions with a confusion matrix
calculateConfusionMatrix(knn_predictions_RQ3)
        predicted
true      0 1 -err.-
  0      98 0      0
  1      43 0     43
  -err.- 43 0     43

Again, the algorithm is predicting that everything will be 0 (i.e., that every dog will perform better at the ostensive task than at the nonostensive one). This is also not a very helpful algorithm to use in the future.

#Random Forest
randomforest_predictions_RQ3 <- predict(random_forest_model_RQ_3, newdata = testing_data_RQ_3_factor)

#Measure how well the model did at predictions
performance(randomforest_predictions_RQ3)
     mmce 
0.3049645 
#Dig deeper into predictions with a confusion matrix
calculateConfusionMatrix(randomforest_predictions_RQ3)
        predicted
true      0 1 -err.-
  0      98 0      0
  1      43 0     43
  -err.- 43 0     43

The random forest algorithm has the exact same results as the KNN.