We will use the twoClass dataset from *Applied Predictive Modeling*, the book by M. Kuhn and K. Johnson, to illustrate the most classical supervised classification algorithms. We will use some *advanced* **R** packages: the **ggplot2** package for the figures and the **caret** package for the learning part. **caret** provides a unified interface to many other packages. We will also use **h2o**, a package dedicated to in-memory learning on large datasets, for its *deep* learning algorithm.

```
library("plyr")
library("dplyr")
library("ggplot2")
library("gridExtra")
library("caret")
library("h2o")
library("doMC")
registerDoMC(cores = 3)
```

We first read the dataset and use **ggplot2** to display it.

```
library(AppliedPredictiveModeling)
library(RColorBrewer)
data(twoClassData)
twoClass <- cbind(as.data.frame(predictors), classes)
twoClassColor <- brewer.pal(3, 'Set1')[1:2]
names(twoClassColor) <- c('Class1', 'Class2')
ggplot(data = twoClass, aes(x = PredictorA, y = PredictorB)) +
  geom_point(aes(color = classes), size = 6, alpha = .5) +
  scale_colour_manual(name = 'classes', values = twoClassColor) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
```

We create a few functions that will be useful to display our classifiers.

```
library("grid")  # provides textGrob() and gpar()

# Regular grid covering the range of the two predictors
nbp <- 250
PredA <- seq(min(twoClass$PredictorA), max(twoClass$PredictorA), length.out = nbp)
PredB <- seq(min(twoClass$PredictorB), max(twoClass$PredictorB), length.out = nbp)
Grid <- expand.grid(PredictorA = PredA, PredictorB = PredB)

# Display the decision regions (left) and the decision boundary (right)
# of a classifier from its predictions `pred` on the points of Grid
PlotGrid <- function(pred, title) {
  surf <- ggplot(data = twoClass, aes(x = PredictorA, y = PredictorB, color = classes)) +
    geom_tile(data = cbind(Grid, classes = pred), aes(fill = classes)) +
    scale_fill_manual(name = 'classes', values = twoClassColor) +
    ggtitle("Decision region") +
    theme(legend.text = element_text(size = 10)) +
    scale_colour_manual(name = 'classes', values = twoClassColor) +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0))
  # The classes are coded 1 and 2, so the boundary is the contour at level 1.5
  pts <- ggplot(data = twoClass, aes(x = PredictorA, y = PredictorB, color = classes)) +
    geom_contour(data = cbind(Grid, classes = pred), aes(z = as.numeric(classes)),
                 color = "red", breaks = 1.5) +
    geom_point(size = 4, alpha = .5) +
    ggtitle("Decision boundary") +
    theme(legend.text = element_text(size = 10)) +
    scale_colour_manual(name = 'classes', values = twoClassColor) +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0))
  # Recent versions of gridExtra use `top` instead of `main` for the title
  grid.arrange(surf, pts, top = textGrob(title, gp = gpar(fontsize = 20)), ncol = 2)
}
```

As explained in the introduction, we will use **caret** for the learning part. This package provides a unified interface to a huge number of classifiers available in **R**. It is a very powerful tool when exploring different models. In particular, it can compute a *resampling* accuracy estimate and lets the user choose the specific methodology. We will use a repeated V-fold strategy with \(10\) folds and \(4\) repetitions. We will reuse the same *seed* for each model to make sure that the same folds are used.

```
library("caret")
V <- 10
T <- 4
TrControl <- trainControl(method = "repeatedcv",
number = V,
repeats = T)
Seed <- 345
```
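
To make the resampling scheme concrete, here is a minimal base-R sketch of how the held-out indices of a repeated V-fold scheme could be drawn, and why resetting the seed reproduces the same folds. The helper `MakeRepeatedFolds` and the sample size `n = 200` are hypothetical and only serve the illustration; this is not **caret**'s internal implementation, which among other things balances the classes across folds.

```r
# Hypothetical helper (not part of caret): draw the held-out indices
# of a repeated V-fold scheme for n observations.
MakeRepeatedFolds <- function(n, V, Repeats, Seed) {
  set.seed(Seed)
  Folds <- list()
  for (r in seq_len(Repeats)) {
    # Randomly assign each observation to one of the V folds
    FoldId <- sample(rep(seq_len(V), length.out = n))
    for (v in seq_len(V)) {
      Folds[[sprintf("Fold%02d.Rep%d", v, r)]] <- which(FoldId == v)
    }
  }
  Folds
}

FoldsA <- MakeRepeatedFolds(n = 200, V = 10, Repeats = 4, Seed = 345)
FoldsB <- MakeRepeatedFolds(n = 200, V = 10, Repeats = 4, Seed = 345)
length(FoldsA)             # one held-out set per fold and per repetition
identical(FoldsA, FoldsB)  # same seed, same folds
```

Each model is then trained once per resample on the observations outside the held-out set and evaluated on the held-out set.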

Finally, we provide a function that stores the accuracies for every resample and every model. We also compute an accuracy on the same data as the one used for training in order to show the over-fitting phenomenon.

```
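# Collect the accuracy measured on the training data (Resample = "None")
# together with the accuracies of every resample
ErrsCaret <- function(Model, Name) {
  Errs <- data.frame(t(postResample(predict(Model, newdata = twoClass), twoClass[["classes"]])),
                     Resample = "None", model = Name)
  rbind(Errs, data.frame(Model$resample, model = Name))
}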
Errs <- data.frame()
```
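
For a classification problem, `postResample` reports an accuracy and Cohen's Kappa. The following base-R sketch computes both by hand on a small made-up vector of observed and predicted labels (the six labels are purely illustrative), which may help when reading the columns of `Errs`.

```r
# Made-up observed and predicted labels, for illustration only
Obs  <- factor(c("Class1", "Class1", "Class2", "Class2", "Class1", "Class2"))
Pred <- factor(c("Class1", "Class2", "Class2", "Class2", "Class1", "Class1"))

Tab <- table(Pred, Obs)  # confusion matrix
# Observed agreement: proportion of correct predictions (the accuracy)
Po <- sum(diag(Tab)) / sum(Tab)
# Agreement expected by chance from the row and column margins
Pe <- sum(rowSums(Tab) * colSums(Tab)) / sum(Tab)^2
# Cohen's Kappa: agreement beyond chance, rescaled to at most 1
Kappa <- (Po - Pe) / (1 - Pe)
c(Accuracy = Po, Kappa = Kappa)
```

Here four of the six predictions are correct, so the accuracy is 2/3, while the Kappa is only 1/3 because half of the agreement would already be expected by chance.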

We are now ready to define a function that takes as input the current collection of accuracies, the name of a model, the corresponding formula and method, as well as any further parameters used to specify the model. It trains the model, displays its predictions in a figure and adds the errors to the collection.

```
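CaretLearnAndDisplay <- function(Errs, Name, Formula, Method, ...) {
  # Reset the seed so that every model is evaluated on the same folds
  set.seed(Seed)
  Model <- train(as.formula(Formula), data = twoClass, method = Method,
                 trControl = TrControl, ...)
  # Predict on the grid and display the decision regions and boundary
  Pred <- predict(Model, newdata = Grid)
  PlotGrid(Pred, Name)
  # Return the collection of errors extended with those of this model
  rbind(Errs, ErrsCaret(Model, Name))
}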
```

We can apply this function to any model available in **caret**. We will pick a few models and sort them according to the heuristic used to define them. We will distinguish models coming from a statistical point of view, in which one tries to estimate the conditional law and plug it into the Bayes classifier, from models coming from an optimization point of view, in which one tries to enforce a *small* training error by minimizing a relaxed criterion.

In the generative modeling approach, one estimates the joint law of the covariates and the label and derives the conditional law using the Bayes formula. Generative models differ in the choice of the density estimator (often specified by an intra-class model). We consider here the LDA and QDA methods, in which a Gaussian model is used, and two variants of the Naive Bayes method, in which all the features are assumed to be independent within each class.

`Errs <- CaretLearnAndDisplay(Errs, "Linear Discriminant Analysis", "classes ~ .", "lda")`

`Errs <- CaretLearnAndDisplay(Errs, "Quadratic Discriminant Analysis", "classes ~ .", "qda")`

```
Errs <- CaretLearnAndDisplay(Errs, "Naive Bayes with Gaussian model", "classes ~ .", "nb",
tuneGrid = data.frame(usekernel = c(FALSE), fL = c(0)))
```

```
Errs <- CaretLearnAndDisplay(Errs, "Naive Bayes with kernel density estimates", "classes ~ .", "nb",
tuneGrid = data.frame(usekernel = c(TRUE), fL = c(0)))
```
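
The generative recipe described above (estimate a density per class, then apply the Bayes formula) can be sketched by hand in base R. The example below is a simplified LDA-like classifier on synthetic data: the data, the variable names and the choice of a pooled covariance are assumptions made for the illustration, and this is not the code run by the `lda` method.

```r
set.seed(2)
n <- 100
# Synthetic data: two Gaussian classes with different means, same covariance
X <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
           matrix(rnorm(2 * n, mean = 2), ncol = 2))
y <- factor(rep(c("Class1", "Class2"), each = n))
Classes <- levels(y)

# Estimate the class priors, the class means and the pooled covariance
Prior <- c(table(y)) / length(y)
Mu <- lapply(Classes, function(cl) colMeans(X[y == cl, , drop = FALSE]))
names(Mu) <- Classes
Pooled <- Reduce(`+`, lapply(Classes, function(cl) {
  Xc <- scale(X[y == cl, , drop = FALSE], center = TRUE, scale = FALSE)
  crossprod(Xc)
})) / (nrow(X) - length(Classes))

# Bayes formula: log(prior) + log(Gaussian density), dropping the terms
# shared by all classes since they do not change the argmax
Score <- sapply(Classes, function(cl) {
  D <- sweep(X, 2, Mu[[cl]])
  log(Prior[[cl]]) - 0.5 * rowSums((D %*% solve(Pooled)) * D)
})
PredLDA <- factor(Classes[max.col(Score, ties.method = "first")], levels = Classes)
AccLDA <- mean(PredLDA == y)
AccLDA
```

Using one covariance matrix per class instead of the pooled one would give a QDA-like rule (the log-determinant of each covariance must then be kept in the score), and forcing the covariances to be diagonal would give a Gaussian Naive Bayes rule.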

The most classical model is probably the logistic model, which is the canonical example of a parametric estimate of the conditional law.

`Errs <- CaretLearnAndDisplay(Errs, "Logistic", "classes ~ .", "glm")`
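
As a point of comparison, a logistic model can be fitted directly with `glm` from base R, without going through **caret**. The sketch below uses a synthetic data frame `Toy` whose column names mimic those of `twoClass`; the data and the 0.5 threshold are assumptions made for the illustration.

```r
set.seed(3)
n <- 200
# Synthetic data with a noisy linear separation between the classes
Toy <- data.frame(PredictorA = rnorm(n), PredictorB = rnorm(n))
Toy$classes <- factor(ifelse(Toy$PredictorA - Toy$PredictorB + rnorm(n, sd = 0.5) > 0,
                             "Class2", "Class1"))

# With a factor response, glm models the probability of the second level
Fit <- glm(classes ~ ., data = Toy, family = binomial)
Prob <- predict(Fit, type = "response")  # estimated P(Class2 | x)

# Plug-in classifier: predict Class2 when its estimated probability exceeds 0.5
PredGLM <- factor(ifelse(Prob > 0.5, "Class2", "Class1"), levels = levels(Toy$classes))
AccGLM <- mean(PredGLM == Toy$classes)
coef(Fit)
AccGLM
```

The decision boundary is the set of points where the linear predictor is zero, which is why the logistic model yields a straight line in the plane of the two predictors.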

We are not restricted to the two raw features: we may also use any transformation of them. For instance, we may use all the possible monomials up to degree \(2\).

```
# The formula is kept on a single line so that as.formula() parses every term
Errs <- CaretLearnAndDisplay(Errs, "Quadratic Logistic",
                             "classes ~ PredictorA + PredictorB + I(PredictorA^2) + I(PredictorB^2) + I(PredictorA * PredictorB)",
                             "glm")
```

In the nearest neighbor method, we need to supply a parameter \(k\), the number of neighbors used to define the kernel. We will visually compare the solutions obtained for a few values of \(k\).

```
ErrsKNN <- data.frame()
KNNKS <- c(1, 5, 9, 13, 17, 21, 25, 29)
for (k in KNNKS) {
  ErrsKNN <- CaretLearnAndDisplay(ErrsKNN, sprintf("k-NN with k=%i", k),
                                  "classes ~ .", "knn", tuneGrid = data.frame(k = c(k)))
}
```
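
To see what the parameter \(k\) does, here is a minimal base-R sketch of a k-nearest-neighbor classifier with a majority vote, evaluated on its own training data. The function `PredictKNN` and the synthetic data are hypothetical and written for the illustration; the `knn` method of **caret** relies on its own implementation.

```r
# Hypothetical helper: k-NN prediction with Euclidean distance and majority vote
PredictKNN <- function(XTrain, yTrain, XNew, k) {
  apply(XNew, 1, function(x) {
    d <- colSums((t(XTrain) - x)^2)            # squared distances to x
    Neighbors <- yTrain[order(d)[seq_len(k)]]  # labels of the k closest points
    names(which.max(table(Neighbors)))         # most frequent label
  })
}

set.seed(4)
n <- 150
X <- matrix(rnorm(2 * n), ncol = 2)
# Noisy labels: no classifier can be perfect on new data
y <- factor(ifelse(X[, 1] + rnorm(n) > 0, "Class2", "Class1"))

# Accuracy measured on the training data itself for a few odd values of k
Ks <- c(1, 5, 29)
TrainAcc <- sapply(Ks, function(k) mean(PredictKNN(X, y, X, k) == y))
names(TrainAcc) <- paste0("k=", Ks)
TrainAcc
```

With \(k = 1\) every training point is its own nearest neighbor, so the training accuracy is exactly 1 even though the labels are noisy: this is the over-fitting that the comparison between the training accuracy and the resampling accuracies is meant to reveal.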