2  Support Vector Machines

The support vector machine (SVM) algorithm estimates hyperplanes (decision boundaries) that separate the classes of the response, here the iris species. In the following we use the ‘e1071’ package, which supports a variety of SVM algorithms (Meyer et al. 2022) (Python: ‘scikit-learn’ (Pedregosa et al. 2011), Julia: ‘MLJ’ (Blaom et al. 2019)).
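
Which hyperplanes are estimated depends mainly on the kernel and on the cost parameter, which penalizes margin violations. As a minimal sketch (the values below are only illustrative), a linear kernel, i.e. flat separating hyperplanes, can be compared with the default radial kernel:

library(e1071)
X = scale(iris[,1:4])
Y = iris$Species

sv_linear = svm(X, Y, kernel = "linear", cost = 1)   # flat separating hyperplanes
sv_radial = svm(X, Y, kernel = "radial", cost = 1)   # non-linear boundaries (the default)

# compare training accuracy of the two kernels
mean(predict(sv_linear, X) == Y)
mean(predict(sv_radial, X) == Y)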

2.1 Classification

library(e1071)
X = scale(iris[,1:4])
Y = iris$Species

sv = svm(X, Y, probability = TRUE) 
summary(sv)

Call:
svm.default(x = X, y = Y, probability = TRUE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 

Number of Support Vectors:  51

 ( 8 22 21 )


Number of Classes:  3 

Levels: 
 setosa versicolor virginica
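
The 51 support vectors reported above can be inspected on the fitted object; in ‘e1071’, sv$index holds their row indices in the training data and sv$SV the (scaled) support vectors themselves (a minimal sketch):

head(sv$index)
head(sv$SV, n = 3)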

Make predictions (class probabilities):

head(attr(predict(sv, newdata = X, probability = TRUE), "probabilities"), n = 3)
     setosa versicolor  virginica
1 0.9791731 0.01135581 0.00947110
2 0.9716762 0.01816135 0.01016248
3 0.9777791 0.01198490 0.01023600
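
A quick check of how well the fitted model separates the three species is to cross-tabulate predicted against observed classes (a minimal sketch; accuracy on the training data is optimistic, an honest estimate would require held-out data or cross-validation):

table(Predicted = predict(sv, newdata = X), Observed = Y)
mean(predict(sv, newdata = X) == Y)   # training accuracy
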
from sklearn import svm
from sklearn import datasets
from sklearn.preprocessing import scale
iris = datasets.load_iris()
X = scale(iris.data)
Y = iris.target

model = svm.SVC(probability=True).fit(X, Y)

# Make predictions (class probabilities):

model.predict_proba(X)[0:10,:]
array([[0.97956765, 0.01168732, 0.00874504],
       [0.97215052, 0.01844973, 0.00939975],
       [0.9783134 , 0.01226308, 0.00942351],
       [0.9742125 , 0.01567632, 0.01011118],
       [0.97870322, 0.01206444, 0.00923234],
       [0.97312428, 0.01729716, 0.00957855],
       [0.97486896, 0.01395157, 0.01117947],
       [0.97946381, 0.01179526, 0.00874092],
       [0.96530784, 0.02294644, 0.01174573],
       [0.97603545, 0.01443107, 0.00953347]])

using MLJ;
using RDatasets;
using DataFrames;
import StatsBase;
SVM_classifier = @load NuSVC pkg=LIBSVM;
iris = dataset("datasets", "iris");
X = mapcols(StatsBase.zscore, iris[:, 1:4]);
Y = iris[:, 5];

Model:

model = fit!(machine(SVM_classifier(), X, Y))
trained Machine; caches model-specific representations of data
  model: NuSVC(kernel = RadialBasis, …)
  args: 
    1:  Source @424 ⏎ Table{AbstractVector{Continuous}}
    2:  Source @132 ⏎ AbstractVector{Multiclass{3}}

Predictions:

MLJ.predict(model, X)[1:5]
5-element CategoricalArrays.CategoricalArray{String,1,UInt8}:
 "setosa"
 "setosa"
 "setosa"
 "setosa"
 "setosa"

2.2 Regression

library(e1071)
X = scale(iris[,2:4])
Y = iris[,1]

sv = svm(X, Y) 
summary(sv)

Call:
svm.default(x = X, y = Y)


Parameters:
   SVM-Type:  eps-regression 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.3333333 
    epsilon:  0.1 


Number of Support Vectors:  124

Make predictions:

head(predict(sv, newdata = X), n = 3)
       1        2        3 
5.042085 4.711768 4.836291 
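
The regression fit can be summarized, for example, by the root mean squared error of the training predictions (again only a sketch, and again optimistic because it is computed on the training data):

sqrt(mean((predict(sv, newdata = X) - Y)^2))   # training RMSE
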
from sklearn import svm
from sklearn import datasets
from sklearn.preprocessing import scale
iris = datasets.load_iris()
data = iris.data
X = scale(data[:,1:4])
Y = data[:,0]

model = svm.SVR().fit(X, Y)

# Make predictions:

model.predict(X)[0:10]
array([5.03583855, 4.69496586, 4.81438855, 4.77951854, 5.10018373,
       5.29981857, 4.97308737, 4.98199033, 4.63701656, 4.78431078])

using MLJ;
using RDatasets;
using DataFrames;
import StatsBase;
SVM_regressor = @load NuSVR pkg=LIBSVM;
iris = dataset("datasets", "iris");
X = mapcols(StatsBase.zscore, iris[:, 2:4]);
Y = iris[:, 1];

Model:

model = fit!(machine(SVM_regressor(), X, Y))
trained Machine; caches model-specific representations of data
  model: NuSVR(kernel = RadialBasis, …)
  args: 
    1:  Source @758 ⏎ Table{AbstractVector{Continuous}}
    2:  Source @893 ⏎ AbstractVector{Continuous}

Predictions:

MLJ.predict(model, X)[1:5]
5-element Vector{Float64}:
 5.058471741834634
 4.6717512552719604
 4.799641470830148
 4.75734816087994
 5.133728219775252