Tuning the Hyperparameters of SVM

We have two main hyperparameters to tune for the SVM model: cost and gamma.

  1. Cost (C) adjusts the softness of the margin: a small C tolerates more misclassified training points, while a large C penalizes them heavily.

  2. Gamma adjusts the curvature of the decision boundary: larger values let the boundary bend more tightly around individual points.
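
The effect of gamma can be seen on a small illustrative example (this uses scikit-learn's toy `make_moons` data, not the project's feature matrix): a tiny gamma yields a smooth, nearly linear boundary, while a huge gamma lets the boundary wrap around individual training points, which shows up as a higher training accuracy.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy data with a curved class boundary (stand-in example only).
X_toy, y_toy = make_moons(n_samples=200, noise=0.2, random_state=0)

# Small gamma: smooth, nearly linear boundary.
# Large gamma: boundary curves tightly around individual points.
smooth = SVC(kernel="rbf", C=1, gamma=0.01).fit(X_toy, y_toy)
wiggly = SVC(kernel="rbf", C=1, gamma=100).fit(X_toy, y_toy)

print("train accuracy, gamma=0.01:", smooth.score(X_toy, y_toy))
print("train accuracy, gamma=100: ", wiggly.score(X_toy, y_toy))
```

The higher training accuracy of the large-gamma model is exactly the overfitting risk that cross-validation guards against.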

The radial (RBF) kernel appeared to work best, though we also included the polynomial kernel as a third search dimension. To tune the parameters we used 5-fold cross-validation, and since we needed the best combination of cost and gamma rather than each value in isolation, we searched over a grid of candidate pairs.

We perform 5-fold cross-validation twice - once scoring accuracy and once scoring area under the ROC curve (AUC) - so that we can pick the combination that balances both metrics.
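
To make the mechanics concrete, grid search with two scoring passes is just an exhaustive loop over parameter combinations, each scored by cross-validation. The sketch below uses synthetic stand-in data and a deliberately tiny grid; variable names here are illustrative, not the project's:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the real feature matrix.
X_demo, y_demo = make_classification(n_samples=300, random_state=0)

# Every (C, gamma) pair is scored with 5-fold CV, once per metric.
demo_grid = ParameterGrid({"C": [0.1, 1], "gamma": [0.1, 0.01]})
for params in demo_grid:
    model = SVC(kernel="rbf", **params)
    acc = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    auc = cross_val_score(model, X_demo, y_demo, cv=5, scoring="roc_auc").mean()
    print(params, round(acc, 3), round(auc, 3))
```

GridSearchCV below packages this loop (plus refitting the winner) into a single object.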

5-Fold Cross-Validation for Accuracy Using Grid Search

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
import numpy as np 
import pandas as pd 
import pickle
feature_matrix_train = pd.read_csv("./data/final_feature_matrix.csv", index_col=0)
X = feature_matrix_train.drop("fraudulent", axis=1).values
y = feature_matrix_train.fraudulent.values
param_grid_acc = {'C': [0.01, 0.1, 1, 10, 100],
                  'gamma': [10, 1, 0.1, 0.01, 0.001],
                  'kernel': ['rbf', 'poly']}

# GridSearchCV defaults to 5-fold CV and, for classifiers, accuracy scoring.
grid_acc = GridSearchCV(SVC(), param_grid_acc)
grid_acc.fit(X, y)
GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.01, 0.1, 1, 10, 100],
                         'gamma': [10, 1, 0.1, 0.01, 0.001],
                         'kernel': ['rbf', 'poly']})
dictionary_acc = grid_acc.cv_results_
print(grid_acc.best_params_)
{'C': 10, 'gamma': 1, 'kernel': 'rbf'}
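
Besides `best_params_`, the fitted GridSearchCV object also exposes the winner's mean cross-validated score and an estimator already refit on the full data. A minimal self-contained sketch (using synthetic stand-in data and a smaller grid, since the real feature matrix is loaded above):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data and a reduced grid for illustration.
X_demo, y_demo = make_classification(n_samples=200, random_state=0)
demo_grid = GridSearchCV(SVC(), {"C": [1, 10], "gamma": [0.1, 1]})
demo_grid.fit(X_demo, y_demo)

print(demo_grid.best_params_)     # the winning combination
print(demo_grid.best_score_)      # its mean 5-fold CV accuracy
best_model = demo_grid.best_estimator_  # refit on all the data, ready to predict
```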

5-Fold Cross-Validation for ROC AUC Using Grid Search

param_grid_auc = {'C': [0.01, 0.1, 1, 10, 100],
                  'gamma': [10, 1, 0.1, 0.01, 0.001],
                  'kernel': ['rbf', 'poly']}

grid_auc = GridSearchCV(SVC(), param_grid_auc, scoring="roc_auc")
grid_auc.fit(X, y)
GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.01, 0.1, 1, 10, 100],
                         'gamma': [10, 1, 0.1, 0.01, 0.001],
                         'kernel': ['rbf', 'poly']},
             scoring='roc_auc')
dictionary_auc = grid_auc.cv_results_
print(grid_auc.best_params_)
{'C': 1, 'gamma': 1, 'kernel': 'rbf'}

Result Summary

result_df = pd.DataFrame({"C/gamma/kernel": dictionary_acc['params'],
                          "Accuracy": dictionary_acc['mean_test_score'],
                          "ROC_AUC": dictionary_auc['mean_test_score']})
result_df.sort_values("Accuracy", ascending=False).reset_index(drop=True)[0:10]
   C/gamma/kernel                               Accuracy  ROC_AUC
0  {'C': 10, 'gamma': 1, 'kernel': 'rbf'}       0.973434  0.932281
1  {'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}    0.973294  0.926156
2  {'C': 100, 'gamma': 1, 'kernel': 'rbf'}      0.971966  0.925867
3  {'C': 1, 'gamma': 1, 'kernel': 'poly'}       0.971756  0.918377
4  {'C': 0.1, 'gamma': 1, 'kernel': 'poly'}     0.971197  0.922597
5  {'C': 100, 'gamma': 0.1, 'kernel': 'poly'}   0.971197  0.922602
6  {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}     0.970638  0.926423
7  {'C': 1, 'gamma': 1, 'kernel': 'rbf'}        0.969869  0.933279
8  {'C': 10, 'gamma': 10, 'kernel': 'rbf'}      0.969729  0.924050
9  {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}   0.969379  0.913966

The combination {C = 10, gamma = 1, kernel = 'rbf'} gives the highest accuracy, while {C = 1, gamma = 1, kernel = 'rbf'} gives the highest AUC.

We choose {C = 10, gamma = 1, kernel = 'rbf'} as our final parameters for the SVM: its AUC is only about 0.001 below the best, while its accuracy is roughly 0.004 higher, making it the most balanced choice across both metrics.
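
With the final parameters chosen, the model can be refit on the training data and persisted with pickle (imported above) so later steps can reload it without re-running the grid search. The sketch below uses synthetic stand-in data and a hypothetical file name; in the project, X and y would be the real feature matrix:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Stand-in data; the project would use X and y from the feature matrix.
X_demo, y_demo = make_classification(n_samples=200, random_state=0)

# Refit with the hyperparameters selected by the grid search.
final_svm = SVC(C=10, gamma=1, kernel="rbf").fit(X_demo, y_demo)

# Persist the fitted model (hypothetical file name).
with open("svm_model.pkl", "wb") as f:
    pickle.dump(final_svm, f)

# Reload and confirm the round trip preserves the model.
with open("svm_model.pkl", "rb") as f:
    reloaded = pickle.load(f)
print(reloaded.score(X_demo, y_demo))
```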