SVM Result: Train Dataset
In this section, we fit the SVM to our training dataset and evaluate its performance, using the hyperparameters tuned in the previous section.
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import classification_report
import numpy as np
import pandas as pd
import pickle
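# Load the training feature matrix; the first CSV column is the row index.
feature_matrix_train = pd.read_csv("./data/final_feature_matrix.csv", index_col=0)
# Split into features (X) and the binary target column "fraudulent" (y).
X = feature_matrix_train.drop("fraudulent", axis=1).values
y = feature_matrix_train.fraudulent.values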
svm_model = SVC(C=10,
                gamma=1,
                kernel='rbf')
svm_model.fit(X, y)
SVC(C=10, gamma=1)
y_predict = svm_model.predict(X)
cm = confusion_matrix(y, y_predict, labels=svm_model.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=svm_model.classes_)
disp.plot()
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1cbadd350a0>

print(classification_report(y, y_predict))
              precision    recall  f1-score   support

           0       0.99      1.00      0.99     13611
           1       0.99      0.79      0.88       693

    accuracy                           0.99     14304
   macro avg       0.99      0.89      0.94     14304
weighted avg       0.99      0.99      0.99     14304
The performance of the SVM on the training data looks fair: precision is 0.99 for both classes, but recall for the fraudulent class (1) is only 0.79. The SVM performs worse than the random forest model, especially in recall. However, the result suggests that this SVM overfits less than the SVM in the previous work, so the tuning appears to have fixed the overfitting we saw there.
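To see why the overall averages look stronger than the recall on the fraudulent class, it may help to recompute the two averaged recall rows by hand. The sketch below is only illustrative: it reuses the rounded per-class recall and support values printed in the classification report above, so its results are approximate, and the helper names `macro_average` and `weighted_average` are hypothetical, not part of scikit-learn.

```python
# Per-class recall and support, copied from the classification report above.
# The report rounds to two decimals, so the averages here are approximate.
recall = {0: 1.00, 1: 0.79}
support = {0: 13611, 1: 693}


def macro_average(scores):
    """Unweighted mean over classes: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, class_support):
    """Mean over classes, weighted by the number of true samples per class."""
    total = sum(class_support.values())
    return sum(scores[c] * class_support[c] for c in scores) / total


macro_recall = macro_average(recall)
weighted_recall = weighted_average(recall, support)

print(f"macro recall:    {macro_recall:.3f}")
print(f"weighted recall: {weighted_recall:.3f}")
```

Because class 0 makes up 13611 of the 14304 samples, the weighted average stays close to 0.99 even though about a fifth of the fraudulent postings are missed. The macro average, which treats both classes equally, comes out near 0.89 and is the more informative summary for an imbalanced dataset like this one.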
with open('./pickle/svm_model.pkl', 'wb') as f:
    pickle.dump(svm_model, f)
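A later section would presumably load this file back before evaluating on other data. The following is a minimal sketch of that save-and-load round trip, under a few assumptions: the helpers `save_pickle` and `load_pickle` are hypothetical names, a plain dictionary stands in for the fitted `svm_model` so the snippet runs without scikit-learn, and a temporary directory is used instead of `./pickle`.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read back an object previously written with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the fitted model (assumption: the real object is the SVC above).
stand_in_model = {"kernel": "rbf", "C": 10, "gamma": 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svm_model.pkl")
    save_pickle(stand_in_model, model_path)
    restored = load_pickle(model_path)

print(restored == stand_in_model)
```

With the real model, the same pattern applies to `./pickle/svm_model.pkl`, provided scikit-learn is importable when the file is loaded, ideally in the same version that was used to fit the model. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.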