Random Forest Result: Test Dataset

In this section, we evaluate the trained random forest model on the test dataset.

from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import classification_report
import numpy as np 
import pandas as pd 
import pickle
# Load the test feature matrix; the first CSV column is the index
feature_matrix_test = pd.read_csv("./data/final_feature_matrix_test.csv", index_col=0)

# Split into the feature array and the binary target
X = feature_matrix_test.drop("fraudulent", axis=1).values
y = feature_matrix_test.fraudulent.values

# Load the previously trained random forest model
with open("./pickle/rf_model.pkl", "rb") as f:
    rf_model = pickle.load(f)

# Predict on the test set and plot the confusion matrix
y_predict = rf_model.predict(X)
cm = confusion_matrix(y, y_predict, labels=rf_model.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=rf_model.classes_)
disp.plot()
Figure: confusion matrix of the random forest predictions on the test set (_images/RFTest_6_1.png)
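
To make the four cells of the confusion matrix concrete, here is a minimal sketch that tallies them by hand and derives precision and recall for the positive class. The labels are hypothetical toy data, not the fraud test set, and `confusion_counts` is an illustrative helper, not a scikit-learn function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary classification problem."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1          # fraud correctly flagged
        elif t == positive:
            fn += 1          # fraud missed
        elif p == positive:
            fp += 1          # legitimate sample flagged as fraud
        else:
            tn += 1          # legitimate sample correctly passed
    return tn, fp, fn, tp


# Hypothetical toy labels (1 = fraudulent, 0 = legitimate)
y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred_toy = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true_toy, y_pred_toy)
precision = tp / (tp + fp)   # share of flagged samples that are truly fraud
recall = tp / (tp + fn)      # share of true fraud that was flagged

print("tn, fp, fn, tp =", tn, fp, fn, tp)
print("precision =", round(precision, 2), "recall =", round(recall, 2))
```

With `labels=[0, 1]`, scikit-learn's `confusion_matrix` arranges these counts as `[[tn, fp], [fn, tp]]`, with true labels on the rows and predicted labels on the columns.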
print(classification_report(y, y_predict))
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      3403
           1       0.96      0.64      0.77       173

    accuracy                           0.98      3576
   macro avg       0.97      0.82      0.88      3576
weighted avg       0.98      0.98      0.98      3576
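
The `macro avg` and `weighted avg` rows can be approximately reproduced from the per-class rows. The sketch below uses the rounded values printed in the report, so it only matches to two decimals; scikit-learn computes the averages from unrounded scores. The dictionary layout and the helper names `macro_avg` and `weighted_avg` are assumptions made for illustration.

```python
# Per-class values copied from the report (already rounded to 2 decimals)
report = {
    0: {"precision": 0.98, "recall": 1.00, "f1": 0.99, "support": 3403},
    1: {"precision": 0.96, "recall": 0.64, "f1": 0.77, "support": 173},
}
total = sum(row["support"] for row in report.values())


def macro_avg(metric):
    """Unweighted mean over classes: each class counts equally."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(metric):
    """Mean over classes weighted by the number of samples in each class."""
    return sum(row[metric] * row["support"] for row in report.values()) / total


for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(metric), 2), round(weighted_avg(metric), 2))
```

The macro average of recall is pulled down to 0.82 by the fraudulent class, while the weighted average stays at 0.98 because class 0 contributes 3403 of the 3576 samples.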

On the test set, the random forest model reaches 98% accuracy, with a precision of 0.96 and a recall of 0.64 on the fraudulent class. Compared to the previous work, accuracy improves (97.4% to 98.1%), but recall on the fraudulent class drops (67% to 64%).
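
When reading the accuracy figures, it can help to compare them against a trivial baseline. The sketch below uses only the class supports from the classification report and computes the accuracy and recall of a hypothetical classifier that labels every sample as class 0. The variable names are illustrative.

```python
# Class supports taken from the classification report
n_legit, n_fraud = 3403, 173
n_total = n_legit + n_fraud

fraud_share = n_fraud / n_total

# Hypothetical baseline: every sample is predicted as class 0 (legitimate)
baseline_accuracy = n_legit / n_total   # all legitimate samples are "correct"
baseline_recall = 0 / n_fraud           # no fraudulent sample is caught

print(f"fraud share:       {fraud_share:.1%}")
print(f"baseline accuracy: {baseline_accuracy:.1%}")
print(f"baseline recall:   {baseline_recall:.0%}")
```

Because fraudulent samples make up under 5% of the test set, this baseline already scores about 95% accuracy, so the recall on class 1 carries much of the information about how well fraud is detected.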