Random Forest Result: Test Dataset
In this section, we evaluate the trained random forest model on the test dataset.
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import classification_report
import numpy as np
import pandas as pd
import pickle
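# Load the test feature matrix; the first CSV column holds the row index
feature_matrix_test = pd.read_csv("./data/final_feature_matrix_test.csv", index_col=0)
# Split into features (X) and the binary target "fraudulent" (y)
X = feature_matrix_test.drop("fraudulent", axis=1).values
y = feature_matrix_test.fraudulent.values
# Load the previously trained random forest and predict on the test set
with open('./pickle/rf_model.pkl', 'rb') as f:
    rf_model = pickle.load(f)
y_predict = rf_model.predict(X)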
# Confusion matrix on the test set (rows: true label, columns: predicted label)
cm = confusion_matrix(y, y_predict, labels=rf_model.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=rf_model.classes_)
disp.plot()
plt.show()
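To make the plot easier to read, here is a minimal pure-Python sketch of what `confusion_matrix` counts. It follows scikit-learn's convention that entry `[i][j]` is the number of samples whose true label is the i-th label and whose predicted label is the j-th. The helper `confusion_matrix_py` and the toy labels are hypothetical, for illustration only, and are not taken from the test set. With labels ordered `(0, 1)`, the top-left cell holds true negatives and the bottom-right cell holds true positives (fraudulent postings that were caught).

```python
def confusion_matrix_py(y_true, y_pred, labels=(0, 1)):
    """Count (true, predicted) label pairs into a square matrix.

    cm[i][j] is the number of samples whose true label is labels[i]
    and whose predicted label is labels[j] (scikit-learn's convention).
    """
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        cm[index[true_label]][index[pred_label]] += 1
    return cm


# Hypothetical toy labels (not the test set): 1 = fraudulent, 0 = legitimate
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0, 1]

cm = confusion_matrix_py(y_true, y_pred)
# With labels (0, 1): top row = true negatives, false positives;
# bottom row = false negatives, true positives
(tn, fp), (fn, tp) = cm
print(cm)              # [[3, 1], [1, 2]]
print(tn, fp, fn, tp)  # 3 1 1 2
```

In the plot above, the bottom row therefore shows how the fraudulent postings were split between missed (left) and detected (right).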

print(classification_report(y, y_predict))
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      3403
           1       0.96      0.64      0.77       173

    accuracy                           0.98      3576
   macro avg       0.97      0.82      0.88      3576
weighted avg       0.98      0.98      0.98      3576
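The two summary rows of the report follow directly from the per-class rows. The sketch below uses only the rounded numbers printed above: the macro average is the plain mean over the two classes, and the weighted average weights each class by its `support`. Because the inputs are already rounded to two decimals, this illustrates the definitions and matches the report only to two decimals; it does not re-derive scikit-learn's exact values.

```python
# Per-class rows copied from the classification report above
# (values are already rounded to two decimals).
per_class = {
    0: {"precision": 0.98, "recall": 1.00, "f1": 0.99, "support": 3403},
    1: {"precision": 0.96, "recall": 0.64, "f1": 0.77, "support": 173},
}


def macro_avg(metric):
    """Unweighted mean over classes: each class counts equally."""
    return sum(row[metric] for row in per_class.values()) / len(per_class)


def weighted_avg(metric):
    """Mean over classes weighted by support (true samples per class)."""
    total = sum(row["support"] for row in per_class.values())
    return sum(row[metric] * row["support"] for row in per_class.values()) / total


for metric in ("precision", "recall", "f1"):
    print(f"{metric:9s} macro={macro_avg(metric):.2f} weighted={weighted_avg(metric):.2f}")
# precision macro=0.97 weighted=0.98
# recall    macro=0.82 weighted=0.98
# f1        macro=0.88 weighted=0.98
```

The weighted-average recall always equals accuracy, since summing each class's recall times its support gives the total number of correct predictions; that is why both read 0.98. The macro average gives the small fraudulent class the same weight as the legitimate one, which is why its recall (0.82) is noticeably lower.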
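The performance of the random forest model on the test set is adequate but uneven across classes. Legitimate postings (class 0) are classified almost perfectly. For fraudulent postings (class 1), precision is high (0.96) but recall is only 0.64, so roughly a third of the fraudulent postings in the test set go undetected. Compared to the previous work, accuracy improves (97.4% to 98.1%) but recall on the fraudulent class drops (67% to 64%).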
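The accuracy figure should be read against the class imbalance, since 3403 of the 3576 test samples are legitimate. A minimal sketch, using only the support counts and the rounded class-1 recall from the report, computes the accuracy of a trivial classifier that labels every posting as legitimate and estimates how many fraudulent postings were caught and missed.

```python
# Counts taken from the "support" column of the classification report.
n_legit = 3403               # class 0 (legitimate)
n_fraud = 173                # class 1 (fraudulent)
n_total = n_legit + n_fraud  # 3576

# A trivial classifier that predicts "legitimate" for every posting gets
# every class-0 sample right and every class-1 sample wrong.
baseline_accuracy = n_legit / n_total
print(f"all-legitimate baseline accuracy: {baseline_accuracy:.3f}")  # 0.952

# Class-1 recall is TP / (TP + FN) = TP / n_fraud, so the reported (rounded)
# recall of 0.64 gives an approximate count of fraudulent postings caught.
reported_recall = 0.64
caught = round(reported_recall * n_fraud)
print(f"caught about {caught}, missed about {n_fraud - caught}")  # 111, 62
```

Because the recall is rounded, the exact number caught could be 110 or 111 (the only counts with 0.635 <= TP / 173 < 0.645), i.e. 62 or 63 missed. The gap between this 95.2% baseline and the model's 98.1% accuracy is a more telling measure of improvement than the raw accuracy alone.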