Random Forest

import pandas as pd
import numpy as np
# Load the engineered feature matrix; the first CSV column holds the row index
text_features = pd.read_csv("./data/final_feature_matrix.csv", index_col=0)
text_features
administr_desc answer_desc assist_desc bill_desc call_desc cash_desc desir_desc duti_desc earn_desc entri_desc ... industry_Accounting industry_Leisure, Travel & Tourism industry_NAN industry_Oil & Energy company_profile telecommuting has_company_logo has_questions required_education fraudulent
0 0.000000 0.0 0.000000 0.0 0.092456 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 0 0 0
1 0.045662 0.0 0.034465 0.0 0.000000 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 0 0 0
2 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 1 1 0
3 0.000000 0.0 0.047975 0.0 0.000000 0.085044 0.0 0.051481 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 0 1 0
4 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.053905 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
14299 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 1 0 0
14300 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 1.0 0.0 0 0 1 1 1 0
14301 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 0 0 0
14302 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.120147 0.000000 0.0 ... 0.0 0.0 0.0 0.0 0 0 1 0 0 0
14303 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.000000 0.0 ... 0.0 0.0 0.0 0.0 1 0 0 0 1 0

14304 rows × 86 columns

Now that feature engineering is finished, we can finally move on to the actual machine learning analysis. In this chapter, we use a random forest model and aim to make the best possible predictions on the test dataset. We will go through every step thoroughly, from tuning the hyperparameters and fitting the model on the training data to applying the fitted model to the test dataset.
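The workflow outlined above (tune the hyperparameters, fit, then predict on held-out data) could be sketched with scikit-learn roughly as follows. This is a minimal sketch under several assumptions, not the chapter's actual code: that scikit-learn is installed, that `fraudulent` is the label column, that F1 is a sensible metric because fraudulent postings are likely rare, and that the small parameter grid is purely illustrative. A hypothetical synthetic DataFrame (`text_features_demo`) stands in for `final_feature_matrix.csv` so the example runs on its own; with the real data you would pass `text_features` instead.

```python
# Minimal sketch: assumes scikit-learn is installed and that the label column
# is named "fraudulent", as in the feature matrix loaded above.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical synthetic stand-in for final_feature_matrix.csv (not real data).
rng = np.random.default_rng(0)
n_rows = 400
X_demo = pd.DataFrame(rng.random((n_rows, 5)), columns=[f"feat_{i}" for i in range(5)])
y_demo = (X_demo["feat_0"] + 0.2 * rng.random(n_rows) > 0.9).astype(int)
text_features_demo = X_demo.assign(fraudulent=y_demo)


def tune_and_evaluate(df, target="fraudulent", seed=42):
    X = df.drop(columns=[target])
    y = df[target]
    # Stratify so the class ratio is the same in the training and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Hypothetical, deliberately small grid; a real search would cover more values.
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 10],
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        scoring="f1",
        cv=3,
    )
    # Hyperparameters are chosen by cross-validation on the training split only.
    search.fit(X_train, y_train)
    # With refit=True (the default), the best model is refitted on the whole
    # training split, so it can be applied directly to the held-out test split.
    y_pred = search.predict(X_test)
    return search.best_params_, f1_score(y_test, y_pred)


best_params, test_f1 = tune_and_evaluate(text_features_demo)
print(best_params, round(test_f1, 3))
```

Keeping the test split out of `GridSearchCV` means the final score is measured on data the tuning never saw, which is what makes it an honest estimate of performance on new postings.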