Random Forest

import pandas as pd
import numpy as np

text_features = pd.read_csv("./data/final_feature_matrix.csv", index_col=0)

text_features

	administr_desc	answer_desc	assist_desc	bill_desc	call_desc	cash_desc	desir_desc	duti_desc	earn_desc	entri_desc	...	industry_Accounting	industry_Leisure, Travel & Tourism	industry_NAN	industry_Oil & Energy	company_profile	telecommuting	has_company_logo	has_questions	required_education	fraudulent
0	0.000000	0.0	0.000000	0.0	0.092456	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	0	0	0
1	0.045662	0.0	0.034465	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	0	0	0
2	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	1	1	0
3	0.000000	0.0	0.047975	0.0	0.000000	0.085044	0.0	0.051481	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	0	1	0
4	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.000000	0.053905	0.0	...	0.0	0.0	0.0	0.0	0	0	1	0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
14299	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	1	0	0
14300	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	1.0	0.0	0	0	1	1	1	0
14301	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	0	0	0
14302	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.120147	0.000000	0.0	...	0.0	0.0	0.0	0.0	0	0	1	0	0	0
14303	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.0	...	0.0	0.0	0.0	0.0	1	0	0	0	1	0

14304 rows × 86 columns

With feature engineering complete, we can move on to the machine learning analysis itself. In this chapter, we use a random forest model and try to make the best possible prediction on the test dataset. We will go through every step, from tuning the hyperparameters and fitting the model on the training data to evaluating its predictions on the test dataset.
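The workflow described above can be sketched as follows. This is a minimal illustration rather than the exact procedure used in this chapter: it assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for the feature matrix, and picks the hyperparameter grid, the F1 scoring metric, and the 75/25 split purely for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the feature matrix (assumption: with the real data,
# X would be the feature columns and y the "fraudulent" column).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)

# Hold out a test set that plays no part in tuning or fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid only; the values are not taken from the chapter.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}

# Cross-validated grid search on the training data. With the default
# refit=True, the best configuration is refit on all of X_train.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test set.
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred)
print(search.best_params_, round(test_f1, 3))
```

The key design point is that the test set is only touched in the final evaluation step, so the reported score is not influenced by the hyperparameter search.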

Classifying Fake Job Posting Using Machine Learning Algorithm
