Random Forest
import pandas as pd
import numpy as np
text_features = pd.read_csv("./data/final_feature_matrix.csv", index_col=0)
text_features
| | administr_desc | answer_desc | assist_desc | bill_desc | call_desc | cash_desc | desir_desc | duti_desc | earn_desc | entri_desc | ... | industry_Accounting | industry_Leisure, Travel & Tourism | industry_NAN | industry_Oil & Energy | company_profile | telecommuting | has_company_logo | has_questions | required_education | fraudulent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.092456 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0.045662 | 0.0 | 0.034465 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 3 | 0.000000 | 0.0 | 0.047975 | 0.0 | 0.000000 | 0.085044 | 0.0 | 0.051481 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.053905 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14299 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 14300 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 14301 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 14302 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.120147 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 14303 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 0 | 0 | 0 | 1 | 0 |
14304 rows × 86 columns
Now that feature engineering is finished, we can finally move on to the machine learning analysis itself. In this chapter, we use a random forest model and try to make the best possible predictions on the test dataset. We go through every step in detail, from tuning the hyperparameters and fitting the model on the training data to evaluating its predictions on the test dataset.
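The workflow outlined above (hold out a test set, tune on the training data, evaluate once on the test set) can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the chapter's actual code: the function name `tune_random_forest`, the use of `fraudulent` as the 0/1 target column, the 80/20 stratified split, the example grid in `DEFAULT_GRID`, `class_weight="balanced"` and F1 scoring are all hypothetical choices made for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Hypothetical default grid: illustrative values, not the chapter's tuned settings.
DEFAULT_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [None, 20],
    "min_samples_leaf": [1, 5],
}


def tune_random_forest(features: pd.DataFrame, target="fraudulent", param_grid=None,
                       test_size=0.2, random_state=42, n_jobs=None):
    """Hold out a test split, tune a random forest by cross-validated grid
    search on the training split only, then score the refit best model once
    on the held-out test split. Returns (best_params, report_dict)."""
    X = features.drop(columns=[target])
    y = features[target]

    # Stratify so the (assumed rare) positive class keeps the same share in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )

    # Assumptions: class_weight="balanced" and F1 scoring, chosen for an imbalanced 0/1 target.
    search = GridSearchCV(
        RandomForestClassifier(class_weight="balanced", random_state=random_state),
        param_grid or DEFAULT_GRID,
        scoring="f1",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state),
        n_jobs=n_jobs,
    )
    # Tuning sees only the training split; refit=True (the default) then refits
    # the best configuration on all of X_train.
    search.fit(X_train, y_train)

    # The test split is used for prediction only, never for fitting or tuning.
    y_pred = search.best_estimator_.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
    return search.best_params_, report
```

With the frame loaded earlier, the call would be `best_params, report = tune_random_forest(text_features)`, after which `report["1"]` holds precision, recall and F1 for the class labeled 1. Keeping the test split out of the grid search is the key design choice: the cross-validation folds inside the training data select the hyperparameters, so the final test score is not biased by tuning.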