Summary
This is a summary of how we will preprocess each column in the dataset. The complete code for this preprocessing can be found here.
import pandas as pd
train_data = pd.read_csv("./data/train_set.csv")
train_data.head()
| | Unnamed: 0.1 | Unnamed: 0 | job_id | title | location | department | salary_range | company_profile | description | requirements | benefits | telecommuting | has_company_logo | has_questions | employment_type | required_experience | required_education | industry | function | fraudulent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 7530 | 7531 | Contact Center Representatives | US, VA, Virginia Beach | NaN | NaN | Tidewater Finance Co. was established in 1992 ... | tidewat financ compani locat virginia beach va... | The position requires the following qualificat... | Our company offers a competitive salary plus B... | 0 | 1 | 0 | Full-time | Entry level | Unspecified | Financial Services | Customer Service | 0 |
| 1 | 1 | 129 | 130 | Customer Service Associate | US, TX, Dallas | NaN | NaN | Novitex Enterprise Solutions, formerly Pitney ... | custom servic associ base dalla tx right candi... | QualificationsMinimum of 1 year customer servi... | NaN | 0 | 1 | 0 | Full-time | Entry level | High School or equivalent | Telecommunications | Customer Service | 0 |
| 2 | 2 | 4640 | 4641 | Automated Test Analyst | NZ, , Auckland | Permanent | NaN | SilverStripe CMS & Framework is an open so... | look dedic passion softwar test analyst team p... | NaN | NaN | 0 | 1 | 1 | Full-time | Mid-Senior level | NaN | Information Technology and Services | NaN | 0 |
| 3 | 3 | 402 | 403 | Inside Sales Professional-Omaha | US, NE, Omaha | NaN | NaN | ABC Supply Co., Inc. is the nation’s largest w... | sale repres provid assist custom purchas mater... | As a Sales Representative, you must have the a... | Your benefits package as a Sales Representativ... | 0 | 1 | 0 | Full-time | NaN | NaN | Building Materials | Sales | 0 |
| 4 | 4 | 13218 | 13219 | Content Marketing/SEO Manager | US, CA, Los Angeles | Marketing | NaN | MeUndies is a lifestyle brand that is transfor... | meundi lifestyl brand transform way peopl perc... | REQUIREMENTS/QUALIFICATIONS/PERSONAL ATTRIBUTE... | WHY MEUNDIES?We're a fast-growing, VC-backed c... | 0 | 1 | 0 | Full-time | Mid-Senior level | Bachelor's Degree | Internet | Marketing | 0 |
- `job_id`: Eliminated.
- `title`, `description`, `requirements`, `benefits`: Word feature extraction. Click here for a more detailed explanation.
- `location`: Only the state will be extracted and used as features via one-hot encoding (OHE). Click here for more detailed information.
- `department`, `salary_range`: Eliminated. Click here for more detailed information.
- `company_profile`, `required_education`: Binarized as NA (1) and non-NA (0). Click here for more detailed information.
- `telecommuting`, `has_company_logo`, `has_questions`: Already binarized. No preprocessing needed.
- `employment_type`, `required_experience`, `industry`, `function`: One-hot encoded. Click here for more detailed information.
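The columns marked "Eliminated" in the summary (`job_id`, `department`, `salary_range`) can simply be dropped. The sketch below uses a small toy frame whose values are copied from the `train_data.head()` output; the name `ELIMINATED` is illustrative, not taken from the original notebook.

```python
import pandas as pd

# Toy frame mirroring a few dataset columns (values from train_data.head()).
df = pd.DataFrame({
    "job_id": [7531, 130],
    "title": ["Contact Center Representatives", "Customer Service Associate"],
    "department": [None, None],
    "salary_range": [None, None],
    "fraudulent": [0, 0],
})

# Columns the summary marks as "Eliminated" (illustrative constant name).
ELIMINATED = ["job_id", "department", "salary_range"]

# errors="ignore" lets the call run even if a column was already dropped.
df = df.drop(columns=ELIMINATED, errors="ignore")
print(df.columns.tolist())  # ['title', 'fraudulent']
```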
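The summary only says that `title`, `description`, `requirements` and `benefits` go through "word feature extraction" without naming the method. As an assumption, the sketch below shows a plain bag-of-words count matrix built with the standard library and pandas; the real notebook may use a different scheme (for example TF-IDF via scikit-learn). The helper `bag_of_words` and the sample texts, written in the stemmed style of the `description` column, are hypothetical.

```python
import re
from collections import Counter

import pandas as pd

# Illustrative inputs in the (already stemmed) style of the `description` column.
texts = pd.Series([
    "tidewat financ compani locat virginia beach va",
    "custom servic associ base dalla tx",
    None,  # a missing text, e.g. a NaN `benefits` entry
])

def bag_of_words(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: one row per text, one count column per word."""
    counts = [
        Counter(re.findall(r"[a-z]+", text.lower()))
        for text in series.fillna("")  # treat missing text as empty
    ]
    # Words absent from a text appear as NaN, so replace them with 0.
    matrix = pd.DataFrame(counts, index=series.index).fillna(0).astype(int)
    return matrix.sort_index(axis=1)  # alphabetical column order

# Prefix the columns so features from different text columns do not collide.
features = bag_of_words(texts).add_prefix("description_")
print(features.shape)  # (3, 13)
```

Prefixing with the source column name keeps, say, a word from `title` distinct from the same word in `benefits` once the matrices are concatenated.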
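For `location`, the rows in `train_data.head()` follow a "country, state, city" pattern (e.g. `US, VA, Virginia Beach`, `NZ, , Auckland`). Assuming that pattern holds throughout, a minimal sketch of "keep only the state, then one-hot encode it" could look like this; rows with an empty or missing state end up with all-zero indicator columns.

```python
import pandas as pd

# Sample values taken from the head() output, plus one missing location.
locations = pd.Series(["US, VA, Virginia Beach", "US, TX, Dallas", "NZ, , Auckland", None])

# Assumed format "country, state, city": keep the second field only.
state = locations.str.split(",").str[1].str.strip()
# Blank states (as in "NZ, , Auckland") become NaN instead of an "" category.
state = state.where(state != "")

# One 0/1 column per state; NaN rows get zeros in every column.
state_ohe = pd.get_dummies(state, prefix="state", dtype=int)
print(state_ohe.columns.tolist())  # ['state_TX', 'state_VA']
```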
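The NA/non-NA binarization of `company_profile` and `required_education` maps directly onto `isna()`. The sketch below overwrites each column with its indicator; whether the original notebook replaces the column or adds a new one is not stated, so that choice is an assumption.

```python
import pandas as pd

# Toy frame: one present and one missing value per column.
df = pd.DataFrame({
    "company_profile": ["Tidewater Finance Co. was established in 1992", None],
    "required_education": [None, "High School or equivalent"],
})

# 1 if the value is missing (NaN/None), 0 otherwise; replaces the text in place.
for col in ["company_profile", "required_education"]:
    df[col] = df[col].isna().astype(int)

print(df.to_dict("list"))
```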
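Although `telecommuting`, `has_company_logo` and `has_questions` need no preprocessing, a quick check that they really contain only 0 and 1 guards against stray values. This is a sketch on values copied from the first three rows of `train_data.head()`; `BINARY_COLUMNS` is an illustrative name.

```python
import pandas as pd

# Values from the first three rows of train_data.head().
df = pd.DataFrame({
    "telecommuting": [0, 0, 0],
    "has_company_logo": [1, 1, 1],
    "has_questions": [0, 0, 1],
})

BINARY_COLUMNS = ["telecommuting", "has_company_logo", "has_questions"]

# True per column when every value is 0 or 1.
is_binary = df[BINARY_COLUMNS].isin([0, 1]).all()
# On failure, the message lists the offending columns.
assert is_binary.all(), is_binary[~is_binary]
```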
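The one-hot encoding of `employment_type`, `required_experience`, `industry` and `function` can be done in a single `pd.get_dummies` call. This is a minimal sketch on toy rows built from the `head()` output; `OHE_COLUMNS` is an illustrative name.

```python
import pandas as pd

# Toy rows built from the head() output (None stands in for NaN).
df = pd.DataFrame({
    "employment_type": ["Full-time", "Full-time", "Full-time"],
    "required_experience": ["Entry level", "Mid-Senior level", None],
    "industry": ["Financial Services", "Information Technology and Services", "Building Materials"],
    "function": ["Customer Service", None, "Sales"],
})

OHE_COLUMNS = ["employment_type", "required_experience", "industry", "function"]

# Each listed column is replaced by one 0/1 column per category, named
# "<column>_<category>"; missing values get no column (dummy_na=False).
encoded = pd.get_dummies(df, columns=OHE_COLUMNS, dtype=int)
print(encoded.shape)  # (3, 8)
```

A validation or test set may contain categories that never appear in training. Calling `encoded_test.reindex(columns=encoded.columns, fill_value=0)` aligns it to the training columns and drops unseen categories.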