{ "cells": [ { "cell_type": "markdown", "id": "eb9109f0-7eeb-47b7-95ef-8a4905a079c5", "metadata": {}, "source": [ "# Summary\n", "\n", "This notebook summarizes how each column in the dataset is preprocessed. You can find the complete code for this preprocessing [here](Pipeline.ipynb)." ] }, { "cell_type": "code", "execution_count": 4, "id": "34a1e503-7b9f-4827-9b4d-8137801babc1", "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "import pandas as pd " ] }, { "cell_type": "code", "execution_count": 5, "id": "a883baed-f347-41b0-8fe7-0c214a609647", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0.1 Unnamed: 0 job_id title \\\n", "0 0 7530 7531 Contact Center Representatives \n", "1 1 129 130 Customer Service Associate \n", "2 2 4640 4641 Automated Test Analyst \n", "3 3 402 403 Inside Sales Professional-Omaha \n", "4 4 13218 13219 Content Marketing/SEO Manager \n", "\n", " location department salary_range \\\n", "0 US, VA, Virginia Beach NaN NaN \n", "1 US, TX, Dallas NaN NaN \n", "2 NZ, , Auckland Permanent NaN \n", "3 US, NE, Omaha NaN NaN \n", "4 US, CA, Los Angeles Marketing NaN \n", "\n", " company_profile \\\n", "0 Tidewater Finance Co. was established in 1992 ... \n", "1 Novitex Enterprise Solutions, formerly Pitney ... \n", "2 SilverStripe CMS & Framework is an open so... \n", "3 ABC Supply Co., Inc. is the nation’s largest w... \n", "4 MeUndies is a lifestyle brand that is transfor... \n", "\n", " description \\\n", "0 tidewat financ compani locat virginia beach va... \n", "1 custom servic associ base dalla tx right candi... \n", "2 look dedic passion softwar test analyst team p... \n", "3 sale repres provid assist custom purchas mater... \n", "4 meundi lifestyl brand transform way peopl perc... \n", "\n", " requirements \\\n", "0 The position requires the following qualificat... \n", "1 QualificationsMinimum of 1 year customer servi... \n", "2 NaN \n", "3 As a Sales Representative, you must have the a... \n", "4 REQUIREMENTS/QUALIFICATIONS/PERSONAL ATTRIBUTE... \n", "\n", " benefits telecommuting \\\n", "0 Our company offers a competitive salary plus B... 0 \n", "1 NaN 0 \n", "2 NaN 0 \n", "3 Your benefits package as a Sales Representativ... 0 \n", "4 WHY MEUNDIES?We're a fast-growing, VC-backed c... 
0 \n", "\n", " has_company_logo has_questions employment_type required_experience \\\n", "0 1 0 Full-time Entry level \n", "1 1 0 Full-time Entry level \n", "2 1 1 Full-time Mid-Senior level \n", "3 1 0 Full-time NaN \n", "4 1 0 Full-time Mid-Senior level \n", "\n", " required_education industry \\\n", "0 Unspecified Financial Services \n", "1 High School or equivalent Telecommunications \n", "2 NaN Information Technology and Services \n", "3 NaN Building Materials \n", "4 Bachelor's Degree Internet \n", "\n", " function fraudulent \n", "0 Customer Service 0 \n", "1 Customer Service 0 \n", "2 NaN 0 \n", "3 Sales 0 \n", "4 Marketing 0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data = pd.read_csv(\"./data/train_set.csv\")\n", "train_data.head()" ] }, { "cell_type": "markdown", "id": "b9a711c3-8064-4dfe-94b3-45663498f7ee", "metadata": {}, "source": [ "1. `job_id`: Eliminated.\n", "2. `title`, `description`, `requirements`, `benefits`: Word feature extraction. Click [here](FT.ipynb) for a more detailed explanation.\n", "3. `location`: Only the state is extracted, then one-hot encoded (OHE). Click [here](PF.ipynb) for more detailed information.\n", "4. `department`, `salary_range`: Eliminated. Click [here](PF.ipynb) for more detailed information.\n", "5. `company_profile`, `required_education`: Binarized as NA (1) and non-NA (0). Click [here](PF.ipynb) for more detailed information.\n", "6. `telecommuting`, `has_company_logo`, `has_questions`: Already binary. No preprocessing needed.\n", "7. `employment_type`, `required_experience`, `industry`, `function`: One-hot encoded. Click [here](OHE.ipynb) for more detailed information."
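, "

The list above can be sketched in pandas as follows. This is a minimal illustration, not the code from Pipeline.ipynb: the function name `preprocess_sketch`, the toy rows, and the rule that the state is the second comma-separated field of `location` are all assumptions made for the example. Word feature extraction (item 2) is left out, so the text columns pass through unchanged.

```python
import pandas as pd


def preprocess_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: mirrors items 1 and 3-7 of the list, nothing more.
    out = df.copy()

    # Items 1 and 4: drop the eliminated columns.
    out = out.drop(columns=['job_id', 'department', 'salary_range'])

    # Item 3: keep only the state. Assumes the layout 'country, state, city'.
    state = out['location'].str.split(',').str[1].str.strip()
    out['state'] = state.where(state != '')  # an empty state becomes missing
    out = out.drop(columns=['location'])

    # Item 5: 1 if the value is missing, 0 otherwise.
    for col in ['company_profile', 'required_education']:
        out[col] = out[col].isna().astype(int)

    # Item 6: the binary flags need no work.

    # Item 7 (plus the extracted state): one-hot encode the categoricals.
    categorical = ['state', 'employment_type', 'required_experience',
                   'industry', 'function']
    return pd.get_dummies(out, columns=categorical)


# Two toy rows, loosely modelled on the head() output above (values invented).
toy = pd.DataFrame({
    'job_id': [7531, 4641],
    'title': ['Contact Center Representatives', 'Automated Test Analyst'],
    'location': ['US, VA, Virginia Beach', 'NZ, , Auckland'],
    'department': [None, 'Permanent'],
    'salary_range': [None, None],
    'company_profile': ['some profile text', None],
    'required_education': ['Unspecified', None],
    'telecommuting': [0, 0],
    'has_company_logo': [1, 1],
    'has_questions': [0, 1],
    'employment_type': ['Full-time', 'Full-time'],
    'required_experience': ['Entry level', 'Mid-Senior level'],
    'industry': ['Financial Services', 'Information Technology and Services'],
    'function': ['Customer Service', None],
})

result = preprocess_sketch(toy)
print(sorted(result.columns))
```

Missing categories are simply left without a dummy column here, because `pd.get_dummies` skips missing values by default. Whether the actual pipeline treats them the same way is not stated in this summary.
"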
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }