
A resume-filtering tool based on natural language processing


Resume Filtering Using Machine Learning

Resume filtering on the basis of Job Descriptions (JDs). It was a summer internship project with Skybits Technologies Pvt. Ltd.


Introduction

The main feature of the project is that it searches the entire resume database to select and display the resumes that best fit the provided job description (JD). In its current form, this is achieved by assigning a score to each CV by intelligently comparing it against the corresponding job description. This reduces the applicant pool to a fraction of its original size; resumes in the final shortlist can then be checked manually for further analysis. The project uses techniques from Machine Learning and Natural Language Processing to automate the process.
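At a high level, the score-and-rank idea can be sketched as follows. This is a minimal illustration using plain bag-of-words counts and cosine similarity rather than the project's word2vec-based pipeline; `bow_vector` and `rank_cvs` are hypothetical names used only for this sketch:

```python
import math
from collections import Counter

def bow_vector(text):
    # Lowercased bag-of-words counts as a crude document vector.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_cvs(jd_text, cvs):
    # Score every CV against the JD and return (score, cv) pairs, best first.
    jd_vec = bow_vector(jd_text)
    scored = [(cosine_similarity(jd_vec, bow_vector(cv)), cv) for cv in cvs]
    return sorted(scored, key=lambda p: p[0], reverse=True)

jd = "python developer with machine learning experience"
cvs = ["java backend developer", "python machine learning engineer"]
print(rank_cvs(jd, cvs)[0][1])  # → python machine learning engineer
```

The manual-review "final window" then corresponds to keeping only the top few entries of the ranked list.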

Directory Structure

.
├── Data
│   ├── CVs
│   ├── collectCV.py
│   └── jd.csv
├── Model
│   ├── Model_Training.ipynb
│   ├── Sentence_Extraction.ipynb
│   ├── paragraph_extraction_from_posts.ipynb
│   ├── sample_bitcoin.stackexchange_paras.txt
│   └── sample_bitcoin.stackexchange_sentences.txt
├── Scoring
│   ├── CV_ranking.ipynb
│   ├── Using Spacy Model.ipynb
│   ├── With Word2Vec.ipynb
│   ├── context.jpg
│   └── prc_data.csv
└── Section Extraction
    ├── Section_Extraction.ipynb
    ├── convertDocxToText.py
    ├── convertPDFToText.py
    ├── extract.py
    └── get_jd.ipynb
    

Directory Details

Data

  • CVs : Contains 250 resumes extracted from indeed.com, stored in text format
  • collectCV.py : Python script to automate the process of collecting CVs from indeed.com. While this program is running, every new text copied to the clipboard is saved as a CV in the CVs/ directory in text format.
  • jd.csv : CSV file containing cleaned job descriptions from a Kaggle dataset
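The clipboard-watching loop in collectCV.py might look roughly like the sketch below. The function name `collect_cvs` and the `max_saves` stopping parameter are illustrative additions; in practice `read_clipboard` would be something like `pyperclip.paste`, passed in here so the loop stays testable:

```python
import os
import time

def collect_cvs(read_clipboard, out_dir="CVs", poll_seconds=1.0, max_saves=None):
    # Poll the clipboard; save every new, non-empty text as out_dir/cv_<n>.txt.
    # read_clipboard is any zero-argument callable returning the current
    # clipboard text (e.g. pyperclip.paste).
    os.makedirs(out_dir, exist_ok=True)
    last, saved = None, 0
    while True:
        text = read_clipboard()
        if text and text != last:
            saved += 1
            with open(os.path.join(out_dir, f"cv_{saved}.txt"), "w") as f:
                f.write(text)
            last = text
            if max_saves is not None and saved >= max_saves:
                return saved
        time.sleep(poll_seconds)
```

Comparing against the previously saved text means copying the same CV twice in a row does not create a duplicate file.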

Model

  • Model_Training.ipynb : Notebook for training the word2vec model using gensim. The trained model was saved locally in the ./model/ subdirectory.
  • Sentence_Extraction.ipynb : Notebook for extracting cleaned sentences from extracted paragraphs.
  • paragraph_extraction_from_posts.ipynb : Notebook for extracting paragraphs from Posts.xml
  • sample_bitcoin.stackexchange_sentences.txt : The sentences.txt file (pure sentences) for the bitcoin.stackexchange.com subdirectory of the dataset. It was generated from the corresponding paras.txt using the code in Sentence_Extraction.ipynb. The process took around 12.5 hours to complete.
  • sample_bitcoin.stackexchange_paras.txt : The paras.txt file (paragraphs in HTML tags) for the bitcoin.stackexchange.com subdirectory of the dataset. It was generated from Posts.xml using the code in paragraph_extraction_from_posts.ipynb.
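The paras.txt-to-sentences.txt step could be sketched as below, assuming each paragraph arrives as an HTML `<p>` string pulled from Posts.xml. The function `paras_to_sentences` and its cleaning rules (strip tags, unescape entities, split on sentence-ending punctuation, drop very short fragments) are illustrative, not the notebook's exact code:

```python
import html
import re

def paras_to_sentences(paras):
    # Strip HTML tags from each paragraph, unescape entities, normalize
    # whitespace, then split on sentence-ending punctuation + whitespace.
    sentences = []
    for para in paras:
        text = html.unescape(re.sub(r"<[^>]+>", " ", para))
        text = re.sub(r"\s+", " ", text).strip()
        for sent in re.split(r"(?<=[.!?])\s+", text):
            sent = sent.strip()
            if len(sent.split()) >= 3:  # drop fragments too short to train on
                sentences.append(sent)
    return sentences

paras = ["<p>Bitcoin uses a <b>proof-of-work</b> chain. Miners validate blocks.</p>"]
print(paras_to_sentences(paras))
# → ['Bitcoin uses a proof-of-work chain.', 'Miners validate blocks.']
```

The resulting sentence list is the kind of input gensim's Word2Vec expects (one tokenized sentence per training example).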

Scoring

  • CV_ranking.ipynb : Notebook for ranking the CVs according to JDs (Job Descriptions)
  • Using Spacy Model.ipynb : Demonstrates the need for a custom Word2Vec model rather than a general-purpose pretrained one. The similarity values produced by spaCy's en_core_web_md model, trained on Google News articles, do not capture the technical specificity required for this project.
  • With Word2Vec.ipynb : Demonstrates how to use word2vec to find similar words by word and by vector. It also implements a sent2vec() function, which takes a sentence as an argument and returns an average vector for the sentence; Root Mean Square is used to average the word vectors. The advantage of this function is that it can find similar words for whole phrases, which is more useful when searching for roles, etc.

    For example, 'web engineer' gives 'engineer' as a similar word.
  • context.jpg : Pie chart showing the three most frequent job-description titles
  • prc_data.csv : CSV file storing processed sections of different resumes.
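One way to read the sent2vec() description above is a component-wise Root Mean Square over the word vectors of the sentence. The sketch below uses a plain dict of word vectors to stand in for the trained gensim model's keyed vectors; the function body is an illustrative interpretation, not the notebook's exact code:

```python
import math

def sent2vec(sentence, word_vectors):
    # Combine the vectors of the sentence's known words into one sentence
    # vector via a component-wise Root Mean Square.
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return None  # no known words in the sentence
    dim = len(vecs[0])
    return [math.sqrt(sum(v[i] ** 2 for v in vecs) / len(vecs)) for i in range(dim)]

word_vectors = {"web": [0.0, 1.0], "engineer": [1.0, 0.0]}
print(sent2vec("web engineer", word_vectors))
```

With a real gensim model, the resulting vector could then be passed to the model's similar-by-vector lookup to find words close to the whole phrase, which is how 'web engineer' can surface 'engineer' as a neighbour.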

Section Extraction

  • Section_Extraction.ipynb : Notebook for extracting sections from different resumes.
  • convertDocxToText.py : Python script for converting a .docx file to .txt
  • convertPDFToText.py : Python script for converting a .pdf file to .txt
  • extract.py : Python script for extracting files from 7z-compressed archives
  • get_jd.ipynb : Notebook for cleaning and extracting the relevant portions of the original jd.csv file from Kaggle
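Section extraction from a plain-text resume is typically a heading-based split. The sketch below shows one such heuristic; the heading list and the function `extract_sections` are hypothetical and only illustrate the idea, not the notebook's actual rules:

```python
import re

# Hypothetical heading set; the notebook's actual list may differ.
SECTION_HEADINGS = ["education", "experience", "skills", "projects"]

def extract_sections(resume_text):
    # Split a plain-text resume into sections keyed by known headings.
    # A line consisting only of a heading (case-insensitive, optional colon)
    # starts a new section; other non-empty lines belong to the current one.
    pattern = re.compile(rf"^({'|'.join(SECTION_HEADINGS)})\s*:?\s*$", re.I)
    sections = {"header": []}
    current = "header"
    for line in resume_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            current = m.group(1).lower()
            sections[current] = []
        elif line.strip():
            sections[current].append(line.strip())
    return {k: " ".join(v) for k, v in sections.items() if v}

resume = "Jane Doe\nSkills\nPython, NLP\nExperience\nNLP intern at Acme"
print(extract_sections(resume))
# → {'header': 'Jane Doe', 'skills': 'Python, NLP', 'experience': 'NLP intern at Acme'}
```

Per-section text like this is what ends up in prc_data.csv, so each resume section can be scored against the JD separately.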

Author

 Prateek Gupta

README Source: prateekguptaiiitk/Resume_Filtering