Resume filtering based on natural language processing
Resume filtering on the basis of Job Descriptions (JDs). This was a summer internship project with Skybits Technologies Pvt. Ltd.
The main feature of the project is that it searches the entire resume database to select and display the resumes that best fit the provided job description (JD). In its current form, this is achieved by assigning a score to each CV by intelligently comparing it against the corresponding job description, which narrows the pool of applicants to a fraction of its original size. Resumes in the final shortlist can then be checked manually for further analysis. The project uses techniques from Machine Learning and Natural Language Processing to automate the process.
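The score-and-rank idea above can be sketched as follows. This is a minimal, stdlib-only illustration using bag-of-words cosine similarity, not the project's actual Word2Vec/spaCy pipeline, and the JD and resume strings are made-up placeholders.

```python
# Minimal sketch of the ranking idea: score every CV against a JD by
# cosine similarity of bag-of-words vectors, then sort best-first.
# The real project uses embedding models; this only shows the flow.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_resumes(jd, resumes):
    """Return (name, score) pairs sorted best-first."""
    jd_vec = Counter(tokenize(jd))
    scores = {name: cosine(jd_vec, Counter(tokenize(text)))
              for name, text in resumes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data for illustration only.
jd = "Looking for a Python developer with NLP and machine learning experience"
resumes = {
    "alice.pdf": "Python developer, NLP, machine learning, spaCy",
    "bob.pdf": "Accountant with ten years of bookkeeping experience",
}
ranking = rank_resumes(jd, resumes)
```

The shortlist is simply a prefix of `ranking`; everything below a chosen cutoff is dropped before manual review.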
.
├── Data
│   ├── CVs
│   ├── collectCV.py
│   └── jd.csv
├── Model
│   ├── Model_Training.ipynb
│   ├── Sentence_Extraction.ipynb
│   ├── paragraph_extraction_from_posts.ipynb
│   ├── sample_bitcoin.stackexchange_paras.txt
│   └── sample_bitcoin.stackexchange_sentences.txt
├── Scoring
│   ├── CV_ranking.ipynb
│   ├── Using Spacy Model.ipynb
│   ├── With Word2Vec.ipynb
│   ├── context.jpg
│   └── prc_data.csv
└── Section Extraction
    ├── Section_Extraction.ipynb
    ├── convertDocxToText.py
    ├── convertPDFToText.py
    ├── extract.py
    └── get_jd.ipynb
The dataset files are kept in the `./model/` subdirectory (locally):

- `Posts.xml` — the raw posts dump for bitcoin.stackexchange.com.
- `sentences.txt` — pure-sentences file for the bitcoin.stackexchange.com subdirectory of the dataset. It was generated from the corresponding `paras.txt` using the code in `sentence_extraction_from_paras.txt.ipynb`; the process took around 12.5 hours to complete.
- `paras.txt` — paragraphs (in HTML tags) for the bitcoin.stackexchange.com subdirectory of the dataset. It was generated from `Posts.xml` using the code in `paragraph_extraction_from_Posts.xml.ipynb`.
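The two extraction steps can be sketched roughly as below. This is a stdlib-only sketch that assumes the Stack Exchange dump layout (post bodies stored as HTML in the `Body` attribute of `<row>` elements); it pulls paragraph text out of `<p>` tags and then splits paragraphs into sentences with a naive rule. The project's notebooks may use different tooling.

```python
# Rough sketch of the extraction pipeline described above:
#   Posts.xml  -> paras.txt     (paragraph text from HTML <p> tags)
#   paras.txt  -> sentences.txt (paragraphs split into sentences)
import re
import xml.etree.ElementTree as ET
from html import unescape

def paras_from_posts_xml(xml_text):
    """Yield paragraph strings from the Body attribute of each <row>."""
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        # ET already decodes XML entities; unescape catches any leftovers.
        body = unescape(row.get("Body", ""))
        for para in re.findall(r"<p>(.*?)</p>", body, flags=re.S):
            text = re.sub(r"<[^>]+>", " ", para)   # drop nested tags
            text = re.sub(r"\s+", " ", text).strip()
            if text:
                yield text

def sentences_from_para(para):
    """Naive split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", para) if s.strip()]

# Tiny made-up dump in the Stack Exchange Posts.xml shape.
sample = (
    '<posts>'
    '<row Id="1" Body="&lt;p&gt;Bitcoin uses proof of work. '
    'Blocks are mined.&lt;/p&gt;"/>'
    '</posts>'
)
paras = list(paras_from_posts_xml(sample))
sents = sentences_from_para(paras[0])
```

A real run would stream the multi-gigabyte dump with `ET.iterparse` instead of `fromstring`, which is presumably why the full extraction took many hours.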
Word vectors from generic pre-trained models, such as the `en_core_web_md` spaCy model or word2vec trained on Google News articles, do not reflect the technological specificity required for the project. For example, querying 'web engineer' gives 'engineer' as a similar word.
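The 'similar word' behaviour can be illustrated with a toy nearest-neighbour lookup; the 3-d vectors below are invented placeholders, not real embeddings, and only show the mechanics of such a query.

```python
# Toy "most similar word" query over word vectors. The vectors are
# made up for illustration; a generic model behaves like this when
# it lacks domain nuance, collapsing 'web engineer' onto 'engineer'.
import math

vectors = {
    "engineer":     [0.9, 0.1, 0.0],
    "developer":    [0.7, 0.1, 0.4],
    "web":          [0.1, 0.9, 0.0],
    "web engineer": [0.85, 0.3, 0.0],  # sits close to plain "engineer"
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(word):
    return max((w for w in vectors if w != word),
               key=lambda w: cos(vectors[word], vectors[w]))

nearest = most_similar("web engineer")
```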
Job Descriptions
- `convertDocxToText.py` — converts a `.docx` file to `.txt`.
- `convertPDFToText.py` — converts a `.pdf` file to `.txt`.
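A minimal sketch of what such conversion scripts typically do is below. The README does not say which libraries the repository's scripts use; `python-docx` and `pypdf` are assumed here as common choices, and the imports are kept inside the functions so the block loads even when those packages are absent.

```python
# Sketch of convertDocxToText.py / convertPDFToText.py-style helpers.
# python-docx and pypdf are assumed third-party libraries here; the
# actual scripts in the repository may use different ones.
import re

def normalize(text):
    """Collapse runs of spaces/tabs so downstream NLP sees clean lines."""
    return re.sub(r"[ \t]+", " ", text).strip()

def docx_to_text(path):
    from docx import Document              # pip install python-docx
    doc = Document(path)
    return "\n".join(normalize(p.text) for p in doc.paragraphs)

def pdf_to_text(path):
    from pypdf import PdfReader            # pip install pypdf
    reader = PdfReader(path)
    return "\n".join(normalize(page.extract_text() or "")
                     for page in reader.pages)
```

Either function returns plain text ready for the section-extraction step.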
The dataset files are distributed as `.7z` archives.