Ibrahimsharaf Doc2vec

Long(er) text representation and classification using Doc2Vec embeddings

Project README

Doc2Vec Text Classification

A text classification model that uses gensim's Doc2Vec to generate paragraph embeddings and scikit-learn's logistic regression to classify them.

Dataset

25,000 IMDB movie reviews, specially selected for sentiment analysis. Each review's sentiment label is binary (1 for positive, 0 for negative).
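Before training, each review must be turned into a token list paired with its 0/1 label. The sketch below uses only the standard library and assumes a hypothetical tab-separated layout with `id`, `sentiment` and `review` columns, modelled on the Kaggle competition data listed in the references. The sample rows are invented. The HTML-stripping step reflects the `<br />` tags that commonly appear in IMDB review text, and the tokenizer regex is an illustrative choice, not the repository's.

```python
import csv
import io
import re
from typing import List, Tuple

# Invented sample rows in an assumed "id<TAB>sentiment<TAB>review" layout.
SAMPLE_TSV = (
    "id\tsentiment\treview\n"
    '"review_1"\t1\t"A great film.<br /><br />Loved the cast!"\n'
    '"review_2"\t0\t"Dull plot, weak acting."\n'
)

TAG_RE = re.compile(r"<[^>]+>")       # HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z0-9']+")  # lowercase word tokens (keeps apostrophes)


def tokenize(text: str) -> List[str]:
    """Replace HTML tags with spaces, lowercase, and split into word tokens."""
    return TOKEN_RE.findall(TAG_RE.sub(" ", text).lower())


def load_reviews(handle) -> List[Tuple[List[str], int]]:
    """Return (tokens, label) pairs, where label is 1 (positive) or 0 (negative)."""
    # QUOTE_NONE leaves quote characters in the text; tokenize() drops them.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        label = int(row["sentiment"])
        if label not in (0, 1):
            raise ValueError(f"unexpected sentiment label: {label}")
        rows.append((tokenize(row["review"]), label))
    return rows


reviews = load_reviews(io.StringIO(SAMPLE_TSV))
for tokens, label in reviews:
    print(label, tokens)
```

To use a real file instead of the in-memory sample, pass `open(path, newline="", encoding="utf-8")` as `handle`. The resulting token lists can then be wrapped in gensim `TaggedDocument` objects for Doc2Vec training.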

This source dataset was collected in association with the following publication:

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). "Learning Word Vectors for Sentiment Analysis." The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Usage

Install the required tools

pip install -r requirements.txt
Run the script

python text_classifier.py

References

Kaggle – Bag of Words Meets Bags of Popcorn (https://www.kaggle.com/c/word2vec-nlp-tutorial)
Gensim – Deep learning with paragraph2vec (https://radimrehurek.com/gensim/models/doc2vec.html)
Quoc Le and Tomas Mikolov – Distributed Representations of Sentences and Documents (https://arxiv.org/pdf/1405.4053v2.pdf)

Open Source Agenda is not affiliated with "Ibrahimsharaf Doc2vec" Project. README Source: ibrahimsharaf/doc2vec

Stars: 101

Last Commit: 1 year ago

License: MIT
