Word Embeddings And Document Vectors

An evaluation of word embeddings for classification


This is the source code that accompanies a series of blog articles on word embeddings and document vectors.

The code uses Elasticsearch (running at localhost:9200) as the repository:

1. to store tokens and retrieve them as needed;
2. to store word vectors (pre-trained or custom) and retrieve them as needed.

See the Pipfile for the Python dependencies.
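Storing many token lists in Elasticsearch is usually done through its `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline. The sketch below only builds such a body so it runs without a cluster. The index name `tokens`, the fields `doc_id` and `tokens`, and the helper `build_bulk_payload` are hypothetical, not the repository's actual schema or code.

```python
import json
from typing import Iterable, List, Tuple


def build_bulk_payload(index: str, docs: Iterable[Tuple[str, List[str]]]) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API.

    Each (doc_id, tokens) pair becomes two lines: an "index" action naming
    the target index and document id, then the document source itself.
    The field names "doc_id" and "tokens" are hypothetical.
    """
    lines = []
    for doc_id, tokens in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc_id": doc_id, "tokens": tokens}))
    # The bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    body = build_bulk_payload(
        "tokens",  # hypothetical index name
        [("doc-1", ["hockey", "goalie"]), ("doc-2", ["space", "shuttle"])],
    )
    print(body, end="")
```

The resulting body would be sent with an HTTP POST to http://localhost:9200/_bulk using the `Content-Type: application/x-ndjson` header. Word vectors could be stored the same way, for example with the word as the document id and its components in the source, though how this project lays out its vector documents is not stated here.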

Usage

1. Generate tokens for the 20-news corpus and the movie review dataset, and save them to Elasticsearch.
   - The 20-news dataset is downloaded as part of the script, but you need to download the movie review dataset separately.
   - The shell scripts and Python code are in the folders text-data/twenty-news and text-data/acl-imdb.
2. Generate custom word vectors for the two text corpora from step 1 and save them to Elasticsearch. The scripts are in the directories text-data/twenty-news/vectors and text-data/acl-imdb/vectors.
3. Process the pre-trained vectors and save them to Elasticsearch; the code is in pre-trained-vectors/. You need to download the published vectors from their original sources. These articles use Word2Vec, GloVe and FastText.
4. The script run.sh can be configured to run any combination of the pipeline steps.
5. The logs contain the F-scores and timing results. Create a "logs" directory before running run.sh:

mkdir logs
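Pre-trained vectors are commonly published as plain text with one word per line followed by its space-separated components. GloVe files have no header, while the text form of Word2Vec and FastText `.vec` files starts with a line giving the vocabulary size and the dimension. A minimal reader that handles both layouts is sketched below; the name `load_text_vectors` and the choice to skip malformed lines are assumptions, not a description of what pre-trained-vectors/ does.

```python
import io
from typing import Dict, List, Optional, TextIO


def load_text_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Read word vectors stored as 'word v1 v2 ... vd' lines.

    An optional first line 'vocab_size dim' (word2vec text format,
    FastText .vec) is detected and skipped; GloVe files have no header,
    so the dimension is inferred from the first vector instead.
    """
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for line_no, line in enumerate(stream):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])  # header: vocabulary size and dimension
            continue
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)  # no header: take the dimension from the first row
        if len(values) != dim:
            continue  # skip malformed rows instead of failing the whole load
        vectors[word] = [float(v) for v in values]
    return vectors


if __name__ == "__main__":
    sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    vecs = load_text_vectors(sample)
    print(len(vecs), vecs["cat"])
```

Whitespace splitting keeps the sketch short, but it assumes words contain no spaces; the binary Word2Vec release would need a different reader altogether.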
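The project pairs word embeddings with document vectors and evaluates them for classification. One common, simple way to turn a tokenized document into a fixed-length vector is to average the vectors of its in-vocabulary tokens; it is shown here only as an illustration and is not necessarily the weighting the articles use. The helper name `document_vector` and the zero-vector fallback for documents with no known tokens are assumptions.

```python
from typing import Dict, List, Sequence


def document_vector(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the word vectors of the tokens that have an embedding.

    Out-of-vocabulary tokens are ignored. If no token has a vector, a zero
    vector of length `dim` is returned (an assumption for this sketch).
    """
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = vectors.get(tok)
        if vec is None:
            continue  # no embedding for this token
        if len(vec) != dim:
            raise ValueError(f"vector for {tok!r} has length {len(vec)}, expected {dim}")
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    if count == 0:
        return total
    return [x / count for x in total]


if __name__ == "__main__":
    toy = {"good": [1.0, 0.0], "movie": [0.0, 2.0]}  # toy vectors, not real embeddings
    print(document_vector(["a", "good", "movie"], toy, 2))
```

Averaging keeps every document the same length regardless of how many tokens it has, which is what a standard classifier expects as input; weighted variants (for example by term frequency) follow the same pattern with a per-token weight.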


Last Commit

5 years ago

Repository

ashokc/Word-Embeddings-and-Document-Vectors
