Document Classifier LSTM

A bidirectional LSTM with attention for multiclass/multilabel text classification.


Document-Classifier-LSTM

Recurrent Neural Networks for multiclass, multilabel classification of texts. The models learn to tag small texts with 169 different tags from arXiv.

classifier.py implements a standard BLSTM network with attention.
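For orientation, here is a minimal sketch of such a model in Keras: a BLSTM whose per-timestep outputs are pooled by a simple additive attention layer. The layer sizes, vocabulary parameters, and the AttentionPool implementation are illustrative assumptions, not the exact code in classifier.py.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

MAX_LEN, VOCAB_SIZE, EMBED_DIM, N_CLASSES = 200, 50000, 300, 169

class AttentionPool(layers.Layer):
    """Additive attention pooling: score each timestep, then return the
    attention-weighted sum of the BLSTM outputs."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(d, d),
                                 initializer="glorot_uniform")
        self.u = self.add_weight(name="u", shape=(d, 1),
                                 initializer="glorot_uniform")
        super().build(input_shape)

    def call(self, h):
        scores = tf.tanh(tf.tensordot(h, self.W, axes=1))       # (batch, t, d)
        weights = tf.nn.softmax(tf.tensordot(scores, self.u, axes=1), axis=1)
        return tf.reduce_sum(h * weights, axis=1)                # (batch, d)

inputs = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
x = AttentionPool()(x)
# Sigmoid + binary cross-entropy handles the multilabel case (169 tags).
outputs = layers.Dense(N_CLASSES, activation="sigmoid")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```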

hatt_classifier.py contains an implementation of Hierarchical Attention Networks for Document Classification.
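The hierarchical model encodes words into sentence vectors and sentence vectors into a document vector, with attention at both levels. A rough sketch of that structure follows, reusing the AttentionPool layer and constants from the sketch above; the GRU sizes and sentence/word limits are assumptions.

```python
MAX_SENTS, MAX_WORDS = 15, 40  # hypothetical document shape

# Word-level encoder, applied to one sentence at a time.
word_in = layers.Input(shape=(MAX_WORDS,))
w = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(word_in)
w = layers.Bidirectional(layers.GRU(64, return_sequences=True))(w)
w = AttentionPool()(w)
word_encoder = models.Model(word_in, w)

# Sentence-level encoder over the per-sentence vectors.
doc_in = layers.Input(shape=(MAX_SENTS, MAX_WORDS))
s = layers.TimeDistributed(word_encoder)(doc_in)
s = layers.Bidirectional(layers.GRU(64, return_sequences=True))(s)
s = AttentionPool()(s)
doc_out = layers.Dense(N_CLASSES, activation="sigmoid")(s)

han = models.Model(doc_in, doc_out)
```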

The neural networks were built using Keras and TensorFlow.

The best-performing model is the attention BLSTM, which achieves a micro F-score of 0.67 on the test set.

The Hierarchical Attention Network achieves a micro F-score of only 0.65.
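For reference, micro-averaged F-scores like those above can be computed with scikit-learn from binary indicator matrices; the toy arrays below are purely illustrative.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0]])  # toy multilabel ground truth
y_pred = np.array([[1, 0, 0], [0, 1, 1]])  # toy predictions
print(f1_score(y_true, y_pred, average="micro"))  # 0.666...
```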

I am using 500k paper abstracts from arXiv. To download your own data, refer to the arXiv OAI API.
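As a starting point, the sketch below harvests one page of records from the arXiv OAI-PMH endpoint. The endpoint and verbs are standard OAI-PMH; the set name and the parsing are illustrative assumptions, and a full harvest must follow resumptionToken pagination.

```python
import requests
import xml.etree.ElementTree as ET

OAI_URL = "http://export.arxiv.org/oai2"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

resp = requests.get(OAI_URL, params={
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",
    "set": "cs",  # hypothetical set; see the arXiv OAI docs for valid sets
})
root = ET.fromstring(resp.text)
for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
    abstract = record.find(".//dc:description", NS)
    if abstract is not None and abstract.text:
        print(abstract.text.strip()[:80])
```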

Pretrained word embeddings can be used, either GloVe or Word2Vec. You can download GoogleNews-vectors-negative300.bin or the GloVe embeddings.
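Here is a minimal sketch of turning such pretrained vectors into an embedding matrix, assuming gensim is installed; the file path and the word_index mapping (e.g. from a Keras Tokenizer) are placeholders.

```python
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embedding_matrix(word_index, dim=300):
    """word_index maps token -> integer id; row 0 is reserved for padding."""
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, i in word_index.items():
        if word in w2v:
            matrix[i] = w2v[word]  # unknown words keep zero vectors
    return matrix
```

The resulting matrix can be passed to a Keras Embedding layer via weights=[matrix], with trainable=False to keep the pretrained vectors fixed.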

Usage:

  1. To train your own model, first prepare your dataset with the data_prep.py script. The preprocessing converts text to lower case, tokenizes it, and removes very short words (see the preprocessing sketch after this list). The preprocessed files and label files should be saved in a /data folder.

  2. You can now run classifier.py or hatt_classifier.py to build and train the models.

  3. The trained models are exported to JSON and their weights to HDF5 (.h5) for later use (see the export sketch after this list).

  4. You can use utils.visualize_attention to visualize the attention weights.
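The following is a minimal sketch of the preprocessing described in step 1. The regex tokenizer, the length cutoff, and the example text are assumptions for illustration; data_prep.py may differ in detail.

```python
import re

MIN_LEN = 3  # hypothetical cutoff for "very short" words

def preprocess(text):
    """Lowercase, tokenize, and drop very short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if len(t) >= MIN_LEN)

print(preprocess("We study LSTMs on arXiv abstracts."))
# -> "study lstms arxiv abstracts"
```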
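Step 3 uses standard Keras serialization. The sketch below shows the JSON/HDF5 round trip; the file names are placeholders, and the custom_objects entry assumes the AttentionPool layer from the earlier sketch.

```python
from tensorflow.keras.models import model_from_json

# Export: architecture to JSON, weights to HDF5.
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("model.h5")

# Reload for later use; custom layers must be registered explicitly.
with open("model.json") as f:
    restored = model_from_json(f.read(),
                               custom_objects={"AttentionPool": AttentionPool})
restored.load_weights("model.h5")
```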

Requirements

Run pip install -r requirements.txt to install the requirements.
