Project README

PyTorch implementations of NLP papers on classification.

The papers were implemented using Korean corpora.

Preliminary & Usage

Preliminary

pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt

Usage

python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
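
The four steps above are meant to run in the order listed. As a minimal sketch, assuming the scripts live in the current model directory and are run with their default parameters, the sequence could be driven from Python as follows. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical helpers introduced here for illustration and are not part of the repository.

```python
import subprocess
import sys

# Script names and their order are taken from the usage list above.
PIPELINE = ["build_dataset.py", "build_vocab.py", "train.py", "evaluate.py"]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, script] for script in scripts]


def run_pipeline(cwd="."):
    """Run each step inside `cwd`, stopping at the first failing step."""
    for cmd in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so training never starts on a half-built dataset or vocabulary.
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_pipeline(cwd="path/to/model_dir")` from an activated `nlp` environment would then be equivalent to typing the four commands by hand; the directory path is a placeholder.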

Single sentence classification (sentiment classification task)

Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
Configuration
- conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
- conf/dataset/nsmc.json
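
The configuration files follow a fixed path pattern, so a model type such as "sencnn" maps directly to a JSON file. The sketch below shows one way such a file could be located and read. It assumes only the path layout listed above; the helper names are hypothetical, and because the keys inside the JSON files are not shown here, the loader simply returns whatever the file contains.

```python
import json
from pathlib import Path


def model_config_path(model_type, conf_dir="conf"):
    """Build conf/model/{type}.json for a given model type (e.g. "sencnn")."""
    return Path(conf_dir) / "model" / f"{model_type}.json"


def dataset_config_path(name, conf_dir="conf"):
    """Build conf/dataset/{name}.json for a given dataset (e.g. "nsmc")."""
    return Path(conf_dir) / "dataset" / f"{name}.json"


def load_config(path):
    """Read a JSON configuration file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_config(model_config_path("sencnn"))` would read conf/model/sencnn.json relative to the current directory.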
Structure

# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py

Model \ Accuracy	Train (120,000)	Validation (30,000)	Test (50,000)	Date
SenCNN	91.95%	86.54%	85.84%	20/05/30
CharCNN	86.29%	81.69%	81.38%	20/05/30
ConvRec	86.23%	82.93%	82.43%	20/05/30
VDCNN	86.59%	84.29%	84.10%	20/05/30
SAN	90.71%	86.70%	86.37%	20/05/30
ETRIBERT	91.12%	89.24%	88.98%	20/05/30
SKTBERT	92.20%	89.08%	88.96%	20/05/30
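
The train and validation sizes in the table sum to 150,000, and the directory listing shows train.txt and validation.txt next to ratings_train.txt, which suggests that the validation set is held out from the original training file at roughly an 80/20 ratio. How build_dataset.py actually performs this split is not shown here, so the following is only a sketch under that assumption; the function name, the default fraction, and the seed are all hypothetical.

```python
import random


def split_train_validation(examples, validation_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and hold out a fraction for validation.

    Returns (train, validation). The input sequence is left unmodified.
    """
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]
```

With 150,000 input examples and the default fraction, this yields 120,000 training and 30,000 validation examples, matching the sizes in the table header.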

Convolutional Neural Networks for Sentence Classification (as SenCNN)
- https://arxiv.org/abs/1408.5882
Character-level Convolutional Networks for Text Classification (as CharCNN)
- https://arxiv.org/abs/1509.01626
Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers (as ConvRec)
- https://arxiv.org/abs/1602.00367
Very Deep Convolutional Networks for Text Classification (as VDCNN)
- https://arxiv.org/abs/1606.01781
A Structured Self-attentive Sentence Embedding (as SAN)
- https://arxiv.org/abs/1703.03130
BERT_single_sentence_classification (as ETRIBERT, SKTBERT)
- https://arxiv.org/abs/1810.04805

Pairwise-text-classification (paraphrase detection task)

The dataset is created from https://github.com/songys/Question_pair
Configuration
- conf/model/{type}.json (e.g. type = ["siam", "san",...])
- conf/dataset/qpair.json
Structure

# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py

Model \ Accuracy	Train (6,136)	Validation (682)	Test (758)	Date
Siam	93.00%	83.13%	83.64%	20/05/30
SAN	89.47%	82.11%	81.53%	20/05/30
Stochastic	89.26%	82.69%	80.07%	20/05/30
ETRIBERT	95.07%	94.42%	94.06%	20/05/30
SKTBERT	95.43%	92.52%	93.93%	20/05/30
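
The table reports accuracy, that is, the share of examples whose predicted label equals the gold label. As a minimal sketch of that metric in plain Python: the function below is a hypothetical illustration and is not the code in model/metric.py, whose contents are not shown here.

```python
def accuracy(predictions, labels):
    """Return the fraction of positions where prediction equals label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("accuracy is undefined for an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For a paraphrase detection task, `predictions` and `labels` would each hold one binary value per sentence pair, and multiplying the result by 100 gives a percentage in the style of the table.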

A Structured Self-attentive Sentence Embedding (as SAN)
- https://arxiv.org/abs/1703.03130
Siamese recurrent architectures for learning sentence similarity (as Siam)
- https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12195
Stochastic Answer Networks for Natural Language Inference (as Stochastic)
- https://arxiv.org/abs/1804.07888
BERT_pairwise_text_classification (as ETRIBERT, SKTBERT)
- https://arxiv.org/abs/1810.04805

Open Source Agenda is not affiliated with "Nlp Classification" Project. README Source: seopbo/nlp_classification

License

MIT
