Nlp Classification

Implementing NLP papers relevant to classification with PyTorch and gluonnlp

Project README

NLP paper implementation relevant to classification with PyTorch

The papers were implemented using Korean corpora.

Preliminary & Usage

  • Preliminary
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
  • Usage
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
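The usage steps build a vocabulary with `build_vocab.py`, which ends up as `vocab.pkl` in the dataset folder. Below is a minimal sketch of that step, assuming plain whitespace tokenization and a frequency cutoff. The helper names (`build_vocab`, `encode`, `save_vocab`, `load_vocab`), the `min_freq` parameter and the special tokens are hypothetical; the repository's actual tokenizer and vocabulary class may differ (the project description mentions gluonnlp, which ships its own `Vocab`).

```python
import os
import pickle
import tempfile
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical special tokens; the repository's actual ones may differ.
SPECIAL_TOKENS = ["<pad>", "<unk>"]


def build_vocab(sentences: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map every token seen at least `min_freq` times to an integer id.

    Whitespace splitting stands in for the real (Korean) tokenizer.
    """
    counter = Counter(tok for sentence in sentences for tok in sentence.split())
    # Most frequent first, ties broken alphabetically, so ids are deterministic.
    kept = sorted(
        (tok for tok, freq in counter.items()
         if freq >= min_freq and tok not in SPECIAL_TOKENS),
        key=lambda tok: (-counter[tok], tok),
    )
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + kept)}


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a sentence into ids, sending unseen tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.split()]


def save_vocab(vocab: Dict[str, int], path: str) -> None:
    """Pickle the vocabulary, in the spirit of nsmc/vocab.pkl."""
    with open(path, "wb") as f:
        pickle.dump(vocab, f)


def load_vocab(path: str) -> Dict[str, int]:
    """Load a pickled vocabulary back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    train = ["영화 정말 좋다", "영화 별로", "정말 최고"]
    vocab = build_vocab(train)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.pkl")
        save_vocab(vocab, path)
        assert load_vocab(path) == vocab
    print(encode("영화 정말 싫다", vocab))  # [2, 3, 1]
```

Sorting by frequency with an alphabetical tie-break keeps token ids stable across runs, so a pickled vocabulary stays consistent with checkpoints trained against it.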

Single sentence classification (sentiment classification task)

  • Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
  • Configuration
    • conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
    • conf/dataset/nsmc.json
  • Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
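Each run reads a model configuration from `conf/model/{type}.json` plus a dataset configuration such as `conf/dataset/nsmc.json`, and its outputs land in a folder like `experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001`. The sketch below shows one way to resolve those paths and derive the run folder name from the hyperparameters. The helpers `load_config` and `experiment_dir`, the merge order, and the keys `epochs`, `batch_size` and `learning_rate` are assumptions inferred from the folder name, not read from the repository.

```python
import json
import os
from typing import Any, Dict


def load_config(root: str, model_type: str, dataset: str = "nsmc") -> Dict[str, Any]:
    """Merge conf/dataset/{dataset}.json and conf/model/{model_type}.json.

    Hypothetical helper: model settings override dataset settings on key clashes.
    """
    config: Dict[str, Any] = {}
    for path in (
        os.path.join(root, "conf", "dataset", f"{dataset}.json"),
        os.path.join(root, "conf", "model", f"{model_type}.json"),
    ):
        with open(path, encoding="utf-8") as f:
            config.update(json.load(f))
    return config


def experiment_dir(root: str, model_type: str, epochs: int, batch_size: int,
                   learning_rate: float) -> str:
    """Build a run folder path following the naming pattern in the tree."""
    name = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return os.path.join(root, "experiments", model_type, name)


if __name__ == "__main__":
    print(experiment_dir(".", "sencnn", epochs=5, batch_size=256, learning_rate=0.001))
```

Encoding the hyperparameters in the folder name keeps runs with different settings from overwriting each other's checkpoints and logs.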
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |
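The train and validation sizes in the table (120,000 + 30,000) add up to the 150,000 reviews in nsmc's `ratings_train.txt`, while the 50,000 reviews of `ratings_test.txt` form the test set. That suggests `build_dataset.py` holds out 20% of the original training file for validation. Below is a minimal sketch of such a split, assuming a seeded shuffle and the tab-separated `id`, `document`, `label` layout (with a header row) of the nsmc files. `read_nsmc`, `split_rows` and the seed value are hypothetical, and the repository's `model/split.py` may work differently.

```python
import csv
import random
from typing import List, Sequence, Tuple

Row = List[str]


def read_nsmc(path: str) -> List[Row]:
    """Read an nsmc file: tab-separated id, document, label with a header row."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quote characters inside reviews as literal text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader)  # skip the "id document label" header
        return [row for row in reader]


def split_rows(rows: Sequence[Row], val_ratio: float = 0.2,
               seed: int = 777) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of `rows` with a fixed seed and hold out `val_ratio` of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Stand-in for the 150,000 rows of ratings_train.txt.
    rows = [[str(i), f"review {i}", str(i % 2)] for i in range(150_000)]
    train, validation = split_rows(rows)
    print(len(train), len(validation))  # 120000 30000
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state used elsewhere in training.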

Pairwise text classification (paraphrase detection task)

# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
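The example folder is named after Mueller and Thyagarajan's Siamese recurrent architecture (MaLSTM), in which a shared LSTM encodes both sentences and their similarity is exp(-||h1 - h2||_1), the exponentiated negative Manhattan distance between the two final hidden states, which always lies in (0, 1]. Below is a minimal standard-library sketch of that scoring step, assuming the encoder outputs are already available as plain lists. The function names and the 0.5 decision threshold are assumptions, and the repository's `model/net.py` may score pairs differently.

```python
import math
from typing import Sequence


def manhattan_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """exp(-L1 distance) between two sentence encodings; 1.0 means identical."""
    if len(h1) != len(h2):
        raise ValueError("encodings must have the same dimension")
    distance = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-distance)


def is_paraphrase(h1: Sequence[float], h2: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: call a pair a paraphrase above `threshold`."""
    return manhattan_similarity(h1, h2) > threshold


if __name__ == "__main__":
    q1 = [0.2, -0.1, 0.4]  # stand-ins for the shared encoder's final hidden states
    q2 = [0.25, -0.05, 0.35]
    q3 = [-0.9, 0.8, -0.7]
    print(round(manhattan_similarity(q1, q2), 3), is_paraphrase(q1, q2))
    print(round(manhattan_similarity(q1, q3), 3), is_paraphrase(q1, q3))
```

Because both questions pass through the same encoder weights, swapping the two inputs yields the same score, which suits a symmetric task like paraphrase detection.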
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |
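Both result tables report accuracy per split, i.e. the fraction of examples whose predicted label equals the gold label. Below is a minimal sketch of that metric over per-class scores, taking the argmax of each row as the prediction. The function name and input layout are assumptions, and the repository's `model/metric.py` likely operates on PyTorch tensors rather than Python lists.

```python
from typing import Sequence


def accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Share of rows whose highest-scoring class matches the gold label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)


if __name__ == "__main__":
    # Logits for binary classification (0 = negative, 1 = positive).
    scores = [[2.1, -0.3], [0.1, 1.7], [0.4, 0.2], [-1.0, 0.5]]
    labels = [0, 1, 1, 1]
    print(f"{accuracy(scores, labels):.2%}")  # 75.00%
```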
README source: seopbo/nlp_classification
License: MIT
