A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
The wiki contains my notes and summaries of over 150 recent publications related to neural dialog modeling.
:floppy_disk: Run your own trainings or experiment with pre-trained models
:white_check_mark: 4 different dialog datasets integrated with tensor2tensor
:twisted_rightwards_arrows: Seamlessly works with any model or hyperparameter set in tensor2tensor
:rocket: Easily extendable base class for dialog problems
Run setup.py which installs required packages and steps you through downloading additional data:
python setup.py
You can download all trained models used in this paper from here. Each training contains two checkpoints: one at the validation loss minimum and another after 150 epochs. The folder structures of the data and the trainings match each other exactly.
python t2t_csaky/main.py --mode=train
The mode argument can be one of the following four: {generate_data, train, decode, experiment}. In experiment mode you can specify what to do inside the experiment function of the run file. A detailed explanation of what each mode does is given below.
You can control the flags and parameters of each mode directly in this file. For each run that you initiate, this file is copied to the appropriate directory, so you can quickly access the parameters of any run. There are some flags that you have to set for every mode (the FLAGS dictionary in the config file):
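As a rough illustration, the FLAGS dictionary might look like the sketch below; the key names and values here are assumptions for demonstration, not the exact contents of the repo's config file.

```python
# Hypothetical sketch of the FLAGS dictionary in the config file.
# Key names are illustrative; check the repo's config for the real ones.
FLAGS = {
    "mode": "train",             # generate_data / train / decode / experiment
    "t2t_usr_dir": "t2t_csaky",  # directory containing the registered problems
    "data_dir": "data_dir",      # where generated data and vocab files go
    "problem": "daily_dialog_chatbot",
}
```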
This mode will download and preprocess the data and generate source and target pairs. Currently there are 6 registered problems that you can use, besides the ones provided by tensor2tensor:
For example, an utterance annotated with speaker names looks like this: `BIANCA_m0 what good stuff ? CAMERON_m0`
The PROBLEM_HPARAMS dictionary in the config file contains problem specific parameters that you can set before generating data:
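A hypothetical example of such a dictionary is sketched below; the parameter names are assumptions chosen to show the kind of settings involved, and may not match the repo's config exactly.

```python
# Hypothetical sketch of problem-specific parameters set before
# generating data. Names and values are illustrative only.
PROBLEM_HPARAMS = {
    "num_train_shards": 1,    # number of file shards for the training data
    "num_dev_shards": 1,
    "vocabulary_size": 16384, # most frequent words kept in the vocab
    "dataset_size": 0,        # 0 means use the full dataset
    "max_sentence_len": 64,   # filter out longer source-target pairs
}
```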
This mode allows you to train a model with the specified problem and hyperparameters. The code simply calls the tensor2tensor training script, so any model available in tensor2tensor can be used. Besides these, there is also a subclassed model with small modifications:
There are several additional flags that you can specify for a training run in the FLAGS dictionary in the config file, some of which are:
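Since this mode just calls the tensor2tensor training script, a train run conceptually maps the configured flags onto `t2t-trainer`'s standard command-line flags, roughly as in this simplified sketch (the specific values are assumptions):

```python
# Simplified sketch of how a train run could be assembled into a
# tensor2tensor trainer invocation. Flag names are t2t-trainer's
# standard ones; the values here are only examples.
flags = {
    "problem": "daily_dialog_chatbot",
    "model": "transformer",
    "hparams_set": "transformer_base",
    "data_dir": "data_dir",
    "output_dir": "train_dir",
    "train_steps": 250000,
}
command = ["t2t-trainer"] + [
    "--{}={}".format(key, value) for key, value in sorted(flags.items())
]
```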
With this mode you can decode from the trained models. The following parameters affect the decoding (in the FLAGS dictionary in the config file):
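A sketch of what such decoding parameters might look like is below; the key names are assumptions used to illustrate the typical settings, not the repo's exact flag names.

```python
# Hypothetical decoding parameters; names are illustrative only.
DECODE_FLAGS = {
    "beam_size": 10,        # number of beams for beam-search decoding
    "return_beams": False,  # True would output every beam, not just the best
    "decode_dir": "decode_dir",
    "output_file": "decode_output.txt",
}
```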
The following results are from these two papers.
TRF is the Transformer model, while RT means randomly selected responses from the training set and GT means ground truth responses. For an explanation of the metrics see the paper.
S2S is a simple seq2seq model with LSTMs trained on Cornell, others are Transformer models. Opensubtitles F is pre-trained on Opensubtitles and finetuned on Cornell.
New problems can be registered by subclassing WordChatbot, or, even better, by subclassing CornellChatbotBasic or OpensubtitleChatbot, since these implement additional functionality. Usually it's enough to override the preprocess and create_data functions. Check the documentation for more details and see daily_dialog_chatbot for an example.
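A new problem might look like the sketch below. Since WordChatbot lives in this repo, the base class here is a minimal stand-in so the example is self-contained; the method signatures, the problem name, and the sample data are all assumptions.

```python
# Minimal stand-in for the repo's WordChatbot base class, so this
# sketch is self-contained. The real class provides much more.
class WordChatbot:
    def preprocess(self, line):
        raise NotImplementedError

    def create_data(self, train_mode):
        raise NotImplementedError


# Hypothetical new problem: override preprocess and create_data,
# as recommended above.
class MyForumChatbot(WordChatbot):
    def preprocess(self, line):
        # Lowercase and split punctuation off into separate tokens.
        line = line.lower()
        for punct in [".", ",", "?", "!"]:
            line = line.replace(punct, " " + punct)
        return " ".join(line.split())

    def create_data(self, train_mode):
        # Emit consecutive utterances as source-target pairs.
        utterances = ["Hi there!", "Hello, how are you?", "Fine, thanks."]
        return [
            (self.preprocess(source), self.preprocess(target))
            for source, target in zip(utterances, utterances[1:])
        ]
```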
New models and hyperparameters can be added by following the tensor2tensor tutorial.
This project is licensed under the MIT License - see the LICENSE file for details.
Please include a link to this repo if you use it in your work and consider citing the following paper:
@InProceedings{Csaky:2019,
    title = {Deep Learning Based Chatbot Models},
    author = {Csaky, Richard},
    year = {2019},
    publisher = {National Scientific Students' Associations Conference},
    url = {https://tdk.bme.hu/VIK/DownloadPaper/asdad}
}