# Comparing six baseline deep learning models on TrecQA
In a QA system that must draw answers from an unstructured corpus, one core challenge is answer sentence selection: choosing the sentence that best answers a given question.
This repository provides six baseline models for the TrecQA task (Wang et al., 2007): average pooling, CNN, RNN, RNN+CNN, QA-LSTM/CNN with attention (Tan et al., 2015; state of the art in 2015), and AP-LSTM/CNN with attentive pooling (dos Santos et al., 2016; state of the art in 2016).
All models were trained on the train-all split using Keras 2.1.2.
You can download the pretrained GloVe embeddings here: http://nlp.stanford.edu/data/glove.6B.zip
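After unzipping, the embeddings can be loaded into an embedding matrix for the models. A minimal sketch, assuming a Keras-Tokenizer-style `word_index` (token to 1-based row index); the function name `load_glove_matrix` is mine, not part of this repo:

```python
import numpy as np

def load_glove_matrix(glove_path, word_index, dim=300):
    """Build an embedding matrix from a GloVe text file.

    Row 0 is left as zeros for padding; words missing from the
    GloVe file also stay all-zero.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in word_index and len(vec) == dim:
                matrix[word_index[word]] = np.asarray(vec, dtype="float32")
    return matrix
```

The resulting matrix can then be handed to an `Embedding` layer via its `weights` argument (with `trainable=False` to keep the vectors fixed).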
Batch normalization was added to the models, which improves on the results of pasky's dataset-sts experiments: https://github.com/brmson/dataset-sts/tree/master/data/anssel/wang
For other published results on this dataset, see the ACL wiki: https://aclweb.org/aclwiki/Question_Answering_(State_of_the_art)
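The tables below report MRR (mean reciprocal rank): for each question, candidate sentences are ranked by model score, and the reciprocal rank of the highest-ranked correct sentence is averaged over all questions. A minimal sketch (the function name and input format are mine, for illustration):

```python
def mean_reciprocal_rank(questions):
    """Compute MRR over a list of questions.

    Each question is a list of (score, is_correct) pairs, one per
    candidate answer sentence. Candidates are sorted by descending
    score; the reciprocal rank of the first correct candidate
    contributes to the mean (0 if no candidate is correct).
    """
    total = 0.0
    for candidates in questions:
        ranked = sorted(candidates, key=lambda sc: -sc[0])
        for rank, (_, correct) in enumerate(ranked, start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(questions)
```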
Model | dev MRR | test MRR | Hyperparameters / notes |
---|---|---|---|
Avg. pooling | 0.855998 | 0.810032 | pdim=0.5, Ddim=1 |
CNN | 0.865507 | 0.859114 | pdim=0.5, p_layers=1, Ddim=1 |
RNN (LSTM) | 0.842302 | 0.827154 | sdim=5~7, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, Ddim=2, proj=False |
RNN+CNN | 0.862692 | 0.803874 | Ddim=2, p_layers=2, pdim=0.5, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, sdim=1 |
QA-LSTM/CNN+attention | 0.875321 | 0.832281 | Ddim=[1, 1/2], p_layers=2, pdim=0.5, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, sdim=1, adim=0.5; state of the art 2015 |
AP-LSTM/CNN (attentive pooling) | 0.883974 | 0.850000 | Ddim=0.1, p_layers=1, pdim=0.5, rnn=CuDNNLSTM, rnnbidi_mode=concatenate, sdim=5, w_feat_model=rnn, sdim=4; state of the art 2016 |
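The attention mechanism in QA-LSTM/CNN+attention (Tan et al., 2015) weights each answer hidden state by its relevance to the pooled question representation before pooling the answer. A minimal numpy sketch of that idea; the function name, weight shapes, and parameter names here are illustrative, not the repo's actual variables:

```python
import numpy as np

def attend(answer_states, question_vec, W_a, W_q, w):
    """Question-conditioned attention over answer hidden states.

    answer_states: (T, d) hidden states of the answer LSTM.
    question_vec:  (d,)   pooled question representation.
    W_a, W_q:      (k, d) projection matrices; w: (k,) scoring vector.
    Returns the attention-weighted answer vector of shape (d,).
    """
    # m_t = tanh(W_a h_t + W_q o_q), one row per time step
    m = np.tanh(answer_states @ W_a.T + question_vec @ W_q.T)
    # softmax over time steps of w . m_t
    logits = m @ w
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # convex combination of the original hidden states
    return weights @ answer_states
```

Since the softmax weights are non-negative and sum to one, the output is a convex combination of the answer's hidden states, biased toward the time steps most relevant to the question.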
For comparison, more recent published results on TrecQA:

Model | test MRR | Reference |
---|---|---|
HyperQA | 0.865 | Tay et al. (2017) |
BiMPM | 0.875 | Wang et al. (2017) |
Compare-Aggregate | 0.899 | Bian et al. (2017) |
IWAN | 0.889 | Shen et al. (2017) |