TensorFlow implementation of "Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder," AAAI-19
This repository contains the source code and data corpus used in the following paper:
Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder, AAAI-19
tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
python==2.7
scikit-learn==0.20.0
nltk==3.3
Download the preprocessed dataset with the following script:
cd data
sh download_processed_dataset_aaai-19.sh
The downloaded dataset will be placed under the following paths in the project:
/data/aaai-19/para
/data/aaai-19/whole
format (example)
test_title.npy: [100000, 49] - (#samples, #token (index))
test_body: [100000, 1200] - (#samples, #token (index))
test_label: [100000] - (#samples)
dic_mincutN.txt: dictionary
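The shapes listed above can be checked after the download finishes. The sketch below is a minimal sanity check, not part of this repository: `shape_mismatches` and `load_shapes` are hypothetical helper names, and it assumes that every array is stored as `<name>.npy` in a single directory (only `test_title.npy` is spelled out with an extension above) and that NumPy is installed.

```python
import os

# Expected shapes for the test split, copied from the format list above.
EXPECTED_SHAPES = {
    "test_title": (100000, 49),    # (#samples, #token indices)
    "test_body": (100000, 1200),   # (#samples, #token indices)
    "test_label": (100000,),       # (#samples)
}


def shape_mismatches(found, expected=EXPECTED_SHAPES):
    """Return {name: (found_shape, expected_shape)} for every array that is
    missing from `found` or whose shape differs from the expected one."""
    problems = {}
    for name, want in expected.items():
        got = found.get(name)
        if got is None or tuple(got) != tuple(want):
            problems[name] = (got, want)
    return problems


def load_shapes(data_dir, names=tuple(EXPECTED_SHAPES)):
    """Read only the shapes of `<name>.npy` files in `data_dir`.

    Assumption: all arrays use the `.npy` extension. NumPy is imported here
    so that `shape_mismatches` stays usable without it.
    """
    import numpy as np

    shapes = {}
    for name in names:
        path = os.path.join(data_dir, name + ".npy")
        # mmap_mode="r" avoids reading the full array into memory.
        shapes[name] = np.load(path, mmap_mode="r").shape
    return shapes
```

For example, `shape_mismatches(load_shapes("data/aaai-19/whole"))` would return an empty dict if the files match the listed shapes; the directory argument is an assumption based on the paths given above.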
whole-type: use the code in ./src_whole
para-type: use the code in ./src_para
train_reference_scripts.sh
<< for example >>
Train on the dataset with the AHDE model and the "whole" method:
python AHDE_Model.py --batch_size 256 --encoder_size 80 --context_size 10 --encoderR_size 49 --num_layer 1 --hidden_dim 300 --num_layer_con 1 --hidden_dim_con 300 --embed_size 300 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'aaai-19_whole' --data_path '../data/target_aaai-19_whole/'
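When running several variations of a long command like the one above, it can help to keep the flags in a dict and generate the argument list from it. The following is a minimal sketch of that idea, not something this repository provides: `AAAI19_WHOLE_ARGS` and `build_train_command` are hypothetical names, the flag values are copied from the command above, and the sketch targets Python 3.7+ (it relies on dict insertion order) even though the project itself lists python==2.7.

```python
# Flag values copied from the AAAI-19 "whole" training command above.
AAAI19_WHOLE_ARGS = {
    "batch_size": 256,
    "encoder_size": 80,
    "context_size": 10,
    "encoderR_size": 49,
    "num_layer": 1,
    "hidden_dim": 300,
    "num_layer_con": 1,
    "hidden_dim_con": 300,
    "embed_size": 300,
    "lr": 0.001,
    "num_train_steps": 100000,
    "is_save": 1,
    "graph_prefix": "ahde",
    "corpus": "aaai-19_whole",
    "data_path": "../data/target_aaai-19_whole/",
}


def build_train_command(args, script="AHDE_Model.py", python="python"):
    """Turn a {flag: value} dict into an argv list such as
    ["python", "AHDE_Model.py", "--batch_size", "256", ...]."""
    cmd = [python, script]
    for flag, value in args.items():
        cmd += ["--" + flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_train_command(AAAI19_WHOLE_ARGS)))
```

The returned list can be passed to `subprocess.call(cmd, cwd="src_whole")`, and a single flag can be overridden with `dict(AAAI19_WHOLE_ARGS, batch_size=128)`; the value 128 is only an illustration, not a recommended setting.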
<< for example >>
Evaluate the test dataset with the AHDE model and the "whole" method:
src_whole$ sh eval_AHDE.sh
whole case
data | Samples | tokens (avg) headline | tokens (avg) body text |
---|---|---|---|
train | 1,700,000 | 13.71 | 499.81 |
dev | 100,000 | 13.69 | 499.03 |
test | 100,000 | 13.55 | 769.23 |
Note
We crawled the articles for the "dev" and "test" datasets from different media outlets.
cd data
sh download_processed_dataset_nela-17.sh
python AHDE_Model.py --batch_size 64 --encoder_size 200 --context_size 50 --encoderR_size 25 --num_layer 1 --hidden_dim 100 --num_layer_con 1 --hidden_dim_con 100 --embed_size 300 --use_glove 1 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'nela-17_whole' --data_path '../data/target_nela-17_whole/'
@inproceedings{yoon2019detecting,
title={Detecting Incongruity between News Headline and Body Text via a Deep Hierarchical Encoder},
author={Yoon, Seunghyun and Park, Kunwoo and Shin, Joongbo and Lim, Hongjun and Won, Seungpil and Cha, Meeyoung and Jung, Kyomin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
pages={791--800},
year={2019}
}