TensorFlow Implementation of Recurrent Convolutional Neural Network for Relation Extraction
TensorFlow implementation of a deep learning approach to the relation extraction challenge (SemEval-2010 Task #8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals) using Recurrent Convolutional Neural Networks.
Train data is located in "SemEval2010_task8_all_data/SemEval2010_task8_training/TRAIN_FILE.TXT".
"GoogleNews-vectors-negative300" is used as the pre-trained word2vec model.
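For orientation, here is a minimal sketch of reading records from a file laid out like TRAIN_FILE.TXT. It assumes the usual SemEval-2010 Task 8 layout: an ID and a quoted sentence separated by a tab, then a relation label on the next line, then a "Comment:" line and a blank line. The function names `parse_semeval` and `entity_text` are hypothetical and are not taken from this repository.

```python
import re

# Assumed record layout (not verified against this repository's loader):
#   <id>\t"<sentence containing <e1>..</e1> and <e2>..</e2>>"
#   <Relation>(e1,e2) | <Relation>(e2,e1) | Other
#   Comment: <free text>
#   <blank line>
SENTENCE_LINE = re.compile(r'^(\d+)\t"?(.*?)"?\s*$')


def parse_semeval(lines):
    """Return a list of (sentence_id, sentence, relation) tuples."""
    records = []
    iterator = iter(lines)
    for line in iterator:
        match = SENTENCE_LINE.match(line)
        if not match:
            continue  # skip "Comment:" lines and blank separators
        relation = next(iterator, "").strip()  # label is on the following line
        records.append((int(match.group(1)), match.group(2), relation))
    return records


def entity_text(sentence, tag):
    """Return the text wrapped in <tag>...</tag>, or None if it is absent."""
    match = re.search(r"<{0}>(.*?)</{0}>".format(tag), sentence)
    return match.group(1) if match else None


sample = [
    '1\t"The <e1>configuration</e1> of antenna <e2>elements</e2>."\n',
    "Component-Whole(e2,e1)\n",
    "Comment: Not a collection.\n",
    "\n",
    '2\t"The <e1>child</e1> was wrapped in a <e2>cradle</e2>."\n',
    "Other\n",
    "Comment:\n",
    "\n",
]
records = parse_semeval(sample)
```

On a real file, the same function can be fed an open file handle, since it only iterates over lines.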
Display help message:
python train.py --help
optional arguments:
-h, --help show this help message and exit
--train_dir TRAIN_DIR
Path of train data
--dev_sample_percentage DEV_SAMPLE_PERCENTAGE
Percentage of the training data to use for validation
--max_sentence_length MAX_SENTENCE_LENGTH
Max sentence length in train(98)/test(70) data
(Default: 100)
--word2vec WORD2VEC Word2vec file with pre-trained embeddings
--text_embedding_dim TEXT_EMBEDDING_DIM
Dimensionality of word embedding (Default: 300)
--position_embedding_dim POSITION_EMBEDDING_DIM
Dimensionality of position embedding (Default: 100)
--filter_sizes FILTER_SIZES
Comma-separated filter sizes (Default: 2,3,4,5)
--num_filters NUM_FILTERS
Number of filters per filter size (Default: 128)
--dropout_keep_prob DROPOUT_KEEP_PROB
Dropout keep probability (Default: 0.5)
--l2_reg_lambda L2_REG_LAMBDA
L2 regularization lambda (Default: 3.0)
--batch_size BATCH_SIZE
Batch Size (Default: 64)
--num_epochs NUM_EPOCHS
Number of training epochs (Default: 100)
--display_every DISPLAY_EVERY
Number of iterations to display training info.
--evaluate_every EVALUATE_EVERY
Evaluate model on dev set after this many steps
--checkpoint_every CHECKPOINT_EVERY
Save model after this many steps
--num_checkpoints NUM_CHECKPOINTS
Number of checkpoints to store
--learning_rate LEARNING_RATE
Which learning rate to start with. (Default: 1e-3)
--allow_soft_placement [ALLOW_SOFT_PLACEMENT]
Allow device soft device placement
--noallow_soft_placement
--log_device_placement [LOG_DEVICE_PLACEMENT]
Log placement of ops on devices
--nolog_device_placement
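Two of the options above take values that need a little interpretation: `--filter_sizes` is a comma-separated string, and `--dev_sample_percentage` controls how much training data is held out. The sketch below shows one plausible way to handle both. It assumes the percentage is given as a fraction between 0 and 1, and the helper names and the `seed` parameter are hypothetical, not taken from train.py.

```python
import random


def parse_filter_sizes(value):
    """Turn a string such as "2,3,4,5" into a list of ints."""
    return [int(size) for size in value.split(",") if size.strip()]


def split_train_dev(examples, dev_sample_percentage, seed=10):
    """Shuffle the examples and hold out a fraction of them for validation.

    Assumption: dev_sample_percentage is a fraction (0.1 means 10%).
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    dev_count = int(len(shuffled) * dev_sample_percentage)
    split_index = len(shuffled) - dev_count
    return shuffled[:split_index], shuffled[split_index:]


filter_sizes = parse_filter_sizes("2,3,4,5")
train_part, dev_part = split_train_dev(range(8000), 0.1)
```

With the 8,000 training sentences and a fraction of 0.1, this holds out 800 sentences and trains on the remaining 7,200.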
Train Example:
python train.py --word2vec "GoogleNews-vectors-negative300.bin"
Test data is located in "SemEval2010_task8_all_data/SemEval2010_task8_testing_keys/TEST_FILE_FULL.TXT".
You must provide the "checkpoint_dir" argument, the path to the checkpoint (trained neural model) files, as in the example below.
Evaluation Example:
python eval.py --checkpoint_dir "runs/1523902663/checkpoints"
Official Evaluation of SemEval 2010 Task #8
$ cd SemEval2010_task8_all_data/SemEval2010_task8_scorer-v1.2
$ perl semeval2010_task8_format_checker.pl ../../result/prediction.txt
$ perl semeval2010_task8_scorer-v1.2.pl ../../result/prediction.txt ../../result/answer.txt
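The Perl scorer above remains the reference for reported numbers. For a quick sanity check during development, a rough Python approximation of a macro-averaged F1 that ignores the Other class can look like the sketch below. It assumes labels of the form `Relation(e1,e2)` and treats a prediction with the wrong direction as an error. Unlike the official scorer, it averages only over the relations that occur in the gold labels it is given. The function names are hypothetical.

```python
def base_relation(label):
    """Strip the direction suffix: 'Cause-Effect(e1,e2)' -> 'Cause-Effect'."""
    return label.split("(", 1)[0]


def macro_f1(gold, predicted, excluded="Other"):
    """Macro-averaged F1 over relations in `gold`, skipping `excluded`.

    A prediction counts as correct only when the full label, including
    the direction, matches the gold label.
    """
    relations = sorted({base_relation(g) for g in gold} - {excluded})
    scores = []
    for relation in relations:
        true_pos = sum(
            1 for g, p in zip(gold, predicted)
            if g == p and base_relation(g) == relation
        )
        n_predicted = sum(1 for p in predicted if base_relation(p) == relation)
        n_gold = sum(1 for g in gold if base_relation(g) == relation)
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_gold if n_gold else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


gold_labels = [
    "Cause-Effect(e1,e2)", "Cause-Effect(e2,e1)", "Other", "Message-Topic(e1,e2)",
]
predicted_labels = [
    "Cause-Effect(e1,e2)", "Cause-Effect(e1,e2)", "Other", "Other",
]
score = macro_f1(gold_labels, predicted_labels)
```

In this toy example Cause-Effect scores 0.5 (one of two predictions has the right direction) and Message-Topic scores 0, so the macro average is 0.25.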
SemEval-2010 Task #8 Dataset
Relation | Train Data | Test Data | Total Data |
---|---|---|---|
Cause-Effect | 1,003 (12.54%) | 328 (12.07%) | 1,331 (12.42%) |
Instrument-Agency | 504 (6.30%) | 156 (5.74%) | 660 (6.16%) |
Product-Producer | 717 (8.96%) | 231 (8.50%) | 948 (8.85%) |
Content-Container | 540 (6.75%) | 192 (7.07%) | 732 (6.83%) |
Entity-Origin | 716 (8.95%) | 258 (9.50%) | 974 (9.09%) |
Entity-Destination | 845 (10.56%) | 292 (10.75%) | 1,137 (10.61%) |
Component-Whole | 941 (11.76%) | 312 (11.48%) | 1,253 (11.69%) |
Member-Collection | 690 (8.63%) | 233 (8.58%) | 923 (8.61%) |
Message-Topic | 634 (7.92%) | 261 (9.61%) | 895 (8.35%) |
Other | 1,410 (17.63%) | 454 (16.71%) | 1,864 (17.39%) |
Total | 8,000 (100.00%) | 2,717 (100.00%) | 10,717 (100.00%) |
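The percentages in the table are each relation's share of its column total. A short sketch that recomputes the totals and shares from the counts above is given here; the variable and function names are illustrative only. Shares that fall exactly on a half at the third decimal (for example 690 / 8,000 = 8.625%) can print as either neighbour depending on the rounding rule in use.

```python
# Counts copied from the table above.
train_counts = {
    "Cause-Effect": 1003, "Instrument-Agency": 504, "Product-Producer": 717,
    "Content-Container": 540, "Entity-Origin": 716, "Entity-Destination": 845,
    "Component-Whole": 941, "Member-Collection": 690, "Message-Topic": 634,
    "Other": 1410,
}
test_counts = {
    "Cause-Effect": 328, "Instrument-Agency": 156, "Product-Producer": 231,
    "Content-Container": 192, "Entity-Origin": 258, "Entity-Destination": 292,
    "Component-Whole": 312, "Member-Collection": 233, "Message-Topic": 261,
    "Other": 454,
}


def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


train_total = sum(train_counts.values())
test_total = sum(test_counts.values())

for relation in train_counts:
    combined = train_counts[relation] + test_counts[relation]
    print("{:<20} {:6.2f}% {:6.2f}% {:6.2f}%".format(
        relation,
        share(train_counts[relation], train_total),
        share(test_counts[relation], test_total),
        share(combined, train_total + test_total),
    ))
```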