Code and data for the paper "Methods for Detoxification of Texts for the Russian Language"
This repository contains models and an evaluation methodology for the task of detoxifying Russian texts. The original paper, "Methods for Detoxification of Texts for the Russian Language", was presented at the Dialogue-2021 conference.
In this repository, we release the two best models, detoxGPT and condBERT (see Methodology for more details). You can try a detoxification inference example in this notebook.
You can also test our models via the web demo, or pour out your anger on our Telegram bot.
In our research, we tested several approaches:
Based on ruGPT models. This method requires a parallel dataset for training. We tested the ruGPT-small, ruGPT-medium, and ruGPT-large models in several setups:
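For the few-shot setup, the input to the model is a prompt built from a handful of (toxic, neutral) example pairs followed by the new toxic sentence, which the model then continues. The separator token and example strings below are illustrative assumptions, not the exact prompt format used in the paper:

```python
# Hypothetical sketch of building a few-shot detoxification prompt for ruGPT.
# The separator and the example pairs are placeholders, not the paper's format.

def build_few_shot_prompt(pairs, toxic_input, sep=" >>> "):
    """Concatenate (toxic, neutral) example pairs, then the new toxic input."""
    lines = [f"{toxic}{sep}{neutral}" for toxic, neutral in pairs]
    lines.append(f"{toxic_input}{sep}")  # the model generates the continuation
    return "\n".join(lines)

examples = [
    ("toxic sentence 1", "polite paraphrase 1"),
    ("toxic sentence 2", "polite paraphrase 2"),
]
prompt = build_few_shot_prompt(examples, "new toxic sentence")
print(prompt)
```

In the zero-shot setup no example pairs are given, and in the fine-tuned setup the model weights themselves are updated on the parallel data instead of relying on the prompt.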
Based on the BERT model. This method does not require a parallel dataset for training. One of the tasks on which the original BERT was pretrained -- predicting a word that was replaced with a [MASK] token -- suits the delete-retrieve-generate style transfer method. We tested RuBERT and Geotrend pre-trained models in several setups:
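The "delete" step of this approach can be sketched in a few lines: tokens that appear in a list of rude words are replaced with [MASK], so that a BERT-like model can later fill the masks with neutral substitutes. The tiny English word list below is a stand-in assumption for the Russian rude-word list shipped in `data/train`:

```python
# Minimal sketch of the "delete" step of mask-based detoxification.
# RUDE_WORDS is a placeholder for the rude-word list in data/train.

RUDE_WORDS = {"idiot", "stupid"}

def mask_toxic(sentence, rude_words=RUDE_WORDS, mask="[MASK]"):
    """Replace every rude token with the mask token, ignoring punctuation."""
    tokens = sentence.split()
    masked = [mask if t.lower().strip(".,!?") in rude_words else t
              for t in tokens]
    return " ".join(masked)

print(mask_toxic("you are a stupid idiot!"))  # -> you are a [MASK] [MASK]
```

The masked sentence is then passed to the BERT model, which predicts non-toxic replacements for the [MASK] positions.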
The evaluation consists of three types of metrics:
Finally, an aggregation metric: the geometric mean of STA, CS, and PPL.
You can run the ru_metric.py script for evaluation. The fine-tuned weights for the toxicity classifier can be found here.
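The aggregation step can be sketched as a geometric mean of the three terms. Since lower perplexity is better, the sketch below inverts PPL before aggregating; this inversion is an assumption for illustration, and the exact formula is in `ru_metric.py`:

```python
# Sketch of the aggregation metric: geometric mean of style-transfer
# accuracy (STA), content similarity (CS), and a fluency term derived
# from perplexity (PPL). Inverting PPL is an illustrative assumption.

def geometric_mean_metric(sta, cs, ppl):
    fluency = 1.0 / ppl  # lower perplexity -> higher fluency score (assumed)
    return (max(sta, 0.0) * max(cs, 0.0) * max(fluency, 0.0)) ** (1.0 / 3.0)
```

Clamping each term at zero keeps the geometric mean well-defined when a score goes negative.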
Method | STA↑ | CS↑ | WO↑ | BLEU↑ | PPL↓ | GM↑ |
---|---|---|---|---|---|---|
Baselines | ||||||
Duplicate | 0.00 | 1.00 | 1.00 | 1.00 | 146.00 | 0.05 ± 0.0012 |
Delete | 0.27 | 0.96 | 0.85 | 0.81 | 263.55 | 0.10 ± 0.0007 |
Retrieve | 0.91 | 0.85 | 0.07 | 0.09 | 65.74 | 0.22 ± 0.0010 |
detoxGPT-small | ||||||
zero-shot | 0.93 | 0.20 | 0.00 | 0.00 | 159.11 | 0.10 ± 0.0005 |
few-shot | 0.17 | 0.70 | 0.05 | 0.06 | 83.38 | 0.11 ± 0.0009 |
fine-tuned | 0.51 | 0.70 | 0.05 | 0.05 | 39.48 | 0.20 ± 0.0011 |
detoxGPT-medium | ||||||
fine-tuned | 0.49 | 0.77 | 0.18 | 0.21 | 86.75 | 0.16 ± 0.0009 |
detoxGPT-large | ||||||
fine-tuned | 0.61 | 0.77 | 0.22 | 0.21 | 36.92 | 0.23 ± 0.0010 |
condBERT | ||||||
DeepPavlov zero-shot | 0.53 | 0.80 | 0.42 | 0.61 | 668.58 | 0.08 ± 0.0006 |
DeepPavlov fine-tuned | 0.52 | 0.86 | 0.51 | 0.53 | 246.68 | 0.12 ± 0.0007 |
Geotrend zero-shot | 0.62 | 0.85 | 0.54 | 0.64 | 237.46 | 0.13 ± 0.0009 |
Geotrend fine-tuned | 0.66 | 0.86 | 0.54 | 0.64 | 209.95 | 0.14 ± 0.0009 |
The folder `data` contains all the train datasets used, the test data, and a naive example of a style transfer result:

- `data/train`: the RuToxic dataset, a list of Russian rude words, and 200 samples of parallel sentences that were used for ruGPT fine-tuning;
- `data/test`: 10,000 samples that were used for evaluation of the approaches;
- `data/results`: an example of the style transfer output format, illustrated with naive duplication.

If you find this repository helpful, feel free to cite our publication:
@article{DBLP:journals/corr/abs-2105-09052,
author = {Daryna Dementieva and
Daniil Moskovskiy and
Varvara Logacheva and
David Dale and
Olga Kozlova and
Nikita Semenov and
Alexander Panchenko},
title = {Methods for Detoxification of Texts for the Russian Language},
journal = {CoRR},
volume = {abs/2105.09052},
year = {2021},
url = {https://arxiv.org/abs/2105.09052},
archivePrefix = {arXiv},
eprint = {2105.09052},
timestamp = {Mon, 31 May 2021 16:16:57 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2105-09052.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
For any questions, please contact Daryna Dementieva via email or Telegram.