TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
Full Changelog: https://github.com/QData/TextAttack/compare/v0.3.9...v0.3.10
this release mainly is about
Full Changelog: https://github.com/QData/TextAttack/compare/v0.3.8...v0.3.9
#689: Add more type annotations and do some code cleanup in AttackedText. Notably, this removes the code that did Chinese word segmentation, because it did not properly support words_from_text and caused issues with various transformations.
#691: Optimize comparison between two AttackedText objects (thanks @plasmashen!)
#693: Fix bug with writing parameters twice in AttackedText (thanks @89x98!)
#700: Lots of miscellaneous bug fixes and some helper function implementation
#701: Fix bugs with loading TedTalk translation dataset, using T5, seq2sick/text-to-text goal functions
transformers>=4.21.0
datasets==2.4.0
sentence_transformers==2.2.0
gensim==4.1.2
tensorflow==2.7.0
(Thanks @VijayKalmath !!!!)
textattack train
#653 (thanks @VijayKalmath)
Thanks to everyone who contributed to TextAttack this summer, and a special shoutout once more to @VijayKalmath for all the hard work and attention to detail. Glad to see TextAttack so healthy 🙂
TA_DEVICE env variable
MaxNumWordsModified
GreedyWordSwapWIR to allow passing of specific unk token
__eq__ method of AttackedText in textattack/shared/attacked_text.py by @wenh06 in https://github.com/QData/TextAttack/pull/509
Full Changelog: https://github.com/QData/TextAttack/compare/v0.3.3...v0.3.4
Merge pull request #508 from QData/example_bug_fix
Merge pull request #505 from QData/s3-model-fix
Merge pull request #503 from QData/multilingual-doc
Merge pull request #502 from QData/Notebook-10-bug-fix
Merge pull request #500 from QData/docstring-rework-missing
Merge pull request #497 from QData/dependabot/pip/docs/tensorflow-2.4.2
Merge pull request #495 from QData/readthedoc-fix
Merge pull request #473 from cogeid/file-redirection-fix
Merge pull request #469 from xinzhel/allennlp_doc
Merge pull request #477 from cogeid/Fix-RandomSwap-and-RandomSynonymI…
Merge pull request #484 from QData/update-torch-version
Merge pull request #490 from QData/scipy-version-plus-two-doc-updates
Merge pull request #420 from QData/multilingual
We have added two new classes, Attacker and Trainer, that can be used to perform adversarial attacks and adversarial training with full logging support and multi-GPU parallelism. They are intended to provide an alternative way of performing attacks and training for custom models and datasets.
Attacker: Running Adversarial Attacks
Below is an example use of Attacker to attack a BERT model fine-tuned on the IMDB dataset using the TextFooler method. The AttackArgs class is used to set the parameters of the attack, including the number of examples to attack, the CSV file to log the results to, and the interval at which to save checkpoints.
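A minimal sketch of such a run, assuming the textattack/bert-base-uncased-imdb checkpoint on the Hugging Face hub and the TextFoolerJin2019 recipe. The helper name build_attacker and the ATTACK_ARGS dictionary are illustrative and not part of TextAttack; the values are example settings only.

```python
# Keyword arguments for textattack.AttackArgs: attack 20 examples, log the
# results to a CSV file, and save a checkpoint every 5 examples.
ATTACK_ARGS = {
    "num_examples": 20,
    "log_to_csv": "log.csv",
    "checkpoint_interval": 5,
    "checkpoint_dir": "checkpoints",
    "disable_stdout": True,
}


def build_attacker(model_name="textattack/bert-base-uncased-imdb"):
    """Build an Attacker that runs TextFooler against a BERT model on IMDB.

    `build_attacker` is a hypothetical helper used only for this sketch.
    """
    # Imported lazily: both packages are heavy, and the calls below
    # download the model and the dataset on first use.
    import textattack
    import transformers

    model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

    # The attack needs a wrapped model, a dataset, and the attack arguments.
    dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")
    attack = textattack.attack_recipes.TextFoolerJin2019.build(model_wrapper)
    attack_args = textattack.AttackArgs(**ATTACK_ARGS)
    return textattack.Attacker(attack, dataset, attack_args)
```

Calling build_attacker().attack_dataset() would then run the attack and write the results to log.csv.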
More details about Attacker and AttackArgs can be found here.
Trainer: Running Adversarial Training
Previously, TextAttack supported adversarial training only in a limited manner. Users could train models only through the CLI command, and not every aspect of training was available for tuning.
The Trainer class introduces an easy way to train custom PyTorch/Transformers models on a custom dataset. Below is an example where we fine-tune BERT on the IMDB dataset with an adversarial attack called DeepWordBug.
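A minimal sketch of such a training run, assuming the bert-base-uncased checkpoint and the DeepWordBugGao2018 recipe. The helper names build_trainer and effective_batch_size and the TRAINING_ARGS dictionary are illustrative and not part of TextAttack; the hyperparameters are example values, not recommendations.

```python
# Keyword arguments for textattack.TrainingArgs: 3 epochs, the first one on
# clean data only, then 1000 adversarial examples generated per epoch.
TRAINING_ARGS = {
    "num_epochs": 3,
    "num_clean_epochs": 1,
    "num_train_adv_examples": 1000,
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "log_to_tb": True,
}


def effective_batch_size(args):
    """Per-device batch size multiplied by the gradient accumulation steps."""
    return args["per_device_train_batch_size"] * args["gradient_accumulation_steps"]


def build_trainer(model_name="bert-base-uncased"):
    """Build a Trainer for adversarial training of BERT on IMDB.

    `build_trainer` is a hypothetical helper used only for this sketch.
    """
    # Imported lazily: both packages are heavy, and the calls below
    # download the model and the dataset on first use.
    import textattack
    import transformers

    model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

    # DeepWordBug is used here for demonstration purposes only.
    attack = textattack.attack_recipes.DeepWordBugGao2018.build(model_wrapper)
    train_dataset = textattack.datasets.HuggingFaceDataset("imdb", split="train")
    eval_dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")

    training_args = textattack.TrainingArgs(**TRAINING_ARGS)
    return textattack.Trainer(
        model_wrapper,
        "classification",
        attack,
        train_dataset,
        eval_dataset,
        training_args,
    )
```

Calling build_trainer().train() would start training; with the settings above, each device accumulates gradients over 4 batches of 8 examples before an optimizer step.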
Dataset
Previously, datasets passed to TextAttack were simply expected to be an iterable of (input, target) tuples. While this offers flexibility, it prevents users from passing key information about the dataset that TextAttack can use to provide a better experience (e.g. label names, label remapping, input column names used for printing).
We instead explicitly define a Dataset class that users can use or subclass for their own datasets.
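As a rough illustration, the same (input, target) tuples can be wrapped together with the extra information. The two example reviews, the column name "text", and the helper name build_dataset are made up for this sketch.

```python
# Raw examples in the old style: an iterable of (input, target) tuples.
DATA = [
    ("A moving, beautifully shot film.", 1),
    ("The plot made no sense at all.", 0),
]

# Extra information the bare tuples cannot carry.
INPUT_COLUMNS = ("text",)
LABEL_NAMES = ["negative", "positive"]


def build_dataset():
    """Wrap DATA in a textattack Dataset with column and label names.

    `build_dataset` is a hypothetical helper used only for this sketch.
    """
    # Imported lazily so the data above can be inspected without textattack.
    import textattack

    return textattack.datasets.Dataset(
        DATA,
        input_columns=INPUT_COLUMNS,
        label_names=LABEL_NAMES,
    )
```

The constructor also accepts a label_map argument for label remapping, which is left out here to keep the sketch short.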
We have added CLARE, a new attack proposed in "Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020). There is also a corresponding augmenter recipe using CLARE. Thanks to @Hanyu-Liu-123 and @cookielee77.
We have added support for custom word embeddings via AbstractWordEmbedding, WordEmbedding, and GensimWordEmbedding from textattack.shared. These three classes allow users to plug their own word embeddings into transformations and constraints that rely on word embeddings. Thanks @tsinggggg and @alexander-zap for contributing!
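A minimal sketch of plugging a custom embedding into a transformation, assuming WordEmbedding takes an embedding matrix plus word-to-index and index-to-word mappings, and that WordSwapEmbedding accepts an embedding argument. The three-word vocabulary, its vectors, and the helper name build_embedding_swap are toy values invented for illustration.

```python
# Toy vocabulary with 2-dimensional vectors; real embeddings would be
# loaded from disk and be far larger.
WORDS = ["good", "great", "bad"]
VECTORS = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0]]

# Lookup tables between words and their row index in VECTORS.
WORD2INDEX = {word: index for index, word in enumerate(WORDS)}
INDEX2WORD = {index: word for index, word in enumerate(WORDS)}


def build_embedding_swap(max_candidates=2):
    """Build a word-swap transformation backed by the toy embedding.

    `build_embedding_swap` is a hypothetical helper used only for this sketch.
    max_candidates is kept below the vocabulary size because the toy
    vocabulary has only three words.
    """
    # Imported lazily so the toy data can be inspected without textattack.
    import numpy as np
    from textattack.shared import WordEmbedding
    from textattack.transformations import WordSwapEmbedding

    embedding = WordEmbedding(np.array(VECTORS), WORD2INDEX, INDEX2WORD)
    return WordSwapEmbedding(max_candidates=max_candidates, embedding=embedding)
```

The returned transformation could then be used when composing a custom attack, in place of the default embedding.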
ModelWrapper to not require get_grad method to be defined. (#381)
WordSwapMaskedLM that was causing words with lowest probability to be picked first. (#396)