Basic Utilities for PyTorch Natural Language Processing (NLP)
`torchnlp.random` for finer-grained control of random state, building on PyTorch's `fork_rng`. This module controls the random state of `torch`, `numpy` and `random`.

```python
import random

import numpy
import torch

from torchnlp.random import fork_rng

with fork_rng(seed=123):  # Ensure determinism
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
```
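Conceptually, forking a random state means saving the generator state, reseeding for the block, and restoring the saved state on exit. Here is a minimal pure-Python sketch of that idea; `simple_fork_rng` is a hypothetical helper for illustration, not the torchnlp API (which also handles `numpy` and `torch` state):

```python
import random
from contextlib import contextmanager


@contextmanager
def simple_fork_rng(seed):
    # Save the current generator state, reseed for determinism inside
    # the block, then restore the saved state on exit.
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)


with simple_fork_rng(seed=123):
    a = random.randint(1, 2**31)
with simple_fork_rng(seed=123):
    b = random.randint(1, 2**31)
assert a == b  # The same seed inside the fork yields the same draw
```

Because the outer state is restored on exit, code after the `with` block is unaffected by what happened inside it.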
`torchnlp.samplers` updated to enable pipelining. For example:

```python
from torchnlp.samplers import DeterministicSampler
from torchnlp.samplers import BalancedSampler

data = ['a', 'b', 'c'] + ['c'] * 100
sampler = BalancedSampler(data, num_samples=3)
sampler = DeterministicSampler(sampler, random_seed=12)
print([data[i] for i in sampler])  # ['c', 'b', 'a']
```
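Pipelining works because each sampler wraps another sampler and re-yields its indices. A toy sketch of the wrapper pattern using only the standard library; `ShuffledSampler` is a hypothetical stand-in, not a torchnlp class:

```python
import random


class ShuffledSampler:
    """Hypothetical wrapper sampler: consumes another sampler's indices
    and re-yields them shuffled with a fixed seed."""

    def __init__(self, sampler, random_seed):
        self.sampler = sampler
        self.random_seed = random_seed

    def __iter__(self):
        indices = list(self.sampler)
        # A fresh Random(seed) per iteration keeps every epoch identical.
        random.Random(self.random_seed).shuffle(indices)
        return iter(indices)

    def __len__(self):
        return len(self.sampler)


sampler = ShuffledSampler(range(5), random_seed=12)
print(list(sampler))
```

Any object that iterates over indices can be wrapped this way, which is what makes samplers composable into a pipeline.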
- `torchnlp.samplers.balanced_sampler` for balanced sampling extending PyTorch's `WeightedRandomSampler`.
- `torchnlp.samplers.deterministic_sampler` for deterministic sampling based on `torchnlp.random`.
- `torchnlp.samplers.distributed_batch_sampler` for distributed batch sampling.
- `torchnlp.samplers.oom_batch_sampler` to sample large batches first in order to force an out-of-memory error.
- `torchnlp.utils.lengths_to_mask` to help create masks from a batch of sequences.
- `torchnlp.utils.get_total_parameters` to measure the number of parameters in a model.
- `torchnlp.utils.get_tensors` to measure the size of an object in number of tensor elements. This is useful for dynamic batch sizing and for `torchnlp.samplers.oom_batch_sampler`. For example:

```python
import torch

from torchnlp.utils import get_tensors

random_object_ = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
tensors = get_tensors(random_object_)
assert len(tensors) == 2
```
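To illustrate what `lengths_to_mask` computes, here is a pure-Python sketch of the same idea; the real utility returns a `torch` tensor, while this toy version uses nested lists:

```python
def lengths_to_mask(lengths, max_len=None):
    # Pure-Python sketch: mask[i][j] is True exactly when j < lengths[i],
    # i.e. position j of sequence i holds real data rather than padding.
    max_len = max_len or max(lengths)
    return [[j < n for j in range(max_len)] for n in lengths]


mask = lengths_to_mask([1, 3, 2])
# [[True, False, False], [True, True, True], [True, True, False]]
```

Such masks let downstream code (attention, losses) ignore padded positions in a batch of variable-length sequences.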
- Fixed the `snli` example (https://github.com/PetrochukM/PyTorch-NLP/pull/84).
- Updated `.gitignore` to support Python's virtual environments (https://github.com/PetrochukM/PyTorch-NLP/pull/84).
- Removed the `requests` and `pandas` dependencies. There are only two dependencies remaining. This is useful for production environments. (https://github.com/PetrochukM/PyTorch-NLP/pull/84)
- Added `LazyLoader` to reduce dependency requirements. (https://github.com/PetrochukM/PyTorch-NLP/commit/4e84780a8a741d6a90f2752edc4502ab2cf89ecb)
- Removed the `torchnlp.datasets.Dataset` class in favor of basic Python dictionary lists and `pandas`. (https://github.com/PetrochukM/PyTorch-NLP/pull/84)
- Added support for downloading `tar.gz` files and unpacking them faster. (https://github.com/PetrochukM/PyTorch-NLP/commit/eb61fee854576c8a57fd9a20ee03b6fcb89c493a)
- Renamed `itos` and `stoi` to `index_to_token` and `token_to_index` respectively. (https://github.com/PetrochukM/PyTorch-NLP/pull/84)
- Added `batch_encode`, `batch_decode`, and `enforce_reversible` for `torchnlp.encoders.text` (https://github.com/PetrochukM/PyTorch-NLP/pull/69).
- Fixed `FastText` vector downloads (https://github.com/PetrochukM/PyTorch-NLP/pull/72).
- `LockedDropout` (https://github.com/PetrochukM/PyTorch-NLP/pull/73).
- `weight_drop` (https://github.com/PetrochukM/PyTorch-NLP/pull/76).
- `stack_and_pad_tensors` now returns a named tuple for readability (https://github.com/PetrochukM/PyTorch-NLP/pull/84).
- Added `torchnlp.utils.split_list` in favor of `torchnlp.utils.resplit_datasets`. This is enabled by the modularity of `torchnlp.random`. (https://github.com/PetrochukM/PyTorch-NLP/pull/84)
- Removed `torchnlp.utils.datasets_iterator` in favor of Python's `itertools.chain`. (https://github.com/PetrochukM/PyTorch-NLP/pull/84)
- Removed `torchnlp.utils.shuffle` in favor of `torchnlp.random`. (https://github.com/PetrochukM/PyTorch-NLP/pull/84)
- Added `torchnlp.samplers.repeat_sampler` following up on this issue: https://github.com/pytorch/pytorch/issues/15849
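The pattern behind a repeat sampler, as discussed in the linked issue, is a sampler that never stops iterating, so `DataLoader` workers do not have to be re-created at every epoch. A minimal sketch; the class below is illustrative, not the torchnlp implementation:

```python
import itertools


class RepeatSampler:
    """Hypothetical sketch: repeat an underlying sampler forever
    (cf. https://github.com/pytorch/pytorch/issues/15849)."""

    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        while True:
            # Re-iterate the wrapped sampler each "epoch" without ever
            # exhausting the outer iterator.
            yield from iter(self.sampler)


repeater = RepeatSampler(range(3))
first_seven = list(itertools.islice(iter(repeater), 7))  # [0, 1, 2, 0, 1, 2, 0]
```

The consuming loop then slices off one epoch's worth of indices at a time instead of restarting iteration.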
- Added `LabelEncoder`. Furthermore, added broad support for `batch_encode`, `batch_decode` and `enforce_reversible`.
- Added support for `torch.utils.data.dataloader.DataLoader` via `torchnlp.utils.collate_tensors`. For example:

```python
from functools import partial

import torch

from torchnlp.utils import collate_tensors
from torchnlp.encoders.text import stack_and_pad_tensors

collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
torch.utils.data.dataloader.DataLoader(*args, collate_fn=collate_fn, **kwargs)
```
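To show what a collate function does with a batch, here is a toy stand-in; the `collate_dicts` helper below is hypothetical, whereas the real `collate_tensors` handles nested structures of tensors:

```python
from functools import partial


def collate_dicts(batch, stack=list):
    # Hypothetical mini collate function: turn a list of dict samples
    # into a single dict whose values are stacked across the batch.
    return {key: stack([sample[key] for sample in batch]) for key in batch[0]}


collate_fn = partial(collate_dicts, stack=tuple)
batch = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]
print(collate_fn(batch))  # {'x': (1, 3), 'y': (2, 4)}
```

`functools.partial` is what lets you fix the stacking strategy up front and hand `DataLoader` a one-argument `collate_fn`, exactly as in the snippet above.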
- Fixed `pandas` and `collections` warnings.
- Added reversibility checks to an `Encoder` via `enforce_reversible`, ensuring `Encoder.decode(Encoder.encode(object)) == object`. For example:

```python
encoder = Encoder().enforce_reversible()
```
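The contract being enforced can be sketched with a toy encoder; both names below are illustrative, not the torchnlp API:

```python
class IdentityEncoder:
    """Toy encoder: encodes a string to a list of characters and back."""

    def encode(self, obj):
        return list(obj)

    def decode(self, encoded):
        return ''.join(encoded)


def check_reversible(encoder, sample):
    # Hypothetical check mirroring the contract above: decoding an
    # encoding must return the original object.
    assert encoder.decode(encoder.encode(sample)) == sample
    return encoder


encoder = check_reversible(IdentityEncoder(), 'a sentence')
```

A lossy encoder (e.g. one that lowercases or drops unknown tokens) would fail such a check, which is exactly the class of bug the reversibility guard catches.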
Minor release fixing some issues and bugs.
- `torchnlp.downloads` for `torchnlp.word_to_vector`.
- Added a `set` operation to `torchnlp.datasets.Dataset` with support for slices, columns and rows.
- Updated `biggest_batches_first` in `torchnlp.samplers` to be more efficient at approximating memory than Pickle.
- Updated `torchnlp.utils.pad_tensor` and `torchnlp.utils.pad_batch` to support N-dimensional tensors.
- `torchnlp.text_encoders`.
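A pure-Python sketch of what padding utilities like these do; the helpers below are illustrative stand-ins operating on lists rather than tensors:

```python
def pad_list(seq, length, padding_value=0):
    # Minimal stand-in for a pad utility: right-pad a sequence to the
    # requested length with a fill value.
    return list(seq) + [padding_value] * (length - len(seq))


def pad_batch(batch, padding_value=0):
    # Pad every sequence to the longest length in the batch and return
    # the original lengths alongside, so padding can be masked out later.
    max_len = max(len(seq) for seq in batch)
    padded = [pad_list(seq, max_len, padding_value) for seq in batch]
    return padded, [len(seq) for seq in batch]


padded, lengths = pad_batch([[1, 2], [3, 4, 5]])
# padded == [[1, 2, 0], [3, 4, 5]], lengths == [2, 3]
```

Returning the lengths alongside the padded batch is what makes mask utilities such as `lengths_to_mask` possible downstream.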
Added `__getitem__()` for `_PretrainedWordVectors`. For example:

```python
from torchnlp.word_to_vector import FastText

vectors = FastText()
tokenized_sentence = ['this', 'is', 'a', 'sentence']
vectors[tokenized_sentence]
```
Added `__contains__` for `_PretrainedWordVectors`. For example:

```python
>>> from torchnlp.word_to_vector import FastText
>>> vectors = FastText()
>>> 'the' in vectors
True
>>> 'theqwe' in vectors
False
```
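Both behaviors can be sketched with a toy class; this is illustrative only, not the real `_PretrainedWordVectors`:

```python
class WordVectors:
    """Toy vector store sketching the __getitem__ and __contains__
    behaviors shown above."""

    def __init__(self, table):
        self.table = table

    def __getitem__(self, tokens):
        # Indexing with a single token returns its vector; indexing with
        # a list of tokens returns a list of vectors.
        if isinstance(tokens, str):
            return self.table[tokens]
        return [self.table[token] for token in tokens]

    def __contains__(self, token):
        # Membership tests against the vocabulary.
        return token in self.table


vectors = WordVectors({'the': [0.1, 0.2], 'a': [0.3, 0.4]})
assert 'the' in vectors and 'theqwe' not in vectors
assert vectors[['the', 'a']] == [[0.1, 0.2], [0.3, 0.4]]
```

Implementing these two dunder methods is what makes `vectors[tokenized_sentence]` and `'the' in vectors` read naturally at the call site.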