TensorFlow Datasets Versions

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

v4.9.4

4 months ago

Added

  • A new CroissantBuilder which initializes a DatasetBuilder based on a Croissant metadata file.
  • New conversion options between different bounding boxes formats.
  • Better support for HuggingfaceDatasetBuilder.
  • A script to convert a dataset from one format to another.

Changed

Deprecated

  • Python 3.9 support. TFDS now uses Python 3.10.

Removed

Fixed

Security

v4.9.3

8 months ago

Added

Changed

  • Hugging Face datasets accept None values for any feature. TFDS has no tfds.features.Optional, so None values are converted to default values. Those defaults used to be 0 and 0.0 for int and float. Now the default is the dtype's minimum value as defined by NumPy (e.g., np.iinfo(np.int32).min or np.finfo(np.float32).min), which acts as a "-inf" sentinel. This avoids ambiguity when 0 or 0.0 are legitimate values in the dataset. The roadmap is to implement tfds.features.Optional.
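
As a minimal sketch of what those sentinel defaults look like, the snippet below computes the lowest representable value for a NumPy dtype. The helper name lowest_value is hypothetical and is not part of TFDS; it only exercises the two NumPy calls quoted in the entry above.

```python
import numpy as np


def lowest_value(dtype):
    """Return the lowest representable value of a NumPy int or float dtype.

    Hypothetical helper for illustration only; not a TFDS function.
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return np.iinfo(dtype).min
    if np.issubdtype(dtype, np.floating):
        return np.finfo(dtype).min
    raise TypeError(f"No sentinel defined for dtype {dtype}")


# Sentinels that would stand in for a missing (None) value.
int_default = lowest_value(np.int32)
float_default = lowest_value(np.float32)
```

Code that needs to detect missing values could compare against these sentinels. Note that the float sentinel is the most negative finite float32, not an IEEE infinity.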

Deprecated

  • Python 3.8 support. As per NEP 29, TFDS now uses Python>=3.9.

Removed

Fixed

Security

v4.9.2

1 year ago

Added

  • [Experimental] A list of freeform text tags can now be attached to a BuilderConfig. For example:
    BUILDER_CONFIGS = [
        tfds.core.BuilderConfig(name="foo", tags=["foo", "live"]),
        tfds.core.BuilderConfig(name="bar", tags=["bar", "old"]),
    ]
    
    The tags are recorded with the dataset metadata and can later be retrieved using the info object:
    builder.info.config_tags  # ["foo", "live"]
    
    This feature is experimental and there are no guidelines yet on the format of tags.

Changed

Deprecated

Removed

Fixed

  • Fixed generated proto files (see issue 4858).

Security

v4.9.1

1 year ago

Added

Changed

Deprecated

Removed

Fixed

  • The installation on macOS now works (see issues 4805 and 4852). The ArrayRecord dependency is lazily loaded, so the TensorFlow-less path is not possible at the moment on macOS. A fix for this will follow soon.

Security

v4.9.0

1 year ago

Added

Changed

  • Support for tensorflow=2.12.

Deprecated

Removed

Fixed

Security

v4.8.3

1 year ago

Added

Changed

Deprecated

  • Python 3.7 support: this version and future versions use Python 3.8.

Removed

Fixed

  • The ignore_verifications flag of Hugging Face's datasets.load_dataset is deprecated and used to cause errors in tfds.load("huggingface:foo").

Security

v4.8.2

1 year ago

Deprecated

  • Python 3.7 support: this is the last version of TFDS supporting Python 3.7. Future versions will use Python 3.8.

Fixed

  • tfds new and tfds build better support the new recommended dataset organization, in which each dataset has its own package under datasets/, and the builder class is named Builder and defined in the module ${dsname}_dataset_builder.py.

Security

v4.8.1

1 year ago

Changed

  • Added the file valid_tags.txt so that builds do not break.
  • TFDS no longer relies on TensorFlow DTypes. We chose NumPy DTypes to keep the typing expressiveness while dropping the heavy dependency on TensorFlow. We migrated all our internal datasets. Please migrate accordingly:
    • tf.bool: np.bool_
    • tf.string: np.str_
    • tf.int64, tf.int32, etc.: np.int64, np.int32, etc.
    • tf.float64, tf.float32, etc.: np.float64, np.float32, etc.
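
The mapping above can be written down as a small lookup table, which may help when migrating feature definitions in bulk. This is only a sketch: the dictionary TF_TO_NP and the helper to_numpy_dtype are hypothetical names, and the TensorFlow dtypes are referred to by name as strings so that the snippet does not need TensorFlow installed.

```python
import numpy as np

# TensorFlow dtype names (as strings) mapped to the NumPy dtypes listed above.
# Only the dtypes spelled out in the release note are included.
TF_TO_NP = {
    "tf.bool": np.bool_,
    "tf.string": np.str_,
    "tf.int32": np.int32,
    "tf.int64": np.int64,
    "tf.float32": np.float32,
    "tf.float64": np.float64,
}


def to_numpy_dtype(tf_dtype_name):
    """Look up the NumPy replacement for a TensorFlow dtype name.

    Hypothetical helper for illustration only; not a TFDS function.
    """
    try:
        return TF_TO_NP[tf_dtype_name]
    except KeyError:
        raise KeyError(
            f"No NumPy equivalent recorded for {tf_dtype_name!r}"
        ) from None
```

For example, to_numpy_dtype("tf.string") returns np.str_, which is the value to use wherever a feature definition previously took tf.string.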

v4.8.0

1 year ago

Added

  • [API] DatasetBuilder's description and citations can be specified in dedicated README.md and CITATIONS.bib files, within the dataset package (see https://www.tensorflow.org/datasets/add_dataset).
  • Tags can be associated with datasets in the TAGS.txt file. For now, they are only used in the generated documentation.
  • [API][Experimental] New ViewBuilder to define datasets as transformations of existing datasets. Also adds tfds.transform with functionality to apply transformations.
  • Loggers are also called on tfds.as_numpy(...); the base Logger class has a new corresponding method.
  • tfds.core.DatasetBuilder can have a default limit for the number of simultaneous downloads. tfds.download.DownloadConfig can override it.
  • tfds.features.Audio supports storing raw audio data for lazy decoding.
  • The number of shards can be overridden when preparing a dataset: builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42)). Alternatively, you can configure the minimum and maximum shard size to let TFDS compute the number of shards for you while you keep control over the shard sizes.

Changed

Deprecated

Removed

Fixed

Security

v4.7.0

1 year ago

Added

  • [API] Added TfDataBuilder that is handy for storing experimental ad hoc TFDS datasets in notebook-like environments such that they can be versioned, described, and easily shared with teammates.
  • [API] Added options to create format-specific dataset builders. The new API now includes a number of NLP-specific builders, such as:
  • [API] Added tfds.beam.inc_counter to reduce beam.metrics.Metrics.counter boilerplate.
  • [API] Added options to group together existing TFDS datasets into dataset collections and to perform simple operations over them.
  • [Documentation] Updates, specifically:
    • New guide on format-specific dataset builders;
    • New guide on adding new dataset collections to TFDS;
    • Updated TFDS CLI documentation.
  • [TFDS CLI] Supports custom config through JSON (e.g. tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}').
  • New datasets:
  • Updated datasets:
    • C4 was updated to version 3.1.
    • common_voice was updated to a more recent snapshot.
    • wikipedia was updated with the 20220620 snapshot.
  • New dataset collections, such as xtreme and LongT5.
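
Regarding the JSON-based custom config for the TFDS CLI listed above: hand-quoting JSON inside a shell command is easy to get wrong, so one option is to build the argument programmatically. The sketch below assumes only the command shape shown in that entry (tfds build my_dataset --config=...); the helper name build_command is hypothetical.

```python
import json
import shlex


def build_command(dataset, config):
    """Build the argument list for a `tfds build` call with a JSON config.

    Hypothetical helper for illustration only; it does not run the CLI.
    """
    # json.dumps guarantees valid JSON (double quotes, escaped characters).
    config_json = json.dumps(config)
    return ["tfds", "build", dataset, f"--config={config_json}"]


command = build_command(
    "my_dataset",
    {"name": "my_custom_config", "description": "Abc"},
)

# A copy-pasteable, correctly quoted shell line for the same command.
shell_line = shlex.join(command)
```

Passing the command list directly to subprocess.run avoids shell quoting altogether; shell_line is only needed when the command has to be pasted into a terminal or a script.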

Changed

  • The base Logger class expects more information to be passed to the as_dataset method. This should only be relevant to people who have implemented and registered custom Logger class(es).
  • You can set DEFAULT_BUILDER_CONFIG_NAME in a DatasetBuilder to change the default config if it shouldn't be the first builder config defined in BUILDER_CONFIGS.
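
The selection rule described in the last entry can be sketched in plain Python. This is an illustration of the documented behavior only, not the TFDS implementation: the Config class and the default_config function are hypothetical stand-ins for tfds.core.BuilderConfig and the builder's internal lookup.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for a builder config; only the name matters here."""
    name: str


def default_config(configs, default_name=None):
    """Pick the default config from a list of configs.

    Without default_name, the first config wins; with it, the config of
    that name wins. Illustration only; not the TFDS code.
    """
    if not configs:
        return None
    if default_name is None:
        return configs[0]
    for config in configs:
        if config.name == default_name:
            return config
    raise ValueError(f"No config named {default_name!r}")


configs = [Config("foo"), Config("bar")]
first = default_config(configs)             # falls back to the first config
chosen = default_config(configs, "bar")     # explicit default name
```

In a real builder, the equivalent of default_name would be the DEFAULT_BUILDER_CONFIG_NAME class attribute set next to BUILDER_CONFIGS.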

Deprecated

Removed

Fixed

  • Various datasets
  • On Linux, when loading a dataset from a directory that is not your home (~) directory, a new ~ directory is no longer created in the current directory (fixes #4117).
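
One common way a stray ~ directory arises is that the shell expands the tilde but Python's path functions do not, so an unexpanded path is treated as relative to the current directory. Whether this matches the actual cause of #4117 is an assumption; the sketch below only illustrates the general pitfall with the standard library, and the path ~/tensorflow_datasets is used purely as an example.

```python
import os

raw_path = "~/tensorflow_datasets"

# expanduser replaces a leading "~" with the user's home directory
# (when one can be determined from the environment).
expanded = os.path.expanduser(raw_path)

# abspath does NOT expand "~": the result is <cwd>/~/tensorflow_datasets,
# so creating it would make a directory literally named "~".
unexpanded_abs = os.path.abspath(raw_path)
stray_dir_name = os.path.basename(os.path.dirname(unexpanded_abs))
```

Calling os.path.expanduser before any other path handling is the usual way to avoid the problem.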

Security