TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
`HuggingfaceDatasetBuilder`.

`None` values for any features. TFDS has no `tfds.features.Optional`, so `None` values are converted to default values. Those default values used to be `0` and `0.0` for int and float. They are now the lowest value representable by the dtype, as defined by NumPy (e.g., `np.iinfo(np.int32).min` or `np.finfo(np.float32).min`), which serves as a stand-in for `-inf`. This avoids ambiguity when `0` and `0.0` are legitimate values in the dataset. The roadmap is to implement `tfds.features.Optional`.
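To make the placeholder values concrete, here is a minimal sketch that uses only NumPy. The helper `default_for` and the sample column are hypothetical and exist purely for illustration; they are not part of the TFDS API, and the way TFDS applies these defaults internally is not shown here.

```python
import numpy as np

# Lowest representable values, used as placeholders for a missing (None) entry.
INT32_MISSING = np.iinfo(np.int32).min      # most negative 32-bit integer
FLOAT32_MISSING = np.finfo(np.float32).min  # most negative finite float32


def default_for(dtype):
    """Hypothetical helper: return the 'missing value' placeholder for a dtype."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return np.iinfo(dtype).min
    if np.issubdtype(dtype, np.floating):
        return np.finfo(dtype).min
    raise TypeError(f"No placeholder defined for {dtype}")


# A column in which 0 is a legitimate value and None marks a missing entry.
raw = [0, 7, None, 3]
encoded = np.array(
    [default_for(np.int32) if value is None else value for value in raw],
    dtype=np.int32,
)

# The real 0 stays distinguishable from the missing entry.
missing_mask = encoded == default_for(np.int32)
```

Note that these placeholders are finite numbers rather than a true IEEE infinity, so a dataset that genuinely contains the minimum value of its dtype would still be ambiguous.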
`BuilderConfig`. For example:

    BUILDER_CONFIGS = [
        tfds.core.BuilderConfig(name="foo", tags=["foo", "live"]),
        tfds.core.BuilderConfig(name="bar", tags=["bar", "old"]),
    ]

The tags are recorded with the dataset metadata and can later be retrieved using the info object:

    builder.info.config_tags  # ["foo", "live"]

This feature is experimental and there are no guidelines on the tag format.

`tensorflow=2.12`.

`tfds new` and `tfds build` better support the new recommended dataset organization, where each dataset has its own package under `datasets/`, and the builder class is called `Builder` and is defined in the module `${dsname}_dataset_builder.py`.

`valid_tags.txt` to not break builds.

`tf.bool`: `np.bool_`
`tf.string`: `np.str_`
`tf.int64`, `tf.int32`, etc.: `np.int64`, `np.int32`, etc.
`tf.float64`, `tf.float32`, etc.: `np.float64`, `np.float32`, etc.
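The correspondence between TensorFlow and NumPy dtypes listed here can be written down as a small lookup table. The dictionary `TF_TO_NP`, the helper `to_numpy_dtype`, and the use of string keys are assumptions made so that the sketch runs without TensorFlow installed; TFDS does not necessarily expose such a table under these names.

```python
import numpy as np

# Illustrative lookup table: TensorFlow dtype name -> NumPy dtype.
# String keys are used so that the sketch does not require TensorFlow.
TF_TO_NP = {
    "tf.bool": np.bool_,
    "tf.string": np.str_,
    "tf.int64": np.int64,
    "tf.int32": np.int32,
    "tf.float64": np.float64,
    "tf.float32": np.float32,
}


def to_numpy_dtype(tf_name):
    """Hypothetical helper: look up the NumPy dtype for a TensorFlow dtype name."""
    try:
        return TF_TO_NP[tf_name]
    except KeyError:
        raise ValueError(f"Unknown dtype name: {tf_name!r}") from None


# Example: allocate an array with the NumPy counterpart of tf.int64.
ids = np.zeros(3, dtype=to_numpy_dtype("tf.int64"))
```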

`DatasetBuilder`'s description and citations can be specified in dedicated `README.md` and `CITATIONS.bib` files within the dataset package (see https://www.tensorflow.org/datasets/add_dataset).

`TAGS.txt` file. For now, they are only used in the generated documentation.

`ViewBuilder` to define datasets as transformations of existing datasets. Also adds `tfds.transform` with functionality to apply transformations.

`tfds.as_numpy(...)`, base `Logger` class has a new corresponding method.

`tfds.core.DatasetBuilder` can have a default limit for the number of simultaneous downloads. `tfds.download.DownloadConfig` can override it.

`tfds.features.Audio` supports storing raw audio data for lazy decoding.

`builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42))`.
Alternatively, you can configure the minimum and maximum shard size if you want TFDS to compute the number of shards for you but still want control over the shard sizes.

`tfds.beam.inc_counter` to reduce `beam.metrics.Metrics.counter` boilerplate.

(`tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}'`)
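The value passed to `--config` in the command above is a JSON object wrapped in single quotes so that the shell treats it as one argument. The following sketch, which uses only the Python standard library, shows how such a command line splits and how the JSON payload decodes. It illustrates the quoting only; how TFDS itself turns the payload into a builder config is not shown and should not be inferred from this snippet.

```python
import json
import shlex

command = (
    "tfds build my_dataset "
    """--config='{"name": "my_custom_config", "description": "Abc"}'"""
)

# Split the command the way a POSIX shell would. The single quotes keep the
# JSON object together as part of a single argument.
args = shlex.split(command)

# Extract the text after "--config=" and decode it as JSON.
prefix = "--config="
config_arg = next(arg for arg in args if arg.startswith(prefix))
config = json.loads(config_arg[len(prefix):])

print(config["name"])
```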

`20220620` snapshot.

`Logger` class expects more information to be passed to the `as_dataset` method. This should only be relevant to people who have implemented and registered custom `Logger` class(es).

`DEFAULT_BUILDER_CONFIG_NAME` in a `DatasetBuilder` to change the default config if it shouldn't be the first builder config defined in `BUILDER_CONFIGS`.
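The selection rule described here (the first entry of `BUILDER_CONFIGS` is the default unless `DEFAULT_BUILDER_CONFIG_NAME` names another one) can be sketched in plain Python. The classes `Config`, `ToyBuilder`, and `LargeByDefault` and the method `default_config` are hypothetical stand-ins written for this illustration; they are not TFDS classes and do not reproduce the library's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    """Hypothetical stand-in for a builder config; only the name matters here."""
    name: str


class ToyBuilder:
    """Illustrative class, not tfds.core.DatasetBuilder."""

    BUILDER_CONFIGS = [Config("small"), Config("large")]
    DEFAULT_BUILDER_CONFIG_NAME: Optional[str] = None

    @classmethod
    def default_config(cls):
        # Without an explicit default name, fall back to the first config.
        if cls.DEFAULT_BUILDER_CONFIG_NAME is None:
            return cls.BUILDER_CONFIGS[0]
        # Otherwise pick the config whose name matches the declared default.
        for config in cls.BUILDER_CONFIGS:
            if config.name == cls.DEFAULT_BUILDER_CONFIG_NAME:
                return config
        raise ValueError(
            f"Unknown default config: {cls.DEFAULT_BUILDER_CONFIG_NAME!r}"
        )


class LargeByDefault(ToyBuilder):
    # Overrides the default without reordering BUILDER_CONFIGS.
    DEFAULT_BUILDER_CONFIG_NAME = "large"
```

The point of the design is that the order of `BUILDER_CONFIGS` can stay stable (for example, for documentation) while the default is chosen independently.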

(`~`) directory, a new `~` directory is not created in the current directory (fixes #4117).
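As general background for this fix, the sketch below shows how a leading `~` behaves with and without expansion, using only the Python standard library. It is not the TFDS code, and the directory name `tensorflow_datasets` is just an example path chosen for illustration.

```python
import os

raw = "~/tensorflow_datasets"

# Without expansion, "~" is an ordinary relative path component, so the path
# resolves to a directory literally named "~" under the current directory.
unexpanded = os.path.abspath(raw)

# With expansion, "~" is replaced by the user's home directory.
expanded = os.path.expanduser(raw)

print(unexpanded)
print(expanded)
```

Creating directories from the unexpanded form is what leaves a stray `~` folder in the working directory, so paths supplied by users are usually passed through `os.path.expanduser` before use.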