Tensorforce Versions

Tensorforce: a TensorFlow library for applied reinforcement learning

0.6.5

2 years ago
Agents:
  • Renamed agent argument reward_preprocessing to reward_processing; in the case of the Tensorforce agent, it moved to reward_estimation[reward_processing]
Distributions:
  • New categorical distribution argument skip_linear to not add the implicit linear logits layer
Environments:
  • Support for multi-actor parallel environments via new function Environment.num_actors()
    • Runner uses multi-actor parallelism by default if environment is multi-actor
  • New optional Environment function episode_return() which returns the true return of the last episode, in case the cumulative sum of environment rewards is not a suitable metric for runner display
Examples:
  • New vectorized_environment.py and multiactor_environment.py scripts to illustrate how to set up a vectorized/multi-actor environment (see the sketch below).
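
As a rough illustration of the multi-actor support above, the following hedged sketch shows an Environment subclass overriding num_actors(); the batched reset()/execute() contract and the concrete shapes are assumptions here, see examples/multiactor_environment.py for the authoritative version.

```python
from tensorforce import Environment, Runner

class MultiActorEnv(Environment):
    """Toy multi-actor environment sketch (dummy dynamics)."""

    def states(self):
        return dict(type='float', shape=(4,))

    def actions(self):
        return dict(type='int', num_values=2)

    def num_actors(self):
        return 2  # signals two parallel actors to the Runner

    def reset(self):
        # Assumption: returns one state per actor
        return [[0.0, 0.0, 0.0, 0.0] for _ in range(self.num_actors())]

    def execute(self, actions):
        # Assumption: one action per actor in, batched (states, terminals, rewards) out
        states = [[0.0, 0.0, 0.0, 0.0] for _ in range(self.num_actors())]
        terminals = [False for _ in range(self.num_actors())]
        rewards = [1.0 for _ in range(self.num_actors())]
        return states, terminals, rewards

# Runner uses multi-actor parallelism by default for such an environment
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=MultiActorEnv, max_episode_timesteps=100
)
runner.run(num_episodes=100)
runner.close()
```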

0.6.4

2 years ago
Agents:
  • Agent argument update_frequency / update[frequency] now supports float values > 0.0, which specify the update-frequency relative to the batch-size
  • Changed default value for argument update_frequency from 1.0 to 0.25 for DQN, DoubleDQN, DuelingDQN agents
  • New arguments return_processing and advantage_processing (where applicable) for all agent sub-types
  • New function Agent.get_specification() which returns the agent specification as dictionary
  • New function Agent.get_architecture() which returns a string representation of the network layer architecture (see the sketch below)
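
A minimal sketch of the new introspection functions together with a relative update frequency, assuming a Gym CartPole environment is available; reading a float update_frequency as roughly one update per 0.25 * batch_size timesteps is an interpretation of the note above.

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Float update_frequency is specified relative to batch_size (see note above)
agent = Agent.create(
    agent='dqn', environment=environment,
    memory=10000, batch_size=64, update_frequency=0.25
)

print(agent.get_specification())  # agent specification as dictionary
print(agent.get_architecture())   # string summary of the network layer architecture

agent.close()
environment.close()
```
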
Modules:
  • Improved and simplified module specification, for instance: network=my_module instead of network=my_module.TestNetwork, or environment=envs.custom_env instead of environment=envs.custom_env.CustomEnvironment (module file needs to be in the same directory or a sub-directory)
Networks:
  • New argument single_output=True for some policy types which, if False, allows the specification of additional network outputs for some/all actions via registered tensors
  • KerasNetwork argument model now supports arbitrary functions as long as they return a tf.keras.Model
Layers:
  • New layer type SelfAttention (specification key: self_attention)
Parameters:
  • Support tracking of non-constant parameter values
Runner:
  • Renamed attribute episode_rewards to episode_returns, and TQDM status reward to return
  • Extended argument agent to support Agent.load() keyword arguments, to load an existing agent instead of creating a new one.
Examples:
  • Added action_masking.py example script to illustrate an environment implementation with built-in action masking.
Bugfixes:
  • Customized device placement was not applied to most tensors

0.6.3

3 years ago
Agents:
  • New agent argument tracking and corresponding function tracked_tensors() to track and retrieve the current value of predefined tensors, similar to summarizer for TensorBoard summaries (see the sketch below)
  • New experimental values trace_decay and gae_decay for Tensorforce agent argument reward_estimation, soon for other agent types as well
  • New options "early" and "late" for value estimate_advantage of Tensorforce agent argument reward_estimation
  • Changed default value for Agent.act() argument deterministic from False to True
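
A minimal sketch of the new tracking argument and tracked_tensors() function; that tracking='all' enables all predefined tensors and that tracked_tensors() returns a name-to-value dictionary are assumptions here.

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(agent='ppo', environment=environment, batch_size=10, tracking='all')

# One act-observe interaction so that tracked values are populated
states = environment.reset()
actions = agent.act(states=states)
states, terminal, reward = environment.execute(actions=actions)
agent.observe(terminal=terminal, reward=reward)

print(agent.tracked_tensors())  # assumed: dict of currently tracked tensor values
```
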
Networks:
  • New network type KerasNetwork (specification key: keras) as wrapper for networks specified as Keras model
  • A Keras model class/object passed as policy/network argument is automatically interpreted as a KerasNetwork (see the sketch below)
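
A hedged sketch of the KerasNetwork wrapper, assuming a Keras model class can be passed either directly or via the explicit specification with key 'keras' and argument 'model':

```python
import tensorflow as tf
from tensorforce import Agent, Environment

class MyKerasNet(tf.keras.Model):
    """Small example Keras model used as policy network."""

    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(32, activation='relu')
        self.dense2 = tf.keras.layers.Dense(32, activation='relu')

    def call(self, inputs):
        return self.dense2(self.dense1(inputs))

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network=dict(type='keras', model=MyKerasNet)  # or simply network=MyKerasNet
)
```
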
Distributions:
  • Changed Gaussian distribution argument global_stddev=False to stddev_mode='predicted'
  • New Categorical distribution argument temperature_mode=None
Layers:
  • New option for Function layer argument function to pass string function expression with argument "x", e.g. "(x+1.0)/2.0"
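
A short sketch of the string-expression option for the function layer, where "x" refers to the layer input; embedding it in a PPO network specification is just one assumed usage:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Network with a dense layer followed by a string-expression function layer
network = [
    dict(type='dense', size=32, activation='tanh'),
    dict(type='function', function='(x + 1.0) / 2.0')
]
agent = Agent.create(agent='ppo', environment=environment, batch_size=10, network=network)
```
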
Summarizer:
  • New summary episode-length recorded as part of summary label "reward"
Environments:
  • Support for vectorized parallel environments via new function Environment.is_vectorizable() and new argument num_parallel for Environment.reset()
    • See tensorforce/environments/cartpole.py for a vectorizable environment example
    • Runner uses vectorized parallelism by default if num_parallel > 1, remote=None and environment supports vectorization
    • See examples/act_observe_vectorized.py for more details on act-observe interaction
  • New extended and vectorizable custom CartPole environment via key custom_cartpole (work in progress)
  • New environment argument reward_shaping to provide a simple way to modify/shape the rewards of an environment; can be specified either as a callable or as a string function expression (see the sketch below)
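
A hedged sketch of the reward_shaping argument in its string-expression form; the variable names available inside the expression (here 'reward' and 'terminal') are assumptions.

```python
from tensorforce import Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500,
    reward_shaping='-1.0 if terminal else reward'  # assumed expression variables
)
```
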
run.py script:
  • New option for command line arguments --checkpoints and --summaries to add comma-separated checkpoint/summary filename in addition to directory
  • Added episode lengths to logging plot besides episode returns
Bugfixes:
  • Temporal horizon handling of RNN layers
  • Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
  • GPU problems with scatter operations

0.6.2

3 years ago
Bugfixes:
  • Critical bugfix for DQN variants and DPG agent

0.6.1

3 years ago
Agents:
  • Removed default value "adam" for Tensorforce agent argument optimizer (since default optimizer argument learning_rate removed, see below)
  • Removed option "minimum" for Tensorforce agent argument memory, use None instead
  • Changed default value for dqn/double_dqn/dueling_dqn agent argument huber_loss from 0.0 to None
Layers:
  • Removed default value 0.999 for exponential_normalization layer argument decay
  • Added new layer batch_normalization (generally should only be used for the agent arguments reward_processing[return_processing] and reward_processing[advantage_processing])
  • Added exponential/instance_normalization layer argument only_mean with default False
  • Added exponential/instance_normalization layer argument min_variance with default 1e-4
Optimizers:
  • Removed default value 1e-3 for optimizer argument learning_rate
  • Changed default value for optimizer argument gradient_norm_clipping from 1.0 to None (no gradient clipping)
  • Added new optimizer doublecheck_step and corresponding argument doublecheck_update for optimizer wrapper
  • Removed linesearch_step optimizer argument accept_ratio
  • Removed natural_gradient optimizer argument return_improvement_estimate
Saver:
  • Added option to specify agent argument saver as string, which is interpreted as saver[directory] with otherwise default values
  • Added default value for agent argument saver[frequency] as 10 (save model every 10 updates by default)
  • Changed default value of agent argument saver[max_checkpoints] from 5 to 10
Summarizer:
  • Added option to specify agent argument summarizer as string, which is interpreted as summarizer[directory] with otherwise default values
  • Renamed option of agent argument summarizer from summarizer[labels] to summarizer[summaries] (the term "label" stems from an earlier version and is by now outdated and confusing)
  • Changed interpretation of agent argument summarizer[summaries] = "all" to include only numerical summaries, so all summaries except "graph"
  • Changed default value of agent argument summarizer[summaries] from ["graph"] to "all"
  • Changed default value of agent argument summarizer[max_summaries] from 5 to 7 (number of different colors in TensorBoard)
  • Added option summarizer[filename] to agent argument summarizer
Recorder:
  • Added option to specify agent argument recorder as string, which is interpreted as recorder[directory] with otherwise default values (see the combined sketch below)
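
A combined sketch of the new string shorthand for saver, summarizer and recorder, each interpreted as the respective directory with otherwise default values:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver='checkpoints',     # same as saver=dict(directory='checkpoints')
    summarizer='summaries',  # same as summarizer=dict(directory='summaries')
    recorder='recordings'    # same as recorder=dict(directory='recordings')
)
```
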
run.py script:
  • Added --checkpoints/--summaries/--recordings command line argument to enable saver/summarizer/recorder agent argument specification separate from core agent configuration
Examples:
  • Added save_load_agent.py example script to illustrate regular agent saving and loading
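
Along the lines of save_load_agent.py, a minimal save-and-load round trip (default save format assumed):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

agent.save(directory='saved-agent')  # default format assumed
agent.close()

agent = Agent.load(directory='saved-agent', environment=environment)
```
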
Bugfixes:
  • Fixed problem with optimizer argument gradient_norm_clipping not being applied correctly
  • Fixed problem with exponential_normalization layer not updating moving mean and variance correctly
  • Fixed problem with recent memory for timestep-based updates sometimes sampling invalid memory indices

0.6.0

3 years ago
  • Removed agent arguments execution, buffer_observe, seed
  • Renamed agent arguments baseline_policy/baseline_network/critic_network to baseline/critic
  • Renamed agent reward_estimation arguments estimate_horizon to predict_horizon_values, estimate_actions to predict_action_values, estimate_terminal to predict_terminal_values
  • Renamed agent argument preprocessing to state_preprocessing
  • Default agent preprocessing linear_normalization
  • Moved agent arguments for reward/return/advantage processing from preprocessing to reward_preprocessing and reward_estimation[return_/advantage_processing]
  • New agent argument config with values buffer_observe, enable_int_action_masking, seed (see the sketch after this list)
  • Renamed PPO/TRPO/DPG argument critic_network/_optimizer to baseline/baseline_optimizer
  • Renamed PPO argument optimization_steps to multi_step
  • New TRPO argument subsampling_fraction
  • Changed agent argument use_beta_distribution default to False
  • Added double DQN agent (double_dqn)
  • Removed Agent.act() argument evaluation
  • Removed agent function arguments query (functionality removed)
  • Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): save/load functions and saver argument changed
  • Default behavior when specifying saver is to not load the agent, unless the agent is created via Agent.load
  • Agent summarizer functionality changed: summarizer argument changed, some summary labels and other options removed
  • Renamed RNN layers internal_{rnn/lstm/gru} to rnn/lstm/gru and rnn/lstm/gru to input_{rnn/lstm/gru}
  • Renamed auto network argument internal_rnn to rnn
  • Renamed (internal_)rnn/lstm/gru layer argument length to horizon
  • Renamed update_modifier_wrapper to optimizer_wrapper
  • Renamed optimizing_step to linesearch_step, and UpdateModifierWrapper argument optimizing_iterations to linesearch_iterations
  • Optimizer subsampling_step accepts both absolute (int) and relative (float) fractions
  • Objective policy_gradient argument ratio_based renamed to importance_sampling
  • Added objectives state_value and action_value
  • Added Gaussian distribution arguments global_stddev and bounded_transform (for improved bounded action space handling)
  • Changed default memory device argument to CPU:0
  • Renamed rewards summaries
  • Agent.create() accepts act-function as agent argument for recording
  • Singleton states and actions are now consistently handled as singletons
  • Major change to policy handling and defaults, in particular parametrized_distributions, new default policies parametrized_state/action_value
  • Combined long and int type
  • Always wrap environment in EnvironmentWrapper class
  • Changed tune.py arguments
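
A hedged sketch of a Tensorforce-agent specification using the renamed/new arguments above (policy, memory, update, optimizer, objective, reward_estimation, config); the concrete values are placeholders, not recommendations.

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='tensorforce', environment=environment,
    policy=dict(network='auto'),
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=1e-3),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20, discount=0.99),
    config=dict(seed=0)
)
```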

0.5.5

3 years ago
  • Changed independent mode of agent.act to use final values of dynamic hyperparameters and avoid TensorFlow conditions
  • Extended "tensorflow" format of agent.save to include an optimized Protobuf model with an act-only graph as .pb file, and Agent.load format "pb-actonly" to load act-only agent based on Protobuf model
  • Support for custom summaries via new summarizer argument value custom to specify summary type, and Agent.summarize(...) to record summary values
  • Added min/max-bounds for dynamic hyperparameters to assert a valid range and infer other arguments
  • Argument batch_size now mandatory for all agent classes
  • Removed Estimator argument capacity, now always automatically inferred
  • Internal changes related to agent arguments memory, update and reward_estimation
  • Changed the default bias and activation argument of some layers
  • Fixed issues with sequence preprocessor
  • DQN and dueling DQN properly constrained to int actions only
  • Added use_beta_distribution argument with default True to many agents and the ParametrizedDistributions policy, so that the default can be changed

0.5.4

4 years ago
  • DQN/DuelingDQN/DPG argument memory now required to be specified explicitly, plus update_frequency default changed
  • Removed (temporarily) conv1d/conv2d_transpose layers due to TensorFlow gradient problems
  • Agent, Environment and Runner can now be imported via from tensorforce import ...
  • New generic reshape layer available as reshape
  • Support for batched version of Agent.act and Agent.observe
  • Support for parallelized remote environments based on Python's multiprocessing and socket (replacing tensorforce/contrib/socket_remote_env/ and tensorforce/environments/environment_process_wrapper.py), available via Environment.create(...), Runner(...) and run.py
  • Removed ParallelRunner and merged functionality with Runner
  • Changed run.py arguments
  • Changed independent mode for Agent.act: additional argument internals and corresponding return value; initial internals via Agent.initial_internals(); Agent.reset() no longer required (see the sketch after this list)
  • Removed deterministic argument for Agent.act except in independent mode
  • Added format argument to save/load/restore with supported formats tensorflow, numpy and hdf5
  • Changed save argument append_timestep to append with default None (instead of 'timesteps')
  • Added get_variable and assign_variable agent functions
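
A minimal sketch of the changed independent mode with explicit internals, assuming Agent.act(..., independent=True) returns both actions and updated internals as described above:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

internals = agent.initial_internals()
states = environment.reset()
terminal = False
while not terminal:
    actions, internals = agent.act(states=states, internals=internals, independent=True)
    states, terminal, reward = environment.execute(actions=actions)
```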

0.5.3

4 years ago
  • Added optional memory argument to various agents
  • Improved summary labels, particularly "entropy" and "kl-divergence"
  • linear layer now accepts tensors of rank 1 to 3
  • Network output / distribution input does not need to be a vector anymore
  • Transposed convolution layers (conv1d/2d_transpose)
  • Parallel execution functionality contributed by @jerabaul29, currently under tensorforce/contrib/
  • Accept string for runner save_best_agent argument to specify best model directory different from saver configuration
  • saver argument steps removed and seconds renamed to frequency
  • Moved Parallel/Runner argument max_episode_timesteps from run(...) to constructor
  • New Environment.create(...) argument max_episode_timesteps (see the sketch after this list)
  • TensorFlow 2.0 support
  • Improved TensorBoard summaries recording
  • Summary labels graph, variables and variables-histogram temporarily not working
  • TF-optimizers updated to TensorFlow 2.0 Keras optimizers
  • Added TensorFlow Addons dependency, and support for TFA optimizers
  • Changed unit of target_sync_frequency from timesteps to updates for dqn and dueling_dqn agent
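
A minimal sketch of the new Environment.create(...) argument; 'gym' and level='CartPole-v1' are example values:

```python
from tensorforce import Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
```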

0.5.2

4 years ago
  • Improved unittest performance
  • Added updates and renamed timesteps/episodes counter for agents and runners
  • Renamed critic_{network,optimizer} argument to baseline_{network,optimizer}
  • Added Actor-Critic (ac), Advantage Actor-Critic (a2c) and Dueling DQN (dueling_dqn) agents
  • Improved "same" baseline optimizer mode and added optional weight specification
  • Reuse layer now global for parameter sharing across modules
  • New block layer type (block) for easier sharing of layer blocks
  • Renamed PolicyAgent/-Model to TensorforceAgent/-Model
  • New Agent.load(...) function, saving includes agent specification
  • Removed PolicyAgent argument (baseline-)network
  • Added policy argument temperature
  • Removed "same" and "equal" options for baseline_* arguments and changed internal baseline handling
  • Combined state/action_value to value objective with argument value either "state" or "action"