AlphaZero.jl Release Notes

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.

AlphaZero v0.5.4

Diff since v0.5.3

Closed issues:

  • Deprecate Util.mapreduce in favor of something more standard (#54)
  • Use Base.Logging and ProgressLogging (#55)
  • How to use multiple GPUs on a single node (#69)
  • Cloud computing (#80)
  • Iterative vs continuous learning (#81)
  • Does Alpha Zero require a static representation of a scenario (#83)
  • Does it make sense to attempt to apply AlphaZero to "Agricola" (#100)
  • Multiplayer capability (#101)
  • Issue while running this in local (#107)
  • The strength of the Mancala bot (#110)
  • 6 dependencies errored (#112)
  • Issue with dummy_run() (#114)
  • StackOverflowError (during training) (#116)
  • To continue a training (#118)
  • How important is GI.vectorize_state function? (#119)
  • When exploring a position, what these abbreviations mean? (#121)
  • Cloud service for AlphaZero.jl (#122)
  • Number of network parameters (#126)
  • Scripts.explore (#136)
  • How to disable benchmarks? (#137)
  • A log of played games during training (#138)
  • Which hyperparameters? (#139)
  • Cannot run sample (#143)
  • GPU vs CPU (#144)
  • MCTS.RolloutOracle(gspec) (#145)
  • num_filters=128 (#150)
  • NVIDIA GeForce GTX 1650 isn't good? (#151)
  • What's the best OS for AlphaZero.jl ? (#153)
  • Does it work with Tesla? (#154)

Merged pull requests:

  • CompatHelper: bump compat for Distributions to 0.25, (keep existing compat) (#98) (@github-actions[bot])
  • CompatHelper: bump compat for Flux to 0.13, (keep existing compat) (#108) (@github-actions[bot])
  • Fix typo in readme (#109) (@LilithHafner)
  • Update report.jl (#111) (@gwario)
  • CompatHelper: bump compat for Setfield to 1, (keep existing compat) (#128) (@github-actions[bot])
  • Fix parameter access (#130) (@gwario)
  • CompatHelper: bump compat for LoggingExtras to 1, (keep existing compat) (#152) (@github-actions[bot])

AlphaZero v0.5.3

Diff since v0.5.2

Closed issues:

  • CUDA Error (#57)
  • τ=0.5 errors (#67)
  • Using continuous rewards (i.e., non ternary games) (#77)
  • Support for singleplayer games (#79)
  • Debugging in VS Code (#84)
  • Desired Type hierarchy for adding GNN's (#85)
  • MCTS.explore! must be called before MCTS.policy (#86)

Merged pull requests:

  • Extend documentation in CommonRLInterface (#70) (@johannes-fischer)
  • Fix error with discounting in RolloutOracle (#73) (@johannes-fischer)
  • Replace mkdir by mkpath (#74) (@johannes-fischer)
  • Use joinpath to make code more robust on Windows machines (#75) (@johannes-fischer)
  • Update experiment.md (#82) (@yutaizhou)
  • Fix memory analysis (#89) (@johannes-fischer)
  • call batch on vectors (not generators) (#91) (@CarloLucibello)
  • add CompatHelper (#92) (@CarloLucibello)
  • CompatHelper: bump compat for Setfield to 0.8, (keep existing compat) (#94) (@github-actions[bot])
  • CompatHelper: bump compat for ExprTools to 0.1, (keep existing compat) (#95) (@github-actions[bot])
  • CompatHelper: bump compat for Documenter to 0.27, (keep existing compat) (#96) (@github-actions[bot])
  • CompatHelper: bump compat for Distributions to 0.25, (keep existing compat) (#97) (@github-actions[bot])

AlphaZero v0.5.2

Diff since v0.5.1

Closed issues:

  • Support for OpenSpiel games? (#15)
  • How to use AlphaZero.jl for Openspiel games? (#46)
  • Current status of Multi-threading MCTS Benchmarking? (#56)
  • Performance Docs (#58)
  • isprobvec(p) error? (#59)
  • Do these readout look correct? (#60)
  • Benchmark Questions? (#61)
  • Does Scripts.play("connect-four") cheat? (#62)
  • isprobvec(p) whenever using Benchmark.NetworkOnly(τ=0.5) (#63)
  • How exactly does Alphazero's MCTS work? (#64)
  • Any idea what's causing this? (#65)

Merged pull requests:

  • OpenSpiel.jl support (#68) (@michelangelo21)

AlphaZero v0.5.1

Diff since v0.5.0

Closed issues:

  • API discussion (#4)
  • self play takes more and more time (#41)
  • Supervised learning (#48)
  • MCTS Optimization for sparse actions (#49)
  • Training on the cloud / multiple instances / clusters (#50)
  • Any Tips for per-player tracking? (#51)
  • Sanity Checks (#52)
  • Speed issues? (#53)

Merged pull requests:

  • Mancala - fixed set_state!() (#44) (@michelangelo21)
  • Invert temperature in formula (documentation) (#45) (@johannes-fischer)

AlphaZero v0.5.0

Diff since v0.4.0

  • Improved the inference server so that it is now possible to keep MCTS workers running while a batch of requests is being processed by the GPU. Concretely, this translates into SimParams now having two separate num_workers and batch_size parameters.
  • The inference server is now spawned on a separate thread to ensure minimal latency.

Together, the two aforementioned improvements result in a 30% global speedup on the connect-four benchmark.
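
To make this concrete, here is an illustrative sketch in plain Julia of the batching pattern described above, using only tasks and Channels rather than AlphaZero.jl's actual internals. The NUM_WORKERS and BATCH_SIZE constants stand in for the new num_workers and batch_size fields of SimParams; every other name is hypothetical.

```julia
# Illustrative only: more simulation workers than the inference batch size
# means some workers can keep expanding their search trees while the server
# evaluates a full batch for the others.
const NUM_WORKERS = 8   # stands in for SimParams.num_workers
const BATCH_SIZE  = 4   # stands in for SimParams.batch_size

# Each request carries a payload and a channel on which to send the answer.
requests = Channel{Tuple{Int,Channel{Float64}}}(NUM_WORKERS)

# Inference server on its own task: gather BATCH_SIZE requests, evaluate them
# together (a placeholder for a single batched GPU call), then reply to each.
server = @async while true
    batch = [take!(requests) for _ in 1:BATCH_SIZE]
    results = [Float64(x) for (x, _) in batch]  # placeholder evaluation
    for ((_, reply), r) in zip(batch, results)
        put!(reply, r)
    end
end

# Simulation workers: each blocks only on its own reply, so the remaining
# workers keep running while a batch is being processed.
workers = map(1:NUM_WORKERS) do w
    @async begin
        reply = Channel{Float64}(1)
        for step in 1:3
            put!(requests, (w * step, reply))
            take!(reply)  # in AlphaZero.jl this value would feed back into MCTS
        end
    end
end

foreach(wait, workers)
```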

AlphaZero v0.4.0

This release brings many new features to AlphaZero.jl, including:

  • Added support for CommonRLInterface.jl.
  • Added a grid-world MDP example illustrating this new interface.
  • Added support for distributed training: it is now as easy to train an agent on a cluster of machines as on a single computer.
  • Replaced the async MCTS implementation with a more straightforward synchronous implementation. Network inference requests are now batched across game simulations.
  • Added the Experiment and Scripts modules to simplify common tasks (see the usage sketch after this list).

See CHANGELOG.md for details.
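
For a sense of how the new Scripts module is used, here is a brief sketch. The Scripts.explore, Scripts.play, and dummy_run names appear in the issue titles above, and "connect-four" refers to the bundled example experiment; treat the exact calls as indicative rather than definitive.

```julia
using AlphaZero

# Quick smoke test of the bundled connect-four experiment with reduced settings.
Scripts.dummy_run("connect-four")

# Launch a full training session for the connect-four agent.
Scripts.train("connect-four")

# Explore the trained agent's search interactively, or play against it.
Scripts.explore("connect-four")
Scripts.play("connect-four")
```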

Closed issues:

  • Connect Four training must be restarted about every 24 hours due to an OOM error (#1)
  • The Flux backend is currently broken (#2)
  • Importation of training parameters from JSON is broken (#3)
  • UndefVarError: lib not defined when training a connect four agent (#5)
  • Possibility to skip initial benchmark (#6)
  • Assertion error during apply_symmetry (#7)
  • Checkpoint evaluation randomly fails (#8)
  • MDP Version (#9)
  • Suggestion: replace Oracle with just a function (#10)
  • @unimplemented (#11)
  • Some issues with installing the package (#12)
  • Register package with General registry (#13)
  • Missing repository's website (#16)
  • fail to explore (#17)
  • CuDNN error (#18)
  • using AlphaZero (#19)
  • UndefVarError: lib not defined (#20)
  • LoadError: CUBLASError (#21)
  • Error building Knet (#22)
  • LoadError: InitError: CUDA.jl does not yet support CUDA with nvdisasm 11.1.74; (#23)
  • CuDNN error 8 on Ubuntu 18.04, Julia 1.5.2 (#24)
  • Stateful Game-structs throw errors (#25)
  • LSTM support (#28)
  • CUDA vs CUDAnative? (#29)
  • Embed trained network in javascript web app for browser-based inference? (#30)
  • Connect Four iteration training time is taking a long time (#31)
  • Question about symmetries (#32)
  • Question about function test_symmetry (#33)
  • Migrate neural net agents across AlphaZero.jl instances? (#34)
  • Can a game know its players' types? (#35)
  • Exploit several CPU (#36)
  • Exploit multiple GPUs (#37)
  • Enumerating actions without state (#38)
  • fatal: Remote branch v0.4.0 not found in upstream origin (#39)

Merged pull requests:

  • Mancala (#42) (@michelangelo21)