
Distributed K-FAC Preconditioner for PyTorch

v0.4.1

2 years ago

Major Changes

  • Critical bug fix in BaseKFACPreconditioner (#48)

Minor Changes

  • Updated issue templates so that a tag is no longer included in the title (#46).
  • Example training scripts now collect environment information for easier debugging (#47).

v0.4.0

2 years ago

Complete refactor of kfac-pytorch

See Pull Requests #38, #40, #41, and #42.

DevOps changes

  • kfac now requires torch>=1.8 and Python>=3.7
  • tox is now used for testing environments and automation
  • pre-commit configuration updated; major changes include a preference for single quotes, mypy, and flake8 plugins
  • Switched to setup.cfg for package metadata and tox/flake8/mypy/coverage configuration
  • Added requirement-dev.txt, which contains all dependencies needed to run the test suite

Code quality and testing

  • Complete type annotations for all code
    • Passes mypy
  • Separated testing utilities and unit tests into testing/ and tests/ respectively
  • Extensive unit test suite that achieves 100% code coverage
  • New testing utilities include wrappers for simulating distributed environments and small test models
  • Added end-to-end training tests
    • A small unit test (run with pytest) that checks the loss decreases when training with K-FAC (a sketch follows this list)
    • An MNIST integration test (not run with pytest) that verifies training with K-FAC reaches higher accuracy than training without it
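
As a rough illustration of the loss-decrease check, a sketch is given below. The tiny model, random data, and default KFACPreconditioner arguments are illustrative assumptions, not the actual test code from tests/.

```python
import torch

from kfac.preconditioner import KFACPreconditioner


def test_loss_decreases_with_kfac() -> None:
    """Train a tiny MLP on random data and check that the loss decreases."""
    torch.manual_seed(0)
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 16),
        torch.nn.ReLU(),
        torch.nn.Linear(16, 1),
    )
    x = torch.randn(64, 8)
    y = torch.randn(64, 1)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # Default preconditioner arguments are assumed here; the real test may
    # configure the preconditioner differently.
    preconditioner = KFACPreconditioner(model)
    criterion = torch.nn.MSELoss()

    losses = []
    for _ in range(20):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        preconditioner.step()  # precondition gradients before the optimizer step
        optimizer.step()
        losses.append(loss.item())

    assert losses[-1] < losses[0]
```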

kfac package improvements

  • KFAC layers separated from PyTorch module wrappers
    • KFACBaseLayer handles general K-FAC computations and communications for an arbitrary layer
    • ModuleHelper implementations provide a unified interface for interacting with supported PyTorch modules
      • Provides methods that return the sizes of the layer's factors so they can be determined prior to training
      • Provides methods for getting the current gradients, updating the gradients, and computing the factors from the intermediate data
    • Each KFACBaseLayer instance is passed a ModuleHelper instance corresponding to the module in the model being preconditioned
  • Removed broken LSTM/RNN/Embedding layer support
  • Module registration utilities moved out of the preconditioner class and into the kfac.layers.register module
  • Replaced the comm module with the distributed module, which provides a more exhaustive set of distributed communication utilities
    • All communication ops now return futures for their results to allow more aggressive asynchronous communication (illustrated after this list)
    • Added allreduce bucketing for factor allreduce (closes #32)
    • Added get_rank and get_world_size methods to enable K-FAC training when torch.distributed is not initialized
  • Enum types moved to enums module for convenience with type annotations
  • KFACBaseLayer is now agnostic of its placement
    • I.e., the KFACBaseLayer expects some other object to correctly execute its operations according to some placement strategy.
    • This change was made to allow other preconditioner implementations to use the math/communication operations provided by the KFACBaseLayer without being beholden to a particular placement strategy.
  • Created the BaseKFACPreconditioner which provides the minimal set of functionality for preconditioning with K-FAC
    • Provides state dict saving/loading, a step() method, hook registration to KFACBaseLayer, and some small bookkeeping functionality
    • The BaseKFACPreconditioner takes as input already registered KFACBaseLayers and an initialized WorkAssignment object.
    • This change was made to factor out strategy-specific details from the core preconditioning functions, with the goal of enabling preconditioner implementations that interact more closely with other frameworks such as DeepSpeed
    • Added reset_batch() to clear the staged factors for the batch in the case of a bad batch of data (e.g., if the gradients overflowed)
    • memory_usage() includes the intermediate factors accumulated for the current batch
    • state_dict now includes K-FAC hyperparameters and steps in addition to factors
  • Added KFACPreconditioner, a subclass of BaseKFACPreconditioner, that implements the full functionality described in the KAISA paper (a usage sketch follows this list).
  • New WorkAssignment interface that provides a schematic for the methods needed by BaseKFACPreconditioner to determine where to perform computations and communications
    • Added the KAISAAssignment implementation that provides the KAISA gradient worker fraction-based strategy
  • K-FAC hyperparameter schedule changes
    • The old, inflexible KFACParamScheduler was replaced with a LambdaParamScheduler modeled on PyTorch's LambdaLR scheduler
    • BaseKFACPreconditioner can be passed functions that return the current K-FAC hyperparameters rather than static float values
  • All printing is now done via logging, and BaseKFACPreconditioner takes an optional loglevel parameter (closes #33)
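
To illustrate the futures-based communication pattern mentioned above, the sketch below uses torch.distributed directly (async_op=True plus Work.get_future()). It shows the general pattern only, not the internal kfac.distributed API, and it assumes a process group has already been initialized with a backend that supports get_future() (e.g., gloo or nccl).

```python
import torch
import torch.distributed as dist

# An asynchronous allreduce returns a handle whose future resolves to the
# reduced tensor, so factor communication can overlap with other work.
factor = torch.randn(64, 64)
handle = dist.all_reduce(factor, async_op=True)
future = handle.get_future().then(
    # The future's value is a list containing the reduced tensor.
    lambda fut: fut.value()[0] / dist.get_world_size(),
)

# ... do unrelated work while the allreduce is in flight ...

averaged_factor = future.wait()
```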
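
A rough usage sketch of the new preconditioner API is below. The keyword names (factor_update_steps, inv_update_steps, damping), the zero-argument callable hyperparameter, and the overflow-handling pattern around reset_batch() are illustrative assumptions based on this changelog, not a verbatim reference.

```python
import logging

import torch

from kfac.preconditioner import KFACPreconditioner

model = torch.nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hyperparameters may be given as callables that are re-evaluated during
# training instead of static floats; the keyword names are assumptions.
preconditioner = KFACPreconditioner(
    model,
    factor_update_steps=10,
    inv_update_steps=100,
    damping=lambda: 0.003,  # a real schedule would close over the training step
    loglevel=logging.INFO,
)

criterion = torch.nn.CrossEntropyLoss()
x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))

for step in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    if torch.isfinite(loss):
        preconditioner.step()  # update factors/inverses and precondition grads
        optimizer.step()
    else:
        preconditioner.reset_batch()  # discard staged factors from a bad batch
```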

Example script changes

  • Added examples/requirements.txt
  • Usage instructions for examples moved to examples/README.md
  • Updated examples to use the new kfac API
  • Examples are now properly type annotated
  • Removed non-working language model example

Other changes + future goals

  • Removed a lot of content from the README that should eventually be moved to a wiki
    • Previously, the README was quite verbose and made it difficult to find the important content
  • Updated README examples, publications, and development instructions
  • Future changes include:
    • GitHub Actions for running code formatting, unit tests, and integration tests
    • Issue/PR templates
    • Badges in the README
    • A wiki

v0.3.2

2 years ago

README and package dependency updates.

v0.3.1

2 years ago

v1.0.0

3 years ago