XLearning Versions

AI on Hadoop

v1.4

4 years ago

Release XLearning 1.4

Major Features And Improvements

  • Support running applications in Docker
  • Support MPI applications
  • ClusterDef is available for the TensorFlow Distribution Strategy API
  • Allow memory to be set separately for the chief and evaluator workers of a TensorFlow application
  • Allow a YARN node label to be specified for job execution
  • Upload output with multiple threads
  • Allow incremental upload of intermediate results
  • Support regular-expression matching for the input path
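
The input-path matching listed above can be pictured as filtering candidate paths with a regular expression. The following is a minimal sketch, assuming the matching means a full regular-expression match against each path; the function name `match_input_paths` and the sample paths are hypothetical and are not taken from XLearning.

```python
import re


def match_input_paths(paths, pattern):
    """Return the paths that match `pattern` in full, preserving order.

    Hypothetical helper for illustration; not part of XLearning.
    """
    compiled = re.compile(pattern)
    return [p for p in paths if compiled.fullmatch(p)]


# Hypothetical candidate paths; only the 2019 partitions should be selected.
candidates = [
    "/data/train/2019-01/part-00000",
    "/data/train/2019-02/part-00000",
    "/data/train/2018-12/part-00000",
]
selected = match_input_paths(candidates, r"/data/train/2019-\d{2}/part-\d+")
print(selected)
```

A full match is used here rather than a prefix or substring search so that a pattern cannot accidentally select sibling directories; the actual matching rules in XLearning may differ.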

Bug Fixes and Other Changes

  • The memory usage adjustment prompt is displayed only when the application finishes successfully.

v1.3

5 years ago

Release XLearning 1.3

Major Features And Improvements

  • Support lightLDA; see examples/lightLDA for usage
  • Support xflow; see examples/xflow for usage
  • Support user-defined environment variables, set through configuration parameters at submission
  • Use the last worker as the evaluator of a distributed TensorFlow application when tf-evaluator is set to true; see examples/tfEstimators for usage
  • Allow output-index to select the single worker whose output is saved
  • Optimize the port reservation mechanism
  • Add a priority mechanism for allocating containers on nodes that hold the data locally
  • Display resource request and usage information
  • Extend the ps role: more convenient rendering of metrics usage information, and output upload
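
Port reservation, mentioned in the list above, is commonly done by binding a socket to port 0 so that the operating system picks a free port, then holding the socket open until the real service takes over. The sketch below shows that general technique only; it is an assumption about what such a mechanism might look like, not XLearning's implementation, and `reserve_port` is a hypothetical name.

```python
import socket


def reserve_port(host="127.0.0.1"):
    """Bind to port 0 so the OS assigns a free port, and keep the socket open.

    Returns the open socket together with the assigned port number.
    Hypothetical helper for illustration; not part of XLearning.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    return sock, sock.getsockname()[1]


# Hold the reservation while the port number is advertised to other workers.
sock, port = reserve_port()
print("reserved port:", port)
# Release the reservation just before the real service binds the port.
sock.close()
```

Holding the socket open narrows the window in which another process can take the port, although a short race remains between closing the socket and the service binding it.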

Bug Fixes and Other Changes

  • Fix containers hanging while waiting for the remaining machines' port addresses after a container failure in distributed mode
  • Release redundant container requests once the workers have been allocated, and add a remove-request operation
  • Fix application failure caused by overly long environment variables for the input in PLACEHOLDER mode
  • Refine the conditions used to judge job execution failure
  • Fix the incorrect status code returned when a container exits successfully

v1.2

6 years ago

Release XLearning 1.2

Major Features And Improvements

  • The client prints container status information when the state changes
  • Add the xlearning.localresource.timeout configuration to control local resource download
  • Support VisualDL; see examples/mxnetVisualDL for usage
  • Support a local cache when the input strategy is inputformat and epoch is greater than 1

Bug Fixes and Other Changes

  • Add exception handling for board and metrics processing

v1.1

6 years ago

Release XLearning 1.1

Major Features And Improvements

  • Automatically scale worker or ps memory when an application retries after a failure
  • Fail the application when container allocation exceeds the time limit
  • Support user job jars passed with --jars at application submission
  • Add CPU metrics to the web display. Note: if your Hadoop version is lower than 2.6.4, see the FAQ first.
  • Support more distributed deep learning frameworks, such as XGBoost and LightGBM. See the FAQ for usage details.
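
The memory scaling on retry listed above can be illustrated with a small calculation. This is a sketch under stated assumptions: the growth factor of 1.5, the optional cap, and the function name `memory_for_attempt` are all hypothetical, and the release notes do not say how XLearning computes the scaled value.

```python
def memory_for_attempt(base_mb, attempt, growth=1.5, cap_mb=None):
    """Return the memory (in MB) to request on a given attempt.

    `attempt` starts at 1 for the first run; each retry multiplies the
    request by `growth`. All parameters are illustrative assumptions.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    scaled = int(base_mb * growth ** (attempt - 1))
    # Optionally clamp the request to what a container may be granted.
    return min(scaled, cap_mb) if cap_mb is not None else scaled


# Requests for the first three attempts of a 2048 MB worker.
requests = [memory_for_attempt(2048, n) for n in (1, 2, 3)]
print(requests)
```

Capping the result matters in practice because a request above the cluster's maximum container size would be rejected rather than granted.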

Bug Fixes and Other Changes

  • Fix a null pointer exception in the AppController
  • Add more examples, especially for distributed-mode applications
  • The FAQ provides detailed instructions on how to use the new features