ClusterDef is available for the TensorFlow Distribution Strategy API
Allow memory to be set separately for the chief and estimator workers of a TensorFlow application
Support specifying the YARN node label for job execution
Upload the output with multiple threads
Allow incremental upload of intermediate results
Support regular-expression matching for the input path
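As an illustration of the cluster definition item above, the sketch below builds a cluster description in the `TF_CONFIG` JSON layout that TensorFlow's distribution strategies read. The host names, ports, and the helper `build_tf_config` are hypothetical, and how XLearning itself exposes the cluster to user scripts is not stated in these notes, so this is a minimal sketch rather than the project's implementation.

```python
import json


def build_tf_config(cluster, task_type, task_index):
    """Return a TF_CONFIG-style JSON string for one task of the cluster.

    `cluster` maps a role name (e.g. "chief", "worker", "ps") to a list of
    "host:port" addresses. The helper itself is hypothetical.
    """
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index {task_index} out of range for {task_type}")
    config = {
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    }
    return json.dumps(config, sort_keys=True)


# Hypothetical addresses, for illustration only.
example_cluster = {
    "chief": ["host0:2222"],
    "worker": ["host1:2222", "host2:2222"],
    "ps": ["host3:2222"],
}

if __name__ == "__main__":
    # Print the configuration that the second worker would see.
    print(build_tf_config(example_cluster, "worker", 1))
```

In a training script, a string of this shape is typically placed in the `TF_CONFIG` environment variable, where `tf.distribute.cluster_resolver.TFConfigClusterResolver` can pick it up.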
Bug Fixes and Other Changes
The memory usage adjustment prompt is displayed only when the application's final status is succeeded.
v1.3
5 years ago
Release XLearning 1.3
Major Features And Improvements
Support LightLDA; see examples/lightLDA for usage
Support xflow; see examples/xflow for usage
Support user-defined environment variables through configuration parameters at submission
Set the last worker as the estimator role of a distributed TensorFlow application when the user sets tf-evaluator to true; see examples/tfEstimators for usage
Select the single worker whose output is saved by setting output-index
Optimize the port reservation mechanism
Add a priority mechanism that prefers allocating containers where the data is local
Display resource request and usage information
Expand the ps role: more convenient rendering of metrics usage information, and output upload
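To make the port reservation item above concrete, here is one common way to reserve a free port before handing it to a training process. It is a minimal sketch of the general technique, not XLearning's actual mechanism, which these notes do not describe; the function name `reserve_port` is hypothetical.

```python
import socket


def reserve_port(host="127.0.0.1"):
    """Reserve a free TCP port by binding to port 0.

    The operating system picks an unused port. The socket is returned open,
    so the port stays held until the caller closes it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow the training process to bind the same port right after release.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, 0))
    port = sock.getsockname()[1]
    return sock, port


if __name__ == "__main__":
    holder, port = reserve_port()
    try:
        # A coordinator would collect this port from every container here.
        print(f"reserved port {port}")
    finally:
        # Release the port just before the real server binds it.
        holder.close()
```

Holding the socket open narrows, but does not eliminate, the window in which another process could take the port between release and reuse.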
Bug Fixes and Other Changes
In distributed mode, a container could hang while waiting for the port addresses of the remaining machines after another container had failed
Release redundant container requests once the workers have been allocated, and add the remove-request operation
The application failed in PLACEHOLDER mode when the input made the environment variables too long
Control the conditions used to judge a job execution as failed
An incorrect status code was returned when a container exited successfully
v1.2
6 years ago
Release XLearning 1.2
Major Features And Improvements
The client prints container status information when the state changes
Add the xlearning.localresource.timeout configuration to control the timeout for downloading local resources
Support VisualDL; see examples/mxnetVisualDL for usage
Support local caching when the input strategy is inputformat and epoch is greater than 1
Bug Fixes and Other Changes
Add exception handling for the board and metrics processes
v1.1
6 years ago
Release XLearning 1.1
Major Features And Improvements
Automatically scale worker or ps memory when the application retries after a failure
The application exits as failed when container allocation exceeds the time limit
Support the user's job JAR through --jars at application submission
Add CPU metrics to the web display. Note that if the Hadoop version is lower than 2.6.4, please see the FAQ first.
Support more distributed deep learning frameworks, such as XGBoost and LightGBM. For usage details, please see the FAQ.
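The memory auto-scaling item above can be pictured with a small sketch. The notes do not say how the new memory size is computed, so the growth factor, the cap, and the function name `memory_for_attempt` below are all hypothetical; the code only illustrates the idea of requesting more memory on each retry.

```python
def memory_for_attempt(base_mb, attempt, factor=1.5, cap_mb=65536):
    """Return the memory (in MB) to request for a given attempt.

    Attempt 1 uses the base value; each retry multiplies the previous
    request by `factor`, never exceeding `cap_mb`. All defaults are
    illustrative assumptions, not XLearning settings.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    scaled = base_mb * factor ** (attempt - 1)
    return min(int(scaled), cap_mb)


if __name__ == "__main__":
    for attempt in range(1, 5):
        print(attempt, memory_for_attempt(4096, attempt))
```

Capping the request keeps a repeatedly failing job from asking for more memory than any node could provide.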
Bug Fixes and Other Changes
Fix a NullPointerException in the AppController
Add more examples, especially for distributed-mode applications
The FAQ provides detailed instructions on how to use the new features