Resource scheduling and cluster management for AI
Marketplace related update
Alert manager
Webportal
Others
Marketplace related update
New job submission page: replaces `Advanced` with `More info` and places it under each section to improve user experience.
Known issue: The Tensorboard tool is not implemented in the new submission page yet. If you need it, please use the old version.
Alert system enhancement: `kill-low-efficiency-job-alert` email templates #5384
Support sort by `completionTime` for get job list API #5347
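As a rough illustration of the new sort option, the sketch below builds a job-list request URL ordered by `completionTime`. It assumes the job list endpoint is `/api/v2/jobs` under the REST server and that sorting is passed as an `order=<field>,<ASC|DESC>` query parameter. The host name, the `limit` value, and the helper `build_job_list_url` are hypothetical. Check the REST API reference of your cluster version before relying on any of these names.

```python
from urllib.parse import urlencode


def build_job_list_url(base_url, order_field="completionTime",
                       descending=True, limit=20):
    """Build a job-list URL sorted by one field.

    Assumption: the endpoint path and the `order` / `limit` query
    parameters follow the OpenPAI REST API; verify against your cluster.
    """
    direction = "DESC" if descending else "ASC"
    # urlencode percent-encodes the comma between field and direction.
    query = urlencode({"order": f"{order_field},{direction}", "limit": limit})
    return f"{base_url.rstrip('/')}/api/v2/jobs?{query}"


# Hypothetical REST server address, used for illustration only.
url = build_job_list_url("https://pai.example.com/rest-server")
print(url)
```

The resulting URL would be sent as a `GET` request with the user's bearer token in the `Authorization` header. The sketch stops at building the URL so that it runs without a cluster.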
Deployment
`paictl.py` #5321 #5167. (Warning: additions to `config.yaml` are needed for this feature. Refer to: https://github.com/microsoft/pai/blob/master/docs/manual/cluster-admin/how-to-add-and-remove-nodes.md#pull--modify-cluster-settings)
Before upgrading, we recommend checking this issue first.
Job protocol update: Add prerequisites #5145
Marketplace related update
Introduce an optional docker cache in cluster #5290
A regular GPU utilization report can be set up for admins #5281, #5294, #5324, #5331
`pai-bearer-token` in the `alert-manager` section. The old configuration still works but is deprecated. If you have configured `pai-bearer-token` of `alert-manager`, please refer to #5331 to modify the previous configuration.
Users can save frequently used SSH public keys on the profile page #5223
Improve log experience #5271 #5272
Reduce Ansible logs during deployment #5305
Improve Web Portal Experience
Create a new page for yaml editor #5172
Marketplace related update
Support different types of computing hardware #5138
Deployment process refinement
`master.csv` + `worker.csv` -> `layout.yaml`
`config.yaml`, `layout.yaml` under quick-start folder, remove all the argument parse logic
`layout.yaml` #5179
Log manager
`updateUserGroupList` API issue (#5121)