KubeDL Versions

Run your deep learning workloads on Kubernetes more easily and efficiently.

1 year ago

Features

This release brings several major features that help cluster admins manage workloads more easily and run them more efficiently.

Enable data caching across different jobs and decouple lifecycle between job and cache system.
Introduce job-coordinator to schedule and admit jobs in multi-tenant queues.
Introduce a new workload named ElasticBatch job, which abstracts offline inference jobs.

1 year ago

1 year ago

Support the torch-elastic distributed communication style on both the normal container network and the host network.
Upgrade vendored dependencies to k8s 1.21 for performance improvements and other optimizations.

2 years ago

Version v0.4.1 is a stable release that introduces many stability fixes, API improvements, and code optimizations.

Introduce modelPath, description, imageTag to Model/ModelVersion specification.
Introduce CacheBackend to integrate with cloud native distributed cache systems for training jobs.
Introduce Notebook to provide Jupyter virtual environment capability.

2 years ago

3 years ago

Change the CRD definitions to the training.kubedl.io API group.
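
As a rough illustration of what the group change means for job manifests, the sketch below splits an `apiVersion` string into group and version and checks whether the group is training.kubedl.io. The version `v1alpha1`, the kind `TFJob`, the job name, and the helper names are assumptions made for the example; they are not stated in these notes.

```python
from __future__ import annotations

KUBEDL_TRAINING_GROUP = "training.kubedl.io"  # group named in the release note


def split_api_version(api_version: str) -> tuple[str, str]:
    """Split a Kubernetes apiVersion of the form 'group/version'.

    Core-group resources use a bare version such as 'v1', so the group
    is returned as an empty string in that case.
    """
    group, sep, version = api_version.rpartition("/")
    return (group, version) if sep else ("", api_version)


def uses_training_group(manifest: dict) -> bool:
    """Return True if the manifest's apiVersion is in the KubeDL training group."""
    group, _ = split_api_version(manifest.get("apiVersion", ""))
    return group == KUBEDL_TRAINING_GROUP


# Hypothetical manifest: version, kind, and name are assumed for illustration.
job = {
    "apiVersion": "training.kubedl.io/v1alpha1",
    "kind": "TFJob",
    "metadata": {"name": "demo"},
}
print(uses_training_group(job))
```

A check like this could be used to spot manifests that still reference a different group after upgrading.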

3 years ago

3 years ago

v0.1.0 is the first formal release of KubeDL and includes the following stable features:

Support running prevalent ML/DL workloads in a single operator.
Support submitting a job with artifacts synced from a remote source such as GitHub, without rebuilding the image.
Support advanced scheduling features such as gang scheduling with pluggable backend schedulers.
Instrumented with unified Prometheus metrics for different types of DL jobs, such as job launch delay and the current number of pending/running jobs.
Support job metadata persistence with a pluggable storage backend such as MySQL.
Enable specific workload types automatically according to the installed CRDs, or explicitly through startup flags.
A modular architecture that can be easily extended for more types of DL/ML workloads with shared libraries, see how to add a custom job workload.
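
Gang scheduling, mentioned in the list above, is generally understood as all-or-nothing placement: either every member of a job is placed or none is. The sketch below illustrates only that idea. It is not KubeDL's scheduler integration, and the function name, parameters, and node names are hypothetical.

```python
from typing import Dict, List, Optional


def gang_place(free_slots: Dict[str, int], replicas: int) -> Optional[List[str]]:
    """Try to place all replicas at once; return None if they do not all fit.

    free_slots maps a node name to the number of replicas it can still host.
    The mapping is only modified when the whole gang fits.
    """
    if replicas > sum(free_slots.values()):
        return None  # partial placement is refused: all or nothing

    placement: List[str] = []
    for node in sorted(free_slots):  # deterministic order for the example
        take = min(free_slots[node], replicas - len(placement))
        placement.extend([node] * take)
        free_slots[node] -= take
        if len(placement) == replicas:
            break
    return placement


nodes = {"node-a": 2, "node-b": 1}
print(gang_place(nodes, 4))  # None: only 3 slots exist, so nothing is placed
print(gang_place(nodes, 3))  # ['node-a', 'node-a', 'node-b']
```

The capacity check runs before any slot is consumed, so a job that cannot fully fit leaves the cluster state untouched instead of holding resources while waiting for the rest of its members.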

The official image, docker.io/kubedl/kubedl:v0.1.0, is hosted on Docker Hub.