A Cloud Native Batch System (Project under CNCF)
In a Kubernetes cluster with multiple schedulers, different kinds of workloads sometimes need to be mapped to a specific scheduler. For example, K8s native workloads such as Deployments in the kube-system namespace are mapped to default-scheduler, while AI and big data jobs are mapped to Volcano. This feature aims to implement that mapping automatically. For more details, please refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-scheduler.md. (https://github.com/volcano-sh/volcano/pull/1576, https://github.com/volcano-sh/volcano/pull/1521, @huone1 @william-wang)
To make full use of scarce resources such as GPUs, one solution is to bind them to other resources as shares. For example, it is common for many CPU-intensive workloads to be scheduled to GPU nodes. When GPU-intensive workloads arrive, they cannot be scheduled because the GPU nodes lack CPU or memory. If workloads that require GPU, CPU, and memory within a certain range are scheduled to GPU nodes first, the GPUs can be fully used. For more details, please refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/proportional.md. (https://github.com/volcano-sh/volcano/pull/1527, @king-jingxiang)
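The idea of binding CPU and memory to GPUs can be illustrated with a minimal sketch. It is not Volcano's implementation: the `Resources` type, the `keeps_gpu_share` function, and the `cpu_per_gpu` / `mem_per_gpu` parameters are hypothetical names chosen for this example. The check admits a CPU-only task to a GPU node only if the remaining CPU and memory still cover the share reserved for each idle GPU.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu: float      # cores
    memory: float   # GiB
    gpu: int = 0    # whole GPUs


def keeps_gpu_share(idle: Resources, request: Resources,
                    cpu_per_gpu: float, mem_per_gpu: float) -> bool:
    """Return True if `request` fits and idle GPUs keep their bound share.

    `cpu_per_gpu` and `mem_per_gpu` are assumed, illustrative ratios.
    """
    # The request must fit into the idle resources at all.
    if (request.cpu > idle.cpu or request.memory > idle.memory
            or request.gpu > idle.gpu):
        return False
    # Tasks that ask for GPUs are the ones the reserve exists for.
    if request.gpu > 0:
        return True
    # For CPU-only tasks, what is left must still cover the CPU and
    # memory bound to every GPU that is currently idle.
    left_cpu = idle.cpu - request.cpu
    left_mem = idle.memory - request.memory
    return (left_cpu >= idle.gpu * cpu_per_gpu
            and left_mem >= idle.gpu * mem_per_gpu)
```

With 16 idle cores, 64 GiB, and 2 idle GPUs at an assumed ratio of 4 cores and 8 GiB per GPU, a CPU-only task asking for 10 cores is rejected because it would leave fewer than 8 cores for the idle GPUs, while a task asking for 1 GPU and 10 cores is admitted.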
For CPU-intensive workloads, especially in the AI, big data, and HPC fields, enabling CPU NUMA awareness can bring a significant performance improvement. For more details, please refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md. (https://github.com/volcano-sh/volcano/pull/1493, @huone1)
This release provides a framework for Volcano stress testing. (https://github.com/volcano-sh/volcano/pull/1516, @rudeigerc)
fix diff function(https://github.com/volcano-sh/volcano/pull/1712, @Thor-wl )
make 'existing pods anti-affinity rules' work(https://github.com/volcano-sh/volcano/pull/1668, @eggiter)
add setting MinResources to pg for normal pod(https://github.com/volcano-sh/volcano/pull/1666, @huone1 )
fix OOM that occurs if pod info is synced before node info(https://github.com/volcano-sh/volcano/pull/1662, @huone1 )
fix admission parsing bug(https://github.com/volcano-sh/volcano/pull/1656, @hacker-qian)
fix overused judgement when dealing with allocate and proportion(https://github.com/volcano-sh/volcano/pull/1637, @Thor-wl )
fix bugs about resource comparison(https://github.com/volcano-sh/volcano/pull/1628, @Thor-wl )
reset task.NodeName after call DeallocateFunc(https://github.com/volcano-sh/volcano/pull/1618, @merryzhou )
adds the missing field to the Webhook(https://github.com/volcano-sh/volcano/pull/1615, @hwdef )
fix a problem about equivalence ecache feature (https://github.com/volcano-sh/volcano/pull/1593, @huone1 )
func FeasibleNodesToFind to use list with a certain order(https://github.com/volcano-sh/volcano/pull/1574, @lowang-bh )
fix static check warning in pkg folder(https://github.com/volcano-sh/volcano/pull/1552, @hwdef )
fix bug in predicates plugin(https://github.com/volcano-sh/volcano/pull/1547, @hacker-qian)
fix(scheduler): reclaim action minus and comparison bug(https://github.com/volcano-sh/volcano/pull/1540, @shinytang6 )
fix resource comparison bug in task topology(https://github.com/volcano-sh/volcano/pull/1546, @Thor-wl )
fix selecting the wrong queue when proportion is disabled(https://github.com/volcano-sh/volcano/pull/1497, @zen-xu )
fix the error for make verify(https://github.com/volcano-sh/volcano/pull/1508, @huone1 )
Just like minAvailable at the job level, minAvailable at the task level treats the replicas of the same task as a group and decides whether to schedule the pods of that task. The pods are scheduled together only when minAvailable is met. For more details, please refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/task-minavailable.md. (https://github.com/volcano-sh/volcano/pull/1459, @shinytang6)
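The task-level check can be sketched as follows. This is an illustration under assumptions rather than the scheduler's actual code: the function name `job_can_start` and the dictionary-based inputs are hypothetical. Each task must reach its own minAvailable, and the job as a whole must still reach the job-level minAvailable.

```python
def job_can_start(ready: dict, task_min_available: dict,
                  job_min_available: int) -> bool:
    """Gang check with per-task minimums.

    ready: task name -> number of replicas that can be placed.
    task_min_available: task name -> minimum replicas for that task.
    """
    # Every task with a minimum must reach it on its own.
    for task, minimum in task_min_available.items():
        if ready.get(task, 0) < minimum:
            return False
    # The job-level minimum still applies to the total.
    return sum(ready.values()) >= job_min_available
```

For example, with one `ps` replica and two `worker` replicas placeable, a job that requires at least three workers is held back even though the job-level minimum of three pods is reached.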
Support configuring minSuccess, the minimum number of pods of a job that must succeed. The job status is decided by whether minSuccess is reached, which speeds up the job status judgement. (https://github.com/volcano-sh/volcano/pull/1384, @zen-xu)
In big data processing jobs such as TensorFlow and Spark, tasks transmit large amounts of data to each other, so transmission delay takes up a large proportion of job execution time. The task topology plugin adjusts the scheduling strategy according to the transmission topology inside a job, so as to reduce the amount of data transmitted between nodes, lower the share of transmission delay in job execution time, and improve resource utilization. For more details, please refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/task-topology-plugin.md. (https://github.com/volcano-sh/volcano/pull/1353, @jiangkaihua)
Separate the APIs from volcano.sh/volcanosh. Any downstream project can import the CRD clientset/lister/informer with the K8s version it needs. (https://github.com/volcano-sh/apis, @Thor-wl)
update-development-yaml in Makefile (https://github.com/volcano-sh/volcano/pull/1386, @zen-xu)
Job (https://github.com/volcano-sh/volcano/pull/1385, @zen-xu)
make generate-yaml (https://github.com/volcano-sh/volcano/pull/1374, @zen-xu)
bindingTasks to judge whether adding node to the snapshot. (https://github.com/volcano-sh/volcano/pull/1388, @zen-xu)
The TDM (Time Division Multiplexing) plugin aims to provide a mechanism for nodes that can be used by K8s and by other clusters (such as Yarn) at separate times. (https://github.com/volcano-sh/volcano/pull/1269, @yahaa)
The SLA (Service Level Agreement) plugin works for the job resource reservation feature. Users can set an SLA for jobs to ensure that the specified jobs are scheduled in time. It provides a better design and implementation for job resource reservation. (https://github.com/volcano-sh/volcano/pull/1303, @jiangkaihua)
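One way to picture an SLA on scheduling time is as a deadline derived from the job's creation time. The sketch below is illustrative only: the `SlaJob` type, its `sla_waiting_time` field, and the ordering rule are assumptions made for this example, not the plugin's documented behaviour. Jobs whose deadline has already passed are ordered first, then jobs by earliest deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SlaJob:
    name: str
    created_at: datetime
    sla_waiting_time: timedelta  # hypothetical per-job SLA setting


def sla_deadline(job: SlaJob) -> datetime:
    """Latest time by which the job should have been scheduled."""
    return job.created_at + job.sla_waiting_time


def order_by_sla(jobs: list, now: datetime) -> list:
    """Overdue jobs first (False sorts before True), then earliest deadline."""
    return sorted(jobs, key=lambda j: (sla_deadline(j) > now, sla_deadline(j)))
```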
SUPPORT_PLUGINS (https://github.com/volcano-sh/volcano/pull/1266, @zen-xu)
musl-gcc build image, because vc-scheduler default image is alpine, which only has musl-libc (https://github.com/volcano-sh/volcano/pull/1225, @zen-xu)
Separate the plugin implementation from the scheduler. Support implementing custom plugins and loading them into vc-scheduler dynamically. (https://github.com/volcano-sh/volcano/pull/1218, @zen-xu)
Support configuring MaxRequeueNum in the vc-scheduler config file; it defaults to 15 times.(https://github.com/volcano-sh/volcano/pull/1087, @shinytang6)
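A bounded requeue limit of this kind can be sketched generically. The code below is not taken from Volcano: `process_with_requeue`, `handler`, and `max_requeue_num` are hypothetical names, and only the default of 15 comes from the note above. An item that keeps failing is put back on the queue until its requeue count reaches the limit, then it is dropped.

```python
from collections import deque


def process_with_requeue(items, handler, max_requeue_num=15):
    """Process items; requeue failures up to `max_requeue_num` times.

    `handler(item)` returns True on success. Returns the dropped items.
    """
    queue = deque((item, 0) for item in items)
    dropped = []
    while queue:
        item, requeues = queue.popleft()
        if handler(item):
            continue  # handled successfully, nothing more to do
        if requeues < max_requeue_num:
            queue.append((item, requeues + 1))  # try again later
        else:
            dropped.append(item)  # limit reached, give up on this item
    return dropped
```

With the default limit, an item that always fails is attempted 16 times in total: the first attempt plus 15 requeues.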
Provide the design for fine-grained CPU regulation at the socket level.(https://github.com/volcano-sh/volcano/pull/1051, @ProgramerGu)
The monitor component now supports displaying some basic Volcano metrics.(https://github.com/volcano-sh/volcano/pull/1066, @alcorj-mizar)
Reserve resources for the pending job that has the highest priority among pending jobs and has waited for a long time. The scheduler recognizes the big job automatically.(https://github.com/volcano-sh/volcano/pull/1044, @Thor-wl)
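The selection rule described here can be sketched in a few lines. The names `PendingJob`, `pick_reservation_target`, and the `min_wait_seconds` threshold are assumptions made for illustration; the note above does not say how "a long time" is defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingJob:
    name: str
    priority: int
    waiting_seconds: float


def pick_reservation_target(jobs: list,
                            min_wait_seconds: float) -> Optional[PendingJob]:
    """Pick the highest-priority pending job that has waited long enough."""
    if not jobs:
        return None
    top = max(job.priority for job in jobs)
    # Only jobs at the highest priority that exceeded the wait threshold.
    candidates = [job for job in jobs
                  if job.priority == top
                  and job.waiting_seconds >= min_wait_seconds]
    # Among those, prefer the one that has waited the longest.
    return max(candidates, key=lambda job: job.waiting_seconds, default=None)
```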
Hierarchical dominant resource fairness is configured with a weighted tree, such that each node in the tree has a positive weight value.(https://github.com/volcano-sh/volcano/pull/928, @ggaaooppeenngg)
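The role of the weights can be shown with a simplified sketch. It is not the full HDRF algorithm and not Volcano's implementation: `QueueNode`, `total_usage`, `weighted_share`, and `next_leaf` are hypothetical names, and the rule used here simply descends the tree towards the child with the lowest dominant share divided by its weight.

```python
from dataclasses import dataclass, field


@dataclass
class QueueNode:
    name: str
    weight: float                                 # must be positive
    usage: dict = field(default_factory=dict)     # set on leaves only
    children: list = field(default_factory=list)


def total_usage(node: QueueNode) -> dict:
    """Usage of a leaf, or the summed usage of all leaves below a node."""
    if not node.children:
        return dict(node.usage)
    total = {}
    for child in node.children:
        for res, amount in total_usage(child).items():
            total[res] = total.get(res, 0) + amount
    return total


def weighted_share(node: QueueNode, capacity: dict) -> float:
    """Dominant share (largest used fraction of any resource) over weight."""
    used = total_usage(node)
    dominant = max((used.get(r, 0) / capacity[r] for r in capacity),
                   default=0.0)
    return dominant / node.weight


def next_leaf(root: QueueNode, capacity: dict) -> QueueNode:
    """Walk down the tree, always taking the child with the lowest share."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: weighted_share(c, capacity))
    return node
```

In this sketch a subtree with weight 2 may use twice the dominant share of a sibling with weight 1 before the sibling is preferred.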
OnJobDelete (https://github.com/volcano-sh/volcano/pull/1005, @hzxuzhonghu)
no-root flag (https://github.com/volcano-sh/volcano/pull/996, @hzxuzhonghu)
This is to support go get