A Kubernetes Scheduler Extender to provide gang scheduling support for Spark on Kubernetes
This project is no longer maintained.
`k8s-spark-scheduler-extender` is a Kubernetes Scheduler Extender designed to provide gang scheduling capabilities for running Apache Spark on Kubernetes.

Running Spark applications at scale on Kubernetes with the default kube-scheduler is prone to resource starvation and oversubscription: naively scheduled driver pods can occupy space that should be reserved for their executors. Using `k8s-spark-scheduler-extender` guarantees that a driver is only scheduled if there is room in the cluster for all of its executors. It can also guarantee scheduling order for drivers with respect to their creation timestamps.
Requirements:

The Spark scheduler extender is a Witchcraft server, and uses Godel for testing and building. It is meant to be deployed with a new kube-scheduler instance running alongside the default scheduler. This way, non-Spark pods continue to be scheduled by the default scheduler, while opt-in pods are scheduled by the `spark-scheduler`.
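Under the hood, the new kube-scheduler instance is pointed at the extender through a scheduler policy. As a minimal sketch of that wiring, assuming the extender listens on `localhost:8484` and serves a `filter` endpoint (both values here are illustrative assumptions; the real wiring ships in `examples/extender.yml` used below), a legacy kube-scheduler policy delegating filter decisions to an extender looks like:

```yaml
# Minimal sketch of a legacy kube-scheduler Policy wired to a scheduler
# extender. urlPrefix and filterVerb are illustrative assumptions; see
# examples/extender.yml for the actual setup.
apiVersion: v1
kind: Policy
extenders:
  - urlPrefix: "http://localhost:8484"  # assumed extender address
    filterVerb: "filter"                # scheduler POSTs candidate nodes to <urlPrefix>/<filterVerb>
    enableHttps: false
    nodeCacheCapable: false
    ignorable: false                    # treat extender failures as scheduling failures
```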
To set up the scheduler extender as a new scheduler named `spark-scheduler`, run:

```
kubectl apply -f examples/extender.yml
```
This will create a new service account, a cluster role binding for permissions, a config map, and a deployment, all under the namespace `spark`. It is worth noting that this example sets up the new scheduler with a super user.

`k8s-spark-scheduler-extender` groups nodes in the cluster by a label specified in its configuration; nodes that this scheduler should consider must have this label set. FIFO order is preserved for pods that have a node affinity or a node selector set for the same `instance-group` label. The given example configuration sets this label to `instance-group`.
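For example, assuming nodes are labeled with `instance-group: main` (`main` being just an illustrative group name, as used by the test script below), an opt-in pod would target them like this:

```yaml
# Fragment of a pod spec that opts in to the new scheduler and pins the
# pod to a labeled instance group; "main" is an example value.
spec:
  schedulerName: spark-scheduler
  nodeSelector:
    instance-group: main
```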
Refer to Spark's website for documentation on running Spark with Kubernetes. To schedule a Spark application using spark-scheduler, you must apply the following metadata to its driver and executor pods (annotation values are strings, so numeric values must be quoted).

Driver pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-app-id: my-custom-id
  annotations:
    spark-driver-cpu: "1"
    spark-driver-mem: 1Gi
    spark-executor-cpu: "2"
    spark-executor-mem: 4Gi
    spark-executor-count: "8"
spec:
  schedulerName: spark-scheduler
```

Executor pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-app-id: my-custom-id
spec:
  schedulerName: spark-scheduler
```
As of f6cc354d83, Spark supports specifying pod templates for drivers and executors. Although Spark configuration can also be used to apply labels and annotations, the pod template feature is the only way of setting `schedulerName`. To apply the above overrides, save them as files and set these configuration overrides:

```
"spark.kubernetes.driver.podTemplateFile": "/path/to/driver.template",
"spark.kubernetes.executor.podTemplateFile": "/path/to/executor.template"
```
`k8s-spark-scheduler-extender` also supports running Spark applications in dynamic allocation mode. You can find more information about how to configure Spark to make use of dynamic allocation in the Spark documentation.

To inform `k8s-spark-scheduler-extender` that you are running an application with dynamic allocation enabled, omit the `spark-executor-count` annotation on the driver pod and instead set the following three annotations:
- `spark-dynamic-allocation-enabled`: "true"
- `spark-dynamic-allocation-min-executor-count`: the minimum number of executors to always reserve resources for. Should equal the `spark.dynamicAllocation.minExecutors` value set in the Spark configuration.
- `spark-dynamic-allocation-max-executor-count`: the maximum number of executors the application may request at a given time. Should equal the `spark.dynamicAllocation.maxExecutors` value set in the Spark configuration.

If dynamic allocation is enabled, `k8s-spark-scheduler-extender` guarantees that your application is only scheduled if the driver and the minimum number of executors fit in the cluster. Executors over the minimum are not reserved for, and are only scheduled if there is capacity for them when the application requests them.
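Put together, a driver pod for a dynamically allocated application would look like the following sketch (the resource values and executor counts here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-app-id: my-custom-id
  annotations:
    spark-driver-cpu: "1"
    spark-driver-mem: 1Gi
    spark-executor-cpu: "2"
    spark-executor-mem: 4Gi
    # spark-executor-count is omitted; the three annotations below replace it
    spark-dynamic-allocation-enabled: "true"
    spark-dynamic-allocation-min-executor-count: "2"   # match spark.dynamicAllocation.minExecutors
    spark-dynamic-allocation-max-executor-count: "10"  # match spark.dynamicAllocation.maxExecutors
spec:
  schedulerName: spark-scheduler
```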
`k8s-spark-scheduler-extender` is a Witchcraft service, and supports the configuration options detailed in the GitHub documentation. Additional configuration options are the executor placement modes `distribute-evenly` and `tightly-pack`, the former being the default. They differ in how they distribute executors across nodes: `distribute-evenly` round-robins over available nodes, whereas `tightly-pack` fills one node before moving to the next.

Use `./godelw docker build` to build an image using the Dockerfile template. The built image will use the default configuration. The deployment created by `kubectl apply -f examples/extender.yml` can be used to iterate locally.
Use `./examples/submit-test-spark-app.sh <id> <executor-count> <driver-cpu> <driver-mem> <driver-nvidia-gpus> <executor-cpu> <executor-mem> <executor-nvidia-gpus>` to mock a Spark application launch. Created pods will have a node selector for `instance-group: main`, so the desired nodes in the cluster should be given this label.
Use `./godelw verify` to run tests and style checks.
The team welcomes contributions! To make changes, fork the repository, make your changes on a branch, ensure `./godelw verify` passes, and open a pull request.
This project is made available under the Apache 2.0 License.