Kubernetes operator that prescales cluster nodes to ensure that CronJobs start exactly on time
Please be aware that this code base has been marked as ARCHIVED and is not actively maintained.
Prior to archiving, the code in this project was tested against a matrix of Kubernetes builds for each pull request (see "CI" build for details). The code was also built against the latest version of Kubernetes each week (see "Weekly CI" build for details).
The main purpose of this project is to provide a mechanism whereby CronJobs can run on auto-scaling clusters, ensuring that the cluster is scaled up to its desired size before the time at which the CronJob workload needs to begin.
For a workload to start at exactly 16:30, a node in the cluster has to be available and warm at that time. The PreScaledCronJob CRD and operator ensure that a CronJob is scheduled n minutes earlier to force the cluster to prepare a node. A custom init container then runs, blocking the workload from running until the correct time.
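The blocking step described above can be sketched in Go as follows. This is a minimal illustration, not the init container shipped with this project: the names `nextSleep` and `waitUntil`, the polling interval, and the use of sleeping rather than a busy spin are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// nextSleep returns how long to sleep before checking the clock again:
// the poll interval, capped so that we never sleep past the target time.
// It returns 0 once the target has been reached.
func nextSleep(remaining, poll time.Duration) time.Duration {
	if remaining <= 0 {
		return 0
	}
	if remaining < poll {
		return remaining
	}
	return poll
}

// waitUntil blocks until the target time and reports how many sleeps it took.
// (Hypothetical helper; a real init container would simply exit with status 0
// here so that the main workload container can start.)
func waitUntil(target time.Time, poll time.Duration) int {
	sleeps := 0
	for {
		d := nextSleep(time.Until(target), poll)
		if d == 0 {
			return sleeps
		}
		time.Sleep(d)
		sleeps++
	}
}

func main() {
	// Assumed example: the workload is due 300ms from now.
	target := time.Now().Add(300 * time.Millisecond)
	sleeps := waitUntil(target, 100*time.Millisecond)
	fmt.Printf("released after %d sleeps, late by %v\n", sleeps, time.Since(target))
}
```

Capping the final sleep at the remaining time keeps the release close to the target regardless of the poll interval chosen.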
This project defines a custom resource kind, PreScaledCronJob, and an Operator that will reconcile said kind.
When a PreScaledCronJob is created in a cluster, this Operator will create an associated CronJob object that will execute X minutes prior to the real workload and ensure any necessary agent pool machines are "warmed up".
More information on how the CronJob schedule is calculated can be found in the Primed Cronjob Schedules documentation.
The CronJob is associated to the PreScaledCronJob using the Kubernetes OwnerReference mechanism. This allows the CronJob to be deleted automatically when the PreScaledCronJob resource is deleted. For more information, see the Kubernetes documentation.
PreScaledCronJob objects can check for changes on their associated CronJob objects via a generated hash. If this hash does not match the one the PreScaledCronJob expects, the CronJob spec is updated.
The CronJob uses an initContainer spec to spin-wait, thus warming up the agent pool and forcing it to scale up to the desired state ahead of the real workload. For more information, see the Kubernetes Init Container documentation.
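The hash-based change detection mentioned above can be sketched as follows. This is only an illustration of the idea: the `cronJobSpec` struct is a hypothetical, heavily trimmed stand-in for the real CronJob spec, and the choice of SHA-256 over the JSON encoding is an assumption, not necessarily what the operator does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// cronJobSpec is a hypothetical stand-in for the generated CronJob spec.
type cronJobSpec struct {
	Schedule string `json:"schedule"`
	Image    string `json:"image"`
}

// specHash returns a deterministic hex digest of the spec. json.Marshal
// emits struct fields in declaration order, so equal specs hash equally.
func specHash(spec cronJobSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate reports whether the stored hash no longer matches the desired spec.
func needsUpdate(storedHash string, desired cronJobSpec) (bool, error) {
	current, err := specHash(desired)
	if err != nil {
		return false, err
	}
	return current != storedHash, nil
}

func main() {
	desired := cronJobSpec{Schedule: "45,15 * * * *", Image: "example/workload:1"}
	stored, err := specHash(desired)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	unchanged, _ := needsUpdate(stored, desired)
	fmt.Println("update needed for identical spec:", unchanged)

	desired.Schedule = "0,30 * * * *"
	changed, _ := needsUpdate(stored, desired)
	fmt.Println("update needed after schedule change:", changed)
}
```

Storing one hash alongside the generated object lets a reconcile loop decide whether to update it with a single string comparison instead of a deep field-by-field diff.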
To ensure a smooth deployment process, for both local and remote deployments, we recommend you use the dev container provided within this repo.
This container provides you with all the assemblies and CLI tools required to perform the actions below.
For more information about dev containers, please refer to https://code.visualstudio.com/docs/remote/containers
If you are using the development container, you have the option of deploying the Operator into a local test Kubernetes cluster provided by the KIND toolset.
To deploy to a local K8s/Kind instance:
make deploy-kind
To deploy to a remote cluster, build and push the init container image, then build, push, and deploy the operator image:
make docker-build-initcontainer docker-push-initcontainer INIT_IMG=<some-registry>/initcontainer:<tag>
make docker-build docker-push IMG=<some-registry>/prescaledcronjoboperator:<tag> INIT_IMG=<some-registry>/initcontainer:<tag>
make deploy-cluster IMG=<some-registry>/prescaledcronjoboperator:<tag> INIT_IMG=<some-registry>/initcontainer:<tag>
Once the deployment is complete, check what has been installed:
kubectl get all -n psc-system
A sample yaml is provided for you in the config folder. Apply it with:
kubectl apply -f config/samples/psc_v1alpha1_prescaledcronjob.yaml
Then check that the PreScaledCronJob and its generated CronJob exist:
kubectl get prescaledcronjobs -A
kubectl get cronjobs -A
The output should look similar to this:
NAMESPACE    NAME                      AGE
psc-system   prescaledcronjob-sample   30s
NAMESPACE    NAME                              SCHEDULE        SUSPEND   ACTIVE   LAST SCHEDULE   AGE
psc-system   autogen-prescaledcronjob-sample   45,15 * * * *   False     0        <none>          39s
If you do not see the output above, please review the debugging documentation. Deleting the PreScaledCronJob resource will clean up the CronJob automatically.
Before the actual cronjob kicks off, an init container pre-warms the cluster so all nodes are immediately available when the cronjob is intended to run.
There are two ways to define this primer schedule:
Set warmUpTimeMins under the PreScaledCronJob spec. This generates a primed cronjob schedule based on your original schedule and the number of minutes you want to pre-warm your cluster. It can be defined as follows (an example yaml is provided in config/samples/psc_v1alpha1_prescaledcronjob.yaml):
kind: PreScaledCronJob
spec:
  warmUpTimeMins: 5
  cronJob:
    spec:
      schedule: "5/30 * * * *"
Set a pre-defined primerSchedule under the PreScaledCronJob. The primer schedule below results in exactly the same pre-warming and cron schedule as the example above (an example yaml is provided in config/samples/psc_v1alpha1_prescaledcronjob_primerschedule.yaml):
kind: PreScaledCronJob
spec:
  primerSchedule: "*/30 * * * *"
  cronJob:
    spec:
      schedule: "5/30 * * * *"
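The relationship between the two examples above (a workload schedule plus a warm-up time versus an explicit primer schedule) can be illustrated with a small Go sketch. This is a deliberately simplified model and not the operator's actual algorithm: the function names are hypothetical, only the minute field is shifted, warm-up times are limited to 0-59 minutes, and schedules that restrict hours, days, or months are rejected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandMinutes expands a cron minute field into concrete minutes.
// Only "*", "*/step", "start/step", single values and comma lists are handled.
func expandMinutes(field string) ([]int, error) {
	var out []int
	for _, part := range strings.Split(field, ",") {
		start, step := part, 0
		if i := strings.Index(part, "/"); i >= 0 {
			s, err := strconv.Atoi(part[i+1:])
			if err != nil || s <= 0 {
				return nil, fmt.Errorf("unsupported step in %q", part)
			}
			start, step = part[:i], s
		}
		first, last := 0, 59
		if start != "*" {
			n, err := strconv.Atoi(start)
			if err != nil || n < 0 || n > 59 {
				return nil, fmt.Errorf("unsupported minute %q", start)
			}
			first = n
			if step == 0 {
				last = n // a single value, not a stepped range
			}
		}
		if step == 0 {
			step = 1
		}
		for m := first; m <= last; m += step {
			out = append(out, m)
		}
	}
	return out, nil
}

// primerSchedule shifts every minute of the schedule back by warmUpMins,
// wrapping around the hour.
func primerSchedule(schedule string, warmUpMins int) (string, error) {
	fields := strings.Fields(schedule)
	if len(fields) != 5 {
		return "", fmt.Errorf("expected 5 cron fields, got %d", len(fields))
	}
	if warmUpMins < 0 || warmUpMins > 59 {
		return "", fmt.Errorf("warm-up of %d minutes is outside 0-59", warmUpMins)
	}
	for _, f := range fields[1:] {
		if f != "*" {
			return "", fmt.Errorf("only minute-level schedules are supported: %q", schedule)
		}
	}
	minutes, err := expandMinutes(fields[0])
	if err != nil {
		return "", err
	}
	shifted := make([]string, len(minutes))
	for i, m := range minutes {
		shifted[i] = strconv.Itoa((m - warmUpMins + 60) % 60)
	}
	fields[0] = strings.Join(shifted, ",")
	return strings.Join(fields, " "), nil
}

func main() {
	primer, err := primerSchedule("5/30 * * * *", 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(primer) // 0,30 * * * *
}
```

In this model, `"5/30 * * * *"` with a 5 minute warm-up yields `0,30 * * * *`, which fires at the same times as the `*/30 * * * *` primer schedule shown in the second example.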
Please review the debugging documentation
Please review the monitoring documentation
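This repo contains 3 types of tests, which are logically separated:
Unit tests, run with go test. To run: make unit-tests
Integration tests that run against a local KIND cluster. To run: make kind-tests
Long-running integration tests, also against KIND. To run: make kind-long-tests
Run make fmt to automatically format your code.
Many samples in the Kubernetes docs show requests and limits of a container using plain integer values, such as: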
requests:
  nvidia.com/gpu: 1
The generated yaml schema definition for the PreScaledCronJob just sets the validation for these properties to strings, rather than what they should be (integer | string with a fixed regex format). This means we need to apply a patch (/config/crd/patches/resource-type-patch.yaml) to override the autogenerated type. This information may come in handy in future if other edge cases are found.
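To see why a string-only schema is too strict, the following Go sketch accepts a resource amount written either as a bare integer or as a string, and then checks it against a format. The `quantity` type, the `parseRequests` helper, and the regular expression are hypothetical simplifications for illustration; the real Kubernetes quantity grammar accepts more suffixes and exponents.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// quantityPattern is a simplified, assumed format check, not the exact
// pattern used by Kubernetes.
var quantityPattern = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?(m|k|M|G|T|Ki|Mi|Gi|Ti)?$`)

// quantity holds a resource amount that may arrive as a JSON number or string.
type quantity string

// UnmarshalJSON accepts both 1 and "1", then validates the textual form.
func (q *quantity) UnmarshalJSON(data []byte) error {
	s := string(data)
	if len(data) > 0 && data[0] == '"' {
		// Quoted value: decode it as a regular JSON string.
		if err := json.Unmarshal(data, &s); err != nil {
			return err
		}
	}
	if !quantityPattern.MatchString(s) {
		return fmt.Errorf("invalid quantity %s", data)
	}
	*q = quantity(s)
	return nil
}

// parseRequests decodes a document of the form {"requests": {...}}.
func parseRequests(doc string) (map[string]quantity, error) {
	var c struct {
		Requests map[string]quantity `json:"requests"`
	}
	if err := json.Unmarshal([]byte(doc), &c); err != nil {
		return nil, err
	}
	return c.Requests, nil
}

func main() {
	reqs, err := parseRequests(`{"requests": {"nvidia.com/gpu": 1, "cpu": "500m"}}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reqs["nvidia.com/gpu"], reqs["cpu"])
}
```

A validator that only allowed strings would reject the `nvidia.com/gpu: 1` form even though it is common in Kubernetes samples.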