A cookbook of best practices for working with Kubernetes.
In most cases, you learn a platform while meeting a current business need or on standalone projects. The silver lining is that this encouraged, hands-on learning eventually becomes knowledge; however, it can also lead to shortcuts that later cause a series of problems in production environments. The purpose of this guide is to flatten that learning curve and help you prepare a more stable, reliable, and functional environment.
I don't intend to go into infrastructure best practices, but the standard 'paperwork' (private VPC, multiple networks, firewall rules, etc.) also applies to a Kubernetes cluster. The points worth highlighting are:
Network: Set aside a network for the cluster and make sure there is enough address space for pods and services. Decide how many pods per node you want to run and size the CIDR ranges based on that. Each cloud provider has its own variations and rules, so check the documentation. Practical example: GKE reserves double the IPs in each node's pod range based on the maximum pods per node (which goes from 8 to 110), so the default of 110 pods per node translates directly to a /24 range (256 addresses) per node. See the cluster-creation sketch after this list.
Private: Keep nodes and the API server restricted and/or inaccessible from the internet. Use private clusters and, if your team is large enough, separate the environments (development, production, ...) into different projects/accounts and private VPCs.
Infrastructure as Code: Keep all infrastructure versioned and well documented with tools like Terraform, CloudFormation, or Ansible. For deployment management, I particularly think applications deserve a proper CD tool.
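Putting the first two points together, here is a minimal sketch of creating a private GKE cluster with sized ranges (the cluster name, region, and CIDR blocks are hypothetical; flags differ between providers and versions):

# A /17 pod range gives ~128 nodes at a /24 (110 pods) each; GKE requires a /28 for the control-plane range.
gcloud container clusters create my-cluster \
  --region=us-central1 \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.0/28 \
  --cluster-ipv4-cidr=10.0.0.0/17 \
  --services-ipv4-cidr=10.1.0.0/22 \
  --default-max-pods-per-node=110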
Use namespaces profusely!
Simply put, a namespace is a way to organize objects, products, and teams in Kubernetes. Namespaces provide the granularity to separate teams and/or products; in large companies, it's quite common not to know every team, let alone their development models. Therefore, it's important to isolate them and give each one the freedom to build a fast and secure development flow within its limits. Of course, analyze each environment: a small company where everyone knows each other doesn't need so much logical separation, and the cost has to make sense for the business.
Here is an example of how to do it (if possible, set a quota for each namespace):
kubectl create namespace my-first-namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-first-namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 10Gi
    limits.cpu: "20"
    limits.memory: 20Gi
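Note that a ResourceQuota applies to the namespace it's created in, so apply it there (the file name is hypothetical):

kubectl apply -f quota.yaml --namespace=my-first-namespace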
Just as we want teams and/or products to "walk" freely in their own namespaces, we also need to be responsible for security in the cluster. In other words, we don't want a security breach to spread across the whole cluster; after all, behind the cluster there is bare metal susceptible to it. Apply all the security fine-tuning you can and, if possible, don't run containers with root permissions.
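As a minimal sketch of the non-root recommendation (the pod name and the UID/GID are hypothetical and must exist in the image):

apiVersion: v1
kind: Pod
metadata:
  name: non-root-example
spec:
  securityContext:
    runAsNonRoot: true  # The kubelet refuses to start the container if it would run as root.
    runAsUser: 1000     # Hypothetical non-root UID.
    runAsGroup: 3000    # Hypothetical non-root GID.
  containers:
  - name: non-root-example
    image: gcr.io/google-samples/hello-app:1.0
    securityContext:
      allowPrivilegeEscalation: false  # Block setuid binaries from regaining privileges.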
Build a table of mandatory labels to be used on objects deployed in the cluster. Despite being simple and trivial, descriptive labels help with the maintenance, visualization, and understanding of a resource. So create a best-practices table with the recommended labels plus whatever your team deems necessary.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: mysql
    app.kubernetes.io/instance: mysql-abcxzy
    app.kubernetes.io/version: "5.7.21"
    app.kubernetes.io/component: database
    app.kubernetes.io/part-of: wordpress
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/created-by: controller-manager
In any environment, you need to develop the application with health checking in mind. In Kubernetes, the liveness probe is responsible for this. The probe constantly checks the application's health; on failure, the container is restarted and, consequently, stops serving requests. For most cases, an HTTP endpoint /health returning 200 OK is sufficient; however, it's also possible to check via command or TCP.
Here is an example of how to do it:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: liveness
  name: liveness-example
spec:
  containers:
  - name: liveness
    image: gcr.io/google-samples/hello-app:1.0
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 2
Like liveness, the readiness probe is responsible for controlling whether the application is ready to receive requests. In short, when the return is positive, it means all the processes the application needs have completed and it's ready to receive a request. For most cases, an HTTP endpoint /ready returning 200 OK is sufficient; however, it's also possible to check via command or TCP.
Here is an example of how to do it:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: readiness
  name: readiness-example
spec:
  containers:
  - name: readiness
    image: gcr.io/google-samples/hello-app:1.0
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 1
Explicitly set resources on each Pod/Deployment; this gives Kubernetes much better node and scaling management. In practice, with well-defined resources, Kubernetes places applications on suitable nodes, controls the scalability of node pools and applications, and prevents applications from being killed.
Defining resources for an application is not a simple task; however, assertiveness comes with time. A good approach is to use a load-testing tool, such as Locust, to stress the application and watch how resources are actually used. At the same time, it's useful to run a VPA in recommendation mode and compare its hints with the final values you define.
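A minimal sketch of a VPA in recommendation mode, assuming the VPA components are installed in the cluster (the target Deployment name is hypothetical):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Only publish recommendations; never evict or resize pods.

The hints then show up under kubectl describe vpa my-app.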
One suggestion is to set the memory request equal to the limit; for CPU, just set the request. The reason is simple: memory is a non-compressible resource! It can't be throttled, so a container exceeding it gets killed, whereas excess CPU usage is simply throttled.
Here is an example of how to do it:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: hello-resource
  name: hello-resource
spec:
  containers:
  - name: hello-resource
    image: gcr.io/google-samples/hello-app:1.0
    ports:
    - containerPort: 8080
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "64Mi"
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 1
Choose the scaling model according to the application's characteristics. In Kubernetes, it's very common to use the Horizontal Pod Autoscaler (HPA) or the Vertical Pod Autoscaler (VPA).
For most cases, HPA is used with a trigger based on CPU usage. In this case, a good practice to define the target is:
(CPU-HB - safety) / (CPU-HB + growth)
Where:
CPU-HB: the high bound of CPU usage you allow (1 = 100%).
safety: a safety margin kept as headroom.
growth: the traffic growth you expect to absorb while new replicas start.
A practical example is an application where we set the high bound at 100% CPU usage, a safety margin of 15%, and an expected traffic growth of 45% in 5 minutes:
(1 - 0.15) / (1 + 0.45) = 0.58
Here is an example of how to do it:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 58
Regarding the built-in Deployment/ReplicaSet strategies, there are two: Recreate, which kills all old pods before creating the new ones (simple, but with downtime), and RollingUpdate, which gradually replaces old pods with new ones, controlled by maxUnavailable and maxSurge.
Beyond the built-in strategies, the deployment patterns worth highlighting are:
Blue-Green:
A blue/green deployment duplicates the environment into two parallel versions; in other words, two versions are available at once. It's a great way to reduce service downtime and ensure all traffic is transferred immediately.
To take full advantage of this strategy, you need extensions (recommended) such as a service mesh or Knative. However, for small environments, we can also do this manually, since that reduces complexity and, again, the cost has to make business sense. Doing it manually, once both versions are online, we just switch traffic to the new version (green) at the load balancer/ingress, as the sketch below shows.
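A minimal sketch of the manual switch (names and labels are hypothetical): both environments carry a version label, and the Service selects only the live color, so editing the selector flips all traffic at once.

kind: Service
apiVersion: v1
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue  # Switch to "green" to send all traffic to the new environment.
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080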
Canary:
Canary deployment is a relevant way to test new versions without driving all the traffic to them right away. The idea is to split off a small share of customers to the new version and gradually increase it until the whole flow is validated, or discard it.
As with blue-green, it's highly recommended to use dedicated solutions such as HAProxy, NGINX, or Spinnaker. However, we can also do something similar manually, as follows:
kind: Service
apiVersion: v1
metadata:
  name: my-app
spec:
  sessionAffinity: ClientIP  # Pin each client to one pod, so a customer keeps hitting the same version.
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
  type: NodePort
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v1
spec:
  replicas: 9
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: my-app
      version: "1.0"
  template:
    metadata:
      labels:
        app: my-app
        version: "1.0"
    spec:
      containers:
      - name: my-app
        image: gcr.io/google-samples/hello-app:1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v2
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: my-app
      version: "2.0"
  template:
    metadata:
      labels:
        app: my-app
        version: "2.0"
    spec:
      containers:
      - name: my-app
        image: gcr.io/google-samples/hello-app:2.0
In this example, we have one Service exposing two Deployment versions (1.0 and 2.0), where the first has 9 replicas and the second only 1, so roughly 90% of the traffic is expected to go to the first version. That said, it's important to highlight that to guarantee the exact traffic percentages, as well as a more automated and smarter rollout, you need the solutions mentioned above. Treat this example as a workaround for specific cases, not as something definitive and ideal.
The Kubernetes termination cycle is as follows:
1. The Pod is marked Terminating and, in parallel, removed from Service endpoints, so it stops receiving new traffic.
2. The preStop hook, if defined, is executed.
3. SIGTERM is sent to the containers.
4. Kubernetes waits for the grace period (terminationGracePeriodSeconds).
5. SIGKILL is sent to anything still running.
Based on the cycle above, we need to ensure the application is prepared to go through all of these events and finish gracefully without compromising the user experience. Therefore, it's very important to use the preStop hook, handle SIGTERM, and tune the grace period so that we stop accepting new requests and finish the ones in progress.
Here is an example of how to configure:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-terminating
spec:
  terminationGracePeriodSeconds: 60  # Pod-level field: how long Kubernetes waits between SIGTERM and SIGKILL.
  containers:
  - name: lifecycle-terminating
    image: nginx  # The preStop hook below assumes an nginx container.
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit; while killall -0 nginx; do sleep 1; done"]
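To see the cycle in action, run kubectl delete pod lifecycle-terminating and watch the pod sit in Terminating while nginx finishes its in-flight connections, up to the 60-second grace period.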
Develop a strong CI/CD pipeline to ensure all mandatory steps are followed, as well as to smooth the deployment flow for all teams. In a way, we can treat the following features as mandatory: