This sample shows how to create a private AKS cluster using Terraform and Azure DevOps
This sample shows how to create a private AKS cluster using:
In a private AKS cluster, the API server endpoint is not exposed via a public IP address. Hence, to manage the API server, you will need to use a virtual machine that has access to the AKS cluster's Azure Virtual Network (VNet). This sample deploys a jumpbox virtual machine in the hub virtual network peered with the virtual network that hosts the private AKS cluster. There are several options for establishing network connectivity to the private cluster.
Creating a virtual machine in the same virtual network as the AKS cluster, or in a peered virtual network, is the easiest option. ExpressRoute and VPNs add costs and require additional networking complexity. Virtual network peering requires you to plan your network CIDR ranges to ensure there are no overlapping ranges. For more information, see Create a private Azure Kubernetes Service cluster. For more information on Azure Private Link, see What is Azure Private Link?
In addition, the sample creates a private endpoint to access all the managed services deployed by the Terraform modules via a private IP address:
NOTE
If you want to deploy a private AKS cluster using a public DNS zone to simplify the DNS resolution of the API Server to the private IP address of the private endpoint, you can use this project under my GitHub account or on Azure Quickstart Templates.
The following picture shows the high-level architecture created by the Terraform modules included in this sample:
The following picture provides a more detailed view of the infrastructure on Azure.
The architecture is composed of the following elements:
A private AKS cluster has the following limitations:
There are some requirements you need to complete before you can deploy the Terraform modules using Azure DevOps.
When you deploy an Azure Firewall into a hub virtual network and your private AKS cluster into a spoke virtual network, and you want to use the Azure Firewall to control the egress traffic using network and application rule collections, you need to make sure that the ingress traffic to any public endpoint exposed by any service running on AKS enters the system via one of the public IP addresses used by the Azure Firewall. In order to route the traffic of your AKS workloads to the Azure Firewall in the hub virtual network, you need to create and associate a route table with each subnet hosting the worker nodes of your cluster, and create a user-defined route that forwards the traffic for the `0.0.0.0/0` CIDR to the private IP address of the Azure Firewall, with `Virtual appliance` as the next hop type. For more information, see Tutorial: Deploy and configure Azure Firewall using the Azure portal.
When you introduce an Azure Firewall to control the egress traffic from your private AKS cluster, you need to configure the internet traffic to go through one of the public IP addresses associated with the Azure Firewall, in front of the Public Standard Load Balancer used by your AKS cluster. This is where a problem can occur: packets arrive on the firewall's public IP address, but return to the firewall via its private IP address (using the default route). To avoid this problem, create an additional user-defined route for the firewall's public IP address, as shown in the picture below. Packets going to the firewall's public IP address are routed via the Internet, which avoids taking the default route to the firewall's private IP address.
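The two user-defined routes described above can be sketched in Terraform as follows. This is a minimal illustration, not this sample's actual module: the resource names, variable names, and IP addresses are assumptions.

```hcl
# Route table associated with the subnets hosting the AKS node pools
# (illustrative names and variables, not from this repository).
resource "azurerm_route_table" "aks" {
  name                = "aks-route-table"
  location            = var.location
  resource_group_name = var.resource_group_name
}

# Default route: forward all egress traffic to the Azure Firewall private IP.
resource "azurerm_route" "default_via_firewall" {
  name                   = "default-via-firewall"
  resource_group_name    = var.resource_group_name
  route_table_name       = azurerm_route_table.aks.name
  address_prefix         = "0.0.0.0/0"
  next_hop_type          = "VirtualAppliance"
  next_hop_in_ip_address = var.firewall_private_ip # e.g. 10.0.0.4
}

# More specific route: traffic destined to the firewall's public IP goes via
# the Internet, so return packets don't take the default route back to the
# firewall's private IP address.
resource "azurerm_route" "firewall_public_ip" {
  name                = "firewall-public-ip-via-internet"
  resource_group_name = var.resource_group_name
  route_table_name    = azurerm_route_table.aks.name
  address_prefix      = "${var.firewall_public_ip}/32"
  next_hop_type       = "Internet"
}
```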
For more information, see:
In order to deploy Terraform modules to Azure you can use Azure DevOps CI/CD pipelines. Azure DevOps provides developer services for support teams to plan work, collaborate on code development, and build and deploy applications and infrastructure components using IaC technologies such as ARM Templates, Bicep, and Terraform.
Terraform stores state about your managed infrastructure and configuration in a special file called the state file. This state is used by Terraform to map real-world resources to your configuration, keep track of metadata, and improve performance for large infrastructures. Terraform state is used to reconcile deployed resources with Terraform configurations. When using Terraform to deploy Azure resources, the state allows Terraform to know which Azure resources to add, update, or delete. By default, Terraform state is stored in a local file named "terraform.tfstate", but it can also be stored remotely, which works better in a team environment. Storing the state in a local file isn't ideal for the following reasons:
Each Terraform configuration can specify a backend, which defines where and how operations are performed and where state snapshots are stored. The Azure Provider, or azurerm, can be used to configure infrastructure in Microsoft Azure using the Azure Resource Manager APIs. Terraform provides a backend for the Azure Provider that allows you to store the state as a Blob with a given Key within a given Blob Container inside a Blob Storage Account. This backend also supports state locking and consistency checking via native capabilities of Azure Blob Storage. When using Azure DevOps to deploy services to a cloud environment, you should use this backend to store the state in a remote storage account. For more information on how to create and use a storage account to store remote Terraform state, state locking, and encryption at rest, see Store Terraform state in Azure Storage. Under the storage-account folder in this sample, you can find a Terraform module and bash script to deploy an Azure storage account where you can persist the Terraform state as a blob.
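A minimal sketch of the azurerm backend configuration described above; the resource group, account, container, and key names are placeholders, not this sample's actual values:

```hcl
terraform {
  backend "azurerm" {
    # Placeholders: point these at the storage account created by the
    # storage-account module in this sample.
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatestorage"
    container_name       = "tfstate"
    key                  = "aks.terraform.tfstate"
  }
}
```

With this block in place, `terraform init` configures the remote backend, and state locking is handled automatically via blob leases.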
If you plan to use Azure DevOps, you can't use Azure DevOps Microsoft-hosted agents to deploy your workloads to a private AKS cluster, as they don't have access to its API server. In order to deploy workloads to your private AKS cluster, you need to provision and use an Azure DevOps self-hosted agent in the same virtual network as your private AKS cluster or in a peered virtual network. In the latter case, make sure to create a virtual network link between the Private DNS Zone of the AKS cluster in the node resource group and the virtual network that hosts the Azure DevOps self-hosted agent. You can deploy a single Windows or Linux Azure DevOps agent using a virtual machine, or use a virtual machine scale set (VMSS). For more information, see Azure virtual machine scale set agents. For more information, see:
As an alternative, you can set up a self-hosted agent in Azure Pipelines to run inside a Windows Server Core container (for Windows hosts) or an Ubuntu container (for Linux hosts) with Docker, and deploy it as a pod with one or multiple replicas in your private AKS cluster. If the subnets hosting the node pools of your private AKS cluster are configured to route the egress traffic to an Azure Firewall via a route table and user-defined route, make sure to create the proper application and network rules to allow the agent to access external sites to download and install tools like Docker, kubectl, Azure CLI, and Helm on the agent virtual machine. For more information, see Run a self-hosted agent in Docker and Build and deploy Azure DevOps Pipeline Agent on AKS.
The cd-self-hosted-agent pipeline in this sample deploys a self-hosted Linux agent as an Ubuntu Linux virtual machine in the same virtual network hosting the private AKS cluster. The pipeline uses a Terraform module under the agent folder to deploy the virtual machine. Make sure to specify values for the variables in the cd-self-hosted-agent and in the agent.tfvars. The following picture represents the network topology of Azure DevOps and self-hosted agent.
The key-vault folder contains a bash script that uses Azure CLI to store the following data in an Azure Key Vault. This sensitive data will be used by Azure DevOps CD pipelines via variable groups. Variable groups store values and secrets that you want to pass into a YAML pipeline or make available across multiple pipelines. You can use variable groups in multiple pipelines in the same project, and you can link an existing Azure Key Vault to a variable group and select which secrets you want to expose as variables in the variable group. For more information, see Link secrets from an Azure Key Vault.
The YAML pipelines in this sample use a variable group shown in the following picture:
The variable group is configured to use the following secrets from an existing Key Vault:
Variable | Description |
---|---|
terraformBackendContainerName | Name of the blob container holding the Terraform remote state |
terraformBackendResourceGroupName | Resource group name of the storage account that contains the Terraform remote state |
terraformBackendStorageAccountKey | Key of the storage account that contains the Terraform remote state |
terraformBackendStorageAccountName | Name of the storage account that contains the Terraform remote state |
sshPublicKey | Key used by Terraform to configure the SSH public key for the administrator user of the virtual machine and AKS worker nodes |
azureDevOpsUrl | Url of your Azure DevOps Organization (e.g. https://dev.azure.com/contoso) |
azureDevOpsPat | Personal access token used by an Azure DevOps self-hosted agent |
azureDevOpsAgentPoolName | Name of the agent pool of the Azure DevOps self-hosted agent |
You can use Azure DevOps YAML pipelines to deploy resources to the target environment. Pipelines are part of the same Git repo that contains the artifacts such as Terraform modules and scripts, and as such pipelines can be versioned like any other file in the Git repository. You can follow a pull-request process to ensure changes are verified and approved before being merged. The following picture shows the key concepts of an Azure DevOps pipeline.
For more information on Azure DevOps pipelines, see:
This sample provides three pipelines to deploy the infrastructure using Terraform modules, and one to undeploy the infrastructure.
Pipeline Name | Description |
---|---|
cd-validate-plan-apply-one-stage-tfvars | In Terraform, to set a large number of variables, you can specify their values in a variable definitions file (with a filename ending in either .tfvars or .tfvars.json ) and then specify that file on the command line with the -var-file parameter. For more information, see Input Variables. The sample contains three different .tfvars files under the tfvars folder. Each file contains a different value for each variable and can be used to deploy the same infrastructure to three distinct environments: production, staging, and test. |
cd-validate-plan-apply-one-stage-vars | This pipeline specifies variable values for Terraform plan and apply commands with the -var command line option. For more information, see Input Variables. |
cd-validate-plan-apply-separate-stages.yml | This pipeline is composed of three distinct stages for validate, plan, and apply. Each stage can be run separately. |
destroy-aks-deployment | This pipeline uses the destroy command to fully remove the resource group and all the Azure resources. |
cd-self-hosted-agent | This pipeline can be used to deploy an Azure DevOps self-hosted agent as an Ubuntu virtual machine in the same subnet as the jump-box virtual machine. This deployment requires you to pass the following information as parameters: |
cd-redmine-via-helm | This pipeline can be used to deploy the Bitnami redmine project management web application using a Helm chart from ArtifactHub. This pipeline creates all the necessary Azure resources to front the Public IP of the Standard Load Balancer used by the service with the Azure Firewall in the Hub virtual network and expose the service with a hostname defined in an Azure public DNS zone. For more information, see: |
destroy-self-hosted-agent | This pipeline can be used to destroy the Azure DevOps self-hosted agent. |
destroy-redmine-via-helm | This pipeline can be used to uninstall the Bitnami redmine project management web application using a Helm chart and destroy all the Azure resources used to expose the service via the Azure Firewall and the AKS cluster Standard Load Balancer. |
ci-test-web-app | This pipeline can be used to build the container image of the test web application and store it to an Azure Container Registry. In addition, the pipeline stores the Helm chart to another repository inside the registry. |
cd-test-web-app | This pipeline can be used to deploy the test web application using a Helm chart. This pipeline creates all the necessary Azure resources to front the Public IP of the Standard Load Balancer used by the service with the Azure Firewall in the Hub virtual network and expose the service with a hostname defined in an Azure public DNS zone. For more information, see: |
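As a hedged illustration of the -var-file approach used by the cd-validate-plan-apply-one-stage-tfvars pipeline, a variable definitions file might look like the following. The variable names and values are assumptions for illustration, not the repository's actual contents:

```hcl
# tfvars/production.tfvars (illustrative only)
resource_group_name = "production-rg"
location            = "westeurope"
aks_cluster_name    = "production-aks"
kubernetes_version  = "1.29"
```

Such a file would then be passed on the command line, for example with `terraform apply -var-file="tfvars/production.tfvars"`.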
All the pipelines make use of the tasks of the Terraform extension. This extension provides the following components:
The Terraform tool installer task acquires a specified version of Terraform from the Internet or the tools cache and prepends it to the PATH of the Azure Pipelines Agent (hosted or private). This task can be used to change the version of Terraform used in subsequent tasks. Adding this task before the Terraform task in a build definition ensures you are using that task with the right Terraform version.
The Terraform task enables running Terraform commands as part of Azure Build and Release Pipelines providing support for the following Terraform commands
This extension is intended to run on Windows, Linux, and macOS agents. As an alternative, you can use the [Bash Task](https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/bash?view=azure-devops) or PowerShell Task to install Terraform on the agent and run Terraform commands.
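For illustration, the installer and Terraform tasks might be combined in a pipeline as follows. The task versions, Terraform version, and service connection and backend names are assumptions that depend on the extension installed in your organization, not values from this repository:

```yaml
steps:
  - task: TerraformInstaller@0
    displayName: Install Terraform
    inputs:
      terraformVersion: '1.5.7'        # assumption: pick the version you need

  - task: TerraformTaskV4@4            # task version depends on the installed extension
    displayName: Terraform init
    inputs:
      provider: 'azurerm'
      command: 'init'
      workingDirectory: '$(System.DefaultWorkingDirectory)/terraform'
      backendServiceArm: 'my-service-connection'         # assumption
      backendAzureRmResourceGroupName: 'tfstate-rg'      # assumption
      backendAzureRmStorageAccountName: 'tfstatestorage' # assumption
      backendAzureRmContainerName: 'tfstate'
      backendAzureRmKey: 'aks.terraform.tfstate'
```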
The following picture shows the resources deployed by the ARM template in the target resource group using one of the Azure DevOps pipelines in this repository.
The following picture shows the resources deployed by the ARM template in the MC resource group associated to the AKS cluster:
Resource definitions in the Terraform modules make use of the lifecycle meta-argument to customize the actions when Azure resources are changed outside of Terraform control. The ignore_changes argument is used to instruct Terraform to ignore updates to given resource properties, such as tags. The Azure Firewall Policy resource definition contains a lifecycle block to prevent Terraform from reverting the resource when a rule collection or a single rule gets created, updated, or deleted. Likewise, the Azure Route Table contains a lifecycle block to prevent Terraform from reverting the resource when a user-defined route gets created, deleted, or updated. This makes it possible to manage the DNAT, Application, and Network rules of an Azure Firewall Policy and the user-defined routes of an Azure Route Table outside of Terraform control.
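A minimal sketch of the lifecycle meta-argument described above, shown here on a route table resource with illustrative names and variables:

```hcl
resource "azurerm_route_table" "aks" {
  name                = "aks-route-table"   # illustrative name
  location            = var.location
  resource_group_name = var.resource_group_name

  lifecycle {
    # Don't revert tags or inline routes that are created, updated, or
    # deleted outside of Terraform, e.g. the user-defined routes added
    # by the CD pipelines.
    ignore_changes = [tags, route]
  }
}
```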
The cd-redmine-via-helm pipeline shows how you can deploy a workload to a private AKS cluster using an Azure DevOps Pipelines that runs on a Self-hosted Agent. The sample deploys the Bitnami redmine project management web application using a public Helm chart. The following diagram shows the network topology of the sample:
The message flow can be described as follows: the workload is exposed via a public IP address of the Azure Firewall, whose DNAT rules translate the inbound traffic to the `kubernetes` public Standard Load Balancer of the AKS cluster in the node resource group, while the egress traffic from the subnets hosting the AKS node pools is routed to the private IP address of the Azure Firewall by a user-defined route with `0.0.0.0/0` as address prefix and `Virtual appliance` as next hop type.

The cd-redmine-via-helm pipeline performs the following steps:

1. Checks whether a public IP address named `AksName_HelmReleaseNamespace_ServiceName` already exists in a given resource group.
2. Checks whether an Azure Firewall IP configuration named `AksName_HelmReleaseNamespace_ServiceName` already exists. If not, it creates a new Azure Firewall IP configuration using the az network firewall ip-config create command.
3. Checks whether a DNAT rule collection named `DnatRules` already exists in the Azure Firewall Policy. If not, it creates a new DNAT rule collection named `DnatRules` under the `DefaultDnatRuleCollectionGroup` rule collection group using the az network firewall policy rule-collection-group collection add-filter-collection command.
4. Creates a DNAT rule named `AksName_HelmReleaseNamespace_ServiceName` that translates the traffic received by the public IP address used by the `AksName_HelmReleaseNamespace_ServiceName` Azure Firewall IP configuration to port 80 of the public IP address exposed by the redmine service on the Standard Load Balancer of the private AKS cluster (in a production environment you should use port 443 and the HTTPS transport protocol instead of port 80 and the insecure HTTP transport protocol).
5. Creates a user-defined route named `AksName_HelmReleaseNamespace_ServiceName` that routes the traffic destined to the firewall's public IP address directly to the internet. This route is more specific than the user-defined route with CIDR `0.0.0.0/0` that routes the traffic from the subnets hosting the AKS node pools to the private IP address of the Azure Firewall. This user-defined route makes it possible to properly send response messages back to the public IP address of the Azure Firewall IP configuration used to expose the redmine Kubernetes service.

Likewise, the destroy-redmine-via-helm pipeline shows how you can undeploy a workload from a private AKS cluster using an Azure DevOps pipeline that runs on a self-hosted agent. The pipeline performs the following steps:
1. Gets the AKS cluster credentials using the az aks get-credentials command.
2. Uses the Helm CLI to uninstall the redmine release.
3. Uses kubectl to delete the Kubernetes namespace used by the release.
4. Uses the az network firewall policy rule-collection-group collection rule remove command to remove the DNAT rule named `AksName_HelmReleaseNamespace_ServiceName` from the `DnatRules` rule collection of the Azure Firewall Policy.
5. Uses the az network route-table route delete command to delete the user-defined route named `AksName_HelmReleaseNamespace_ServiceName` from the Azure Route Table associated with the subnets hosting the node pools of the AKS cluster.
6. Uses the az network firewall ip-config delete command to delete the Azure Firewall IP configuration named `AksName_HelmReleaseNamespace_ServiceName` used to expose the redmine Kubernetes service.
7. Uses the az network public-ip delete command to destroy the Azure Public IP named `AksName_HelmReleaseNamespace_ServiceName` used to expose the redmine Kubernetes service.
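The per-release Azure resources above share a single `AksName_HelmReleaseNamespace_ServiceName` name. A trivial sketch of how a pipeline script might compose that name; the variable values are illustrative assumptions, not values from this repository:

```shell
#!/bin/sh
# Compose the shared name used for the public IP, firewall IP configuration,
# DNAT rule, and user-defined route of a given Helm release.
aksName="myAksCluster"            # assumption: example values only
helmReleaseNamespace="redmine"
serviceName="redmine"

resourceName="${aksName}_${helmReleaseNamespace}_${serviceName}"
echo "$resourceName"
```

Using one deterministic name makes the create and destroy pipelines idempotent: each step can check whether the resource already exists before creating or deleting it.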
In a production environment where Azure Firewall is used to inspect, protect, and filter inbound internet traffic with Azure Firewall DNAT rules and Threat intelligence-based filtering, it's a good practice to use an API Gateway to expose web applications and REST APIs to the public internet.
Without an API gateway, client apps would have to send requests directly to the Kubernetes-hosted microservices, which raises the following problems:
When running applications on AKS, you can use one of the following API Gateways:
In this scenario, an ASP.NET Core application is hosted as a service by an Azure Kubernetes Service cluster and fronted by an NGINX ingress controller. The application code is available under the source folder, while the Helm chart is available in the chart folder. The NGINX ingress controller is exposed via an internal load balancer with a private IP address in the spoke virtual network that hosts the AKS cluster. For more information, see Create an ingress controller to an internal virtual network in Azure Kubernetes Service (AKS). When you deploy an NGINX ingress controller, or more generally a `LoadBalancer` or `ClusterIP` service with the `service.beta.kubernetes.io/azure-load-balancer-internal: "true"` annotation in the metadata section, an internal standard load balancer called `kubernetes-internal` gets created under the node resource group. For more information, see Use an internal load balancer with Azure Kubernetes Service (AKS). As shown in the picture below, the test web application is exposed via the Azure Firewall using a dedicated Azure public IP.
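A minimal sketch of a Kubernetes service carrying the annotation described above; the service name, selector, and port are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller        # illustrative name
  annotations:
    # Instructs AKS to expose the service on the internal
    # kubernetes-internal load balancer instead of a public one.
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: 443
```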
The message flow can be described as follows: inbound requests reach the NGINX ingress controller through the DNAT rules of the Azure Firewall, while the egress traffic from the subnets hosting the AKS node pools is routed to the private IP address of the Azure Firewall by a user-defined route with `0.0.0.0/0` as address prefix and `Virtual appliance` as next hop type.

The ci-test-web-app pipeline performs the following steps:

1. Uses the `docker build` and `docker push` commands to build and publish the container image to Azure Container Registry.
2. Uses the `helm registry login` command to log in to Azure Container Registry via Helm.
3. Uses the `helm push` command to push the Helm chart to the registry as an Open Container Initiative (OCI) artifact.

The cd-test-web-app pipeline performs the following steps:
1. Deploys the test web application to the private AKS cluster using a Helm chart. When you deploy a `LoadBalancer` or `ClusterIP` service with the `service.beta.kubernetes.io/azure-load-balancer-internal: "true"` annotation in the metadata section, an internal standard load balancer called `kubernetes-internal` gets created under the node resource group. For more information, see Use an internal load balancer with Azure Kubernetes Service (AKS).
2. Checks whether a public IP address named `AksName_HelmReleaseNamespace_ServiceName` already exists in a given resource group.
3. Checks whether an Azure Firewall IP configuration named `AksName_HelmReleaseNamespace_ServiceName` already exists. If not, it creates a new Azure Firewall IP configuration using the az network firewall ip-config create command.
4. Checks whether a DNAT rule collection named `DnatRules` already exists in the Azure Firewall Policy. If not, it creates a new DNAT rule collection named `DnatRules` under the `DefaultDnatRuleCollectionGroup` rule collection group using the az network firewall policy rule-collection-group collection add-filter-collection command.
5. Creates a DNAT rule that translates the traffic received by the public IP address used by the `AksName_HelmReleaseNamespace_ServiceName` Azure Firewall IP configuration to port 80 of the private IP address exposed by the NGINX ingress controller on the Internal Load Balancer of the private AKS cluster. This rule is necessary to let Let's Encrypt verify that you are the owner of the domain specified in the ingress of your service when cert-manager issues a certificate for SSL termination.
6. Creates a DNAT rule that translates the traffic received by the public IP address used by the `AksName_HelmReleaseNamespace_ServiceName` Azure Firewall IP configuration to port 443 of the private IP address exposed by the NGINX ingress controller on the Internal Load Balancer of the private AKS cluster. This rule is used to translate and send incoming requests to the NGINX ingress controller.

In a production environment, the endpoints publicly exposed by Kubernetes services running in a private AKS cluster should be exposed using an ingress controller such as the NGINX Ingress Controller or the Application Gateway Ingress Controller, which provide advanced functionalities such as path-based routing, load balancing, SSL termination, and web access firewall. For more information, see the following articles:
In the visio folder you can find the Visio document that contains the above diagrams.
If you open an ssh session to the Linux virtual machine via Azure Bastion and manually run the nslookup command using the fully qualified domain name (FQDN) of the API server as a parameter, you should see an output like the following:
NOTE: the Terraform module runs an Azure Custom Script Extension that installs kubectl and the Azure CLI on the jumpbox virtual machine.