cloud.gov infrastructure provisioning and deployment
This repository holds the Terraform configuration (and BOSH vars and ops-files) used to bootstrap our infrastructure.
Be sure to read the internal developer documentation ("cg-provision") for non-public information about using this repository.
Some Git hooks that may be useful when updating this code are provided in the .githooks
directory. To install these Git hooks so that they run automatically, use the provided make
command:
make add-githooks
Our Terraform code is organized by two concepts, with two corresponding directories:
Modules - terraform/modules (including modules/stack/base and modules/stack/spoke).
Stacks - terraform/stacks.
As an example, if we wanted to write Terraform code to deploy several CloudFront distributions in front of three load balancers in an environment, we could:
1. Create a cloudfront module that declares a CloudFront distribution, a Shield Advanced resource to protect it, and an Access Control List (ACL) association between the distribution and an ACL. (The ACL itself is not declared in the module.) It could take an origin, a list of domains, and an ACL ARN as variables.
2. Create a cloudfront stack that uses the cloudfront module three times, once for each load balancer in the environment. It would pass an external domain and a load balancer domain to each module. It could also declare a single ACL for the environment and pass its ARN to each cloudfront module. The stack could take an environment name as a variable.
3. Deploy the cloudfront stack and pass the environment name as a variable.
In the future, we would like to add a third concept: an entire runtime environment. An environment would combine multiple stacks to represent the entire cloud.gov runtime stack. This collection of resources could be deployed as a single unit to a new AWS region or multiple times in the same region.
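The cloudfront module and stack from the example above could be sketched roughly as follows. This is a minimal illustration, not code from this repository: the file paths, the certificate_arn and load_balancers variables, and the cache and TLS settings are all hypothetical. It assumes the ACL is an AWS WAFv2 web ACL. For CloudFront, the association between a distribution and a web ACL is made through the distribution's web_acl_id argument rather than a separate association resource, and CLOUDFRONT-scoped web ACLs must be managed through a provider configured for us-east-1. Shield Advanced protections also require an active Shield Advanced subscription.

```hcl
# ---- terraform/modules/cloudfront/main.tf (hypothetical) ----
variable "origin" { type = string }          # load balancer domain name
variable "domains" { type = list(string) }   # external domains (aliases)
variable "acl_arn" { type = string }         # WAFv2 web ACL ARN, declared by the stack
variable "certificate_arn" { type = string } # hypothetical: us-east-1 ACM cert covering var.domains

resource "aws_cloudfront_distribution" "this" {
  enabled    = true
  aliases    = var.domains
  web_acl_id = var.acl_arn # associates the distribution with the ACL

  origin {
    origin_id   = "load-balancer"
    domain_name = var.origin
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id       = "load-balancer"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    forwarded_values {
      query_string = true
      cookies {
        forward = "all"
      }
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn = var.certificate_arn
    ssl_support_method  = "sni-only"
  }
}

# Shield Advanced protection for the distribution
resource "aws_shield_protection" "this" {
  name         = "cloudfront-${replace(var.domains[0], ".", "-")}"
  resource_arn = aws_cloudfront_distribution.this.arn
}

# ---- terraform/stacks/cloudfront/main.tf (hypothetical) ----
variable "environment" { type = string }
variable "load_balancers" { type = map(string) } # hypothetical: external domain => LB domain
variable "certificate_arn" { type = string }

# A single ACL for the whole environment
resource "aws_wafv2_web_acl" "cloudfront" {
  name  = "${var.environment}-cloudfront"
  scope = "CLOUDFRONT"
  default_action {
    allow {}
  }
  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "${var.environment}-cloudfront"
    sampled_requests_enabled   = true
  }
}

# One module instance per load balancer, all sharing the same ACL
module "cloudfront" {
  source          = "../../modules/cloudfront"
  for_each        = var.load_balancers
  origin          = each.value
  domains         = [each.key]
  acl_arn         = aws_wafv2_web_acl.cloudfront.arn
  certificate_arn = var.certificate_arn
}
```

Using for_each over a map instead of three copy-pasted module blocks is a design choice of this sketch: each distribution is keyed by its external domain, so adding a fourth load balancer only changes the stack's input.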
The main stack is a template that is used to provision the production, staging, and development "environments."
The regionalmasterbosh stack contains our masterbosh for a given region, which deploys the tooling BOSH for that region. The tooling BOSH then deploys the BOSH directors in the main stacks across all accounts in that region.
The tooling stack is the same as the regionalmasterbosh stack, but has some extras from before we started going multi-region and multi-account:
The external and dns stacks are both outside of GovCloud (in commercial AWS).
As mentioned above, we have four categories of environment:
main - this is the thing we're actually after. It's the pieces that directly support the platform components. There should be several of these across multiple AWS accounts.
tooling - this is used to support the things in the main platform: our CI system, management tools such as Nessus, etc.
external - this manages some things that don't (or historically didn't) exist in GovCloud (really just CloudFront and the users, etc., to support it). There's one of these per main environment.
dns - this manages Route 53. There's exactly one of these, although we really should split it out to one per main environment plus one for tooling.
To allow the tooling environment to manage the main environment, there's a tooling-terraform role associated with each main environment, which has an assume-role policy allowing access by Concourse workers in the tooling account.
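A tooling-terraform role like the one described above could be declared in each main environment roughly as follows. This is a sketch, not the repository's actual code: the variables tooling_account_id and concourse_worker_role_name are hypothetical, and the policies that grant the role its permissions are omitted.

```hcl
# Hypothetical inputs identifying the tooling account's Concourse worker role
variable "tooling_account_id" { type = string }
variable "concourse_worker_role_name" { type = string }

# Resolves to "aws-us-gov" in GovCloud and "aws" in Commercial
data "aws_partition" "current" {}

# Trust policy: only the tooling Concourse worker role may assume this role
data "aws_iam_policy_document" "tooling_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:${data.aws_partition.current.partition}:iam::${var.tooling_account_id}:role/${var.concourse_worker_role_name}",
      ]
    }
  }
}

resource "aws_iam_role" "tooling_terraform" {
  name               = "tooling-terraform"
  assume_role_policy = data.aws_iam_policy_document.tooling_assume_role.json
}
```

On the tooling side, a Terraform AWS provider can use such a role through its assume_role block, with role_arn set to the role's ARN. Because this is cross-account access, the worker role in the tooling account also needs its own IAM policy allowing sts:AssumeRole on that ARN.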
To add a new main environment, see the README here.
The bosh directory contains vars and ops-files for use by the BOSH directors.
The Concourse worker VMs must have AWS access to create and apply Terraform plans. How they are given that access depends on the partition being changed.
You can determine how a failing Concourse container is configured by hijacking it. Connect to the container (see fly hijack --help) and run aws configure list to see the current configuration.
The Concourse worker VMs are associated with an IAM role with read-write access to GovCloud resources. The AWS SDK in the Concourse containers automatically fetches credentials for that role from the instance metadata service. No further configuration is necessary; note that no access keys are passed to GovCloud jobs in pipeline.yml.
AWS IAM roles documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
Each Concourse job that manages AWS Commercial resources must override the Concourse worker's IAM role. The jobs do this by setting the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION environment variables. Environment variables take precedence over the instance's IAM role in the AWS SDK's credential lookup, so they are used instead. No further configuration in Terraform is necessary.
The DNS stack is a special case because it must read state from GovCloud but read and write resources and state to Commercial. AWS IAM users cannot have cross-partition permissions, so the job must use two separate AWS accounts (one for each partition).
To achieve this, the Concourse jobs pass an access key for a Commercial IAM user as TF_VAR_ variables instead of using the standard AWS_* environment variables. (Setting AWS_* variables would make the AWS SDK use them by default, and we want it to continue using the GovCloud IAM role by default.)
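The DNS stack arrangement described above might look roughly like this. It is a sketch under stated assumptions, not the stack's actual code: the variable names (and therefore the TF_VAR_aws_access_key_id and TF_VAR_aws_secret_access_key environment variables), the Commercial region, and the state bucket and key are hypothetical, and wiring the TF_VAR_ credentials into the Commercial provider is assumed rather than described above.

```hcl
# Hypothetical names; Concourse would export TF_VAR_aws_access_key_id and
# TF_VAR_aws_secret_access_key, so the AWS SDK never sees these credentials
# in the standard AWS_* variables.
variable "aws_access_key_id" {
  type = string
}

variable "aws_secret_access_key" {
  type      = string
  sensitive = true
}

terraform {
  # Partial configuration: backend blocks cannot reference variables, so the
  # bucket, key, region, and Commercial credentials are supplied at init time.
  backend "s3" {}
}

# Commercial resources (Route 53) are managed with the TF_VAR_ credentials.
provider "aws" {
  region     = "us-east-1" # hypothetical Commercial region
  access_key = var.aws_access_key_id
  secret_key = var.aws_secret_access_key
}

# GovCloud state is read with only a region set. With no explicit
# credentials, the SDK's default chain falls through to the worker's
# GovCloud IAM role.
data "terraform_remote_state" "main" {
  backend = "s3"
  config = {
    bucket = "example-govcloud-terraform-state" # hypothetical
    key    = "main/terraform.tfstate"           # hypothetical
    region = "us-gov-west-1"
  }
}
```

Since backend blocks cannot use variables, the Commercial credentials have to reach the s3 backend at init time instead. A script could, for example, pass them with terraform init -backend-config="access_key=..." -backend-config="secret_key=..."; the script this repository uses may do it differently.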
The IAM role and TF_VAR_ credentials are used as follows:
1. The terraform init command is run with the Commercial credentials using this script. This configures the s3 backend for the DNS stack to be set up in the Commercial account.
2. The terraform_remote_state data blocks for each GovCloud s3 state object are configured with the GovCloud region. Because they are read separately from the initial terraform init, they are not passed the Commercial credentials. Without any credentials set explicitly, the AWS SDK uses the GovCloud IAM role.
Since IaaS is a shared resource (we don't have the money or time to provision entire stacks for each developer), we never apply this configuration manually. Instead, all execution is done through the Concourse pipeline, which is configured to first run terraform plan and then wait for manual triggering before running terraform apply.
If you want to make infrastructure changes:
You may see access_key_id_prev and aws_key_id_prev as outputs for our iam modules. These are used for credential rotation.
modules/stack/spoke composes modules/stack/base and some of the VPC modules. It's not entirely clear why this is, or why the VPC modules weren't simply included in base (removing spoke altogether).