An advanced monitoring system for Amazon Aurora PostgreSQL that is completely serverless, based on AWS Lambda and Amazon CloudWatch
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases. This GitHub project provides an advanced monitoring system for Amazon Aurora Postgres that is completely serverless, based on AWS Lambda and Amazon CloudWatch. A Lambda function runs on a schedule, connects to the configured Aurora Postgres cluster, and generates CloudWatch custom metrics that are visualized on a custom dashboard. You can create alarms based on these metrics to gain additional visibility and improve incident response for your database.
A list of metrics emitted by this solution is provided below, along with a description of what to watch for in relation to each metric.
This project is a fork of the amazon-redshift-monitoring solution. Many thanks to the developers of the original project for providing a solid, extensible foundation on which to build.
Follow the steps below to prepare your environment to run the Aurora Postgres Enhanced Monitoring solution:
Create a user in Aurora Postgres that the Lambda function will use to connect to the database and run the monitoring queries. This user needs SELECT access to the tables in the "pg_catalog" schema. For example:
GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO tamreporting;
Create a KMS key in the same region as the Aurora Postgres Cluster. Take note of the key ARN. (Documentation)
Encrypt the user's password with the KMS key and take note of the Base64-encoded result. An example of encrypting a plaintext value with a customer-managed KMS key using the awscli is shown below (with AWS CLI v2, also pass --cli-binary-format raw-in-base64-out so the plaintext is accepted as raw text):
aws kms encrypt \
--key-id kd83nf8w-ls84-8fm2-90df-ks975nvls01a \
--plaintext MyPassword \
--query CiphertextBlob \
--output text
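At runtime, the Lambda function reverses this step: it Base64-decodes the stored ciphertext and calls the KMS Decrypt API. A minimal sketch of that flow is shown below; the function name is illustrative, and the actual KMS call (which requires boto3 and valid AWS credentials) is shown as a comment:

```python
import base64

def decode_ciphertext(encrypted_b64: str) -> bytes:
    """Reverse the Base64 encoding applied to the KMS ciphertext."""
    return base64.b64decode(encrypted_b64)

# With boto3 and AWS credentials available, the plaintext password is
# recovered roughly like this (a sketch, not the solution's exact code):
#   kms = boto3.client("kms")
#   password = kms.decrypt(
#       CiphertextBlob=decode_ciphertext(encrypted_b64))["Plaintext"].decode()
```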
If your Aurora Postgres instance is not publicly accessible, you will need to use the VPC template, and the following additional requirements apply:
This solution can be deployed automatically using an AWS Serverless Application Model (SAM) template in CloudFormation.
The SAM stack will create:
Use the links below to walk through the CloudFormation deployment model.
You must supply the following parameters for the template:
The password for the Aurora Postgres user must be encrypted with a customer-managed KMS key; plaintext passwords are NOT supported. Lambda environment variables can additionally be encrypted at rest within the Lambda service using KMS.
For the VPC Template, you must also supply the following parameters:
If you wish to deploy manually, you can use the prebuilt zip in the dist folder, or you can build it yourself. We've included a build script for the bash shell that creates a zip file you can upload to AWS Lambda.
There are two options for passing configuration parameters to the function.
This option allows you to send the configuration as part of the Scheduled Event, which means you can support multiple clusters from a single Lambda function. Configuration supplied in the event overrides any environment variables you've configured. An example event looks like:
{
"DatabaseUser": "postgres",
"EncryptedPassword": "AQICAHjCCIl/+A5oJxOzGSOe2e3e5QoGkTztWuyBXcpjvd6n7gHzY5qHG601eFMvAptttlNJAAAAcTBvBgkqhkiG9w0BBwagYjBgAgEAMFsGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMSOzWxmUpBW8lgEwnAgEQgC7srWfyJD/btzn46ru8E3/Z+f4cHqT/3pcoVkkYNi9wGQuE8GcS/5fB94lRLlmW",
"ClusterName": "database-1",
"ClusterEndpoint": "database-1.cluster-dghfvwtswhfx.us-east-1.rds.amazonaws.com",
"ClusterPort": "5432",
"DatabaseName": "mydatabase",
"ScheduleFrequency": "5 minutes",
"Debug": "True"
}
You can also use Lambda Environment Variables for configuration, including:
"Environment": {
"Variables": {
"DatabaseUser": "database user name",
"EncryptedPassword": "KMS encrypted password",
"ClusterName": "display name for cloudwatch metrics",
"ClusterEndpoint": "the writer endpoint of the cluster",
"ClusterPort": "database port number",
"DatabaseName": "database name",
"ScheduleFrequency": "5 minutes",
"Debug": "True"
}
}
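Since event-supplied configuration takes precedence over environment variables, the resolution logic can be sketched as follows. This is a simplified illustration, not the solution's exact code; the key names match the configuration options above:

```python
import os

# Configuration keys recognized by the function (see the tables above).
CONFIG_KEYS = ("DatabaseUser", "EncryptedPassword", "ClusterName",
               "ClusterEndpoint", "ClusterPort", "DatabaseName",
               "ScheduleFrequency", "Debug")

def resolve_config(event):
    """Prefer values from the scheduled event; fall back to environment variables."""
    return {key: event.get(key, os.environ.get(key))
            for key in CONFIG_KEYS
            if key in event or key in os.environ}
```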
If you are rebuilding the function or deploying it manually instead of using the provided CloudFormation templates, perform the following steps:
Create a role for the Lambda function. At a minimum, this role needs the managed policy AWSLambdaVPCAccessExecutionRole to be able to run in a VPC, and the custom policy below to access the KMS key:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1458213823000",
"Effect": "Allow",
"Action": [
"kms:Decrypt"
],
"Resource": [
"<kms key arn>"
]
},
{
"Sid": "Stmt1458218675000",
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricData"
],
"Resource": [
"*"
]
}
]
}
Download and install dependencies
pip install -r requirements.txt -t .
Assemble and compress the Lambda function package:
Option 1: Standard build
./build.sh
Option 2: Build with user query extensions
./build.sh --include-user-queries
Please note that the tagged version on GitHub does not include any user queries.
Create a Lambda function and upload the zip file generated by the build script. Some of the parameters of the function are:
lambda_function.lambda_handler
Add an Event Source to the Lambda function with a Scheduled Event, running with the same frequency you configured in the Lambda function.
After a period of time, you can check your CloudWatch metrics and create alarms. You can also view the AuroraPostgresAdvancedMonitoringDashboard that was created as part of the CloudFormation script to visualize the custom metrics.
If metrics are not being emitted after deploying the solution, send a test event to the Lambda function with a JSON payload similar to the one shown above, with Debug set to True (update the parameters to match your environment). This will log additional details to help you uncover the cause of the failure. The most common issues are 1) lack of network connectivity between the Lambda function and the KMS and CloudWatch APIs, and 2) insufficient permissions on the database. Review the Pre-Requisites for solutions to these issues.
This solution emits the following custom metrics for the connected Aurora Postgres instance.
For example, the number of connections per client and application is collected with:
SELECT client_addr, application_name, COUNT(*) FROM pg_stat_activity GROUP BY 1, 2;
Make sure this count doesn't reach max_connections. Other metrics track vacuum health; when they indicate that a manual freeze is needed, one can be run with, for example:
vacuumdb --all --freeze --jobs=2 --echo --analyze
The published CloudWatch metrics are all configured in a JSON file called monitoring-queries.json. These queries have been built by Amazon Aurora Postgres subject matter experts, and they provide detailed metrics about the operation of your cluster.
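Each query result is ultimately published to CloudWatch via put_metric_data. The payload for a single metric might be assembled as follows; this is a sketch, and the helper name and dimension name are illustrative rather than the solution's exact code:

```python
def build_metric(cluster_name, metric_name, unit, value):
    """Build one CloudWatch MetricData entry, tagged with the cluster name."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_name}],
        "Unit": unit,
        "Value": float(value),
    }

# With boto3, a batch of entries would then be published roughly like:
#   cloudwatch = boto3.client("cloudwatch")
#   cloudwatch.put_metric_data(Namespace="AuroraPostgres",
#                              MetricData=[build_metric(...), ...])
```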
If you would like to create your own queries to be instrumented via Amazon CloudWatch, these can be added into the user-queries.json file. The file is a JSON array, with each query having the following structure:
{
"query": "my select query that returns a numeric value",
"name":"MyQuery",
"unit":"Count | Seconds | Milliseconds | Whatever",
"type":"value"
}
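As a hypothetical example, an entry that counts sessions stuck in "idle in transaction" might look like the following (the query, metric name, and values are illustrative; user-queries.json itself is a JSON array of such objects):

```json
[
  {
    "query": "select count(*) from pg_stat_activity where state = 'idle in transaction'",
    "name": "IdleInTransactionSessions",
    "unit": "Count",
    "type": "value"
  }
]
```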
Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.