Workshop exercise materials for re:Invent 2017 - SID 341: Using AWS CloudTrail Logs for Scalable, Automated Anomaly Detection
In this README you will find instructions and pointers to the resources used for the workshop exercises. In this workshop, there are two exercises:
After the setup steps below, there are instructions provided for all of the hands-on exercises, clean-up instructions to tear down the CloudFormation stack, and following that a full walkthrough guide on how to complete the exercises.
This repository contains the following files that will be used for this workshop:
Before getting started, you will need the following:
teardown.sh script

The CloudFormation template creates two sets of resources for the following purposes:
First, log in to your AWS account using the IAM user with administrator access.
For this workshop, we will be working within the Canada Central (ca-central-1) region. To switch regions, click the region dropdown in the top right of the window and select Canada (Central).
To easily deploy the CloudFormation stack in the Canada (Central) region, please browse to the following stack launch URL:
That stack launch URL uses a copy of the cloudformation.yaml template that is contained in an S3 bucket, which is the same as the one contained in this code repository.
In this exercise, you will examine CloudTrail logs in your account, which will include generated activity from the CloudFormation stack you deployed earlier. The goal of this exercise is to familiarize yourself with the structure, format, and content of the CloudTrail logs.
userIdentity block, along with some of the more interesting fields like sourceIPAddress, eventSource, and eventName. Depending on the event, you may also see some requestParameters and responseElements present.

In this exercise, you will build a simple CloudTrail log analyzer and detection engine using a Python-based AWS Lambda function.
Some core functionality of the Lambda function has already been provided for you and takes care of the following:
handler entry point function
get_log_file_location and get_records functions
for loops in the handler that pass individual records to each analysis function in the analysis_functions tuple

Here are steps you can follow to begin the exercise:
Look at each of the functions contained in the analysis_functions tuple. Each of these functions gets passed every CloudTrail record.
The print_short_record analysis function is already defined. All it does is print out a shortened form of each CloudTrail record by reading certain fields in the record. Observe how these fields are accessed, since you will need to do something similar in the other analysis functions.
The CloudTrail User Guide has a reference on log events that explains in detail what each of the fields in a CloudTrail record means. You will likely find it helpful as you start to analyze the events:
http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference.html
To see what the abbreviated CloudTrail records being printed by print_short_record look like, go to the Monitoring tab for the Lambda function and click on View logs in CloudWatch. Once in CloudWatch, click on the Log Stream and you will see the output from every invocation of the Lambda function.
Note that the sourceIPAddress value for actions performed by the CloudFormation template is "cloudformation.amazonaws.com". Also, notice that some actions have a region of "us-east-1" rather than "ca-central-1". This is because some services, such as IAM (iam.amazonaws.com), are global services and report region as "us-east-1" (trivia: that was the first AWS region!).
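The field access pattern described in the steps above can be sketched as follows. This is a minimal illustration, not the workshop's actual print_short_record: the function name format_short_record, the output layout, and all values in the sample record are assumptions made for this example. Only the field names come from the CloudTrail record format.

```python
# Hypothetical sample record; the field names follow the CloudTrail record
# format, but the values are invented for illustration.
sample_record = {
    'eventTime': '2017-11-29T18:00:00Z',
    'eventSource': 'iam.amazonaws.com',
    'eventName': 'CreateRole',
    'awsRegion': 'us-east-1',
    'sourceIPAddress': 'cloudformation.amazonaws.com',
    'userIdentity': {
        'type': 'IAMUser',
        'arn': 'arn:aws:iam::012345678900:user/workshop-admin',
    },
}


def format_short_record(record):
    """Return a one-line summary of a CloudTrail record.

    Uses dict.get() with defaults because not every record carries
    every field.
    """
    identity = record.get('userIdentity', {})
    return '[{}] {} {}:{} from {} by {}'.format(
        record.get('eventTime', '-'),
        record.get('awsRegion', '-'),
        record.get('eventSource', '-'),
        record.get('eventName', '-'),
        record.get('sourceIPAddress', '-'),
        # Fall back to the identity type when no ARN is present.
        identity.get('arn', identity.get('type', '-')),
    )


print(format_short_record(sample_record))
```

Reading nested fields through get() with a default keeps an analysis function from raising KeyError on records that omit a field.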
The Analysis Lambda function will be invoked about every 5 minutes when new CloudTrail log files get created, so you don't need to set up a test event if you're okay with waiting until the next invocation happens automatically.
However, if you'd like to set up a test event to be able to Save and Test (or Test) the function as you make changes and have it run immediately, you can use the following template for a simplified version of the S3 Put test event to do so:
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "CLOUDTRAIL_BUCKET_NAME"
        },
        "object": {
          "key": "CLOUDTRAIL_LOG_FILE_PATH"
        }
      }
    }
  ]
}
Replace "CLOUDTRAIL_BUCKET_NAME" in the sample event with the name of this CloudTrail bucket.
AWSLogs/012345678900/CloudTrail/ca-central-1/2017/11/29/.
Replace "CLOUDTRAIL_LOG_FILE_PATH" in the sample event with this path.

Above the Analysis Lambda function you should now see the test event selected. By clicking the Test button, your function will be immediately invoked with that event, which will load and analyze the same CloudTrail log file every time it is run.
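If you prefer to generate the test event rather than edit it by hand, the following sketch builds the same simplified S3 Put event and shows how a bucket name and object key can be read back out of it. Both function names are hypothetical helpers written for this example; the workshop's own get_log_file_location may be implemented differently.

```python
import json


def build_test_event(bucket_name, log_file_key):
    """Build the simplified S3 Put test event for one CloudTrail log file."""
    return {
        'Records': [
            {'s3': {'bucket': {'name': bucket_name},
                    'object': {'key': log_file_key}}}
        ]
    }


def extract_log_file_location(event):
    """Pull (bucket, key) out of the first record of an S3 event."""
    s3_info = event['Records'][0]['s3']
    return s3_info['bucket']['name'], s3_info['object']['key']


# Placeholder values, to be replaced with your own bucket and log file path.
event = build_test_event('CLOUDTRAIL_BUCKET_NAME', 'CLOUDTRAIL_LOG_FILE_PATH')
print(json.dumps(event, indent=2))
print(extract_log_file_location(event))
```

The printed JSON can be pasted into the Lambda console's test event editor.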
The deletion of logs is an action that should normally not occur in most accounts, and may indicate an attacker trying to cover tracks.
In this exercise, we will focus on API calls that delete logs in CloudWatch Logs or CloudTrail. You need to implement code for the deleting_logs function to check for those API calls by looking for API events whose name starts with "Delete". For the purposes of this exercise, you can focus this check on just CloudWatch Logs (logs.amazonaws.com) and CloudTrail (cloudtrail.amazonaws.com).
Use what you've learned about looking at CloudTrail records so far to identify the fields you will need to use, and borrow code patterns from print_short_record as needed.
When a matching record is found, print it out using print_short_record and return True.
At this point you will also want to comment out or remove the print_short_record function in the analysis_functions tuple so that only records matching the check are printed, rather than all records:
analysis_functions = (
    # print_short_record,
    deleting_logs,
    instance_creds_used_outside_ec2,
)
Expand the check to look for an API call that stops logging on a trail in CloudTrail (without deleting the trail).
Curious about what some of the log deletion API calls do? Here are some docs to check out:
The usage of session credentials from a running EC2 instance from outside of that instance is a potential indicator that an attacker may have obtained leaked or stolen credentials.
Implement code for the instance_creds_used_outside_ec2 function to check for API calls made using EC2 instance credentials "off-instance", in other words, credentials that have been removed from the instance and are being used outside of it.
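Examine each record to look for ones that satisfy the following 3 properties for API calls made using instance credentials:
1. userIdentity.type is AssumedRole
2. userIdentity.accessKeyId begins with 'AS' instead of 'AK'
3. userIdentity.arn ends with an instance identifier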
For #3, a regular expression pattern called instance_identifier_arn_pattern has been predefined for you to use. You can call the pattern's match method, which returns a match object (which is truthy) if the pattern matches and None otherwise:
arn_matches = instance_identifier_arn_pattern.match(arn)
if arn_matches:
    print('ARN appears to contain an instance identifier!')
When a matching record is found, print it out using print_short_record and return True.
Hint: The event you should find in this phase is an s3:GetBucketPolicy API call. Please note that if you are looking for this event in the CloudTrail console's Event History, you will not see it there because it is a read-only action and the Event History only shows create, modify, or delete actions.
Curious about what instance credentials are? See this documentation for more:
The CloudFormation template created a CloudWatch alarm with the following properties:
This means that if you put metric data to CloudWatch using MetricName AnomaliesDetected, Namespace AWS/reInvent2017/SID341, and a Value of 1, the CloudWatch alarm will fire by going into ALARM state.
There is a pre-defined CloudWatch Boto client in the handler that you can use for this:
cloudwatch = session.client('cloudwatch')
When the CloudWatch alarm fires, you will receive an email about it if you set the NotificationEmailAddress parameter earlier when creating the CloudFormation stack. You can also browse to the CloudWatch console and click on Alarms in the left menu to view the alarm and its current state.
These metrics and the accompanying alarm are quite simple, but they are straightforward to extend. For instance, you could have a separate alarm for each analysis function by using a different MetricName for each one and updating the cloudformation.yaml template to create an alarm for each metric, or you could set different triggering conditions for the alarm.
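One way to derive a separate metric per analysis function is sketched below. The helper name build_metric_request and the "AnomaliesDetected_" naming scheme are assumptions made for this example; nothing in the workshop template defines them, so matching alarms would still need to be added to cloudformation.yaml.

```python
def build_metric_request(analysis_function_name,
                         namespace='AWS/reInvent2017/SID341'):
    """Build put_metric_data keyword arguments for one analysis function."""
    return {
        'Namespace': namespace,
        'MetricData': [{
            # Hypothetical naming scheme: one metric per analysis function.
            'MetricName': 'AnomaliesDetected_' + analysis_function_name,
            'Value': 1,
            'Unit': 'Count',
        }],
    }


request = build_metric_request('deleting_logs')
print(request)
# Inside the Lambda handler, this could be sent with:
#     cloudwatch.put_metric_data(**request)
```

Building the arguments separately from the client call keeps the naming logic easy to check without touching CloudWatch.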
For more information, please visit the following CloudWatch User Guide pages:
To delete the CloudFormation stack, a Bash script, teardown.sh, has been provided. Run it as follows:
./teardown.sh
Q: The script is not running.
A: It may need to be made executable. Run chmod +x teardown.sh to fix this.
Q: I'm using a different AWS CLI profile than the default.
A: The script supports a flag to specify a CLI profile that is configured in your ~/.aws/config file. Run ./teardown.sh -p PROFILE_NAME.
Deleting the CloudFormation stack manually is a two-step process: you have to delete the S3 buckets first; otherwise the CloudFormation console's delete stack operation will report a "DELETE FAILED" error because the S3 buckets still have contents.
This walkthrough will give full details on how to complete each phase of the automated detection exercise, including finished code snippets that can be copied and pasted into the Lambda function.
To solve this phase, you need to examine each record to look for ones with an eventSource of logs.amazonaws.com or cloudtrail.amazonaws.com and an eventName that starts with the string "Delete".
The completed function will look like the following:
def deleting_logs(record):
    """
    Checks for API calls that delete logs in CloudWatch Logs or CloudTrail.

    :return: True if record matches, False otherwise
    """
    event_source = record['eventSource']
    event_name = record['eventName']

    if event_source in ['logs.amazonaws.com', 'cloudtrail.amazonaws.com']:
        if event_name.startswith('Delete'):
            print_short_record(record)
            return True

    return False
At this point you will also want to comment out or remove the print_short_record function in the analysis_functions tuple so that only records matching the check are printed, rather than all records:
analysis_functions = (
    # print_short_record,
    deleting_logs,
    instance_creds_used_outside_ec2,
)
For the bonus round, add the following check to the function:
if event_source == 'cloudtrail.amazonaws.com' and event_name == 'StopLogging':
    print_short_record(record)
    return True
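For reference, here is one way the base check and the bonus check might fit together in a single function, along with a few records to try it on. This is a sketch rather than the official solution: the print_short_record below is a simplified stand-in, and the sample records are hypothetical and contain only the two fields the check reads.

```python
def print_short_record(record):
    """Stand-in for the workshop's print_short_record (assumed behavior)."""
    print(record['eventSource'], record['eventName'])


def deleting_logs(record):
    """Check for log deletion, or for logging being stopped on a trail."""
    event_source = record['eventSource']
    event_name = record['eventName']

    deletes_logs = (
        event_source in ('logs.amazonaws.com', 'cloudtrail.amazonaws.com')
        and event_name.startswith('Delete')
    )
    stops_trail = (
        event_source == 'cloudtrail.amazonaws.com'
        and event_name == 'StopLogging'
    )
    if deletes_logs or stops_trail:
        print_short_record(record)
        return True
    return False


# Hypothetical minimal records containing only the two fields checked.
samples = [
    {'eventSource': 'logs.amazonaws.com', 'eventName': 'DeleteLogGroup'},
    {'eventSource': 'cloudtrail.amazonaws.com', 'eventName': 'StopLogging'},
    {'eventSource': 'cloudtrail.amazonaws.com', 'eventName': 'DescribeTrails'},
    {'eventSource': 's3.amazonaws.com', 'eventName': 'DeleteBucket'},
]
results = [deleting_logs(r) for r in samples]
print(results)
```

The last sample shows that a "Delete" call against another service, such as S3, is deliberately ignored by this check.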
To solve this phase, you must implement checks of the 3 properties that were specified:
1. userIdentity.type is AssumedRole
2. userIdentity.accessKeyId begins with the string 'AS' instead of 'AK'
3. userIdentity.arn matches the instance_identifier_arn_pattern
Please note that the check in #3 is imperfect because we don't know whether that is or was a valid instance identifier for an instance in this account (or any account, for that matter). The username could technically be set to something that looks like an instance identifier, which could lead to a false positive. However, for our purposes in this workshop it will suffice. Improving this check will be left as extra credit for the reader ;)
The completed function will look like the following:
import re

instance_identifier_arn_pattern = re.compile(r'(.*?)/i\-[a-zA-Z0-9]{8,}$')


def instance_creds_used_outside_ec2(record):
    """
    Check for usage of EC2 instance credentials from outside the EC2 service.

    :return: True if record matches, False otherwise
    """
    identity = record['userIdentity']

    # First, check that the role type is assumed role
    role_type = identity.get('type', '')
    if role_type != 'AssumedRole':
        return False

    # Next, check that the AKID starts with 'AS'
    access_key = identity.get('accessKeyId', '')
    if not access_key.startswith('AS'):
        return False

    # Finally, check that the end of the user ARN is an instance identifier
    arn = identity.get('arn', '')
    if instance_identifier_arn_pattern.match(arn):
        print_short_record(record)
        return True

    return False
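To get a feel for what the ARN pattern accepts, including the false positive mentioned above, you can run it against a few ARNs. The ARNs below are hypothetical and invented for this illustration; only the pattern itself is taken from the completed function.

```python
import re

# Same pattern as in the completed function.
instance_identifier_arn_pattern = re.compile(r'(.*?)/i\-[a-zA-Z0-9]{8,}$')

# Hypothetical ARNs, invented for illustration.
arns = {
    'instance session':
        'arn:aws:sts::012345678900:assumed-role/AppRole/i-0123456789abcdef0',
    'named session':
        'arn:aws:sts::012345678900:assumed-role/AppRole/alice',
    'look-alike session':
        'arn:aws:sts::012345678900:assumed-role/AppRole/i-notaninstance',
}

# match() returns a match object or None, so compare against None
# to get a plain boolean for each ARN.
results = {
    label: instance_identifier_arn_pattern.match(arn) is not None
    for label, arn in arns.items()
}

for label, matched in results.items():
    print('{:20} {}'.format(label, matched))
```

The look-alike session name matches even though it is not a real instance identifier, which is the kind of false positive the pattern alone cannot rule out.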
You can use the following code snippet at the end of your Lambda function handler, inside the for loop that iterates over each record:
if func(record):
    cloudwatch.put_metric_data(
        Namespace='AWS/reInvent2017/SID341',
        MetricData=[{
            'MetricName': 'AnomaliesDetected',
            'Value': 1,
            'Unit': 'Count',
        }]
    )
It uses the pre-defined CloudWatch Boto client to put metric data to CloudWatch for a specific metric that the CloudWatch alarm is monitoring.
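If you want to try this logic outside of Lambda, a rough local sketch is shown below. It swaps the Boto client for a stub that only records calls. The names RecordingCloudWatchStub, analyze, always_flags, and never_flags are hypothetical, and the loop nesting is an assumption about how the handler is structured.

```python
class RecordingCloudWatchStub:
    """Hypothetical stand-in for the Boto CloudWatch client.

    It only records the arguments of each put_metric_data call.
    """

    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)


def always_flags(record):
    """Toy analysis function that matches every record."""
    return True


def never_flags(record):
    """Toy analysis function that matches no record."""
    return False


def analyze(records, analysis_functions, cloudwatch):
    """Run each analysis function on each record; emit a metric per match."""
    for record in records:
        for func in analysis_functions:
            if func(record):
                cloudwatch.put_metric_data(
                    Namespace='AWS/reInvent2017/SID341',
                    MetricData=[{
                        'MetricName': 'AnomaliesDetected',
                        'Value': 1,
                        'Unit': 'Count',
                    }]
                )


stub = RecordingCloudWatchStub()
analyze([{}, {}], (always_flags, never_flags), stub)
print(len(stub.calls))
```

Because the stub exposes the same put_metric_data keyword interface used in the snippet, the loop can be exercised without AWS credentials.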