SageMaker Pipeline for AWS Summit New York
This is a sample solution that uses a SageMaker pipeline. An implementation like this could be useful for any organization trying to automate its use of machine learning. With it, inference is straightforward: you query an endpoint and receive the model's output. QA tests can run automatically, and ML code can be updated quickly as needs change.
AWS CloudFormation – AWS::CloudFormation::Interface sets parameter group metadata.
AWS CodeBuild – AWS::CodeBuild::Project uploads the project source code stored in GitHub to an S3 bucket.
AWS CodePipeline – AWS::CodePipeline::Pipeline – It is easiest to create the pipeline in the AWS Console first, then use the get-pipeline CLI command to export its configuration as JSON and place it into the CloudFormation template.
AWS EC2 – the instance type is specified in AWS::SageMaker::EndpointConfig.
AWS SageMaker – AWS::SageMaker::Model – specifies the algorithm SageMaker uses, along with the model artifacts it serves once the model has been created;
AWS::SageMaker::Endpoint – the endpoint to which you send inference requests;
AWS::SageMaker::EndpointConfig – specifies the key settings for the endpoint, including the EC2 instance type. It can also define multiple production variants (for example, for A/B testing) and how much traffic each variant receives.
AWS IAM – AWS::IAM::Role – Make sure to specify only the necessary permissions for each role.
AWS SNS – AWS::SNS::Topic – sends a confirmation to the email specified as a parameter.
AWS S3 – AWS::S3::Bucket – stores the model data and necessary artifacts
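In an endpoint config, each production variant carries an InitialVariantWeight, and SageMaker sends each variant a share of traffic equal to its weight divided by the sum of all variants' weights. The sketch below builds a two-variant ProductionVariants list (field names follow the SageMaker CreateEndpointConfig API, which the CloudFormation ProductionVariant property mirrors) and computes the resulting split. The variant and model names are hypothetical; ml.t2.medium is used only because it is this pipeline's default hosting instance.

```python
"""Sketch: two production variants in one SageMaker endpoint config (A/B test).

Variant and model names are hypothetical placeholders.
"""


def make_variant(variant_name, model_name, weight,
                 instance_type="ml.t2.medium", instance_count=1):
    """Return one ProductionVariants entry using the CreateEndpointConfig field names."""
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": instance_count,
        "InstanceType": instance_type,
        "InitialVariantWeight": weight,
    }


def traffic_shares(variants):
    """Fraction of requests each variant receives: its weight over the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical A/B split: weights 3 and 1 route 75% and 25% of traffic.
variants = [
    make_variant("ModelA", "pipeline-model-a", 3.0),
    make_variant("ModelB", "pipeline-model-b", 1.0),
]
print(traffic_shares(variants))  # {'ModelA': 0.75, 'ModelB': 0.25}
```

Weights are relative rather than percentages, so 3 and 1 behave the same as 75 and 25.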
This section outlines cost considerations for running a SageMaker Pipeline. Running the default pipeline for 24 hours will cost roughly $1.76 including one training run, or $1.56 per day once the model is already trained.
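The daily figures follow from the hourly rates given in this section for the default instances ($1.26 per hour for ml.p2.xlarge training, $0.065 per hour for ml.t2.medium hosting): 24 hours of hosting is 24 x $0.065 = $1.56. The source does not state how long a training run takes, so the 9.5 minutes used below is an assumption, back-solved from the $0.20 gap between the two daily figures.

```python
"""Recompute the daily cost figures from the hourly SageMaker rates in this section."""

TRAINING_RATE = 1.26   # USD per hour, ml.p2.xlarge (training)
HOSTING_RATE = 0.065   # USD per hour, ml.t2.medium (hosting)


def daily_cost(training_minutes=0.0, hosting_hours=24.0):
    """Cost of one day: hosting for `hosting_hours` plus an optional training run."""
    training = TRAINING_RATE * training_minutes / 60.0
    hosting = HOSTING_RATE * hosting_hours
    return training + hosting


print(f"${daily_cost():.2f}")                      # $1.56: model already trained
print(f"${daily_cost(training_minutes=9.5):.2f}")  # $1.76: assumed ~9.5 min training run
```

Hosting dominates over time, so the endpoint instance type is the main cost lever once the model is trained.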
The default pipeline uses the ml.p2.xlarge instance for training and the ml.t2.medium instance for hosting. Training on this instance costs $1.26 per hour, and hosting costs $0.065 per hour. For more information, see Amazon SageMaker Pricing.

Create your AWS account at http://aws.amazon.com by following the instructions on the site.
Create your token at GitHub's Token Settings, selecting the repo and admin:repo_hook scopes. After clicking Generate Token, save your OAuth token in a secure location; it will not be shown again.
Click the Launch Stack button to launch the CloudFormation stack that sets up the SageMaker pipeline. Before launching, make sure the architecture, configuration, and other settings are as desired.
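To confirm a token has the repo and admin:repo_hook scopes before passing it to the stack, you can inspect the X-OAuth-Scopes header that the GitHub REST API returns for classic OAuth and personal access tokens. This is a minimal sketch under that assumption (fine-grained tokens do not report scopes this way). The helper names are hypothetical, and only the header parsing runs without network access.

```python
"""Check that a classic GitHub token carries the scopes this pipeline needs."""

import urllib.request

REQUIRED_SCOPES = {"repo", "admin:repo_hook"}


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes value such as "admin:repo_hook, repo" into a set."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def missing_scopes(header_value):
    """Return the required scopes absent from the header, sorted for stable output."""
    return sorted(REQUIRED_SCOPES - parse_scopes(header_value))


def fetch_scope_header(token):
    """Ask the GitHub API which scopes a classic token has (makes a network call)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")


# Usage (network access required; the token value is a placeholder):
#   print(missing_scopes(fetch_scope_header("YOURGITHUBTOKEN12345ab1234234")))
```

An empty list means both required scopes are present.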
Stack Assumptions: The pipeline stack assumes the following conditions, and may not function properly if they are not met:
The stack is launched in the US East (N. Virginia) region (us-east-1).

NOTE: The Launch Stack URL is generated automatically by a pipeline in one of Stelligent's AWS accounts.
You can launch the same stack using the AWS CLI. Here's an example:
```shell
aws cloudformation create-stack \
  --stack-name YOURSTACKNAME \
  --template-body file:///home/ec2-user/environment/sagemaker-pipeline/CodePipeline/pipeline.yaml \
  --parameters ParameterKey=Email,ParameterValue="youremail@example.com" ParameterKey=GitHubToken,ParameterValue="YOURGITHUBTOKEN12345ab1234234" \
  --capabilities CAPABILITY_NAMED_IAM
```
Once the deployment has passed automated QA testing, and before proceeding to the production stage, the pipeline sends an email notification (via SNS) requesting manual approval. At this point you can run any additional tests on the endpoint before approving its deployment to production.
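One way to run such an extra check is to send a request to the QA endpoint through the SageMaker runtime's invoke_endpoint call. The sketch below assumes the model accepts CSV rows (the source does not state the input format), and the endpoint name in the usage comment is hypothetical. The runtime client is passed in so the function can be exercised without AWS access.

```python
"""Smoke-test a SageMaker endpoint before approving the production stage.

The CSV payload format and endpoint name are assumptions; adjust to your model.
"""


def invoke_csv(runtime_client, endpoint_name, rows):
    """Send rows as CSV to the endpoint and return the decoded response body."""
    body = "\n".join(",".join(str(x) for x in row) for row in rows)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=body.encode("utf-8"),
    )
    # "Body" is a streaming object; read() returns the raw bytes of the prediction.
    return response["Body"].read().decode("utf-8")


# Usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
#   print(invoke_csv(runtime, "my-qa-endpoint", [[5.1, 3.5, 1.4, 0.2]]))
```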
Parameter | Description |
---|---|
Email | The email address where CodePipeline will send SNS notifications. |
GitHubToken | A secret OAuth token with access to the GitHub repo. |
GitHubUser | GitHub username. |
Repo | The name (not URL) of the GitHub repository to pull from. |
Branch | The name of the GitHub repository's branch to use. |
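The same launch can be scripted with boto3, whose CloudFormation client provides create_stack with StackName, TemplateBody, Parameters and Capabilities arguments. This sketch converts the parameters in the table into the ParameterKey/ParameterValue list that call expects. Because the CLI example passes only Email and GitHubToken, it assumes the remaining parameters have template defaults and only rejects names that are not in the table. The helper names are hypothetical, and the client is passed in so the code can be exercised without AWS access.

```python
"""Launch the pipeline stack from Python instead of the AWS CLI (sketch).

Parameter names come from the table above; all values shown are placeholders.
"""

KNOWN_PARAMETERS = {"Email", "GitHubToken", "GitHubUser", "Repo", "Branch"}


def to_cfn_parameters(values):
    """Turn {name: value} into CloudFormation's ParameterKey/ParameterValue list."""
    unknown = sorted(set(values) - KNOWN_PARAMETERS)
    if unknown:
        # Catch typos early; CloudFormation rejects parameters the template lacks.
        raise ValueError(f"unknown parameters: {', '.join(unknown)}")
    return [{"ParameterKey": k, "ParameterValue": str(v)} for k, v in values.items()]


def launch_stack(cfn_client, stack_name, template_body, values):
    """Mirror of `aws cloudformation create-stack ... --capabilities CAPABILITY_NAMED_IAM`."""
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(values),
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-east-1")
#   with open("CodePipeline/pipeline.yaml") as f:
#       print(launch_stack(cfn, "YOURSTACKNAME", f.read(),
#                          {"Email": "youremail@example.com",
#                           "GitHubToken": "YOURGITHUBTOKEN12345ab1234234"}))
```

CAPABILITY_NAMED_IAM is required here for the same reason as in the CLI command: the template creates named IAM roles.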
To launch an endpoint using the provided Chalice project, install Chalice and then deploy the project from the Chalice directory in this repo. One setting in the project depends on your endpoint and must be updated to match it.
Note that this creates resources you will have to delete manually: an API Gateway API, an IAM role, and a Lambda function.
For more details, see this blog post, which is also the source of this code: https://medium.com/@julsimon/using-chalice-to-serve-sagemaker-predictions-a2015c02b033
After following the deployment steps, your pipeline should be up and running with a production SageMaker Endpoint that you can query to make inferences with your newly trained model!