Zero administration inference with AWS Lambda for 🤗
Hugging Face Transformers is a popular open-source project that provides pre-trained natural language processing (NLP) models for a wide variety of use cases. Customers with minimal machine learning experience can use the pre-trained models to quickly add NLP capabilities such as text classification, language translation, summarization, and question answering to their applications.
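To show how little code pre-trained inference requires, here is a minimal local sketch using the Transformers pipeline API (the input sentence and printed output are illustrative):

from transformers import pipeline

# Downloads a default pre-trained sentiment model the first time it runs
classifier = pipeline("sentiment-analysis")
print(classifier("Serverless inference is convenient!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]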
Our solution consists of an AWS Cloud Development Kit (AWS CDK) script that automatically provisions container image-based Lambda functions that perform ML inference using pre-trained Hugging Face models. This solution also includes Amazon Elastic File System (EFS) storage that is attached to the Lambda functions to cache the pre-trained models and reduce inference latency.
The solution includes Python scripts for two common NLP use cases: sentiment analysis (sentiment.py) and text summarization (summarization.py).
To run this example you need git, Python 3 with pip, Docker (to build the container image), the AWS CDK CLI, and AWS credentials configured for your account. Then clone the repository and deploy:
git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda-for-hugging-face.git
cd zero-administration-inference-with-aws-lambda-for-hugging-face
pip install -r requirements.txt
cdk bootstrap
cdk deploy
The code is organized using the following structure:
├── inference
│   ├── Dockerfile
│   ├── sentiment.py
│   └── summarization.py
├── app.py
└── ...
The inference directory contains the Dockerfile used to build a custom image that can run PyTorch Hugging Face inference in Lambda functions, along with the Python inference scripts.
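The repository's actual Dockerfile isn't reproduced in this post; the following is a minimal sketch of the usual pattern, assuming the public AWS Lambda Python base image (the versions and pinned dependencies in the real file may differ):

FROM public.ecr.aws/lambda/python:3.8

# Install PyTorch and Hugging Face Transformers into the image
RUN pip install torch transformers

# Copy all inference scripts into the Lambda task root
COPY *.py ${LAMBDA_TASK_ROOT}/

# No CMD needed here: the CDK sets the handler per function (cmd=[filename + ".handler"])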
The sentiment.py script shows how to use a Hugging Face Transformers model:
import json
from transformers import pipeline

# Load the pre-trained sentiment-analysis pipeline once, at cold start
nlp = pipeline("sentiment-analysis")

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": nlp(event['text'])[0]
    }
    return response
For each Python script in the inference directory, the CDK generates a container image-based Lambda function that runs that script's handler.
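Once deployed, a function can be invoked directly. Here is a hedged sketch using boto3; the function name is an assumption, so substitute the name the CDK assigned (visible in the cdk deploy output or the Lambda console):

import json
import boto3

lambda_client = boto3.client("lambda")

# "sentiment" is an assumed function name; use the deployed one
result = lambda_client.invoke(
    FunctionName="sentiment",
    Payload=json.dumps({"text": "I love this product!"}),
)
print(json.loads(result["Payload"].read()))
# e.g. {'statusCode': 200, 'body': {'label': 'POSITIVE', 'score': 0.999...}}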
The CDK script is named app.py in the solution's repository. The beginning of the script creates a virtual private cloud (VPC):
vpc = ec2.Vpc(self, 'Vpc', max_azs=2)
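For context, the app.py snippets in this post assume CDK v2-style imports along these lines (a sketch; the repository's actual import block may differ):

import os
from pathlib import Path

from aws_cdk import Duration, RemovalPolicy
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_efs as efs
from aws_cdk import aws_lambda as lambda_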
Next, it creates the EFS file system and an access point in EFS for the cached model:
fs = efs.FileSystem(self, 'FileSystem',
                    vpc=vpc,
                    removal_policy=RemovalPolicy.DESTROY)

access_point = fs.add_access_point('MLAccessPoint',
                                   create_acl=efs.Acl(
                                       owner_gid='1001', owner_uid='1001', permissions='750'),
                                   path="/export/models",
                                   posix_user=efs.PosixUser(gid="1001", uid="1001"))
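The access point pins all file system access from the functions to POSIX user 1001 and creates the /export/models directory with the given ACL on first use, so every function reads and writes the shared model cache with consistent permissions.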
It iterates through the Python files in the inference directory:
docker_folder = os.path.dirname(os.path.realpath(__file__)) + "/inference"
pathlist = Path(docker_folder).rglob('*.py')
for path in pathlist:
And then creates the Lambda function that serves the inference requests:
    base = os.path.basename(path)
    filename = os.path.splitext(base)[0]
    # Lambda function from Docker image
    function = lambda_.DockerImageFunction(
        self, filename,
        code=lambda_.DockerImageCode.from_image_asset(docker_folder,
                                                      cmd=[filename + ".handler"]),
        memory_size=8096,
        timeout=Duration.seconds(600),
        vpc=vpc,
        filesystem=lambda_.FileSystem.from_efs_access_point(
            access_point, '/mnt/hf_models_cache'),
        environment={
            "TRANSFORMERS_CACHE": "/mnt/hf_models_cache"},
    )
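Note the TRANSFORMERS_CACHE environment variable: it points the Hugging Face Transformers library at the EFS mount, so each model is downloaded from Hugging Face only once and is then reused across cold starts and across all of the functions.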
Optionally, you can add more models by adding Python scripts in the inference directory. For example, add the following code in a file called translate-en2fr.py:
import json
from transformers import pipeline

# Load the pre-trained English-to-French translation pipeline at cold start
en_fr_translator = pipeline('translation_en_to_fr')

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": en_fr_translator(event['text'])[0]
    }
    return response
Then run:
$ cdk synth
$ cdk deploy
This creates a new endpoint to perform English to French translation.
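Invoking the new function follows the same pattern as before, reusing the boto3 client from the earlier sketch (the function name is again an assumption derived from the script's file name):

result = lambda_client.invoke(
    FunctionName="translate-en2fr",
    Payload=json.dumps({"text": "Serverless is great"}),
)
print(json.loads(result["Payload"].read()))
# e.g. {'statusCode': 200, 'body': {'translation_text': '...'}}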
After you are finished experimenting with this project, run cdk destroy to remove all of the associated infrastructure.
This library is licensed under the MIT No Attribution License. See the LICENSE file.
Disclaimer: Deploying the demo applications contained in this repository will potentially cause your AWS Account to be billed for services.