Develop and deploy a real-time feature pipeline in Python, using Bytewax 🐝 and Hopsworks Feature Store.
Machine Learning models are only as good as the input features you feed them at training and inference time.
And for many real-world applications, like financial trading, these features must be generated and served as fast as possible, so the ML system produces the best predictions possible.
Generating and serving features fast is what a real-time feature pipeline does.
Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with JVM-based tools like Apache Spark or Apache Flink.
However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝 that expose a pure Python API on top of a highly-efficient language like Rust.
So you get the best of both worlds: you can develop highly performant and scalable real-time pipelines while leveraging top-notch Python libraries.
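To make "generating features in real time" concrete, here is a minimal sketch in plain Python of one common trading feature: open-high-low-close (OHLC) bars computed over fixed, non-overlapping time windows. It does not use the Bytewax API, and the `Trade` and `OHLC` types, the field names, and the 30-second window are assumptions made for illustration, not taken from this repository. A streaming engine would emit each bar as its window closes instead of processing a finished list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Trade:
    """Hypothetical input event: one trade for a product at a given time."""
    product_id: str
    timestamp: float  # seconds since epoch
    price: float
    volume: float


@dataclass
class OHLC:
    """Hypothetical output feature: one bar per product and window."""
    product_id: str
    window_start: float
    open: float
    high: float
    low: float
    close: float
    volume: float


def tumbling_ohlc(trades: Iterable[Trade], window_seconds: float = 30.0) -> List[OHLC]:
    """Aggregate trades into one OHLC bar per product and fixed-size window."""
    bars: Dict[Tuple[str, float], OHLC] = {}
    # Process trades in time order so open/close reflect first/last trade.
    for trade in sorted(trades, key=lambda t: t.timestamp):
        # Align the trade to the start of the window it falls into.
        start = (trade.timestamp // window_seconds) * window_seconds
        key = (trade.product_id, start)
        bar = bars.get(key)
        if bar is None:
            # First trade of the window sets all four prices.
            bars[key] = OHLC(trade.product_id, start, trade.price,
                             trade.price, trade.price, trade.price, trade.volume)
        else:
            bar.high = max(bar.high, trade.price)
            bar.low = min(bar.low, trade.price)
            bar.close = trade.price
            bar.volume += trade.volume
    return sorted(bars.values(), key=lambda b: (b.product_id, b.window_start))
```

Calling `tumbling_ohlc(trades, window_seconds=30.0)` on a list of `Trade` objects returns the bars sorted by product and window start.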
In this repository you will learn how to develop and deploy a real-time feature pipeline in 100% Python that
You will also build a dashboard using Bokeh and Streamlit to visualize the final features, in real-time.
Create a Python virtual environment with the project dependencies with
$ make init
Set your Hopsworks API key and project name variables in set_environment_variables_template.sh, rename the file to set_environment_variables.sh, and run it (sign up for free at hopsworks.ai to get these 2 values)
$ . ./set_environment_variables.sh
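Before running the pipeline, it can help to confirm that the variables were actually exported into your shell. The sketch below is one way to check from Python. The variable names `HOPSWORKS_PROJECT_NAME` and `HOPSWORKS_API_KEY` are assumptions for illustration; use whatever names the template script in this repository defines.

```python
import os
from typing import List, Mapping, Sequence

# Assumed variable names; adjust to match set_environment_variables.sh.
REQUIRED_VARS = ("HOPSWORKS_PROJECT_NAME", "HOPSWORKS_API_KEY")


def missing_variables(env: Mapping[str, str],
                      required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Report which variables, if any, are missing from the current environment.
missing = missing_variables(os.environ)
print("Missing variables:", ", ".join(missing) if missing else "none")
```

If the script reports missing variables, source the environment file again in the same shell session you use for the make commands.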
To run the feature pipeline locally
$ make run
To spin up a Streamlit dashboard to visualize the data in real-time
$ make frontend
To run the feature pipeline on an AWS EC2 instance, you first need an AWS account and the aws-cli tool installed on your local system. Then run the following command to deploy your feature pipeline onto an EC2 instance
$ make deploy
Feature pipeline logs are sent to AWS CloudWatch. Run the following command to grab the URL where you can see the logs.
$ make info
To shut down the feature pipeline on AWS and free resources, run
$ make undeploy
I am preparing a new hands-on tutorial where you will learn to build a complete real-time ML system, from A to Z.
➡️ Subscribe to The Real-World ML Newsletter to be notified when the tutorial is out.