Developer platform for production ML.
This is an early alpha. The implementation might change between versions without warning. Please use at your own risk and pin to a specific version if you're relying on this for anything important!
Manage versioning of datasets and models with our Python SDK. Versioning is semantic and managed automatically. You can install and run PureML using pip.
Getting started is simple:
pip install pureml
If you want to manage versions of a dataset, all you have to do is add our @dataset decorator.
To manage models, use the @model decorator. Other built-in features include data lineage and branching. For more information, refer to the docs.
import pureml
pureml.dataset.validation("petdata:dev:v1")
If you want to register a dataset as validation data while saving it, you can use our @validation decorator. This helps us capture not just one instance of this dataset but all future variations without any intervention.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from pureml.decorators import dataset, validation

@validation
@dataset("petdata:dev")
def load_data(img_folder="PetImages"):
    image_size = (180, 180)
    batch_size = 16

    # Split the image folder into training and validation sets
    train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
        img_folder,
        validation_split=0.2,
        subset="both",
        seed=1337,
        image_size=image_size,
        batch_size=batch_size,
    )

    # Augment the training images with random flips and rotations
    data_augmentation = keras.Sequential(
        [
            layers.RandomFlip("horizontal"),
            layers.RandomRotation(0.1),
        ]
    )
    train_ds = train_ds.map(
        lambda img, label: (data_augmentation(img), label),
        num_parallel_calls=tf.data.AUTOTUNE,
    )

    # Prefetch batches so data loading overlaps with training
    train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
    val_ds = val_ds.prefetch(tf.data.AUTOTUNE)
    return train_ds, val_ds
We recommend utilizing our base predictor class when developing your model. By doing so, you can leverage the predict function in this class as your model's prediction function, which can be used in various stages such as testing, inference, and dockerization.
from pureml import BasePredictor
import pureml
import tensorflow as tf
from tensorflow import keras

class Predictor(BasePredictor):
    model_details = ['pet_classifier:dev:latest']
    input = {'type': 'image'}
    output = {'type': 'numpy ndarray'}

    def load_models(self):
        # Fetch the registered model
        self.model = pureml.model.fetch(self.model_details)

    def predict(self, pred_img):
        # Convert the image to an array and add a batch dimension
        pred_img = keras.preprocessing.image.img_to_array(
            pred_img
        )
        pred_img = tf.expand_dims(pred_img, 0)
        predictions = self.model.predict(pred_img)
        predictions = float(predictions[0])
        return predictions
import pureml
pureml.model.evaluate("pet_classifier:dev:v1", "petdata:dev:v1")
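The identifiers passed to the calls above, such as "petdata:dev:v1", "pet_classifier:dev:latest", and "petdata:dev", appear to follow a name:branch:version pattern with an optional version. That reading is an assumption drawn from the examples in this README, not a documented format. Under that assumption, a small helper (the `Label` type and `parse_label` function are hypothetical and not part of the PureML SDK) could split such a label into its parts:

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    """Parts of a label, assuming the layout 'name:branch[:version]'."""
    name: str
    branch: str
    version: Optional[str]  # None when the label carries no version


def parse_label(label: str) -> Label:
    """Split a label into name, branch, and optional version."""
    parts = label.split(":")
    if len(parts) == 2:
        name, branch = parts
        return Label(name, branch, None)
    if len(parts) == 3:
        return Label(*parts)
    raise ValueError(f"expected 'name:branch[:version]', got {label!r}")


print(parse_label("pet_classifier:dev:v1"))
print(parse_label("petdata:dev"))
```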
Let's see how PureML's review feature makes it easier to identify and correct issues, and how it lets you evaluate the quality of your data and the accuracy of your model.
For a more detailed explanation, please visit our Documentation.
PureML quick start demo in just 2 mins.
Build and run a PureML project to create data lineage and a model with our demo Colab link.
Data Lineage | Automatic generation of data lineage |
Dataset Versioning | Object-based Automatic Semantic Versioning of datasets |
Model Versioning | Object-based Automatic Semantic Versioning of models |
Comparison | Comparing different versions of models or datasets |
Branches (Coming Soon) | Separation between experimentation and production-ready models using branches |
Review (Coming Soon) | Review and approve models and datasets to the production-ready branch |
Easy developer experience | An intuitive open source package aimed to bridge the gaps in data science teams |
Engineering best practices built-in | Integrating PureML functionalities in your code does not disrupt your workflow |
Object Versioning | A reliable object versioning mechanism to track changes to your datasets and models |
Data is a first-class citizen | Your data is secure. It will never leave your system. |
Reduce Friction | Have access to operations performed on data using data lineage without having to spend time on lengthy meetings |
These are the fundamental concepts that PureML uses to operate.
Project | A data science project. This is where you store datasets, models, and their related objects. It is similar to a GitHub repository with object storage. |
Lineage | Contains a series of transformations performed on data to generate a dataset. |
Data Versioning | Versioning of the data should be comprehensible to the user and should encapsulate changes in the data and its creation mechanism, among others. |
Model Versioning | Versioning of the model should be comprehensible to the user and should encapsulate changes in the training data, model architecture, and hyperparameters. |
Fetch | This functionality is used to fetch registered models and datasets. |
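As a rough illustration of the Lineage concept, the sketch below records the ordered series of transformations applied to a piece of data. It is a toy model under stated assumptions, not how PureML captures lineage: the `Lineage` class and its `apply` method are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Lineage:
    """Hypothetical container that records each transformation applied to the data."""
    steps: List[str] = field(default_factory=list)

    def apply(self, name: str, func: Callable[[Any], Any], data: Any) -> Any:
        # Run the transformation, then remember its name in application order.
        result = func(data)
        self.steps.append(name)
        return result


lineage = Lineage()
raw = [3, None, 1, 2]
cleaned = lineage.apply("drop_missing", lambda xs: [x for x in xs if x is not None], raw)
ordered = lineage.apply("sort", sorted, cleaned)

print(ordered)
print(lineage.steps)
```

Keeping the step names next to the resulting dataset is what lets a teammate see how the data was produced without asking the author.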
Version control is much more common in software than in machine learning. So why isn’t everyone using Git? Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t record information automatically from inside a training script.
GitHub wasn’t designed with data as a core project component. This along with a number of other differences between AI and more traditional software projects makes GitHub a bad fit for artificial intelligence, contributing to the reproducibility crisis in machine learning.
From manually tracking models to Git-based versioning systems that do not follow an intuitive versioning mechanism, there is no standardized way to track objects. With these mechanisms, it is hard enough to track down your own model from a month ago and get it running, let alone a teammate's!
We are trying to build a version control system for machine learning objects: a mechanism that is object dependent and intuitive for users.
Let's build this together. If you have faced this issue or have worked out a similar solution for yourself, please join us to help build a better system for everyone.
To report any bugs you have faced while using PureML package, please
Let's work together to improve the features for everyone. As a first step, please go through our Contributing Guide. We look forward to the ideas and features you bring.
Work with mutual respect. Please take a look at our public Roadmap here.
To get quick updates of feature releases of PureML, follow us on:
See the Apache-2.0 file for licensing information.