
Developer platform for production ML.

PureML

The next-gen developer platform for Production ML.

PyPi   ^3.8   Coverage   License   Downloads


⏳ Status

This is an early alpha. The implementation might change between versions without warning. Please use at your own risk and pin to a specific version if you're relying on this for anything important!

⏱ Getting started

1. Installation

Manage versioning of datasets and models with our Python SDK. Versioning is semantic and managed automatically. You can install and run PureML using pip.

Getting started is simple:

pip install pureml

To manage versions of a dataset, all you have to do is use our @dataset decorator.

To manage models, use the @model decorator. Other built-in features include data lineage and branching. For more information, refer to the docs.


2. Pureml-eval: Testing & Quality Control

Step 1: Use an existing dataset for validation

import pureml

pureml.dataset.validation("petdata:dev:v1")

If you want to mark a dataset as a validation dataset while saving it, you can use our @validation decorator. This captures not just one instance of the dataset but all of its future versions, without any further intervention.
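The labels in these examples ("petdata:dev:v1", "petdata:dev", "pet_classifier:dev:latest") appear to follow a name:branch:version pattern with an optional version. That reading is an assumption inferred from the examples, and the `Label` type and `parse_label` helper below are hypothetical, not part of the PureML SDK.

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    name: str
    branch: str
    version: Optional[str]  # None when the label omits a version


def parse_label(label: str) -> Label:
    """Split 'name:branch[:version]' into its parts (assumed format)."""
    parts = label.split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected 'name:branch[:version]', got {label!r}")
    name, branch = parts[0], parts[1]
    version = parts[2] if len(parts) == 3 else None
    return Label(name, branch, version)
```

For example, `parse_label("petdata:dev:v1")` yields the name `petdata`, the branch `dev`, and the version `v1`, while `parse_label("petdata:dev")` leaves the version as `None`.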


Step 2: Register validation dataset

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from pureml.decorators import dataset, validation

@validation
@dataset("petdata:dev")
def load_data(img_folder = "PetImages"):
  image_size = (180, 180)
  batch_size = 16
  train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    img_folder,
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
  )
  data_augmentation = keras.Sequential(
   [
     layers.RandomFlip("horizontal"),
     layers.RandomRotation(0.1),
   ]
  )
  train_ds = train_ds.map(
    lambda img, label: (data_augmentation(img), label),
    num_parallel_calls=tf.data.AUTOTUNE,
  )
  train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
  val_ds = val_ds.prefetch(tf.data.AUTOTUNE)
  return train_ds, val_ds

Step 3: Predictor for model

We recommend utilizing our base predictor class when developing your model. By doing so, you can leverage the predict function in this class as your model's prediction function, which can be used in various stages such as testing, inference, and dockerization.

from pureml import BasePredictor
import pureml
import tensorflow as tf
from tensorflow import keras

class Predictor(BasePredictor):
  model_details = ['pet_classifier:dev:latest']
  input = {'type': 'image'}
  output = {'type': 'numpy ndarray'}

  def load_models(self):
    self.model = pureml.model.fetch(self.model_details)

  def predict(self, pred_img):
    pred_img = keras.preprocessing.image.img_to_array(
      pred_img
    )
    pred_img = tf.expand_dims(pred_img, 0)
    predictions = self.model.predict(pred_img)
    predictions = float(predictions[0])

    return predictions
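The benefit of a shared predictor interface can be sketched without PureML at all. In the example below, `SketchPredictor`, `ThresholdPredictor`, and `run_inference` are hypothetical names; the sketch only assumes that a predictor exposes `load_models` and `predict`, as the class above does.

```python
from abc import ABC, abstractmethod


class SketchPredictor(ABC):
    """Hypothetical stand-in for a base predictor; not PureML's BasePredictor."""

    def __init__(self):
        self.model = None

    @abstractmethod
    def load_models(self):
        """Load whatever the predictor needs before it can predict."""

    @abstractmethod
    def predict(self, data):
        """Return a prediction for a single input."""


class ThresholdPredictor(SketchPredictor):
    def load_models(self):
        # A real predictor would fetch a registered model here.
        self.model = lambda x: 1.0 if x > 0.5 else 0.0

    def predict(self, data):
        return self.model(data)


def run_inference(predictor, inputs):
    """Tests, serving code, or a container entrypoint can all use these two calls."""
    predictor.load_models()
    return [predictor.predict(x) for x in inputs]


results = run_inference(ThresholdPredictor(), [0.2, 0.9])
```

Because every caller depends only on the two methods, the same predictor class can be reused across testing, inference, and packaging.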

Step 4: Evaluate your model

import pureml

pureml.model.evaluate("pet_classifier:dev:v1", "petdata:dev:v1")
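Conceptually, evaluating a model against a validation dataset means running the prediction function over each example and aggregating a metric. The sketch below shows that idea with plain accuracy. Which metrics `pureml.model.evaluate` actually computes is not stated here, so `accuracy`, `toy_predict`, and `validation_set` are illustrative assumptions.

```python
def accuracy(predict, examples):
    """Fraction of (input, expected_label) pairs that `predict` gets right."""
    examples = list(examples)
    if not examples:
        raise ValueError("validation set is empty")
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)


# Toy stand-ins for a registered model and a validation dataset.
def toy_predict(x):
    return int(x >= 0.5)


validation_set = [(0.1, 0), (0.7, 1), (0.4, 1), (0.9, 1)]
score = accuracy(toy_predict, validation_set)  # 3 of 4 correct
```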

PureML's review feature makes it easier to identify and correct issues, and lets you evaluate the quality of your data and the accuracy of your model.

For a more detailed explanation, please refer to our Documentation.

💻 Demo

PureML quick start demo in just 2 mins.

PureML Demo Video
Click the image to play video

Live demo

Build and run a PureML project to create data lineage and a model with our demo Colab link.


📍 Main Features

Data Lineage: Automatic generation of data lineage
Dataset Versioning: Object-based automatic semantic versioning of datasets
Model Versioning: Object-based automatic semantic versioning of models
Comparison: Comparing different versions of models or datasets
Branches (Coming Soon): Separation between experimentation and production-ready models using branches
Review (Coming Soon): Review and approve models and datasets for the production-ready branch

🔮 Core design principles

Easy developer experience: An intuitive open source package that aims to bridge the gaps in data science teams
Engineering best practices built-in: Integrating PureML functionality into your code does not disrupt your workflow
Object Versioning: A reliable object versioning mechanism to track changes to your datasets and models
Data is a first-class citizen: Your data is secure. It will never leave your system.
Reduce Friction: Use data lineage to see the operations performed on data, without spending time in lengthy meetings

⚙ Core abstractions

These are the fundamental concepts that PureML uses to operate.

Project: A data science project. This is where you store datasets, models, and their related objects. It is similar to a GitHub repository with object storage.
Lineage: Contains the series of transformations performed on data to generate a dataset.
Data Versioning: Versioning of the data should be comprehensible to the user and should capture changes in the data and its creation mechanism, among other things.
Model Versioning: Versioning of the model should be comprehensible to the user and should capture changes in training data, model architecture, and hyperparameters.
Fetch: This functionality is used to fetch registered models and datasets.
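As an illustration of the Lineage abstraction, the following sketch records the ordered series of transformations applied to some data. It is a toy under stated assumptions: the `tracked` decorator and the module-level `lineage` list are hypothetical and say nothing about how PureML stores lineage internally.

```python
from functools import wraps

lineage = []  # ordered names of the transformations applied so far


def tracked(func):
    """Append the function's name to `lineage` each time it runs."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        lineage.append(func.__name__)
        return result
    return wrapper


@tracked
def drop_negatives(rows):
    return [r for r in rows if r >= 0]


@tracked
def normalize(rows):
    top = max(rows)
    return [r / top for r in rows]


dataset = normalize(drop_negatives([4, -1, 2]))
```

After this runs, `lineage` lists `drop_negatives` followed by `normalize`, which is the kind of record that lets a teammate see how a dataset was produced.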

🤝 Why get involved

Version control is much more common in software than in machine learning. So why isn’t everyone using Git? Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t record information automatically from inside a training script.

GitHub wasn’t designed with data as a core project component. This along with a number of other differences between AI and more traditional software projects makes GitHub a bad fit for artificial intelligence, contributing to the reproducibility crisis in machine learning.

From manually tracking models to Git-based versioning systems that do not follow an intuitive versioning mechanism, there is no standardized way to track objects. With these mechanisms, it is hard enough to track your own model from a month ago or get it running, let alone a teammate's!

We are trying to build a version control system for machine learning objects: a mechanism that is object-dependent and intuitive for users.

Let's build this together. If you have faced this issue or have worked out a similar solution for yourself, please join us to help build a better system for everyone.


🧮 Tutorials


🐞 Reporting Bugs

To report any bugs you have faced while using the PureML package, please:

  1. Report it in our Discord channel
  2. Open an issue

⌨ Contributing and Developing

Let's work together to improve the features for everyone. As a first step, read through our Contributing Guide. We look forward to the ideas and features you bring.

Work with mutual respect. Please take a look at our public Roadmap here.


👨‍👩‍👧‍👦 Community

To get quick updates of feature releases of PureML, follow us on:

Twitter LinkedIn GitHub


📄 License

See the Apache-2.0 file for licensing information.

Open Source Agenda is not affiliated with "PureML" Project. README Source: PuremlHQ/PureML