Images to inference with no labeling (use foundation models to train supervised models)

Autodistill uses big, slow foundation models to train small, fast supervised models. Using `autodistill`, you can go from unlabeled images to inference on a custom model running at the edge, with no human intervention in between.

Currently, `autodistill` supports vision tasks like object detection and instance segmentation, but in the future it can be expanded to support language (and other) models.
Here are example predictions of a Target Model detecting milk bottles and bottlecaps after being trained on an auto-labeled dataset using Autodistill (see the Autodistill YouTube video for a full walkthrough):
To use `autodistill`, you input unlabeled data into a Base Model, which uses an Ontology to label a Dataset, which in turn is used to train a Target Model, yielding a Distilled Model fine-tuned to perform a specific Task.
Autodistill defines several basic primitives:

- **Task** - a Task defines what a Target Model will predict. The Task for each component of an `autodistill` pipeline must match for them to be compatible with each other. Object Detection and Instance Segmentation are currently supported through the `detection` task. `classification` support will be added soon.
- **Base Model** - a Base Model is a large foundation model that is used, together with an Ontology, to automatically label a Dataset.
- **Ontology** - an Ontology defines how a Base Model is prompted and what classes the resulting Dataset will contain. A simple Ontology is the `CaptionOntology`, which prompts a Base Model with text captions and maps them to class names. Other Ontologies may, for instance, use a CLIP vector or example images instead of a text caption.
- **Dataset** - a Dataset is the set of auto-labeled images produced by a Base Model and used to train a Target Model.
- **Target Model** - a Target Model is a supervised model that is trained on a Dataset to perform a specific Task.
- **Distilled Model** - a Distilled Model is the final output of the `autodistill` process; it's a set of weights fine-tuned for your task that can be deployed to get predictions.

Human labeling is one of the biggest barriers to broad adoption of computer vision. It can take thousands of hours to craft a dataset suitable for training a production model. The process of distillation for training supervised models is not new; in fact, traditional human labeling is just another form of distillation from an extremely capable Base Model (the human brain 🧠).
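The caption-to-class mapping behind a `CaptionOntology` is just a dictionary: keys are prompts sent to the Base Model, values are the class names written to the generated annotations. A minimal stdlib-only sketch (the captions and class names here are illustrative, not from a real project):

```python
# Hypothetical mapping a CaptionOntology would be constructed from:
# each key is a text prompt sent to the base model, each value is the
# class name saved for that caption in the generated annotations.
ontology_dict = {
    "milk bottle": "bottle",
    "blue cap on a bottle": "bottlecap",
}

prompts = list(ontology_dict.keys())    # what the base model is asked to find
classes = list(ontology_dict.values())  # what the dataset will store
print(prompts, classes)
```

Descriptive captions often prompt a foundation model better than the terse class names you actually want in your dataset, which is why the two are kept separate.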
Foundation models know a lot about a lot, but for production we need models that know a lot about a little.
As foundation models get better and better they will increasingly be able to augment or replace humans in the labeling process. We need tools for steering, utilizing, and comparing these models. Additionally, these foundation models are big, expensive, and often gated behind private APIs. For many production use-cases, we need models that can run cheaply and in realtime at the edge.
Autodistill's Base Models can already create datasets for many common use-cases (and through creative prompting and few-shotting we can expand their utility to many more), but they're not perfect yet. There's still a lot of work to do; this is just the beginning and we'd love your help testing and expanding the capabilities of the system!
Autodistill is modular. You'll need to install the `autodistill` package (which defines the interfaces for the above concepts) along with Base Model and Target Model plugins (which implement specific models).
By packaging these separately as plugins, dependency and licensing incompatibilities are minimized and new models can be implemented and maintained by anyone.
Example:
pip install autodistill autodistill-grounded-sam autodistill-yolov8
You can also clone the project from GitHub for local development:
git clone https://github.com/roboflow/autodistill
cd autodistill
pip install -e .
Additional Base and Target models are enumerated below.
See the demo Notebook for a quick introduction to `autodistill`. This notebook walks through building a milk container detection model with no labeling.

Below, we have condensed key parts of the notebook for a quick introduction to `autodistill`.
For this example, we'll show how to distill GroundedSAM into a small YOLOv8 model using autodistill-grounded-sam and autodistill-yolov8.
pip install autodistill autodistill-grounded-sam autodistill-yolov8
from autodistill_grounded_sam import GroundedSAM
from autodistill.detection import CaptionOntology
from autodistill_yolov8 import YOLOv8
# define an ontology to map class names to our GroundingDINO prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
base_model = GroundedSAM(ontology=CaptionOntology({"shipping container": "container"}))
# label all images in a folder called `images`
base_model.label(
    input_folder="./images",
    output_folder="./dataset"
)
target_model = YOLOv8("yolov8n.pt")
target_model.train("./dataset/data.yaml", epochs=200)
# run inference on the new model
pred = target_model.predict("./dataset/valid/your-image.jpg", confidence=0.5)
print(pred)
# optional: upload your model to Roboflow for deployment
from roboflow import Roboflow
rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("PROJECT_ID")
project.version(DATASET_VERSION).deploy(model_type="yolov8", model_path=f"./runs/detect/train/")
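The `label()` call above writes a YOLO-format dataset (images plus one `.txt` annotation file per image, referenced by `data.yaml`). As a rough, stdlib-only sketch of what one generated annotation line contains (a class id followed by a normalized center-x, center-y, width, height box; the numbers here are made up):

```python
# One line from a hypothetical YOLO-format label file produced by labeling.
line = "0 0.512 0.430 0.210 0.380"

fields = line.split()
class_id = int(fields[0])                      # index into the ontology's class list
cx, cy, w, h = (float(v) for v in fields[1:])  # box center and size, normalized to [0, 1]

# Convert the top-left corner to absolute pixel coordinates for a 640x640 image.
img_w = img_h = 640
x_min = (cx - w / 2) * img_w
y_min = (cy - h / 2) * img_h
print(class_id, x_min, y_min)
```

Inspecting a few of these files is a quick sanity check that your ontology's prompts are producing the boxes you expect before you spend time training.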
To plot the annotations for a single image using `autodistill`, you can use the code below. This code is helpful for visualizing the annotations generated by your base model (e.g., GroundedSAM) and the results from your target model (e.g., YOLOv8).
import supervision as sv
import cv2
img_path = "./images/your-image.jpeg"
image = cv2.imread(img_path)
detections = base_model.predict(img_path)
# annotate image with detections
box_annotator = sv.BoxAnnotator()
labels = [
    f"{base_model.ontology.classes()[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _ in detections
]
annotated_frame = box_annotator.annotate(
    scene=image.copy(), detections=detections, labels=labels
)
sv.plot_image(annotated_frame, (16, 16))
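The label-building comprehension above unpacks the `(xyxy, mask, confidence, class_id, tracker_id)` tuples yielded when iterating over the detections. A stdlib-only sketch of the same mapping, with made-up `(confidence, class_id)` pairs standing in for real model output:

```python
# Stand-in detections: just the (confidence, class_id) fields that the
# label comprehension actually uses; values are invented for illustration.
detections = [(0.91, 0), (0.47, 1)]

# Class names in the order the ontology assigns class ids.
classes = ["container", "bottlecap"]

# Build one "name confidence" label per detection, as in the snippet above.
labels = [f"{classes[class_id]} {confidence:0.2f}" for confidence, class_id in detections]
print(labels)  # ['container 0.91', 'bottlecap 0.47']
```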
Our goal is for `autodistill` to support using all foundation models as Base Models and most SOTA supervised models as Target Models. We focused on object detection and segmentation tasks first but plan to launch classification support soon! In the future, we hope `autodistill` will also be used for models beyond computer vision.
| base / target | YOLOv8 | YOLO-NAS | YOLOv5 | DETR | YOLOv7 | MT-YOLOv6 |
|---|---|---|---|---|---|---|
| DETIC | ✅ | ✅ | ✅ | ✅ | | |
| GroundedSAM | ✅ | ✅ | ✅ | ✅ | | |
| GroundingDINO | ✅ | ✅ | ✅ | ✅ | | |
| OWL-ViT | ✅ | ✅ | ✅ | ✅ | | |
| SAM-CLIP | ✅ | ✅ | ✅ | ✅ | | |
| Azure DenseCaptions | | | | | | |
| GLIPv2 | | | | | | |
| base / target | YOLOv8 | YOLO-NAS | YOLOv5 | YOLOv7 | Segformer |
|---|---|---|---|---|---|
| GroundedSAM | ✅ | 🚧 | 🚧 | | |
| SAM-CLIP | 🚧 | 🚧 | 🚧 | | |
| base / target | ViT | YOLOv8 | YOLOv5 |
|---|---|---|---|
| CLIP | 🚧 | 🚧 | 🚧 |
| DINOv2 | 🚧 | 🚧 | 🚧 |
| BLIP | 🚧 | 🚧 | 🚧 |
| ALBEF | 🚧 | 🚧 | 🚧 |
| GPT-4 | | | |
| Open Flamingo | | | |
| PaLM-2 | | | |
You can optionally deploy some Target Models trained using Autodistill on Roboflow. Deploying on Roboflow lets you use a range of concise SDKs to run your model at the edge, from roboflow.js for web deployment to NVIDIA Jetson devices.
The following Autodistill Target Models are supported by Roboflow for deployment:
| model name | Supported? |
|---|---|
| YOLOv8 Object Detection | ✅ |
| YOLOv8 Instance Segmentation | ✅ |
| YOLOv5 Object Detection | ✅ |
| YOLOv5 Instance Segmentation | ✅ |
| YOLOv8 Classification | |
Autodistill: Train YOLOv8 with ZERO Annotations
Apart from adding new models, there are several other areas we plan to explore with `autodistill`.
We love your input! Please see our contributing guide to get started. Thank you 🙏 to all our contributors!
The `autodistill` package is licensed under an Apache 2.0 license. Each Base or Target Model plugin may use its own license corresponding to the license of its underlying model. Please refer to the license in each plugin repo for more information.