Objectron is a dataset of short, object-centric video clips. Each video also contains AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object, capturing it from different views. Each object is annotated with a 3D bounding box that describes its position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.
The full set of models (for EfficientNet and MobilePose) is available for download from the Objectron bucket.
These models can be used to predict 3D object poses from RGB images. Example usage, including Mobile, Python, and Web APIs, is available via MediaPipe.
To list or download the models, use `gsutil ls gs://objectron/models`.
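As a minimal sketch, the models can be listed and fetched with the Google Cloud `gsutil` CLI; the local destination directory below is a hypothetical choice, not part of the dataset itself:

```shell
# List the available models in the public Objectron bucket.
gsutil ls gs://objectron/models

# Recursively copy the models to a local directory
# (./objectron_models is an arbitrary local destination).
# -m enables parallel transfers, -r recurses into subdirectories.
mkdir -p ./objectron_models
gsutil -m cp -r gs://objectron/models ./objectron_models
```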
Check our latest newsletter for more details.