OpenMMLab Pose Estimation Toolbox and Benchmark.
Fixed a bug when downloading configs and checkpoints with mim
(see Issue #2918).
We are excited to release RTMO:
Inferencer alias: rtmo (body)
We have released additional RTMW models in various sizes:
Config | Input Size | Whole AP | Whole AR | FLOPs (G) |
---|---|---|---|---|
RTMW-m | 256x192 | 58.2 | 67.3 | 4.3 |
RTMW-l | 256x192 | 66.0 | 74.6 | 7.9 |
RTMW-x | 256x192 | 67.2 | 75.2 | 13.1 |
RTMW-l | 384x288 | 70.1 | 78.0 | 17.7 |
RTMW-x | 384x288 | 70.2 | 78.1 | 29.3 |
The hand keypoint detection accuracy has been notably improved.
We are glad to support inference with the category-agnostic pose estimation method PoseAnything!
You can now specify ANY keypoints you want the model to detect, without extra training. Run the following under the project folder:
python demo.py --support [path_to_support_image] --query [path_to_query_image] --config configs/demo_b.py --checkpoint [path_to_pretrained_ckpt]
We have added support for two new datasets:
ExLPose provides a new dataset of real low-light images with accurate pose labels. It can help train pose estimation models that work under extreme lighting conditions.
H3WB (Human3.6M 3D WholeBody) extends the Human3.6M dataset with 3D whole-body annotations using the COCO wholebody skeleton. This dataset enables more comprehensive 3D pose analysis and benchmarking for whole-body methods.
@Tau-J @Ben-Louis @xiexinch @Yang-Changhui @orhir @RFYoung @yao5401 @icynic @Jendker @willyfh @jit-a3 @Ginray
We are excited to release the alpha version of RTMW:
dw_openpose_full preprocessor in sd-webui-controlnet
We are glad to support the following new algorithms:
We are glad to support the two-stage distillation method DWPose, which achieves new SOTA performance on COCO-WholeBody.
Here is a guide to train DWPose:
1. Train DWPose with the first-stage distillation:

bash tools/dist_train.sh configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/rtmpose_x_dis_l_coco-ubody-384x288.py 8

2. Transfer the S1 distillation model into a regular model:

# first stage distillation
python pth_transfer.py $dis_ckpt $new_pose_ckpt

3. Train DWPose with the second-stage distillation:

bash tools/dist_train.sh configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-384x288.py 8

4. Transfer the S2 distillation model into a regular model:

# second stage distillation
python pth_transfer.py $dis_ckpt $new_pose_ckpt --two_dis
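Conceptually, each transfer step extracts the student branch from the distillation checkpoint so it can be loaded as a regular pose model. Below is a minimal sketch with plain dicts; the "student." key prefix is a hypothetical illustration, not the actual checkpoint layout handled by pth_transfer.py:

```python
def extract_student(dis_state_dict, prefix="student."):
    """Keep only the student-branch keys and strip the prefix."""
    return {
        key[len(prefix):]: value
        for key, value in dis_state_dict.items()
        if key.startswith(prefix)
    }

# Toy "checkpoint": teacher and student weights stored side by side.
dis_ckpt = {
    "teacher.backbone.conv.weight": [0.1],
    "student.backbone.conv.weight": [0.2],
    "student.head.fc.weight": [0.3],
}
pose_ckpt = extract_student(dis_ckpt)
print(sorted(pose_ckpt))  # ['backbone.conv.weight', 'head.fc.weight']
```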
MotionBERT is the new SOTA method of Monocular 3D Human Pose Estimation on Human3.6M.
You can conveniently try MotionBERT via the 3D Human Pose Demo with Inferencer:
python demo/inferencer_demo.py tests/data/coco/000000000785.jpg \
--pose3d human3d --vis-out-dir vis_results/human3d
We support ED-Pose, an end-to-end framework with Explicit box Detection for multi-person Pose estimation. ED-Pose re-considers this task as two explicit box detection processes with a unified representation and regression supervision. In general, ED-Pose is conceptually simple without post-processing and dense heatmap supervision.
The checkpoint is converted from the official repo. Training of ED-Pose is not yet supported; it will be added in future updates.
You can conveniently try ED-Pose via the 2D Human Pose Demo with Inferencer:
python demo/inferencer_demo.py tests/data/coco/000000197388.jpg \
--pose2d edpose_res50_8xb2-50e_coco-800x1333 --vis-out-dir vis_results
In projects, we implement a topdown heatmap based human pose estimator, utilizing the approach outlined in UniFormer: Unifying Convolution and Self-attention for Visual Recognition (TPAMI 2023) and UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning (ICLR 2022).
We have added support for two new datasets:
UBody can boost 2D whole-body pose estimation and controllable image generation, especially for in-the-wild hand keypoint detection.
300W-LP contains the synthesized large-pose face images from 300W.
We are glad to support 3 new datasets:
Human-Art is a large-scale dataset that targets multi-scenario human-centric tasks to bridge the gap between natural and artificial scenes.
Contents of Human-Art:
Models trained on Human-Art:
Thanks @juxuan27 for helping with the integration of Human-Art!
Animal Kingdom provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors.
Results comparison:
Arch | Input Size | PCK(0.05) (Ours) | PCK(0.05) (Official Repo) | PCK(0.05) (Paper) |
---|---|---|---|---|
P1_hrnet_w32 | 256x256 | 0.6323 | 0.6342 | 0.6606 |
P2_hrnet_w32 | 256x256 | 0.3741 | 0.3726 | 0.393 |
P3_mammals_hrnet_w32 | 256x256 | 0.571 | 0.5719 | 0.6159 |
P3_amphibians_hrnet_w32 | 256x256 | 0.5358 | 0.5432 | 0.5674 |
P3_reptiles_hrnet_w32 | 256x256 | 0.51 | 0.5 | 0.5606 |
P3_birds_hrnet_w32 | 256x256 | 0.7671 | 0.7636 | 0.7735 |
P3_fishes_hrnet_w32 | 256x256 | 0.6406 | 0.636 | 0.6825 |
For more details, see this page
Thanks @Dominic23331 for helping with the integration of Animal Kingdom!
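The PCK(0.05) numbers above count a keypoint as correct when its distance to the ground truth falls below 5% of a reference length (e.g. the bounding-box size). A minimal self-contained sketch of the metric, not MMPose's actual implementation:

```python
import math

def pck(preds, gts, ref_length, thr=0.05):
    """Percentage of Correct Keypoints: a prediction counts as correct
    when its distance to the ground truth is below thr * ref_length."""
    correct = sum(
        math.dist(p, g) < thr * ref_length for p, g in zip(preds, gts)
    )
    return correct / len(preds)

gts = [(0.0, 0.0), (100.0, 100.0)]
preds = [(2.0, 0.0), (100.0, 140.0)]  # first within 5 px, second not
print(pck(preds, gts, ref_length=100.0))  # 0.5
```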
The Landmark guided face Parsing dataset (LaPa) consists of more than 22,000 facial images with rich variations in expression, pose, and occlusion. Each image is provided with an 11-category pixel-level label map and 106-point landmarks.
Supported by @Tau-J
MMEngine introduced the pure Python style configuration file:
Refer to the tutorial for more detailed usages.
We provide some examples here, and a new-style config for YOLOX-Pose is also supported. Feel free to try this new feature and give us your feedback!
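For reference, a pure Python style config replaces the _base_ string list with regular Python imports wrapped in read_base(); a minimal sketch (the base file path and the override fields below are illustrative):

```python
from mmengine.config import read_base

with read_base():
    # Inherit a base config through a regular Python import
    # (the path is illustrative).
    from .._base_.default_runtime import *

# Override fields with plain Python assignments.
train_cfg = dict(max_epochs=210, val_interval=10)
```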
We combined public datasets and released more powerful RTMPose models:
List of examples to deploy RTMPose:
Check out this page to know more.
Supported by @Tau-J
We have migrated SimpleBaseline3D and VideoPose3D into MMPose v1.1.0. Users can easily run inference with the Inferencer and the body3d demo.
Below is an example of how to use Inferencer to predict 3d pose:
python demo/inferencer_demo.py tests/data/coco/000000000785.jpg \
--pose3d human3d --vis-out-dir vis_results/human3d \
--rebase-keypoint-height
Video result:
Supported by @LareinaM
We have made a lot of improvements to our demo scripts:
Take topdown_demo_with_mmdet.py as an example: you can run inference with a webcam by specifying --input webcam:
# inference with webcam
python demo/topdown_demo_with_mmdet.py \
projects/rtmpose/rtmdet/person/rtmdet_nano_320-8xb32_coco-person.py \
https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmdet_nano_8xb32-100e_coco-obj365-person-05d8511e.pth \
projects/rtmpose/rtmpose/body_2d_keypoint/rtmpose-m_8xb256-420e_coco-256x192.py \
https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmpose-m_simcc-aic-coco_pt-aic-coco_420e-256x192-63eb25f7_20230126.pth \
--input webcam \
--show
Supported by @Ben-Louis and @LareinaM
Full Changelog: https://github.com/open-mmlab/mmpose/compare/v1.0.0...v1.1.0
We are excited to announce the release of MMPose 1.0.0 as a part of the OpenMMLab 2.0 project! MMPose 1.0.0 introduces an updated framework structure for the core package and a new section called "Projects". This section showcases a range of engaging and versatile applications built upon the MMPose foundation.
In this latest release, we have significantly refactored the core package's code to make it clearer, more comprehensible, and disentangled. This has resulted in improved performance for several existing algorithms, ensuring that they now outperform their previous versions. Additionally, we have incorporated some cutting-edge algorithms, such as SimCC and ViTPose, to further enhance the capabilities of MMPose and provide users with a more comprehensive and powerful toolkit. The new "Projects" section serves as an essential addition to MMPose, created to foster innovation and collaboration among users. This section offers the following attractive features:
RTMPose is a high-performance real-time multi-person pose estimation framework designed for practical applications. RTMPose offers high efficiency and accuracy, with various models achieving impressive AP scores on COCO and fast inference speeds on both CPU and GPU. It is also designed for easy deployment across various platforms and backends, such as ONNX, TensorRT, ncnn, OpenVINO, Linux, Windows, NVIDIA Jetson, and ARM. Additionally, it provides a pipeline inference API and SDK for Python, C++, C#, Java, and other languages. [Project][Model Zoo][Tech Report]
In this release, we introduce the MMPoseInferencer, a versatile API for inference that accommodates multiple input types. The API enables users to easily specify and customize pose estimation models, streamlining the process of performing pose estimation with MMPose.
Usage:
python demo/inferencer_demo.py ${INPUTS} --pose2d ${MODEL} [OPTIONS]
Example:
python demo/inferencer_demo.py tests/data/crowdpose --pose2d wholebody
All images located in the tests/data/crowdpose folder will be processed using RTMPose. Here are the visualization results:
For more details about Inferencer, please refer to https://mmpose.readthedocs.io/en/latest/user_guides/inference.html
In MMPose 1.0.0, we have enhanced the visualization capabilities for a more intuitive and insightful user experience, enabling a deeper understanding of the model's performance and keypoint predictions, and streamlining the process of fine-tuning and optimizing pose estimation models. The new visualization tool facilitates:
Visualization examples: 2D heatmap (ViTPose) and 1D heatmap (RTMPose).
We are excited to introduce the MMPose4AIGC project, a powerful tool that allows users to extract human pose information using MMPose and seamlessly integrate it with the T2I Adapter demo to generate stunning AI-generated images. The project makes it easy for users to generate both OpenPose-style and MMPose-style skeleton images, which can then be used as inputs in the T2I Adapter demo to create captivating AI-generated content based on pose information. Discover the potential of pose-guided image generation with the MMPose4AIGC project and elevate your AI-generated content to new heights!
YOLOX-Pose is a YOLO-based human detector and pose estimator, leveraging the methodology described in YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss (CVPRW 2022). With its lightweight and fast performance, this model is ideally suited for handling crowded scenes. [Project][Paper]
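The Object Keypoint Similarity (OKS) that the YOLOX-Pose loss is built around scores a prediction by a Gaussian falloff of its distance to the ground truth, scaled by the object area and a per-keypoint constant. A minimal sketch; the sigma values below are illustrative, not COCO's official constants:

```python
import math

def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between predicted and GT keypoints.

    pred, gt: lists of (x, y); visible: 0/1 flags; area: object scale;
    sigmas: per-keypoint falloff constants.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, sigmas):
        if not v:
            continue  # invisible keypoints are skipped
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2 * area * k ** 2))
        count += 1
    return total / count if count else 0.0

# A perfect prediction scores 1.0.
gt = [(10.0, 20.0), (30.0, 40.0)]
print(oks(gt, gt, [1, 1], area=100.0, sigmas=[0.025, 0.025]))  # 1.0
```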
In addition to new features, MMPose 1.0.0 delivers key optimizations for an enhanced user experience. With PyTorch 2.0 compatibility and a streamlined Codec module, you'll enjoy a more efficient and user-friendly pose estimation workflow like never before.
MMPose 1.0.0 is now compatible with PyTorch 2.0, ensuring that users can leverage the latest features and performance improvements offered by the PyTorch 2.0 framework when using MMPose. With the integration of inductor, users can expect faster model speeds. The table below shows several example models:
Model | Training Speed | Memory |
---|---|---|
ViTPose-B | 29.6% ↑ (0.931 → 0.655) | 10586 → 10663 |
ViTPose-S | 33.7% ↑ (0.563 → 0.373) | 6091 → 6170 |
HRNet-w32 | 12.8% ↑ (0.553 → 0.482) | 9849 → 10145 |
HRNet-w48 | 37.1% ↑ (0.437 → 0.275) | 7319 → 7394 |
RTMPose-t | 6.3% ↑ (1.533 → 1.437) | 6292 → 6489 |
RTMPose-s | 13.1% ↑ (1.645 → 1.430) | 9013 → 9208 |
In pose estimation tasks, various algorithms require different target formats, such as normalized coordinates, vectors, and heatmaps. MMPose 1.0.0 introduces a unified Codec module to streamline the encoding and decoding processes:
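As an illustration of the idea, a codec pairs encode (ground-truth keypoints to training targets) and decode (model outputs back to keypoints) in one object. The toy normalized-coordinate codec below is a sketch of the concept, not MMPose's actual codec API:

```python
class NormalizedCoordsCodec:
    """Toy codec: keypoints <-> normalized regression targets."""

    def __init__(self, input_size):
        self.w, self.h = input_size

    def encode(self, keypoints):
        """Map pixel coordinates into [0, 1] regression targets."""
        return [(x / self.w, y / self.h) for x, y in keypoints]

    def decode(self, targets):
        """Map regression outputs back to pixel coordinates."""
        return [(x * self.w, y * self.h) for x, y in targets]

codec = NormalizedCoordsCodec(input_size=(192, 256))
targets = codec.encode([(96.0, 128.0)])
print(targets)                # [(0.5, 0.5)]
print(codec.decode(targets))  # [(96.0, 128.0)]
```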
Full Changelog: https://github.com/open-mmlab/mmpose/compare/v0.29.0...v1.0.0
Highlights
New Features
Improvements
Bug Fixes
Full Changelog: https://github.com/open-mmlab/mmpose/compare/v1.0.0rc0...v1.0.0rc1
Highlights
Improvements
Bug Fixes
Fix fliplr_joints that causes an error when keypoint visibility has float values (#1589) @walsvid

New Features
Migrations
Improvements
Bug Fixes
Fix tensor.tile compatibility issue for pytorch 1.6 (#1658) @ly015
Fix MultilevelPixelData (#1647) @liqikai9

We are excited to announce the release of MMPose 1.0.0beta, the first version of MMPose 1.x and a part of the OpenMMLab 2.0 projects. It is built upon the new training engine.
Highlights
New engines. MMPose 1.x is based on MMEngine, which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.
Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMPose 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All OpenMMLab 2.0 projects share the same design in these interfaces and logic, allowing the emergence of multi-task/modality algorithms.
More documentation and tutorials. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it here.
Breaking Changes
In this release, we made lots of major refactoring and modifications. Please refer to the migration guide for details and migration instructions.
This release fixes compatibility with the latest mmcv v1.6.1.