OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Highlights
New Features
Improvements
Bug Fixes
We support two novel models for video recognition and retrieval based on open-domain text: ActionCLIP and CLIP4Clip. These models mark the first step of MMAction2's journey towards multi-modal video understanding. We also add support for MSR-VTT, a video retrieval dataset.
For more details, please refer to ActionCLIP, CLIP4Clip and MSR-VTT.
Supported by @Dai-Wenxun in #2470 and #2489.
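As a rough illustration of the text-video retrieval idea behind these models (a toy sketch with random features, not the ActionCLIP/CLIP4Clip code), both modalities are embedded into a shared space and candidates are ranked by cosine similarity:

```python
# Toy sketch of CLIP-style text-video retrieval (random features for
# illustration only): embed both modalities, then rank by similarity.
import torch
import torch.nn.functional as F

video_emb = torch.randn(4, 512)  # 4 candidate videos, e.g. pooled frame features
text_emb = torch.randn(1, 512)   # 1 free-form text query

scores = F.cosine_similarity(text_emb, video_emb)  # shape (4,)
print("best matching video:", scores.argmax().item())
```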
MMEngine introduced the pure Python style configuration file:
Refer to the tutorial for more detailed usage.
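For illustration, a minimal sketch of what such a config can look like (the base config module name here is hypothetical): classes are imported and referenced directly instead of via registry type strings.

```python
# Pure Python style config sketch: objects are imported directly instead
# of being referenced by registry strings.
from torch.optim import AdamW
from mmengine.config import read_base

with read_base():
    from .default_runtime import *  # hypothetical base config module

# old text-style equivalent: optimizer=dict(type='AdamW', lr=1e-3)
optim_wrapper = dict(optimizer=dict(type=AdamW, lr=1e-3))
```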
We are glad to support 3 new datasets:
HACS is a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.
https://github.com/open-mmlab/mmaction2/assets/58767402/7b7407e3-994a-4523-975c-5bdee3b54998
For more details, please refer to HACS.
Supported by @hukkai in #2224
MultiSports is a multi-person video dataset of spatio-temporally localized sports actions.
https://github.com/open-mmlab/mmaction2/assets/58767402/1f94668a-823b-46a0-9ea7-eedf0f29d1d1
For more details, please refer to MultiSports.
Supported by @cir7 in #2280
Kinetics710 is a video benchmark that merges the training sets of Kinetics-400/600/700 for joint training. For more details, please refer to Kinetics710.
Supported by @cir7 in #2534
Update demo_skeleton.py by @Dai-Wenxun in https://github.com/open-mmlab/mmaction2/pull/2380
Update torch.div usage by @Dai-Wenxun in https://github.com/open-mmlab/mmaction2/pull/2449
Full Changelog: https://github.com/open-mmlab/mmaction2/compare/v1.0.0...v1.1.0
We are excited to announce the release of MMAction2 1.0.0 as part of the OpenMMLab 2.0 project! MMAction2 1.0.0 introduces an updated framework structure for the core package and a new section called Projects. This section showcases various engaging and versatile applications built upon the MMAction2 foundation.
In this latest release, we have significantly refactored the core package's code to make it clearer, more comprehensible, and disentangled. This has resulted in improved performance for several existing algorithms, ensuring that they now outperform their previous versions. Additionally, we have incorporated cutting-edge algorithms such as VideoSwin and VideoMAE to further enhance the capabilities of MMAction2 and provide users with a more comprehensive and powerful toolkit. The new Projects section serves as an essential addition to MMAction2, created to foster innovation and collaboration among users. This section offers the following attractive features:
Flexible code contribution: Unlike the core package, the Projects section allows a more flexible environment for code contributions, enabling faster integration of state-of-the-art models and features.
Showcase of diverse applications: Explore various projects built upon the MMAction2 foundation, such as deployment examples and combinations of video recognition with other tasks.
Fostering creativity and collaboration: Encourages users to experiment, build upon the MMAction2 platform, and share their innovative applications and techniques, creating an active community of developers and researchers.

Discover the possibilities within the Projects section and join the vibrant MMAction2 community in pushing the boundaries of video understanding applications!

RGBPoseConv3D is a framework that jointly uses 2D human skeletons and RGB appearance for human action recognition. It is a 3D CNN with two streams, with the architecture borrowed from SlowFast. In RGBPoseConv3D: the RGB stream corresponds to the slow stream in SlowFast; the Skeleton stream corresponds to the fast stream in SlowFast. The input resolution of the RGB frames is 4x larger than that of the pseudo heatmaps.
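A toy sketch of this two-stream layout (illustrative shapes and layers only, not the actual RGBPoseConv3D implementation):

```python
# Toy two-stream 3D CNN: RGB plays the slow pathway, pose heatmaps the fast one.
import torch
import torch.nn as nn

class TwoStreamToy(nn.Module):
    def __init__(self, num_classes=60):
        super().__init__()
        # RGB stream: few frames at high spatial resolution (slow pathway)
        self.rgb_stream = nn.Conv3d(3, 64, (1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3))
        # Pose stream: many pseudo-heatmap frames at 1/4 spatial resolution (fast pathway)
        self.pose_stream = nn.Conv3d(17, 32, (3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3))
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(64 + 32, num_classes)

    def forward(self, rgb, heatmaps):
        # rgb:      (N, 3, 8, 224, 224)  -- 4x the spatial size of the heatmaps
        # heatmaps: (N, 17, 32, 56, 56)  -- one channel per keypoint
        r = self.pool(self.rgb_stream(rgb)).flatten(1)
        p = self.pool(self.pose_stream(heatmaps)).flatten(1)
        return self.fc(torch.cat([r, p], dim=1))

model = TwoStreamToy()
out = model(torch.randn(2, 3, 8, 224, 224), torch.randn(2, 17, 32, 56, 56))
print(out.shape)  # torch.Size([2, 60])
```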
In this release, we introduce the MMAction2Inferencer, a versatile inference API that supports multiple input types. The API enables users to easily specify and customize action recognition models, streamlining the process of performing video prediction with MMAction2.
Usage:
python demo/demo_inferencer.py ${INPUTS} [OPTIONS]
INPUTS can be a video path or a rawframes folder. For more detailed information on OPTIONS, please refer to Inferencer.

Example:
python demo/demo_inferencer.py zelda.mp4 --rec tsn --vid-out-dir zelda_out --label-file tools/data/kinetics/label_map_k400.txt
You can find zelda.mp4 here. The output video is displayed below:
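Beyond the CLI, the same inference can be scripted; a minimal sketch, assuming MMAction2Inferencer is importable from mmaction.apis.inferencers and mirrors the CLI flags shown above:

```python
# Scripted version of the demo above (assumed import path and arguments).
from mmaction.apis.inferencers import MMAction2Inferencer

inferencer = MMAction2Inferencer(rec='tsn')  # model alias, like --rec tsn
results = inferencer('zelda.mp4', vid_out_dir='zelda_out')
print(results)
```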
MMAction2 V1.0 introduces support for new models and datasets in the field of video understanding, including MSG3D [Project] (CVPR'2020), CTRGCN [Project] (CVPR'2021), STGCN++ (Arxiv'2022), Video Swin Transformer (CVPR'2022), VideoMAE (NeurIPS'2022), C2D (CVPR'2018), MViT V2 (CVPR'2022), UniFormer V1 (ICLR'2022), and UniFormer V2 (Arxiv'2022), as well as the spatiotemporal action detection dataset AVA-Kinetics (Arxiv'2022).
Taking SlowOnlyR50 8x8 as an example, the Top-1 accuracy comparison of the three training methods illustrates that our omni-source training effectively employs the additional ImageNet dataset, significantly boosting performance on Kinetics400.
In addition to the joint and bone modalities, we have extended support to the joint motion and bone motion modalities in MMAction2 V1.0. Furthermore, we have conducted training and evaluation for these four modalities using NTU60 2D and 3D keypoint data on STGCN, 2s-AGCN, and STGCN++.
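As a rough sketch (a hypothetical 5-joint skeleton, not MMAction2's actual pipeline), the bone and motion modalities can be derived from raw joint coordinates like this:

```python
# Deriving the four skeleton modalities from raw joints (toy example).
import numpy as np

PARENTS = [0, 0, 1, 2, 1]  # parent index of each joint; joint 0 is the root

def bone(joints):
    """Bone modality: each joint minus its parent joint."""
    return joints - joints[:, PARENTS, :]

def motion(x):
    """Motion modality: difference between consecutive frames."""
    m = np.zeros_like(x)
    m[:-1] = x[1:] - x[:-1]
    return m

joints = np.random.rand(16, 5, 2)  # (frames, joints, xy) 2D keypoints
four_modalities = {
    'joint': joints,
    'bone': bone(joints),
    'joint motion': motion(joints),
    'bone motion': motion(bone(joints)),
}
```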
Repeat Augment was initially proposed as a data augmentation method for ImageNet training and has been employed in recent Video Transformer works. Whenever a video is read during training, we use multiple (typically 2-4) random samples from the video for training. This approach not only enhances the model's generalization capability but also reduces the IO pressure of video reading. We support Repeat Augment in MMAction2 V1.0 and utilize this technique in MViT V2 training. The table below compares the Top-1 accuracy on Kinetics400 before and after employing Repeat Augment.
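The sampling itself is simple; conceptually (a toy sketch, not MMAction2's actual transform), one expensive video decode yields several independently sampled clips:

```python
# Repeat Augment sketch: draw several random clips from a single decoded video.
import numpy as np

def repeat_augment(num_frames, clip_len, num_repeats=2):
    """Return `num_repeats` random frame-index windows of length `clip_len`."""
    max_start = max(1, num_frames - clip_len + 1)
    starts = np.random.randint(0, max_start, size=num_repeats)
    return [np.arange(s, s + clip_len) for s in starts]

# e.g. one 300-frame video read contributes two 16-frame training samples
for clip in repeat_augment(300, 16):
    print(clip[0], "...", clip[-1])
```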
Full Changelog: https://github.com/open-mmlab/mmaction2/compare/v0.24.0...v1.0.0
Highlights
New Features
Improvements
Bug Fixes
Documentation
Highlights
New Features
Improvements
Refine the SampleFrames transform and improve most models' performance (#1942)
Update SampleFrame (#2157)
Bug Fixes
Fix the gen_ntu_rgbd_raw script (#2076)
Fix joint.pkl and bone.pkl used in the multi-stream fusion tool (#2106)
Fix SampleFrames-related issues (#2117), (#2121), (#2122), (#2124), (#2125), (#2126), (#2129), (#2128)
Fix the check_videos.py script (#2134)
Documentation
Highlights
New Features
Improvements
Bug Fixes
We are excited to announce the release of MMAction2 v1.0.0rc0, the first version of MMAction2 1.x, a part of the OpenMMLab 2.0 projects, built upon the new training engine.
Highlights
New engines. MMAction2 1.x is based on [MMEngine](https://github.com/open-mmlab/mmengine), which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entry points of high-level interfaces.
Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMAction2 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in these interfaces and logic to allow the emergence of multi-task/modality algorithms.
More documentation and tutorials. We have added extensive documentation and tutorials to help users get started more smoothly. Read them here.
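For example, a minimal MMEngine-style entry point might look like this (the config path is hypothetical; MMAction2's tools/train.py wraps the same calls):

```python
# Minimal MMEngine runner sketch: build everything from a config and train.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile('configs/recognition/tsn/some_tsn_config.py')  # hypothetical path
cfg.work_dir = './work_dirs/demo'

runner = Runner.from_cfg(cfg)  # builds model, dataloaders, and loops from the config
runner.train()
```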
Breaking Changes
In this release, we made lots of major refactoring and modifications. Please refer to the migration guide for details and migration instructions.
This release fixes compatibility with the latest mmcv v1.6.1.
Highlights
New Features
Documentation
Bug and Typo Fixes