note: If the image does not display, please check [link] or download the 'fig/' folder
Performance comparison of state-of-the-art video object detectors without post-processing methods.
Performance comparison of state-of-the-art video object detectors with post-processing methods. ∗ indicates the use of video-level post-processing methods (e.g., Seq-NMS, tubelet rescoring, BLR); △ indicates the use of data augmentation.
Dataset
ImageNet: Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. "ImageNet Large Scale Visual Recognition Challenge". IJCV(2015).[paper] [download link]
EPIC-KITCHENS: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. "Scaling Egocentric Vision: The EPIC-KITCHENS Dataset". ECCV(2018).[paper] [download link]
IJCV 2021
CSMN: Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, Hao Li. "Context and Structure Mining Network for Video Object Detection". IJCV(2021).[paper][code]
ACM MM 2021
TransVOD: Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang. "End-to-End Video Object Detection with Spatial-Temporal Transformers". ACM MM(2021).[paper][code]
VmAP: Anupam Sobti, Vaibhav Mavi, M Balakrishnan, Chetan Arora. "VmAP: A Fair Metric for Video Object Detection". ACM MM(2021).[paper]
AAAI 2021
MAMBA: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson. "MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection". AAAI(2021).[paper]
ICCV 2021
TF-Blender: Yiming Cui, Liqi Yan, Zhiwen Cao, Dongfang Liu. "TF-Blender: Temporal Feature Blender for Video Object Detection". ICCV(2021).[paper][code]
ACM MM 2020
DSFNet: Lijian Lin, Haosheng Chen, Honglun Zhang, Jun Liang, Yu Li, Ying Shan, Hanzi Wang. "Dual Semantic Fusion Network for Video Object Detection". ACM MM(2020). [paper]
EBFA: Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, Hao Li. "Exploiting Better Feature Aggregation for Video Object Detection". ACM MM(2020). [paper]
ECCV 2020
LSTS: Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan. "Learning Where to Focus for Efficient Video Object Detection". ECCV(2020). [paper] [code]
HVRNet: Mingfei Han, Yali Wang, Xiaojun Chang, and Yu Qiao. "Mining Inter-Video Proposal Relations for Video Object Detection". ECCV(2020). [paper] [code]
CHP: Zhujun Xu, Emir Hrustic, and Damien Vivet. "CenterNet Heatmap Propagation for Real-time Video Object Detection". ECCV(2020). [paper]
CVPR 2020
MEGA: Yihong Chen, Yue Cao, Han Hu, Liwei Wang. "Memory Enhanced Global-Local Aggregation for Video Object Detection". CVPR(2020).[paper] [code]
AAAI 2020
TCENet: Fei He, Naiyu Gao, Qiaozhe Li, Senyao Du, Xin Zhao, Kaiqi Huang. "Temporal Context Enhanced Feature Aggregation for Video Object Detection". AAAI(2020).[paper]
ICCV 2019
RDN: Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, and Tao Mei. "Relation Distillation Networks for Video Object Detection". ICCV(2019).[paper]
SELSA: Haiping Wu, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang. "Sequence Level Semantics Aggregation for Video Object Detection". ICCV(2019).[paper] [code]
LLTR: Mykhailo Shvets, Wei Liu, Alexander C. Berg. "Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection". ICCV(2019).[paper]
OGEMN: Hanming Deng, Yang Hua, Tao Song, Zongpu Zhang, Zhengui Xue, Ruhui Ma, Neil Robertson, and Haibing Guan. "Object Guided External Memory Network for Video Object Detection". ICCV(2019).[paper]
PSLA: Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan. "Progressive Sparse Local Attention for Video Object Detection". ICCV(2019).[paper]
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell: Huizi Mao, Xiaodong Yang, William J. Dally. "A Delay Metric for Video Object Detection: What Average Precision Fails to Tell". ICCV(2019).[paper]
AAAI 2019
DorT: Hao Luo, Wenxuan Xie, Xinggang Wang, Wenjun Zeng. "Detect or Track: Towards Cost-Effective Video Object Detection/Tracking". AAAI(2019).[paper]
CVPR 2018
THP: Xizhou Zhu, Jifeng Dai, Lu Yuan, Yichen Wei. "Towards High Performance Video Object Detection". CVPR(2018).[paper]
LSTM-SSD: Mason Liu, Menglong Zhu. "Mobile Video Object Detection with Temporally-Aware Feature Maps". CVPR(2018).[paper]
ST-Lattice: Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin. "Optimizing Video Object Detection via a Scale-Time Lattice". CVPR(2018).[paper]
ECCV 2018
STSN: Gedas Bertasius, Lorenzo Torresani, Jianbo Shi. "Object Detection in Video with Spatiotemporal Sampling Networks". ECCV(2018).[paper]
STMN: Fanyi Xiao, Yong Jae Lee. "Video Object Detection with an Aligned Spatial-Temporal Memory". ECCV(2018).[paper] [code]
MANet: Shiyao Wang, Yucong Zhou, Junjie Yan, Zhidong Deng. "Fully Motion-Aware Network for Video Object Detection". ECCV(2018).[paper]
CVPR 2017
DFF: Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei. "Deep Feature Flow for Video Recognition". CVPR(2017).[paper] [code]
ICCV 2017
FGFA: Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei. "Flow-Guided Feature Aggregation for Video Object Detection". ICCV(2017).[paper] [code]
D&T: Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman. "Detect to Track and Track to Detect". ICCV(2017).[paper] [code]
Papers before 2017
T-cnn: Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang. "T-cnn: Tubelets with convolutional neural networks for object detection from videos". IEEE Transactions on Circuits and Systems for Video Technology(2017).[paper] [code]
Object detection from video tubelets with convolutional neural networks: Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. "Object detection from video tubelets with convolutional neural networks". CVPR(2016).[paper] [code]
Seq-NMS: Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. "Seq-NMS for Video Object Detection". ArXiv(2016).[paper]