ICCV 2021 Papers with Code
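:star_and_crescent: Paper download:
ICCV 2021 paper download collection:
Link: https://pan.baidu.com/s/1vmOQzLG1QaBCgQD1ijtYuw
Extraction code: bp9j (for the unzip password, contact WeChat nvshenj125)
CVPR 2021 collection: https://github.com/DWCTOD/CVPR2021-Papers-with-Code-Demo
Paper download: https://pan.baidu.com/share/init?surl=gjfUQlPf73MCk4vM8VbzoA
Password: aicv
:star2: Continuously updated with the latest ICCV 2021 papers and their open-source code!
:car: ICCV 2021 accepted paper list
:steam_locomotive: ICCV 2021 talk and demo video collection: https://space.bilibili.com/288489574
:car: Official website: http://iccv2021.thecvf.com/home
:timer_clock: Dates :watch: Paper acceptance announcement: July 23, 2021
:hand: Note: everyone is welcome to open an issue to share ICCV 2021 papers and open-source projects and help improve this project.
:airplane: For easier downloading, the papers are stored in a folder. :heavy_check_mark: means the paper has been downloaded / Paper Download
An ICCV 2021 paper discussion group has been set up! If your paper was accepted, you can add WeChat nvshenj125 with the note: ICCV + name + school/company. Requests must follow this format to be added to the group.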
:heavy_check_mark:Conformer: Local Features Coupling Global Representations for Visual Recognition
Contextual Convolutional Neural Networks
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Explanation: https://zhuanlan.zhihu.com/p/353222035
Paper: https://arxiv.org/abs/2102.12122
Code: https://github.com/whai362/PVT
Reg-IBP: Efficient and Scalable Neural Network Robustness Training via Interval Bound Propagation
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
Beyond Road Extraction: A Dataset for Map Update using Aerial Images
:heavy_check_mark:FineAction: A Fined Video Dataset for Temporal Action Localization
KoDF: A Large-scale Korean DeepFake Detection Dataset
LLVIP: A Visible-infrared Paired Dataset for Low-light Vision
Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes
Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
:heavy_check_mark:MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Semantically Coherent Out-of-Distribution Detection
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
STRIVE: Scene Text Replacement In Videos
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
Who's Waldo? Linking People Across Text and Images (Oral)
Asymmetric Loss For Multi-Label Classification
Bias Loss for Mobile Neural Networks
Focal Frequency Loss for Image Reconstruction and Synthesis
Orthogonal Projection Loss
Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)
BN-NAS: Neural Architecture Search with Batch Normalization
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
CONet: Channel Optimization for Convolutional Neural Networks
FOX-NAS: Fast, On-device and Explainable Neural Architecture Search
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving
Single-DARTS: Towards Stable Architecture Search
Influence-Balanced Loss for Imbalanced Visual Classification
Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification
An End-to-End Transformer Model for 3D Object Detection
AutoFormer: Searching Transformers for Visual Recognition
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
Conditional DETR for Fast Training Convergence
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
Fast Convergence of DETR with Spatially Modulated Co-Attention
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (Oral)
GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
HiFT: Hierarchical Feature Transformer for Aerial Tracking
High-Fidelity Pluralistic Image Completion with Transformers
Improving 3D Object Detection with Channel-wise Transformer
Is it Time to Replace CNNs with Transformers for Medical Images?
Learning Spatio-Temporal Transformer for Visual Tracking
MUSIQ: Multi-scale Image Quality Transformer
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction (Oral)
Explanation: https://zhuanlan.zhihu.com/p/400017971
Paper: https://arxiv.org/abs/2108.03798
Code: https://github.com/Huage001/PaintTransformer
PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers (Oral)
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Explanation: https://zhuanlan.zhihu.com/p/353222035
Paper: https://arxiv.org/abs/2102.12122
Code: https://github.com/whai362/PVT
Rethinking and Improving Relative Position Encoding for Vision Transformer
Rethinking Spatial Dimensions of Vision Transformers
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
SOTR: Segmenting Objects with Transformers
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
The Animation Transformer: Visual Correspondence via Segment Matching
The Right to Talk: An Audio-Visual Transformer Approach
TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
TransPose: Keypoint Localization via Transformer
:heavy_check_mark:Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Paper: https://arxiv.org/abs/2101.11986
:heavy_check_mark:Visual Transformer with Statistical Test for COVID-19 Classification
Vision Transformer with Progressive Sampling
Visual Saliency Transformer
Explanation: https://blog.csdn.net/qq_39936426/article/details/117199411
Paper: https://arxiv.org/abs/2104.12099
Code: https://github.com/nnizhang/VST
Vision-Language Transformer and Query Generation for Referring Segmentation
Voxel Transformer for 3D Object Detection
Active Learning for Deep Object Detection via Probabilistic Modeling
Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery
Conditional Variational Capsule Network for Open Set Recognition
Paper: https://arxiv.org/abs/2104.09159
Code: https://github.com/guglielmocamporese/cvaecaposr
DetCo: Unsupervised Contrastive Learning for Object Detection
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization
Detecting Invisible People
FMODetect: Robust Detection and Trajectory Estimation of Fast Moving Objects
GraphFPN: Graph Feature Pyramid Network for Object Detection
Human Detection and Segmentation via Multi-view Consensus
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Mutual Supervision for Dense Object Detection
Morphable Detector for Object Detection on Demand
Moving Object Detection for Event-based vision using Graph Spectral Clustering
Oriented R-CNN for Object Detection
Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)
Reconcile Prediction Consistency for Balanced Object Detection
Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection
Towards Rotation Invariance in Object Detection
TOOD: Task-aligned One-stage Object Detection (Oral)
Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
Disentangled High Quality Salient Object Detection
Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
RGB-D Saliency Detection via Cascaded Mutual Information Minimization
Specificity-preserving RGB-D Saliency Detection
Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection
An End-to-End Transformer Model for 3D Object Detection
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation
Improving 3D Object Detection with Channel-wise Transformer
Is Pseudo-Lidar needed for Monocular 3D Object detection?
ODAM: Object Detection, Association, and Mapping using Posed RGB Video (Oral)
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
Voxel Transformer for 3D Object Detection
Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
DepthTrack: Unveiling the Power of RGBD Tracking
Exploring Simple 3D Multi-Object Tracking for Autonomous Driving
Is First Person Vision Challenging for Object Tracking?
Learning to Track Objects from Unlabeled Videos
Learn to Match: Automatic Matching Network Design for Visual Tracking
Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths
Saliency-Associated Object Tracking
Video Annotation for Visual Tracking via Selection and Refinement
Complementary Patch for Weakly Supervised Semantic Segmentation
Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
Deep Metric Learning for Open World Semantic Segmentation
Dual Path Learning for Domain Adaptation of Semantic Segmentation
EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow
Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing
Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
Exploring Cross-Image Pixel Contrast for Semantic Segmentation (Oral)
Enhanced Boundary Learning for Glass-like Object Segmentation
From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation
ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
Labels4Free: Unsupervised Segmentation using StyleGAN
LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation
Learning Meta-class Memory for Few-Shot Semantic Segmentation
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
Mining Contextual Information Beyond Image for Semantic Segmentation
Mining Latent Classes for Few-shot Segmentation (Oral)
Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
Multi-Anchor Active Domain Adaptation for Semantic Segmentation (Oral)
Personalized Image Semantic Segmentation
Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
Pseudo-mask Matters in Weakly-supervised Semantic Segmentation
RECALL: Replay-based Continual Learning in Semantic Segmentation
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation (Oral)
Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models
Self-Regulation for Semantic Segmentation
Semantic Concentration for Domain Adaptation
ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
SOTR: Segmenting Objects with Transformers
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
The Marine Debris Dataset for Forward-Looking Sonar Semantic Segmentation
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation
Hierarchical Aggregation for 3D Instance Segmentation
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
CDNet: Centripetal Direction Network for Nuclear Instance Segmentation
Paper: None
Code: https://github.com/2021-ICCV/CDNet
:heavy_check_mark:Crossover Learning for Fast Online Video Instance Segmentation
Paper: https://arxiv.org/abs/2104.05970
Code: https://github.com/hustvl/CrossVIS
:heavy_check_mark:Instances as Queries
Instance Segmentation Challenge Track Technical Report, VIPriors Workshop at ICCV 2021: Task-Specific Copy-Paste Data Augmentation Method for Instance Segmentation
Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)
Scaling up instance annotation via label propagation
Domain Adaptive Video Segmentation via Temporal Consistency Regularization
Full-Duplex Strategy for Video Object Segmentation
Hierarchical Memory Matching Network for Video Object Segmentation
Demo: https://www.bilibili.com/video/BV1Eg41157q3
Paper: https://arxiv.org/abs/2109.11404 | Homepage
Code: Hierarchical Memory Matching Network for Video Object Segmentation
Joint Inductive and Transductive Learning for Video Object Segmentation
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
Uncertainty-aware GAN with Adaptive Loss for Robust MRI Image Enhancement
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images
Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts
Studying the Effects of Self-Attention for Medical Image Analysis
3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations (Oral)
AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
Click to Move: Controlling Video Generation with Sparse Motion
Collaging Class-specific GANs for Semantic Image Synthesis
Disentangled Lifespan Face Synthesis
Dual Projection Generative Adversarial Networks for Conditional Image Generation
EigenGAN: Layer-Wise Eigen-Learning for GANs
GAN Inversion for Out-of-Range Images with Geometric Transformations
Generative Models for Multi-Illumination Color Constancy
Gradient Normalization for Generative Adversarial Networks
Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
Image Synthesis via Semantic Composition
InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
Learning to Diversify for Single Domain Generalization
Manifold Matching via Deep Metric Learning for Generative Modeling
Meta Gradient Adversarial Attack
Online Multi-Granularity Distillation for GAN Compression
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
PixelSynth: Generating a 3D-Consistent Experience from a Single Image
Robustness and Generalization via Generative Adversarial Training
SemIE: Semantically-Aware Image Extrapolation
SketchLattice: Latticed Representation for Sketch Manipulation
Sketch Your Own GAN
Target Adaptive Context Aggregation for Video Scene Graph Generation
Toward a Visual Concept Vocabulary for GAN Latent Space
Toward Spatially Unbiased Generative Models
Towards Vivid and Diverse Image Colorization with Generative Color Prior
Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation
Unaligned Image-to-Image Translation by Learning to Reweight
Unconditional Scene Graph Generation
Unsupervised Geodesic-preserved Generative Adversarial Networks for Unconstrained 3D Pose Transfer
Domain-Aware Universal Style Transfer
Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot (Oral)
Manifold Matching via Deep Metric Learning for Generative Modeling
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
Binocular Mutual Learning for Improving Few-shot Classification
Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
Discriminative Region-based Multi-Label Zero-Shot Learning
Domain Generalization via Gradient Surgery
Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
Few-Shot Batch Incremental Road Object Detection via Detector Fusion
Field-Guide-Inspired Zero-Shot Learning
Few-shot Visual Relationship Co-localization
Generalized Source-free Domain Adaptation
Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning
Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning
On the Importance of Distractors for Few-Shot Classification
Relational Embedding for Few-Shot Classification
SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
Transductive Few-Shot Classification on the Oblique Manifold
Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware
Adversarial Robustness for Unsupervised Domain Adaptation
Collaborative Unsupervised Visual Representation Learning from Decentralized Data
Instance Similarity Learning for Unsupervised Feature Representation
Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density
Digging into Uncertainty in Self-supervised Multi-view Stereo
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
Improving Self-supervised Learning with Hardness-aware Dynamic Curriculum Learning: An Application to Digital Pathology
Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
Reducing Label Effort: Self-Supervised meets Active Learning
Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
Self-Supervised Video Representation Learning with Meta-Contrastive Network
SSH: A Self-Supervised Framework for Image Harmonization
Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency
A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
Online Refinement of Low-level Feature Based Activation Map for Weakly Supervised Object Localization
Influence Selection for Active Learning
Class Semantics-based Attention for Action Detection
"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021
A Baseline Framework for Part-level Action Parsing and Action Recognition
Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
Elaborative Rehearsal for Zero-shot Action Recognition
Paper: https://arxiv.org/abs/2108.02833
:heavy_check_mark:FineAction: A Fined Video Dataset for Temporal Action Localization
Paper: https://arxiv.org/abs/2105.11107 | Homepage
Code: None
:heavy_check_mark:MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation (Oral)
Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
Enriching Local and Global Contexts for Temporal Action Localization
Boundary-sensitive Pre-training for Temporal Localization in Videos
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
Visual Alignment Constraint for Continuous Sign Language Recognition
HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
Human Pose Regression with Residual Log-likelihood Estimation (Oral)
Online Knowledge Distillation for Efficient Pose Estimation
The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation
TransPose: Keypoint Localization via Transformer
EventHPE: Event-based 3D Human Pose and Shape Estimation
DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders (Oral)
FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration
Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild
Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
Shape-aware Multi-Person Pose Estimation from Multi-View Images
Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation
SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
ARCH++: Animation-Ready Clothed Human Reconstruction Revisited
imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
Learning to Regress Bodies from Images using Differentiable Semantic Rendering
Learning Motion Priors for 4D Human Body Capture in 3D Scenes (Oral)
Physics-based Human Motion Estimation and Synthesis from Videos
Probabilistic Modeling for Human Mesh Recovery
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (Oral)
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation (Oral)
Masked Face Recognition Challenge: The InsightFace Track Report
Masked Face Recognition Challenge: The WebFace260M Track Report
PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
Rethinking Common Assumptions to Mitigate Racial Bias in Face Recognition Datasets
SynFace: Face Recognition with Synthetic Data
Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models
ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Self-Supervised 3D Face Reconstruction via Conditional Estimation
Paper: https://arxiv.org/abs/2110.04800
Code: None
Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing
Paper: https://arxiv.org/abs/2103.15432
Code: None
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
Understanding and Mitigating Annotation Bias in Facial Expression Recognition
A Technical Report for ICCV 2021 VIPriors Re-identification Challenge
ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID (Oral)
Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
Paper: https://arxiv.org/abs/2108.00171
Code: https://github.com/RenMin1991/cleaned-DukeMTMC-reID/
Learning Compatible Embeddings
Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
Towards Discriminative Representation Learning for Unsupervised Person Re-identification
TransReID: Transformer-based Object Re-Identification
Video-based Person Re-identification with Spatial and Temporal Memory Networks
Weakly Supervised Person Search with Region Siamese Networks
Heterogeneous Relational Complement for Vehicle Re-identification
MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework (Oral)
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting
Generating Smooth Pose Sequences for Diverse Human Motion Prediction
MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
Skeleton-Graph: Long-Term 3D Motion Prediction From 2D Observations Using Deep Spatio-Temporal Graph CNNs
DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
CL-Face-Anti-spoofing
3D High-Fidelity Mask Face Presentation Attack Detection Challenge
Exploring Temporal Coherence for More General Video Face Forgery Detection
OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
Fake It Till You Make It: Face analysis in the wild using synthetic data alone
A Hierarchical Assessment of Adversarial Severity
AdvDrop: Adversarial Attack to DNNs by Dropping Information
AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
Optical Adversarial Attack
Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning
Wasserstein Coupled Graph Learning for Cross-Modal Retrieval
AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network
Augmenting Depth Estimation with Geospatial Context
Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation
Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation (Oral)
Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting (Oral)
StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
Paper: https://arxiv.org/abs/2108.06815
Code: https://github.com/JunHeum/ABME
:heavy_check_mark:XVFI: eXtreme Video Frame Interpolation (Oral)
Paper: https://arxiv.org/abs/2103.16206
Code: https://github.com/JihyongOh/XVFI
The Multi-Modal Video Reasoning and Analyzing Competition
CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
GNeRF: GAN-based Neural Radiance Field without Posed Camera
In-Place Scene Labelling and Understanding with Implicit Scene Representation (Oral)
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo (Oral)
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
Self-Calibrating Neural Radiance Fields
UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (Oral)
Paper: https://arxiv.org/abs/2104.10078 | Homepage
Code: None
CANet: A Context-Aware Network for Shadow Removal
DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
Designing a Practical Degradation Model for Deep Blind Image Super-Resolution
Dual-Camera Super-Resolution with Aligned Attention Modules
Paper: https://arxiv.org/abs/2109.01349
Code: None
Generalized Real-World Super-Resolution through Adversarial Robustness
Paper: https://arxiv.org/abs/2108.11505
Code: None
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
Paper: https://arxiv.org/abs/2004.03791
Code: https://github.com/LongguangWang/ArbSR
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
Paper: None
Code: https://github.com/Anonymous-iccv2021-paper3163/CaFM-Pytorch
Equivariant Imaging: Learning Beyond the Range Space (Oral)
Spatially-Adaptive Image Restoration using Distortion-Guided Networks
Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image (Oral)
SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring
Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
Deep Reparametrization of Multi-Frame Super-Resolution and Denoising (Oral)
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (Oral)
Rethinking Deep Image Prior for Denoising
Rethinking Noise Synthesis and Modeling in Raw Denoising
ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss
Gap-closing Matters: Perceptual Quality Assessment and Optimization of Low-Light Image Enhancement
Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
Effect of Parameter Optimization on Classical and Learning-based Image Matching Methods
Viewpoint Invariant Dense Matching for Visual Geolocalization
MUSIQ: Multi-scale Image Quality Transformer
Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
Dynamic Attentive Graph Learning for Image Restoration
Towards Flexible Blind JPEG Artifacts Removal
Image Inpainting via Conditional Texture and Structure Dual Generation
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Internal Video Inpainting by Implicit Long-range Propagation
Occlusion-Aware Video Object Inpainting
Searching for Two-Stream Models in Multivariate Space for Video Recognition
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Multi-scale Matching Networks for Semantic Correspondence
:heavy_check_mark:CPF: Learning a Contact Potential Field to Model the Hand-object Interaction
Exploiting Scene Graphs for Human-Object Interaction Detection
Spatially Conditioned Graphs for Detecting Human–Object Interactions
Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
论文/paper:https://arxiv.org/abs/2107.13780 | 主页/Homepage
代码/code:https://github.com/DreamtaleCore/PnP-GA
Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation
Paper: https://arxiv.org/abs/2110.06853
Code: None
Improving Contrastive Learning by Visualizing Feature Transformation
Paper: https://arxiv.org/abs/2108.02982
Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
Social NCE: Contrastive Learning of Socially-aware Motion Representations
Paper: https://arxiv.org/abs/2012.11717
Code: https://github.com/vita-epfl/social-nce-crowdnav
Parametric Contrastive Learning
Paper: https://arxiv.org/abs/2107.12028
Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization
Paper: https://arxiv.org/abs/2109.02220
Code: None
Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks
Paper: https://arxiv.org/abs/2110.09195
Code: https://github.com/yikaiw/SNN
Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
Distance-aware Quantization
Dynamic Network Quantization for Efficient Video Inference
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
Deep Structured Instance Graph for Distilling Object Detectors
Paper: https://arxiv.org/abs/2109.12862
Code: https://github.com/dvlab-research/Dsig
Distilling Holistic Knowledge with Graph Neural Networks
Lipschitz Continuity Guided Knowledge Distillation
Paper: https://arxiv.org/abs/2108.12905
Code: None
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
Self Supervision to Distillation for Long-Tailed Visual Recognition
A Robust Loss for Point Cloud Registration
Paper: https://arxiv.org/abs/2108.11682
Code: None
A Technical Survey and Evaluation of Traditional Point Cloud Clustering Methods for LiDAR Panoptic Segmentation
Paper: https://arxiv.org/abs/2108.09522v1
Code: None
(Just) A Spoonful of Refinements Helps the Registration Error Go Down (Oral)
Paper: https://arxiv.org/abs/2108.03257
Code: None
ABD-Net: Attention Based Decomposition Network for 3D Point Cloud Decomposition
AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds
Deep Models with Fusion Strategies for MVP Point Cloud Registration
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation
Learning Inner-Group Relations on Point Clouds
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
ME-PCN: Point Completion Conditioned on Mask Emptiness
MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration
Out-of-Core Surface Reconstruction via Global TGV Minimization
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds
PICCOLO: Point Cloud-Centric Omnidirectional Localization
Point Cloud Augmentation with Weighted Local Transformations
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers (Oral)
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility
Voxel-based Network for Shape Completion by Leveraging Edge Generation
Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis
Paper: https://arxiv.org/abs/2105.01288v1 | Homepage
Code: https://github.com/tiangexiang/CurveNet
3D Shapes Local Geometry Codes Learning with SDF
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces
DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension
Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
Pixel-Perfect Structure-from-Motion with Featuremetric Refinement (Oral)
VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
:heavy_check_mark:Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts
Paper: https://arxiv.org/abs/2104.00887
Code: https://github.com/clovaai/mxfont
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
Data Augmentation for Scene Text Recognition
Paper: https://arxiv.org/abs/2108.06949
Code: https://github.com/roatienza/straug
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Paper: None
Code: https://github.com/wangyuxin87/VisionLAN
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
FOVEA: Foveated Image Magnification for Autonomous Navigation
Learning to drive from a world on rails
MAAD: A Model and Dataset for "Attended Awareness" in Driving
MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
NEAT: Neural Attention Fields for End-to-End Autonomous Driving
Road-Challenge-Event-Detection-for-Situation-Awareness-in-Autonomous-Driving
Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
Paper: https://arxiv.org/abs/2109.01510
Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction
ICCV2021_Visdrone_detection
Paper: None
Code: https://github.com/Gumpest/ICCV2021_Visdrone_detection
DRÆM -- A discriminatively trained reconstruction embedding for surface anomaly detection
Paper: https://arxiv.org/abs/2108.07610
Code: None
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Paper: https://arxiv.org/pdf/2101.10030.pdf
Code: https://github.com/tianyu0207/RTFM
Cross-Camera Convolutional Color Constancy
Paper: https://arxiv.org/abs/2011.11164
Code: https://github.com/mahmoudnafifi/C5
Learnable Boundary Guided Adversarial Training
Paper: https://arxiv.org/abs/2011.11164
Code: https://github.com/FPNAS/LBGAT
Prior-Enhanced network with Meta-Prototypes (PEMP)
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Generalized Shuffled Linear Regression (Oral)
VLGrammar: Grounded Grammar Induction of Vision and Language
A New Journey from SDRTV to HDRTV
IICNet: A Generic Framework for Reversible Image Conversion
Structure-Preserving Deraining with Residue Channel Prior Guidance
Learning with Noisy Labels via Sparse Regularization
Neural Strokes: Stylized Line Drawing of 3D Shapes
COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction
CanvasVAE: Learning to Generate Vector Graphic Documents
Refining activation downsampling with SoftPool
Aligning Latent and Image Spaces to Connect the Unconnectable
Unifying Nonlocal Blocks for Neural Networks
SLAMP: Stochastic Latent Appearance and Motion Prediction
TransForensics: Image Forgery Localization with Dense Self-Attention
Learning Facial Representations from the Cycle-consistency of Face
NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
Impact of Aliasing on Generalization in Deep Convolutional Networks
Learning Canonical 3D Object Representation for Fine-Grained Recognition
UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks
SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
Learning to Cut by Watching Movies
Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
Towards Interpretable Deep Metric Learning with Structural Matching
m-RevNet: Deep Reversible Neural Networks with Momentum
DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
perf4sight: A toolflow to model CNN training performance on Edge GPUs
MT-ORL: Multi-Task Occlusion Relationship Learning
ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments
CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark
Pixel Difference Networks for Efficient Edge Detection
Online Continual Learning For Visual Food Classification
DICOM Imaging Router: An Open Deep Learning Framework for Classification of Body Parts from DICOM X-ray Scans
PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation
Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks
FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
Finding Representative Interpretations on Convolutional Neural Networks
Investigating transformers in the decomposition of polygonal shapes as point collections
Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images
Group-aware Contrastive Regression for Action Quality Assessment
End-to-End Dense Video Captioning with Parallel Decoding
PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch
Structured Outdoor Architecture Reconstruction by Exploration and Classification
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
Deep Hybrid Self-Prior for Full 3D Mesh Generation
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
Thermal Image Processing via Physics-Inspired Deep Networks
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
LOKI: Long Term and Key Intentions for Trajectory Prediction
Stochastic Scene-Aware Motion Prediction
Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
Social Fabric: Tubelet Compositions for Video Relation Detection
Causal Attention for Unbiased Visual Recognition
Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
Learning to Match Features with Seeded Graph Matching Network
A Unified Objective for Novel Class Discovery
How to cheat with metrics in single-image HDR reconstruction
Towards Understanding the Generative Capability of Adversarially Robust Classifiers (Oral)
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
Continual Learning for Image-Based Camera Localization
Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
Detecting and Segmenting Adversarial Graphics Patterns from Images
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
Learning Signed Distance Field for Multi-view Surface Reconstruction (Oral)
Deep Relational Metric Learning
Ranking Models in Unlabeled New Environments
Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
BiaSwap: Removing dataset bias with bias-tailored swapping augmentation
LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
Learning of Visual Relations: The Devil is in the Tails
Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
Support-Set Based Cross-Supervision for Video Grounding
Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition
Improving Generalization of Batch Whitening by Convolutional Unit Optimization
CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
NGC: A Unified Framework for Learning with Open-World Noisy Data
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
Learning Cross-modal Contrastive Features for Video Domain Adaptation
Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry
LUAI Challenge 2021 on Learning to Understand Aerial Images
Embedding Novel Views in a Single JPEG Image
Learning to Discover Reflection Symmetry via Polar Matching Convolution
Deep 3D Mask Volume for View Synthesis of Dynamic Scenes
Cross-category Video Highlight Detection via Set-based Learning
Sparse to Dense Motion Transfer for Face Image Animation
SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos
4D-Net for Learned Multi-Modal Alignment
The Power of Points for Modeling Humans in Clothing
The Functional Correspondence Problem
On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
Towards Learning Spatially Discriminative Feature Representations
Learning Fast Sample Re-weighting Without Reward Data
CTRL-C: Camera calibration TRansformer with Line-Classification
PR-Net: Preference Reasoning for Personalized Video Highlight Detection
Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
Learning to Generate Scene Graph from Natural Language Supervision
Parsing Table Structures in the Wild
Hierarchical Object-to-Zone Graph for Object Navigation
Square Root Marginalization for Sliding-Window Bundle Adjustment
YouRefIt: Embodied Reference Understanding with Language and Gesture
Deep Hough Voting for Robust Global Registration
Estimating Leaf Water Content using Remotely Sensed Hyperspectral Data
What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID
Shape-Biased Domain Generalization via Shock Graph Embeddings
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting (Oral)
Multiresolution Deep Implicit Functions for 3D Shape Representation
Image Shape Manipulation from a Single Augmented Training Sample (Oral)
ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors
Contact-Aware Retargeting of Skinned Motion
DisUnknown: Distilling Unknown Factors for Disentanglement Learning
FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
A Pathology Deep Learning System Capable of Triage of Melanoma Specimens Utilizing Dermatopathologist Consensus as Ground Truth
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation
FaceEraser: Removing Facial Parts for Augmented Reality
S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
JEM++: Improved Techniques for Training JEM
Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching
Long Short View Feature Decomposition via Contrastive Video Representation Learning
Visual Scene Graphs for Audio Source Separation
Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness
Sensor-Guided Optical Flow
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
Topologically Consistent Multi-View Face Inference Using Volumetric Sampling
Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice (Oral)
HighlightMe: Detecting Highlights from Human-Centric Videos
How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors
Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
Waypoint Models for Instruction-guided Navigation in Continuous Environments
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning (Oral)
De-rendering Stylized Texts
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction
Keypoint Communities
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction (Oral)
2nd Place Solution to Google Landmark Retrieval 2021
Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
BuildingNet: Learning to Label 3D Buildings (Oral)
SOMA: Solving Optical Marker-Based MoCap Automatically
Topic Scene Graph Generation by Attention Distillation from Caption
Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts
Understanding of Emotion Perception from Art
Nuisance-Label Supervision: Robustness Improvement by Free Labels
Simple Baseline for Single Human Motion Forecasting
PixelPyramids: Exact Inference Models from Lossless Image Pyramids