A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!
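Follow our WeChat official account: AI算法与图像处理
:star_and_crescent: Bonus: register to receive 200 yuan worth of compute resources: https://www.bkunyun.com/wap/console?source=aistudy (usage instructions)
:star2: ECCV 2022: continuously updated with the latest papers and their open-source code!
:car: ECCV 2022 accepted paper ID list: https://ailb-web.ing.unimore.it/releases/eccv2022/accepted_papers.txt
:car: Official website: https://eccv2022.ecva.net
Bilibili demos: https://space.bilibili.com/288489574
:hand: Note: everyone is welcome to open an issue to share ECCV 2022 papers and open-source projects, and to help improve this project together!
Collections of papers from previous top conferences:
An ECCV 2022 paper discussion group has been set up! If your paper has been accepted, you can add WeChat: nvshenj125 with the note: ECCV + name + school/company name. Requests must follow this format to be added to the group.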
COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments
Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation
Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization
Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification
Invariant Feature Learning for Generalized Long-Tailed Classification
RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos
PLMCL: Partial-Label Momentum Curriculum Learning for Multi-Label Image Classification
Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization
Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling
CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis
RepMix: Representation Mixing for Robust Attribution of Synthesized Images
VecGAN: Image-to-Image Translation with Interpretable Latent Directions
Context-Consistent Semantic Image Editing with Style-Preserved Modulation
DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
Supervised Attribute Information Removal and Reconstruction for Image Manipulation
Adaptive Feature Interpolation for Low-Shot Image Generation
WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation
FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs
Outpainting by Queries
Single Stage Virtual Try-on via Deformable Attention Flows
Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation
Monocular 3D Object Reconstruction with GAN Inversion
Generative Multiplane Images: Making a 2D GAN 3D-Aware
DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta
Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis
SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition
2D GANs Meet Unsupervised Single-view 3D Reconstruction
InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
Auto-regressive Image Synthesis with Integrated Quantization
Compositional Human-Scene Interaction Synthesis with Semantic Control
Generator Knows What Discriminator Should Learn in Unconditional GANs
StyleLight: HDR Panorama Generation for Lighting Estimation and Editing
Cross Attention Based Style Distribution for Controllable Person Image Synthesis
SKDCGN: Source-free Knowledge Distillation of Counterfactual Generative Networks using cGANs
Hierarchical Semantic Regularization of Latent Spaces in StyleGANs
Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment
Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing
Mind the Gap in Distilling StyleGANs
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
FurryGAN: High Quality Foreground-aware Image Synthesis
Improving GANs for Long-Tailed Data through Group Spectral Regularization
Unrestricted Black-box Adversarial Attack Using GAN with Limited Queries
3D-FM GAN: Towards 3D-Controllable Face Manipulation
High-Fidelity Image Inpainting with GAN Inversion
Bokeh-Loss GAN: Multi-Stage Adversarial Training for Realistic Edge-Aware Bokeh
Exploring Gradient-based Multi-directional Controls in GANs
Studying Bias in GANs through the Lens of Race
Improved Masked Image Generation with Token-Critic
Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation
Streamable Neural Fields
Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis
AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields
PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo
Neural-Sim: Learning to Generate Training Data with NeRF
Neural Density-Distance Fields
HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields
k-means Mask Transformer
Weakly Supervised Grounding for VQA in Vision-Language Transformers
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning
CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
Hunting Group Clues with Transformers for Social Group Activity Recognition
Entry-Flipped Transformer for Inference and Prediction of Participant Behavior
DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning
TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Action Quality Assessment with Temporal Parsing Transformer
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning
AiATrack: Attention in Attention for Transformer Visual Tracking
Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration
SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer
Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition
3D Siamese Transformer Network for Single Object Tracking on Point Clouds
Reference-based Image Super-Resolution with Deformable Attention Transformer
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding
Online Continual Learning with Contrastive Vision Transformer
Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
TransMatting: Enhancing Transparent Objects Matting with Transformers
Ghost-free High Dynamic Range Imaging with Context-aware Transformer
Audio-Visual Segmentation
Cross-modal Prototype Driven Network for Radiology Report Generation
Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
Video Graph Transformer for Video Question Answering
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
LocVTP: Video-Text Pre-training for Temporal Localization
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments
Cross-Modal 3D Shape Generation and Manipulation
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Frozen CLIP Models are Efficient Video Learners
Consistency-based Self-supervised Learning for Temporal Anomaly Localization
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Learning an Efficient Multimodal Depth Completion Model
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
MUST-VQA: MUltilingual Scene-text VQA
Vision-Language Adaptive Mutual Decoder for OOV-STR
Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation
Network Binarization via Contrastive Learning
Contrastive Deep Supervision
ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images
Action-based Contrastive Learning for Trajectory Prediction
FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs
Adversarial Contrastive Learning via Asymmetric InfoNCE
Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches
Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness
Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation
Patient-level Microsatellite Stability Assessment from Whole Slide Images By Combining Momentum Contrast Learning and Group Patch Embeddings
FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection
Should All Proposals be Treated Equally in Object Detection?
HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
Adversarially-Aware Robust Object Detector
ObjectBox: From Centers to Boxes for Anchor-Free Object Detection
Point-to-Box Network for Accurate Object Detection via Single Point Supervision
DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection
Rethinking IoU-based Optimization for Single-stage 3D Object Detection
Densely Constrained Depth Estimator for Monocular 3D Object Detection
Robust Object Detection With Inaccurate Bounding Boxes
Unsupervised Domain Adaptation for One-stage Object Detector using Offsets to Bounding Box
AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection
Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection
Active Learning Strategies for Weakly-supervised Object Detection
W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection
Salient Object Detection for Point Clouds
UC-OWOD: Unknown-Classified Open World Object Detection
Monocular 3D Object Detection with Depth from Motion
Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection
Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph
Object Discovery via Contrastive Learning for Weakly Supervised Object Detection
RFLA: Gaussian Receptive Field based Label Assignment for Tiny Object Detection
Object Detection in Aerial Images with Uncertainty-Aware Graph Network
Adversarial Vulnerability of Temporal Feature Networks for Object Detection
Identifying Out-of-Distribution Samples in Real-Time for Safety-Critical 2D Object Detection with Margin Entropy Loss
CenterFormer: Center-based Transformer for 3D Object Detection
Tracking Objects as Pixel-wise Distributions
Towards Grand Unification of Object Tracking
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
MOTCOM: The Multi-Object Tracking Dataset Complexity Metric
Robust Landmark-based Stent Tracking in X-ray Fluoroscopy
AiATrack: Attention in Attention for Transformer Visual Tracking
3D Siamese Transformer Network for Single Object Tracking on Point Clouds
Tracking Every Thing in the Wild
AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing
Robust Multi-Object Tracking by Marginal Inference
Towards Sequence-Level Training for Visual Tracking
Domain Adaptive Video Segmentation via Temporal Pseudo Supervision
OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers
PseudoClick: Interactive Image Segmentation with Click Imitation
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Tackling Background Distraction in Video Object Segmentation
Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation
Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
Learning Quality-aware Dynamic Memory for Video Object Segmentation
Box-supervised Instance Segmentation with Level Set Evolution
ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation
Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach
DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation
CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation
GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation
Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions
In Defense of Online Models for Video Instance Segmentation
Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation
Long-tailed Instance Segmentation using Gumbel Optimized Loss
Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation
Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
Self-Support Few-Shot Semantic Segmentation
Active Pointly-Supervised Instance Segmentation
Video Mask Transfiner for High-Quality Video Instance Segmentation
Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation
Per-Clip Video Object Segmentation
Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels
Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration
Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation
Occlusion-Aware Instance Segmentation via BiLayer Network Architectures
Personalizing Federated Medical Image Segmentation via Local Calibration
Learning Topological Interactions for Multi-Class Medical Image Segmentation
qDWI-Morph: Motion-compensated quantitative Diffusion-Weighted MRI analysis for fetal lung maturity assessment
Self-Supervised Pretraining for 2D Medical Image Segmentation
Knowledge Condensation Distillation
FedX: Unsupervised Federated Learning with Cross Knowledge Distillation
ReAct: Temporal Action Detection with Relational Queries
Semi-Supervised Temporal Action Detection with Proposal-Free Masking
Temporal Action Detection with Global Segmentation Mask Learning
Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions
HaloAE: An HaloNet based Local Transformer Auto-Encoder for Anomaly Detection and Localization
Compound Prototype Matching for Few-shot Action Recognition
Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition
Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition
PSUMNet: Unified Modality Part Streams are All You Need for Efficient Pose-based Action Recognition
Lane Change Classification and Prediction with Action Recognition Networks
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Registration based Few-Shot Anomaly Detection
Look at Adjacent Frames: Video Anomaly Detection without Offline Training
Towards Open Set Video Anomaly Detection
Controllable and Guided Face Synthesis for Unconstrained Face Recognition
Towards Robust Face Recognition with Comprehensive Search
Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation
Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks
Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning
TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance
Pose for Everything: Towards Category-Agnostic Pose Estimation
C3P: Cross-domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation
3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal
Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization
RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation
Neural Correspondence Field for Object Pose Estimation
Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation
CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation
PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation
Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian
Learning Visibility for Robust Dense Human Body Estimation
Generative Domain Adaptation for Face Anti-Spoofing
Multi-domain Learning for Updating Face Anti-spoofing Models
FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification
On Mitigating Hard Clusters for Face Clustering
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
Perspective Reconstruction of Human Faces by Joint Mesh and Landmark Regression
Latent Partition Implicit with Surface Codes for 3D Representation
LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction
SimpleRecon: 3D Reconstruction Without 3D Convolutions
3D Clothed Human Reconstruction in the Wild
UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation
The One Where They Reconstructed 3D Humans and Environments in TV Shows
BCom-Net: Coarse-to-Fine 3D Textured Body Shape Completion Network
Neural Capture of Animatable 3D Human from Monocular Video
Geometry-aware Single-image Full-body Human Relighting
Relighting4D: Neural Relightable Human from Videos
Detecting and Recovering Sequential DeepFake Manipulation
An Efficient Method for Face Quality Assessment on the Edge
Character decomposition to resolve class imbalance problem in Hangul OCR
Shift Variance in Scene Text Detection
1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words
Levenshtein OCR
Scene Text Recognition with Permuted Autoregressive Sequence Models
Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting
Contextual Text Block Detection towards Scene Text Understanding
GLASS: Global to Local Attention for Scene-Text Spotting
Multi-Granularity Prediction for Scene Text Recognition
Open-world Semantic Segmentation for LIDAR Point Clouds
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
CPO: Change Robust Panorama to Point Cloud Localization
diffConv: Analyzing Irregular Point Clouds with an Irregular View
CATRE: Iterative Point Clouds Alignment for Category-level Object Pose Refinement
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation
SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer
Dynamic 3D Scene Analysis by Point Cloud Accumulation
3D Siamese Transformer Network for Single Object Tracking on Point Clouds
Salient Object Detection for Point Clouds
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud
Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation
Learning to Generate Realistic LiDAR Point Clouds
Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation
What Matters for 3D Scene Flow Network
Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion
Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches
Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics
RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation
Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation
RCLane: Relay Chain Prediction for Lane Detection
Action-based Contrastive Learning for Trajectory Prediction
Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction
Aware of the History: Trajectory Forecasting with the Local Behavior Data
Human Trajectory Prediction via Neural Social Physics
D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights
Image Super-Resolution with Deep Dictionary
Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution
CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution
Towards Interpretable Video Super-Resolution via Alternating Optimization
Reference-based Image Super-Resolution with Deformable Attention Transformer
Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
HST: Hierarchical Swin Transformer for Compressed Image Super-resolution
DSR: Towards Drone Image Super-Resolution
Optimizing Image Compression via Joint Learning with Denoising
Spatio-Temporal Deformable Attention Network for Video Deblurring
Efficient Video Deblurring Guided by Motion Magnitude
Learning Degradation Representations for Image Deblurring
Towards Real-World Video Deblurring by Exploring Blur Formation Process
D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration
Flow-Guided Transformer for Video Inpainting
Unbiased Multi-Modality Guidance for Image Inpainting
Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression
Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
Feature Representation Learning for Unsupervised Cross-domain Image Retrieval
A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Lossy Image Compression with Conditional Diffusion Models
Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets
GraphVid: It Only Takes a Few Nodes to Understand a Video
Target-absent Human Attention
Lottery Ticket Hypothesis for Spiking Neural Networks
Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture
DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images
Learning Local Implicit Fourier Representation for Image Warping
SESS: Saliency Enhancing with Scaling and Sliding
TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts
DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling
Towards Realistic Semi-Supervised Learning
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning
Factorizing Knowledge in Neural Networks
SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
Video Dialog as Conversation about Objects Living in Space-Time
Demystifying Unsupervised Semantic Correspondence Estimation
A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision
DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization
Batch-efficient EigenDecomposition for Small and Medium Matrices
Few 'Zero Level Set'-Shot Learning of Shape Signed Distance Functions in Feature Space
Camera Pose Auto-Encoders for Improving Pose Regression
Synergistic Self-supervised and Quantization Learning
Frequency Domain Model Augmentation for Adversarial Attack
Organic Priors in Non-Rigid Structure from Motion
Unsupervised Visual Representation Learning by Synchronous Momentum Grouping
Learning Implicit Templates for Point-Based Clothed Human Modeling
BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks
Lipschitz Continuity Retained Binary Neural Network
3D Instances as 1D Kernels
ScaleNet: Searching for the Model to Scale
Rethinking Data Augmentation for Robust Visual Question Answering
Semantic Novelty Detection via Relational Reasoning
Label2Label: A Language Modeling Framework for Multi-Attribute Learning
Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes
Class-incremental Novel Class Discovery
MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects
SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement
Learning with Recoverable Forgetting
Zero-Shot Temporal Action Detection via Vision-Language Prompting
Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
Neural Color Operators for Sequential Image Retouching
Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching
JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes
You Should Look at All Objects
NeFSAC: Neurally Filtered Minimal Samples
CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS
Cross-Domain Cross-Set Few-Shot Learning via Learning Compact and Aligned Representations
Self-calibrating Photometric Stereo by Neural Inverse Rendering
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Towards Understanding The Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search
PoserNet: Refining Relative Camera Poses Exploiting Object Detections
Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos
Deep Semantic Statistics Matching (D2SM) Denoising Network
3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform
NDF: Neural Deformable Fields for Dynamic Human Modelling
Self-Supervision Can Be a Good Few-Shot Learner
ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild
MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data
Prior-Guided Adversarial Initialization for Fast Adversarial Training
Prior Knowledge Guided Unsupervised Domain Adaptation
Discover and Mitigate Unknown Biases with Debiasing Alternate Networks
Difficulty-Aware Simulator for Open Set Recognition
Tailoring Self-Supervision for Supervised Learning
Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain
Temporal and cross-modal attention for audio-visual zero-shot learning
Telepresence Video Quality Assessment
Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing
Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification
Discrete-Constrained Regression for Local Counting Models
Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction
Efficient Meta-Tuning for Content-aware Neural Video Delivery
Object-Compositional Neural Implicit Surfaces
Explaining Deepfake Detection by Analysing Image Matching
ERA: Expert Retrieval and Assembly for Early Action Prediction
Perspective Phase Angle Model for Polarimetric 3D Reconstruction
Explicit Image Caption Editing
Unsupervised Deep Multi-Shape Matching
Contributions of Shape, Texture, and Color in Visual Recognition
Novel Class Discovery without Forgetting
Approximate Differentiable Rendering with Algebraic Surfaces
FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling
Error Compensation Framework for Flow-Guided Video Inpainting
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Temporal Saliency Query Network for Efficient Video Recognition
UFO: Unified Feature Optimization
OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search
Towards Accurate Open-Set Recognition via Background-Class Regularization
Grounding Visual Representations with Texts for Domain Generalization
SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks
MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis
On Label Granularity and Object Localization
Spotting Temporally Precise, Fine-Grained Events in Video
Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Visual Knowledge Tracing
Tackling Long-Tailed Category Distribution Under Domain Shifts
Latent Discriminant deterministic Uncertainty
Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance
Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
Structural Causal 3D Reconstruction
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Continual Variational Autoencoder Learning via Online Cooperative Memorization
Panoptic Scene Graph Generation
Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay
POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion
Few-shot Object Counting and Detection
Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection
My View is the Best View: Procedure Learning from Egocentric Videos
Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation
MeshLoc: Mesh-Based Visual Localization
MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation
Deforming Radiance Fields with Cages
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
Black-box Few-shot Knowledge Distillation
Balancing Stability and Plasticity through Advanced Null Space in Continual Learning
Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing
Domain Adaptive Person Search
VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
Label-Guided Auxiliary Training Improves 3D Object Detector
Combining Internal and External Constraints for Unrolling Shutter in Videos
TIPS: Text-Induced Pose Synthesis
Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes
Learning Graph Neural Networks for Image Style Transfer
Contrastive Monotonic Pixel-Level Modulation
CompNVS: Novel View Synthesis with Scene Completion
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
3D Shape Sequence of Human Comparison and Classification using Current and Varifolds
NewsStories: Illustrating articles with visual summaries
Efficient One Pass Self-distillation with Zipf's Label Smoothing
AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction
Static and Dynamic Concepts for Self-supervised Video Representation Learning
Learning Hierarchy Aware Features for Reducing Mistake Severity
Translating a Visual LEGO Manual to a Machine-Executable Plan
Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
Trainability Preserving Neural Structured Pruning
Shift-tolerant Perceptual Similarity Metric
Abstracting Sketches through Simple Primitives
AutoTransition: Learning to Recommend Video Transition Effects
Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips
Identifying Hard Noise in Long-Tailed Sample Distribution
One-Trimap Video Matting
PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation
End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation
LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
Initialization and Alignment for Adversarial Texture Optimization
Depth Field Networks for Generalizable Multi-view Scene Representation
Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection
Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images
Break and Make: Interactive Structural Understanding Using LEGO Bricks
A Repulsive Force Unit for Garment Collision Handling in Neural Networks
Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding
AlphaVC: High-Performance and Efficient Learned Video Compression
WISE: Whitebox Image Stylization by Example-based Learning
Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels
Video Question Answering with Iterative Video-Text Co-Tokenization
S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning
Skeleton-free Pose Transfer for Stylized 3D Characters
Improving Fine-Grained Visual Recognition in Low Data Regimes via Self-Boosting Attention Mechanism
SdAE: Self-distillated Masked Autoencoder
Out-of-Distribution Detection with Semantic Mismatch under Masking
Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction
Revisiting the Critical Factors of Augmentation-Invariant Representation Learning
Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network
Few-Shot Class-Incremental Learning from an Open-Set Perspective
DAS: Densely-Anchored Sampling for Deep Metric Learning
Fast Two-step Blind Optical Aberration Correction
Negative Frames Matter in Egocentric Visual Query 2D Localization
Neighborhood Collective Estimation for Noisy Label Identification and Correction
PlaneFormers: From Sparse View Planes to 3D Reconstruction
SLiDE: Self-supervised LiDAR De-snowing through Reconstruction Difficulty
Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects
Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Learning Omnidirectional Flow in 360-degree Video via Siamese Representation
Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation
Contrastive Positive Mining for Unsupervised 3D Action Representation Learning
Speaker-adaptive Lip Reading with User-dependent Padding
Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast
Rethinking Robust Representation Learning Under Fine-grained Noisy Faces
RDA: Reciprocal Distribution Alignment for Robust SSL
RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild
PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees
MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition
PRIF: Primary Ray-based Implicit Function
Learning Semantic Correspondence with Sparse Annotations
CCRL: Contrastive Cell Representation Learning
Pose Forecasting in Industrial Human-Robot Collaboration
Combating Label Distribution Shift for Active Domain Adaptation
Matching Multiple Perspectives for Efficient Representation Learning
Uncertainty-guided Source-free Domain Adaptation
Context-Aware Streaming Perception in Dynamic Environments
Towards an Error-free Deep Occupancy Detector for Smart Camera Parking System
AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets
DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning
L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training
ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization
Unifying Visual Perception by Dispersible Points Learning
Visual Cross-View Metric Localization with Dense Uncertainty Estimates
GCISG: Guided Causal Invariant Learning for Improved Syn-to-real Generalization
SIM2E: Benchmarking the Group Equivariant Capability of Correspondence Matching Algorithms
Artifact-Based Domain Generalization of Skin Lesion Models
Fuse and Attend: Generalized Embedding Learning for Art and Sketches
Effectiveness of Function Matching in Driving Scene Recognition
Consistency Regularization for Domain Adaptation
IMPaSh: A Novel Domain-shift Resistant Representation for Colorectal Cancer Tissue Classification
Deep Structural Causal Shape Models
Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling
Anatomy-Aware Contrastive Representation Learning for Fetal Ultrasound
The Value of Out-of-Distribution Data
Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization
RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN
Cross-Camera View-Overlap Recognition
On the Design of Privacy-Aware Cameras: a Study on Deep Neural Networks
Discovering Transferable Forensic Features for CNN-generated Images Detection
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks
Learning Continuous Implicit Representation for Near-Periodic Patterns
NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems
Take One Gram of Neural Features, Get Enhanced Group Robustness
CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions
ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer
Probing Contextual Diversity for Dense Out-of-Distribution Detection
CAIR: Fast and Lightweight Multi-Scale Color Attention Network for Instagram Filter Removal
FUSION: Fully Unsupervised Test-Time Stain Adaptation via Fused Normalization Statistics
Style-Agnostic Reinforcement Learning
LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile Devices
Unpaired Image Translation via Vector Symbolic Architectures
CNSNet: A Cleanness-Navigated-Shadow Network for Shadow Removal
Semi-Supervised Domain Adaptation by Similarity based Pseudo-label Injection
Recurrent Bilinear Optimization for Binary Neural Networks
Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies
Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps
Exploring Anchor-based Detection for Ego4D Natural Language Query
Detecting Driver Drowsiness as an Anomaly Using LSTM Autoencoders
Switchable Online Knowledge Distillation
Self-supervised Human Mesh Recovery with Cross-Representation Alignment
Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection
PointScatter: Point Set Representation for Tubular Structure Extraction
Adversarial Coreset Selection for Efficient Robust Training
Out-of-Vocabulary Challenge Report
DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction
MIPI 2022 Challenge on RGB+ToF Depth Completion: Dataset and Report
MIPI 2022 Challenge on Quad-Bayer Re-mosaic: Dataset and Report
MIPI 2022 Challenge on Under-Display Camera Image Restoration: Methods and Results
Hydra Attention: Efficient Attention with Many Heads