A modular framework for vision & language multimodal research from Faceb...
Fully-Convolutional Point Networks for Large-Scale Point Clouds
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning an...
Python code for handling the Clotho dataset.
A base TensorFlow project for medical report generation
A tennis dataset and models for event detection & commentary generation
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://...
What and How Well You Performed? A Multitask Learning Approach to Action...
A PyTorch implementation of the Attention on Attention module (both self and...
Audio captioning baseline system for DCASE 2020 challenge.
My notes on some Deep Learning papers