X-modaler is a versatile and high-performance codebase for cross-modal a...
PyTorch implementation of video captioning
Video to Text: Natural language description generator for some given vid...
Auto-transcription tool based on Whisper
This repository contains the code for a video captioning system inspired...
Machine Learning and having it Deep and Structured (MLDS), spring 2018
A curated list of Multimodal Captioning related research(including image...
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning mod...
A PyTorch implementation of state-of-the-art video captioning models fro...
Attention Bidirectional Video Recurrent Net
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Represe...
Video captioning baseline models on Video2Commonsense Dataset.
🎬 Video Captioning: ICCV '15 paper implementation
Convert SRT-formatted subtitles to WebVTT on the fly over HTML5/browser e...
What and How Well You Performed? A Multitask Learning Approach to Action...