This is an official implementation for "Swin Transformer: Hierarchical V...
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Bottom-up attention model for image captioning and VQA, based on Faster ...
This repository contains the source code of our work on designing effici...
VarifocalNet: An IoU-aware Dense Object Detector
Video Platform for Action Recognition and Object Detection in PyTorch
Generate captions for images using a CNN-RNN model that is trained on th...
A TensorFlow implementation of MobileNetV3 CenterNet, which can be easily deploy...
Adds the SPICE metric to the coco-caption evaluation server code
A tool for converting computer vision label formats.
Visually informed embedding of word (VIEW) is a tool for transferring mu...
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framewo...