Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome