A modular framework for vision & language multimodal research from Faceb...
Official code for the paper "Spatially Aware Multimodal Transformers for Tex...