Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
GroupViT is a framework for learning semantic segmentation purely from text captions without using any mask supervision. It learns to perform bottom-up heirarchical spatial grouping of semantically-related visual regions. This repository is the official implementation of GroupViT introduced in the paper:
GroupViT: Semantic Segmentation Emerges from Text Supervision, Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang, CVPR 2022.