Combining OWL-ViT with Segment Anything - Open-vocabulary Detection and Segmentation (Text-conditioned and Image-conditioned)
A demo that combines Google's OWL-ViT with Meta's Segment Anything.
prompt: a bird with a yellow wing
The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
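Before installing the rest, it can help to confirm that the environment meets the minimums listed above. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are illustrative and not part of this repository, and the version parsing is deliberately simple (it ignores local suffixes such as `+cu117`).

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the requirements stated above.
# "torch" is the pip distribution name for PyTorch.
MINIMUMS = {"torch": "1.7", "torchvision": "0.8"}


def parse_version(text):
    """Turn a string like '1.13.1+cu117' into (1, 13, 1)."""
    public = text.split("+")[0]  # drop local build tags such as '+cu117'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))  # pad so '0.8' compares equal to '0.8.0'
    b = b + (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Report, per package, the installed version and whether it is new enough."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "needs >= " + minimum
        report[package] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 8)
    print("python", sys.version.split()[0], "ok" if python_ok else "needs >= 3.8")
    for name, line in check_environment().items():
        print(name, line)
```

This only checks version numbers; it does not verify that the installed PyTorch build has CUDA support.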
Install Segment Anything:
python -m pip install -e segment_anything
Install OWL-ViT (OWL-ViT is included in the Hugging Face transformers library):
pip install transformers
More details can be found in the Segment Anything installation instructions.
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
bash run_demo.sh
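The contents of run_demo.sh are not reproduced here, so the following is only a rough sketch of how the two models can be chained: OWL-ViT proposes boxes for a text prompt, and each box is passed to Segment Anything as a box prompt. The function names `clip_box` and `detect_and_segment`, the OWL-ViT checkpoint `google/owlvit-base-patch32`, and the 0.1 score threshold are assumptions made for illustration. Depending on the installed transformers version, the post-processing method may be named differently.

```python
def clip_box(box, width, height):
    """Clamp an (x0, y0, x1, y1) box to the image bounds.

    Predicted boxes can extend slightly past the image edge, so they are
    clamped before being used as prompts.
    """
    x0, y0, x1, y1 = box
    return (
        min(max(float(x0), 0.0), float(width)),
        min(max(float(y0), 0.0), float(height)),
        min(max(float(x1), 0.0), float(width)),
        min(max(float(y1), 0.0), float(height)),
    )


def detect_and_segment(image_path, prompt,
                       checkpoint="sam_vit_h_4b8939.pth", threshold=0.1):
    """Return a list of {'box', 'score', 'mask'} dicts for one text prompt."""
    # Heavy dependencies are imported lazily so clip_box stays usable without them.
    import numpy as np
    import torch
    from PIL import Image
    from segment_anything import SamPredictor, sam_model_registry
    from transformers import OwlViTForObjectDetection, OwlViTProcessor

    image = Image.open(image_path).convert("RGB")
    width, height = image.size

    # Step 1: OWL-ViT detects boxes matching the text prompt.
    # The checkpoint name below is an assumed choice, not taken from this repo.
    name = "google/owlvit-base-patch32"
    processor = OwlViTProcessor.from_pretrained(name)
    detector = OwlViTForObjectDetection.from_pretrained(name)
    inputs = processor(text=[[prompt]], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = detector(**inputs)
    detections = processor.post_process_object_detection(
        outputs=outputs,
        threshold=threshold,
        target_sizes=torch.tensor([[height, width]]),  # (height, width) order
    )[0]

    # Step 2: Segment Anything turns each detected box into a mask.
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(np.array(image))  # expects an RGB uint8 HxWx3 array

    results = []
    boxes = detections["boxes"].tolist()
    scores = detections["scores"].tolist()
    for box, score in zip(boxes, scores):
        clipped = clip_box(box, width, height)
        masks, _, _ = predictor.predict(box=np.array(clipped),
                                        multimask_output=False)
        results.append({"box": clipped, "score": score, "mask": masks[0]})
    return results
```

With the checkpoint downloaded as above, a call such as `detect_and_segment("bird.jpg", "a bird with a yellow wing")` would return one boolean mask per detected box; `bird.jpg` is a placeholder file name.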
Please give a round of applause to IDEA-Research and to OWL-ViT on Hugging Face.
If you find this project helpful for your research, please consider citing the following BibTeX entries.
@article{kirillov2023segany,
title={Segment Anything},
author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
journal={arXiv:2304.02643},
year={2023}
}
@misc{minderer2022simple,
title={Simple Open-Vocabulary Object Detection with Vision Transformers},
author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
year={2022},
eprint={2205.06230},
archivePrefix={arXiv},
primaryClass={cs.CV}
}