ViTAE Transformer Remote Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP



Remote Sensing

This repo contains a comprehensive list of our research works related to Remote Sensing. For any related questions, please contact Di Wang at [email protected] or [email protected].

Overview

1. An Empirical Study of Remote Sensing Pretraining [TGRS-2022]

2. Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [TGRS-2022]

3. SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [NeurIPS-2023]

4. MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [arXiv-2024]

Projects

📘 An Empirical Study of Remote Sensing Pretraining [TGRS-2022]

Di Wang∗, Jing Zhang∗, Bo Du, Gui-Song Xia and Dacheng Tao

Paper | Github Code | BibTex

Using MillionAID, the largest remote sensing scene recognition dataset to date, we train different networks from scratch to obtain remote sensing pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. We then investigate the impact of ImageNet pretraining (IMP) and remote sensing pretraining (RSP) on a series of downstream tasks, including scene recognition, semantic segmentation, object detection, and change detection, using the CNN and vision transformer backbones.


📘 Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [TGRS-2022]

Di Wang∗, Qiming Zhang∗, Yufei Xu∗, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

We adopt plain vision transformers with about 100M parameters in a first attempt to build large vision models customized for RS tasks. We also propose a new rotated varied-size window attention (RVSA) that replaces the original full attention to handle the large image sizes and variously oriented objects in RS images. RVSA significantly reduces the computational cost and memory footprint while learning better object representations by extracting rich context from the generated diverse windows.
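
To give a rough sense of why replacing full attention with window attention lowers cost, the sketch below counts query-key score computations for a token grid. It is only an illustration under simplifying assumptions: it models plain, fixed, non-overlapping square windows, not the rotation or varied window sizes of RVSA, and the grid size and window size are hypothetical values chosen for the example.

```python
def full_attention_pairs(grid_h: int, grid_w: int) -> int:
    """Query-key pairs when every token attends to every token."""
    num_tokens = grid_h * grid_w
    return num_tokens * num_tokens


def window_attention_pairs(grid_h: int, grid_w: int, win: int) -> int:
    """Query-key pairs when tokens attend only within win x win windows.

    Assumes fixed, non-overlapping windows (a simplification; RVSA's
    rotated, varied-size windows are not modeled here).
    """
    if grid_h % win or grid_w % win:
        raise ValueError("grid size must be divisible by the window size")
    num_windows = (grid_h // win) * (grid_w // win)
    tokens_per_window = win * win
    return num_windows * tokens_per_window * tokens_per_window


# Hypothetical setting: a 64 x 64 token grid with 8 x 8 windows.
full = full_attention_pairs(64, 64)
windowed = window_attention_pairs(64, 64, 8)
ratio = full // windowed
print(full, windowed, ratio)
```

With these example values, the windowed variant computes 64 times fewer attention scores than full attention, and the gap widens as the token grid grows.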


📘 SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [NeurIPS-2023]

Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS surpasses existing high-resolution RS segmentation datasets in size by several orders of magnitude, and provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. We hope it can facilitate research in RS segmentation, particularly in large model pre-training.
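
The idea of turning detection annotations into segmentation labels can be sketched roughly as follows. This is not the SAMRS pipeline itself: `segment_with_box` is a hypothetical stand-in that simply fills an ellipse inside each box, where a real pipeline would call a promptable segmenter such as SAM, and using boxes as prompts is an assumption made for this example. The sketch only shows how per-object masks, class ids, and instance ids could be combined into semantic and instance label maps.

```python
import numpy as np


def segment_with_box(image, box):
    """Hypothetical stand-in for a promptable segmenter such as SAM.

    Returns a boolean mask covering an ellipse inscribed in the box
    (x0, y0, x1, y1). A real pipeline would run the model here.
    """
    height, width = image.shape[:2]
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    rx = max((x1 - x0) / 2.0, 1e-6)
    ry = max((y1 - y0) / 2.0, 1e-6)
    return ((xs - cx) / rx) ** 2 + ((ys - cy) / ry) ** 2 <= 1.0


def build_label_maps(image, detections, segmenter=segment_with_box):
    """Combine per-object masks into semantic and instance label maps.

    detections: list of (class_id, box) pairs, with class_id >= 1.
    Pixels covered by no mask keep the value 0 (background).
    Later detections overwrite earlier ones where masks overlap.
    """
    height, width = image.shape[:2]
    semantic = np.zeros((height, width), dtype=np.int32)
    instance = np.zeros((height, width), dtype=np.int32)
    for instance_id, (class_id, box) in enumerate(detections, start=1):
        mask = segmenter(image, box)
        semantic[mask] = class_id
        instance[mask] = instance_id
    return semantic, instance


# Toy example: a blank image with two annotated boxes of different classes.
image = np.zeros((64, 64, 3), dtype=np.uint8)
detections = [(1, (4, 4, 20, 20)), (2, (30, 30, 60, 50))]
semantic, instance = build_label_maps(image, detections)
```

The two output maps carry the category and instance information separately, so the same annotations can serve semantic segmentation, instance segmentation, or both.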


📘 MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [arXiv-2024]

Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. We hope this research encourages further exploration of RS foundation models and anticipate the widespread application of these models across diverse fields of RS image interpretation.
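
The shared-encoder, task-specific-decoder layout can be illustrated with a small NumPy sketch. Everything here is a placeholder chosen for the example rather than the MTP implementation: the encoder and decoders are single linear layers, the output dimensions are hypothetical, mean squared error stands in for the real task losses, and the three losses are summed with equal weight as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)


class SharedEncoder:
    """Toy encoder: one linear layer followed by ReLU."""

    def __init__(self, in_dim, feat_dim):
        self.weight = rng.normal(scale=0.1, size=(in_dim, feat_dim))

    def __call__(self, x):
        return np.maximum(x @ self.weight, 0.0)


class TaskDecoder:
    """Toy task-specific head: one linear layer."""

    def __init__(self, feat_dim, out_dim):
        self.weight = rng.normal(scale=0.1, size=(feat_dim, out_dim))

    def __call__(self, feats):
        return feats @ self.weight


def mse(pred, target):
    """Placeholder loss; real tasks would use their own losses."""
    return float(np.mean((pred - target) ** 2))


# Hypothetical output sizes for the three pretraining tasks.
tasks = {"semantic_seg": 5, "instance_seg": 8, "rotated_det": 6}
encoder = SharedEncoder(in_dim=16, feat_dim=32)
decoders = {name: TaskDecoder(32, out_dim) for name, out_dim in tasks.items()}


def multitask_loss(x, targets):
    """Run the shared encoder once, then every task decoder on its output."""
    feats = encoder(x)
    losses = {name: mse(decoders[name](feats), targets[name]) for name in decoders}
    # Equal weighting is an assumption made for this sketch.
    return sum(losses.values()), losses


x = rng.normal(size=(4, 16))
targets = {name: np.zeros((4, out_dim)) for name, out_dim in tasks.items()}
total, per_task = multitask_loss(x, targets)
```

Because the encoder features are computed once and reused by every decoder, gradients from all three tasks would flow into the same encoder weights during training, which is the point of the shared design.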

README Source: ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing