"UrbanGPT: Spatio-Temporal Large Language Models"
A PyTorch implementation of the paper: [UrbanGPT: Spatio-Temporal Large Language Models]
Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang* (*Correspondence)
Data Intelligence Lab@University of Hong Kong, South China University of Technology, Baidu Inc
This repository hosts the code, data, and model weights of UrbanGPT.
🎯🎯📢📢 We have uploaded the models and data used in UrbanGPT to 🤗 Hugging Face. Please refer to the table below for details:
🤗 Hugging Face Address | 🎯 Description |
---|---|
https://huggingface.co/bjdwh/UrbanGPT | The UrbanGPT checkpoint, based on Vicuna-7B-v1.5-16k and tuned on the instruction data train-data |
https://huggingface.co/datasets/bjdwh/ST_data_urbangpt | A portion of the dataset, released for evaluation. |
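As a rough illustration, both Hugging Face repositories could be fetched with `snapshot_download` from the `huggingface_hub` package. This is a minimal sketch: the repository ids come from the table, while the local folder names and the helper `fetch_assets` are assumptions made for this example and are not part of the UrbanGPT code.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Repository ids are taken from the table; the local sub-folders are assumptions.
ASSETS = [
    {"repo_id": "bjdwh/UrbanGPT", "repo_type": "model",
     "subdir": "checkpoints/UrbanGPT"},
    {"repo_id": "bjdwh/ST_data_urbangpt", "repo_type": "dataset",
     "subdir": "ST_data_urbangpt"},
]


def fetch_assets(root: str,
                 download: Optional[Callable[..., str]] = None) -> List[Path]:
    """Download every asset below `root` and return the local directories.

    `download` defaults to huggingface_hub.snapshot_download; passing another
    callable makes it possible to dry-run the function without network access.
    """
    if download is None:
        # Imported lazily so the dependency is only needed for real downloads.
        from huggingface_hub import snapshot_download
        download = snapshot_download

    local_dirs = []
    for asset in ASSETS:
        target = Path(root) / asset["subdir"]
        target.mkdir(parents=True, exist_ok=True)
        download(repo_id=asset["repo_id"],
                 repo_type=asset["repo_type"],
                 local_dir=str(target))
        local_dirs.append(target)
    return local_dirs
```

Calling `fetch_assets(".")` from the repository root would place the files under `./checkpoints/UrbanGPT` and `./ST_data_urbangpt`; adjust the sub-folders if your layout differs.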
In this work, we present a spatio-temporal large language model that exhibits strong generalization capabilities across a wide range of downstream urban tasks. To achieve this, we introduce UrbanGPT, which integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm. This integration enables large language models (LLMs) to comprehend the complex inter-dependencies across time and space, facilitating more comprehensive and accurate predictions under data scarcity. Extensive experimental findings highlight the potential of building LLMs for spatio-temporal learning, particularly in zero-shot scenarios.
https://github.com/HKUDS/UrbanGPT/assets/90381931/9cd094b4-8fa3-486f-890d-631a08b19b4a
.
|   README.md
|   urbangpt_eval.sh
|   urbangpt_train.sh
|
+---checkpoints
|   \---st_encoder
|           pretrain_stencoder.pth
|
+---playground
|   |   inspect_conv.py
|   |
|   +---test_embedding
|   |       README.md
|   |       test_classification.py
|   |       test_semantic_search.py
|   |       test_sentence_similarity.py
|   |
|   \---test_openai_api
|           anthropic_api.py
|           openai_api.py
|
+---tests
|       test_openai_curl.sh
|       test_openai_langchain.py
|       test_openai_sdk.py
|
\---urbangpt
    |   constants.py
    |   conversation.py
    |   utils.py
    |   __init__.py
    |
    +---eval
    |   |   run_urbangpt.py # evaluation
    |   |   run_vicuna.py
    |   |
    |   \---script
    |           run_model_qa.yaml
    |
    +---model
    |   |   apply_delta.py
    |   |   apply_lora.py
    |   |   builder.py
    |   |   compression.py
    |   |   convert_fp16.py
    |   |   make_delta.py
    |   |   model_adapter.py
    |   |   model_registry.py
    |   |   monkey_patch_non_inplace.py
    |   |   STLlama.py # model
    |   |   utils.py
    |   |   __init__.py
    |   |
    |   \---st_layers
    |           args.py
    |           ST_Encoder.conf
    |           ST_Encoder.py # ST-Encoder
    |           __init__.py
    |
    +---protocol
    |       openai_api_protocol.py
    |
    +---serve
    |   |   api_provider.py
    |   |   bard_worker.py
    |   |   cacheflow_worker.py
    |   |   cli.py
    |   |   controller.py
    |   |   controller_graph.py
    |   |   gradio_block_arena_anony.py
    |   |   gradio_block_arena_named.py
    |   |   gradio_css.py
    |   |   gradio_patch.py
    |   |   gradio_web_server.py
    |   |   gradio_web_server_graph.py
    |   |   gradio_web_server_multi.py
    |   |   huggingface_api.py
    |   |   inference.py
    |   |   model_worker.py
    |   |   model_worker_graph.py
    |   |   openai_api_server.py
    |   |   register_worker.py
    |   |   test_message.py
    |   |   test_throughput.py
    |   |   __init__.py
    |   |
    |   +---examples
    |   |       extreme_ironing.jpg
    |   |       waterview.jpg
    |   |
    |   +---gateway
    |   |       nginx.conf
    |   |       README.md
    |   |
    |   \---monitor
    |           basic_stats.py
    |           clean_battle_data.py
    |           elo_analysis.py
    |           hf_space_leaderboard_app.py
    |           monitor.py
    |
    \---train
            llama2_flash_attn_monkey_patch.py
            llama_flash_attn_monkey_patch.py
            stchat_trainer.py
            train_lora.py
            train_mem.py
            train_st.py # train
Please first clone the repo and install the required environment, which can be done by running the following commands:
conda create -n urbangpt python=3.9.13
conda activate urbangpt
# Torch with CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
# To support vicuna base model
pip3 install "fschat[model_worker,webui]"
# To install pyg and pyg-relevant packages
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html
# Clone UrbanGPT (or download it)
git clone https://github.com/HKUDS/UrbanGPT.git
cd UrbanGPT
# Install required libraries
# (The recommendation is to install separately using the following method)
pip install deepspeed
pip install ray
pip install einops
pip install wandb
# (There is a version compatibility issue between "flash-attn" and "transformers". Please refer to the flash-attn [GitHub URL](https://github.com/Dao-AILab/flash-attention) for more information.)
pip install flash-attn==2.3.5 # or download from (https://github.com/Dao-AILab/flash-attention/releases, e.g. flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp39-cp39-linux_x86_64.whl)
pip install transformers==4.34.0
# (Alternatively, install everything from the requirements file.)
pip install -r requirements.txt
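Because the pinned versions matter here (notably the flash-attn and transformers pair), a quick sanity check after installation can save debugging time. The sketch below uses only the standard library; the pin list is copied from the install commands, and the helper names `installed_version` and `check_pins` are hypothetical, introduced for this example only.

```python
from importlib import metadata

# Pins copied from the install commands in this README.
PINNED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "flash-attn": "2.3.5",
    "transformers": "4.34.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu117" when comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Running `print(check_pins())` inside the `urbangpt` environment prints an empty dict when every pinned package matches; any entry shows the expected and the installed version side by side.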
UrbanGPT is trained on top of the following excellent existing models. Please follow the instructions below to prepare the checkpoints.
Vicuna: Prepare our base model Vicuna, an instruction-tuned chatbot that serves as the base model in our implementation. Please download its weights here. We generally use the v1.5 and v1.5-16k models with 7B parameters. You should also update the 'config.json' of Vicuna; for example, the 'config.json' for v1.5-16k can be found in config.json
Spatio-temporal Encoder: We employ a simple TCN-based spatio-temporal encoder to encode spatio-temporal dependencies. The st_encoder weights are pre-trained on a typical multi-step spatio-temporal prediction task.
Spatio-temporal Train Data: We use pre-training data consisting of New York City taxi, bike, and crime data, including spatio-temporal statistics, recorded timestamps, and information about regional points of interest (POIs). These data are organized in train_data. Please download it and place it at ./UrbanGPT/ST_data_urbangpt/train_data
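Before launching training, it can help to confirm that the checkpoints and data are where the scripts expect them. The following is a minimal sketch, assuming the relative paths that appear in the directory tree and the training script of this README; the function name `missing_files` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the directory tree and the training script
# in this README; adjust them if your layout differs.
EXPECTED = [
    "checkpoints/vicuna-7b-v1.5-16k/config.json",
    "checkpoints/st_encoder/pretrain_stencoder.pth",
    "ST_data_urbangpt/train_data/multi_NYC.json",
    "ST_data_urbangpt/train_data/multi_NYC_pkl.pkl",
]


def missing_files(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

For example, `missing_files(".")` run from the repository root returns an empty list once everything is in place, and otherwise lists what still needs to be downloaded.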
# Fill in the following paths to run UrbanGPT training
model_path=./checkpoints/vicuna-7b-v1.5-16k
instruct_ds=./ST_data_urbangpt/train_data/multi_NYC.json
st_data_path=./ST_data_urbangpt/train_data/multi_NYC_pkl.pkl
pretra_ste=ST_Encoder
output_model=./checkpoints/UrbanGPT
wandb offline
python -m torch.distributed.run --nnodes=1 --nproc_per_node=8 --master_port=20001 \
urbangpt/train/train_mem.py \
--model_name_or_path ${model_path} \
--version v1 \
--data_path ${instruct_ds} \
--st_content ./TAXI.json \
--st_data_path ${st_data_path} \
--st_tower ${pretra_ste} \
--tune_st_mlp_adapter True \
--st_select_layer -2 \
--use_st_start_end \
--bf16 True \
--output_dir ${output_model} \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
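To make the hyperparameters above more concrete, the sketch below computes the effective batch size and a warmup-plus-cosine learning-rate curve from the flag values (8 processes, batch size 4 per device, 1 accumulation step, peak learning rate 2e-3, warmup ratio 0.03). It is an approximation of how a linear-warmup cosine schedule is commonly defined, not a copy of the trainer's internals, and both function names are hypothetical.

```python
import math


def effective_batch_size(num_gpus, per_device_batch, grad_accum):
    """Samples contributing to one optimizer step across all devices."""
    return num_gpus * per_device_batch * grad_accum


def warmup_cosine_lr(step, total_steps, peak_lr=2e-3, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then cosine decay to zero (a common definition)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With the values from the command, `effective_batch_size(8, 4, 1)` gives 32 samples per optimizer step. The total number of steps depends on the size of the instruction dataset, so it is left as a parameter here.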
You can start the evaluation by filling in the blanks in urbangpt_eval.sh. An example is shown below:
# Fill in the following paths to run the evaluation
output_model=./checkpoints/tw2t_multi_reg-cla-gird
datapath=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi.json
st_data_path=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi_pkl.pkl
res_path=./result_test/cross-region/NYC_taxi
start_id=0
end_id=51920
num_gpus=8
python ./urbangpt/eval/run_urbangpt.py \
--model-name ${output_model} \
--prompting_file ${datapath} \
--st_data_path ${st_data_path} \
--output_res_path ${res_path} \
--start_id ${start_id} \
--end_id ${end_id} \
--num_gpus ${num_gpus}
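The evaluation command takes a sample range (`start_id`, `end_id`) and a GPU count. One plausible way to divide such a range evenly among workers is sketched below. This is an illustration only: the function `split_range` is hypothetical, treating `end_id` as exclusive is an assumption, and the actual partitioning in run_urbangpt.py may differ.

```python
def split_range(start_id, end_id, num_workers):
    """Split [start_id, end_id) into num_workers contiguous, near-equal chunks."""
    total = end_id - start_id
    base, extra = divmod(total, num_workers)
    chunks = []
    lo = start_id
    for worker in range(num_workers):
        # The first `extra` workers take one additional sample each.
        size = base + (1 if worker < extra else 0)
        chunks.append((lo, lo + size))
        lo += size
    return chunks
```

With the example values (`start_id=0`, `end_id=51920`, `num_gpus=8`), each of the 8 chunks covers 6490 samples, so a smaller `end_id` is an easy way to run a quick partial evaluation.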
If you find UrbanGPT useful in your research or applications, please kindly cite:
@misc{li2024urbangpt,
title={UrbanGPT: Spatio-Temporal Large Language Models},
author={Zhonghang Li and Lianghao Xia and Jiabin Tang and Yong Xu and Lei Shi and Long Xia and Dawei Yin and Chao Huang},
year={2024},
eprint={2403.00813},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
You may refer to the related work that serves as the foundation for our framework and code repository, Vicuna. We also partially draw inspiration from GraphGPT. The design of our website and README.md was inspired by NExT-GPT, and the design of our system deployment was inspired by gradio and Baize. We thank the authors for their wonderful work.