GenSim: Generating Robotic Simulation Tasks via Large Language Models
Project Page | Arxiv | Gradio Demo | Huggingface Dataset | Finetuned Code-LLama Model | GPTs
This repo explores using an LLM code-generation pipeline to write simulation environments and expert goals, augmenting simulation tasks with diverse new ones. We strongly recommend also checking out the Gradio Demo and GPTs.
pip install -r requirements.txt
python setup.py develop
export GENSIM_ROOT=$(pwd)
export OPENAI_KEY=YOUR_KEY
We use OpenAI's GPT-4 as the language model, so you need an OpenAI API key to run task generation with GenSim. You can get one from here.
After the installation process, you can run:
# basic bottom-up prompt
python gensim/run_simulation.py disp=True prompt_folder=vanilla_task_generation_prompt_simple
# bottom-up template generation
python gensim/run_simulation.py disp=True prompt_folder=bottomup_task_generation_prompt save_memory=True load_memory=True task_description_candidate_num=10 use_template=True
# top-down task generation
python gensim/run_simulation.py disp=True prompt_folder=topdown_task_generation_prompt save_memory=True load_memory=True task_description_candidate_num=10 use_template=True target_task_name="build-house"
# task-conditioned chain-of-thought generation
python gensim/run_simulation.py disp=True prompt_folder=topdown_chain_of_thought_prompt save_memory=True load_memory=True task_description_candidate_num=10 use_template=True target_task_name="build-car"
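Each generation command above is gensim/run_simulation.py plus a list of key=value overrides, and all of them need the environment variables exported during installation. A minimal sketch that checks those variables and assembles one of the commands from Python; build_command and missing_env are hypothetical helpers, and they only format the same key=value arguments shown above without validating them against the real config.

```python
import os
import shlex

# Environment variables exported during installation.
REQUIRED_ENV = ("GENSIM_ROOT", "OPENAI_KEY")


def missing_env(env=None):
    """Return the required variables that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def build_command(prompt_folder, **overrides):
    """Hypothetical helper: build a run_simulation.py call from key=value overrides.

    It only formats arguments like those in the commands above and does
    not check that a key is a real config option.
    """
    args = ["python", "gensim/run_simulation.py", f"prompt_folder={prompt_folder}"]
    args += [f"{key}={value}" for key, value in overrides.items()]
    return args


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    cmd = build_command(
        "topdown_task_generation_prompt",
        disp=True,
        save_memory=True,
        load_memory=True,
        task_description_candidate_num=10,
        use_template=True,
        target_task_name="build-house",
    )
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Passing the list form straight to subprocess.run avoids shell quoting entirely; shlex.join is only used here to print a copy-pasteable line.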
Remove a task: python misc/purge_task.py -f color-sequenced-block-insertion
Add a task from its code: python misc/add_task_from_code.py -f ball_on_box_on_container
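The two commands above spell task names differently: purge_task.py takes a hyphenated name, while add_task_from_code.py takes an underscore-separated one. Assuming the cliport-style convention of hyphenated task names, snake_case module files, and CamelCase task classes (an assumption, not checked against these scripts), the mappings can be sketched as:

```python
def task_to_file_stem(task_name):
    """'color-sequenced-block-insertion' -> 'color_sequenced_block_insertion'."""
    return task_name.replace("-", "_")


def file_stem_to_task(stem):
    """'ball_on_box_on_container' -> 'ball-on-box-on-container'."""
    return stem.replace("_", "-")


def task_to_class_name(task_name):
    """'build-car' -> 'BuildCar' (assumed CamelCase class naming)."""
    return "".join(part.capitalize() for part in task_name.split("-"))


if __name__ == "__main__":
    for name in ("color-sequenced-block-insertion", "build-car"):
        print(name, task_to_file_stem(name), task_to_class_name(name))
    print(file_stem_to_task("ball_on_box_on_container"))
```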
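Tasks in cliport/generated_tasks should have been imported automatically. To visualize a task, pass its name to demos.py. For instance:
python cliport/demos.py n=200 task=build-car mode=test disp=True
Training and evaluating agents from scratch follows four phases:
1. Generate train, val, and test datasets with demos.py.
2. Train agents with train.py.
3. Run validation with eval.py to find the best checkpoint on val tasks and save *val-results.json.
4. Evaluate the best checkpoint from *val-results.json on test tasks with eval.py.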
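Finding the best checkpoint on val comes down to picking the entry with the highest validation score. A minimal sketch of that selection, assuming (not verified against eval.py) that *val-results.json maps each checkpoint name to a dict containing a mean_reward field:

```python
import json
from pathlib import Path


def best_checkpoint(results_path):
    """Return (checkpoint_name, mean_reward) for the highest validation reward.

    Assumed file layout: {"<checkpoint>": {"mean_reward": float, ...}, ...}.
    Check this against the file eval.py actually writes.
    """
    results = json.loads(Path(results_path).read_text())
    ckpt, stats = max(results.items(), key=lambda item: item[1]["mean_reward"])
    return ckpt, stats["mean_reward"]
```

On ties, max keeps the first maximal entry it sees, so the earliest-listed checkpoint wins.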
Prepare data using python gensim/prepare_finetune_gpt.py. The released dataset is here.
Finetune using the OpenAI API: openai api fine_tunes.create --training_file output/finetune_data_prepared.jsonl --model davinci --suffix 'GenSim'
Evaluate it using python gensim/evaluate_finetune_model.py +target_task=build-car +target_model=davinci:ft-mit-cal:gensim-2023-08-06-16-00-56
Compare with gpt-3.5-turbo-16k using the top-down prompt sequence: python gensim/run_simulation.py disp=True prompt_folder=topdown_task_generation_prompt_simple load_memory=True task_description_candidate_num=10 use_template=True target_task_name="build-house" gpt_model=gpt-3.5-turbo-16k trials=3
Compare with gpt-3.5-turbo-16k using a single prompt: python gensim/run_simulation.py disp=True prompt_folder=topdown_task_generation_prompt_simple_singleprompt load_memory=True task_description_candidate_num=10 target_task_name="build-house" gpt_model=gpt-3.5-turbo-16k
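The fine_tunes.create command takes a JSON Lines training file. For prompt/completion models such as davinci, each line of that file is one JSON object with a prompt field and a completion field. The sketch below writes and reads such a file; the example pairs and helper names are hypothetical, and the prompts actually built by gensim/prepare_finetune_gpt.py may differ.

```python
import json


def write_finetune_jsonl(pairs, path):
    """Write (prompt, completion) pairs as JSON Lines, one object per line.

    json.dumps escapes newlines inside generated code, so a multi-line
    completion still occupies a single line of the file.
    """
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")


def read_finetune_jsonl(path):
    """Load the records back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```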
Evaluate gpt-3.5-turbo finetuned models: python gensim/evaluate_finetune_model.py +target_task=build-car +target_model=ft:gpt-3.5-turbo-0613: trials=3 disp=True
Finetune Code Llama using the Hugging Face transformers library here.
Offline evaluation: python -m gensim.evaluate_finetune_model_offline model_output_dir=after_finetune_CodeLlama-13b-Instruct-hf_fewshot_False_epoch_10_0
See scripts/task_list/GPT_*.json for a list of benchmark settings. Pretrained multitask models can be found here.
Generate datasets: bash scripts/generate_datasets.sh data 'align-box-corner assembling-kits block-insertion'
Multi-task training and testing: sh scripts/train_test_multi_task.sh data "[align-rope,align-box-corner]"
Single-task training and testing: sh scripts/train_test_single_task.sh data align-box-corner
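The dataset and multi-task scripts take task lists in different shapes: generate_datasets.sh gets one quoted, space-separated string, while train_test_multi_task.sh gets a bracketed, comma-separated list. A small sketch that builds both command lines from a Python list; the helper names are hypothetical and the formats are inferred only from the example commands.

```python
import shlex


def task_list_arg(tasks):
    """Format tasks as "[a,b,...]", the bracketed list shown for train_test_multi_task.sh."""
    for task in tasks:
        if "," in task or " " in task:
            raise ValueError(f"task names cannot contain commas or spaces: {task!r}")
    return "[" + ",".join(tasks) + "]"


def generate_datasets_command(tasks, data_dir="data"):
    """Tasks go in as a single space-separated argument, as for generate_datasets.sh."""
    return shlex.join(["bash", "scripts/generate_datasets.sh", data_dir, " ".join(tasks)])


def multi_task_command(tasks, data_dir="data"):
    """shlex.join quotes the brackets so the shell does not treat them as a glob."""
    return shlex.join(["sh", "scripts/train_test_multi_task.sh", data_dir, task_list_arg(tasks)])


if __name__ == "__main__":
    tasks = ["align-rope", "align-box-corner"]
    print(generate_datasets_command(tasks))
    print(multi_task_command(tasks))
```

shlex.join picks single quotes, which the shell treats the same as the double quotes in the original command.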
Temperature 0.5-0.8 is a good range for diversity; 0.0-0.2 gives stable results.
The generation pipeline is in gensim, and training and task scripts are in cliport.
The prompts/ folder stores different kinds of prompts for producing the desired environments. Each folder contains a sequence of prompts as well as a meta_data file. prompts/data stores the base task library and the generated task library.
Generated tasks are stored in generated_tasks/. Use demos.py to play with them. cliport/demos_gpt4.py is an all-in-one prompt script that can be converted into an IPython notebook.
Output statistics are saved in output/output_stats, figure results in output/output_figures, and policy evaluation results in output/cliport_output.
To try task code saved in generated_task.py, run:
python cliport/demos.py n=50 task=gen-task disp=True
The included cliport supports batchsize>1 and can run with more recent versions of PyTorch and PyTorch Lightning.
Blender rendering with video recording: python cliport/demos.py n=310 task=align-box-corner mode=test disp=True +record.blender_render=True record.save_video=True
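The note recommending 0.5-0.8 for diversity and 0.0-0.2 for stable results refers, we assume, to the LLM sampling temperature. Temperature divides the logits before the softmax, so small values concentrate probability on the top token and larger values spread it out. A minimal sketch of temperature-scaled softmax illustrating that effect; it is a generic illustration, not GenSim code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by temperature.

    Lower temperatures sharpen the distribution toward the top token;
    higher ones flatten it. Temperature 0 corresponds to greedy decoding
    in the limit, which this helper does not model, so it must be > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]
    for t in (0.1, 0.5, 0.8):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```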
If you find GenSim useful in your research, please consider citing:
@inproceedings{wang2023gen,
author = {Lirui Wang and Yiyang Ling and Zhecheng Yuan and Mohit Shridhar and Chen Bao and Yuzhe Qin and Bailin Wang and Huazhe Xu and Xiaolong Wang},
title = {GenSim: Generating Robotic Simulation Tasks via Large Language Models},
booktitle = {Arxiv},
year = {2023}
}