Official implementation of Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang
We have only tested on Ubuntu 22 with torch 2.0.1 & CUDA 11.7 on an A100. Make sure git, wget, and Eigen are installed.
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
apt update && apt upgrade
apt install git wget libeigen3-dev -y
Install with pip:
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/facebookresearch/pytorch3d.git
pip install git+https://github.com/S-aiueo32/contextual_loss_pytorch.git@4585061
pip install ./raymarching
pip install git+https://github.com/facebookresearch/segment-anything.git
Other dependencies:
pip install -r requirements.txt
Zero-1-to-3 for the 3D diffusion prior. We use zero123-xl.ckpt by default; the reimplementation is borrowed from the Stable Diffusion repo and is available in nerf/zero123.py.
cd pretrained/zero123
wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
cd ../../
MiDaS for depth estimation. We use dpt_beit_large_512.pt; put it in the pretrained/midas/ folder.
mkdir -p pretrained/midas
cd pretrained/midas
wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt
cd ../../
Omnidata for normal estimation.
mkdir -p pretrained/omnidata
cd pretrained/omnidata
# assume gdown is installed
gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t'
cd ../../
SAM to segment the foreground mask of an object.
cd mask
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cd ..
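Before moving on, it can save time to verify that the checkpoints above actually downloaded. A minimal sketch (the paths are the default locations from the steps above; since the Omnidata checkpoint name is assigned by gdown, only its folder is checked):

```python
from pathlib import Path

# Checkpoint files downloaded in the steps above (default locations).
EXPECTED = [
    "pretrained/zero123/zero123-xl.ckpt",
    "pretrained/midas/dpt_beit_large_512.pt",
    "mask/sam_vit_h_4b8939.pth",
]

def check_checkpoints(root="."):
    """Return the list of missing or empty checkpoint files under root."""
    root = Path(root)
    missing = [p for p in EXPECTED
               if not (root / p).is_file() or (root / p).stat().st_size == 0]
    # The Omnidata checkpoint name comes from gdown, so only check the folder.
    omni = root / "pretrained/omnidata"
    if not omni.is_dir() or not any(omni.iterdir()):
        missing.append("pretrained/omnidata/<checkpoint>")
    return missing

if __name__ == "__main__":
    for p in check_checkpoints():
        print("missing or empty:", p)
```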
In the ./data directory, we include some preprocessed examples with the multi-modal images already extracted. If you want to test your own example, run the preprocessing steps below and match the file structure in ./data. Preprocessing takes only seconds.
You can preprocess a single image:
python preprocess_image.py --path /path/to/image
You can also preprocess all images in a list or a directory:
bash scripts/preprocess_list.sh $GPU_IDX
bash scripts/preprocess_folder.sh $GPU_IDX /path/to/dir
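The folder script essentially loops preprocess_image.py over every image in a directory. A sketch of that loop in Python (the set of accepted extensions is an assumption, not taken from the repo):

```python
from pathlib import Path

# Extensions we assume preprocess_image.py accepts.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}

def build_commands(image_dir):
    """Build one preprocess_image.py invocation per image in image_dir."""
    cmds = []
    for p in sorted(Path(image_dir).iterdir()):
        if p.suffix.lower() in IMAGE_EXTS:
            cmds.append(["python", "preprocess_image.py", "--path", str(p)])
    return cmds
```

Each command can then be launched with subprocess.run, with CUDA_VISIBLE_DEVICES set to $GPU_IDX to pin the job to the chosen GPU, as the shell script does.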
Customize-It-3D uses the default DreamBooth from diffusers. To fine-tune the multi-modal DreamBooth:
bash dreambooth/dreambooth.sh $GPU_IDX $INSTANCE_DIR $OUTPUT_DIR $CLASS_NAME $CLASS_DIR
$INSTANCE_DIR is the path to directory containing your own image.
$OUTPUT_DIR is the path where to save the trained model.
$CLASS_NAME is the text prompt describing the class of the generated sample images.
$CLASS_DIR is the path to a folder containing the generated class sample images.
For example:
bash dreambooth/dreambooth.sh 0 data/horse out/horse horse images_gen/horse
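Since the arguments are positional, order matters. As a sanity check, a small hypothetical helper (not part of the repo) that assembles the same call:

```python
def dreambooth_cmd(gpu_idx, instance_dir, output_dir, class_name, class_dir):
    """Assemble the dreambooth.sh call; the positional order must match the script."""
    return ["bash", "dreambooth/dreambooth.sh", str(gpu_idx),
            instance_dir, output_dir, class_name, class_dir]
```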
Don't forget the path to your trained model (in the ./out directory).
We use a progressive training strategy to generate full 360° 3D geometry.
bash scripts/run.sh $GPU_IDX $WORK_SPACE $REF_PATH $Enable_First_Stage $Enable_Second_Stage $TRAINED_MODEL_PATH $CLASS_NAME {More_Arguments}
As an example, to run Customize-It-3D on the horse example, whose trained multi-modal DreamBooth model is out/horse, using both stages on GPU 0, with both the workspace and the class name set to horse, run:
bash scripts/run.sh 0 horse data/horse/rgba/rgba.png 1 1 out/horse horse
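The positional arguments map onto the template above one-to-one; a hypothetical helper (not part of the repo) that makes the mapping explicit, with the stage flags as 1 (enable) or 0 (skip):

```python
def run_cmd(gpu_idx, workspace, ref_path, first_stage, second_stage,
            trained_model_path, class_name, *more_args):
    """Assemble the scripts/run.sh call; stage flags are 1 (enable) or 0 (skip)."""
    return ["bash", "scripts/run.sh", str(gpu_idx), workspace, ref_path,
            str(int(first_stage)), str(int(second_stage)),
            trained_model_path, class_name, *more_args]
```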
We also provide scripts to run on a whole folder or a list of images:
scripts/run_folder.sh
scripts/run_list.sh
If you find this work useful, please consider citing:
@misc{huang2023customizeit3d,
title={Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior},
author={Nan Huang and Ting Zhang and Yuhui Yuan and Dong Chen and Shanghang Zhang},
year={2023},
eprint={2312.11535},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This code borrows heavily from Stable-Dreamfusion; many thanks to the author.