DI Orchestrator Save

OpenDILab RL Kubernetes Custom Resource and Operator Lib

Project README

Build Releases

DI Orchestrator

DI Orchestrator is designed to manage DI (Decision Intelligence) jobs using Kubernetes Custom Resource and Operator.

Prerequisites

  • A well-prepared kubernetes cluster. Follow the instructions to create a kubernetes cluster, or create a local kubernetes node referring to kind or minikube

Install DI Orchestrator

DI Orchestrator consists of two components: di-operator and di-server. Install them with the following command.

kubectl create -f ./config/di-manager.yaml

di-operator and di-server will be installed in di-system namespace.

$ kubectl get pod -n di-system
NAME                               READY   STATUS    RESTARTS   AGE
di-operator-57cc65d5c9-5vnvn       1/1     Running   0          59s
di-server-7b86ff8df4-jfgmp         1/1     Running   0          59s

Submit DIJob

# submit DIJob
$ kubectl create -f config/samples/atari-dqn-tasks.yaml

# get pod and you will see coordinator is created by di-operator
# a few seconds later, you will see collectors and learners created by di-server
$ kubectl get pod
NAME                         READY   STATUS    RESTARTS      AGE
job-with-tasks-collector-0   1/1     Running   0             2s
job-with-tasks-collector-1   1/1     Running   0             2s
job-with-tasks-evaluator-0   1/1     Running   0             2s
job-with-tasks-learner-0     1/1     Running   0             2s

# get logs of tasks
$ kubectl logs job-with-tasks-evaluator-0 
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:29] INFO     Evaluator running on node 1                                                                                                           func.py:58
A.L.E: Arcade Learning Environment (version +a54a328)
[Powered by Stella]
/opt/conda/lib/python3.8/site-packages/ale_py/roms/__init__.py:44: UserWarning: ale_py.roms contains unsupported ROMs: /opt/conda/lib/python3.8/site-packages/AutoROM/roms/{joust.bin, warlords.bin, maze_craze.bin, combat.bin}
  warnings.warn(
[06-28 08:25:46] INFO     Evaluation: Train Iter(0)       Env Step(0)     Eval Reward(-21.000)                                                                  func.py:58
[06-28 08:25:46] WARNING  You have not installed memcache package! DI-engine has changed to some alternatives. 

$ kubectl logs job-with-tasks-learner-0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:27] INFO     Learner running on node 0

User Guide

Refers to user-guide. For Chinese version, please refer to 中文手册

Contributing

Refers to developer-guide.

Contact us throw [email protected]

Open Source Agenda is not affiliated with "DI Orchestrator" Project. README Source: opendilab/DI-orchestrator
Stars
220
Open Issues
0
Last Commit
1 year ago
License

Open Source Agenda Badge

Open Source Agenda Rating