DI Orchestrator

OpenDILab RL Kubernetes Custom Resource and Operator Lib

DI Orchestrator manages DI (Decision Intelligence) jobs using a Kubernetes custom resource and operator.

Prerequisites

A working Kubernetes cluster. Follow the instructions to create a Kubernetes cluster, or create a local Kubernetes node with kind or minikube.
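For a quick local setup, the kind route can be sketched as follows. This is a minimal sketch, not part of the DI Orchestrator repository: `create_local_cluster` and the default cluster name `di-test` are hypothetical names chosen for this example, and it assumes `kind` and `kubectl` are already installed.

```shell
# Hypothetical helper: create a throwaway local cluster with kind and
# confirm that kubectl can reach it.
# "di-test" is an arbitrary default cluster name used for illustration.
create_local_cluster() {
  cluster_name="${1:-di-test}"
  # kind registers the new cluster under the kubectl context "kind-<name>".
  kind create cluster --name "$cluster_name" &&
    kubectl cluster-info --context "kind-$cluster_name"
}

# Usage (creates a real cluster, so it is left commented out here):
#   create_local_cluster
```

When the cluster is no longer needed, `kind delete cluster --name di-test` removes it.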

Install DI Orchestrator

DI Orchestrator consists of two components: di-operator and di-server. Install them with the following command.

kubectl create -f ./config/di-manager.yaml

di-operator and di-server will be installed in the di-system namespace. Check that both pods are running:

$ kubectl get pod -n di-system
NAME                               READY   STATUS    RESTARTS   AGE
di-operator-57cc65d5c9-5vnvn       1/1     Running   0          59s
di-server-7b86ff8df4-jfgmp         1/1     Running   0          59s
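Instead of polling `kubectl get pod` by hand, a script can block until the pods are ready. The sketch below uses `kubectl wait`; the helper name `wait_for_di_orchestrator` and the 120s default timeout are assumptions made for this example, not something the project defines.

```shell
# Hypothetical helper: block until every pod in the di-system namespace
# reports the Ready condition, or fail once the timeout expires.
# The 120s default is an arbitrary value chosen for illustration.
wait_for_di_orchestrator() {
  kubectl wait --for=condition=Ready pod --all -n di-system \
    --timeout="${1:-120s}"
}

# Usage (requires a cluster with DI Orchestrator installed):
#   wait_for_di_orchestrator 60s
```

The command exits with a non-zero status if the pods are not ready in time, which makes it convenient as a gate before submitting a job.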

Submit DIJob

# submit DIJob
$ kubectl create -f config/samples/atari-dqn-tasks.yaml

# list the pods; a few seconds after submission the job's collector,
# evaluator, and learner pods should be running
$ kubectl get pod
NAME                         READY   STATUS    RESTARTS      AGE
job-with-tasks-collector-0   1/1     Running   0             2s
job-with-tasks-collector-1   1/1     Running   0             2s
job-with-tasks-evaluator-0   1/1     Running   0             2s
job-with-tasks-learner-0     1/1     Running   0             2s

# get logs of tasks
$ kubectl logs job-with-tasks-evaluator-0 
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:29] INFO     Evaluator running on node 1                                                                                                           func.py:58
A.L.E: Arcade Learning Environment (version +a54a328)
[Powered by Stella]
/opt/conda/lib/python3.8/site-packages/ale_py/roms/__init__.py:44: UserWarning: ale_py.roms contains unsupported ROMs: /opt/conda/lib/python3.8/site-packages/AutoROM/roms/{joust.bin, warlords.bin, maze_craze.bin, combat.bin}
  warnings.warn(
[06-28 08:25:46] INFO     Evaluation: Train Iter(0)       Env Step(0)     Eval Reward(-21.000)                                                                  func.py:58
[06-28 08:25:46] WARNING  You have not installed memcache package! DI-engine has changed to some alternatives. 

$ kubectl logs job-with-tasks-learner-0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:27] INFO     Learner running on node 0
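To follow training progress without reading the full log, the evaluator's `Eval Reward(...)` lines can be filtered out. The sketch below assumes the log format shown in the sample output above; `extract_eval_rewards` is a hypothetical helper name and is not part of DI Orchestrator or DI-engine.

```shell
# Hypothetical helper: read evaluator logs on stdin and print
# "<train iter> <eval reward>" for every evaluation line.
# Assumes lines of the form:
#   ... Evaluation: Train Iter(N) Env Step(M) Eval Reward(R) ...
extract_eval_rewards() {
  sed -n 's/.*Train Iter(\([0-9]*\)).*Eval Reward(\(-\{0,1\}[0-9.]*\)).*/\1 \2/p'
}

# Usage (requires a running job):
#   kubectl logs job-with-tasks-evaluator-0 | extract_eval_rewards
```

For the evaluator log shown above, this prints `0 -21.000`. Lines that do not match the pattern, such as warnings, are dropped.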

User Guide

Refer to the user-guide. For the Chinese version, refer to the Chinese manual (中文手册).

Contributing

Refer to the developer-guide.

Open Source Agenda is not affiliated with "DI Orchestrator" Project. README Source: opendilab/DI-orchestrator

Repository

opendilab/DI-orchestrator

License

Apache-2.0
