ROCm Machine Learning and HPC Stack installer
RET is a tool for checking, setting up, installing, testing and benchmarking the ROCm suite. It installs the full stack, from dependencies, drivers and the toolchain to frameworks and benchmarks. RET automates the ROCm installation, which makes the process simpler and faster for the user.
For more information, please refer to the main ROCm repository at ROCmInstall.
Note: RET must be started on a clean system.
The best option is to format the hard drive and install a fresh OS. After the OS installation, you will need git to download the RET source:
sudo apt -y install git
git clone https://github.com/rocmsys/RET.git
sudo ./ret <command> [<option>]
For example:
sudo ./ret install rocm
or
sudo ./ret install tensorflow
Command:
[install] <Package> : Install ROCm or ML Framework TF/PT
[remove] <Package> : Remove ROCm or ML Framework TF/PT
[benchmark] <Packages> <Model> : Run benchmark for specific ML Framework
[build] <Container> <ImageName> : Build ROCm Container either with Docker or Singularity
Packages:
[rocm] : ROCm-dkms packages
[tensorflow] : TensorFlow framework
Model:
[resnet56] : ResNet-56 model. Default Model
Container:
[docker] : Build Docker Container
[singularity] : Build Singularity Container
[ImageName] : Choosing an OS Base Image. Default is [ubuntu:18.04]
Options:
[-py2|-py3] : Python version. Default is Python3
[-h|--help] : Show this help message
[-v|--version] : Show version of this package
[-V|--verbose] : Be verbose
[-d|--debug] : Enable Debug Mode
[-y|--yes] : Skip confirmation message
[-ns|--nsc] : Skip system check steps
[-nv|--nov] : Skip verification steps
[-ic|--incontainer] : Run RET on top of Container
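The command grammar listed above can be sketched as a small dry-run helper. This is only an illustration: `ret_dry_run` is a hypothetical function and not part of RET, and it prints the command line instead of running it. Placing options after the package is an assumption based on the usage line `./ret <command> [<option>]`. The `build` command is left out because it takes a container type rather than a package.

```shell
#!/bin/sh
# Hypothetical dry-run helper (NOT part of RET): checks the command and
# package against the help listing and prints the resulting command line.
ret_dry_run() {
    if [ "$#" -lt 2 ]; then
        echo "usage: ret_dry_run <command> <package> [extra arguments]" >&2
        return 2
    fi
    cmd="$1"
    pkg="$2"
    shift 2

    # Commands that take a package, as documented in the help text.
    case "$cmd" in
        install|remove|benchmark) ;;
        *) echo "unsupported command: $cmd" >&2; return 1 ;;
    esac

    # Packages documented in the help text.
    case "$pkg" in
        rocm|tensorflow) ;;
        *) echo "unknown package: $pkg" >&2; return 1 ;;
    esac

    # Append any remaining arguments (a model name or options) unchanged.
    line="sudo ./ret $cmd $pkg"
    for arg in "$@"; do
        line="$line $arg"
    done
    printf '%s\n' "$line"
}
```

For instance, `ret_dry_run install rocm -y` prints `sudo ./ret install rocm -y`, while an undocumented command such as `ret_dry_run format rocm` is rejected with a non-zero exit status.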
cd RET
sudo ./ret install rocm # install ROCm stack
sudo reboot
sudo ./ret install tensorflow # install TensorFlow
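After the reboot, it can be useful to confirm by hand that the ROCm tools are reachable. The following is a minimal sketch, not something RET provides: `check_rocm_tool` is a hypothetical helper, and the `/opt/rocm/bin` prefix is an assumption about where the ROCm packages were installed. It only looks for the `rocminfo` and `rocm-smi` executables and does not query the GPU.

```shell
#!/bin/sh
# Assumption: ROCm tools live under /opt/rocm/bin unless ROCM_BIN says otherwise.
ROCM_BIN="${ROCM_BIN:-/opt/rocm/bin}"

# Hypothetical helper: report whether a tool is on PATH or under ROCM_BIN.
check_rocm_tool() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1 || [ -x "$ROCM_BIN/$tool" ]; then
        printf '%s: found\n' "$tool"
    else
        printf '%s: missing\n' "$tool"
    fi
}

for tool in rocminfo rocm-smi; do
    check_rocm_tool "$tool"
done
```

If either tool is reported as missing, rerunning the installation without the `-nv` option lets RET perform its own verification steps.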
Details on the benchmarks can be found at this link.
Here are the basic instructions to run the ResNet-56 benchmark:
sudo ./ret benchmark tensorflow resnet56
You can also use the TensorFlow benchmarks:
git clone https://github.com/reger-men/tensorflow_benchmark.git
cd tensorflow_benchmark
python3 train.py
Note: You may need to specify the number of GPUs with --num_gpus=YOUR_GPU_NUMBER
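As a sketch of how the GPU count might be passed, the helper below builds the `train.py` command line with an explicit `--num_gpus` value. `train_cmd` is a hypothetical function used only for illustration, and the default of one GPU is an assumption. It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the train.py command for a given GPU count.
# The count defaults to 1 when no argument is given (an assumption).
train_cmd() {
    gpus="${1:-1}"

    # Accept only non-negative integers.
    case "$gpus" in
        *[!0-9]*) echo "GPU count must be a non-negative integer" >&2; return 1 ;;
    esac

    printf 'python3 train.py --num_gpus=%s\n' "$gpus"
}
```

For example, `train_cmd 2` prints `python3 train.py --num_gpus=2`, which can then be run from the cloned benchmark directory.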