Experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Author: Helmut Hoffer von Ankershoffen né Oertel
Hints:

- `nvcr.io/nvidia/l4t-base` base image provided by NVIDIA
- The devices join `max` as worker nodes labeled as `jetson:true` and `jetson_model:[nano,xavier]` - see Project Max regarding `max`
- `jetson/[nano,xavier]/ml-base` including CUDA, CUDNN, TensorRT, TensorFlow and Anaconda for arm64
- `jetson/[nano,xavier]/tensorflow-serving-base` using bazel and the latest TensorFlow core, including support for the CUDA capabilities of Jetson edge devices
- `ufw` for basic security

Total ca. $210 including options.
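The node labels mentioned above can be used to pin workloads to a particular Jetson model. A minimal sketch of a pod spec doing so (the pod name is hypothetical, not a manifest from this repository):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-workload        # hypothetical name
spec:
  nodeSelector:
    jetson: "true"           # only schedule on Jetson worker nodes
    jetson_model: "xavier"   # ... and only on AGX Xavier devices
  containers:
    - name: main
      image: nvcr.io/nvidia/l4t-base
```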
Total ca. $767.
Hints:

- As the eMMC soldered onto a Xavier board is only 32 GB, an SSD is required to provide adequate disk space for Docker images and volumes in the `/var/lib/docker` directory.

Execute `make bootstrap-environment` to install requirements on your macOS device and set up hostnames such as `nano-one.local` in your `/etc/hosts`.
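The resulting `/etc/hosts` entries look along these lines - the IP addresses below are placeholders for whatever your network assigns:

```
192.168.1.100  nano-one.local
192.168.1.101  xavier-one.local
```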
Hint:

`make` is used as a facade for all workflows triggered from the development workstation - execute `make help` on your macOS device to list all targets and see the `Makefile` in this directory for details. See `workflow/provision/group_vars/all.yml` and `workflow/requirements/macOS/ansible/requirements.yml` for a list of packages and applications installed.

The following steps create a `provision` account as part of oem-setup and automatically establish secure access:

- Execute `make nano-image-download` on your macOS device to download and unzip the NVIDIA Jetpack image into `workflow/provision/image/`.
- Start the `balenaEtcher` application and flash your micro SD card with the downloaded image.
- Boot the Nano from the flashed micro SD card, create a user called `provision` with "Administrator" rights via the UI and set `nano-one` as hostname - wait until the login screen appears.
- Execute `make setup-access-secure` and enter the password you set for the `provision` user in the step above when asked - passwordless ssh access and sudo will be set up.

Hints:
- The `balenaEtcher` application was installed as part of bootstrap - see above.
- Update the IP address assigned to `nano-one.local` in `workflow/requirements/generic/ansible/playbook.yml` - run `make requirements-hosts` after updating the IP address, which in turn updates the `/etc/hosts` file of your Mac.
- You can log into the Nano via `make nano-one-ssh` - your ssh public key was uploaded in step 6 above so no password is asked for.

The following steps create a `provision` account on the Xavier as part of oem-setup and automatically establish secure access:

- Execute `make guest-sdk-manager-download` on your macOS device and follow the instructions shown to download the NVIDIA SDK Manager installer into `workflow/guest/download/` - if you fail to download the NVIDIA SDK Manager you will be instructed in the next step on how to do it.
- Execute `make guest-build` to automatically a) create, boot and provision an Ubuntu guest VM on your macOS device using Vagrant, VirtualBox and Ansible and b) build a custom kernel and rootfs inside the guest VM for flashing the Xavier device - the Linux kernel is built with requirements for Kubernetes + Weave networking, such as activating IP sets and the Open vSwitch datapath module; SDK components are added to the rootfs for automatic installation during provisioning (see part 2). You will be prompted to download SDK components via the NVIDIA SDK Manager that was automatically installed during provisioning of the guest VM - please do as instructed on-screen.
- Execute `make guest-flash` to flash the Xavier with the custom kernel and rootfs - wire up the Xavier with your macOS device using USB-C and enter recovery mode by powering up and pressing the buttons as described in the printed user manual that was part of your Jetson AGX Xavier shipment before execution.
- Execute `make guest-oem-setup` to start the headless oem-setup process. Follow the on-screen instructions to set up your locale and timezone, create a user account called `provision` and set an initial password - press the reset button of your Xavier after flashing before triggering the oem-setup.
- Execute `make setup-access-secure` and enter the password you set for the user `provision` in the step above when asked - passwordless ssh access and sudo will be set up.

Hints:
- Update the IP address assigned to `xavier-one.local` in `workflow/requirements/generic/ansible/playbook.yml` - you can check the assigned IP after step 4 by logging in as user `provision` with the password you set, executing `ifconfig eth0 | grep 'inet'` and checking the IP address shown - run `make requirements-hosts` after updating the IP address, which in turn updates the `/etc/hosts` file of your Mac.
- You can log into the Xavier via `make xavier-one-ssh` - your ssh public key was uploaded in step 5 above so no password is asked for.

Execute `make provision` - amongst others, services will be provisioned, the kernel will be compiled (on Nanos only) and the Kubernetes cluster will be joined.

Hints:
- See `workflow/provision/group_vars/all.yml` for the configuration.
- Execute `make provision-nanos` and `make provision-xaviers` to provision one type of Jetson model only.
- Execute `kubectl get nodes` to check that your edge devices joined your cluster and are ready.
- Alternatively run `click` from the terminal, which was installed as part of bootstrap - enter `nodes` to list the nodes with the Nano being one of them - see Click for details.
- You can connect via VNC to `nano-one.local:5901` or `xavier-one.local:5901` respectively - the password is `secret`.
- On Xaviers the SSD provides the space for `/var/lib/docker`. For Nanos you can optionally use a SATA SSD as boot device as described below.
- Execute `make help | grep "provision-"` and execute the desired make target, e.g. `make provision-kernel`.
Execute `make ml-base-build-and-test` once to build the Docker base image `jetson/[nano,xavier]/ml-base`, test it via container structure tests and push it to the private registry of your cluster; amongst others the image includes CUDA, CUDNN, TensorRT, TensorFlow, Python bindings and Anaconda - have a look at the directory `workflow/deploy/ml-base` for details - most images below derive from this image.

Execute `make device-query-deploy` to build and deploy a pod into the Kubernetes cluster that queries CUDA capabilities, thus validating GPU and Tensor Core access from inside Docker and correct labeling of Jetson/arm64 based Kubernetes nodes - execute `make device-query-log-show` to show the result after deploying.

Execute `make jupyter-deploy` to build and deploy a Jupyter server as a Kubernetes pod running on the Nano supporting CUDA accelerated TensorFlow + Keras, including support for Octave as an alternative Jupyter kernel in addition to IPython - execute `make jupyter-open` to open a browser tab pointing to the Jupyter server and execute the bundled TensorFlow Jupyter notebooks for deep learning.
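Device access for pods such as `device-query` works by mounting the `/dev/nv*` device nodes into the container. The actual manifest lives in `workflow/deploy/device-query/kustomize/base/deployment.yaml`; what follows is only a rough sketch with a hypothetical image tag and a shortened volume list:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: device-query
spec:
  selector:
    matchLabels:
      app: device-query
  template:
    metadata:
      labels:
        app: device-query
    spec:
      nodeSelector:
        jetson: "true"                      # schedule on Jetson nodes only
      containers:
        - name: device-query
          image: jetson/nano/device-query   # hypothetical tag
          securityContext:
            privileged: true                # required for raw device access
          volumeMounts:
            - name: nvhost-ctrl
              mountPath: /dev/nvhost-ctrl
      volumes:
        - name: nvhost-ctrl
          hostPath:
            path: /dev/nvhost-ctrl          # one of the /dev/nv* device nodes
```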
Execute `make tensorflow-serving-base-build-and-test` once to build the TensorFlow Serving base image `jetson/[nano,xavier]/tensorflow-serving-base`, test it via container structure tests and push it to the private registry of your cluster - have a look at the directory `workflow/deploy/tensorflow-serving-base` for details - most images below derive from this image.

Execute `make tensorflow-serving-deploy` to build and deploy TensorFlow Serving plus a Python/FastAPI based webservice for getting predictions as a Kubernetes pod running on the Nano - execute `make tensorflow-serving-docs-open` to open browser tabs pointing to the interactive OAS3 documentation of the webservice API; execute `make tensorflow-serving-health-check` to check the health as used in K8s readiness and liveness probes; execute `make tensorflow-serving-predict` to get predictions.
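Under the hood a prediction is a `POST` against TensorFlow Serving's REST predict endpoint. A minimal sketch of building such a request - the model name, port and input shape are assumptions, check the deployed model's signature for the real ones:

```python
import json

def build_predict_request(instances):
    """Build the JSON body for TensorFlow Serving's REST predict API
    (POST /v1/models/<name>:predict)."""
    return json.dumps({"instances": instances})

body = build_predict_request([[1.0, 2.0, 3.0, 4.0]])
print(body)  # {"instances": [[1.0, 2.0, 3.0, 4.0]]}

# Sending it would look roughly like this (requires a reachable server,
# hostname and model name are hypothetical):
# import urllib.request
# req = urllib.request.Request(
#     "http://nano-one.local:8501/v1/models/model:predict",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```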
Hints:

- Execute `export JETSON_MODEL=xavier` in your shell before executing the `make ...` commands, which will auto-activate the matching Skaffold profiles (see below) - if `JETSON_MODEL` is not set the Nanos will be targeted.
- Each deployment lives in a namespace named `jetson-$deployment` - e.g. `jetson-jupyter` for the Jupyter deployment - thus easing inspection in the Kubernetes dashboard, `click` or similar.
- Execute `make device-query-delete` (and the equivalent targets of the other deployments), which will automatically delete the respective namespace, persistent volumes, pods, services, ingress and loadbalancer on the cluster.
- Images are built on the devices by the user `build`, which was created during provisioning - use `make nano-one-ssh-build` or `make xavier-one-ssh-build` to login as user `build` to monitor intermediate results.
- See `workflow/deploy/device-query/skaffold.yaml` and `workflow/deploy/tools/builder` for the approach.
- Execute `make device-query-dev` as an example.
- Pods mount `/dev/nv*` at runtime to access the GPU and Tensor Cores - see `workflow/deploy/device-query/kustomize/base/deployment.yaml` for details.
- See `workflow/deploy/device-query/skaffold.yaml` and `workflow/deploy/device-query/kustomize/overlays/xavier` for an example - in this case the `xavier` profile is auto-activated respecting the `JETSON_MODEL` environment variable (see above), with the profile in turn activating the `xavier` Kustomize overlay.
- Execute `make device-query-build-and-test` as an example.
- Create `docker-hub.auth` in this directory (see `docker-hub.auth.template`) and execute the appropriate make target, e.g. `make ml-base-publish`.
- Adapt `max-one` accordingly to wire up with your infrastructure.
- The `tensorflow-serving` webservice accesses TensorFlow Serving via its REST API or alternatively the Python bindings of the gRPC API - have a look at the directory `workflow/deploy/tensorflow-serving/src/webservice` for details of the implementation.
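The profile auto-activation mentioned above relies on Skaffold's `activation` mechanism. In `skaffold.yaml` it looks roughly like this sketch (paths shortened, see the actual file in `workflow/deploy/device-query` for the real thing):

```yaml
profiles:
  - name: xavier
    # auto-activated when JETSON_MODEL=xavier is exported (see above)
    activation:
      - env: JETSON_MODEL=xavier
    deploy:
      kustomize:
        paths:
          - kustomize/overlays/xavier   # the xavier Kustomize overlay
```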
`provision-swap`

Hints:

- See `workflow/provision/group_vars/all.yml` for the swap configuration.
- Execute `make nano-one-reboot` to reboot the Nano.
- Set `ssd.id_serial_short` of the SSD in `workflow/provision/group_vars` given the info provided by executing `make nano-one-ssd-id-serial-short-show`.
- Execute `make nano-one-ssd-prepare` to assign the stable device name `/dev/ssd`, wipe and partition the SSD, create an ext4 filesystem and sync the micro SD card to the SSD.
- Set `ssd.uuid` of the SSD in `workflow/provision/group_vars` given the info provided by executing `make nano-one-ssd-uuid-show`.
- Execute `make nano-one-ssd-activate` to configure the boot menu to use the SSD as the default boot device and reboot.

Hints:
- The micro SD card stays mounted at `/mnt/mmc` after step 5 in case you want to update the kernel later, which now resides in `/mnt/mmc/boot/Image`.
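On the Nano the boot menu lives in `/boot/extlinux/extlinux.conf`; the change made by `nano-one-ssd-activate` essentially amounts to pointing the root filesystem at the SSD. A rough sketch, with the UUID being a placeholder for the `ssd.uuid` value set above:

```
TIMEOUT 30
DEFAULT ssd

LABEL ssd
      MENU LABEL primary kernel (root on SSD)
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=UUID=<uuid-from-group-vars> rootwait
```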
Execute `make l4t-deploy` to cross-build the Docker image on macOS using buildkit based on the official base image from NVIDIA and deploy - functionality is identical to `device-query` - see above.

Hints:

- See `workflow/deploy/l4t/builder.mac` on how this is achieved.
- A `daemon.json` switching on experimental Docker features on your macOS device, required for this, was automatically installed as part of bootstrap - see above.

`provision-firewall`
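The `daemon.json` mentioned in the hints above essentially just needs the experimental flag switched on so that the buildkit-based cross-build works (a minimal sketch of the relevant part):

```json
{
  "experimental": true
}
```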