TF 2.x and PyTorch Lightning Callbacks for GPU monitoring
gpumonitor reports GPU usage statistics while your scripts and training runs execute, either from a standalone monitor or through TensorFlow and PyTorch Lightning callbacks.
Install the package with pip:
pip install gpumonitor
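To monitor GPU usage in a script, create a monitor, run your code, then stop the monitor and display the averages:
import gpumonitor

monitor = gpumonitor.GPUStatMonitor(delay=1)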
# Your instructions here
# [...]
monitor.stop()
monitor.display_average_stats_per_gpu()
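If the monitored code raises an exception, `monitor.stop()` is never reached and the monitor may keep polling in the background. One way to guard against that is a small context manager. This is a minimal sketch and not part of gpumonitor: `monitored` is a hypothetical helper, and it relies only on the `stop()` and `display_average_stats_per_gpu()` methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def monitored(monitor_factory):
    """Hypothetical helper: create a monitor and always stop it on exit.

    `monitor_factory` is any zero-argument callable returning an object with
    `stop()` and `display_average_stats_per_gpu()` methods.
    """
    monitor = monitor_factory()
    try:
        yield monitor
    finally:
        # Runs on normal exit and also when the body raises.
        monitor.stop()
        monitor.display_average_stats_per_gpu()


# Usage (requires gpumonitor and a GPU):
# import gpumonitor
# with monitored(lambda: gpumonitor.GPUStatMonitor(delay=1)):
#     ...  # your instructions here
```

Passing a factory rather than a ready-made monitor keeps the monitor's creation inside the `with` statement, so its whole lifetime is scoped to the block.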
The monitor keeps track of the average of the GPU statistics. To clear the average and start afresh, reset the monitor:
monitor = gpumonitor.GPUStatMonitor(delay=1)
# Your instructions here
# [...]
monitor.display_average_stats_per_gpu()
monitor.reset()
# Some other instructions
# [...]
monitor.display_average_stats_per_gpu()
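To illustrate what resetting an average means, here is a sketch of an incremental mean that can be cleared. It does not reproduce gpumonitor's internals: `RunningAverage` and its methods are hypothetical names, and the samples are made-up utilization percentages.

```python
class RunningAverage:
    """Hypothetical incremental mean with a reset, for illustration only."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Forget all samples seen so far.
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: new_mean = mean + (value - mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


avg = RunningAverage()
for utilization in (20.0, 40.0, 60.0):  # made-up GPU utilization samples (%)
    avg.update(utilization)
first_phase = avg.mean  # mean of the three samples above

avg.reset()
avg.update(90.0)
second_phase = avg.mean  # reflects only the sample taken after the reset
```

After the reset, earlier samples no longer influence the reported mean, which is useful when you want separate figures for, say, data loading and training.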
Add the following callback to your training loop:
For TensorFlow,
from gpumonitor.callbacks.tf import TFGpuMonitorCallback
model.fit(x, y, callbacks=[TFGpuMonitorCallback(delay=0.5)])
For PyTorch Lightning,
import pytorch_lightning as pl
from gpumonitor.callbacks.lightning import PyTorchGpuMonitorCallback
trainer = pl.Trainer(callbacks=[PyTorchGpuMonitorCallback(delay=0.5)])
trainer.fit(model)
You can customize the display format using the gpustat display options. For example, power draw in watts and fan speed can be shown. To see which options you can change, refer to: