Recommand · October 16, 2021

MaskRCNN with RTX3090 very slow (GPU Spiking)

I’ve got the following problem:

I am training a TensorFlow 2.6 port of Matterport’s Mask R-CNN on my RTX3090. I’ve installed CUDA 11.4 and cuDNN.
The GPU shows up in TensorBoard.

During training, the GPU sits idle at 0% GPU-Util, then suddenly spikes for a moment while it processes a batch of images. Before each spike, the CPU (64 cores) runs at 100% on only ~2 cores.
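
To put numbers on the spiking described above, something like the following could log GPU utilization while training runs. This is only a sketch: it assumes `nvidia-smi` is on the PATH and that its CSV output contains plain integers, and the helper names (`parse_util_line`, `sample_gpu`, `idle_fraction`, `log_utilization`) are made up for illustration; they are not part of TensorFlow or the Mask R-CNN port.

```python
import subprocess
import time


def parse_util_line(line):
    """Parse one 'utilization, memory' CSV line from nvidia-smi into two ints."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def sample_gpu(gpu_index=0):
    """Query nvidia-smi once; returns (utilization in %, memory used in MiB)."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_util_line(out.strip().splitlines()[0])


def idle_fraction(utils, threshold=5):
    """Fraction of samples whose utilization is at or below `threshold` percent."""
    if not utils:
        return 0.0
    return sum(u <= threshold for u in utils) / len(utils)


def log_utilization(duration_s=60, interval_s=0.5):
    """Sample the GPU for `duration_s` seconds and return the utilization values."""
    utils = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        util, _mem = sample_gpu()
        utils.append(util)
        time.sleep(interval_s)
    return utils
```

Running `log_utilization()` from a second terminal while training and passing the result to `idle_fraction` would give a rough figure for how much of the time the GPU is just waiting.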

Why is there so much idle time? TensorBoard reports no input-pipeline "lag".


Screenshots: input pipeline, trace view, Mask R-CNN log


System: Linux, RTX3090, 64-core CPU (EPYC), cuDNN 8.0.2, CUDA 11.4.1