This talk will introduce performance analysis techniques for deep learning applications, using the NVIDIA Nsight Systems profiling tool to peek under the covers. We will cover how to collect performance information for neural network layers, so that GPU work can be related back to those higher-level concepts. We will also cover other sections of code that feed the DNN or consume its results. You will gain deeper insight into the execution of, and interactions among, processes, the OS, CUDA kernels on the GPU, Tensor Cores, NVLink, and even multiple nodes. Finally, we will discuss ways to access report data for deeper analysis, along with some common pitfalls in both training and inference applications.