This talk introduces performance analysis techniques for deep learning applications, using the NVIDIA Nsight Systems profiling tool to peek under the covers. We will cover how to collect performance information for neural network layers, so that GPU work can be related back to those higher-level concepts, as well as for the other sections of code that feed the DNN or consume its results. You will gain deeper insight into the execution of, and interactions among, processes, the OS, GPU CUDA kernels, Tensor Cores, NVLink, and even multiple nodes. We will discuss ways to access report data for deeper analysis, as well as some common pitfalls in both training and inference applications.
Speaker Bio(s)
Daniel Horowitz is a Director of Engineering at NVIDIA. His teams develop tools that help you, regardless of industry, harness the power of NVIDIA technologies more efficiently and effectively, including GPUs, CPUs, SoCs, NICs/HCAs, and DPUs.
Tod Courtney is a Senior System Software Engineer at NVIDIA on the Nsight Systems development team, where he is currently working on new profiling and data analysis capabilities for HPC cluster users. He has a broad background in GPU and CPU development and performance optimization across a variety of scientific computing domains.