This session will present the use of Nsight Compute for analyzing the performance of individual GPU kernels on the NVIDIA GPUs that power ALCF's ThetaGPU and NERSC's Perlmutter. We will walk through a few simple kernels, some compute-bound and others memory-bandwidth-bound, and learn how to profile them with Nsight Compute, generate roofline charts, and interpret the results. We will then introduce a realistic kernel from an HPC application and discuss how comprehensive kernel analysis can be applied iteratively to substantially speed up key application bottlenecks. The webinar will conclude with an interactive demo of Nsight Compute. The goal is for attendees to be able to determine whether the performance of a compute construct is “good enough” relative to the capabilities of the hardware and, if not, what steps should be taken to address this.
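
For readers new to the roofline model mentioned above, the following is a minimal sketch of the arithmetic behind a roofline chart. It is not Nsight Compute output and not part of the session material. The function names are hypothetical, and the peak numbers are approximate published figures for an A100 40 GB GPU (about 9.7 TFLOP/s FP64 without tensor cores, about 1555 GB/s of HBM bandwidth), used here only as assumptions. The example kernel is a double-precision `y = a*x + y` update, assumed to perform 2 floating-point operations and move 24 bytes per element.

```python
# Roofline sketch: attainable performance is capped either by peak compute
# or by (arithmetic intensity x peak memory bandwidth), whichever is lower.

# ASSUMPTION: approximate A100 40 GB peaks, for illustration only.
PEAK_FLOPS = 9.7e12   # FP64 FLOP/s (non-tensor-core)
PEAK_BW = 1.555e12    # bytes/s of HBM bandwidth


def attainable_flops(intensity, peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW):
    """Upper bound on FLOP/s for a kernel with the given FLOP/byte ratio."""
    return min(peak_flops, intensity * peak_bw)


def classify(intensity, peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW):
    """Label a kernel by which side of the ridge point it falls on."""
    ridge = peak_flops / peak_bw  # FLOP/byte where the two limits meet
    return "compute-bound" if intensity >= ridge else "memory-bound"


# ASSUMPTION: FP64 y = a*x + y does 2 FLOPs per element and moves
# 24 bytes per element (read x, read y, write y; 8 bytes each).
daxpy_intensity = 2 / 24

if __name__ == "__main__":
    print("ridge point (FLOP/byte):", PEAK_FLOPS / PEAK_BW)
    print("daxpy intensity (FLOP/byte):", daxpy_intensity)
    print("daxpy bound (FLOP/s):", attainable_flops(daxpy_intensity))
    print("daxpy is", classify(daxpy_intensity))
```

Comparing a kernel's measured FLOP/s against the bound from `attainable_flops` is one way to judge whether it is "good enough": a kernel sitting well below its roofline has headroom, while one close to it is limited by the hardware rather than by the implementation.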