Abstract: Neural network architectures have grown more complex (wider and deeper) to push prediction accuracy as high as possible, and consequently the computational cost of training these networks tends to be larger than that of their predecessors. This is because a) more computational operations (e.g., convolutions) and parameters (e.g., weights) are included in the architecture; b) more computational resources (e.g., GPUs) are needed; c) the training time does not decrease proportionally when, for example, the GPU count is raised; d) GPU memory capacity is a potential bottleneck and cannot be scaled up accordingly. To reduce these costs, CPU-side memory systems play an important role in neural network research and deserve more attention. For instance, intermediate results (e.g., activations) of some layers computed in the forward pass can be saved to the file system and read back directly in the backward pass.
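A minimal sketch of the offloading idea mentioned above, using only the Python standard library: an intermediate value produced in the forward pass is written to disk instead of being kept in memory, then reloaded in the backward pass to compute gradients. The function names, the single-multiplication "layer", and the pickle-to-temp-directory scheme are illustrative assumptions, not the speaker's actual system.

```python
import os
import pickle
import tempfile

def forward(x, w, offload_dir):
    """Toy layer y = w * x; offload x (needed later for dL/dw) to disk."""
    y = w * x
    path = os.path.join(offload_dir, "saved_input.pkl")
    with open(path, "wb") as f:
        pickle.dump(x, f)  # persist the intermediate instead of holding it in memory
    return y, path

def backward(grad_y, w, path):
    """Reload the saved input from the file system to compute gradients."""
    with open(path, "rb") as f:
        x = pickle.load(f)
    grad_w = grad_y * x  # dL/dw = dL/dy * x
    grad_x = grad_y * w  # dL/dx = dL/dy * w
    return grad_w, grad_x

with tempfile.TemporaryDirectory() as d:
    y, path = forward(3.0, 2.0, d)        # y = 6.0
    grad_w, grad_x = backward(1.0, 2.0, path)
    print(y, grad_w, grad_x)              # 6.0 3.0 2.0
```

The trade-off this illustrates is memory capacity versus I/O latency: disk (or CPU memory) is far larger than GPU memory, so spilling intermediates there lets bigger models train at the cost of extra read/write time.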
Please use this link to attend the virtual seminar.
Meeting ID: 793906484 / Participant passcode: 2806