The goal of source compression is to map any outcome of a discrete random variable $x ∼ p_d(x)$ over a finite symbol space $x ∈ S$ to its shortest possible binary representation. Given a tractable model probability mass function (PMF) $p(x)$ that approximates $p_d(x)$, entropy coders provide such an optimal mapping. As a result, the task of source compression reduces to identifying a good model PMF for the data at hand. Even though this setup is the most commonly used one, it has restrictions. Entropy coders can only process one-dimensional variables, and they process them sequentially; the structure of the entropy coder thus imposes a sequential structure on the data. This is a problem when compressing sets instead of sequences. In the first part of the talk, I present an optimal codec for sets [1]. The problem we encounter for sets generalizes to many other structural priors in data. In the second part of the talk, I therefore investigate this more general problem: we extend rate-distortion theory to structural data priors and develop a strategy to learn codecs for such data [2].
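To make the role of the model PMF concrete, here is a minimal sketch of entropy coding via Huffman codes: symbols the model deems likely receive short codewords, so the expected code length approaches the entropy of the PMF. (This is an illustrative toy only, not the codec from the talk; practical systems, including the bits-back approach in [1], build on stream codes such as arithmetic coding or ANS.)

```python
import heapq


def huffman_code(pmf):
    """Build a prefix-free binary code from a model PMF.

    pmf: dict mapping symbols to probabilities. Returns a dict
    mapping each symbol to its binary codeword string.
    """
    # Heap entries are (probability, tiebreak, node); the unique
    # tiebreak index prevents Python from ever comparing nodes.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(pmf.items()))]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:  # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least probable subtrees.
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (left, right)))
        count += 1
    # Walk the tree, appending "0" for left branches, "1" for right.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


# A dyadic PMF, where Huffman coding is exactly optimal:
# expected length 1.75 bits matches the entropy of the source.
codes = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

Note that the code is optimal only with respect to the model PMF $p(x)$; if $p(x)$ is a poor approximation of the true $p_d(x)$, the expected length suffers, which is why finding a good model PMF is the central problem.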
[1] Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding; Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison; Oral at ICML 2021
[2] Lossy Compression for Lossless Prediction; Yann Dubois, Benjamin Bloem-Reddy, Karen Ullrich, Chris J. Maddison; Spotlight at NeurIPS 2021
Speaker Bio: Karen Ullrich is a research scientist at FAIR NY and is actively collaborating with researchers from the Vector Institute and the University of Amsterdam. Her main research focus lies at the intersection of information theory and probabilistic machine learning / deep learning.
She previously completed a PhD under the supervision of Prof. Max Welling. Prior to that, she worked at the Austrian Research Institute for AI in the Intelligent Music Processing and Machine Learning Group led by Prof. Gerhard Widmer. She studied Physics and Numerical Simulations in Leipzig and Amsterdam.
See all upcoming talks at https://www.anl.gov/mcs/lans-seminar