We consider a distributionally robust Partially Observable Markov Decision Process (DR-POMDP), where the distribution of the transition-observation probabilities is unknown at the beginning of each decision period, but their realizations can be inferred using side information at the end of each period, after an action is taken. We build an ambiguity set of the joint distribution using bounded moments expressed via conic constraints, and seek an optimal policy that maximizes the worst-case (minimum) reward over all distributions in the set. We propose a distributionally robust variant of the heuristic search value iteration method for obtaining lower and upper bounds on the value function, exploiting its convexity with respect to the belief state. We conduct numerical studies and demonstrate the computational performance of our approach on test instances of a dynamic epidemic control problem.