Risk-averse Allocation Indices for Multi-armed Bandit Problem
Milad MalekiPirbazari, Department of Industrial Engineering, Bilkent University
Abstract
In the classical multi-armed bandit problem, the aim is to find a policy that maximizes the expected total reward, which implicitly assumes that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this study, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multi-armed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although we show that an optimal policy does not always have an index-based form, empirical results indicate that the heuristic performs well and that risk-averse allocation indices yield optimal or near-optimal interpretable policies.
Short Bio
Milad MalekiPirbazari is currently a Ph.D. candidate in the Industrial Engineering department at Bilkent University. His research interests are two-fold: statistical machine learning, with a particular emphasis on feature selection, and risk-averse sequential decision making. He received his BS and MS degrees in Chemical Engineering from the University of Tehran and Tarbiat Modares University, Iran. He also received an MS degree in Industrial and Systems Engineering from Istanbul Sehir University. At Bilkent, he is currently employed as a teaching assistant.
Venue
Friday, November 6, 2020, 4.00 pm - Zoom Meeting