Gain/variability tradeoffs in undiscounted Markov Decision Processes
Filar, Jerzy A
MetadataShow full item record
We consider a finite state/action Markov Decision Process over the infinite time horizon, and with the limiting average reward criterion. However, we are interested not only in maximizing the above reward criterion but also in minimizing "the variability" of the stream of rewards. The latter notion is formalized in two alternative ways: one in terms of measuring absolute deviations from the "optimal" reward, and the other in terms of a "long-run variance" of a policy. In both cases we formulate a bi-objective optimization problem and show that efficient (i.e., "nondominated") deterministic stationary policies exist and can be computed by finite algorithms. In addition, in the former case we give an algorithm for computing a finite set of "critical efficient policies" which in a sense constitutes one complete set of reasonable responses by a decision-maker sensitive to the variability of rewards. However, the analysis of this case is intended primarily as a "sensitivity analysis" technique rather than a complete theoretical treatment of the gain/variability tradeoffs.