Mean-payoff objectives in Markov Decision Processes

Pranav Ashok (TU Munich)

When: Wednesday April 5, 2pm

Where: room C417 (at the very end of the corridor)


Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behavior. The mean-payoff objective provides a mathematically elegant formalism for expressing performance-related properties. The value-iteration (VI) approach is one of the simplest and most efficient algorithmic approaches for MDPs with other objectives, such as reachability. Unfortunately, the straightforward VI approach does not work for the mean-payoff objective in MDPs. In particular, there is no stopping criterion that guarantees the solution obtained through VI is epsilon-close to the optimal one. We provide some insight into the problem, which leads to the development of two practical algorithms based on VI. We also present experimental results showing that our approach significantly outperforms the standard approaches on several benchmarks.
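For context, the naive value-iteration scheme the abstract refers to, iterating the Bellman operator and dividing the accumulated value by the number of steps, can be sketched as follows on a toy MDP. The states, actions, rewards, and probabilities are purely illustrative and not taken from the talk:

```python
# Illustrative sketch of naive value iteration for the mean-payoff
# objective on a tiny hypothetical MDP (not from the talk).
# Encoding: state -> action -> (reward, {successor: probability})
mdp = {
    "s0": {"a": (4.0, {"s0": 0.5, "s1": 0.5}),
           "b": (0.0, {"s1": 1.0})},
    "s1": {"a": (2.0, {"s1": 1.0})},
}

def value_iteration(mdp, n_iters):
    """Apply the Bellman optimality operator n_iters times,
    starting from the zero value vector."""
    v = {s: 0.0 for s in mdp}
    for _ in range(n_iters):
        v = {s: max(r + sum(p * v[t] for t, p in succ.items())
                    for r, succ in mdp[s].values())
             for s in mdp}
    return v

n = 10_000
v = value_iteration(mdp, n)
# v[s] / n approximates the optimal mean payoff from state s.
print({s: v[s] / n for s in v})
```

Here `v[s] / n` converges to the optimal mean payoff (2.0 from both states in this toy example, since play eventually stays in `s1`), but the convergence rate depends on the MDP's structure, which is exactly why the usual "small change between iterations" stopping test yields no epsilon-guarantee.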