The report FIMU-RS-2011-02
Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
A full version of the paper presented at conference LICS 2011. April 2011, 32 pages.
Available as Postscript,
We study Markov decision processes (MDPs) with multiple
limit-average (or mean-payoff) functions. We consider two different
objectives, namely, expectation and satisfaction objectives. Given
an MDP with k reward functions, in the expectation objective the
goal is to maximize the expected limit-average value, and in the satisfaction
objective the goal is to maximize the probability of runs such that
the limit-average value stays above a given vector.
We show that
under the expectation objective, in contrast to the single-objective
case, both randomization and memory are necessary for strategies,
and that finite-memory randomized strategies are sufficient.
Under the satisfaction objective, in contrast to the
single-objective case, infinite memory is necessary
for strategies, and that randomized memoryless strategies
are sufficient for epsilon-approximation, for all
epsilon. We further prove that
the decision problems for both expectation and satisfaction
objectives can be solved in polynomial time and the trade-off curve
(Pareto curve) can be epsilon-approximated in time polynomial
in the size of the MDP and 1/epsilon, and exponential
in the number of reward functions, for all epsilon>0. Our
results also reveal flaws in
previous work for MDPs with multiple
mean-payoff functions under the expectation objective, correct the
flaws and obtain improved results.