DOI: 10.23952/cot.2026.37
Received May 3, 2025; Accepted August 12, 2025; Published online April 29, 2026
Abstract. Consider a decentralized partially observed Markov decision problem (POMDP) with multiple cooperative agents aiming to maximize a long-term average reward criterion. We observe that the availability, at a fixed rate, of entangled states of a product quantum system between the agents, where each agent has access to one of the component systems, can result in strictly improved performance even compared to the scenario where common randomness is provided to the agents, i.e., there is a quantum advantage in decentralized control. This observation comes from a simple reinterpretation of the conclusions of the well-known Mermin-Peres square, which underpins the Mermin-Peres game. While quantum advantage has been demonstrated earlier in one-shot team problems of this kind, it is notable that there are examples where a quantum advantage exists for the one-shot criterion but disappears in the dynamical scenario. The presence of a quantum advantage in dynamical scenarios is thus seen to be a novel finding relative to the current state of knowledge about the achievable performance in decentralized control problems.
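For readers unfamiliar with the Mermin-Peres square, the following minimal sketch checks its two defining properties numerically. It assumes the standard textbook arrangement of two-qubit Pauli observables, which is not taken from the article itself, and all names in it (`SQUARE`, `satisfied`, `best_classical`, and so on) are hypothetical. The first part verifies that the observables in each row and each column multiply to +I, except for the last column, which multiplies to -I. The second part enumerates every assignment of fixed ±1 values to the nine cells and shows that no such assignment meets all six product constraints.

```python
from itertools import product

# Single-qubit Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def matmul(a, b):
    """Plain matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scaled_identity(c, n=4):
    """The n-by-n matrix c * I."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]


# Assumed (standard) arrangement of the square; observables in the same
# row or the same column commute with one another.
SQUARE = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, Z), kron(Z, X), kron(Y, Y)],
]

# Required sign of each row product and each column product.
TARGET_ROWS = [1, 1, 1]
TARGET_COLS = [1, 1, -1]


def triple_product(ops):
    return matmul(matmul(ops[0], ops[1]), ops[2])


row_products = [triple_product(row) for row in SQUARE]
col_products = [
    triple_product([SQUARE[r][c] for r in range(3)]) for c in range(3)
]

# Quantum side: the operator identities hold exactly.
quantum_ok = all(
    row_products[r] == scaled_identity(TARGET_ROWS[r]) for r in range(3)
) and all(
    col_products[c] == scaled_identity(TARGET_COLS[c]) for c in range(3)
)


def satisfied(grid):
    """Count how many of the six constraints a fixed +/-1 grid meets."""
    count = 0
    for r in range(3):
        count += grid[3 * r] * grid[3 * r + 1] * grid[3 * r + 2] == TARGET_ROWS[r]
    for c in range(3):
        count += grid[c] * grid[c + 3] * grid[c + 6] == TARGET_COLS[c]
    return count


# Classical side: brute force over all 2**9 assignments of +/-1 values.
best_classical = max(satisfied(g) for g in product((1, -1), repeat=9))

print("operator identities hold:", quantum_ok)
print("most constraints any fixed assignment meets:", best_classical, "of 6")
```

The classical bound follows from a parity argument: multiplying the three row products and multiplying the three column products both give the product of all nine entries, so the two cannot have opposite signs. The sketch illustrates only the algebraic fact behind the Mermin-Peres game, not the decentralized control construction of the article.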
How to Cite this Article:
V. Anantharam, Quantum advantage in decentralized control of POMDPs: A control-theoretic view of the Mermin-Peres square, Commun. Optim. Theory 2026 (2026) 37.