{"@context":"http://iiif.io/api/presentation/2/context.json","@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/manifest.json","@type":"sc:Manifest","label":"Complexity Estimates and Reductions to Discounting for Total and Average-Reward Markov Decision Processes and Stochastic Games","metadata":[{"label":"dc.description.sponsorship","value":"This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree."},{"label":"dc.format","value":"Monograph"},{"label":"dc.format.medium","value":"Electronic Resource"},{"label":"dc.identifier.uri","value":"http://hdl.handle.net/11401/77166"},{"label":"dc.language.iso","value":"en_US"},{"label":"dc.publisher","value":"The Graduate School, Stony Brook University: Stony Brook, NY."},{"label":"dcterms.abstract","value":"Recently there has been a resurgence of interest in the complexity of algorithms for Markov decision processes (MDPs) and stochastic games. Much of this work was inspired by recent groundbreaking results on the complexity of policy iteration algorithms for MDPs under the Blum-Shub-Smale (BSS) model of computation. In particular, for discounted MDPs with a fixed discount factor, Yinyu Ye showed that the number of arithmetic operations needed by two classic variants of policy iteration can be bounded above by a polynomial in the number of state-action pairs only. A natural question is whether a similar complexity estimate exists for the value iteration algorithm, which is another classic approach to computing optimal policies for MDPs. Our first main contribution is a negative answer to this question. Using a deterministic MDP with four state-action pairs, we show that under the BSS model there is no upper bound on the number of iterations needed by value iteration to return an optimal policy. We also show that the same example implies the same result for a broad class of so-called optimistic policy iteration algorithms, which includes algorithms of interest to the reinforcement learning community such as \u00ce\u00bb-policy iteration and modified policy iteration. Another natural question is whether Ye's approach can yield results for MDPs under other optimality criteria. Our second main contribution is a formulation of conditions, which to our knowledge are the most general ones known, under which MDPs and two-player zero-sum stochastic games with Borel state and action spaces can be reduced to discounted ones. For undiscounted total-reward MDPs and stochastic games, the transformations we formulate are based on an idea due to Alan Hoffman. For average-rewards, the transformations are extensions to Borel state and action spaces of one proposed recently for finite stochastic games. 
In addition to implying the existence of \u03b5-optimal policies for total and average rewards, these reductions lead to estimates of the number of arithmetic operations needed to compute optimal policies for such models with finite state and action sets, as well as complexity estimates for computing \u03b5-optimal policies for MDPs with Euclidean state and action spaces."},{"label":"dcterms.available","value":"2017-09-20T16:52:08Z"},{"label":"dcterms.contributor","value":"Hu, Jiaqiao"},{"label":"dcterms.creator","value":"Huang, Jefferson"},{"label":"dcterms.dateAccepted","value":"2017-09-20T16:52:08Z"},{"label":"dcterms.dateSubmitted","value":"2017-09-20T16:52:08Z"},{"label":"dcterms.description","value":"Department of Applied Mathematics and Statistics"},{"label":"dcterms.extent","value":"125 pg."},{"label":"dcterms.format","value":"Application/PDF"},{"label":"dcterms.identifier","value":"http://hdl.handle.net/11401/77166"},{"label":"dcterms.issued","value":"2016-12-01"},{"label":"dcterms.language","value":"en_US"},{"label":"dcterms.provenance","value":"Made available in DSpace on 2017-09-20T16:52:08Z (GMT). No. of bitstreams: 1\nHuang_grad.sunysb_0771E_13001.pdf: 790262 bytes, checksum: a03a3259262f99b742f09c2bf2344f16 (MD5)\n Previous issue date: 1"},{"label":"dcterms.publisher","value":"The Graduate School, Stony Brook University: Stony Brook, NY."},{"label":"dcterms.subject","value":"Operations research -- Mathematics"},{"label":"dcterms.title","value":"Complexity Estimates and Reductions to Discounting for Total and Average-Reward Markov Decision Processes and Stochastic Games"},{"label":"dcterms.type","value":"Dissertation"},{"label":"dc.type","value":"Dissertation"}],"description":"This manifest was generated dynamically","viewingDirection":"left-to-right","sequences":[{"@type":"sc:Sequence","canvases":[{"@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/canvas/page-1.json","@type":"sc:Canvas","label":"Page 1","height":1650,"width":1275,"images":[{"@type":"oa:Annotation","motivation":"sc:painting","resource":{"@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/23%2F96%2F16%2F23961617120465106098784695519462982261/full/full/0/default.jpg","@type":"dctypes:Image","format":"image/jpeg","height":1650,"width":1275,"service":{"@context":"http://iiif.io/api/image/2/context.json","@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/23%2F96%2F16%2F23961617120465106098784695519462982261","profile":"http://iiif.io/api/image/2/level2.json"}},"on":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/canvas/page-1.json"}]}]}]}