Monte Carlo Planning and Reinforcement Learning for Large Scale Sequential Decision Problems.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title/Author:
Monte Carlo Planning and Reinforcement Learning for Large Scale Sequential Decision Problems.
Author:
Mern, John Michael.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Description:
138 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-09, Section: B.
Contained By:
Dissertations Abstracts International, 83-09B.
Subject:
Aircraft.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29003824
ISBN:
9798209786979
Mern, John Michael.
Monte Carlo Planning and Reinforcement Learning for Large Scale Sequential Decision Problems.
- Ann Arbor : ProQuest Dissertations & Theses, 2021 - 138 p.
Source: Dissertations Abstracts International, Volume: 83-09, Section: B.
Thesis (Ph.D.)--Stanford University, 2021.
This item must not be sold to any third party vendors.
Autonomous agents have the potential to do tasks that would otherwise be too repetitive, difficult, or dangerous for humans. Solving many of these problems requires reasoning over sequences of decisions in order to reach a goal. Autonomous driving, inventory management, and medical diagnosis and treatment are all examples of important real-world sequential decision problems. Approximate solution methods such as reinforcement learning and Monte Carlo planning have achieved superhuman performance in some domains. In these methods, agents learn good actions to take in response to inputs. Problems with many widely varying inputs or possible actions remain challenging to solve efficiently without extensive problem-specific engineering.

One of the key challenges in solving sequential decision problems is efficiently exploring the many different paths an agent may take. For most problems, it is infeasible to test every possible path. Many existing approaches explore paths using simple random sampling. Problems in which many different actions may be taken at each step often require more efficient exploration to be solved. Large, unstructured input spaces can also challenge conventional learning approaches. Agents must learn to recognize inputs that are functionally similar while simultaneously learning an effective decision strategy. As a result of these challenges, learning agents are often limited to solving tasks in virtual domains where very large numbers of trials can be conducted relatively safely and cheaply. When problems are solved using black-box models such as neural networks, the resulting decision-making policy is impossible for a human to meaningfully interpret. This can also limit the use of learning agents to low-regret tasks such as image classification or video game playing.

The work in this thesis addresses the challenges of learning in large-space sequential decision problems. The thesis first considers methods to improve scaling of deep reinforcement learning and Monte Carlo tree search methods. We present neural network architectures for the common case of exchangeable object inputs in deep reinforcement learning. The presented architecture accelerates learning by efficiently sharing learned representations among objects of the same type. The thesis then addresses methods to efficiently explore large action spaces in Monte Carlo tree search. We present two algorithms, PA-POMCPOW and BOMCP, that improve search by guiding exploration to actions with good expected performance or information gain. We then propose methods to improve the use of offline learned policies within online Monte Carlo planning through importance sampling and experience generalization. Finally, we study methods to interpret learned policies and expected search performance. Here, we present a method to represent high-dimensional policies with interpretable local surrogate trees. We also propose bounds on the error rates for Monte Carlo estimation that can be numerically calculated using empirical quantities.
ISBN: 9798209786979
Subjects--Topical Terms: Aircraft.
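The abstract describes architectures that share learned representations across exchangeable object inputs. As a rough illustration of the general idea only, here is a minimal Deep-Sets-style permutation-invariant encoder; it is not necessarily the architecture presented in the thesis, and every name, shape, and weight below is an assumption for the sketch.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def exchangeable_encoder(objects, w_phi, w_rho):
    """Deep-Sets-style encoder (illustrative, not the thesis's exact
    architecture): one shared network phi is applied to every object,
    the per-object codes are pooled with a symmetric sum, and rho maps
    the pooled code to the output. Because the only cross-object
    operation is the sum, the output is invariant to object ordering."""
    h = relu(objects @ w_phi)   # same weights reused for every object
    pooled = h.sum(axis=0)      # permutation-invariant pooling
    return relu(pooled @ w_rho)

# Toy usage with random weights (all shapes are assumptions):
rng = np.random.default_rng(0)
objs = rng.normal(size=(5, 4))   # five objects, four features each
w_phi = rng.normal(size=(4, 16))
w_rho = rng.normal(size=(16, 8))
code = exchangeable_encoder(objs, w_phi, w_rho)
assert np.allclose(code, exchangeable_encoder(objs[::-1], w_phi, w_rho))
```

The final assertion checks the property the abstract relies on: reordering the objects leaves the encoding unchanged, so the same learned weights serve every object of a given type.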
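The two search algorithms named in the abstract, PA-POMCPOW and BOMCP, build on Monte Carlo tree search. For context, the following is a minimal generic UCT sketch, not the thesis's algorithms: it balances exploration and exploitation with a UCB1 bonus and estimates action values from sampled rollouts. The `actions`/`step` problem interface and all constants are assumptions for illustration.

```python
import math
import random

def uct_search(root, actions, step, n_sims=1000, depth=20, c=1.4, gamma=0.95):
    """Generic UCT (illustrative baseline, not PA-POMCPOW or BOMCP).

    `actions(s)` returns the non-empty list of actions available in a
    non-terminal state s; `step(s, a)` returns (next_state, reward, done).
    States are assumed hashable. Each simulation walks the tree with a
    UCB1 rule, expands one new node, and backs up the sampled return.
    """
    tree = {}  # state -> {"N": visits, "Na": {a: count}, "Qa": {a: mean}}

    def rollout(s, d):
        # Default policy: uniformly random actions until the horizon.
        total, discount = 0.0, 1.0
        for _ in range(d):
            s, r, done = step(s, random.choice(actions(s)))
            total += discount * r
            discount *= gamma
            if done:
                break
        return total

    def simulate(s, d):
        if d == 0:
            return 0.0
        if s not in tree:  # expand: add a leaf, estimate value by rollout
            tree[s] = {"N": 0,
                       "Na": {a: 0 for a in actions(s)},
                       "Qa": {a: 0.0 for a in actions(s)}}
            return rollout(s, d)
        node = tree[s]

        def ucb(a):  # UCB1: mean value plus exploration bonus
            if node["Na"][a] == 0:
                return float("inf")  # try untried actions first
            return node["Qa"][a] + c * math.sqrt(
                math.log(node["N"]) / node["Na"][a])

        a = max(node["Na"], key=ucb)
        s2, r, done = step(s, a)
        q = r if done else r + gamma * simulate(s2, d - 1)
        node["N"] += 1
        node["Na"][a] += 1
        node["Qa"][a] += (q - node["Qa"][a]) / node["Na"][a]  # running mean
        return q

    for _ in range(n_sims):
        simulate(root, depth)
    return max(tree[root]["Qa"], key=tree[root]["Qa"].get)
```

The abstract's contribution is precisely in replacing the uniform action sampling above with exploration guided by expected performance or information gain, which matters when the action set at each step is large.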
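The abstract closes with bounds on Monte Carlo estimation error that can be computed from empirical quantities. The thesis's own bounds are not reproduced in this record; as a standard point of reference in the same spirit, the empirical Bernstein inequality of Maurer and Pontil bounds the error of a sample mean using the observed sample variance. For i.i.d. samples \(X_1,\dots,X_n \in [0,1]\) with sample mean \(\bar{X}_n\) and sample variance \(\hat{V}_n\), with probability at least \(1-\delta\),

\[
\mathbb{E}[X] \;\le\; \bar{X}_n \;+\; \sqrt{\frac{2\,\hat{V}_n \ln(2/\delta)}{n}} \;+\; \frac{7\ln(2/\delta)}{3(n-1)} .
\]

Every term on the right-hand side is computable from the observed samples alone, which is what makes bounds of this form usable at planning time.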
LDR    04181nmm a2200349 4500
001    2349912
005    20221010063655.5
008    241004s2021 ||||||||||||||||| ||eng d
020    $a 9798209786979
035    $a (MiAaPQ)AAI29003824
035    $a (MiAaPQ)STANFORDrh431py7651
035    $a AAI29003824
040    $a MiAaPQ $c MiAaPQ
100 1  $a Mern, John Michael. $3 3689337
245 10 $a Monte Carlo Planning and Reinforcement Learning for Large Scale Sequential Decision Problems.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300    $a 138 p.
500    $a Source: Dissertations Abstracts International, Volume: 83-09, Section: B.
500    $a Advisor: Kochenderfer, Mykel J.; Mukerji, Tapan; Schwager, Mac.
502    $a Thesis (Ph.D.)--Stanford University, 2021.
506    $a This item must not be sold to any third party vendors.
520    $a Autonomous agents have the potential to do tasks that would otherwise be too repetitive, difficult, or dangerous for humans. Solving many of these problems requires reasoning over sequences of decisions in order to reach a goal. Autonomous driving, inventory management, and medical diagnosis and treatment are all examples of important real-world sequential decision problems. Approximate solution methods such as reinforcement learning and Monte Carlo planning have achieved superhuman performance in some domains. In these methods, agents learn good actions to take in response to inputs. Problems with many widely varying inputs or possible actions remain challenging to solve efficiently without extensive problem-specific engineering. One of the key challenges in solving sequential decision problems is efficiently exploring the many different paths an agent may take. For most problems, it is infeasible to test every possible path. Many existing approaches explore paths using simple random sampling. Problems in which many different actions may be taken at each step often require more efficient exploration to be solved. Large, unstructured input spaces can also challenge conventional learning approaches. Agents must learn to recognize inputs that are functionally similar while simultaneously learning an effective decision strategy. As a result of these challenges, learning agents are often limited to solving tasks in virtual domains where very large numbers of trials can be conducted relatively safely and cheaply. When problems are solved using black-box models such as neural networks, the resulting decision-making policy is impossible for a human to meaningfully interpret. This can also limit the use of learning agents to low-regret tasks such as image classification or video game playing. The work in this thesis addresses the challenges of learning in large-space sequential decision problems. The thesis first considers methods to improve scaling of deep reinforcement learning and Monte Carlo tree search methods. We present neural network architectures for the common case of exchangeable object inputs in deep reinforcement learning. The presented architecture accelerates learning by efficiently sharing learned representations among objects of the same type. The thesis then addresses methods to efficiently explore large action spaces in Monte Carlo tree search. We present two algorithms, PA-POMCPOW and BOMCP, that improve search by guiding exploration to actions with good expected performance or information gain. We then propose methods to improve the use of offline learned policies within online Monte Carlo planning through importance sampling and experience generalization. Finally, we study methods to interpret learned policies and expected search performance. Here, we present a method to represent high-dimensional policies with interpretable local surrogate trees. We also propose bounds on the error rates for Monte Carlo estimation that can be numerically calculated using empirical quantities.
590    $a School code: 0212.
650  4 $a Aircraft. $3 832698
650  4 $a Deep learning. $3 3554982
650  4 $a Sample variance. $3 3680758
650  4 $a Computer security. $3 540555
650  4 $a Planning. $3 552734
650  4 $a Optimization. $3 891104
650  4 $a Decision making. $3 517204
650  4 $a Sensors. $3 3549539
650  4 $a Neural networks. $3 677449
650  4 $a Algorithms. $3 536374
650  4 $a Visualization. $3 586179
650  4 $a Distance learning. $3 3557921
650  4 $a Markov analysis. $3 3562906
650  4 $a Aerospace engineering. $3 1002622
650  4 $a Artificial intelligence. $3 516317
650  4 $a Computer science. $3 523869
650  4 $a Engineering. $3 586835
650  4 $a Operations research. $3 547123
690    $a 0538
690    $a 0800
690    $a 0984
690    $a 0537
690    $a 0796
710 2  $a Stanford University. $3 754827
773 0  $t Dissertations Abstracts International $g 83-09B.
790    $a 0212
791    $a Ph.D.
792    $a 2021
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29003824
Holdings
Barcode: W9472350
Location: Electronic resources
Circulation category: 11. Online Reading_V
Material type: E-book
Call number: EB
Use type: General use (Normal)
Loan status: On shelf
Holds: 0