Posterior Sampling for Efficient Reinforcement Learning.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title / Author:
Posterior Sampling for Efficient Reinforcement Learning.
Author:
Dwaracherla, Vikranth Reddy.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Pagination:
118 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-09, Section: B.
Contained By:
Dissertations Abstracts International 83-09B.
Subject:
Algorithms.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29003855
ISBN:
9798209788881
Thesis (Ph.D.)--Stanford University, 2021.
This item must not be sold to any third party vendors.
Reinforcement learning has shown tremendous success over the past few years. Much of this recent success can be attributed to agents learning from an inordinate amount of data in simulated environments. To achieve similar success in real environments, it is crucial to address data efficiency. Uncertainty quantification plays a prominent role in designing an intelligent agent that exhibits data efficiency. An agent that has a notion of uncertainty can trade off exploration against exploitation and explore in an intelligent manner. Such an agent should consider not only the immediate information gain from an action but also its consequences for future learning prospects. An agent with this capability is said to exhibit deep exploration.
Algorithms that tackle deep exploration have so far relied on representing epistemic uncertainty through ensembles or other hypermodels, exploration bonuses, or visitation count distributions. An open question is whether deep exploration can be achieved by an incremental reinforcement learning algorithm that tracks a single point estimate, without the additional complexity required to account for epistemic uncertainty. We answer this question in the affirmative. In this dissertation, we develop Langevin DQN, a variation of DQN that differs only in perturbing parameter updates with Gaussian noise, and demonstrate through a computational study that Langevin DQN achieves deep exploration. This is the first algorithm that demonstrably achieves deep exploration using a single point estimate.
We also present index sampling, a novel method for efficiently generating approximate samples from a posterior over complex models such as neural networks, induced by a prior distribution over the model family and a set of input-output data pairs. In addition, we develop posterior sampling networks, a new approach to modeling this distribution over models. We are particularly motivated by the application of our method to reinforcement learning problems, but it could be of independent interest to the Bayesian deep learning community. Our method is especially useful in RL with complex exploration schemes that use more than a single sample from the posterior, such as information-directed sampling.
Finally, we present preliminary results demonstrating that the Langevin DQN update rule could be used to train posterior sampling networks, as an alternative to index sampling, and further improve data efficiency.
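The abstract describes Langevin DQN only as DQN with Gaussian noise added to the parameter updates. The sketch below is not the dissertation's algorithm: it illustrates the general idea of a noise-perturbed gradient step on a small linear regression problem. The function names, the `sqrt(2 * learning_rate)` noise scaling (borrowed from the usual stochastic gradient Langevin form), and all hyperparameter values are assumptions made for illustration.

```python
import numpy as np


def langevin_step(params, grad, learning_rate, noise_scale, rng):
    """One gradient step whose update is perturbed with Gaussian noise.

    Hypothetical helper: the noise scaling here is an assumption, not
    taken from the dissertation.
    """
    noise = rng.standard_normal(params.shape)
    return (params
            - learning_rate * grad
            + np.sqrt(2.0 * learning_rate) * noise_scale * noise)


def squared_error_grad(weights, features, targets):
    """Gradient of 0.5 * mean((features @ weights - targets) ** 2)."""
    residual = features @ weights - targets
    return features.T @ residual / len(targets)


# Toy data standing in for a learning target (illustrative only).
rng = np.random.default_rng(0)
features = rng.standard_normal((64, 3))
true_weights = np.array([1.0, -2.0, 0.5])
targets = features @ true_weights

# A single point estimate is tracked; the injected noise keeps it
# jittering around the minimizer instead of settling exactly on it.
weights = np.zeros(3)
for _ in range(2000):
    grad = squared_error_grad(weights, features, targets)
    weights = langevin_step(weights, grad, learning_rate=0.05,
                            noise_scale=0.01, rng=rng)
```

With `noise_scale=0.0` the step reduces to plain gradient descent, which makes the noise term the only difference between the two update rules in this toy setting.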
ISBN: 9798209788881
Subjects--Topical Terms:
536374
Algorithms.
LDR :03531nmm a2200313 4500
001 2345734
005 20220613063807.5
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798209788881
035 $a (MiAaPQ)AAI29003855
035 $a (MiAaPQ)STANFORDjb495qn9584
035 $a AAI29003855
040 $a MiAaPQ $c MiAaPQ
100 1 $a Dwaracherla, Vikranth Reddy. $3 3684728
245 1 0 $a Posterior Sampling for Efficient Reinforcement Learning.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 118 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-09, Section: B.
500 $a Advisor: van Roy, Benjamin; Brunskill, Emma; Pilanci, Mert.
502 $a Thesis (Ph.D.)--Stanford University, 2021.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0212.
650 4 $a Algorithms. $3 536374
650 4 $a Neural networks. $3 677449
650 4 $a Artificial intelligence. $3 516317
650 4 $a Computer science. $3 523869
690 $a 0800
690 $a 0984
710 2 $a Stanford University. $3 754827
773 0 $t Dissertations Abstracts International $g 83-09B.
790 $a 0212
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29003855
Holdings (1 item)
Barcode: W9468172
Location: Electronic resources
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0