Exploration and Safety in Deep Reinforcement Learning.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title/Author:
Exploration and Safety in Deep Reinforcement Learning.
Author:
Achiam, Joshua S.
Physical description:
1 online resource (176 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Contained By:
Dissertations Abstracts International, 83-02B.
Subject:
Artificial intelligence.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28497170 (click for full text, PQDT)
ISBN:
9798535552248
LDR 03425nmm a2200373K 4500
001 2357206
005 20230622065016.5
006 m o d
007 cr mn ---uuuuu
008 241011s2021 xx obm 000 0 eng d
020 $a 9798535552248
035 $a (MiAaPQ)AAI28497170
035 $a AAI28497170
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Achiam, Joshua S. $3 3697736
245 1 0 $a Exploration and Safety in Deep Reinforcement Learning.
264 0 $c 2021
300 $a 1 online resource (176 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
500 $a Advisor: Sastry, Shankar; Abbeel, Pieter.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2021.
504 $a Includes bibliographical references
520 $a Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies by trial and error. However, exploration is challenging when reward signals are sparse, or when safety is a critical concern and certain errors are unacceptable. In this thesis, we address these challenges in the deep reinforcement learning setting by modifying the underlying optimization problem that agents solve, incentivizing them to explore in safer or more-efficient ways. In the first part of this thesis, we develop methods for intrinsic motivation to make progress on problems where rewards are sparse or absent. Our first approach uses an intrinsic reward to incentivize agents to visit states considered surprising under a learned dynamics model, and we show that this technique performs favorably compared to naive exploration. Our second approach uses an objective based on variational inference to endow agents with multiple skills that are distinct from each other, without the use of task-specific rewards. We show that this approach, which we call variational option discovery, can be used to learn locomotion behaviors in simulated robot environments. In the second part of this thesis, we focus on problems in safe exploration. Building on a wide range of prior work on safe reinforcement learning, we propose to standardize constrained RL as the main formalism for safe exploration; we then proceed to develop algorithms and benchmarks for constrained RL. Our presentation of material tells a story in chronological order: we begin by presenting Constrained Policy Optimization (CPO), the first algorithm for constrained deep RL with guarantees of near-constraint satisfaction at each iteration. Next, we develop the Safety Gym benchmark, which allows us to find the limits of CPO and inspires us to press in a different direction. Finally, we develop PID Lagrangian methods, where we find that a small modification to the Lagrangian primal-dual gradient baseline approach results in significantly improved stability and robustness in solving constrained RL tasks in Safety Gym.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538 $a Mode of access: World Wide Web
650 4 $a Artificial intelligence. $3 516317
650 4 $a Robotics. $3 519753
650 4 $a Research. $3 531893
650 4 $a Deep learning. $3 3554982
650 4 $a Lagrange multiplier. $3 3691773
650 4 $a Curricula. $3 3422445
650 4 $a Experiments. $3 525909
650 4 $a Optimization. $3 891104
650 4 $a Neural networks. $3 677449
650 4 $a Approximation. $3 3560410
650 4 $a Algorithms. $3 536374
650 4 $a Skills. $3 3221615
653 $a Deep learning
653 $a AI
653 $a Artificial intelligence
653 $a Reinforcement learning
655 7 $a Electronic books. $2 lcsh $3 542853
690 $a 0800
690 $a 0771
710 2 $a ProQuest Information and Learning Co. $3 783688
710 2 $a University of California, Berkeley. $b Electrical Engineering & Computer Sciences. $3 1671057
773 0 $t Dissertations Abstracts International $g 83-02B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28497170 $z click for full text (PQDT)
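The abstract in field 520 above describes the PID Lagrangian idea only in prose: the standard Lagrangian primal-dual baseline adjusts a dual variable by integrating constraint violation, and the thesis's modification adds proportional and derivative terms. The following is a minimal illustrative sketch of such a dual-variable update, not taken from the dissertation itself; the class name, gain values, cost limit, and toy cost measurements are all assumptions chosen for the example.

```python
# Illustrative sketch (not the dissertation's code): PID control of the Lagrange
# multiplier that weights a cost constraint in constrained RL.
# Baseline ("integral-only") update:  lambda <- max(0, lambda + lr * (Jc - d))
# The PID variant adds proportional and derivative terms on the violation.

class PIDLagrangian:
    def __init__(self, cost_limit, kp=0.1, ki=0.01, kd=0.05):
        self.cost_limit = cost_limit            # d: allowed expected cost
        self.kp, self.ki, self.kd = kp, ki, kd  # hypothetical gains
        self.integral = 0.0                     # running integral of violation
        self.prev_cost = 0.0                    # previous cost, for the derivative term

    def update(self, episode_cost):
        """Return a multiplier lambda >= 0 given the latest measured cost Jc."""
        violation = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + violation)   # anti-windup at zero
        derivative = max(0.0, episode_cost - self.prev_cost)  # react only to rising cost
        self.prev_cost = episode_cost
        return max(0.0, self.kp * violation + self.ki * self.integral + self.kd * derivative)


# Toy usage: lambda would scale the cost term in the policy objective,
# e.g. maximize E[reward] - lam * E[cost]; the cost values below are made up.
pid = PIDLagrangian(cost_limit=25.0)
for episode_cost in [40.0, 35.0, 28.0, 24.0, 22.0]:
    lam = pid.update(episode_cost)
    print(f"cost={episode_cost:5.1f}  lambda={lam:.3f}")
```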
Holdings (1 item)
Barcode: W9479562
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0