Exploration and Safety in Deep Reinforcement Learning.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title/Author:
Exploration and Safety in Deep Reinforcement Learning.
Author:
Achiam, Joshua S.
Physical description:
1 online resource (176 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Contained By:
Dissertations Abstracts International, 83-02B.
Subject:
Artificial intelligence.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28497170 (click for full text, PQDT)
ISBN:
9798535552248
LDR 03425nmm a2200373K 4500
001 2357206
005 20230622065016.5
006 m o d
007 cr mn ---uuuuu
008 241011s2021 xx obm 000 0 eng d
020 $a 9798535552248
035 $a (MiAaPQ)AAI28497170
035 $a AAI28497170
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Achiam, Joshua S. $3 3697736
245 1 0 $a Exploration and Safety in Deep Reinforcement Learning.
264 0 $c 2021
300 $a 1 online resource (176 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
500 $a Advisor: Sastry, Shankar; Abbeel, Pieter.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2021.
504 $a Includes bibliographical references
520 $a Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies by trial and error. However, exploration is challenging when reward signals are sparse, or when safety is a critical concern and certain errors are unacceptable. In this thesis, we address these challenges in the deep reinforcement learning setting by modifying the underlying optimization problem that agents solve, incentivizing them to explore in safer or more-efficient ways. In the first part of this thesis, we develop methods for intrinsic motivation to make progress on problems where rewards are sparse or absent. Our first approach uses an intrinsic reward to incentivize agents to visit states considered surprising under a learned dynamics model, and we show that this technique performs favorably compared to naive exploration. Our second approach uses an objective based on variational inference to endow agents with multiple skills that are distinct from each other, without the use of task-specific rewards. We show that this approach, which we call variational option discovery, can be used to learn locomotion behaviors in simulated robot environments. In the second part of this thesis, we focus on problems in safe exploration. Building on a wide range of prior work on safe reinforcement learning, we propose to standardize constrained RL as the main formalism for safe exploration; we then proceed to develop algorithms and benchmarks for constrained RL. Our presentation of material tells a story in chronological order: we begin by presenting Constrained Policy Optimization (CPO), the first algorithm for constrained deep RL with guarantees of near-constraint satisfaction at each iteration. Next, we develop the Safety Gym benchmark, which allows us to find the limits of CPO and inspires us to press in a different direction. Finally, we develop PID Lagrangian methods, where we find that a small modification to the Lagrangian primal-dual gradient baseline approach results in significantly improved stability and robustness in solving constrained RL tasks in Safety Gym.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538 $a Mode of access: World Wide Web
650 4 $a Artificial intelligence. $3 516317
650 4 $a Robotics. $3 519753
650 4 $a Research. $3 531893
650 4 $a Deep learning. $3 3554982
650 4 $a Lagrange multiplier. $3 3691773
650 4 $a Curricula. $3 3422445
650 4 $a Experiments. $3 525909
650 4 $a Optimization. $3 891104
650 4 $a Neural networks. $3 677449
650 4 $a Approximation. $3 3560410
650 4 $a Algorithms. $3 536374
650 4 $a Skills. $3 3221615
653 $a Deep learning
653 $a AI
653 $a Artificial intelligence
653 $a Reinforcement learning
655 7 $a Electronic books. $2 lcsh $3 542853
690 $a 0800
690 $a 0771
710 2 $a ProQuest Information and Learning Co. $3 783688
710 2 $a University of California, Berkeley. $b Electrical Engineering & Computer Sciences. $3 1671057
773 0 $t Dissertations Abstracts International $g 83-02B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28497170 $z click for full text (PQDT)
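The abstract in field 520 above describes the PID Lagrangian idea only in prose: the standard Lagrangian primal-dual baseline adjusts a dual variable by integrating constraint violation, and the thesis's modification adds proportional and derivative terms. The following is a minimal illustrative sketch of such a dual-variable update, not taken from the dissertation itself; the class name, gain values, cost limit, and toy cost measurements are all assumptions chosen for the example.

```python
# Illustrative sketch (not the dissertation's code): PID control of the Lagrange
# multiplier that weights a cost constraint in constrained RL.
# Baseline ("integral-only") update:  lambda <- max(0, lambda + lr * (Jc - d))
# The PID variant adds proportional and derivative terms on the violation.

class PIDLagrangian:
    def __init__(self, cost_limit, kp=0.1, ki=0.01, kd=0.05):
        self.cost_limit = cost_limit            # d: allowed expected cost
        self.kp, self.ki, self.kd = kp, ki, kd  # hypothetical gains
        self.integral = 0.0                     # running integral of violation
        self.prev_cost = 0.0                    # previous cost, for the derivative term

    def update(self, episode_cost):
        """Return a multiplier lambda >= 0 given the latest measured cost Jc."""
        violation = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + violation)   # anti-windup at zero
        derivative = max(0.0, episode_cost - self.prev_cost)  # react only to rising cost
        self.prev_cost = episode_cost
        return max(0.0, self.kp * violation + self.ki * self.integral + self.kd * derivative)


# Toy usage: lambda would scale the cost term in the policy objective,
# e.g. maximize E[reward] - lam * E[cost]; the cost values below are made up.
pid = PIDLagrangian(cost_limit=25.0)
for episode_cost in [40.0, 35.0, 28.0, 24.0, 22.0]:
    lam = pid.update(episode_cost)
    print(f"cost={episode_cost:5.1f}  lambda={lam:.3f}")
```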
Holdings (1 item)
Barcode: W9479562
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0