Reinforcement learning without rewards.
Syed, Umar Ali.
Record type: Bibliographic - Electronic resource : Monograph/item
Title / Author: Reinforcement learning without rewards. / Syed, Umar Ali.
Author: Syed, Umar Ali.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2010
Description: 232 p.
Notes: Source: Dissertations Abstracts International, Volume: 72-04, Section: B.
Contained by: Dissertations Abstracts International, 72-04B.
Subjects: Applied Mathematics; Artificial intelligence; Computer science.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3428541
ISBN: 9781124280455
Dissertation note: Thesis (Ph.D.)--Princeton University, 2010.
Abstract: Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to "interactive" problems, such as learning to drive a car, operate a robotic arm, or play a game. In reinforcement learning, an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment, using only the sensory feedback that it receives from the environment. As the agent moves from one state of the environment to another, it receives only a reward signal; there is no human "in the loop" to tell the algorithm exactly what to do. The goal in reinforcement learning is to learn an optimal behavior that maximizes the total reward that the agent collects. Despite its generality, the reinforcement learning framework does make one strong assumption: that the reward signal can always be directly and unambiguously observed. In other words, the feedback a reinforcement learning algorithm receives is assumed to be a part of the environment in which the agent is operating, and is included in the agent's experience of that environment. However, in practice, rewards are usually manually specified by the practitioner applying the learning algorithm, and specifying a reward function that elicits the desired behavior from the agent can be a subtle and frustrating design problem. Our main focus in this thesis is the design and analysis of reinforcement learning algorithms that do not require complete knowledge of the rewards.

The contributions of this thesis can be divided into three main parts: (1) In Chapters 2 and 3, we review the theory of two-player zero-sum games, and present a novel analysis of existing no-regret algorithms for solving these games. Our results show that no-regret algorithms can be used to compute strategies in games that satisfy a much stronger definition of optimality than is commonly used. (2) In Chapters 4 and 5, we present new algorithms for apprenticeship learning, a generalization of reinforcement learning where the true rewards are unknown. The algorithms described in Chapter 5 leverage the game-theoretic results from Chapters 2 and 3. (3) In Chapter 6, we show how partial knowledge of the rewards can be used to accelerate imitation learning, an alternative to reinforcement learning where the goal is to imitate another agent in the environment. In summary, we design and analyze several new algorithms for reinforcement learning that do not require access to a fully observable or fully accurate reward signal, and by doing so, add considerable flexibility to the traditional reinforcement learning framework.
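Contribution (1) above, which contributions (2) and (3) build on, rests on a classical fact: two no-regret learners repeatedly playing a zero-sum game converge, in their time-averaged strategies, to an approximate minimax equilibrium. The sketch below is not taken from the thesis; it is a minimal illustration of that idea using the standard multiplicative-weights update. The payoff matrix (rock-paper-scissors), the learning rate, the round count, and the function name are all illustrative assumptions.

import numpy as np

# Minimal sketch: two multiplicative-weights (no-regret) learners playing a
# zero-sum matrix game. The time-averaged strategies approximate a minimax
# equilibrium. All concrete numbers here are assumed for illustration only.

def solve_zero_sum_game(A, rounds=5000, eta=0.1):
    """A[i, j] = payoff to the row player; row maximizes, column minimizes."""
    n_rows, n_cols = A.shape
    row_weights = np.ones(n_rows)
    col_weights = np.ones(n_cols)
    row_avg = np.zeros(n_rows)
    col_avg = np.zeros(n_cols)
    for _ in range(rounds):
        p = row_weights / row_weights.sum()   # row player's mixed strategy
        q = col_weights / col_weights.sum()   # column player's mixed strategy
        row_avg += p
        col_avg += q
        # Multiplicative-weights updates against the opponent's current play:
        # the row player gains A @ q, the column player loses p @ A.
        row_weights *= np.exp(eta * (A @ q))
        col_weights *= np.exp(-eta * (p @ A))
    return row_avg / rounds, col_avg / rounds

if __name__ == "__main__":
    # Rock-paper-scissors payoffs for the row player (an assumed example game).
    A = np.array([[0, -1, 1],
                  [1, 0, -1],
                  [-1, 1, 0]], dtype=float)
    p, q = solve_zero_sum_game(A)
    print("approx. minimax strategies:", np.round(p, 3), np.round(q, 3))
    print("estimated game value:", round(float(p @ A @ q), 3))

Running this prints strategies close to (1/3, 1/3, 1/3) for both players, the minimax solution of rock-paper-scissors, with a game value near zero. The thesis's Chapters 2 and 3 analyze what stronger optimality guarantees such no-regret dynamics can actually deliver, and Chapters 4 and 5 apply that machinery to apprenticeship learning.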
MARC record:
LDR    03824nmm a2200313 4500
001    2206874
005    20190906083306.5
008    201008s2010 ||||||||||||||||| ||eng d
020    $a 9781124280455
035    $a (MiAaPQ)AAI3428541
035    $a AAI3428541
040    $a MiAaPQ $c MiAaPQ
100 1  $a Syed, Umar Ali. $3 3433793
245 10 $a Reinforcement learning without rewards.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2010
300    $a 232 p.
500    $a Source: Dissertations Abstracts International, Volume: 72-04, Section: B.
500    $a Publisher info.: Dissertation/Thesis.
500    $a Schapire, Robert E.
502    $a Thesis (Ph.D.)--Princeton University, 2010.
520    $a Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to "interactive" problems, such as learning to drive a car, operate a robotic arm, or play a game. In reinforcement learning, an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment, using only the sensory feedback that it receives from the environment. As the agent moves from one state of the environment to another, it receives only a reward signal; there is no human "in the loop" to tell the algorithm exactly what to do. The goal in reinforcement learning is to learn an optimal behavior that maximizes the total reward that the agent collects. Despite its generality, the reinforcement learning framework does make one strong assumption: that the reward signal can always be directly and unambiguously observed. In other words, the feedback a reinforcement learning algorithm receives is assumed to be a part of the environment in which the agent is operating, and is included in the agent's experience of that environment. However, in practice, rewards are usually manually specified by the practitioner applying the learning algorithm, and specifying a reward function that elicits the desired behavior from the agent can be a subtle and frustrating design problem. Our main focus in this thesis is the design and analysis of reinforcement learning algorithms that do not require complete knowledge of the rewards. The contributions of this thesis can be divided into three main parts: (1) In Chapters 2 and 3, we review the theory of two-player zero-sum games, and present a novel analysis of existing no-regret algorithms for solving these games. Our results show that no-regret algorithms can be used to compute strategies in games that satisfy a much stronger definition of optimality than is commonly used. (2) In Chapters 4 and 5, we present new algorithms for apprenticeship learning, a generalization of reinforcement learning where the true rewards are unknown. The algorithms described in Chapter 5 leverage the game-theoretic results from Chapters 2 and 3. (3) In Chapter 6, we show how partial knowledge of the rewards can be used to accelerate imitation learning, an alternative to reinforcement learning where the goal is to imitate another agent in the environment. In summary, we design and analyze several new algorithms for reinforcement learning that do not require access to a fully observable or fully accurate reward signal, and by doing so, add considerable flexibility to the traditional reinforcement learning framework.
590    $a School code: 0181.
650  4 $a Applied Mathematics. $3 1669109
650  4 $a Artificial intelligence. $3 516317
650  4 $a Computer science. $3 523869
690    $a 0364
690    $a 0800
690    $a 0984
710 2  $a Princeton University. $3 645579
773 0  $t Dissertations Abstracts International $g 72-04B.
790    $a 0181
791    $a Ph.D.
792    $a 2010
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3428541
Holdings (1 item):
Barcode: W9383423
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online access)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0