Scaling Human Feedback.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Scaling Human Feedback.
Author:
Kwon, Minae.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2023.
Description:
120 p.
Notes:
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Contained By:
Dissertations Abstracts International, 85-11B.
Subject:
Robots. - Negotiations. - Games. - Robotics.
Online resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679
ISBN:
9798382642543
Kwon, Minae.
Scaling Human Feedback. - Ann Arbor : ProQuest Dissertations & Theses, 2023. - 120 p.
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
Thesis (Ph.D.)--Stanford University, 2023.
Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback: users initially define their objectives (through language or a small dataset), and foundation models expand these into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
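The abstract's central recipe can be pictured with a small sketch: the user states an objective in plain language, and a language model judges each rollout against it, standing in for a hand-engineered reward function. The Python below is an illustrative sketch only, not the dissertation's implementation; the llm, run_episode, and update_policy callables are hypothetical stand-ins for any text-completion model, environment rollout, and reinforcement-learning update rule.

from typing import Callable


def llm_reward(llm: Callable[[str], str], objective: str, episode_summary: str) -> float:
    """Ask the language model whether an episode satisfied the user's
    plain-language objective, and map its Yes/No answer to a scalar reward."""
    prompt = (
        f"Objective: {objective}\n"
        f"Episode: {episode_summary}\n"
        "Did the agent satisfy the objective? Answer Yes or No."
    )
    answer = llm(prompt).strip().lower()
    return 1.0 if answer.startswith("yes") else 0.0


def train_with_llm_feedback(
    llm: Callable[[str], str],
    objective: str,
    run_episode: Callable[[], tuple],          # returns (trajectory, text summary)
    update_policy: Callable[[object, float], None],
    episodes: int = 100,
) -> None:
    """Train an RL agent using language-model feedback in place of a
    hand-engineered reward function."""
    for _ in range(episodes):
        trajectory, summary = run_episode()           # roll out the current policy
        reward = llm_reward(llm, objective, summary)  # cheap, language-defined feedback
        update_policy(trajectory, reward)             # any RL update rule

The point of the sketch is only the division of labor the abstract describes: the human supplies a short natural-language objective once, and the model converts it into per-episode feedback at training time.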
ISBN: 9798382642543
Subjects--Topical Terms:
Robots.
LDR
:02554nmm a2200313 4500
001
2398374
005
20240812064624.5
006
m o d
007
cr#unu||||||||
008
251215s2023 ||||||||||||||||| ||eng d
020
$a
9798382642543
035
$a
(MiAaPQ)AAI31049679
035
$a
(MiAaPQ)STANFORDsy876pv8068
035
$a
AAI31049679
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Kwon, Minae.
$3
3768282
245
1 0
$a
Scaling Human Feedback.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2023
300
$a
120 p.
500
$a
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
500
$a
Advisor: Goodman, Noah; Yang, Diyi; Sadigh, Dorsa.
502
$a
Thesis (Ph.D.)--Stanford University, 2023.
520
$a
Human-generated data has been pivotal for significant advancements in artificial intelligence (AI). As AI models scale and are applied to a wider range of tasks, the demand for more and increasingly specialized human data will grow. However, current methods of acquiring human feedback, such as learning from demonstrations or preferences and designing objective functions or prompts, are becoming unsustainable due to their high cost and the extensive effort or domain knowledge they require from users. We address this challenge by developing algorithms that reduce the cost and effort of providing human feedback. We leverage foundation models to aid users in offering feedback: users initially define their objectives (through language or a small dataset), and foundation models expand these into more detailed feedback. A key contribution is an algorithm, based on a large language model, that allows users to cheaply define their objectives and train a reinforcement learning agent without needing to develop a complex reward function or provide extensive data. For situations where initial objectives are poorly defined or biased, we introduce an algorithm that efficiently queries humans for more information, reducing the number of queries needed. Finally, we propose an information-gathering algorithm that eliminates the requirement for human intervention altogether, streamlining the feedback process. By making it cheaper for users to give feedback, either during training or when queried for more information, we hope to make learning from human feedback more scalable.
590
$a
School code: 0212.
650
4
$a
Robots.
$3
529507
650
4
$a
Negotiations.
$3
3564485
650
4
$a
Games.
$3
525308
650
4
$a
Robotics.
$3
519753
690
$a
0771
710
2
$a
Stanford University.
$3
754827
773
0
$t
Dissertations Abstracts International
$g
85-11B.
790
$a
0212
791
$a
Ph.D.
792
$a
2023
793
$a
English
856
4 0
$u
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31049679
Items (1 record):
Inventory Number: W9506694
Location Name: Electronic resources (電子資源)
Item Class: 11. Online Reading (11.線上閱覽_V)
Material type: E-book (電子書)
Call number: EB
Usage Class: Normal (一般使用)
Loan Status: On shelf
No. of reservations: 0