Exploiting Structured Data for Robust and Adaptable Natural Language Representations.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Exploiting Structured Data for Robust and Adaptable Natural Language Representations./
Author:
Leszczynski, Megan Eileen.
Physical description:
1 online resource (142 pages)
Notes:
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Contained By:
Dissertations Abstracts International, 84-12B.
Subject:
Language.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30462678 (click for full text, PQDT)
ISBN:
9798379652845
Thesis (Ph.D.)--Stanford University, 2023.
Includes bibliographical references
The foundations of many recent machine learning successes are natural language representations pretrained over vast amounts of unstructured text. Over the past several decades, natural language representations have been trained on increasingly larger datasets, with the most recent representations trained on over one trillion tokens. However, despite this immense scale, existing representations continue to face long-standing challenges, such as capturing rare, or long-tail, knowledge and adapting to natural language feedback. A key bottleneck is that current representations rely on memorizing knowledge in unstructured data, and thus are ultimately limited by the knowledge present in unstructured data. Unstructured data has limited facts about many entities (people, places, or things), as well as limited domain-specific data, like goal-oriented conversations.

In this thesis, we exploit a largely untapped and carefully curated resource, structured data, to improve natural language representations. Structured data includes knowledge graphs and item collections (e.g., playlists) that contain rich relationships between entities, such as the birthplace of an artist, all versions of a single song, or all songs by the same artist. These relationships can be challenging to learn from unstructured data, as they may occur infrequently, or may not even exist, in unstructured data. Yet, structured data comes with limitations: humans communicate in unstructured natural language, not structured queries, and structured data can also be incomplete and noisy.

Motivated by the complementary knowledge in unstructured and structured data, we present three techniques that combine structured data with unstructured data for training natural language representations. Our techniques span the three main components of a machine learning pipeline: the training data, the model architecture, and the training objective.
First, with TalkTheWalk, we use structured data to generate unstructured training data for conversational recommendation systems. By training a conversational music recommendation system over the synthetic data, we demonstrate how structured data can help improve adaptability over standard recommendation baselines. Next, with Bootleg, we introduce a Transformer-based architecture that leverages structured data to learn key reasoning patterns from unstructured text for named entity disambiguation. We demonstrate that learning these reasoning patterns leads to significant lift on disambiguating entities that rarely or never occur in text, and we discuss our results applying Bootleg to a production assistant task at a major technology company. Finally, with TABi, we add structured data as supervision in a contrastive loss function to improve robustness, while using more general-purpose models. We validate that TABi not only improves rare entity retrieval, but also performs strongly in settings with incomplete and noisy structured data.

The three techniques introduced in this thesis, TalkTheWalk, Bootleg, and TABi, demonstrate that training approaches that combine structured data with unstructured data can enable more robust and adaptable natural language representations.
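The abstract describes TABi only at a high level: structured data supplied as supervision through a contrastive loss. As a rough, hypothetical sketch of that idea (the function name, toy embeddings, and temperature below are illustrative assumptions, not the thesis's actual formulation), a loss that treats entities sharing a knowledge-graph type as positives could look like:

```python
# Hypothetical sketch: a type-supervised contrastive loss, where positives
# are pairs of entity embeddings that share a knowledge-graph type.
# Structured data (the type labels) supplies the supervision signal.
import math

def type_supervised_contrastive_loss(embeddings, types, temperature=0.1):
    """Average -log softmax similarity over positive (same-type) pairs."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        # Log of the softmax denominator over all other batch examples.
        sims = [dot(embeddings[i], embeddings[j]) / temperature
                for j in range(n) if j != i]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        for j in range(n):
            if j != i and types[j] == types[i]:
                sim_ij = dot(embeddings[i], embeddings[j]) / temperature
                total += -(sim_ij - log_denom)  # -log p(positive | anchor)
                count += 1
    return total / max(count, 1)

# Toy batch: two "musician" entities with nearby embeddings and one "city".
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
loss_matched = type_supervised_contrastive_loss(
    embs, ["musician", "musician", "city"])
loss_mismatched = type_supervised_contrastive_loss(
    embs, ["musician", "city", "musician"])
# When type labels agree with embedding similarity, the loss is lower.
```

The point of the sketch is only that type labels, which rarely co-occur with entity mentions in raw text, can define the positive pairs directly, so the objective pulls same-type entities together even when unstructured data never links them.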
Electronic reproduction. Ann Arbor, Mich.: ProQuest, 2023.
Mode of access: World Wide Web.
ISBN: 9798379652845
Subjects--Topical Terms:
Language.
Index Terms--Genre/Form:
Electronic books.
LDR    04472nmm a2200337K 4500
001    2362715
005    20231102122807.5
006    m o d
007    cr mn ---uuuuu
008    241011s2023 xx obm 000 0 eng d
020    $a 9798379652845
035    $a (MiAaPQ)AAI30462678
035    $a (MiAaPQ)STANFORDmw196yh2577
035    $a AAI30462678
040    $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1  $a Leszczynski, Megan Eileen. $3 3703456
245 10 $a Exploiting Structured Data for Robust and Adaptable Natural Language Representations.
264  0 $c 2023
300    $a 1 online resource (142 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
500    $a Advisor: Lam, Monica S.; Manning, Christopher D.; Re, Chris.
502    $a Thesis (Ph.D.)--Stanford University, 2023.
504    $a Includes bibliographical references
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538    $a Mode of access: World Wide Web
650  4 $a Language. $3 643551
650  4 $a Internet. $3 527226
650  4 $a Natural language. $3 3562052
655  7 $a Electronic books. $2 lcsh $3 542853
690    $a 0679
690    $a 0800
710 2  $a ProQuest Information and Learning Co. $3 783688
710 2  $a Stanford University. $3 754827
773 0  $t Dissertations Abstracts International $g 84-12B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30462678 $z click for full text (PQDT)
Holdings (1 item):
Barcode: W9485071
Location: Electronic Resources
Circulation category: 11. Online Reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0