Leveraging Prior Knowledge and Structure for Data-Efficient Machine Learning.
Record type: Bibliographic, electronic resource : Monograph/item
Title/Author: Leveraging Prior Knowledge and Structure for Data-Efficient Machine Learning.
Author: Gunel, Beliz.
Extent: 1 online resource (155 pages)
Notes: Source: Dissertations Abstracts International, Volume: 84-05, Section: B.
Contained by: Dissertations Abstracts International, 84-05B.
Subject: Language.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29755946 (click for full text, PQDT)
ISBN: 9798357507303
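As an aside on the metadata above: ISBN-13 numbers such as this one carry a built-in checksum (the 13 digits, weighted alternately by 1 and 3, must sum to a multiple of 10), so a catalog pipeline can sanity-check them on ingest. A minimal Python sketch; the function name is mine, not part of any library:

    def isbn13_is_valid(isbn: str) -> bool:
        """ISBN-13 checksum: weight the 13 digits alternately by 1 and 3;
        the total must be divisible by 10."""
        digits = [int(c) for c in isbn if c.isdigit()]
        if len(digits) != 13:
            return False
        return sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0

    print(isbn13_is_valid("9798357507303"))  # True: the ISBN in this record checks out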
MARC record:
LDR 03875nmm a2200361K 4500
001 2354660
005 20230428105641.5
006 m o d
007 cr mn ---uuuuu
008 241011s2022 xx obm 000 0 eng d
020 $a 9798357507303
035 $a (MiAaPQ)AAI29755946
035 $a (MiAaPQ)STANFORDsb560hz7613
035 $a AAI29755946
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Gunel, Beliz. $3 3695019
245 10 $a Leveraging Prior Knowledge and Structure for Data-Efficient Machine Learning.
264 0 $c 2022
300 $a 1 online resource (155 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 84-05, Section: B.
500 $a Advisor: Chaudhari, Akshay; Pilanci, Mert; Vasanawala, Shreyas; Pauly, John; Bent, Stacey F.
502 $a Thesis (Ph.D.)--Stanford University, 2022.
504 $a Includes bibliographical references.
520 $a Building high-performing end-to-end machine learning systems primarily consists of developing the machine learning model and gathering high-quality training data for the application of interest, assuming one has access to the right hardware. Although machine learning models have become increasingly commoditized in recent years with the rise of open-source platforms, curating high-quality labeled training datasets remains either costly or infeasible for many real-world applications. Hence, this thesis focuses mainly on data: specifically, how to (1) reduce dependence on labeled data with data-efficient machine learning methods, either by injecting domain-specific prior knowledge or by leveraging existing software systems and datasets originally created for different tasks; (2) effectively manage training data and build the associated tooling in order to maximize the utility of the data; and (3) improve the quality of the data representations achieved by embeddings by matching the structure of the data to the geometry of the embedding space. We start by describing our work on data-efficient machine learning methods for accelerated magnetic resonance imaging (MRI) reconstruction through physics-driven augmentations for consistency training, scale-equivariant unrolled neural networks, and weak supervision using untrained neural networks. Then, we describe our work on data-efficient machine learning methods for natural language understanding; in particular, we discuss a supervised contrastive learning approach for pre-trained language model fine-tuning and a large-scale data augmentation method for retrieving in-domain data. Related to effectively managing training data, we discuss Glean, our proposed information extraction system for form-like documents, and focus on the often overlooked aspects of training data management and the associated tooling. We highlight the importance of effectively managing training data by showing that it is at least as critical as machine learning model advances for downstream extraction performance on a real-world dataset. Finally, to improve embedding representations for a variety of data types, we investigate spaces with heterogeneous curvature and demonstrate that mixed-curvature representations provide higher-quality representations for both graphs and word embeddings. We also investigate integrating entity embeddings from the Wikidata knowledge graph into an abstractive text summarization model to enhance factuality.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2023
538 $a Mode of access: World Wide Web.
650 4 $a Language. $3 643551
650 4 $a Physics. $3 516296
650 4 $a Vortices. $3 3681507
650 4 $a Curricula. $3 3422445
650 4 $a Brain research. $3 3561789
650 4 $a Corruption. $3 615175
650 4 $a Neural networks. $3 677449
650 4 $a Role models. $3 3301554
650 4 $a Ablation. $3 3562462
650 4 $a Natural language. $3 3562052
650 4 $a Neurosciences. $3 588700
655 7 $a Electronic books. $2 lcsh $3 542853
690 $a 0679
690 $a 0605
690 $a 0800
690 $a 0317
710 2 $a ProQuest Information and Learning Co. $3 783688
710 2 $a Stanford University. $3 754827
773 0 $t Dissertations Abstracts International $g 84-05B.
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29755946 $z click for full text (PQDT)
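The MARC view above is a human-readable display (a three-character tag, optional indicators, then $-prefixed subfields) rather than raw ISO 2709 data. A minimal Python sketch of pulling fields out of a display formatted like the record above; the helper name and the parsing rules are my own assumptions, not a library API:

    def parse_marc_display_line(line):
        """Split a display line such as '650 4 $a Language. $3 643551' into
        (tag, indicators, [(subfield_code, value), ...]). Control fields
        (LDR, 001-008) have no subfields; their value fills the middle slot."""
        tag, rest = line[:3], line[3:]
        if "$" not in rest:                 # control field, e.g. '001 2354660'
            return tag, rest.strip(), []
        head, _, body = rest.partition("$")
        subfields = [(chunk[0], chunk[1:].strip()) for chunk in body.split(" $")]
        return tag, head.strip(), subfields

    # Example: recover the title proper from field 245, subfield $a.
    fields = [parse_marc_display_line(l) for l in [
        "245 10 $a Leveraging Prior Knowledge and Structure for Data-Efficient Machine Learning.",
        "650 4 $a Language. $3 643551",
    ]]
    title = next(v for tag, _, subs in fields if tag == "245" for c, v in subs if c == "a")
    print(title)

For production use, a real parser (for example the pymarc library) over the underlying binary or MARCXML record is preferable to scraping a display like this one.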
Holdings (1 item):
Barcode: W9477016
Location: Electronic resources
Circulation category: 11. Online viewing_V
Material type: Electronic book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0
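The abstract (field 520) mentions a supervised contrastive learning approach for pre-trained language model fine-tuning. Below is a minimal PyTorch sketch of that family of objectives, in which a cross-entropy term is blended with a supervised contrastive term that pulls same-label examples together in embedding space. The function names, the blend weight lam, and the temperature tau are illustrative assumptions, not values taken from the thesis:

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(embeddings, labels, tau=0.3):
        """For each anchor, treat the other in-batch examples with the same
        label as positives and all remaining examples as negatives."""
        z = F.normalize(embeddings, dim=1)               # cosine-similarity geometry
        sim = z @ z.t() / tau                            # pairwise scaled similarities
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float("-inf"))  # an anchor is not its own pair
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        n_pos = positives.sum(dim=1).clamp(min=1)        # guard anchors with no positive
        per_anchor = -log_prob.masked_fill(~positives, 0.0).sum(dim=1) / n_pos
        # Average over anchors that have at least one positive in the batch.
        return per_anchor[positives.any(dim=1)].mean()

    def fine_tuning_loss(logits, embeddings, labels, lam=0.9, tau=0.3):
        """Blend the usual cross-entropy objective with the contrastive term."""
        ce = F.cross_entropy(logits, labels)
        scl = supervised_contrastive_loss(embeddings, labels, tau)
        return (1.0 - lam) * ce + lam * scl

Here, embeddings would be something like the pooled encoder output used for classification; the dissertation itself is the authoritative source for the exact formulation.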