Peng, Siyao.
Cross-Paragraph Discourse Structure in Rhetorical Structure Theory Parsing and Treebanking for Chinese and English.
Record type:
Bibliographic record - electronic resource : Monograph/item
Title proper/Author:
Cross-Paragraph Discourse Structure in Rhetorical Structure Theory Parsing and Treebanking for Chinese and English.
Author:
Peng, Siyao.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2023
Extent:
168 p.
Notes:
Source: Dissertations Abstracts International, Volume: 84-11, Section: B.
Contained By:
Dissertations Abstracts International 84-11B.
Subject:
Linguistics.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30419618
ISBN:
9798379454388
Thesis (Ph.D.)--Georgetown University, 2023.
This item must not be sold to any third party vendors.
Hierarchical discourse structures benefit Natural Language Understanding tasks, such as text summarization and sentiment analysis. Rhetorical Structure Theory (RST) is particularly significant at the macro level, such as between paragraphs. Moreover, RST parsing at the macro level is more challenging than at the micro level, where intra-sentential relations are the easiest to identify.
Despite a dozen RST datasets being available in multiple languages, a sizeable Chinese RST corpus still needs to be created. Moreover, an in-depth analysis is still needed of how much RST associates with macro-level structures and how much parsing performance deteriorates at the macro level and across genres. Using English and Chinese as examples, this dissertation examines how macro-level discourse relations are presented in RST and whether state-of-the-art RST parsers capture them properly.
Firstly, I create the largest Chinese RST corpus, the Georgetown Chinese Discourse Treebank (GCDT), an open-source treebank with 50 medium-to-long documents from five different genres. I present basic statistics and highlight annotation decisions for Mandarin Chinese. I believe this sizeable multi-genre RST corpus can promote discourse analysis and RST parsing in Chinese and across languages.
Secondly, I examine the association between paragraphs and RST trees from three aspects: a) studying how the lengths of EDU, sentence, and paragraph segments differ across genres and corpora; b) assessing whether or not paragraphs are fully contained in RST subtrees; and c) analyzing the distribution of intra- versus inter-paragraph relations across corpora and genres.
Thirdly, I conduct parsing experiments on Chinese GCDT and English GUM using a state-of-the-art multilingual RST parser. I present benchmark monolingual and multilingual parsing scores for both datasets and boost performance through pretraining and automatic translation. Moreover, I show that SOTA parsers are unsatisfactory in some genres and in the inter-paragraph scenario.
Subjects--Topical Terms:
524476
Linguistics.
Subjects--Index Terms:
Chinese
LDR 03408nmm a2200433 4500
001 2394111
005 20240416125327.5
006 m o d
007 cr#unu||||||||
008 251215s2023 ||||||||||||||||| ||eng d
020 $a 9798379454388
035 $a (MiAaPQ)AAI30419618
035 $a AAI30419618
040 $a MiAaPQ $c MiAaPQ
100 1 $a Peng, Siyao. $3 3763595
245 1 0 $a Cross-Paragraph Discourse Structure in Rhetorical Structure Theory Parsing and Treebanking for Chinese and English.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300 $a 168 p.
500 $a Source: Dissertations Abstracts International, Volume: 84-11, Section: B.
500 $a Advisor: Zeldes, Amir; Schneider, Nathan.
502 $a Thesis (Ph.D.)--Georgetown University, 2023.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0076.
650 4 $a Linguistics. $3 524476
650 4 $a Computer science. $3 523869
650 4 $a Rhetoric. $3 516647
650 4 $a Bilingual education. $3 2122778
653 $a Chinese
653 $a Corpus linguistics
653 $a Discourse parsing
653 $a English
653 $a Multilingual
653 $a Rhetorical Structure Theory
653 $a Georgetown Chinese Discourse Treebank
690 $a 0290
690 $a 0984
690 $a 0282
690 $a 0681
710 2 $a Georgetown University. $b Linguistics. $3 1026493
773 0 $t Dissertations Abstracts International $g 84-11B.
790 $a 0076
791 $a Ph.D.
792 $a 2023
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30419618
Holdings (1 item)
Barcode:
W9502431
Location:
Electronic resource
Circulation category:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Use type:
General use (Normal)
Loan status:
On shelf
Reservation status:
0