Cross-Paragraph Discourse Structure in Rhetorical Structure Theory Parsing and Treebanking for Chinese and English.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Cross-Paragraph Discourse Structure in Rhetorical Structure Theory Parsing and Treebanking for Chinese and English./
Author:
Peng, Siyao.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2023.
Description:
168 p.
Notes:
Source: Dissertations Abstracts International, Volume: 84-11, Section: B.
Contained By:
Dissertations Abstracts International 84-11B.
Subject:
Linguistics.
Online resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30419618
ISBN:
9798379454388
Thesis (Ph.D.)--Georgetown University, 2023.
This item must not be sold to any third party vendors.
Hierarchical discourse structures benefit natural language understanding (NLU) tasks such as text summarization and sentiment analysis. Rhetorical Structure Theory (RST) is particularly significant at the macro level, for example between paragraphs. Moreover, RST parsing is more challenging at the macro level than at the micro level, where intra-sentential relations are the easiest to identify.

Although a dozen RST datasets are available in multiple languages, a sizeable Chinese RST corpus has yet to be created. In addition, an in-depth analysis is still lacking of how closely RST trees align with macro-level structures and of how much parsing performance deteriorates at the macro level and across genres. Using English and Chinese as examples, this dissertation examines how macro-level discourse relations are represented in RST and whether state-of-the-art RST parsers capture them properly.

First, I create the largest Chinese RST corpus, the Georgetown Chinese Discourse Treebank (GCDT), an open-source treebank of 50 medium-to-long documents from five different genres. I present basic statistics and highlight annotation decisions for Mandarin Chinese. I believe this sizeable multi-genre RST corpus can promote discourse analysis and RST parsing in Chinese and across languages.

Second, I examine the association between paragraphs and RST trees from three aspects: (a) how the lengths of elementary discourse unit (EDU), sentence, and paragraph segments differ across genres and corpora; (b) whether or not paragraphs are fully contained in RST subtrees; and (c) how intra- versus inter-paragraph relations are distributed across corpora and genres.

Third, I conduct parsing experiments on Chinese GCDT and English GUM using a state-of-the-art multilingual RST parser. I report benchmark monolingual and multilingual parsing scores for both datasets and improve performance through pretraining and automatic translation. I also show that state-of-the-art (SOTA) parsers perform unsatisfactorily on some genres and in the inter-paragraph scenario.
LDR
:03408nmm a2200433 4500
001
2394111
005
20240416125327.5
006
m o d
007
cr#unu||||||||
008
251215s2023 ||||||||||||||||| ||eng d
020
$a
9798379454388
035
$a
(MiAaPQ)AAI30419618
035
$a
AAI30419618
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Peng, Siyao.
$3
3763595
245
1 0
$a
Cross-Paragraph Discourse Structure in Rhetorical Structure Theory Parsing and Treebanking for Chinese and English.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2023
300
$a
168 p.
500
$a
Source: Dissertations Abstracts International, Volume: 84-11, Section: B.
500
$a
Advisor: Zeldes, Amir; Schneider, Nathan.
502
$a
Thesis (Ph.D.)--Georgetown University, 2023.
506
$a
This item must not be sold to any third party vendors.
590
$a
School code: 0076.
650
4
$a
Linguistics.
$3
524476
650
4
$a
Computer science.
$3
523869
650
4
$a
Rhetoric.
$3
516647
650
4
$a
Bilingual education.
$3
2122778
653
$a
Chinese
653
$a
Corpus linguistics
653
$a
Discourse parsing
653
$a
English
653
$a
Multilingual
653
$a
Rhetorical Structure Theory
653
$a
Georgetown Chinese Discourse Treebank
690
$a
0290
690
$a
0984
690
$a
0282
690
$a
0681
710
2
$a
Georgetown University.
$b
Linguistics.
$3
1026493
773
0
$t
Dissertations Abstracts International
$g
84-11B.
790
$a
0076
791
$a
Ph.D.
792
$a
2023
793
$a
English
856
4 0
$u
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30419618
Items:
Inventory Number:
W9502431
Location Name:
Electronic resources
Item Class:
11. Online reading_V
Material type:
E-book
Call number:
EB
Usage Class:
General use (Normal)
Loan Status:
On shelf
No. of reservations:
0