National Dong Hwa University Library

Automated classification of the narrative of medical reports using natural language processing.

Record type:	Bibliographic record - language material, printed : Monograph/item
Title/Author:	Automated classification of the narrative of medical reports using natural language processing./
Author:	Goldstein, Ira.
Physical description:	205 p.
Notes:	Source: Dissertation Abstracts International, Volume: 72-08, Section: B, page: .
Contained By:	Dissertation Abstracts International 72-08B.
Subject:	Information Technology. -
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3454734
ISBN:	9781124646640

Thesis (Ph.D.)--State University of New York at Albany, 2011.

In this dissertation we present three topics critical to the document level classification of the narrative in medical reports: the use of preferred terminology in light of the presence of synonymous terms, the less than optimal performance of classification systems when presented with a non-uniform distribution of classes, and the problems associated with scarcity of labeled data when presented with an imbalance of classes in the data sets.

ISBN: 9781124646640
Subjects--Topical Terms:

1030799
Information Technology.

Automated classification of the narrative of medical reports using natural language processing.
LDR:03162nam 2200337 4500
001 1403080
005 20111108080413.5
008 130515s2011 ||||||||||||||||| ||eng d
020 $a 9781124646640
035 $a (UMI)AAI3454734
035 $a AAI3454734
040 $a UMI $c UMI
100 1 $a Goldstein, Ira. $3 1682322
245 1 0 $a Automated classification of the narrative of medical reports using natural language processing.
300 $a 205 p.
500 $a Source: Dissertation Abstracts International, Volume: 72-08, Section: B, page: .
500 $a Adviser: Ozlem Uzuner.
502 $a Thesis (Ph.D.)--State University of New York at Albany, 2011.
520 $a In this dissertation we present three topics critical to the document level classification of the narrative in medical reports: the use of preferred terminology in light of the presence of synonymous terms, the less than optimal performance of classification systems when presented with a non-uniform distribution of classes, and the problems associated with scarcity of labeled data when presented with an imbalance of classes in the data sets.
520 $a The literature is replete with instances of conflicting reports regarding the value of applying preferred terminology to improve system performance when presented with synonymous terms. Our study shows that the addition of preferred terms to the text of the medical reports helps to improve true positives for a hand-crafted rule-based system and that the addition did not consistently improve performance for the two machine learning systems. We show that the differences in the data, task, and approach can account for the variations in these results as well as the conflicting reports in the literature.
520 $a The imbalance of classes in data sets can cause suboptimal classification performance by systems based on an exploration of statistics for representing attributes of data. To address this problem, we developed specializing, a panel of one-versus-all classifiers that are activated in a strict order, and applied it to an imbalanced data set. We show that specializing performs significantly better than voting and stacking panels of classifiers when used for multi-class classification on our data.
520 $a Machine learning systems need labeled data in order to be trained, which is expensive to develop and may not always be readily available. We combine the semi-supervised approach of co-training with specializing in order to address the issues associated with a scarcity of labeled examples when presented with an imbalance of classes in the data sets. We show that by combining co-training and specializing, we are able to consistently improve recall on the less well-represented classes, even when trained on a small number of labeled samples.
590 $a School code: 0668.
650 4 $a Information Technology. $3 1030799
650 4 $a Biology, Bioinformatics. $3 1018415
690 $a 0489
690 $a 0715
710 2 $a State University of New York at Albany. $b Informatics-Information Science. $3 1681154
773 0 $t Dissertation Abstracts International $g 72-08B.
790 1 0 $a Uzuner, Ozlem, $e advisor
790 1 0 $a Gangolly, Jagdish $e committee member
790 1 0 $a Berg, George $e committee member
790 $a 0668
791 $a Ph.D.
792 $a 2011
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3454734
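
Illustrative note: the abstract describes "specializing" only as a panel of one-versus-all classifiers activated in a strict order. The sketch below is one possible reading of that sentence, not the dissertation's implementation. The function names (`specialize`, `keyword_classifier`), the class labels, the keyword-based stand-ins for trained models, and the choice to consult the rarer class first are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a binary one-versus-all classifier that returns True
# when it claims the document for its own class.
BinaryClassifier = Callable[[str], bool]


def specialize(
    panel: List[Tuple[str, BinaryClassifier]],
    document: str,
    default_label: str,
) -> str:
    """Return the label of the first classifier in the ordered panel that fires.

    Classifiers are consulted strictly in list order; if none fires,
    the default label is returned.
    """
    for label, classifier in panel:
        if classifier(document):
            return label
    return default_label


def keyword_classifier(keyword: str) -> BinaryClassifier:
    """Toy stand-in for a trained one-versus-all model (assumption)."""
    return lambda text: keyword in text.lower()


# Assumed ordering: the less well-represented class is consulted first.
panel = [
    ("rare_class", keyword_classifier("questionable")),
    ("common_class", keyword_classifier("present")),
]

print(specialize(panel, "Findings questionable but present.", "absent"))
print(specialize(panel, "Condition present.", "absent"))
print(specialize(panel, "No findings.", "absent"))
```

Because the panel is walked in order, a document that matches several classifiers is assigned to the earliest one, which is how a strict activation order can favor minority classes over a majority class that would otherwise dominate.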