Automated classification of the narrative of medical reports using natural language processing.
Record Type:
Language materials, printed : Monograph/item
Title/Author:
Automated classification of the narrative of medical reports using natural language processing.
Author:
Goldstein, Ira.
Description:
205 p.
Notes:
Source: Dissertation Abstracts International, Volume: 72-08, Section: B, page: .
Contained By:
Dissertation Abstracts International, 72-08B.
Subject:
Information Technology.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3454734
ISBN:
9781124646640
Goldstein, Ira.
Automated classification of the narrative of medical reports using natural language processing.
- 205 p.
Source: Dissertation Abstracts International, Volume: 72-08, Section: B, page: .
Thesis (Ph.D.)--State University of New York at Albany, 2011.
In this dissertation we present three topics critical to the document level classification of the narrative in medical reports: the use of preferred terminology in light of the presence of synonymous terms, the less than optimal performance of classification systems when presented with a non-uniform distribution of classes, and the problems associated with scarcity of labeled data when presented with an imbalance of classes in the data sets.
ISBN: 9781124646640
Subjects--Topical Terms:
1030799
Information Technology.
LDR 03162nam 2200337 4500
001 1403080
005 20111108080413.5
008 130515s2011 ||||||||||||||||| ||eng d
020 $a 9781124646640
035 $a (UMI)AAI3454734
035 $a AAI3454734
040 $a UMI $c UMI
100 1 $a Goldstein, Ira. $3 1682322
245 1 0 $a Automated classification of the narrative of medical reports using natural language processing.
300 $a 205 p.
500 $a Source: Dissertation Abstracts International, Volume: 72-08, Section: B, page: .
500 $a Adviser: Ozlem Uzuner.
502 $a Thesis (Ph.D.)--State University of New York at Albany, 2011.
520 $a In this dissertation we present three topics critical to the document level classification of the narrative in medical reports: the use of preferred terminology in light of the presence of synonymous terms, the less than optimal performance of classification systems when presented with a non-uniform distribution of classes, and the problems associated with scarcity of labeled data when presented with an imbalance of classes in the data sets.
520 $a The literature is replete with instances of conflicting reports regarding the value of applying preferred terminology to improve system performance when presented with synonymous terms. Our study shows that the addition of preferred terms to the text of the medical reports helps to improve true positives for a hand-crafted rule-based system and that the addition did not consistently improve performance for the two machine learning systems. We show that the differences in the data, task, and approach can account for the variations in these results as well as the conflicting reports in the literature.
520 $a The imbalance of classes in data sets can cause suboptimal classification performance by systems based on an exploration of statistics for representing attributes of data. To address this problem, we developed specializing, a panel of one-versus-all classifiers, which have been activated in a strict order, and apply it to an imbalanced data set. We show that specializing performs significantly better than voting and stacking panels of classifiers when used for multi-class classification on our data.
520 $a Machine learning systems need labeled data in order to be trained, which is expensive to develop and may not always be readily available. We combine the semi-supervised approach of co-training with specializing in order to address the issues associated with a scarcity of labeled examples when presented with an imbalance of classes in the data sets. We show that by combining co-training and specializing, we are able to consistently improve recall on the less well-represented classes, even when trained on a small number of labeled samples.
590 $a School code: 0668.
650 4 $a Information Technology. $3 1030799
650 4 $a Biology, Bioinformatics. $3 1018415
690 $a 0489
690 $a 0715
710 2 $a State University of New York at Albany. $b Informatics-Information Science. $3 1681154
773 0 $t Dissertation Abstracts International $g 72-08B.
790 1 0 $a Uzuner, Ozlem, $e advisor
790 1 0 $a Gangolly, Jagdish $e committee member
790 1 0 $a Berg, George $e committee member
790 $a 0668
791 $a Ph.D.
792 $a 2011
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3454734
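The abstract describes "specializing" as a panel of one-versus-all classifiers activated in a strict order. A minimal Python sketch of that idea follows. It assumes that the first classifier in the order to accept a document assigns its label, and that a fallback label is used when none accepts; the record does not state these details. The function name cascade_predict, the class labels, and the keyword rules are hypothetical stand-ins and are not taken from the dissertation.

```python
from typing import Callable, Sequence, Tuple

# A binary one-versus-all classifier: returns True when it judges that the
# document belongs to its class. Any trained model could be wrapped this way.
BinaryClassifier = Callable[[str], bool]


def cascade_predict(
    panel: Sequence[Tuple[str, BinaryClassifier]],
    document: str,
    default_label: str,
) -> str:
    """Consult the classifiers in the given order; the first to accept wins.

    Assumption: "strict order" is read here as a first-match cascade.
    """
    for label, accepts in panel:
        if accepts(document):
            return label
    # No classifier accepted the document, so fall back to the default class.
    return default_label


# Hypothetical keyword rules standing in for trained classifiers.
# Placing the rarer class first is an illustrative choice, not the source's.
panel = [
    ("rare", lambda text: "alpha" in text.lower()),
    ("common", lambda text: "beta" in text.lower()),
]

print(cascade_predict(panel, "Report mentions alpha and beta.", "other"))
print(cascade_predict(panel, "Report mentions beta only.", "other"))
print(cascade_predict(panel, "Report mentions neither term.", "other"))
```

Because earlier classifiers take precedence, the order of the panel determines which class wins when more than one classifier would accept the same document.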
Items
Inventory Number: W9166219
Location Name: Electronic resources
Item Class: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage Class: Normal
Loan Status: On shelf
No. of reservations: 0