Incorporating semantic and syntactic information into document representation for document clustering.
Record type:
Bibliographic record - electronic resource: Monograph/item
Title / Author:
Incorporating semantic and syntactic information into document representation for document clustering./
Author:
Wang, Yong.
Pagination:
134 p.
Notes:
Source: Dissertation Abstracts International, Volume: 66-10, Section: B, page: 5514.
Contained By:
Dissertation Abstracts International 66-10B.
Subject:
Computer Science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3193781
ISBN:
9780542370540
Thesis (Ph.D.)--Mississippi State University, 2005.
Document clustering is a widely used strategy for information retrieval and text data mining. In traditional document clustering systems, documents are represented as a bag of independent words. In this project, we propose to enrich the representation of a document by incorporating semantic information and syntactic information. Semantic analysis and syntactic analysis are performed on the raw text to identify this information. A detailed survey of current research in natural language processing, syntactic analysis, and semantic analysis is provided. Our experimental results demonstrate that incorporating semantic information and syntactic information can improve the performance of our document clustering system for most of our data sets. A statistically significant improvement can be achieved when we combine both syntactic and semantic information. Our experimental results using compound words show that using only compound words does not improve the clustering performance for our data sets. When the compound words are combined with original single words, the combined feature set gets slightly better performance for most data sets. But this improvement is not statistically significant. In order to select the best clustering algorithm for our document clustering system, a comparison of several widely used clustering algorithms is performed. Although the bisecting K-means method has advantages when working with large datasets, a traditional hierarchical clustering algorithm still achieves the best performance for our small datasets.
ISBN: 9780542370540
Subjects--Topical Terms:
626642
Computer Science.
LDR 02464nmm 2200265 4500
001 1827704
005 20070102084739.5
008 130610s2005 eng d
020 $a 9780542370540
035 $a (UnM)AAI3193781
035 $a AAI3193781
040 $a UnM $c UnM
100 1 $a Wang, Yong. $3 758651
245 1 0 $a Incorporating semantic and syntactic information into document representation for document clustering.
300 $a 134 p.
500 $a Source: Dissertation Abstracts International, Volume: 66-10, Section: B, page: 5514.
500 $a Major Professor: Julia E. Hodges.
502 $a Thesis (Ph.D.)--Mississippi State University, 2005.
520 $a Document clustering is a widely used strategy for information retrieval and text data mining. In traditional document clustering systems, documents are represented as a bag of independent words. In this project, we propose to enrich the representation of a document by incorporating semantic information and syntactic information. Semantic analysis and syntactic analysis are performed on the raw text to identify this information. A detailed survey of current research in natural language processing, syntactic analysis, and semantic analysis is provided. Our experimental results demonstrate that incorporating semantic information and syntactic information can improve the performance of our document clustering system for most of our data sets. A statistically significant improvement can be achieved when we combine both syntactic and semantic information. Our experimental results using compound words show that using only compound words does not improve the clustering performance for our data sets. When the compound words are combined with original single words, the combined feature set gets slightly better performance for most data sets. But this improvement is not statistically significant. In order to select the best clustering algorithm for our document clustering system, a comparison of several widely used clustering algorithms is performed. Although the bisecting K-means method has advantages when working with large datasets, a traditional hierarchical clustering algorithm still achieves the best performance for our small datasets.
590 $a School code: 0132.
650 4 $a Computer Science. $3 626642
690 $a 0984
710 2 0 $a Mississippi State University. $3 1017550
773 0 $t Dissertation Abstracts International $g 66-10B.
790 1 0 $a Hodges, Julia E., $e advisor
790 $a 0132
791 $a Ph.D.
792 $a 2005
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3193781
Holdings (1 item)
Barcode: W9218567
Location: Electronic resource
Circulation category: 11. Online viewing_V (線上閱覽_V)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0