Term weighting revisited.
Singhal, Amitabh Kumar.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Term weighting revisited.
Author: Singhal, Amitabh Kumar.
Pagination: 173 p.
Notes: Source: Dissertation Abstracts International, Volume: 57-12, Section: B, page: 7609.
Contained By: Dissertation Abstracts International 57-12B.
Subject: Computer Science.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=9714899
ISBN: 0591225719
Singhal, Amitabh Kumar.
Term weighting revisited. - 173 p.
Source: Dissertation Abstracts International, Volume: 57-12, Section: B, page: 7609.
Thesis (Ph.D.)--Cornell University, 1997.
Term weighting is an essential part of modern information retrieval systems. Out of the three main components of a term weighting strategy--term frequency, inverse document frequency, and document length normalization--the term frequency factor has been investigated recently by researchers. In this work, we study the inverse document frequency and document length normalization components of term weights.
ISBN: 0591225719
Subjects--Topical Terms:
626642 Computer Science.
LDR       03334nmm 2200325 4500
001       1850743
005       20051205112318.5
008       130614s1997 eng d
020       $a 0591225719
035       $a (UnM)AAI9714899
035       $a AAI9714899
040       $a UnM $c UnM
100  1    $a Singhal, Amitabh Kumar. $3 1938653
245  1 0  $a Term weighting revisited.
300       $a 173 p.
500       $a Source: Dissertation Abstracts International, Volume: 57-12, Section: B, page: 7609.
500       $a Chair: C. Cardie.
502       $a Thesis (Ph.D.)--Cornell University, 1997.
520       $a Term weighting is an essential part of modern information retrieval systems. Out of the three main components of a term weighting strategy--term frequency, inverse document frequency, and document length normalization--the term frequency factor has been investigated recently by researchers. In this work, we study the inverse document frequency and document length normalization components of term weights.
520       $a We observe that a document length normalization scheme that retrieves documents of all lengths with chances similar to their likelihood of relevance will outperform a scheme that retrieves documents with chances very different from their likelihood of relevance. We present pivoted normalization, a technique that can be used to modify normalization functions to reduce the gap between the relevance and retrieval probabilities. We present two new normalization functions--pivoted unique normalization and pivoted byte size normalization--both of which yield significant improvements over the previous state-of-the-art normalization functions.
520       $a When optical character recognition is used to create large information bases, term weighting schemes can be highly sensitive to the errors that the OCR process introduces into the input text. This work examines the effects of the well-known cosine normalization method in the presence of OCR errors, and proposes a new, more robust normalization method. Experiments show that the new scheme is less sensitive to OCR errors and facilitates the use of more diverse basic weighting schemes. This study also explains why the use of cosine normalization in the presence of the inverse document frequency factor is not advisable in large document collections.
520       $a When a user types a natural language query for an IR system, certain keywords in the query are more pertinent to the user's information need than others. Most modern IR systems incorporate these distinctions by using an inverse document frequency (idf) factor in term weighting. Preliminary experiments show that the usefulness of an idf-type function is high at low ranks. We observe that the main reason for this effect is the widened gap between the weights of the rare and the non-rare query terms. The standard idf function works very well across query sets, but experiments show that there is room for improvement. Further studies are needed to discover a better replacement for the standard idf function.
590       $a School code: 0058.
650  4    $a Computer Science. $3 626642
650  4    $a Information Science. $3 1017528
650  4    $a Library Science. $3 881164
690       $a 0984
690       $a 0723
690       $a 0399
710  2 0  $a Cornell University. $3 530586
773  0    $t Dissertation Abstracts International $g 57-12B.
790  1 0  $a Cardie, C., $e advisor
790       $a 0058
791       $a Ph.D.
792       $a 1997
856  4 0  $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=9714899
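
The 520 summary fields name pivoted normalization and the standard idf factor but give no formulas. The following is a minimal sketch in Python, assuming the commonly published form of pivoted normalization, (1 - slope) * pivot + slope * old_norm, with the number of unique terms in a document as the old normalization factor (pivoted unique normalization). The function names, the 1 + log(tf) term frequency factor, the log(N / df) idf, and the pivot and slope values are illustrative assumptions and are not taken from this record.

```python
import math
from collections import Counter


def idf(term, doc_freq, n_docs):
    """Assumed standard idf, log(N / df); unseen terms contribute 0."""
    df = doc_freq.get(term, 0)
    return math.log(n_docs / df) if df > 0 else 0.0


def pivoted_norm(old_norm, pivot, slope):
    """Tilt the old normalization factor around the pivot.

    At old_norm == pivot the value is unchanged. With slope < 1,
    documents above the pivot are normalized less than before and
    documents below it are normalized more.
    """
    return (1.0 - slope) * pivot + slope * old_norm


def score(query_terms, doc_tokens, doc_freq, n_docs, pivot, slope):
    """Sum of tf * idf weights for matching terms, divided by the
    pivoted unique-term normalization (pivot assumed positive)."""
    counts = Counter(doc_tokens)
    norm = pivoted_norm(len(counts), pivot, slope)
    total = 0.0
    for term in query_terms:
        tf = counts.get(term, 0)
        if tf > 0:
            total += (1.0 + math.log(tf)) * idf(term, doc_freq, n_docs) / norm
    return total


if __name__ == "__main__":
    # Hypothetical toy collection statistics: 4 documents in total.
    doc_freq = {"term": 2, "weighting": 1}
    doc = ["term", "weighting", "idf", "cosine", "pivot", "term"]
    print(score(["term", "weighting"], doc, doc_freq, 4, pivot=2.0, slope=0.25))
```

In this sketch the pivot would typically be set near the collection's average old normalization value and the slope tuned on a training set. Both are left as free parameters here.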
Holdings
Barcode: W9200257
Location: Electronic resources
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0