Dong Hwa University Library

Improving the effectiveness of language modeling approaches to information retrieval: Bridging the theory-effectiveness gap.

Record type:	Bibliographic - language material, printed : Monograph/item
Title proper/Author:	Improving the effectiveness of language modeling approaches to information retrieval: Bridging the theory-effectiveness gap./
Author:	Lv, Yuanhua.
Extent:	123 p.
Note:	Source: Dissertation Abstracts International, Volume: 75-01(E), Section: B.
Contained By:	Dissertation Abstracts International 75-01B(E).
Subject:	Computer Science. -
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3600489
ISBN:	9781303505935

Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2012.

One critical common component in any language modeling approach to retrieval is a document language model. Traditional document language models follow the bag-of-words assumption that assumes term independence and ignores the positions of the query terms in a document. For example, in a query "computer virus", the occurrences of two query terms may be close to each other in one document (likely to mean computer virus) while far apart in another document (not necessarily about computer virus), which makes a huge difference for indicating relevance but is largely underexplored.
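The "computer virus" example above can be made concrete with a small Python sketch. It is only an illustration of why term positions carry information that a bag-of-words count discards; it is not the model proposed in the thesis, and the helper name `min_pair_distance` and both sample documents are hypothetical.

```python
from collections import Counter


def min_pair_distance(tokens, term_a, term_b):
    """Smallest position gap between any occurrence of term_a and term_b.

    Returns None if either term does not occur in the token list.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


# Two made-up documents that contain each query term exactly once.
doc_near = "a computer virus infected the lab network".split()
doc_far = "the computer lab closed early because a flu virus spread among staff".split()

query = ["computer", "virus"]

# Bag-of-words view: the query-term counts are identical for both documents.
counts_near = {t: Counter(doc_near)[t] for t in query}
counts_far = {t: Counter(doc_far)[t] for t in query}

# Positional view: the gap between the two terms differs sharply.
gap_near = min_pair_distance(doc_near, "computer", "virus")
gap_far = min_pair_distance(doc_far, "computer", "virus")
```

Under these assumptions a count-based model sees the two documents as equivalent with respect to the query, while the position gap separates the document about a computer virus from the one that merely mentions both words.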

ISBN: 9781303505935
Subjects--Topical Terms:

626642
Computer Science.

LDR:04779nam a2200349 4500
001 1965607
005 20141030134124.5
008 150210s2012 ||||||||||||||||| ||eng d
020 $a 9781303505935
035 $a (MiAaPQ)AAI3600489
035 $a AAI3600489
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lv, Yuanhua. $3 2102285
245 1 0 $a Improving the effectiveness of language modeling approaches to information retrieval: Bridging the theory-effectiveness gap.
300 $a 123 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-01(E), Section: B.
500 $a Adviser: ChengXiang Zhai.
502 $a Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2012.
520 $a One critical common component in any language modeling approach to retrieval is a document language model. Traditional document language models follow the bag-of-words assumption that assumes term independence and ignores the positions of the query terms in a document. For example, in a query "computer virus", the occurrences of two query terms may be close to each other in one document (likely to mean computer virus) while far apart in another document (not necessarily about computer virus), which makes a huge difference for indicating relevance but is largely underexplored.
520 $a Second, accurate estimation of query language models plays a critical role in the language modeling approach to information retrieval. Pseudo-relevance feedback (PRF) has proven very effective for improving query language models. The basic idea of PRF is to assume that a small number of top-ranked documents in the initial retrieval results are relevant and select from these documents useful terms to improve the query language model. However, existing PRF algorithms simply assume that all terms in a feedback document are equally useful, again ignoring term occurrence positions. This is often non-optimal, as a feedback document may cover multiple incoherent topics and thus contain many useless or even harmful terms.
520 $a Third, although pseudo-relevance feedback approaches to the estimation of query language models can help improve the average retrieval precision, many experiments have shown that pseudo-relevance feedback often hurts many individual queries; the risk of pseudo-relevance feedback limits its usefulness in real search engines.
520 $a Fourth, the language modeling approach scores a document mainly based on the query likelihood score. A previously unknown deficiency of the query likelihood scoring function is that it is not properly lower-bounded for long documents. As a result of this deficiency, long documents which do match the query term can often be scored unfairly as having a lower relevancy than shorter documents that do not contain the query term at all. For example, for the aforementioned query "computer virus", a long document matching both "computer" and "virus" can easily be ranked lower than a short document matching only "computer".
520 $a Fifth, the justification of using the basic query likelihood score for retrieval requires an unrealistic assumption, which states that the probability that a user who dislikes a document would use a query does not depend on the particular document. In reality, however, this assumption does not hold because a user who dislikes a document would more likely avoid using words in the document when posing a query. This theoretical gap between the basic query likelihood retrieval function and the notion of relevance suggests that the basic query likelihood function is a potentially non-optimal retrieval function.
520 $a To bridge the above heuristic or theoretical "gaps" between the theoretical framework of standard language models and the empirical application of information retrieval, in this thesis, we clearly identified the causes of these gaps, and developed general methodologies to remove the causes from language models without destroying the statistical foundation and any other desirable properties of language models. My explorations have delivered several more effective and robust general language modeling approaches, which can all be applied immediately to search engines to improve their ranking accuracy. Although this thesis focuses on language models, most of the proposed methodologies are actually more general, and can also be applied to retrieval models other than language models to bridge their theory-effectiveness gap as well. (Abstract shortened by UMI.).
590 $a School code: 0090.
650 4 $a Computer Science. $3 626642
650 4 $a Artificial Intelligence. $3 769149
650 4 $a Web Studies. $3 1026830
690 $a 0984
690 $a 0800
690 $a 0646
710 2 $a University of Illinois at Urbana-Champaign. $b Computer Science. $3 2096310
773 0 $t Dissertation Abstracts International $g 75-01B(E).
790 $a 0090
791 $a Ph.D.
792 $a 2012
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3600489
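
The long-document effect described in the fourth abstract paragraph (the 520 field beginning "Fourth") can be reproduced with a small Python sketch. It assumes Dirichlet prior smoothing, which the record itself does not name, and the smoothing parameter, collection probabilities, and document lengths are all made-up values chosen for illustration. It shows the deficiency only, not the correction developed in the thesis.

```python
import math


def query_log_likelihood(query, doc_counts, doc_len, coll_prob, mu):
    """Query log-likelihood under Dirichlet prior smoothing (an assumed choice).

    p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu)
    """
    score = 0.0
    for w in query:
        p = (doc_counts.get(w, 0) + mu * coll_prob[w]) / (doc_len + mu)
        score += math.log(p)
    return score


query = ["computer", "virus"]

# Hypothetical collection statistics and smoothing parameter.
coll_prob = {"computer": 0.001, "virus": 0.001}
mu = 1000

# Short document: matches only "computer".
short_score = query_log_likelihood(
    query, {"computer": 1}, doc_len=100, coll_prob=coll_prob, mu=mu
)

# Long document: matches both "computer" and "virus".
long_score = query_log_likelihood(
    query, {"computer": 1, "virus": 1}, doc_len=10000, coll_prob=coll_prob, mu=mu
)

# With these numbers the long document that matches both terms
# receives the lower score.
long_ranked_lower = long_score < short_score
```

With these assumed values, the length normalization term `doc_len + mu` outweighs the credit the long document earns for matching "virus", so the short document that never mentions the term is ranked first.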