Dong Hwa University Library

Graph-based weakly-supervised methods for information extraction & integration.

Record type:	Bibliographic - language material, printed : Monograph/item
Title proper/Author:	Graph-based weakly-supervised methods for information extraction & integration./
Author:	Talukdar, Partha Pratim.
Extent:	170 p.
Notes:	Source: Dissertation Abstracts International, Volume: 71-07, Section: B, page: 4359.
Contained By:	Dissertation Abstracts International 71-07B.
Subject:	Information Technology. -
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3414209
ISBN:	9781124064130

Thesis (Ph.D.)--University of Pennsylvania, 2010.

The variety and complexity of potentially-related data resources available for querying---webpages, databases, data warehouses---has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured sources, and II focusing on providing a unified view of diverse structured data sources. However, most of the current IE and II methods, which can potentially be applied to the problem of integration across sources, require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input, via weakly-supervised methods for IE and II. In particular, we argue that graph-based representation of data and learning over such graphs can result in effective and scalable methods for large-scale Information Extraction and Integration.

ISBN: 9781124064130

Subjects--Topical Terms: 1030799 Information Technology.

LDR:04009nam 2200325 4500
001 1401080
005 20111013150244.5
008 130515s2010 ||||||||||||||||| ||eng d
020 $a 9781124064130
035 $a (UMI)AAI3414209
035 $a AAI3414209
040 $a UMI $c UMI
100 1 $a Talukdar, Partha Pratim. $3 1680191
245 1 0 $a Graph-based weakly-supervised methods for information extraction & integration.
300 $a 170 p.
500 $a Source: Dissertation Abstracts International, Volume: 71-07, Section: B, page: 4359.
500 $a Adviser: Fernando Pereira.
502 $a Thesis (Ph.D.)--University of Pennsylvania, 2010.
520 $a The variety and complexity of potentially-related data resources available for querying---webpages, databases, data warehouses---has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured sources, and II focusing on providing a unified view of diverse structured data sources. However, most of the current IE and II methods, which can potentially be applied to the problem of integration across sources, require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input, via weakly-supervised methods for IE and II. In particular, we argue that graph-based representation of data and learning over such graphs can result in effective and scalable methods for large-scale Information Extraction and Integration.
520 $a Within IE, we focus on the problem of assigning semantic classes to entities. First, we develop a context pattern induction method to extend small initial entity lists of various semantic classes. We also demonstrate that features derived from such extended entity lists can significantly improve the performance of state-of-the-art discriminative taggers.
520 $a The output of pattern-based class-instance extractors is often high-precision and low-recall in nature, which is inadequate for many real-world applications. We use Adsorption, a graph-based label propagation algorithm, to significantly increase the recall of an initial high-precision, low-recall pattern-based extractor by combining evidence from unstructured and structured text corpora. Building on Adsorption, we propose a new label propagation algorithm, Modified Adsorption (MAD), and demonstrate its effectiveness on various real-world datasets. Additionally, we show how class-instance acquisition performance in the graph-based semi-supervised learning (SSL) setting can be improved by incorporating additional semantic constraints available in independently developed knowledge bases.
520 $a Within Information Integration, we develop a novel system, Q, which draws ideas from machine learning and databases to help a non-expert user construct data-integrating queries based on keywords (across databases) and interactive feedback on answers. We also present an information-need-driven strategy for automatically incorporating new sources and their information in Q, and demonstrate that Q's learning strategy is highly effective in combining the outputs of "black box" schema matchers and in re-weighting bad alignments. This removes the need to develop an expensive mediated schema, which has been necessary for most previous systems.
590 $a School code: 0175.
650 4 $a Information Technology. $3 1030799
650 4 $a Information Science. $3 1017528
650 4 $a Computer Science. $3 626642
690 $a 0489
690 $a 0723
690 $a 0984
710 2 $a University of Pennsylvania. $3 1017401
773 0 $t Dissertation Abstracts International $g 71-07B.
790 1 0 $a Pereira, Fernando, $e advisor
790 $a 0175
791 $a Ph.D.
792 $a 2010
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3414209
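
The abstract describes using graph-based label propagation to spread class labels from a few seed entities to many unlabeled ones. The sketch below illustrates that general idea only: it is a plain clamped-seed propagation on a toy entity-pattern graph, not Adsorption or MAD as defined in the thesis. The function names, the edge list, and the seed labels are all hypothetical and chosen for illustration.

```python
def propagate_labels(edges, seeds, num_iters=50):
    """Generic label propagation with clamped seeds (not Adsorption/MAD).

    edges: iterable of (node_a, node_b, weight) for an undirected graph,
           with positive weights.
    seeds: dict mapping a few nodes to their known class label.
    Returns a dict: node -> {label: score}.
    """
    # Build a symmetric weighted adjacency map.
    neighbors = {}
    for a, b, w in edges:
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w
    for node in seeds:
        neighbors.setdefault(node, {})

    labels = sorted(set(seeds.values()))
    scores = {n: {lab: 0.0 for lab in labels} for n in neighbors}
    for node, lab in seeds.items():
        scores[node][lab] = 1.0

    for _ in range(num_iters):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Seed nodes keep their known label distribution.
                updated[node] = scores[node]
                continue
            total = sum(nbrs.values())
            # Unlabeled nodes take the weighted average of their neighbors.
            updated[node] = {
                lab: sum(w * scores[m][lab] for m, w in nbrs.items()) / total
                for lab in labels
            }
        scores = updated
    return scores


def best_label(scores, node):
    """Return the highest-scoring label for a node."""
    return max(scores[node], key=scores[node].get)


# Hypothetical toy graph: entities linked to the context patterns they occur in.
EDGES = [
    ("Paris", "capital of X", 1.0),
    ("Berlin", "capital of X", 1.0),
    ("Mozart", "composed by X", 1.0),
    ("Bach", "composed by X", 1.0),
]
SEEDS = {"Paris": "city", "Mozart": "composer"}

scores = propagate_labels(EDGES, SEEDS)
print(best_label(scores, "Berlin"))
print(best_label(scores, "Bach"))
```

Here only two entities are labeled by hand; the unlabeled entities receive labels through the patterns they share with the seeds, which is the sense in which such methods generalize from limited human input.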