National Dong Hwa University Library

Using structural information in machine learning applications

Record Type:	Electronic resources : Monograph/item
Title/Author:	Using structural information in machine learning applications
Author:	Barutcuoglu, Zafer.
Description:	97 p.
Notes:	Adviser: Robert E. Schapire.
Contained By:	Dissertation Abstracts International 69-08B.
Subject:	Artificial Intelligence.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3324284
ISBN:	9780549764670

Thesis (Ph.D.)--Princeton University, 2008.

Classification problems encountered in real-life applications often have domain-specific structural information available on the measured data, which cannot be readily accommodated by conventional machine learning algorithms. Ignoring the structure and blindly running a conventional algorithm on the numerical data can compromise the quality of solutions.

ISBN: 9780549764670

Subjects--Topical Terms:

769149
Artificial Intelligence.

Using structural information in machine learning applications
LDR:04249nmm 2200325 a 4500
001 867645
005 20100804
008 100804s2008 ||||||||||||||||| ||eng d
020 $a 9780549764670
035 $a (UMI)AAI3324284
035 $a AAI3324284
040 $a UMI $c UMI
100 1 $a Barutcuoglu, Zafer. $3 1036391
245 1 0 $a Using structural information in machine learning applications.
300 $a 97 p.
500 $a Adviser: Robert E. Schapire.
500 $a Source: Dissertation Abstracts International, Volume: 69-08, Section: B, page: 4832.
502 $a Thesis (Ph.D.)--Princeton University, 2008.
520 $a Classification problems encountered in real-life applications often have domain-specific structural information available on the measured data, which cannot be readily accommodated by conventional machine learning algorithms. Ignoring the structure and blindly running a conventional algorithm on the numerical data can compromise the quality of solutions.
520 $a This thesis addresses two complementary settings of this kind: one where there is a hierarchy among multiple class labels (output structure), and one where the input features are known to be sequentially correlated (input structure). Probabilistic graphical models are used to encode the dependencies, and model parameters are estimated using efficient inference algorithms. While both scenarios are motivated by real bioinformatics problems, namely gene function prediction and aneuploidy-based cancer classification, they have applications in other domains as well, such as computer graphics, music, and text classification.
520 $a The first part focuses on structure among a group of output classes. Large numbers of overlapping classes are found to be organized in hierarchies in many domains. In multi-label classification over such a hierarchy, members of a class must also belong to all of its parents. Training an independent classifier for each class is a common approach, but this may yield labels for a given example that collectively violate this constraint. We propose a principled method of resolving such inconsistencies to increase accuracy over all classes. Our approach is to view the hierarchy as a graphical model, and then to employ Bayesian inference to infer the most likely set of hierarchically consistent class labels from independent base classifier predictions. This method is applicable over any type of base classification algorithm. Experiments on synthetic data, as well as real data sets from bioinformatics and computer graphics domains, illustrate its behavior under a range of conditions, and demonstrate that it is able to improve accuracy at all levels of a hierarchy.
520 $a The second part focuses on structure among input features, in the form of a sequential relationship. Generic non-sequential machine learning models attach no importance to the order of inputs. Conversely, sequence models (e.g. Hidden Markov Models) need to assume stationarity to keep the number of parameters manageable, modeling only sequence-wide stability and losing the significance of particular positions. We propose a fixed-length sequence classification method that combines sequential correlations with positional features in a sparsely regularized solution, with training and inference algorithms that run in time linear in the sequence length. Motivated by the problem of tumor classification by genetic copy number changes, our method can identify copy number alteration regions in noisy array-CGH data, and locate the genes of clinical relevance driving these alterations and affecting the cancer label. Experiments on synthetic array-CGH data modeled from real human breast tumors, as well as real tumor datasets from breast cancer, bladder cancer, and uveal melanoma, demonstrate that our method matches or exceeds state-of-the-art methods in accuracy, and is able to produce biologically significant predictions for clinically relevant genes.
590 $a School code: 0181.
650 4 $a Artificial Intelligence. $3 769149
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Computer Science. $3 626642
690 $a 0715
690 $a 0800
690 $a 0984
710 2 $a Princeton University. $3 645579
773 0 $t Dissertation Abstracts International $g 69-08B.
790 $a 0181
790 1 0 $a Schapire, Robert E., $e advisor
791 $a Ph.D.
792 $a 2008
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3324284
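
Illustrative note on the abstract's first part: the hierarchy constraint (a member of a class must also belong to all of its parents) can be made concrete with a small sketch. This is not the thesis's method, which uses Bayesian inference over a graphical model of the hierarchy. As a simplified stand-in, the sketch assumes a tree-shaped hierarchy, treats each base classifier's output as an independent probability, and uses dynamic programming to pick the most likely label set that respects the constraint. The function name, the `parent` and `prob` dictionaries, and the class names are all hypothetical.

```python
import math


def most_likely_consistent_labels(parent, prob):
    """Pick the highest-scoring hierarchically consistent label set.

    parent: dict mapping each class to its parent class (None for a root).
    prob:   dict mapping each class to P(label = 1) from an independent
            base classifier (an assumption of this sketch).
    Returns a dict mapping each class to True/False such that a class is
    True only if its parent is True.
    """
    children = {c: [] for c in prob}
    roots = []
    for c, p in parent.items():
        if p is None:
            roots.append(c)
        else:
            children[p].append(c)

    eps = 1e-12  # guards against log(0)
    on_score, off_score = {}, {}

    def score(c):
        # off: c is 0, so every descendant must be 0 as well.
        # on:  c is 1, so each child subtree may choose its better option.
        off = math.log(max(1.0 - prob[c], eps))
        on = math.log(max(prob[c], eps))
        for k in children[c]:
            score(k)
            off += off_score[k]
            on += max(on_score[k], off_score[k])
        on_score[c], off_score[c] = on, off

    labels = {}

    def assign(c, allowed):
        # A class can be switched on only when its parent is on.
        labels[c] = allowed and on_score[c] > off_score[c]
        for k in children[c]:
            assign(k, labels[c])

    for r in roots:
        score(r)
        assign(r, True)
    return labels


# Independent thresholding at 0.5 would give a = 0 but b = 1, which
# violates the hierarchy; the consistent solution switches a on instead.
parent = {"root": None, "a": "root", "b": "a", "c": "root"}
prob = {"root": 0.9, "a": 0.4, "b": 0.95, "c": 0.2}
result = most_likely_consistent_labels(parent, prob)
```

In this toy example the strong evidence for `b` outweighs the weak evidence against its parent `a`, so both are labeled positive, while `c` stays negative.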