National Dong Hwa University Library

Statistical learning in drug discovery via clustering and mixtures.

Record Type:	Language materials, printed : Monograph/item
Title/Author:	Statistical learning in drug discovery via clustering and mixtures./
Author:	Wang, Xu.
Description:	208 p.
Notes:	Source: Dissertation Abstracts International, Volume: 69-01, Section: B, page: 0400.
Contained By:	Dissertation Abstracts International 69-01B.
Subject:	Biology, Bioinformatics.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NR36445
ISBN:	9780494364451

Thesis (Ph.D.)--University of Waterloo (Canada), 2007.

In drug discovery, thousands of compounds are assayed to detect activity against a biological target. The goal of drug discovery is to identify compounds that are active against the target (e.g. inhibit a virus). Statistical learning in drug discovery seeks to build a model that uses descriptors characterizing molecular structure to predict biological activity. However, the characteristics of drug discovery data can make it difficult to model the relationship between molecular descriptors and biological activity. Among these characteristics are the rarity of active compounds, the large volume of compounds tested by high-throughput screening, and the complexity of molecular structure and its relationship to activity.

ISBN: 9780494364451

Subjects--Topical Terms:

1018415
Biology, Bioinformatics.

Statistical learning in drug discovery via clustering and mixtures.
LDR:04735nam 2200313 a 45
001 962260
005 20110830
008 110831s2007 ||||||||||||||||| ||eng d
020 $a 9780494364451
035 $a (UMI)AAINR36445
035 $a AAINR36445
040 $a UMI $c UMI
100 1 $a Wang, Xu. $3 1028898
245 1 0 $a Statistical learning in drug discovery via clustering and mixtures.
300 $a 208 p.
500 $a Source: Dissertation Abstracts International, Volume: 69-01, Section: B, page: 0400.
502 $a Thesis (Ph.D.)--University of Waterloo (Canada), 2007.
520 $a In drug discovery, thousands of compounds are assayed to detect activity against a biological target. The goal of drug discovery is to identify compounds that are active against the target (e.g. inhibit a virus). Statistical learning in drug discovery seeks to build a model that uses descriptors characterizing molecular structure to predict biological activity. However, the characteristics of drug discovery data can make it difficult to model the relationship between molecular descriptors and biological activity. Among these characteristics are the rarity of active compounds, the large volume of compounds tested by high-throughput screening, and the complexity of molecular structure and its relationship to activity.
520 $a This thesis focuses on the design of statistical learning algorithms/models and their applications to drug discovery. The two main parts of the thesis are: an algorithm-based statistical method and a more formal model-based approach. Both approaches can facilitate and accelerate the process of developing new drugs. A unifying theme is the use of unsupervised methods as components of supervised learning algorithms/models.
520 $a In the first part of the thesis, we explore a sequential screening approach, Cluster Structure-Activity Relationship Analysis (CSARA). Sequential screening integrates high-throughput screening with mathematical modeling to sequentially select the best compounds. CSARA is a cluster-based, algorithm-driven method. To gain further insight into this method, we use three carefully designed experiments to compare its predictive accuracy with that of Recursive Partitioning, a popular structure-activity relationship analysis method. The experiments show that CSARA outperforms Recursive Partitioning. Comparisons include problems with many descriptor sets and situations in which many descriptors are not important for activity.
520 $a In the second part of the thesis, we propose and develop constrained mixture discriminant analysis (CMDA), a model-based method. The main idea of CMDA is to model the distribution of the observations given the class label (e.g. active or inactive class) as a constrained mixture distribution, and then use Bayes' rule to predict the probability of being active for each observation in the testing set. Constraints are used to deal with the otherwise explosive growth of the number of parameters with increasing dimensionality. CMDA is designed to solve several challenges in modeling drug data sets, such as multiple mechanisms, the rare target problem (i.e. imbalanced classes), and the identification of relevant subspaces of descriptors (i.e. variable selection).
520 $a We focus on the CMDA1 model, in which univariate densities form the building blocks of the mixture components. Because the CMDA1 log-likelihood function is unbounded, the EM algorithm easily converges to degenerate solutions. A special multi-step EM algorithm is therefore developed and explored via several experimental comparisons. Using the multi-step EM algorithm, the CMDA1 model is compared to model-based clustering discriminant analysis (MclustDA). The CMDA1 model is either superior to or competitive with the MclustDA model, depending on which model generates the data. The CMDA1 model performs better than the MclustDA model when the data are high-dimensional and unbalanced, an essential feature of the drug discovery problem.
520 $a An alternative approach to the problem of degeneracy is penalized estimation. By introducing a group of simple penalty functions, we consider penalized maximum likelihood estimation of the CMDA1 and CMDA2 models. This strategy improves the convergence of the conventional EM algorithm and helps avoid degenerate solutions. Extending techniques from Chen et al. (2007), we prove that the penalized maximum likelihood estimators (PMLEs) of the two-dimensional CMDA1 model can be asymptotically consistent.
590 $a School code: 1141.
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Statistics. $3 517247
690 $a 0463
690 $a 0715
710 2 $a University of Waterloo (Canada). $3 1017669
773 0 $t Dissertation Abstracts International $g 69-01B.
790 $a 1141
791 $a Ph.D.
792 $a 2007
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NR36445
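
Illustrative note on the abstract: the prediction step described in the 520 fields (model each class's descriptor distribution as a mixture, then apply Bayes' rule to obtain the probability of activity) can be sketched roughly as follows. This is not the thesis's CMDA implementation. The constraints, the fitting procedure (EM, multi-step EM, or penalized estimation) and every parameter value below are omitted or invented for illustration. The function names are hypothetical, and the sketch assumes each mixture component is a product of univariate normal densities.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Class-conditional density f(x | class) as a finite mixture.

    components: list of (weight, means, sds) tuples. Each component is
    assumed (for this sketch only) to be a product of independent
    univariate normals, one per descriptor.
    """
    total = 0.0
    for weight, means, sds in components:
        dens = weight
        for xj, m, s in zip(x, means, sds):
            dens *= normal_pdf(xj, m, s)
        total += dens
    return total


def posterior_active(x, prior_active, active_components, inactive_components):
    """Bayes' rule: P(active | x) from the two class-conditional mixtures."""
    num = prior_active * mixture_density(x, active_components)
    den = num + (1.0 - prior_active) * mixture_density(x, inactive_components)
    return num / den


# Hypothetical parameters: two "mechanisms" for the rare active class,
# each tight in descriptor 0 and diffuse in descriptor 1.
active = [
    (0.5, [2.0, 0.0], [0.5, 3.0]),
    (0.5, [-2.0, 0.0], [0.5, 3.0]),
]
inactive = [(1.0, [0.0, 0.0], [3.0, 3.0])]
prior = 0.05  # actives are rare (imbalanced classes)

p_near = posterior_active([2.0, 0.0], prior, active, inactive)
p_far = posterior_active([0.0, 0.0], prior, active, inactive)
print(p_near, p_far)
```

In this toy setup a compound close to one of the active components receives a higher posterior probability of activity than one lying between them, even though the prior strongly favors the inactive class.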