Nearest Neighbor Methods with Applications in Functional Estimation and Machine Learning.
Record type:
Bibliographic - electronic resource : Monograph/item
Title / Author:
Nearest Neighbor Methods with Applications in Functional Estimation and Machine Learning. / Zhao, Puning.
Author:
Zhao, Puning.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Pagination:
295 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Contained by:
Dissertations Abstracts International, 83-02B.
Subject:
Electrical engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28499249
ISBN:
9798538100361
Zhao, Puning. Nearest Neighbor Methods with Applications in Functional Estimation and Machine Learning. - Ann Arbor : ProQuest Dissertations & Theses, 2021. - 295 p.
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Thesis (Ph.D.)--University of California, Davis, 2021.
This item must not be sold to any third party vendors.
The k nearest neighbor (kNN) method is an important statistical method with several advantages. First, kNN methods are usually computationally fast and do not require much parameter tuning. Second, they are purely nonparametric: they adapt automatically to any continuous underlying distribution without relying on a specific model. Third, for many statistical problems, including density estimation, functional estimation, classification, and regression, kNN methods are provably consistent as long as a proper $k$ is selected. With these advantages, kNN methods are widely used for these problems. In this dissertation, we investigate theoretical properties of kNN methods under three scenarios.

First, we discuss the theoretical properties of kNN methods for the estimation of differential entropy and mutual information. A commonly used kNN entropy estimator is the Kozachenko-Leonenko estimator, which achieves the best empirical performance for a large variety of distributions. We study the convergence rate of the Kozachenko-Leonenko estimator under different scenarios. If the distribution has heavy tails, the Kozachenko-Leonenko estimator may not be consistent. To improve it, we use truncated kNN distances instead. We derive the minimax convergence rate, which characterizes the fundamental limits of entropy estimation, and show that the Kozachenko-Leonenko estimator with truncated kNN distances is nearly minimax rate optimal, up to a polylogarithmic factor. Building on the analysis of the Kozachenko-Leonenko entropy estimator, we then investigate mutual information estimation. A widely used kNN-based mutual information estimator is the Kraskov, Stögbauer and Grassberger (KSG) estimator. We derive an upper bound on the convergence rate of the bias and variance of the KSG mutual information estimator. Our results hold for distributions whose densities can approach zero.

Second, we analyze the kNN method for Kullback-Leibler (KL) divergence estimation. Estimating KL divergence from independent and identically distributed samples is an important problem in various domains. One simple and effective estimator is based on the $k$ nearest neighbor distances between these samples. We analyze the convergence rates of the bias and variance of this estimator for two types of distributions: those with densities bounded away from zero and those whose densities can approach zero. For both cases, we derive a lower bound of the minimax mean square error and show that the kNN method is asymptotically minimax rate optimal.

Finally, we analyze the kNN method in supervised learning, i.e., classification and regression. The problem can be formulated as the prediction of a target $Y$ based on a feature vector $\mathbf{X} \in \mathbb{R}^d$. Depending on whether $Y$ is categorical or numerical, the problem is called classification or regression, respectively. In our analysis, we discuss kNN methods for binary classification and regression. We first analyze the convergence rate of the standard kNN classifier and regressor, in which the same $k$ is used for all training samples, under a large variety of underlying feature distributions. We then derive the minimax convergence rate. The result shows that there exists a gap between the convergence rate of the standard kNN method and the minimax rate. We then design an adaptive kNN method and prove that the proposed method is minimax rate optimal.
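The Kozachenko-Leonenko entropy estimator discussed in the abstract has a simple closed form based on kNN distances: $\hat{H} = \psi(n) - \psi(k) + \log c_d + \frac{d}{n}\sum_{i=1}^{n} \log \epsilon_i$, where $\epsilon_i$ is the distance from sample $i$ to its $k$-th nearest neighbor and $c_d$ is the volume of the unit ball in $\mathbb{R}^d$. The Python sketch below implements this classical form, with an optional truncation of the kNN distances; the truncation radius trunc is a placeholder, not the threshold derived in the dissertation.

import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def kl_entropy(x, k=3, trunc=None):
    """Kozachenko-Leonenko estimate of differential entropy from samples x (n x d).
    If trunc is given, the kNN distances are truncated at that radius."""
    n, d = x.shape
    # Distance from each point to its k-th nearest neighbor (k + 1 queried
    # because the nearest "neighbor" of a point is the point itself).
    eps = cKDTree(x).query(x, k=k + 1)[0][:, k]
    if trunc is not None:
        eps = np.minimum(eps, trunc)                       # truncated kNN distances
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit ball
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))

# Example: 2-D standard normal; true differential entropy is log(2*pi*e) ~= 2.838.
rng = np.random.default_rng(0)
print(kl_entropy(rng.standard_normal((5000, 2)), k=3))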
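For mutual information, the KSG estimator named above combines joint-space kNN distances with marginal neighbor counts: $\hat{I} = \psi(k) + \psi(n) - \langle \psi(n_x + 1) + \psi(n_y + 1) \rangle$. A minimal sketch of the standard KSG construction (max-norm neighborhoods) follows; the small offset used to mimic the strict inequality in the neighbor counts is an implementation shortcut, not something taken from the dissertation.

import numpy as np
from scipy.special import digamma
from scipy.spatial import cKDTree

def ksg_mi(x, y, k=3):
    """KSG estimate of I(X; Y) from paired samples x (n x dx) and y (n x dy)."""
    n = x.shape[0]
    xy = np.hstack([x, y])
    # k-th nearest neighbor distance in the joint space under the max-norm.
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, k]
    # Number of points strictly within eps_i in each marginal space (self excluded).
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Example: bivariate Gaussian with correlation 0.6; true MI = -0.5*log(1 - 0.36) ~= 0.223.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=5000)
print(ksg_mi(z[:, :1], z[:, 1:], k=3))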
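The kNN-based KL divergence estimator described above compares, at each sample from $p$, the distance $\rho_k(i)$ to its $k$-th nearest neighbor within the sample from $p$ against the distance $\nu_k(i)$ to its $k$-th nearest neighbor in the sample from $q$: $\hat{D}(p\|q) = \frac{d}{n}\sum_{i=1}^{n} \log\frac{\nu_k(i)}{\rho_k(i)} + \log\frac{m}{n-1}$. Assuming the abstract refers to this commonly used construction, a minimal sketch:

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y, k=3):
    """kNN estimate of D(p || q) from i.i.d. samples x (n x d) ~ p and y (m x d) ~ q."""
    n, d = x.shape
    m = y.shape[0]
    # Distance from each x_i to its k-th nearest neighbor among the other x's.
    rho = cKDTree(x).query(x, k=k + 1)[0][:, k]
    # Distance from each x_i to its k-th nearest neighbor among the y's.
    nu = cKDTree(y).query(x, k=k)[0][:, k - 1]
    return d * np.mean(np.log(nu) - np.log(rho)) + np.log(m / (n - 1))

# Example: N(0, 1) versus N(1, 1) in one dimension; true KL divergence = 0.5.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(5000, 1))
y = rng.normal(1.0, 1.0, size=(5000, 1))
print(knn_kl_divergence(x, y, k=3))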
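For the supervised-learning results, the baseline being analyzed is the standard kNN rule with one fixed $k$ for every query: average the $k$ nearest labels for regression, take their majority for binary classification. The sketch below shows that fixed-$k$ rule; the adaptive method proposed in the dissertation instead lets $k$ vary with the query point, and its selection rule is not reproduced here.

import numpy as np
from scipy.spatial import cKDTree

def knn_predict(x_train, y_train, x_query, k=5, classify=False):
    """Standard fixed-k kNN prediction: mean of the k nearest labels (regression)
    or a majority vote over 0/1 labels (binary classification)."""
    _, idx = cKDTree(x_train).query(x_query, k=k)
    avg = y_train[idx].mean(axis=1)        # average label of the k nearest neighbors
    return (avg >= 0.5).astype(int) if classify else avg

# Example: noisy regression of f(x) = sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(2000, 1))
y_train = np.sin(2 * np.pi * x_train[:, 0]) + 0.1 * rng.standard_normal(2000)
x_query = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
print(knn_predict(x_train, y_train, x_query, k=25))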
ISBN: 9798538100361
Subjects--Topical Terms: Electrical engineering.
Subjects--Index Terms: Estimation
MARC record:
LDR  04641nmm a2200361 4500
001  2352475
005  20221128103952.5
008  241004s2021 ||||||||||||||||| ||eng d
020  $a 9798538100361
035  $a (MiAaPQ)AAI28499249
035  $a AAI28499249
040  $a MiAaPQ $c MiAaPQ
100 1  $a Zhao, Puning. $3 3692101
245 1 0  $a Nearest Neighbor Methods with Applications in Functional Estimation and Machine Learning.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300  $a 295 p.
500  $a Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
500  $a Advisor: Lai, Lifeng.
502  $a Thesis (Ph.D.)--University of California, Davis, 2021.
506  $a This item must not be sold to any third party vendors.
520  $a The k nearest neighbor (kNN) method is an important statistical method with several advantages. First, kNN methods are usually computationally fast and do not require much parameter tuning. Second, they are purely nonparametric: they adapt automatically to any continuous underlying distribution without relying on a specific model. Third, for many statistical problems, including density estimation, functional estimation, classification, and regression, kNN methods are provably consistent as long as a proper $k$ is selected. With these advantages, kNN methods are widely used for these problems. In this dissertation, we investigate theoretical properties of kNN methods under three scenarios. First, we discuss the theoretical properties of kNN methods for the estimation of differential entropy and mutual information. A commonly used kNN entropy estimator is the Kozachenko-Leonenko estimator, which achieves the best empirical performance for a large variety of distributions. We study the convergence rate of the Kozachenko-Leonenko estimator under different scenarios. If the distribution has heavy tails, the Kozachenko-Leonenko estimator may not be consistent. To improve it, we use truncated kNN distances instead. We derive the minimax convergence rate, which characterizes the fundamental limits of entropy estimation, and show that the Kozachenko-Leonenko estimator with truncated kNN distances is nearly minimax rate optimal, up to a polylogarithmic factor. Building on the analysis of the Kozachenko-Leonenko entropy estimator, we then investigate mutual information estimation. A widely used kNN-based mutual information estimator is the Kraskov, Stögbauer and Grassberger (KSG) estimator. We derive an upper bound on the convergence rate of the bias and variance of the KSG mutual information estimator. Our results hold for distributions whose densities can approach zero. Second, we analyze the kNN method for Kullback-Leibler (KL) divergence estimation. Estimating KL divergence from independent and identically distributed samples is an important problem in various domains. One simple and effective estimator is based on the $k$ nearest neighbor distances between these samples. We analyze the convergence rates of the bias and variance of this estimator for two types of distributions: those with densities bounded away from zero and those whose densities can approach zero. For both cases, we derive a lower bound of the minimax mean square error and show that the kNN method is asymptotically minimax rate optimal. Finally, we analyze the kNN method in supervised learning, i.e., classification and regression. The problem can be formulated as the prediction of a target $Y$ based on a feature vector $\mathbf{X} \in \mathbb{R}^d$.
520  $a Depending on whether $Y$ is categorical or numerical, the problem is called classification or regression, respectively. In our analysis, we discuss kNN methods for binary classification and regression. We first analyze the convergence rate of the standard kNN classifier and regressor, in which the same $k$ is used for all training samples, under a large variety of underlying feature distributions. We then derive the minimax convergence rate. The result shows that there exists a gap between the convergence rate of the standard kNN method and the minimax rate. We then design an adaptive kNN method and prove that the proposed method is minimax rate optimal.
590  $a School code: 0029.
650  4 $a Electrical engineering. $3 649834
650  4 $a Theoretical mathematics. $3 3173530
650  4 $a Statistics. $3 517247
650  4 $a Theoretical physics. $3 2144760
650  4 $a Random variables. $3 646291
650  4 $a Normal distribution. $3 3561025
650  4 $a Classification. $3 595585
650  4 $a Methods. $3 3560391
650  4 $a Variance analysis. $3 3544969
650  4 $a Bias. $2 gtt $3 1374837
653  $a Estimation
653  $a kNN
690  $a 0544
690  $a 0642
690  $a 0753
690  $a 0463
710 2  $a University of California, Davis. $b Electrical and Computer Engineering. $3 1672487
773 0  $t Dissertations Abstracts International $g 83-02B.
790  $a 0029
791  $a Ph.D.
792  $a 2021
793  $a English
856 4 0  $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28499249
Holdings (1 item):
Barcode: W9474913
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Reservations: 0