Statistical Inference of Online Decision Making.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title / Author:
Statistical Inference of Online Decision Making.
Author:
Chen, Haoyu.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Description:
130 p.
Language:
English
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Advisor: Lu, Wenbin; Song, Rui.
Thesis (Ph.D.)--North Carolina State University, 2021.
This item must not be sold to any third party vendors.
Contained By:
Dissertations Abstracts International, 83-02B.
Subjects:
Numerical analysis ; Confidence intervals ; Expected values ; Distance learning ; Normal distribution ; Decision making ; Bias ; Applied mathematics ; Mathematics ; Philosophy ; Psychology ; Statistics ; Standard deviation ; Simulation ; Hypotheses ; Experiments ; Data analysis.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28552548
ISBN:
9798522941758
Abstract:
Nowadays, the concept of personalization has gained enormous popularity, and every decision maker would like to optimize decisions by exploiting personal data. Personalized decision making has been applied to scenarios including web content recommendation, clinical trials, and hiring and admissions, and it will be used even more frequently in the future. Building on this, online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. The task has become easier with the help of big data, but new challenges come along with it. First, we need to determine which strategies and models are suitable for the task. Second, we need inferential tools to assess those strategies and models. Third, we need a good implementation of the strategies that can efficiently learn and assess decision-making rules online.

In Chapter 1, we first give a brief introduction to the origin of the online decision making problem and then review recent solutions from a statistical perspective. The problem is set up in the contextual bandit framework, since this framework is well formulated and possesses the main features shared by all online decision making problems. Common solutions learn a reward model of the different actions given the contextual information and then maximize the long-term reward using various exploration strategies. We focus our attention on solutions that use randomized strategies to deal with the exploration-and-exploitation dilemma, and we group them into parametric and non-parametric methods according to how they estimate the reward model.

In online decision making, it is important to know whether the posited reward model is reasonable and how the model performs asymptotically. Therefore, Chapters 2 and 3 both focus on the statistical inference of online decision making. In Chapter 2, we study this problem in the contextual bandit setting with a linear reward model. The ε-greedy policy is adopted to address the classic exploration-and-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose an online weighted least squares estimator based on inverse propensity score weighting and establish its asymptotic normality as well. Building on the properties of these parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal.

While still stressing the importance of statistical inference, in Chapter 3 we devise an efficient algorithm for online decision making. Since the decision rule should be updated once per step, an offline update that uses all of the historical data is inefficient in both computation and storage. To this end, we propose a completely online algorithm that can make decisions and update the decision rule online via stochastic gradient descent. It is not only computationally efficient but also supports all kinds of parametric reward models. Again, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse probability weighted value estimator used to estimate the optimal value. Online plug-in estimators for the variances of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible with our method.
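To make the Chapter 2 setting concrete, the following is a minimal simulation sketch, not code from the dissertation, of an ε-greedy contextual bandit with a per-arm linear reward model, where the online ordinary least squares estimate is maintained through accumulated sufficient statistics. The Gaussian data-generating process, the identity seeding of X'X, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 5, 3, 10_000, 0.1             # context dim, arms, horizon, exploration rate

# Per-arm sufficient statistics for online least squares:
# A[k] accumulates X'X (seeded with I for invertibility), b[k] accumulates X'y.
A = [np.eye(d) for _ in range(K)]
b = [np.zeros(d) for _ in range(K)]
theta_true = rng.normal(size=(K, d))          # simulated truth, unknown to the learner

for t in range(T):
    x = rng.normal(size=d)                                        # observe a context
    theta_hat = [np.linalg.solve(A[k], b[k]) for k in range(K)]   # current OLS estimates
    greedy = int(np.argmax([x @ th for th in theta_hat]))
    a = rng.integers(K) if rng.random() < eps else greedy         # eps-greedy randomization
    r = x @ theta_true[a] + rng.normal()                          # observe a noisy reward
    A[a] += np.outer(x, x)                                        # rank-one online update
    b[a] += r * x
```

Because each update is a rank-one modification of X'X, the per-step solve could equally be replaced by a Sherman-Morrison update of the inverse, keeping the cost per decision constant.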
The theoretical results in Chapters 2 and 3 are validated through simulations and an application to the Yahoo! news article recommendation data. Proofs of the theorems and extended simulation results are included in the appendix.
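As a companion sketch of the fully online flavor described for Chapter 3, the snippet below takes one stochastic gradient step per decision (here on a squared-error loss, an assumption for illustration) and maintains a running inverse-probability-weighted estimate of the greedy policy's value; the dissertation's actual algorithm and its variance estimators are given in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T, eps, lr = 5, 3, 10_000, 0.1, 0.05    # constants are illustrative

theta = np.zeros((K, d))                      # parametric reward model, one vector per arm
theta_true = rng.normal(size=(K, d))          # simulated truth
ipw_sum, n = 0.0, 0

for t in range(T):
    x = rng.normal(size=d)
    greedy = int(np.argmax(theta @ x))
    a = rng.integers(K) if rng.random() < eps else greedy
    # Propensity of the realized action under eps-greedy randomization.
    prop = eps / K + (1.0 - eps if a == greedy else 0.0)
    r = x @ theta_true[a] + rng.normal()
    # One stochastic gradient step on the chosen arm's squared-error loss,
    # so no historical data needs to be stored.
    theta[a] -= lr * (theta[a] @ x - r) * x
    # Running inverse-probability-weighted estimate of the greedy policy's value.
    n += 1
    ipw_sum += (a == greedy) / prop * r

value_hat = ipw_sum / n                       # online IPW value estimate
```

In the same spirit, running averages of per-step gradients and weighted rewards can feed plug-in variance estimates for Wald-type intervals, which is the role the online plug-in variance estimators play in the dissertation.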
Holdings:
Barcode: W9472015
Location: Electronic Resources
Circulation category: 11. Online Reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0