Adaptivity and Efficiency in the Design of Sequential Experiments.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Adaptivity and Efficiency in the Design of Sequential Experiments.
Author:
Momenisedei, Ahmadreza.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2021.
Description:
176 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Contained By:
Dissertations Abstracts International, 83-06B.
Subject:
Numerical analysis.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28812960
ISBN:
9798494453877
Adaptivity and Efficiency in the Design of Sequential Experiments.
Momenisedei, Ahmadreza.
Adaptivity and Efficiency in the Design of Sequential Experiments. - Ann Arbor : ProQuest Dissertations & Theses, 2021. - 176 p.
Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Thesis (Ph.D.)--Stanford University, 2021.
This item must not be sold to any third party vendors.
Sequential experiments that take place in many practical settings are often characterized by an exploration-exploitation tradeoff that is captured by the multi-armed bandit framework. This framework has been studied extensively in the literature and applied in various application domains including revenue management, service operations, online retail, recommender systems, and healthcare. We study fundamental problems where there are uncertainties about key characteristics governing the sequential experimentation environment, such as the information acquisition process, the payoff structure, and the feedback structure. Furthermore, we design and analyze near-optimal dynamic optimization policies that are adaptive with respect to these characteristics.
In many practical settings, additional data may become available between decision epochs of a sequential experiment. In Chapter 2, we study the performance gain one may achieve when leveraging such auxiliary data, and the design of algorithms that effectively do so without prior information on the underlying information arrival process. We introduce a generalized formulation, which considers a broad class of distributions that are informative about rewards from actions, and allows observations from these distributions to arrive according to an arbitrary and a priori unknown process. When it is known how to map auxiliary data to reward estimates, we obtain matching bounds that characterize the best achievable performance as a function of the information arrival process. In terms of achieving optimal performance, we establish that upper confidence bound and Thompson sampling policies possess natural robustness with respect to the information arrival process, which uncovers a novel property of these popular algorithms and further lends credence to their appeal. In Chapter 3, we consider the challenging setting where the mappings connecting auxiliary data and rewards are unknown, characterize a necessary and sufficient condition under which auxiliary information allows performance improvement, and devise a new policy that is near-optimal in that setting. We use data from a large media site to analyze the value that may be captured by leveraging auxiliary data for designing content recommendations.
In Chapter 4, we study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework where the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing algorithms that adapt to unknown smoothness of payoff functions is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), we establish that adapting to unknown smoothness is possible, and further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of non-adaptive off-the-shelf policies. We establish that for problem settings with either differentiable or non-differentiable payoff functions, this policy matches (up to a logarithmic factor) the regret rate that is achievable when the smoothness of payoffs is known a priori.
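The Chapter 2 setting described above (reward information arriving between decision epochs) can be pictured with a minimal, hedged sketch. The snippet below is not the dissertation's algorithm; it is a generic upper-confidence-bound loop in Python that simply folds auxiliary reward estimates into each arm's running statistics. The callables `pull_arm` and `auxiliary_stream`, and the assumption that auxiliary data map directly to reward estimates, are illustrative placeholders.

```python
# Illustrative sketch only (not taken from the dissertation): a UCB-style policy
# for a stochastic multi-armed bandit in which auxiliary reward observations may
# arrive between decision epochs. The arrival process and the mapping from
# auxiliary data to reward estimates are hypothetical placeholders.
import math


def ucb_with_auxiliary_info(n_arms, horizon, pull_arm, auxiliary_stream):
    """pull_arm(a) returns a reward in [0, 1] for arm a; auxiliary_stream(t)
    returns a (possibly empty) list of (arm, reward_estimate) pairs that
    arrived before decision epoch t."""
    counts = [0] * n_arms   # effective number of observations per arm
    means = [0.0] * n_arms  # running mean reward estimate per arm

    for t in range(1, horizon + 1):
        # Fold any auxiliary observations into the per-arm statistics.
        for arm, estimate in auxiliary_stream(t):
            counts[arm] += 1
            means[arm] += (estimate - means[arm]) / counts[arm]

        # Play each arm at least once before relying on confidence bounds.
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            choice = untried[0]
        else:
            ucb = [means[a] + math.sqrt(2 * math.log(t) / counts[a])
                   for a in range(n_arms)]
            choice = max(range(n_arms), key=lambda a: ucb[a])

        reward = pull_arm(choice)
        counts[choice] += 1
        means[choice] += (reward - means[choice]) / counts[choice]

    return means
```

The point of the sketch is only that a standard index policy needs no prior knowledge of when, or how often, `auxiliary_stream` delivers observations; it just updates the same statistics it already maintains.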
ISBN: 9798494453877
Subjects--Topical Terms: Numerical analysis.
Adaptivity and Efficiency in the Design of Sequential Experiments.
LDR
:04819nmm a2200313 4500
001
2346911
005
20220706051321.5
008
241004s2021 ||||||||||||||||| ||eng d
020
$a
9798494453877
035
$a
(MiAaPQ)AAI28812960
035
$a
(MiAaPQ)STANFORDhs511rc0790
035
$a
AAI28812960
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Momenisedei, Ahmadreza.
$3
3686115
245
1 0
$a
Adaptivity and Efficiency in the Design of Sequential Experiments.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2021
300
$a
176 p.
500
$a
Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
500
$a
Advisor: Gur, Yonatan; Boyd, Stephen; Lall, Sanjay; Spiess, Jann; Wager, Stefan.
502
$a
Thesis (Ph.D.)--Stanford University, 2021.
506
$a
This item must not be sold to any third party vendors.
520
$a
Sequential experiments that take place in many practical settings are often characterized by an exploration-exploitation tradeoff that is captured by the multi-armed bandit framework. This framework has been studied extensively in the literature and applied in various application domains including revenue management, service operations, online retail, recommender systems, and healthcare. We study fundamental problems where there are uncertainties about key characteristics governing the sequential experimentation environment, such as the information acquisition process, the payoff structure, and the feedback structure. Furthermore, we design and analyze near-optimal dynamic optimization policies that are adaptive with respect to these characteristics. In many practical settings, additional data may become available between decision epochs of a sequential experiment. In Chapter 2, we study the performance gain one may achieve when leveraging such auxiliary data, and the design of algorithms that effectively do so without prior information on the underlying information arrival process. We introduce a generalized formulation, which considers a broad class of distributions that are informative about rewards from actions, and allows observations from these distributions to arrive according to an arbitrary and a priori unknown process. When it is known how to map auxiliary data to reward estimates, we obtain matching bounds that characterize the best achievable performance as a function of the information arrival process. In terms of achieving optimal performance, we establish that upper confidence bound and Thompson sampling policies possess natural robustness with respect to the information arrival process, which uncovers a novel property of these popular algorithms and further lends credence to their appeal. In Chapter 3, we consider the challenging setting where the mappings connecting auxiliary data and rewards are unknown, characterize a necessary and sufficient condition under which auxiliary information allows performance improvement, and devise a new policy that is near-optimal in that setting. We use data from a large media site to analyze the value that may be captured by leveraging auxiliary data for designing content recommendations. In Chapter 4, we study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework where the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing algorithms that adapt to unknown smoothness of payoff functions is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), we establish that adapting to unknown smoothness is possible, and further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of non-adaptive off-the-shelf policies. We establish that for problem settings with either differentiable or non-differentiable payoff functions, this policy matches (up to a logarithmic factor) the regret rate that is achievable when the smoothness of payoffs is known a priori.
590
$a
School code: 0212.
650
4
$a
Numerical analysis.
$3
517751
650
4
$a
Headphones.
$3
3686116
650
4
$a
Recommender systems.
$3
3562220
650
4
$a
Information science.
$3
554358
650
4
$a
Mathematics.
$3
515831
690
$a
0723
690
$a
0405
710
2
$a
Stanford University.
$3
754827
773
0
$t
Dissertations Abstracts International
$g
83-06B.
790
$a
0212
791
$a
Ph.D.
792
$a
2021
793
$a
English
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28812960
Items (1 record)
Inventory Number: W9469349
Location Name: Electronic resources
Item Class: 11. Online reading_V
Material type: E-book
Call number: EB
Usage Class: Normal use (Normal)
Loan Status: On shelf
No. of reservations: 0