Consider the problem of learning how long to wait for a bus before walking, experimenting each day and assuming
that the bus arrival times are independent and identically distributed random variables with an unknown distribution.
Similar uncertain optimal stopping problems arise when devising power-saving strategies, e.g., learning the optimal disk spin-down time for
mobile computers, or speeding up certain types of satisficing search procedures by switching from
a potentially fast search method that is unreliable, to one that is reliable, but slower.
Formally, the problem can be described as a repeated game. In each round of the game an agent is waiting for an event to occur.
If the event occurs while the agent is waiting, the agent suffers a loss that is the sum of the event's ``arrival time'' and some
fixed loss. If the agents decides to give up waiting before the event occurs, he suffers a loss that is the sum of the waiting time and some other
fixed loss. It is assumed that the arrival times are independent random quantities with the same distribution, which is unknown, while the agent knows the
loss associated with each outcome. Two versions of the game are considered. In the full information case the agent
observes the arrival times regardless of its actions, while
in the partial information case the arrival time is observed only if it does not exceed the waiting time. After
some general structural observations about the problem,
we present a number of algorithms for both cases that learn the optimal weighting time with nearly matching
minimax upper and lower bounds on their regret.
%0 Conference Paper
%1 LaGySze14
%A Lattimore, T.
%A György, A.
%A Szepesvári, Cs.
%B ALT
%D 2014
%K bandits, censored information, learning, monitoring, observations online partial stochastic theory,
%T On Learning the Optimal Waiting Time
%X Consider the problem of learning how long to wait for a bus before walking, experimenting each day and assuming
that the bus arrival times are independent and identically distributed random variables with an unknown distribution.
Similar uncertain optimal stopping problems arise when devising power-saving strategies, e.g., learning the optimal disk spin-down time for
mobile computers, or speeding up certain types of satisficing search procedures by switching from
a potentially fast search method that is unreliable, to one that is reliable, but slower.
Formally, the problem can be described as a repeated game. In each round of the game an agent is waiting for an event to occur.
If the event occurs while the agent is waiting, the agent suffers a loss that is the sum of the event's ``arrival time'' and some
fixed loss. If the agents decides to give up waiting before the event occurs, he suffers a loss that is the sum of the waiting time and some other
fixed loss. It is assumed that the arrival times are independent random quantities with the same distribution, which is unknown, while the agent knows the
loss associated with each outcome. Two versions of the game are considered. In the full information case the agent
observes the arrival times regardless of its actions, while
in the partial information case the arrival time is observed only if it does not exceed the waiting time. After
some general structural observations about the problem,
we present a number of algorithms for both cases that learn the optimal weighting time with nearly matching
minimax upper and lower bounds on their regret.
@inproceedings{LaGySze14,
abstract = {Consider the problem of learning how long to wait for a bus before walking, experimenting each day and assuming
that the bus arrival times are independent and identically distributed random variables with an unknown distribution.
Similar uncertain optimal stopping problems arise when devising power-saving strategies, e.g., learning the optimal disk spin-down time for
mobile computers, or speeding up certain types of satisficing search procedures by switching from
a potentially fast search method that is unreliable, to one that is reliable, but slower.
Formally, the problem can be described as a repeated game. In each round of the game an agent is waiting for an event to occur.
If the event occurs while the agent is waiting, the agent suffers a loss that is the sum of the event's ``arrival time'' and some
fixed loss. If the agents decides to give up waiting before the event occurs, he suffers a loss that is the sum of the waiting time and some other
fixed loss. It is assumed that the arrival times are independent random quantities with the same distribution, which is unknown, while the agent knows the
loss associated with each outcome. Two versions of the game are considered. In the full information case the agent
observes the arrival times regardless of its actions, while
in the partial information case the arrival time is observed only if it does not exceed the waiting time. After
some general structural observations about the problem,
we present a number of algorithms for both cases that learn the optimal weighting time with nearly matching
minimax upper and lower bounds on their regret. },
added-at = {2020-03-17T03:03:01.000+0100},
author = {Lattimore, T. and Gy{\"o}rgy, A. and {Sz}epesv{\'a}ri, {Cs}.},
biburl = {https://www.bibsonomy.org/bibtex/23f260448d420503ed60ff1fb653a3a48/csaba},
booktitle = {ALT},
date-added = {2014-07-18 01:11:43 -0600},
date-modified = {2014-07-18 01:14:59 -0600},
interhash = {9fe1e50215bf95f69b656afea34f501a},
intrahash = {3f260448d420503ed60ff1fb653a3a48},
keywords = {bandits, censored information, learning, monitoring, observations online partial stochastic theory,},
pdf = {papers/bus-ALT04.pdf},
timestamp = {2020-03-17T03:03:01.000+0100},
title = {On Learning the Optimal Waiting Time},
year = 2014
}