Bayesian Optimization

Many applications, such as hyperparameter tuning, can be formulated as the problem of optimizing a noisy black-box function $f$ that is expensive to evaluate. In Bayesian Optimization (also referred to as Gaussian Process Bandits), the search for the optimum is guided by a Gaussian Process surrogate model of $f$.

Under the assumption that the unknown function $f$ is a sample from a zero-mean Gaussian Process (GP) with known covariance function $K$, and given a sampling budget $n$, the goal of the agent is to design a sequential strategy for querying the black-box function (a noisy zeroth-order oracle) in order to learn about its maximizer. The performance of an algorithm is measured by its cumulative regret $\mathcal{R}_n = \sum_{t=1}^n \left( f(x^*) - f(x_t) \right)$, where $x^*$ is a maximizer of $f$ and $x_t$ is the point queried at time $t$.
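
To make the setup concrete, here is a minimal sketch of a GP-UCB-style query loop on a finite candidate grid. The RBF kernel, the confidence-width choice `beta`, and the toy function `f` are illustrative assumptions, not the algorithms analyzed in our work.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    """Squared-exponential covariance between rows of A and B (assumed kernel K)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xs, noise_var=0.01):
    """Posterior mean and std of a zero-mean GP at test points Xs."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    Ks = rbf_kernel(X, Xs)
    mu = Ks.T @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xs, Xs).diagonal() - np.sum(v**2, 0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def f(x):  # hypothetical expensive black-box function (1-d for illustration)
    return np.sin(3 * x[..., 0]) * np.exp(-x[..., 0])

rng = np.random.default_rng(0)
grid = np.linspace(0, 2, 200)[:, None]      # finite candidate set
X = grid[rng.integers(len(grid), size=1)]   # one random initial query
y = f(X) + 0.1 * rng.standard_normal(len(X))

for t in range(2, 31):                      # sampling budget n = 30
    mu, sigma = gp_posterior(X, y, grid)
    beta = 2.0 * np.log(len(grid) * t**2)   # illustrative confidence width
    x_next = grid[np.argmax(mu + np.sqrt(beta) * sigma)]  # optimistic choice
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[None]) + 0.1 * rng.standard_normal())

print("best observed x:", X[np.argmax(y)])
```

Optimistic rules of this kind are the prototypical algorithms for which the $\gamma_n$-based bounds below were derived.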

Existing results in the literature have obtained regret bounds of the form $\mathcal{R}_n = \mathcal{O} \left( \sqrt{n \log n \, \gamma_n} \right)$, where $\gamma_n$ is the maximum information that can be gained about $f$ from $n$ samples.

Improved Bounds for Bayesian GP Bandits

Our work in this area was motivated by the following informal idea:

Since our goal is to learn about a maximizer of $f$, and not necessarily the whole function, can we identify cases in which the $\gamma_n$-based bounds are loose?

Our main contributions were:

  • Following this intuition, we first constructed two Gaussian Processes for which we showed that the $\gamma_n$-based bounds are very loose.

  • Next, we proposed an algorithm that constructs a non-uniform partition of the input space and focuses sampling on the near-optimal regions (a schematic sketch of this idea appears after this list). For this algorithm we obtained bounds on $\mathcal{R}_n$ that are tighter than the existing results. In particular, we obtained the first sublinear regret bounds for the exponential kernel, and strictly better regret bounds for Matérn kernels when $D > \nu - 1$, where $D$ is the dimension of the input space and $\nu$ is the smoothness parameter of the Matérn kernel.

  • We then extended our algorithm to the case of contextual GP bandits, and obtained improvements over the results of (Krause and Ong, 2011) [1].

  • Finally, we showed that these techniques can also be used to propose a Bayesian version of the Zooming algorithm of (Kleinberg et al., 2008) [2].
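
As referenced in the second bullet above, the following is a schematic, hypothetical sketch of the adaptive-partitioning idea in one dimension: repeatedly query the cell with the highest optimistic index, and split a cell once its statistical error falls below its diameter term. The Lipschitz-style diameter bonus `LIP * width` is a stand-in for the posterior-uncertainty terms used in the actual algorithm and its analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):  # hypothetical black-box function (1-d for illustration)
    return np.sin(3 * x) * np.exp(-x)

def index(c, t, noise=0.1, lip=3.0):
    """Optimistic value of a cell: empirical mean + confidence + diameter bonus."""
    if c["n"] == 0:
        return np.inf  # unsampled cells are maximally optimistic
    conf = noise * np.sqrt(2 * np.log(t + 1) / c["n"])
    return c["sum"] / c["n"] + conf + lip * (c["hi"] - c["lo"])

# Start from a single cell covering the whole input space [0, 2].
cells = [{"lo": 0.0, "hi": 2.0, "sum": 0.0, "n": 0}]
LIP, NOISE, BUDGET = 3.0, 0.1, 60

for t in range(1, BUDGET + 1):
    c = max(cells, key=lambda c: index(c, t))      # most promising cell
    x = 0.5 * (c["lo"] + c["hi"])                  # query its center point
    c["sum"] += f(x) + NOISE * rng.standard_normal()
    c["n"] += 1
    # Refine: once the statistical error falls below the diameter bonus,
    # split the cell in two, so sampling concentrates in near-optimal regions.
    if NOISE * np.sqrt(2 * np.log(t + 1) / c["n"]) < LIP * (c["hi"] - c["lo"]):
        mid = 0.5 * (c["lo"] + c["hi"])
        cells.remove(c)
        cells.append({"lo": c["lo"], "hi": mid, "sum": 0.0, "n": 0})
        cells.append({"lo": mid, "hi": c["hi"], "sum": 0.0, "n": 0})

widths = sorted(c["hi"] - c["lo"] for c in cells)
print("smallest cell widths (partition is finest near the maximizer):",
      [round(w, 3) for w in widths[:5]])
```

Because large cells are split after only a few queries while small cells must earn refinement with additional samples, the resulting partition is non-uniform and finest around the near-optimal regions.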

Extension to GP Level Set Estimation

In some problems the goal is not to learn about the maximizer of $f$, but instead to estimate the $\lambda$-level set of the function, i.e., the region of the input space where $f$ lies above a threshold $\lambda$. In (Shekhar and Javidi, 2019) [3], we extended the techniques developed in (Shekhar and Javidi, 2018) [4] to propose a GP level set estimation algorithm with improved convergence rates and lower computational complexity than the previously known results of (Gotovos et al., 2013) [5]. In addition, by exploiting the structured nature of the evaluation points of our proposed algorithm, we also obtained tighter bounds on its information gain.
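
The following is a minimal sketch of the confidence-band idea behind GP level set estimation (in the spirit of the LSE algorithm of Gotovos et al. [5], not our multi-scale algorithm): a point is classified into or out of the $\lambda$-level set once its confidence interval lies entirely above or below $\lambda$, and otherwise the most ambiguous point is queried next. The kernel, band width, and toy function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.3                                    # threshold defining the level set

def f(x):  # hypothetical black-box function (1-d for illustration)
    return np.sin(3 * x) * np.exp(-x)

def rbf(a, b, ell=0.2):  # squared-exponential kernel on scalar inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

grid = np.linspace(0, 2, 100)
X = np.array([1.0])                          # initial query point
y = np.array([f(1.0) + 0.1 * rng.standard_normal()])
in_set = np.zeros(len(grid), bool)           # confidently above lam
out_set = np.zeros(len(grid), bool)          # confidently below lam

for t in range(40):
    K = rbf(X, X) + 0.01 * np.eye(len(X))
    Ks = rbf(X, grid)
    mu = Ks.T @ np.linalg.solve(K, y)        # posterior mean on the grid
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), 0)
    sd = np.sqrt(np.maximum(var, 1e-12))
    lo, hi = mu - 2 * sd, mu + 2 * sd        # confidence band for f
    in_set |= lo > lam                       # interval entirely above lam
    out_set |= hi < lam                      # interval entirely below lam
    amb = ~(in_set | out_set)                # still-ambiguous points
    if not amb.any():
        break
    # Query the ambiguous point whose band straddles lam most on both sides.
    i = np.argmax(np.where(amb, np.minimum(hi - lam, lam - lo), -np.inf))
    X = np.append(X, grid[i])
    y = np.append(y, f(grid[i]) + 0.1 * rng.standard_normal())

if in_set.any():
    print(f"estimated level set: [{grid[in_set].min():.2f}, {grid[in_set].max():.2f}]")
```

The multi-scale algorithm of [3], by contrast, replaces the fixed grid above with an adaptively refined discretization of the input space.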

References


  1. Krause, A., & Ong, C. S. (2011). Contextual Gaussian process bandit optimization. NeurIPS.

  2. Kleinberg, R., Slivkins, A., & Upfal, E. (2008). Multi-armed bandits in metric spaces. STOC.

  3. Shekhar, S., & Javidi, T. (2019). Multi-scale Gaussian process level set estimation. AISTATS.

  4. Shekhar, S., & Javidi, T. (2018). Gaussian process bandits with adaptive discretization. Electronic Journal of Statistics.

  5. Gotovos, A., Casati, N., Hitz, G., & Krause, A. (2013). Active learning for level set estimation. IJCAI.
