Kernelized Bandits

Performance of LP-GP-UCB~(L) and Heuristic~(H) algorithms along with some baselines (IGPUCB~(I), GP-EI~(E), GP-PI~(P)) on 8 dimensional Branin function

Kernelized bandits refers to a non-Bayesian formulation of the problem of optimizing a black-box function $f$ which can only be accessed through a noisy zero-order oracle. Here, instead of assuming that $f$ is a sample from a Gaussian Process (GP), we assume that the function $f$ has bounded norm in the RKHS associated with the positive-definite kernel $K$. Existing algorithms in this area admit an upper bound on the cumulative regret of the form $$ \mathcal{R}_n = \tilde{\mathcal{O}}\left( \gamma_n \sqrt{n} \right)$$ where $\gamma_n = \max_{S} I(y_S; f)$ is the maximum information gain associated with kernel $K$, where the maximum is over subsets $S$ of cardinality $n$.

Recent work by (Scarlett et al., 2017)1 demonstrated a large gap in the existing upper bounds on $\mathcal{R}_n$ and algorithm-independent lower bounds for am important family of kernels. Our work2 seeks to address this issue by proposing an novel algorithm for kernelized bandits with uniformly tighter regret bounds.

The key idea of our work is to augment the global GP surrogate with Local Polynomial (LP) estimators on the elements of an adaptively constructed partition, $\mathcal{P}_t$ of the input space $\mathcal{X}$.

This idea, combined with an embedding result, which imples that the elements of the RKHS associated with the Matern family of kernels can be embedded into certain Holder spaces, allows us to derive uniformly better regret bounds.

Our main contributions are:

  • We first propose an algorithm, called LP-GP-UCB, which uses LP estimators along with the global GP surrogate to construct tighter UCB for $f$ to guide the query strategy.

  • Under assumptions that $f$ has finite norm in RKHS and in a Holder space, we obtain general regret bounds for our proposed algorithm.

  • For commonly used kernels, we then obtain embedding results which show that for these kernels the Holder assumption follows from the bounded RKHS norm assumption. This allows us to specialize the general regret bounds for several important kernel families such as Squared Exponential, Matern, Rational-Quadratic and Gamma-Exponential kernels.

  • Next, we propose a computationally efficient heuristic, which employs standard regression trees to construct the non-uniform partition of the input space. Experiments with benchmark functions as well as a hyperparameter tuning task demonstrate the benefits of our proposed approach.

For more details please refer to the overview slides here and the preprint here.


  1. Scarlett, J., Bogunovic, I., & Cevher, V. (2017). Lower bounds on regret for noisy gaussian process bandit optimization. COLT. ↩︎

  2. Shekhar, S., & Javidi, T. (2020). Multi-scale Zero Order Optimization of Smooth Functions in an RKHS. preprint ↩︎

Shubhanshu Shekhar
Shubhanshu Shekhar
Ph.D. Candidate

Interested in sequential learning and optimization.