Instance-Dependent Regret Analysis of Kernelized Bandits


We study the kernelized bandit problem, that involves designing an adpative strategy for querying a noisy zeroth-order-oracle to efficiently learn about the optimizer of an unknow function $f$ with a norm bounded by $M$ in a Reproducing kernel Hilbert space (RKHS) associated with a positive definite kernel $K$. Prior results, working in a minimax setting have characterized the worst-case (over all functions in the problem class) limts on regret achievable by any algorithm, and have constrcuted algorithms with matching worst-case performance for the Matern family of kernels. These results suffer from two drawbacks. First, the minimax lower bound gives no information about the limits of regret achievable by the commonly used algorithms on specific problem instances. Second, due to their worst-case nature, the existing upper bound analysis fails to adapt to easier problem instances within the function class. Our work takes steps to address both these issues. First, we derive instance-dependent regret lower bounds for algorithms with uniformly (over the function class) vanishing noramlized cumulative regret. Our result, valid for all the practically relevant kernelized bandit algorithms, such as, GP-UCB, GP-TS and SupKernelUCB, identifies a fundamental complexity measure associated with every problem instance. We then address the second issue, by proposing anew minimax near-optimal algorithm which also adapts to easier problem instances.

ICML (under review)