# [UIT 1] Regret bound for mixture method

Summary: Derivation of upper bound on the regret for the mixture method (KT scheme) for individual sequence prediction.

Suppose $\mathcal{X} = \{1, \ldots, m\}$ for some integer $m \geq 2$ and let $\Theta = \Delta_m$ denote the $m-1$ dimensional probability simplex. We will use $p_\theta$ on $\mathcal{X}$ to denote the distribution parametrized by any $\theta \in \Theta$.

Consider the following game.

For $t=1,2, \ldots, n$:

• Player predicts a distribution $q_n(\cdot|x^{t-1})$ on $\mathcal{X}$
• Nature reveals an arbitrary element $x_t \in \mathcal{X}$
• Player incurs a loss of $-\log(q_n(x_t|x^{t-1}))$.

The goal of the player is to select a distribution $q_n$ whose cumulative loss is small when compared to the best distribution (with hindsight) among the family $\{p_{\theta}^n: \theta \in \Theta\}$. More formally, the goal is to sequentially select a distribution $q_n$ on $\mathcal{X}^n$ with small worst-case regret: \begin{aligned} r_n(q_n) &= \max_{x^n \in \mathcal{X}^n} \;\max_{\theta \in \Theta} \; \log \left( \frac{\log(p_{\theta}^n(x^n))}{\log(q_n(x^n))} \right ) \\ & \ \\ & = \max_{x^n \in \mathcal{X}^n} \; \log \left( \frac{\log(p_{\hat{\theta}}^n(x^n))}{\log(q_n(x^n))} \right ), \end{aligned}

where $\hat{\theta}$ is the maximum likelihood estimator and $x^n = (x_1, \ldots, x_n)$.

The optimum worst case regret or the minimax regret can then be defined as $$r_n = \min_{q_n} \;\max_{x^n \in \mathcal{X}^n} \; \log \left( \frac{\log(p_{\hat{\theta}}^n(x^n))}{\log(q_n(x^n))} \right ).$$

#### Mixture Method

In this note, we will analyze the regret of the mixture distribution that assigns the probability $\int_{\Theta} p_{\theta}^n(x^n) w(\theta) d\theta$ to $x^n$ according to some prior distribution $w$ on $\Theta$. In particular, we will consider the mixture distribution with $\texttt{Dirichlet}(1/2, \ldots, 1/2)$ prior, and denote it by $m_J(\cdot)$. Here the subscript indicates that $w(\theta)$ is the Jeffrey’s prior in this case.

With this choice of prior, the mixture distribution as well as the conditional distributions can be written in closed form. For any $t \geq 1$, and $x^t \in \mathcal{X}^t$, suppose $T_{i,t} = \sum_{j=1}^t 1_{x_j=i}$ is the number of occurrences of $i$ in $x^t$. Then we can write $m_{J}(x^n)$ as follows: \begin{aligned} &m_J(x^n) =\frac{D_m \left( T_{1,n} +1/2, \ldots, T_{m,n} + 1/2 \right) }{ D_m\left(1/2, \ldots, 1/2 \right)}, \qquad \text{where} \\ & \ \\ &D_m(\alpha_1, \ldots, \alpha_m) = \prod_{i=1}^m \Gamma(\alpha_i)/\Gamma(\sum_{i'=1}^m \alpha_{i’}), \end{aligned} and $\Gamma(\cdot)$ is the Gamma function. This gives the following closed form expression for the conditional distribution $$m_{J}(x_{t+1} = i|x^{t}) = \frac{T_{i,t} + 1/2}{t+m/2},$$ which can be used to sequentially implement $m_J(x^n)$ using the identity $m_J(x^n) = \prod_{t=1}^n m_J(x_t|x^{t-1})$.

We can now state the main result, which gives us a bound on the regret of $m_J$ for any sequence $x^n$.

Theorem: Suppose $C_m = \log (D_m(1/2, \ldots, 1/2))$, and $T_{\min} = \min_{i} T_{i,n}$. Then, for any $x^n \in \mathcal{X}^n$, we have the following bound: $$\log \left( \frac{\hat{p}_{\theta}^n(x^n)}{m_J(x^n)} \right) \leq \frac{m-1}{2} \log \left( \frac{n}{2\pi} \right) + C_m + \frac{m}{4 T_{\min} + 2} + o(1).$$

In particular, the above bound implies that if the relative frequencies of the terms in $x^n$ lies strictly inside the simplex (i.e., $\lim_{n \to \infty} T_{\min} = \infty$), then the regret incurred by $m_J$ is $(m-1)\log(n/2\pi)/2 + C_m + o(1)$. As proved in Theorem 2 of Xie and Barron (2000), this is equal to the minimax regret ($r_n$) asymptotically.

Before proceeding to the proof, we first discuss the implication of the above result to the problem of betting in horse races1.

### Connections to betting

Consider $n$ rounds of horse races, each with $m$ horses numbered $\{1, \ldots, m \}$. Let $\mathcal{X}^n = (X_1, \ldots, X_n)$ denote the indices of the winning horses in these $n$ rounds. Suppose a gambler begins with an initial wealth of $1$ dollar, and bets $q(i |x^{t-1})$ proportion of his wealth on horse $i$ to win race $t$ with odds2 $O(i | x^{t-1})$-for-$1$. Then his total wealth after $n$ races is $$S_n(q, X^n) = \prod_{t=1}^n q(X_t | X^{t-1}) O(X_t|X^{t-1}) = q(X^n)O(X^n).$$

Now, if the winning indices $X_1, \ldots, X_n$ were generated i.i.d. according to distribution $p_{\theta}$, then the betting strategy that maximizes the expected growth-rate of the wealth process is $q^* = p_{\theta}$. This is known as Kelly betting or proportional betting strategy. \begin{aligned} q^* & \in \text{argmax}_{q} \; \mathbb{E}_{p_{\theta}} \left[ \log \left( O(X) q(X) \right) \right] \\ & \ \\ & = \text{argmax}_{q} \; \mathbb{E}_{p_{\theta}} \left[ \log\left( \frac{q(X)}{p_{\theta}(X)} \right) + \log \left( p(X) O(X) \right) \right] \\ & \ \\ & = \text{argmax}_{q} \; \mathbb{E}_{p_{\theta}} \left[ \log \left( p(X) O(X) \right) \right] - D_{KL}(p_{\theta}//q) \\ & \ \\ & = \mathbb{E}_{p_{\theta}} \left[ \log \left( p(X) O(X) \right) \right] - \text{argmin}_{q} \; D_{KL}(p_{\theta}//q), \end{aligned} which implies that $q^*=p_{\theta}$ due to non-negativity of KL-divergence.

Now, suppose we don’t make any probabilistic assumptions on the process generating $X^n$. We can still define the best constant betting strategy (in hindsight) for a sequence $x^n$ as the one which maximizes the wealth: $$\max_{\theta} p_{\theta}^n(x^n) O(x^n) = p_{\hat{\theta}}^n(x^n) O(x^n).$$

If the gambler follows some strategy $q_n$ for betting, then the ratio the wealth gained by the two policies (i.e., $p_{\hat{\theta}}^n$ and $q_n$) is given by $$\frac{S_n(p_{\hat{\theta}}^n, x^n)}{S_n(q_n, x^n)} = \frac{p_{\hat{\theta}}^n(x^n)}{q_n(x^n)}$$ independent of the odds offered. Hence, the problem of finding the distribution $q_n$ that maximizes this ratio in the worst-case (i.e., for the worst sequence $x^n \in \mathcal{X}^n$) reduces to the sequential prediction problem considered above. In particular, the regret bound for $m_J$ stated earlier implies the following:

$$\frac{S_n(p_{\hat{\theta}}^n, x^n)}{S_n(q_n, x^n)} \leq \frac{m-1}{2} \log(n/2\pi) + C_m + o(1),$$ assuming that the relative frequencies of $x^n$ lies in the interior of the simplex. This further gives us the following bound $$S_n(m_J, x^n) \geq S_n \left( \hat{p}_{\theta}^n, x^n \right) \left( \frac{n}{2\pi} \right)^{(m-1)/2}. e^{-C_m - o(1)}.$$

### Proof of the Regret Bound

Throughout this section, we will use $T_i$ instead of $T_{i,n}$. To prove the result, we first note the following:

\begin{aligned} \log \left( \frac{p_{\hat{\theta}}(x^n)}{m_J(x^n)} \right) & = \log( p_{\hat{\theta}}(x^n) ) + C_m - \log \left( D_m\left(T_1+\frac{1}{2}, \ldots, T_m+\frac{1}{2}\right) \right) \qquad (1) \end{aligned}

Since $\hat{\theta} = \left( (T_1/n), \ldots, (T_m/n) \right)$, we have $$\log(p_{\hat{\theta}}^n(x^n)) = \sum_{i=1}^m T_i \log(T_i/n) = -n \log n + \sum_{i}T_i \log(T_i). \qquad (2)$$

For the remaining term, we use Stirling’s approximation $\Gamma(x) = x^{x-1/2}e^{-x} \sqrt{2\pi} e^{s_x}$ with $s_x \in (0, 1/(12x))$. Plugging this into the expression for $A = D_m(T_1+1/2, \ldots, T_m+1/2)$, we get

\begin{aligned} A &= \frac{ \prod_{i=1}^m \Gamma(T_i + 1/2) }{\Gamma(n+m/2)} \\ &= \frac{ \prod_{i} \left( (T_i+1/2)^{T_i} e^{-T_i} \sqrt{2\pi} e^{-s_i} \right) }{ (n+m/2)^{n+(m-1)/2} e^{-n-m/2} \sqrt{2\pi}e^{-s_n} }, \end{aligned} where $s_i \in (0, 1/12T_i)$ and $s_n \in (0, 1/(12(n+m/2)))$. Simplifying further, we get \begin{aligned} A &= \frac{ \left( \prod_{i} (T_i+1/2)^{T_i} \right) (2\pi)^{(m-1)/2} }{(n+m/2)^{n+(m-1)/2}e^{-m/2} e^{s_n - \sum_{i}s_i} } \end{aligned} which implies

\begin{aligned} \log(1/A) &= \left( n + \frac{m-1}{2} \right) \log(n+m/2) - \frac{m-1}{2} \log(2\pi) \\ &\ \qquad + (s_n - \sum_i s_i) - \sum_i T_i \log(T_i + 1/2). \qquad (3) \end{aligned}

Plugging equations (2) and (3) into (1), we get the following:

\begin{aligned} \log \left( \frac{p_{\hat{\theta}}^n(x^n)}{m_J(x^n)} \right) &= \frac{m-1}{2} \log(n/2\pi) + C_m + \texttt{Rem}_n, \end{aligned}

where the remainder terms is defined as

\begin{aligned} \texttt{Rem}_n &= -\sum_{i=1}^{m} T_i \log \left( 1 + \frac{1}{2T_i} \right) + n \log(1+\frac{m}{2n}) \\ & \quad + \frac{m-1}{2} \log \left( 1 + \frac{m}{2n} \right) + (s_n - \sum_{i=1}^m s_i). \qquad (4) \end{aligned}

To complete the proof, we will obtain upper bounds on the terms in the RHS of (4). First we observe that $$s_n - \sum_{i=1}^m s_i < s_n < \frac{1}{12(n+m/2)} < \frac{1}{12n + 6}.$$

Next, we use the following inequality for any $x>0$ $$\frac{1}{2} - \frac{1}{4(x+1/2)} \leq \quad x \log \left(1 + \frac{1}{2x} \right) \quad \leq \frac{1}{2}$$ to bound the remaining terms in (4) as follows:

\begin{aligned} n \log\left( 1 + \frac{m}{2n} \right) & \leq \frac{m}{2} \\ \frac{m-1}{2} \log\left( 1 + \frac{m}{2n} \right) & \leq \frac{m(m-1)}{4n} \leq \frac{m^2}{4n} \\ -\sum_{i=1}^{m} T_i \log \left( 1 + \frac{1}{2T_i}\right) & \leq -\frac{m}{2} + \frac{m}{4(T_{\min} + 1/2)} \end{aligned}

Combining the inequalities in the above display, we get \begin{aligned} \texttt{Rem}_n &\leq \frac{1}{12n + 6} + \frac{m^2}{4n} + \frac{m}{4T_{\min} + 2} \\ & = \frac{m}{4T_{\min} + 2} + o(1). \end{aligned} This completes the proof.

1. here horse races model betting games where in every round only one of $m$ possible outcomes occur (i.e. only one out of $m$ horses wins the race). ↩︎

2. Odds of $a$-for-$1$ on horse $i$ means the following: the gambler pays $1$ dollar before the race to bet on horse $i$; if horse $i$ wins, he gets back $a$ dollars while if horse $i$ loses, he receives nothing. ↩︎