Summary: Derivation of an upper bound on the regret of the mixture method (the Krichevsky–Trofimov (KT) scheme) for individual-sequence prediction over finite alphabets.
Suppose $\mathcal{X}=\{1,\dots,m\}$ for some integer $m\ge 2$, and let $\Theta=\Delta_m$ denote the $(m-1)$-dimensional probability simplex. For any $\theta\in\Theta$, we write $p_\theta$ for the corresponding distribution on $\mathcal{X}$.
Consider the following game.
For $t=1,2,\dots,n$:
- Player predicts a distribution $q^n(\cdot \mid x^{t-1})$ on $\mathcal{X}$.
- Nature reveals an arbitrary element $x_t \in \mathcal{X}$.
- Player incurs a loss of $-\log q^n(x_t \mid x^{t-1})$.
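As a concrete illustration, here is a minimal Python sketch of this protocol (the function names and the uniform strategy are mine, chosen for illustration only):

```python
import math

def play_game(predict, xs, m):
    """Run the sequential prediction game: `predict` maps the observed
    history to a distribution over {0, ..., m-1}; returns cumulative log loss."""
    total_loss = 0.0
    for t in range(len(xs)):
        q = predict(xs[:t])                # player predicts before seeing x_t
        total_loss += -math.log(q[xs[t]])  # loss is -log q(x_t | x^{t-1})
    return total_loss

# A uniform predictor pays exactly log(m) per round, whatever Nature plays.
uniform = lambda history: [1.0 / 3] * 3
loss = play_game(uniform, [0, 2, 1, 1], m=3)
```

Note that Nature may choose `xs` adversarially; the strategies analyzed below aim to do well against every sequence, not just random ones.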
The goal of the player is to select a distribution $q^n$ whose cumulative loss is small when compared to the best distribution (in hindsight) among the family $\{p_\theta^n : \theta\in\Theta\}$. More formally, the goal is to sequentially select a distribution $q^n$ on $\mathcal{X}^n$ with small worst-case regret:
$$r_n(q^n) = \max_{x^n\in\mathcal{X}^n}\,\max_{\theta\in\Theta}\, \log\frac{p_\theta^n(x^n)}{q^n(x^n)} = \max_{x^n\in\mathcal{X}^n}\, \log\frac{p_{\hat\theta}^n(x^n)}{q^n(x^n)},$$
where $\hat\theta$ is the maximum likelihood estimator and $x^n=(x_1,\dots,x_n)$.
The optimum worst-case regret, or the minimax regret, can then be defined as
$$r_n = \min_{q^n}\,\max_{x^n\in\mathcal{X}^n}\, \log\frac{p_{\hat\theta}^n(x^n)}{q^n(x^n)}.$$
Mixture Method
In this note, we will analyze the regret of the mixture distribution that assigns probability $\int_\Theta p_\theta^n(x^n)\, w(\theta)\,d\theta$ to $x^n$, for some prior distribution $w$ on $\Theta$. In particular, we will consider the mixture with the $\mathrm{Dirichlet}(1/2,\dots,1/2)$ prior, and denote it by $m_J(\cdot)$. Here the subscript indicates that $w(\theta)$ is the Jeffreys prior in this case.
With this choice of prior, the mixture distribution as well as its conditional distributions can be written in closed form. For any $t\ge 1$ and $x^t\in\mathcal{X}^t$, let $T_{i,t}=\sum_{j=1}^{t}\mathbf{1}\{x_j=i\}$ denote the number of occurrences of $i$ in $x^t$. Then we can write $m_J(x^n)$ as follows:
$$m_J(x^n) = \frac{D_m(T_{1,n}+1/2,\dots,T_{m,n}+1/2)}{D_m(1/2,\dots,1/2)}, \quad\text{where}\quad D_m(\alpha_1,\dots,\alpha_m) = \frac{\prod_{i=1}^m \Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i'=1}^m \alpha_{i'}\right)},$$
and $\Gamma(\cdot)$ is the Gamma function. This gives the following closed-form expression for the conditional distribution:
$$m_J(x_{t+1}=i \mid x^t) = \frac{T_{i,t}+1/2}{t+m/2},$$
which can be used to implement $m_J(x^n)$ sequentially via the identity $m_J(x^n)=\prod_{t=1}^n m_J(x_t\mid x^{t-1})$.
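To make the sequential implementation concrete, here is a short Python sketch (function names are mine) that computes $\log m_J(x^n)$ both ways, via the conditionals and via the closed form, using `math.lgamma` for $\log\Gamma$:

```python
import math

def log_dm(alphas):
    # log D_m(a_1,...,a_m) = sum_i log Gamma(a_i) - log Gamma(sum_i a_i)
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))

def log_mj_closed_form(xs, m):
    """log m_J(x^n) = log D_m(T_1+1/2,...,T_m+1/2) - log D_m(1/2,...,1/2)."""
    counts = [0] * m
    for x in xs:
        counts[x] += 1
    return log_dm([t + 0.5 for t in counts]) - log_dm([0.5] * m)

def log_mj_sequential(xs, m):
    """Accumulate log m_J(x_t | x^{t-1}) = log((T_{x_t,t-1}+1/2)/(t-1+m/2))."""
    counts = [0] * m
    total = 0.0
    for t, x in enumerate(xs):  # t symbols observed so far
        total += math.log((counts[x] + 0.5) / (t + m / 2))
        counts[x] += 1
    return total
```

The two computations agree because the telescoping product of the conditionals reproduces the ratio of $D_m$ terms exactly.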
We can now state the main result, which gives us a bound on the regret of mJ for any sequence xn.
Theorem: Suppose $C_m = \log D_m(1/2,\dots,1/2)$ and $T_{\min} = \min_i T_{i,n}$. Then, for any $x^n\in\mathcal{X}^n$, we have the following bound:
$$\log\frac{p_{\hat\theta}^n(x^n)}{m_J(x^n)} \le \frac{m-1}{2}\log\frac{n}{2\pi} + C_m + \frac{m}{4T_{\min}+2} + o(1).$$
In particular, the above bound implies that if the relative frequencies of the symbols in $x^n$ stay strictly inside the simplex (so that $\lim_{n\to\infty} T_{\min}=\infty$), then the regret incurred by $m_J$ is $\frac{m-1}{2}\log\frac{n}{2\pi} + C_m + o(1)$. As proved in Theorem 2 of Xie and Barron (2000), this equals the minimax regret $r_n$ asymptotically.
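As a quick numerical sanity check of these asymptotics (a sketch; helper names are mine), one can compute the exact regret of $m_J$ on a balanced binary sequence and compare it against $\frac{m-1}{2}\log\frac{n}{2\pi} + C_m$:

```python
import math

def log_dm(alphas):
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))

def kt_regret(counts):
    """log(p_hat(x^n) / m_J(x^n)), computed from the count vector of x^n."""
    n, m = sum(counts), len(counts)
    log_p_hat = sum(t * math.log(t / n) for t in counts if t > 0)
    log_mj = log_dm([t + 0.5 for t in counts]) - log_dm([0.5] * m)
    return log_p_hat - log_mj

# Balanced binary sequence: T_1 = T_2 = n/2, so T_min grows with n.
n, m = 1000, 2
regret = kt_regret([n // 2, n // 2])
# C_2 = log D_2(1/2, 1/2) = log(Gamma(1/2)^2 / Gamma(1)) = log(pi)
asymptote = (m - 1) / 2 * math.log(n / (2 * math.pi)) + math.log(math.pi)
```

Already at $n=1000$ the exact regret and the asymptotic expression agree to several decimal places.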
Before proceeding to the proof, we first discuss the implications of the above result for the problem of betting in horse races. Here, horse races model betting games in which, in every round, exactly one of $m$ possible outcomes occurs (i.e., exactly one out of $m$ horses wins the race).
Connections to betting
Consider $n$ rounds of horse races, each with $m$ horses numbered $1,\dots,m$. Let $X^n=(X_1,\dots,X_n)$ denote the indices of the winning horses in these $n$ rounds. Suppose a gambler begins with an initial wealth of 1 dollar, and bets a proportion $q(i\mid x^{t-1})$ of his wealth on horse $i$ to win race $t$, at odds of $O(i\mid x^{t-1})$-for-1. (Odds of $a$-for-1 on horse $i$ mean the following: the gambler pays 1 dollar before the race to bet on horse $i$; if horse $i$ wins, he gets back $a$ dollars, while if horse $i$ loses, he receives nothing.) Then his total wealth after $n$ races is
$$S_n(q, X^n) = \prod_{t=1}^{n} q(X_t\mid X^{t-1})\, O(X_t\mid X^{t-1}) = q(X^n)\, O(X^n).$$
Now, if the winning indices $X_1,\dots,X_n$ were generated i.i.d. according to a distribution $p_\theta$, then the betting strategy that maximizes the expected growth rate of the wealth process is $q^* = p_\theta$. This is known as the Kelly (or proportional) betting strategy. Indeed,
$$q^* \in \operatorname*{arg\,max}_q\; \mathbb{E}_{p_\theta}\!\left[\log\big(q(X)\,O(X)\big)\right] = \operatorname*{arg\,max}_q\; \mathbb{E}_{p_\theta}\!\left[\log\frac{q(X)}{p_\theta(X)} + \log\big(p_\theta(X)\,O(X)\big)\right] = \operatorname*{arg\,max}_q\; \Big\{\mathbb{E}_{p_\theta}\!\left[\log\big(p_\theta(X)\,O(X)\big)\right] - D_{\mathrm{KL}}(p_\theta \,\|\, q)\Big\} = \operatorname*{arg\,min}_q\; D_{\mathrm{KL}}(p_\theta \,\|\, q),$$
where the last step uses that the expectation term does not depend on $q$; this implies $q^*=p_\theta$ due to the non-negativity of the KL divergence.
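The optimality of proportional betting can also be checked numerically; the sketch below (the probabilities and odds are illustrative values of my choosing) compares the expected log growth rate of $q = p_\theta$ against some alternative bets:

```python
import math

def expected_growth(p, q, odds):
    # E_p[log(q(X) O(X))]: expected log wealth multiplier per race
    return sum(pi * math.log(qi * oi) for pi, qi, oi in zip(p, q, odds))

p = [0.6, 0.4]       # (illustrative) true win probabilities
odds = [2.0, 2.0]    # 2-for-1 odds on each horse
kelly_rate = expected_growth(p, p, odds)  # Kelly: bet q = p
```

Any $q \neq p$ loses expected growth at rate exactly $D_{\mathrm{KL}}(p\,\|\,q)$ relative to Kelly, regardless of the odds.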
Now, suppose we make no probabilistic assumptions on the process generating $X^n$. We can still define the best constant betting strategy (in hindsight) for a sequence $x^n$ as the one which maximizes the wealth:
$$\max_{\theta}\; p_\theta^n(x^n)\, O(x^n) = p_{\hat\theta}^n(x^n)\, O(x^n).$$
If the gambler follows some strategy $q^n$ for betting, then the ratio of the wealth gained by the two policies (i.e., $p_{\hat\theta}^n$ and $q^n$) is given by
$$\frac{S_n(p_{\hat\theta}^n, x^n)}{S_n(q^n, x^n)} = \frac{p_{\hat\theta}^n(x^n)}{q^n(x^n)},$$
independent of the odds offered. Hence, the problem of finding the distribution $q^n$ that minimizes this ratio in the worst case (i.e., for the worst sequence $x^n\in\mathcal{X}^n$) reduces to the sequential prediction problem considered above. In particular, the regret bound for $m_J$ stated earlier implies the following:
$$\log\frac{S_n(p_{\hat\theta}^n, x^n)}{S_n(m_J, x^n)} \le \frac{m-1}{2}\log\frac{n}{2\pi} + C_m + o(1),$$
assuming that the relative frequencies of $x^n$ lie in the interior of the simplex. This further gives us the following bound:
$$S_n(m_J, x^n) \ge S_n(p_{\hat\theta}^n, x^n)\left(\frac{2\pi}{n}\right)^{(m-1)/2} e^{-C_m - o(1)}.$$
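The claim that the wealth ratio does not depend on the odds can be verified directly; in the sketch below (names are mine), the gap $\log S_n(p_{\hat\theta}^n, x^n) - \log S_n(m_J, x^n)$ comes out identical under two different odds profiles, because $\log O(x^n)$ is common to both strategies:

```python
import math

def log_dm(alphas):
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))

def log_wealth(log_q, xs, odds):
    # log S_n(q, x^n) = log q(x^n) + sum_t log O(x_t)
    return log_q + sum(math.log(odds[x]) for x in xs)

xs, m = [0, 1, 1, 2, 0, 1], 3
n = len(xs)
counts = [xs.count(i) for i in range(m)]
log_p_hat = sum(t * math.log(t / n) for t in counts if t > 0)
log_mj = log_dm([t + 0.5 for t in counts]) - log_dm([0.5] * m)

gaps = []
for odds in ([3.0, 3.0, 3.0], [5.0, 1.5, 4.0]):  # two arbitrary odds profiles
    gaps.append(log_wealth(log_p_hat, xs, odds) - log_wealth(log_mj, xs, odds))
```

In both cases the gap equals $\log\big(p_{\hat\theta}^n(x^n)/m_J(x^n)\big)$, i.e., exactly the prediction regret.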
Proof of the Regret Bound
Throughout this section, we will write $T_i$ instead of $T_{i,n}$. To prove the result, we first note the following:
Since $\hat\theta = (T_1/n, \dots, T_m/n)$, we have
$$\log p_{\hat\theta}^n(x^n) = \sum_{i=1}^m T_i \log(T_i/n) = -n\log n + \sum_i T_i \log T_i. \tag{2}$$
For the remaining term, we use Stirling's approximation, $\Gamma(x) = x^{x-1/2}\, e^{-x}\, \sqrt{2\pi}\, e^{s_x}$ with $s_x\in(0, 1/(12x))$. Plugging this into the expression for $A = D_m(T_1+1/2,\dots,T_m+1/2)$, we get
$$A = \frac{\prod_{i=1}^m \Gamma(T_i+1/2)}{\Gamma(n+m/2)} = \frac{\prod_{i=1}^m \left( (T_i+1/2)^{T_i}\, e^{-T_i-1/2}\, \sqrt{2\pi}\, e^{s_i} \right)}{(n+m/2)^{n+(m-1)/2}\, e^{-n-m/2}\, \sqrt{2\pi}\, e^{s_n}},$$
where $s_i \in (0, 1/(12(T_i+1/2)))$ and $s_n \in (0, 1/(12(n+m/2)))$. Since the factors $e^{-n-m/2}$ in the numerator and denominator cancel, simplifying further we get
$$A = \frac{\left(\prod_{i=1}^m (T_i+1/2)^{T_i}\right)(2\pi)^{(m-1)/2}}{(n+m/2)^{n+(m-1)/2}}\; e^{\sum_i s_i - s_n},$$
which implies