Adaptive Sampling

Active Learning in Bandits and MDPs

Optimal scheme for allocating samples to learn $K$ distributions uniformly well.
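A minimal sketch of one simple instance of this problem, offered for illustration only. It assumes that "learning $K$ distributions uniformly well" means estimating each distribution's mean so that the worst-case squared error $\max_k \sigma_k^2 / n_k$ is small, with a total budget $n = \sum_k n_k$. Under that assumption, the optimum equalizes $\sigma_k^2 / n_k$ across distributions, so the oracle allocation is $n_k = n \, \sigma_k^2 / \sum_j \sigma_j^2$. Because the variances $\sigma_k^2$ are unknown, an adaptive scheme must estimate them while it samples. The names `oracle_allocation` and `adaptive_allocation` are hypothetical, and the greedy plug-in rule shown is a swapped-in heuristic, not necessarily the optimal scheme referred to above.

```python
import random
from statistics import variance
from typing import Callable, List

# A sampler draws one observation from an unknown distribution using the given RNG.
Sampler = Callable[[random.Random], float]


def oracle_allocation(variances: List[float], budget: int) -> List[float]:
    """Static allocation minimizing max_k sigma_k^2 / n_k subject to sum_k n_k = budget.

    The optimum equalizes sigma_k^2 / n_k, so n_k is proportional to sigma_k^2.
    """
    total = sum(variances)
    return [budget * v / total for v in variances]


def adaptive_allocation(samplers: List[Sampler], budget: int,
                        init_pulls: int = 2, seed: int = 0) -> List[int]:
    """Hypothetical plug-in heuristic (assumed squared-error objective on means):
    repeatedly sample the distribution whose empirical mean currently has the
    largest estimated variance s_k^2 / n_k.

    Returns the number of samples allocated to each distribution.
    """
    if init_pulls < 2:
        raise ValueError("need at least 2 initial samples to estimate a variance")
    if budget < init_pulls * len(samplers):
        raise ValueError("budget too small for the initial round")
    rng = random.Random(seed)
    # Initial round: a few samples from every distribution.
    samples = [[draw(rng) for _ in range(init_pulls)] for draw in samplers]
    for _ in range(budget - init_pulls * len(samplers)):
        # Estimated squared error of each empirical mean.
        scores = [variance(xs) / len(xs) for xs in samples]
        k = max(range(len(samples)), key=scores.__getitem__)
        samples[k].append(samplers[k](rng))
    return [len(xs) for xs in samples]
```

For example, with three Gaussians of standard deviations 1, 2 and 4 and a budget of 2100, `oracle_allocation([1.0, 4.0, 16.0], 2100)` gives `[100.0, 400.0, 1600.0]`, and the adaptive counts should typically land near these values. A purely greedy plug-in rule can starve a distribution whose early variance estimate happens to be very low; UCB-style variants guard against this by adding a confidence bonus to each score.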