## Empirical Bayes Estimation of Binomial Proportion using R

Binomial distribution pops up in our problems daily, given that the number of occurrences $k$ of events with probability $p$ in a sequence of size $n$ can be described as

$k \sim Binomial(n,p)$

Question that naturally arises in this context is - given observations of $k$ and $n$ , how do we estimate $p$ ? One might say that simply computing $p = {k \over n}$ should be enough, since that's both uniformly minimum variance and a maximum likelihood estimator. However, such estimator might be misleading when $n$ is small (as anyone trying to estimate clickthrough rates from small number of impressions can testify).

The question is - what can we do ? One thing that naturally comes to mind is incorporating any prior knowledge about the distribution we might have. A wise choice for prior of binomial distribution is usually Beta distribution, not just because of it's convenience (given that it's conjugate prior), but also because of flexibility in incorporating different distribution shapes

In this context, we can express our model as:

$k_i \sim Binomial(n_i, p_i)$

$p_i \sim Beta(a, b), i = 1...N$

where $N$ is total number of observations and $a$ and $b$ are parameters to be estimated. Such model is also called Empirical Bayes. Unlike traditional Bayes, in which we pull prior distribution and it's parameters out of the thin air, Empirical Bayes estimates prior parameters from the data.

In order to estimate parameters of the prior, we calculate marginal distribution as

$m(k|a,b) = \int \prod_{i=1}^{N} f(k_i|p)\pi(p|a,b)dp = \prod_{i=1}^N {n_i \choose k_i} {{\Gamma(a+b)\Gamma(a+k_i)\Gamma(n_i-k_i+b)}\over{\Gamma(a)\Gamma(b)\Gamma(a+b+n_i)}}$

where $f$ and $\pi$ are density functions of binomial and beta distributions, respectively. Parameter estimates $\hat{a}$ and $\hat{b}$ can be obtained by maximizing the log likelihood of the marginal distribution.

Finally, Empirical Bayes estimator can be constructed as expectation of posterior distribution:

$\hat{p_i} = E(p_i|k_i, \hat{a}, \hat{b}) = {{\hat{a}+k_i}\over{\hat{a} + \hat{b} + n_i}}$

Pretty easy. The real question is - how do we do this in practice ? It turns out that there is no off-the-shelf package in R for doing this, so we have built one. It relies pretty heavily on fitdistrplus package and there are certainly a number of things to be improved, but it's a start. You can grab it at our Github repository.