bayes-rules-notes/R/chapter-3-beta-binomial/ch3-beta-binomial.qmd

---
title: "Chapter 3 Beta-Binomial Bayesian Model Notes"
author: "Emanuel Rodriguez"
execute:
    message: false
    warning: false
format:
    html:
        monofont: "Cascadia Mono"
        highlight-style: gruvbox-dark
        css: styles.css
        callout-icon: false
        callout-apperance: simple
        toc: false
        html-math-method: katex
---

```{r}
library(bayesrules)
library(tidyverse)
library(patchwork)
```

The chapter is set up with an example of polling results. We are put into
the scenario where we are managig the campaing for a candidate. We know
that on average her support based on recent polls is around 45%. In the
next few sections we'll work through our Bayesian framework and incorporate
a new tool the **Beta-Binomial** model. This model will take develop a
continuous prior, as opposed to the discrete one's we've been working with
so far.


## The Beta prior

:::{.callout-note}
## Probability Density Function

Let $\pi$ be a continuous random variable with probability density
function (pdf) $f(\pi)$. Then $f(\pi)$ has the following properties:

1. $f(\pi) \geq 0$
2. $\int_{\pi}f(\pi)d\pi = 1$ (this is analogous to $\sum$ in the case of pmfs)
3. $P(a < \pi < b) = \int_a^bf(\pi)d\pi$ when $a\leq b$
:::

:::{.callout-tip icon="true"}
a quick note on (1) above. Note that it does not place a restriction on
$f(\pi)$ being less than 1. This means that we can't interpret values of
$f$ as probabilities, we can however use to interpret plausability of
two different events, the greater the value of $f$ the more plausible.
To calculate probabilities using $f$ we must determine the area under the
curve it defines, as shown in (3).
:::

```{r}

x <- seq(0, 1, by = .05)
y1 <- dbeta(x=x, 5, 5)
y2 <- dbeta(x=x, 5, 1)
y3 <- dbeta(x=x, 1, 5)

d <- tibble(
    x,
    `beta(5, 1)`=y2,
    `beta(5, 5)`=y1,
    `beta(1, 5)`=y3
) |>
    pivot_longer(names_to = "beta_shape", values_to="beta",
    -x) |>
    mutate(beta_shape=factor(beta_shape,
                            levels=c("beta(5, 1)",
                                    "beta(5, 5)",
                                    "beta(1, 5)")))
```

```{r}
#| label: fig-beta-shapes
#| fig-cap: The basic shapes of beta based on the hyperparameters

ggplot(data=d, aes(x, beta)) + geom_point() +
    geom_line() +
    facet_wrap(vars(beta_shape))
```

In general the shape of the beta distribution is skewed-left when
$\alpha > \beta$, symmetrical when $\alpha = \beta$ and skewed-right
when $\alpha < \beta$, see @fig-beta-shapes.

:::{.callout-note}
## The Standard Uniform

When $\pi$ can take equally take on any value between 0 and 1,
we can model $\pi$ using the standard uniform model.

$$\pi \sim Unif(0, 1)$$

the pdf of $Unif(0, 1)$ is $f(\pi) = 1$

Note that $Unif(0, 1)$ is just a special case of the Beta with
hyperparameters $\alpha = \beta = 1$, see @fig-std-unif-as-beta
:::

```{r}
#| label: fig-std-unif-as-beta
#| fig-cap: "The standard uniform is a special case of the beta distrubtion with a = b = 1"
std_unif <- tibble(
    x, `beta(1, 1)`=dbeta(x, 1, 1)
)

ggplot(data=std_unif, aes(x, `beta(1, 1)`)) +
    geom_point() +
    geom_line()
```

### Mean and Mode of the Beta

The mean and mode are both measures of centrality. The mean is average
value the mode is the most "common", in the case of pmf this is just
the value that occurs the most in the pdf its the max value.

The formulations of these for the beta are:

$$E(\pi) = \frac{\alpha}{\alpha + \beta}$$
$$\text{Mode}(\pi) = \frac{\alpha - 1}{\alpha + \beta -2}\;\; \text{when} \;\;\alpha,\beta > 1$$


When can also measure the variability of $\pi$. Take @fig-beta-vars
we can see the variability of $\pi$ differ based on the values
$\alpha, \beta$.

```{r}
#| label: fig-beta-vars
#| fig-cap: "Two symmetrical shapes of beta with different variance"
beta_variances <- tibble(
    x,
    `beta(5, 5)`=dbeta(x, 5, 5),
    `beta(20, 20)`=dbeta(x, 20, 20)
) |>
    pivot_longer(names_to = "beta_shape", values_to = "beta", -x)

ggplot(data=beta_variances, aes(x, beta)) +
    geom_point() +
    geom_line() +
    facet_wrap(vars(beta_shape))
```

We can formulate the variance of $Beta(\alpha, \beta)$ with

$$Var(\pi) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha+\beta+1)}$$

it follows that

$$SD = \sqrt{Var(\pi)}$$


### Tuning the beta prior

Now that we know we can use the beta model to represent
$\pi$ for $\pi \in [0, 1]$, what we need to try and do is
tune the model so that it best represents our prior. Our
options of course are chaning values of $\alpha$ and $\beta$.
We can make use of the fact that we know that on average
the candidate is polling at around 45%. That is we know
that

$$E(\pi) \approx .45 \approx \frac{\alpha}{\alpha + \beta}$$

*so I tried to work through the algebra myself, but was
unable to do so, the book gets to the following conclusion*

$$\alpha = \frac{9}{11}\beta$$


```{r}
p <- plot_beta(9, 11) +
    labs(title="Beta(9,11)") +
    scale_x_continuous(breaks=seq(0, 1, by=.1))
p2 <- plot_beta(27, 33) +
    labs(title="Beta(27,33)") +
    scale_x_continuous(breaks=seq(0, 1, by=.1))
p3 <- plot_beta(45, 55) +
    labs(title="Beta(45,55)") +
    scale_x_continuous(breaks=seq(0, 1, by=.1))

p / p2 / p3
```

when compared to the original polling values, we can see that the closest
of these is $Beta(45, 55)$

we can get the pdf, and then mean, mode and variance of the pdf.

$$f(\pi) = \frac{\Gamma(100)}{\Gamma(45)\Gamma(55)}\pi^{44}(1-\pi)^{54}$$

$$E(\pi) = \frac{45}{45 + 55} = .4500$$
$$\text{Mode}(\pi) = \frac{45 - 1}{45 + 55 - 2} = .449$$
$$ SD(\pi) = .05$$


```{r}
#| label: fig-betas-for-poll
#| fig-caption: "the prior distrubtions with Y = 30 hihglighted"

rb1 <- rbinom(10000, 50, .2)
rb2 <- rbinom(10000, 50, .3)
rb3 <- rbinom(10000, 50, .5)
rb4 <- rbinom(10000, 50, .6)
rb5 <- rbinom(10000, 50, .7)
rb6 <- rbinom(10000, 50, .8)

results <- tibble(
    `Bin(50, .2)` = rb1,
    `Bin(50, .3)` = rb2,
    `Bin(50, .5)` = rb3,
    `Bin(50, .6)` = rb4,
    `Bin(50, .7)` = rb5,
    `Bin(50, .8)` = rb6,
) |>
    pivot_longer(names_to = "bin_model", values_to = "value",
    `Bin(50, .2)`:`Bin(50, .8)`) |>
    mutate(is_30 = value == 30)

results |>
    ggplot(aes(value, fill=is_30)) + geom_bar(stat = "count") +
    scale_fill_manual(values = c("grey30", "darkblue"), guide="none") +
    facet_wrap(vars(bin_model))

```

After the finish the poll, we see that the actual value of
$Y$ is 30. The vertical line in @fig-betas-for-poll


```{r}
results <- tibble(
    ys = 1:50,
    `beta(50, .1)` = dbinom(x = ys, size = 50, prob = .1),
    `beta(50, .2)` = dbinom(x = ys, size = 50, prob = .2),
    `beta(50, .3)` = dbinom(x = ys, size = 50, prob = .3),
    `beta(50, .4)` = dbinom(x = ys, size = 50, prob = .4),
    `beta(50, .5)` = dbinom(x = ys, size = 50, prob = .5),
    `beta(50, .6)` = dbinom(x = ys, size = 50, prob = .6),
    `beta(50, .7)` = dbinom(x = ys, size = 50, prob = .7),
    `beta(50, .8)` = dbinom(x = ys, size = 50, prob = .8),
    `beta(50, .9)` = dbinom(x = ys, size = 50, prob = .9),
) |>
    pivot_longer(names_to = "beta_model", values_to="value", -ys) |>
    mutate(
        is_30 = ys == 30,
        pie = rep(c(.1, .2, .3, .4, .5, .6, .7, .8, .9), 50))

results |> ggplot(aes(ys, value, fill = is_30)) +
    geom_col() +
    scale_fill_manual(values = c("darkgray", "darkblue"), guide="none") +
    facet_wrap(vars(beta_model))
```


```{r}
results |>
    filter(is_30) |>
    ggplot(aes(pie, value)) + geom_col() +
    scale_x_continuous(breaks = seq(.1, .9, by = .1))
```


```{r}
summarize_beta_binomial(alpha=45, beta=55, n=50, y=30)
```

## Simulation

In this simulation we first generate 10,000 random values from
a beta distribution with parameter $\alpha = 45, \beta = 55$. Recall,
that this is what we believe to be support based on recent polls for the
candidate, and therefore our prior. We
use each one of these as the `prob` value (the $\pi$ value) into
10,000 more random samples from `rbinom` with
$n = 50$. This binomial distribution is how we model the likelihood. Our
simualtion will

```{r}
sim <- tibble(
    pi = rbeta(10000, 45, 55)
) |>
    mutate(y = rbinom(10000, size = 50, prob = pi))
```


```{r}
sim |>
    ggplot(aes(pi, y)) + geom_point(aes(color = (y == 30))) +
    scale_color_manual(values = c("darkgrey", "navyblue"))
```


```{r}
sim |>
    filter(y == 30) |> # the posterior
    ggplot(aes(pi)) + geom_density()
```