more work
R/ch2.qmd
@@ -339,7 +339,7 @@ in the book that we will learn how to build these later on):

|$\pi$   |0.2 |0.5 |0.8 | Total |
|--------|----|----|----|-------|
|$f(\pi)$|.10 |.25 |.65 | 1     |
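
To make the prior concrete, here is a minimal sketch in R that stores the pmf from the table above and verifies it behaves like one:

```{r}
# prior pmf over the three candidate values of pi, taken from the table above
prior <- c("0.2" = .10, "0.5" = .25, "0.8" = .65)

# a valid pmf must sum to 1
sum(prior)
```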

:::{.callout-tip}
## Note

It's important to note here that the sum of the values of $\pi$ **do
@@ -364,14 +364,15 @@ and has the following properties
:::

:::{.callout-tip}
## In Emanuel's words

What does this mean? Well, it's very straightforward: a pmf is a function that
takes in some value $y$ and outputs the probability that the random variable
$Y$ equals $y$.
:::
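
For instance, here is a minimal sketch of a pmf in R, using a fair six-sided die (a toy example for illustration, not from the book):

```{r}
# pmf of a fair six-sided die: P(Y = y) = 1/6 for y in 1, ..., 6 and 0 otherwise
die_pmf <- function(y) ifelse(y %in% 1:6, 1 / 6, 0)

die_pmf(3)          # the probability that Y equals 3
sum(die_pmf(1:6))   # a pmf sums to 1 over all possible values
```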
### The Binomial Model

Next we would like to add the dependency of $Y$ on $\pi$; we do so by
introducing the conditional pmf.

:::{.callout-note}
## Conditional probability model of data $Y$
@@ -388,8 +389,158 @@ and has the following properties,
2. $\sum_{\forall y}f(y|\pi) = 1$
:::

:::{.callout-tip}
## In Emanuel's words
This is essentially the same probability model we defined above, except
now we are conditioning probabilities on some parameter $\pi$.
:::
:::

In the example of the chess player we must make some assumptions:

1. The chances of winning any match stay constant. So if the human has a
   .65 probability of winning match 1, then the same probability holds for
   matches 2 through 6.
2. Winning or losing a game does not affect the chances of winning or losing
   the next game, i.e., the matches are independent of one another (see the
   simulation sketch after this list).
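
To internalize these assumptions, here is a toy simulation (an illustration, not from the book) of a 6-match contest in which every match is independent and won with the same probability of .65:

```{r}
set.seed(1)  # for reproducibility of this toy example

# 6 independent matches, each won with the constant probability .65
wins <- rbinom(n = 6, size = 1, prob = .65)
wins

# the total number of wins, which is exactly what the binomial model describes
sum(wins)
```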
These two assumptions lead us to the **Binomial Model**.
:::{.callout-note}
## The Binomial Model
Let the random variable $Y$ represent the number of successes in $n$ trials.
Assume that each trial is independent, and the probability of success in a
given trial is $\pi$. Then the conditional dependence of $Y$ on $\pi$ can
be modeled by the **Binomial Model** with parameters $n$ and $\pi$. We can
write this as,

$$Y|\pi \sim Bin(n, \pi)$$

The binomial model is specified by the pmf:

$$f(y|\pi) = {n \choose y} \pi^y(1 - \pi)^{n-y} \;\;\text{for } y \in \{0, 1, \ldots, n\}$$
:::

Knowing this, we can represent $Y$, the total number of matches out of 6
that the human can win, as

$$Y|\pi \sim Bin(6, \pi)$$

with conditional pmf:

$$f(y|\pi) = {6 \choose y}\pi^y(1 - \pi)^{6 - y}\;\; \text{for } y \in \{0, 1, 2, 3, 4, 5, 6\}$$

With the pmf we can now determine the probability of the human winning $y$
matches out of 6 for any given value of $\pi$.

```{r}
# binomial pmf: the probability of y wins out of n matches, given a
# per-match win probability p
chess_pmf <- function(y, p, n = 6) {
  choose(n, y) * (p ^ y) * (1 - p)^(n - y)
}

# probability that the human wins 5 of the 6 games given a pi value of .8
chess_pmf(y = 5, p = .8)
```
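
As a sanity check, R ships with this pmf as `dbinom()`, so our hand-rolled version should agree with it:

```{r}
# compare our implementation against R's built-in binomial pmf
all.equal(chess_pmf(y = 5, p = .8), dbinom(x = 5, size = 6, prob = .8))
```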
:::{.callout-tip}
## Intuition behind the formula
The formula for the binomial is actually pretty intuitive. First you have
the scalar ${n \choose y}$, which counts the total number of ways
the player can win $y$ games out of the possible $n$. This is multiplied
by the probability of the $y$ successes, since $\pi^y$ can be
re-written as $\pi\times \pi\times \cdots \times \pi$, and then by the
probability of the $n-y$ failures, $(1 - \pi)^{n - y}$.
:::
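
We can make this decomposition concrete for $y = 5$ wins with $\pi = .8$:

```{r}
ways <- choose(6, 5)          # number of ways to win 5 of the 6 matches
p_wins <- .8^5                # probability of the 5 wins
p_losses <- (1 - .8)^(6 - 5)  # probability of the 1 loss

ways * p_wins * p_losses      # identical to chess_pmf(y = 5, p = .8)
```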
```{r}
# attach ggplot2 for plotting (assuming it is not already loaded in a setup chunk)
library(ggplot2)

# evaluate f(y = 4 | pi) across a fine grid of pi values
pies <- seq(0, 1, by = .05)
py <- chess_pmf(y = 4, p = pies)
d <- data.frame(pies = pies, py = py)
d |>
  ggplot(aes(pies, py)) + geom_col()
```
```{r}
# evaluate the pmf for every combination of the three pi values and y = 0, ..., 6
pies <- c(.2, .5, .8)
ys <- 0:6
d <- tidyr::expand_grid(pies, ys)
fys <- purrr::map2_dbl(d$ys, d$pies, chess_pmf, n = 6)
d$fys <- fys
d$display_pi <- as.factor(paste("pi =", d$pies))
d |>
  ggplot(aes(x = ys, y = fys)) +
  geom_col() +
  scale_x_continuous(breaks = 0:6) +
  facet_wrap(vars(display_pi))
```
The plot shows the three possible values of $\pi$ along
with the value of the pmf for each possible number of
matches the human can win. The values of $f(y|\pi)$
are pretty intuitive: we would expect the random variable $Y$
to be lower when the value of $\pi$ is lower and higher when
the value of $\pi$ is higher.

For the sake of the exercise, let's add more values of $\pi$
so that we can see this shift happen in more detail.

```{r}
pies <- seq(.1, .9, by = .1)
ys <- 0:6
d <- tidyr::expand_grid(pies, ys)
fys <- purrr::map2_dbl(d$ys, d$pies, chess_pmf, n = 6)
d$fys <- fys
d$display_pi <- as.factor(paste("pi =", d$pies))
d |>
  ggplot(aes(x = ys, y = fys)) +
  geom_col() +
  scale_x_continuous(breaks = 0:6) +
  facet_wrap(vars(display_pi), nrow = 3)
```
As it turns out, the human ended up winning just
one game in the 1997 rematch, so $Y = 1$. The next step in our
analysis is to determine how compatible this new data is with
each value of $\pi$; that is, the likelihood.

This is very easy to do with all the work we have done so far:
```{r}
d |>
  dplyr::filter(ys == 1) |>
  ggplot(aes(pies, fys)) +
  geom_col() +
  scale_x_continuous(breaks = seq(.1, .9, by = .1))
```
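
We can also read the most compatible value straight off the grid; a quick sketch using `dplyr::slice_max()`:

```{r}
# the value of pi under which observing Y = 1 is most likely
d |>
  dplyr::filter(ys == 1) |>
  dplyr::slice_max(fys, n = 1)
```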
It's very important to note the following:
```{r}
# this will sum to a value greater than 1!!
d |>
  dplyr::filter(ys == 1) |>
  dplyr::pull(fys) |>
  sum()
```
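
Contrast this with summing over $y$ for a fixed $\pi$, which does satisfy the pmf property $\sum_{\forall y}f(y|\pi) = 1$:

```{r}
# for a fixed pi, summing f(y | pi) over all y = 0, ..., 6 gives exactly 1
d |>
  dplyr::filter(dplyr::near(pies, .5)) |>  # near() avoids floating-point equality issues
  dplyr::pull(fys) |>
  sum()
```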
:::{.callout-important icon="true"}
This has been mentioned before, but it's an important message
to drive home. Note that the reason why these values sum to a
value greater than 1 is that they are **not** probabilities; they
are likelihoods. We are determining how likely each value of
$\pi$ is given that we have observed $Y = 1$.
:::