more work

2022-09-11 01:27:09 -07:00
parent 838f9e05f3
commit 0d08907a15
8 changed files with 399 additions and 114 deletions

R/ch2.qmd

@@ -339,7 +339,7 @@ in the book that we will learn how to build these later on):
|$\pi$   |0.2 |0.5 |0.8 |Total |
|--------|----|----|----|------|
|$f(\pi)$|.10 |.25 |.65 | 1    |
:::{.callout-tip}
## Note
It's important to note here that the values of $\pi$ themselves **do
not** need to sum to 1; it is the prior probabilities $f(\pi)$ that must.
@@ -364,14 +364,15 @@ and has the following properties
:::
:::{.callout-tip}
## In Emanuel's words
What does this mean? It's very straightforward: a pmf is a function that
takes in some value $y$ and outputs the probability that the random
variable $Y$ equals $y$.
:::
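To make this concrete, here is a toy pmf (a made-up illustration, not part
of the chess example): a fair six-sided die, where every face has
probability $1/6$.
```{r}
# a toy pmf: takes a value y, returns P(Y = y);
# values the die can never show get probability 0
die_pmf <- function(y) ifelse(y %in% 1:6, 1 / 6, 0)
die_pmf(3)          # 1/6
die_pmf(7)          # 0
sum(die_pmf(1:6))   # the pmf values sum to 1
```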
### The Binomial Model
Next we would like to add the dependency of $Y$ on $\pi$; we do so by
introducing the conditional pmf.
:::{.callout-note}
## Conditional probability model of data $Y$
@@ -388,8 +389,158 @@ and has the following properties,
2. $\sum_{\forall y}f(y|\pi) = 1$
:::
:::{.callout-tip}
## In Emanuel's words
This is essentially the same probability model we defined above, except
now we are conditioning the probabilities on some parameter $\pi$.
:::
In the example of the chess player we must make two assumptions:
1. The chances of winning any match stay constant. So if the human has a
0.65 probability of winning match 1, then that same probability holds
for matches 2 through 6.
2. Winning or losing a game does not affect the chances of winning
or losing the next game, i.e., the matches are independent of one another.
These two assumptions lead us to the **Binomial Model**.
:::{.callout-note}
## The Binomial Model
Let the random variable $Y$ represent the number of successes in $n$ trials.
Assume that each trial is independent, and the probability of success in a
given trial is $\pi$. Then the conditional dependence of $Y$ on $\pi$ can
be modeled by the **Binomial Model** with parameters $n$ and $\pi$. We can
write this as,
$$Y|\pi \sim Bin(n, \pi)$$
The binomial model is specified by the pmf:
$$f(y|\pi) = {n \choose y} \pi^y(1 - \pi)^{n-y}$$
:::
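Before using the formula, here is a quick simulation sketch of where it
comes from (the value $\pi = .8$ is picked purely for illustration): play
6 independent matches, each won with probability .8, many times over, and
compare the share of runs with each number of wins to base R's built-in
binomial pmf, `dbinom`.
```{r}
set.seed(1)
# one run = 6 independent matches, each won with probability .8;
# repeat 10,000 times and count the wins in each run
wins <- replicate(10000, sum(runif(6) < .8))
sim <- as.numeric(table(factor(wins, levels = 0:6))) / 10000
out <- rbind(simulated = sim, binomial = dbinom(0:6, size = 6, prob = .8))
colnames(out) <- 0:6
round(out, 3)
```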
Knowing this, we can represent $Y$, the total number of matches out of 6
that the human can win, as
$$Y|\pi \sim Bin(6, \pi)$$
with conditional pmf:
$$f(y|\pi) = {6 \choose y}\pi^y(1 - \pi)^{6 - y}\;\; \text{for } y \in \{0, 1, 2, 3, 4, 5, 6\}$$
With the pmf we can now determine the probability of the human winning $Y$
matches out of 6 for any given value of $\pi$:
```{r}
# the binomial pmf for our chess example, with n = 6 matches by default
chess_pmf <- function(y, p, n = 6) {
  choose(n, y) * (p ^ y) * (1 - p)^(n - y)
}
# the probability that the human wins 5 of the 6 games given a pi value of .8
chess_pmf(y = 5, p = .8)
```
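As a quick sanity check, `chess_pmf` should agree with base R's `dbinom`,
which implements the same binomial pmf:
```{r}
# dbinom(x, size, prob) is the binomial pmf built into base R
dbinom(x = 5, size = 6, prob = .8)  # same value as chess_pmf(y = 5, p = .8)
```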
:::{.callout-tip}
## Intuition
The formula for the binomial pmf is actually pretty intuitive. First you have
the binomial coefficient ${n \choose y}$, which counts the total number of
ways the player can win $y$ games out of the possible $n$. This is then
multiplied by the probability of the $y$ successes, since $(p ^ y)$ can be
re-written as $p\times p\times \cdots \times p$, and finally by the
probability of the $n-y$ failures, $(1 - p)^{n - y}$.
:::
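Breaking the calculation for $y = 5$ wins out of $n = 6$ into its three
pieces makes this concrete:
```{r}
choose(6, 5)        # number of orderings of 5 wins and 1 loss
.8 ^ 5              # probability of the 5 wins
(1 - .8) ^ (6 - 5)  # probability of the 1 loss
choose(6, 5) * .8 ^ 5 * (1 - .8) ^ (6 - 5)  # equals chess_pmf(y = 5, p = .8)
```
Next, let's fix $y = 4$ and look at how $f(4|\pi)$ changes across a grid
of $\pi$ values: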
```{r}
# f(y = 4 | pi) across a grid of pi values
pies <- seq(0, 1, by = .05)
py <- chess_pmf(y = 4, p = pies)
d <- data.frame(pies = pies, py = py)
d |>
  ggplot(aes(pies, py)) + geom_col()
```
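Reading the peak of that plot off numerically:
```{r}
# the grid value of pi under which winning 4 of 6 games is most likely
pies[which.max(py)]  # close to the intuitive answer 4/6
```
To compare whole distributions instead, we can plot the pmf over all values
of $y$ for each of the three candidate values of $\pi$: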
```{r}
pies <- c(.2, .5, .8)
ys <- 0:6
# every combination of a pi value and a number of wins y
d <- tidyr::expand_grid(pies, ys)
# evaluate the pmf at each (y, pi) pair; n = 6 is forwarded to chess_pmf
fys <- purrr::map2_dbl(d$ys, d$pies, chess_pmf, n = 6)
d$fys <- fys
d$display_pi <- as.factor(paste("pi =", d$pies))
d |>
  ggplot(aes(x = ys, y = fys)) +
  geom_col() +
  scale_x_continuous(breaks = 0:6) +
  facet_wrap(vars(display_pi))
```
The plot shows, for each of the three possible values of $\pi$, the
value of the pmf at every number of matches the human can win out of
the 6. The values of $f(y|\pi)$ are pretty intuitive: we would expect
the random variable $Y$ to be lower when the value of $\pi$ is lower
and higher when the value of $\pi$ is higher.
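One way to quantify this shift: the mean of a $Bin(n, \pi)$ random
variable is $n\pi$, which we can check directly from the pmf.
```{r}
# E[Y | pi] is the sum over y of y * f(y | pi), which works out to n * pi
sum(0:6 * chess_pmf(y = 0:6, p = .2))  # 6 * .2 = 1.2
sum(0:6 * chess_pmf(y = 0:6, p = .8))  # 6 * .8 = 4.8
```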
For the sake of the exercise, let's add more values of $\pi$
so that we can see this shift happen in more detail.
```{r}
pies <- seq(.1, .9, by = .1)
ys <- 0:6
d <- tidyr::expand_grid(pies, ys)
# again, n = 6 is forwarded to chess_pmf
fys <- purrr::map2_dbl(d$ys, d$pies, chess_pmf, n = 6)
d$fys <- fys
d$display_pi <- as.factor(paste("pi =", d$pies))
d |>
  ggplot(aes(x = ys, y = fys)) +
  geom_col() +
  scale_x_continuous(breaks = 0:6) +
  facet_wrap(vars(display_pi), nrow = 3)
```
As it turns out, the human ended up winning just one game in the 1997
rematch, so $Y = 1$. The next step in our analysis is to determine how
compatible this new data is with each value of $\pi$; that is, the
likelihood of each value of $\pi$.
This is very easy to do with all the work we have done so far:
```{r}
# the likelihood of each pi value given the observed data Y = 1
d |>
  dplyr::filter(ys == 1) |>
  ggplot(aes(pies, fys)) +
  geom_col() +
  scale_x_continuous(breaks = seq(.1, .9, by = .1))
```
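We can also read the winner off the grid directly; this is a small sketch
using dplyr's `slice_max`:
```{r}
# the grid value of pi that maximizes the likelihood of Y = 1
d |>
  dplyr::filter(ys == 1) |>
  dplyr::slice_max(fys, n = 1)
```
The maximizing grid value, $\pi = .2$, sits close to the maximum
likelihood estimate $y/n = 1/6$.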
It's very important to note the following:
```{r}
# careful: these likelihood values will sum to more than 1!
d |>
  dplyr::filter(ys == 1) |>
  dplyr::pull(fys) |>
  sum()
```
:::{.callout-important icon="true"}
This has been mentioned before, but it's an important message
to drive home. Note that the reason these values sum to a value
greater than 1 is that they are **not** probabilities; they are
likelihoods. We are determining how likely each value of
$\pi$ is given that we have observed $Y = 1$.
:::
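For contrast, fixing $\pi$ and summing the pmf over all possible values
of $y$ does give exactly 1, as a probability model must:
```{r}
# for a fixed pi, the pmf values across y = 0, ..., 6 sum to exactly 1
sum(chess_pmf(y = 0:6, p = .2))
```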