more work
R/ch2.qmd
@@ -339,7 +339,7 @@ in the book that we will learn how to build these later on):

|$\pi$   |0.2 |0.5 |0.8 | Total |
|--------|----|----|----|-------|
|$f(\pi)$|.10 |.25 |.65 | 1     |
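
To make the prior concrete, here is a minimal sketch in R that stores the pmf from the table above and verifies it behaves like one:

```{r}
# prior pmf over the three candidate values of pi, taken from the table above
prior <- c("0.2" = .10, "0.5" = .25, "0.8" = .65)

# a valid pmf must sum to 1
sum(prior)
```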

:::{.callout-tip}
## Note

It's important to note here that the sum of the values of $\pi$ **do
@@ -364,14 +364,15 @@ and has the following properties
:::

:::{.callout-tip}
## In Emanuel's words

What does this mean? Well, it's very straightforward: a pmf is a function that
takes in some value $y$ and outputs the probability that the random variable
$Y$ equals $y$.
:::
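
For instance, here is a minimal sketch of a pmf in R, using a fair six-sided die (a toy example for illustration, not from the book):

```{r}
# pmf of a fair six-sided die: P(Y = y) = 1/6 for y in 1, ..., 6 and 0 otherwise
die_pmf <- function(y) ifelse(y %in% 1:6, 1 / 6, 0)

die_pmf(3)          # the probability that Y equals 3
sum(die_pmf(1:6))   # a pmf sums to 1 over all possible values
```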
### The Binomial Model

Next we would like to add the dependency of $Y$ on $\pi$; we do so by
introducing the conditional pmf.

:::{.callout-note}
## Conditional probability model of data $Y$
@@ -388,8 +389,158 @@ and has the following properties,
2. $\sum_{\forall y}f(y|\pi) = 1$
:::

:::{.callout-tip}
## In Emanuel's words
This is essentially the same probability model we defined above, except
now we are conditioning probabilities on some parameter $\pi$.
:::
:::

In the example of the chess player we must make some assumptions:

1. The chances of winning any match stay constant. So if the human has a
   .65 probability of winning match 1, then the same probability holds for
   matches 2 through 6.
2. Winning or losing a game does not affect the chances of winning or losing
   the next game, i.e., the matches are independent of one another (see the
   simulation sketch after this list).
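
To internalize these assumptions, here is a toy simulation (an illustration, not from the book) of a 6-match contest in which every match is independent and won with the same probability of .65:

```{r}
set.seed(1)  # for reproducibility of this toy example

# 6 independent matches, each won with the constant probability .65
wins <- rbinom(n = 6, size = 1, prob = .65)
wins

# the total number of wins, which is exactly what the binomial model describes
sum(wins)
```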
These two assumptions lead us to the **Binomial Model**.
:::{.callout-note}
## The Binomial Model
Let the random variable $Y$ represent the number of successes in $n$ trials.
Assume that each trial is independent, and the probability of success in a
given trial is $\pi$. Then the conditional dependence of $Y$ on $\pi$ can
be modeled by the **Binomial Model** with parameters $n$ and $\pi$. We can
write this as,

$$Y|\pi \sim Bin(n, \pi)$$

The binomial model is specified by the pmf:

$$f(y|\pi) = {n \choose y} \pi^y(1 - \pi)^{n-y} \;\;\text{for } y \in \{0, 1, \ldots, n\}$$
:::

Knowing this, we can represent $Y$, the total number of matches out of 6
that the human can win, as

$$Y|\pi \sim Bin(6, \pi)$$

with conditional pmf:

$$f(y|\pi) = {6 \choose y}\pi^y(1 - \pi)^{6 - y}\;\; \text{for } y \in \{0, 1, 2, 3, 4, 5, 6\}$$

With the pmf we can now determine the probability of the human winning $y$
matches out of 6 for any given value of $\pi$.

```{r}
# binomial pmf: the probability of y wins out of n matches, given a
# per-match win probability p
chess_pmf <- function(y, p, n = 6) {
  choose(n, y) * (p ^ y) * (1 - p)^(n - y)
}

# probability that the human wins 5 of the 6 games given a pi value of .8
chess_pmf(y = 5, p = .8)
```
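
As a sanity check, R ships with this pmf as `dbinom()`, so our hand-rolled version should agree with it:

```{r}
# compare our implementation against R's built-in binomial pmf
all.equal(chess_pmf(y = 5, p = .8), dbinom(x = 5, size = 6, prob = .8))
```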
:::{.callout-tip}
## Intuition behind the formula
The formula for the binomial is actually pretty intuitive. First you have
the scalar ${n \choose y}$, which counts the total number of ways
the player can win $y$ games out of the possible $n$. This is multiplied
by the probability of the $y$ successes, since $\pi^y$ can be
re-written as $\pi\times \pi\times \cdots \times \pi$, and then by the
probability of the $n-y$ failures, $(1 - \pi)^{n - y}$.
:::
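
We can make this decomposition concrete for $y = 5$ wins with $\pi = .8$:

```{r}
ways <- choose(6, 5)          # number of ways to win 5 of the 6 matches
p_wins <- .8^5                # probability of the 5 wins
p_losses <- (1 - .8)^(6 - 5)  # probability of the 1 loss

ways * p_wins * p_losses      # identical to chess_pmf(y = 5, p = .8)
```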
```{r}
# attach ggplot2 for plotting (assuming it is not already loaded in a setup chunk)
library(ggplot2)

# evaluate f(y = 4 | pi) across a fine grid of pi values
pies <- seq(0, 1, by = .05)
py <- chess_pmf(y = 4, p = pies)
d <- data.frame(pies = pies, py = py)
d |>
  ggplot(aes(pies, py)) + geom_col()
```
```{r}
# evaluate the pmf for every combination of the three pi values and y = 0, ..., 6
pies <- c(.2, .5, .8)
ys <- 0:6
d <- tidyr::expand_grid(pies, ys)
fys <- purrr::map2_dbl(d$ys, d$pies, chess_pmf, n = 6)
d$fys <- fys
d$display_pi <- as.factor(paste("pi =", d$pies))
d |>
  ggplot(aes(x = ys, y = fys)) +
  geom_col() +
  scale_x_continuous(breaks = 0:6) +
  facet_wrap(vars(display_pi))
```
The plot shows the three possible values of $\pi$ along
with the value of the pmf for each possible number of
matches the human can win. The values of $f(y|\pi)$
are pretty intuitive: we would expect the random variable $Y$
to be lower when the value of $\pi$ is lower and higher when
the value of $\pi$ is higher.

For the sake of the exercise, let's add more values of $\pi$
so that we can see this shift happen in more detail.

```{r}
pies <- seq(.1, .9, by = .1)
ys <- 0:6
d <- tidyr::expand_grid(pies, ys)
fys <- purrr::map2_dbl(d$ys, d$pies, chess_pmf, n = 6)
d$fys <- fys
d$display_pi <- as.factor(paste("pi =", d$pies))
d |>
  ggplot(aes(x = ys, y = fys)) +
  geom_col() +
  scale_x_continuous(breaks = 0:6) +
  facet_wrap(vars(display_pi), nrow = 3)
```
As it turns out, the human ended up winning just
one game in the 1997 rematch, so $Y = 1$. The next step in our
analysis is to determine how compatible this new data is with
each value of $\pi$; that is, the likelihood.

This is very easy to do with all the work we have done so far:
```{r}
d |>
  dplyr::filter(ys == 1) |>
  ggplot(aes(pies, fys)) +
  geom_col() +
  scale_x_continuous(breaks = seq(.1, .9, by = .1))
```
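
We can also read the most compatible value straight off the grid; a quick sketch using `dplyr::slice_max()`:

```{r}
# the value of pi under which observing Y = 1 is most likely
d |>
  dplyr::filter(ys == 1) |>
  dplyr::slice_max(fys, n = 1)
```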
It's very important to note the following:
```{r}
# this will sum to a value greater than 1!!
d |>
  dplyr::filter(ys == 1) |>
  dplyr::pull(fys) |>
  sum()
```
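
Contrast this with summing over $y$ for a fixed $\pi$, which does satisfy the pmf property $\sum_{\forall y}f(y|\pi) = 1$:

```{r}
# for a fixed pi, summing f(y | pi) over all y = 0, ..., 6 gives exactly 1
d |>
  dplyr::filter(dplyr::near(pies, .5)) |>  # near() avoids floating-point equality issues
  dplyr::pull(fys) |>
  sum()
```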
:::{.callout-important icon="true"}
This has been mentioned before, but it's an important message
to drive home. Note that the reason why these values sum to a
value greater than 1 is that they are **not** probabilities; they
are likelihoods. We are determining how likely each value of
$\pi$ is given that we have observed $Y = 1$.
:::