diff --git a/R/ch2.html b/R/ch2.html index 1d52094..169bd3b 100644 --- a/R/ch2.html +++ b/R/ch2.html @@ -113,10 +113,7 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni @@ -244,12 +241,12 @@ Probability and Likelihood gt::cols_width(everything() ~ px(100))
-
+
@@ -854,12 +851,12 @@ Baye’s Rule gt::cols_width(everything() ~ px(100))
-
+
@@ -1275,11 +1272,11 @@ Baye’s Rule fake -4031 -0.4031 +3967 +0.3967 real -5969 -0.5969 +6033 +0.6033 @@ -1316,8 +1313,8 @@ Baye’s Rule # Groups: usage [2] usage fake real <chr> <int> <int> -1 no 2955 5845 -2 yes 1076 124 +1 no 2891 5910 +2 yes 1076 123
@@ -1345,7 +1342,7 @@ Baye’s Rule type total prop <chr> <int> <dbl> 1 fake 1076 0.897 -2 real 124 0.103 +2 real 123 0.103
@@ -1373,7 +1370,7 @@ Baye’s Rule -
+
@@ -1404,7 +1401,7 @@ Discrete Probability Model
-
+
@@ -1417,8 +1414,7 @@ in emanuel’s words

what does this mean? well it's very straightforward: a pmf is a function that takes in some value y and outputs the probability that the random variable \(Y\) equals \(y\).

-
-

The Binomial Model

+

next we would like to add the dependency of \(Y\) on \(\pi\); we do so by introducing the conditional pmf.

@@ -1438,7 +1434,7 @@ Conditional probability model of data \(Y\)
-
+
@@ -1451,7 +1447,145 @@ in emanuel’s words

this is essentially the same probability model we defined above, except now we are conditioning probabilities on some parameter \(\pi\)

-
+

in the example of the chess player we must make some assumptions:

+
    +
  1. the chances of winning any match in the game stay constant. So if at match number 1 the human has a .65 probability of winning, then that is the same for matches 2-6.

  2. +
  3. Winning or losing a game does not affect the chances of winning or losing the next game, i.e. matches are independent of one another.

  4. +
+

These two assumptions lead us to the Binomial Model.

+
+
+
+ +
+
+The Binomial Model +
+
+
+

Let the random variable \(Y\) represent the number of successes in \(n\) trials. Assume that each trial is independent, and the probability of success in a given trial is \(\pi\). Then the conditional dependence of \(Y\) on \(\pi\) can be modeled by the Binomial Model with parameters \(n\) and \(\pi\). We can write this as,

+

\[Y|\pi \sim Bin(n, \pi)\]

+

the binomial model is specified by the pmf:

+

\[f(y|\pi) = {n \choose y} \pi^y(1 - \pi)^{n-y}\]

+
+
+

knowing this we can represent \(Y\), the total number of matches out of 6 that the human can win.

+

\[Y|\pi \sim Bin(6, \pi)\]

+

and the conditional pmf:

+

\[f(y|\pi) = {6 \choose y}\pi^y(1 - \pi)^{6 - y}\;\; \text{for } y \in \{0, 1, 2, 3, 4, 5, 6\}\]
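As a quick sanity check, base R's `dbinom()` implements this same binomial pmf; summing it over the full support \(y = 0\) through \(6\) gives exactly 1. (A small sketch, not part of the original chapter code.)

```r
# the binomial pmf over the full support y = 0..6, with n = 6 and pi = 0.8
probs <- dbinom(0:6, size = 6, prob = 0.8)
sum(probs)  # sums to 1, as any pmf must
```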

+

with the pmf we can now determine the probability of the human winning \(y\) matches out of 6 for any given value of \(\pi\).

+
+
chess_pmf <- function(y, p, n = 6) {
+    choose(n, y) * (p ^ y) * (1 - p)^(n - y)
+}
+
+# what is the probability that the human wins 5 games given a pi value of .8 
+chess_pmf(y = 5, p = .8)
+
+
[1] 0.393216
+
+
+
+
+
+ +
+
+ +
+
+
+

the formula for the binomial is actually pretty intuitive. First you have the scalar \({n \choose y}\), which counts the total number of ways the player can win \(y\) games out of the possible \(n\). This is multiplied by the probability of the \(y\) successes, since \((p ^ y)\) can be re-written as \(p\times p\times \cdots \times p\), and then by the probability of the \(n-y\) failures, \((1 - p)^{n - y}\).
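To make the three factors concrete, here is the \(y = 5\) case from the earlier chunk broken apart term by term (a sketch using only base R; `dbinom()` is R's built-in binomial pmf):

```r
n <- 6; y <- 5; p <- 0.8
ways     <- choose(n, y)     # 6 orderings of 5 wins among 6 matches
p_wins   <- p^y              # probability of the 5 wins
p_losses <- (1 - p)^(n - y)  # probability of the 1 loss
ways * p_wins * p_losses     # 0.393216, matching dbinom(5, 6, 0.8)
```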

+
+
+
+
pies <- seq(0, 1, by = .05)
+py <- chess_pmf(y = 4, p = pies)
+
+d <- data.frame(pies = pies, py = py)
+
+d |>
+    ggplot(aes(pies, py)) + geom_col()
+
+

+
+
+
+
pies <- c(.2, .5, .8)
+ys <- 0:6
+
+d <- tidyr::expand_grid(pies, ys)
+fys <- purrr::map2_dbl(d$ys, d$pies, ~chess_pmf(.x, .y, n = 6))
+
+d$fys <- fys
+d$display_pi <- as.factor(paste("pi =", d$pies))
+
+d |>
+    ggplot(aes(x = ys, y = fys)) + 
+    geom_col() + 
+    scale_x_continuous(breaks = 0:6) + 
+    facet_wrap(vars(display_pi))
+
+

+
+
+

The plot shows the three possible values for \(\pi\) along with the value of the pmf for each of the possible matches the human can win in a game. The values of \(f(y|\pi)\) are pretty intuitive: we would expect the random variable \(Y\) to be lower when the value of \(\pi\) is lower and higher when the value of \(\pi\) is higher.

+

For the sake of the exercise let's add more values of \(\pi\) so that we can see this shift happen in more detail.

+
+
pies <- seq(.1, .9, by = .1)
+ys <- 0:6
+
+d <- tidyr::expand_grid(pies, ys)
+fys <- purrr::map2_dbl(d$ys, d$pies, ~chess_pmf(.x, .y, n = 6))
+
+d$fys <- fys
+d$display_pi <- as.factor(paste("pi =", d$pies))
+
+d |>
+    ggplot(aes(x = ys, y = fys)) + 
+    geom_col() + 
+    scale_x_continuous(breaks = 0:6) + 
+    facet_wrap(vars(display_pi), nrow = 3)
+
+

+
+
+

as it turns out, we learn that the human ended up winning just one game in the 1997 rematch, \(Y = 1\). The next step in our analysis is to determine how compatible this new data is with each value of \(\pi\); that is, the likelihood.

+

This is very easy to do with all the work we have done so far:

+
+
d |>
+    filter(ys == 1) |>
+    ggplot(aes(pies, fys)) + 
+    geom_col() + 
+    scale_x_continuous(breaks = seq(.1, .9, by = .1))
+
+

+
+
+

It’s very important to note the following:

+
+
# this will sum to a value greater than 1!!
+d |>
+    filter(ys == 1) |>
+    pull(fys) |>
+    sum()
+
+
[1] 1.37907
+
+
+
+
+
+ +
+
+Important +
+
+
+

this has been mentioned before but it's an important message to drive home. Note that the reason why these values sum to a value greater than 1 is that they are not probabilities; they are likelihoods. We are determining how likely each value of \(\pi\) is given that we have observed \(Y = 1\).
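The contrast can be re-derived directly with base R's `dbinom()` (a sketch: for fixed \(\pi\) the pmf sums to 1 over \(y\), while for fixed \(y = 1\) the likelihood across \(\pi\) values does not):

```r
# fixed pi: f(y|pi) is a pmf over y, so it sums to 1
sum(dbinom(0:6, size = 6, prob = 0.5))     # 1

# fixed y = 1: the same function across pi values is a likelihood,
# not a pmf, so nothing forces the sum to be 1
pies <- seq(.1, .9, by = .1)
sum(dbinom(1, size = 6, prob = pies))      # 1.37907
```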

+
+
diff --git a/R/ch2.qmd b/R/ch2.qmd index c4b1d05..4701623 100644 --- a/R/ch2.qmd +++ b/R/ch2.qmd @@ -339,7 +339,7 @@ in the book that we will learn how to build these later on): |--------|----|----|----|-------| |$f(\pi)$|.10 |.25 |.65 | 1 | -:::{.callout-caution} +:::{.callout-tip} ## Note it's important to note here that the sum of the values of $\pi$ **do @@ -364,14 +364,15 @@ and has the following properties ::: -:::{.callout-caution} +:::{.callout-tip} ## in emanuel's words what does this mean? well it's very straightforward: a pmf is a function that takes in some value y and outputs the probability that the random variable $Y$ equals $y$. ::: -### The Binomial Model +next we would like to add the dependency of $Y$ on $\pi$; we do so by introducing +the conditional pmf. :::{.callout-note} ## Conditional probability model of data $Y$ @@ -388,8 +389,158 @@ and has the following properties, 2. $\sum_{\forall y}f(y|\pi) = 1$ ::: -:::{.callout-caution} +:::{.callout-tip} ## in emanuel's words this is essentially the same probability model we defined above, except now we are conditioning probabilities on some parameter $\pi$ -::: \ No newline at end of file +::: + +in the example of the chess player we must make some assumptions: + +1. the chances of winning any match in the game stay constant. So if +at match number 1 the human has a .65 probability of winning, then that is the same +for matches 2-6. + +2. Winning or losing a game does not affect the chances of winning +or losing the next game, i.e. matches are independent of one another. + +These two assumptions lead us to the **Binomial Model**. + +:::{.callout-note} +## The Binomial Model + +Let the random variable $Y$ represent the number of successes in $n$ trials. +Assume that each trial is independent, and the probability of success in a +given trial is $\pi$. Then the conditional dependence of $Y$ on $\pi$ can +be modeled by the **Binomial Model** with parameters $n$ and $\pi$. 
We can +write this as, + +$$Y|\pi \sim Bin(n, \pi)$$ + +the binomial model is specified by the pmf: + +$$f(y|\pi) = {n \choose y} \pi^y(1 - \pi)^{n-y}$$ +::: + +knowing this we can represent $Y$, the total number of matches out of 6 +that the human can win. + +$$Y|\pi \sim Bin(6, \pi)$$ + +and the conditional pmf: + +$$f(y|\pi) = {6 \choose y}\pi^y(1 - \pi)^{6 - y}\;\; \text{for } y \in \{0, 1, 2, 3, 4, 5, 6\}$$ + +with the pmf we can now determine the probability of the human winning $y$ matches +out of 6 for any given value of $\pi$. + +```{r} +chess_pmf <- function(y, p, n = 6) { + choose(n, y) * (p ^ y) * (1 - p)^(n - y) +} + +# what is the probability that the human wins 5 games given a pi value of .8 +chess_pmf(y = 5, p = .8) + +``` + +:::{.callout-tip} +## + +the formula for the binomial is actually pretty intuitive. First you have +the scalar ${n \choose y}$, which counts the total number of ways +the player can win $y$ games out of the possible $n$. This is multiplied +by the probability of the $y$ successes, since $(p ^ y)$ can be +re-written as $p\times p\times \cdots \times p$, and then by +the probability of the $n-y$ failures, $(1 - p)^{n - y}$. +::: + +```{r} +pies <- seq(0, 1, by = .05) +py <- chess_pmf(y = 4, p = pies) + +d <- data.frame(pies = pies, py = py) + +d |> + ggplot(aes(pies, py)) + geom_col() +``` + + +```{r} +pies <- c(.2, .5, .8) +ys <- 0:6 + +d <- tidyr::expand_grid(pies, ys) +fys <- purrr::map2_dbl(d$ys, d$pies, ~chess_pmf(.x, .y, n = 6)) + +d$fys <- fys +d$display_pi <- as.factor(paste("pi =", d$pies)) + +d |> + ggplot(aes(x = ys, y = fys)) + + geom_col() + + scale_x_continuous(breaks = 0:6) + + facet_wrap(vars(display_pi)) +``` + +The plot shows the three possible values for $\pi$ along +with the value of the pmf for each of the possible +matches the human can win in a game. 
The values of $f(y|\pi)$ +are pretty intuitive: we would expect the random variable $Y$ +to be lower when the value of $\pi$ is lower and higher when +the value of $\pi$ is higher. + +For the sake of the exercise let's add more values of $\pi$ +so that we can see this shift happen in more detail. + +```{r} +pies <- seq(.1, .9, by = .1) +ys <- 0:6 + +d <- tidyr::expand_grid(pies, ys) +fys <- purrr::map2_dbl(d$ys, d$pies, ~chess_pmf(.x, .y, n = 6)) + +d$fys <- fys +d$display_pi <- as.factor(paste("pi =", d$pies)) + +d |> + ggplot(aes(x = ys, y = fys)) + + geom_col() + + scale_x_continuous(breaks = 0:6) + + facet_wrap(vars(display_pi), nrow = 3) +``` + +as it turns out, we learn that the human ended up winning just +one game in the 1997 rematch, $Y = 1$. The next step in our +analysis is to determine how compatible this new data is with +each value of $\pi$; that is, the likelihood. + +This is very easy to do with all the work we have done so far: + +```{r} +d |> + filter(ys == 1) |> + ggplot(aes(pies, fys)) + + geom_col() + + scale_x_continuous(breaks = seq(.1, .9, by = .1)) +``` + +It's very important to note the following: + +```{r} +# this will sum to a value greater than 1!! +d |> + filter(ys == 1) |> + pull(fys) |> + sum() +``` + +:::{.callout-important icon="true"} +this has been mentioned before but it's an important message +to drive home. Note that the reason why these values sum to a +value greater than 1 is that they are **not** probabilities; they +are likelihoods. We are determining how likely each value of +$\pi$ is given that we have observed $Y = 1$. 
+::: + + diff --git a/R/ch2_files/figure-html/unnamed-chunk-11-1.png b/R/ch2_files/figure-html/unnamed-chunk-11-1.png index d6b4e1a..7057949 100644 Binary files a/R/ch2_files/figure-html/unnamed-chunk-11-1.png and b/R/ch2_files/figure-html/unnamed-chunk-11-1.png differ diff --git a/R/ch2_files/figure-html/unnamed-chunk-14-1.png b/R/ch2_files/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 0000000..847f110 Binary files /dev/null and b/R/ch2_files/figure-html/unnamed-chunk-14-1.png differ diff --git a/R/ch2_files/figure-html/unnamed-chunk-15-1.png b/R/ch2_files/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 0000000..0dc24a7 Binary files /dev/null and b/R/ch2_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/R/ch2_files/figure-html/unnamed-chunk-16-1.png b/R/ch2_files/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 0000000..0f99fa6 Binary files /dev/null and b/R/ch2_files/figure-html/unnamed-chunk-16-1.png differ diff --git a/R/ch2_files/figure-html/unnamed-chunk-17-1.png b/R/ch2_files/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 0000000..66a3b69 Binary files /dev/null and b/R/ch2_files/figure-html/unnamed-chunk-17-1.png differ diff --git a/R/ch2_files/figure-html/unnamed-chunk-6-1.png b/R/ch2_files/figure-html/unnamed-chunk-6-1.png index 303bf1d..d2ed0b1 100644 Binary files a/R/ch2_files/figure-html/unnamed-chunk-6-1.png and b/R/ch2_files/figure-html/unnamed-chunk-6-1.png differ