In the above calculations we also step through joint probabilities
+
+
+
+
+
+
+Joint and conditional probability
+
+
+
+
\[P(A \cap B) = P(A|B)P(B)\]
+
\(A\) and \(B\) are said to be independent events, if and only if
+
\[P(A \cap B) = P(A)P(B)\]
+
from this we can also derive the definition of a conditional probability
+
\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]
+
+
+
At this point we are able to answer the question, “What is the probability, the new article is fake?”. Given that the new article has an exclamation point, we can zoom into the top row of the table of probabilitties. Within this row we have probabilities \(.1068/.12 = .833\) for fake and \(.0132 / .12 = .11\) for real.
+
This is essentially Baye’s Rule. We developed a posterior probability for an event \(B\) given some observation \(A\). We did so by combining the likelihood of event \(B\) given some new data \(A\) and the prior probability of event \(B\). More formally we have the following definition:
+
+
+
+
+
+
+Baye’s Rule
+
+
+
+
The posterior probability of an event \(B\) given a \(A\) is:
# A tibble: 2 × 3
+ type total prop
+ <chr> <int> <dbl>
+1 fake 1005 0.888
+2 real 127 0.112
+
+
+
+
+
+
+
+
+Discrete Probability Model
+
+
+
+
Let \(Y\) be a discrete random variable. The probability model for \(Y\) is described by a probability mass function (pmf) defined as: \[f(y) = P(Y = y)\]
+
and has the following properties
+
+
\(0 \leq f(y) \leq 1\;\; \forall y\)
+
\(\sum_{\forall y}f(y) = 1\)
+
+
+
+
+
+
+
+
+
+in emanuel’s words
+
+
+
+
what does this mean? well its very straightforward a pmf is a function that takes in a some value y and outputs the probability that the random variable \(Y\) equals \(y\).
+
+
diff --git a/R/ch2.qmd b/R/ch2.qmd
index ef93a80..4f2dd87 100644
--- a/R/ch2.qmd
+++ b/R/ch2.qmd
@@ -25,6 +25,8 @@ library(bayesrules)
library(dplyr)
library(tidyr)
library(gt)
+library(tibble)
+library(ggplot2)
data(fake_news)
fake_news <- tibble::as_tibble(fake_news)
```
@@ -180,4 +182,158 @@ parts. Namely
$$P(B^c) = P(A \cap B^c) + P(A^c \cap B^c)$$
$$=P(A|B^c)P(B^c) + P(A^c|B^c)P(B^c)$$
$$=.0132 + .5868 = .6$$
+:::
+
+In the above calculations we also step through **joint probabilities**
+
+:::{.callout-note}
+## Joint and conditional probability
+
+$$P(A \cap B) = P(A|B)P(B)$$
+
+$A$ and $B$ are said to be independent events, if and only if
+
+$$P(A \cap B) = P(A)P(B)$$
+
+from this we can also derive the definition of a conditional probability
+
+$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
+
+:::
+
+At this point we are able to answer the question, "What is the probability,
+the new article is fake?". Given that the new article has an exclamation
+point, we can zoom into the top row of the table of probabilitties. Within
+this row we have probabilities $.1068/.12 = .833$ for fake and $.0132 / .12 = .11$
+for real.
+
+This is essentially Baye's Rule. We developed a posterior probability for an event
+$B$ given some observation $A$. We did so by combining the likelihood of event $B$
+given some new data $A$ and the prior probability of event $B$. More formally we
+have the following definition:
+
+:::{.callout-note}
+## Baye's Rule
+
+The posterior probability of an event $B$ given a $A$ is:
+
+$$ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{L(B|A)P(B)}{P(A)}$$
+
+where $L$ is the likelihood function $L(B|A) = P(B|A)$ and $P(A)$ is the
+total probability of $A$.
+
+More generally,
+
+$$ \frac{likelihood \cdot prior}{normalizing \;\; constant}$$
+:::
+
+### Simualation
+
+
+```{r}
+articles <- tibble::tibble(type = c("real", "fake"))
+
+priors <- c(.6, .4)
+
+articles_sim <- sample_n(articles, 10000, replace = TRUE, weight = priors)
+```
+
+```{r}
+articles_sim |>
+ ggplot(aes(x = type)) + geom_bar()
+```
+
+and a summary table
+
+```{r}
+articles_sim |>
+ group_by(type) |>
+ summarise(
+ total = n(),
+ prop = total / nrow(articles_sim)
+ ) |>
+ gt()|>
+ gt::cols_width(everything() ~ px(100))
+```
+
+the simulation of 10,000 articles shows us very nearly
+the same priors we had from the data. We can now add
+the exclamation usage into the data.
+
+```{r}
+
+articles_sim <- articles_sim |>
+ mutate(model_data = case_when(
+ type == "fake" ~ .267,
+ type == "real" ~ .022
+ ))
+```
+
+
+The plan here is to iterate through the 10,000 samples
+and use the `data_model` value to assign either, "yes" or
+"no" using the `sample` function.
+
+```{r}
+data <- c("yes", "no")
+
+articles_sim <- articles_sim |>
+ mutate(id = row_number()) |>
+ group_by(id) |>
+ mutate(usage = sample(data, 1, prob = c(model_data, 1 - model_data)))
+```
+
+
+```{r}
+articles_sim |>
+ group_by(usage, type) |>
+ summarise(
+ total = n()
+ ) |>
+ pivot_wider(names_from = type, values_from = total)
+```
+
+```{r}
+articles_sim |>
+ ggplot(aes(x = type, fill = usage)) +
+ geom_bar() +
+ scale_fill_discrete(type = c("gray8", "dodgerblue4"))
+```
+
+So far have compute both the priors and likelihoods, we can simply
+filter our data to reflect the incoming article and determine our
+posterior.
+
+```{r}
+articles_sim |>
+ filter(usage == "yes") |>
+ group_by(type) |>
+ summarise(
+ total = n()
+ ) |>
+ mutate(
+ prop = total / sum(total)
+ )
+```
+
+
+:::{.callout-note}
+## Discrete Probability Model
+
+Let $Y$ be a discrete random variable. The probability model for $Y$ is
+described by a **probability mass function** (pmf) defined as:
+$$f(y) = P(Y = y)$$
+
+and has the following properties
+
+1. $0 \leq f(y) \leq 1\;\; \forall y$
+2. $\sum_{\forall y}f(y) = 1$
+:::
+
+
+:::{.callout-caution}
+## in emanuel's words
+what does this mean? well its very straightforward a pmf is a function that takes
+in a some value y and outputs the probability that the random variable
+$Y$ equals $y$.
:::
\ No newline at end of file
diff --git a/R/ch2_files/figure-html/unnamed-chunk-11-1.png b/R/ch2_files/figure-html/unnamed-chunk-11-1.png
new file mode 100644
index 0000000..faa2b66
Binary files /dev/null and b/R/ch2_files/figure-html/unnamed-chunk-11-1.png differ
diff --git a/R/ch2_files/figure-html/unnamed-chunk-6-1.png b/R/ch2_files/figure-html/unnamed-chunk-6-1.png
new file mode 100644
index 0000000..84c0bbf
Binary files /dev/null and b/R/ch2_files/figure-html/unnamed-chunk-6-1.png differ