adds more work
 R/ch2.qmd | 156
@@ -25,6 +25,8 @@ library(bayesrules)
library(dplyr)
library(tidyr)
library(gt)
library(tibble)
library(ggplot2)

data(fake_news)
fake_news <- tibble::as_tibble(fake_news)
```
@@ -180,4 +182,158 @@ parts. Namely

$$P(B^c) = P(A \cap B^c) + P(A^c \cap B^c)$$

$$= P(A|B^c)P(B^c) + P(A^c|B^c)P(B^c)$$

$$= .0132 + .5868 = .6$$

:::
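
As a quick numeric check of this decomposition (a sketch; the likelihood $.022$ and its complement $1 - .022 = .978$ come from the table above):

```{r}
# P(B^c) = P(A | B^c)P(B^c) + P(A^c | B^c)P(B^c)
p_real <- c(excl = .022 * .6, no_excl = .978 * .6)
p_real       # .0132 and .5868
sum(p_real)  # recovers P(B^c) = .6
```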

In the above calculations we also step through **joint probabilities**.

:::{.callout-note}
## Joint and conditional probability

$$P(A \cap B) = P(A|B)P(B)$$

$A$ and $B$ are said to be independent events if and only if

$$P(A \cap B) = P(A)P(B)$$

From this we can also derive the definition of conditional probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

:::
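
We can compute these quantities directly from the observed data. This is a sketch; it assumes the `bayesrules` version of `fake_news` with the columns `type` and `title_has_excl`:

```{r}
fake_news |>
  count(type, title_has_excl) |>
  mutate(joint = n / sum(n)) |>    # joint: P(type and exclamation usage)
  group_by(type) |>
  mutate(conditional = n / sum(n)) # conditional: P(exclamation usage | type)
```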

At this point we are able to answer the question, "What is the probability
that the new article is fake?" Given that the new article has an exclamation
point, we can zoom into the top row of the table of probabilities. Within
this row we have probabilities $.1068 / .12 = .89$ for fake and $.0132 / .12 = .11$
for real.

This is essentially Bayes' Rule. We developed a posterior probability for an event
$B$ given some observation $A$. We did so by combining the likelihood of event $B$
given some new data $A$ and the prior probability of event $B$. More formally we
have the following definition:

:::{.callout-note}
## Bayes' Rule

The posterior probability of an event $B$ given an event $A$ is:

$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{L(B|A)P(B)}{P(A)}$$

where $L$ is the likelihood function $L(B|A) = P(A|B)$ and $P(A)$ is the
total probability of $A$.

More generally,

$$\text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{normalizing constant}}$$

:::
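
Plugging in the fake news numbers, a minimal sketch of this calculation:

```{r}
prior_fake     <- .4    # P(fake)
prior_real     <- .6    # P(real)
like_excl_fake <- .267  # P(exclamation | fake)
like_excl_real <- .022  # P(exclamation | real)

# total probability of an exclamation point (the normalizing constant)
p_excl <- like_excl_fake * prior_fake + like_excl_real * prior_real

# posterior = likelihood * prior / normalizing constant
like_excl_fake * prior_fake / p_excl  # .89, matching the table
```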

### Simulation

```{r}
# the two article types and their prior probabilities
articles <- tibble::tibble(type = c("real", "fake"))
priors <- c(.6, .4)

# simulate 10,000 articles, drawing types according to the priors
set.seed(1)  # seed added for reproducibility; any value works
articles_sim <- sample_n(articles, 10000, replace = TRUE, weight = priors)
```

```{r}
articles_sim |>
  ggplot(aes(x = type)) + geom_bar()
```

and a summary table:

```{r}
articles_sim |>
  group_by(type) |>
  summarise(
    total = n(),
    prop = total / nrow(articles_sim)
  ) |>
  gt() |>
  gt::cols_width(everything() ~ px(100))
```

The simulation of 10,000 articles shows us very nearly
the same priors we had from the data. We can now add
the exclamation point usage into the data.

```{r}
# attach the likelihood of exclamation usage for each type:
# P(exclamation | fake) = .267 and P(exclamation | real) = .022
articles_sim <- articles_sim |>
  mutate(model_data = case_when(
    type == "fake" ~ .267,
    type == "real" ~ .022
  ))
```

The plan here is to iterate through the 10,000 samples
and use the `model_data` value to assign either "yes" or
"no" using the `sample` function.

```{r}
data <- c("yes", "no")

# sample usage row by row: each article gets "yes" with
# probability model_data and "no" with probability 1 - model_data
articles_sim <- articles_sim |>
  mutate(id = row_number()) |>
  group_by(id) |>
  mutate(usage = sample(data, 1, prob = c(model_data, 1 - model_data))) |>
  ungroup()
```
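
A vectorized alternative (a sketch, not run here; it draws all 10,000 flags at once with `rbinom()` instead of grouping row by row):

```{r}
#| eval: false
articles_sim |>
  mutate(usage = ifelse(rbinom(n(), size = 1, prob = model_data) == 1, "yes", "no"))
```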

```{r}
articles_sim |>
  group_by(usage, type) |>
  summarise(
    total = n()
  ) |>
  pivot_wider(names_from = type, values_from = total)
```

```{r}
articles_sim |>
  ggplot(aes(x = type, fill = usage)) +
  geom_bar() +
  scale_fill_discrete(type = c("gray8", "dodgerblue4"))
```

Now that we have computed both the priors and likelihoods, we can simply
filter our data to reflect the incoming article and determine our
posterior.

```{r}
# condition on articles with exclamation points, then
# compute the proportion of each type among them
articles_sim |>
  filter(usage == "yes") |>
  group_by(type) |>
  summarise(
    total = n()
  ) |>
  mutate(
    prop = total / sum(total)
  )
```
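
These simulated proportions should land close to the exact posterior probabilities we computed with Bayes' Rule above: $.89$ for fake and $.11$ for real.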

:::{.callout-note}
## Discrete Probability Model

Let $Y$ be a discrete random variable. The probability model for $Y$ is
described by a **probability mass function** (pmf) defined as:

$$f(y) = P(Y = y)$$

and has the following properties:

1. $0 \leq f(y) \leq 1 \;\; \forall y$
2. $\sum_{\forall y}f(y) = 1$

:::
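
As a quick illustration, a sketch using the binomial pmf built into R:

```{r}
# pmf of Y ~ Binomial(10, .3): f(y) = P(Y = y)
y <- 0:10
f_y <- dbinom(y, size = 10, prob = .3)

all(f_y >= 0 & f_y <= 1)  # property 1: each f(y) lies in [0, 1]
sum(f_y)                  # property 2: f(y) sums to 1 over all y
```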

:::{.callout-caution}
## In Emanuel's words
What does this mean? It's very straightforward: a pmf is a function that takes
in some value $y$ and outputs the probability that the random variable
$Y$ equals $y$.
:::