adds more work based on chapter two

2022-09-04 23:51:35 -07:00
parent f9340fd7aa
commit cac5ac9243
6 changed files with 710 additions and 152 deletions
--- a/R/ch2.qmd
+++ b/R/ch2.qmd
@@ -1,11 +1,16 @@
 ---
 title: "Chapter 2 Notes"
 author: "Emanuel Rodriguez"
+execute:
+    message: false
+    warning: false
 format:
    html:
-        mainfont: arial
        monofont: "Cascadia Mono"
-        highlight-style: ayu-dark
+        highlight-style: gruvbox-dark
+        css: styles.css
+        callout-icon: false
+        callout-appearance: simple
 ---

 In this chapter we step through an example 
@@ -18,6 +23,8 @@ of real vs fake of a new news article titled "The President has a secret!"
 # libraries
 library(bayesrules)
 library(dplyr)
+library(tidyr)
+library(gt)
 data(fake_news)
 fake_news <- tibble::as_tibble(fake_news)
 ```
@@ -45,4 +52,66 @@ This is the first "clue" or set of data that we have to build into our framework
 Namely, the majority of articles are "real", so we could simply predict that 
 the new article is "real". This updated sense of reality now becomes our priors.

-Getting additional data, and updating our priors, based on additional data. 
+Next, we gather additional data and update our priors based on it. The 
+new observation concerns the use of exclamation marks ("!"): "!" appears more
+frequently in the titles of articles labeled "fake". We want to incorporate
+this into our framework when deciding whether the new incoming article should be
+labeled real or fake.
+
+### Likelihood
+
+:::{.callout-note}
+## Probability and Likelihood
+
+When event $B$ is known, we can evaluate the uncertainty of events 
+$A$ and $A^c$ given $B$:
+
+$$P(A|B) \text{ vs } P(A^c|B)$$
+
+If, on the other hand, event $A$ is known, we can use the likelihood function
+$L(\cdot|A) = P(A|\cdot)$ to evaluate the relative compatibility of the data $A$
+with events $B$ and $B^c$:
+
+$$L(B|A) \text{ vs } L(B^c|A) = P(A|B) \text{ vs } P(A|B^c)$$
+
+:::
+
+In our case, we don't know whether the new incoming article is real or fake, 
+but we do know that its title contains an exclamation mark. Taking $A$ to be the
+event that the title contains "!" and $B$ the event that the article is fake, we can
+use likelihood functions to evaluate how compatible the observed "!" is with the
+article being fake versus real. We can formulate this as:
+
+$$L(B|A) \text{ vs } L(B^c|A)$$
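+
+Each likelihood is the probability of the observed data (a "!" in the title) under
+one type of article. It can therefore be estimated from the `fake_news` counts as a
+proportion within that type. This is a sketch of the estimator, written in words
+rather than the $A$/$B$ shorthand:
+
+$$L(\text{fake} \mid \text{!}) = P(\text{!} \mid \text{fake}) \approx \frac{\#\{\text{fake articles with ! in the title}\}}{\#\{\text{fake articles}\}}$$
+
+$$L(\text{real} \mid \text{!}) = P(\text{!} \mid \text{real}) \approx \frac{\#\{\text{real articles with ! in the title}\}}{\#\{\text{real articles}\}}$$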
+
+We can compute these likelihoods in R as follows:
+
+```{r}
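+# within each type (fake/real), the proportion of titles with vs. without "!"
+prop_of_excl_within_type <- fake_news |>
+    # count articles for every type x title_has_excl combination
+    count(type, title_has_excl, name = "total") |>
+    group_by(type) |>
+    # mutate keeps one row per combination (summarise with >1 row is deprecated)
+    mutate(prop_within_type = total / sum(total)) |>
+    ungroup() |>
+    select(type, has_excl = title_has_excl, prop_within_type)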
+```
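+
+As a cross-check, here is an alternative sketch that uses base R's `table()` and
+`prop.table()` instead of dplyr. With `margin = 2`, each column (one article
+`type`) is divided by its own total. The entries should therefore match the within-type
+proportions in `prop_of_excl_within_type`. The name `excl_by_type` is illustrative only.
+
+```{r}
+# (re)load the data so this chunk also runs on its own
+library(bayesrules)
+data(fake_news)
+
+# hypothetical helper object: counts of "!" vs no "!" for each type
+excl_by_type <- with(fake_news, table(title_has_excl, type))
+
+# margin = 2 normalizes each column (each type) so it sums to 1
+prop.table(excl_by_type, margin = 2)
+```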
+
+```{r}
+prop_of_excl_within_type |>
+    pivot_wider(names_from = "type", values_from = prop_within_type) |>
+    gt() |>
+    gt::cols_label(
+        has_excl = "Contains Exclamation",
+        fake = "Fake", 
+        real = "Real") |>
+    gt::fmt_number(columns=c("fake", "real"), decimals = 3) |>
+    gt::cols_width(everything() ~ px(100))
+```
+
+The table above also shows the likelihoods for the case 
+when an article's title does not contain an exclamation 
+point (the `FALSE` row).
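+
+Here is one way to read the table, written in the notation of this section (this is
+not output from the code). Each column holds conditional probabilities of the data
+within a single type, so each column sums to 1. The two likelihoods compared along
+a row come from different types, so they need not sum to 1:
+
+$$P(\text{!} \mid \text{fake}) + P(\text{no !} \mid \text{fake}) = 1$$
+
+$$L(\text{fake} \mid \text{!}) + L(\text{real} \mid \text{!}) = P(\text{!} \mid \text{fake}) + P(\text{!} \mid \text{real}) \neq 1 \text{ in general}$$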