---
title: "Chapter 2 Notes"
author: "Emanuel Rodriguez"
execute:
  message: false
  warning: false
format:
  html:
    monofont: "Cascadia Mono"
    highlight-style: gruvbox-dark
    css: styles.css
    callout-icon: false
    callout-appearance: simple
---

In this chapter we step through an example of "fake" vs. "real" news, building a framework to determine the probability that a new news article titled "The President has a secret!" is real or fake.

```{r}
#| message: false
#| warning: false
# libraries
library(bayesrules)
library(dplyr)
library(tidyr)
library(gt)

data(fake_news)
fake_news <- tibble::as_tibble(fake_news)
```

What proportion of news articles are labeled fake vs. real?

```{r}
fake_news |> glimpse()

fake_news |>
  group_by(type) |>
  summarise(
    total = n(),
    prop = total / nrow(fake_news)
  )
```

If we let $B$ be the event that a news article is "fake" news, and $B^c$ be the event that a news article is "real", we can write the following:

$$P(B) = .4$$
$$P(B^c) = .6$$

This is the first "clue", or set of data, that we have to build into our framework. Namely, the majority of articles are "real", so we could simply predict that the new article is "real". This updated sense of reality now becomes our prior.
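
To carry these priors through the rest of the computations, we can store them in R (a minimal sketch; the variable names are my own, not from the book):

```{r}
# priors taken from the observed proportions above
prior_fake <- 0.4 # P(B):   article is fake
prior_real <- 0.6 # P(B^c): article is real
```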

Next we gather additional data and update our prior based on it. The new observation we make is the use of exclamation marks ("!"): their use is more frequent in news articles labeled "fake". We will want to incorporate this into our framework to decide whether the new incoming article should be labeled real or fake.

### Likelihood

:::{.callout-note}
## Probability and Likelihood

When the event $B$ is known, we can evaluate the uncertainty of events $A$ and $A^c$ given $B$:

$$P(A|B) \text{ vs } P(A^c|B)$$

If, on the other hand, we know event $A$, then we can evaluate the relative compatibility of the data $A$ with $B$ and $B^c$ using likelihood functions:

$$L(B|A) \text{ vs } L(B^c|A)$$
$$= P(A|B) \text{ vs } P(A|B^c)$$
:::

So in our case we don't know whether this new incoming article is real or not, but we do know that the title has an exclamation mark. This means we can evaluate the relative compatibility of this data (the "!" in the title) with the article being fake or real, using likelihood functions. We can formulate this as:

$$L(B|A) \text{ vs } L(B^c|A)$$

And perform the computation in R as follows:

```{r}
# within each type (fake vs real), what proportion of titles
# do and do not contain "!"?
prop_of_excl_within_type <- fake_news |>
  count(type, title_has_excl, name = "total") |>
  group_by(type) |>
  mutate(prop_within_type = total / sum(total)) |>
  ungroup() |>
  select(type, has_excl = title_has_excl, prop_within_type)
```

```{r}
prop_of_excl_within_type |>
  pivot_wider(names_from = "type", values_from = prop_within_type) |>
  gt() |>
  gt::cols_label(
    has_excl = "Contains Exclamation",
    fake = "Fake",
    real = "Real"
  ) |>
  gt::fmt_number(columns = c("fake", "real"), decimals = 3) |>
  gt::cols_width(everything() ~ px(100))
```

The table above also shows the likelihoods for the case when an article does not contain an exclamation point in the title. It's important to note that these are likelihoods, not probabilities: it is not the case that $L(B|A) + L(B^c|A) = 1$; in fact, this sum evaluates to a number less than one. However, since $L(B|A) = .267$ and $L(B^c|A) = .022$, we have gained additional knowledge: the use of "!" in a title is far more compatible with a fake news article than a real one.
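
We can store these likelihoods the same way (again a sketch; `L_fake` and `L_real` are names of my own choosing, holding the rounded values from the table):

```{r}
# likelihood of a "!" title under each hypothesis
L_fake <- 0.267 # L(B|A)   = P(A|B)
L_real <- 0.022 # L(B^c|A) = P(A|B^c)
L_fake + L_real # .289 -- likelihoods need not sum to 1
```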

Up to this point we can summarize our framework as follows:

| event      | $B$  | $B^c$ | Total |
|------------|------|-------|-------|
| prior      | .4   | .6    | 1     |
| likelihood | .267 | .022  | .289  |

Our next goal is to come up with the normalizing factors in order to build our probability table:

|       | $B$ | $B^c$ | Total |
|-------|-----|-------|-------|
| $A$   | (1) | (2)   |       |
| $A^c$ | (3) | (4)   |       |
| Total | .4  | .6    | 1     |

A couple of things to note about our table: (1) + (3) = .4, (2) + (4) = .6, and (1) + (2) + (3) + (4) = 1.

(1.) $P(A \cap B) = P(A|B)P(B)$. We know the likelihood $L(B|A) = P(A|B)$ and we also know the prior, so we insert these to get
$$P(A \cap B) = P(A|B)P(B) = .267 \times .4 = .1068$$

(3.) $P(A^c \cap B) = P(A^c|B)P(B)$. In this case we do know the prior $P(B) = .4$, but we don't directly know the value of $P(A^c|B)$. However, we note that $P(A|B) + P(A^c|B) = 1$, therefore we compute $P(A^c|B) = 1 - P(A|B) = 1 - .267 = .733$ and
$$P(A^c \cap B) = P(A^c|B)P(B) = .733 \times .4 = .2932$$

We can now confirm that $.1068 + .2932 = .4$.
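
In R, this is one line per cell (using the hypothetical variables defined in the earlier sketches):

```{r}
p_excl_and_fake    <- L_fake * prior_fake       # (1) P(A and B)   = .1068
p_no_excl_and_fake <- (1 - L_fake) * prior_fake # (3) P(A^c and B) = .2932
p_excl_and_fake + p_no_excl_and_fake            # recovers the prior P(B) = .4
```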

Moving on to (2) and (4):

(2.) $P(A \cap B^c) = P(A|B^c)P(B^c)$. In this case we know the likelihood $L(B^c|A) = P(A|B^c)$ and we know the prior $P(B^c)$, therefore
$$P(A \cap B^c) = P(A|B^c)P(B^c) = .022 \times .6 = .0132$$

(4.) $P(A^c \cap B^c) = P(A^c|B^c)P(B^c) = (1 - .022) \times .6 = .5868$

and we can confirm that $.0132 + .5868 = .6$.
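
And the matching sketch for the "real" column:

```{r}
p_excl_and_real    <- L_real * prior_real       # (2) P(A and B^c)   = .0132
p_no_excl_and_real <- (1 - L_real) * prior_real # (4) P(A^c and B^c) = .5868
p_excl_and_real + p_no_excl_and_real            # recovers the prior P(B^c) = .6
```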

And we can fill in the rest of the table:

|       | $B$   | $B^c$ | Total |
|-------|-------|-------|-------|
| $A$   | .1068 | .0132 | .12   |
| $A^c$ | .2932 | .5868 | .88   |
| Total | .4    | .6    | 1     |
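
As a cross-check, we can assemble the full table from the four joint probabilities computed in the sketches above; `addmargins()` adds the row and column totals:

```{r}
joint <- rbind(
  excl    = c(fake = p_excl_and_fake,    real = p_excl_and_real),
  no_excl = c(fake = p_no_excl_and_fake, real = p_no_excl_and_real)
)
addmargins(joint) # matches the table, including the .12/.88 margins
```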

An important concept we used above is the idea of **total probability**.

:::{.callout-tip}
## total probability

The **total probability** of observing a real article is made up of the sum of its parts. Namely,

$$P(B^c) = P(A \cap B^c) + P(A^c \cap B^c)$$
$$= P(A|B^c)P(B^c) + P(A^c|B^c)P(B^c)$$
$$= .0132 + .5868 = .6$$
:::