adds more work based on chapter two

2022-09-04 23:51:35 -07:00
parent f9340fd7aa
commit cac5ac9243
6 changed files with 710 additions and 152 deletions

@@ -1,11 +1,16 @@
---
title: "Chapter 2 Notes"
author: "Emanuel Rodriguez"
execute:
  message: false
  warning: false
format:
  html:
    mainfont: arial
    monofont: "Cascadia Mono"
    highlight-style: ayu-dark
    highlight-style: gruvbox-dark
    css: styles.css
    callout-icon: false
    callout-appearance: simple
---
In this chapter we step through an example
@@ -18,6 +23,8 @@ of real vs fake of a new news article titled "The President has a secret!"
# libraries
library(bayesrules)
library(dplyr)
library(tidyr)
library(gt)
data(fake_news)
fake_news <- tibble::as_tibble(fake_news)
```
@@ -45,4 +52,66 @@ This is the first "clue" or set of data that we have to build into our framework
Namely, the majority of articles are "real", so we could simply predict that
the new article is "real". This updated sense of reality now becomes our prior.
Getting additional data, and updating our priors, based on additional data.
Next, we gather additional data and update our priors accordingly. The new
observation is the use of exclamation marks "!". We note that "!" appears more
frequently in the titles of news articles labeled "fake". We will want to
incorporate this into our framework to decide whether the new incoming article
should be labeled real or fake.
### Likelihood
:::{.callout-note}
## Probability and Likelihood
When the event $B$ is known, we can evaluate the uncertainty of events
$A$ and $A^c$ given $B$
$$P(A|B) \text{ vs } P(A^c|B)$$
If, on the other hand, we know event $A$, then we can evaluate the relative
compatibility of data $A$ with $B$ and $B^c$ using likelihood functions
$$L(B|A) \text{ vs } L(B^c|A)$$
$$=P(A|B) \text{ vs } P(A|B^c)$$
:::
In our case, we don't know whether the new incoming article is real or fake,
but we do know that its title has an exclamation mark. Taking $A$ to be the
event that the title contains an "!" and $B$ the event that the article is fake,
we can use likelihood functions to evaluate how compatible the "!" is with the
article being fake versus real. We can formulate this as:
$$L(B|A) \text{ vs } L(B^c|A)$$
And perform the computation in R as follows:
```{r}
# within each article type, what are the proportions of ! vs no-!
prop_of_excl_within_type <- fake_news |>
  count(type, title_has_excl, name = "total") |>
  group_by(type) |>
  # mutate() keeps one row per (type, title_has_excl) pair
  mutate(prop_within_type = total / sum(total)) |>
  ungroup() |>
  select(type, has_excl = title_has_excl, prop_within_type)
```
```{r}
prop_of_excl_within_type |>
  pivot_wider(names_from = "type", values_from = prop_within_type) |>
  gt() |>
  gt::cols_label(
    has_excl = "Contains Exclamation",
    fake = "Fake",
    real = "Real") |>
  gt::fmt_number(columns = c("fake", "real"), decimals = 3) |>
  gt::cols_width(everything() ~ px(100))
```
The table above also shows the likelihoods for the case
when an article does not contain an exclamation point in
the title.
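
To connect the table back to the likelihood notation, here is a minimal sketch. It assumes $A$ denotes the event that a title contains an "!" and $B$ the event that the article is fake; this assignment is made here for illustration. Under that reading, each column of the table is a conditional distribution over $A$ and $A^c$, so its two entries sum to one:

$$
\begin{aligned}
L(B|A) &= P(A|B), & L(B|A^c) &= P(A^c|B) = 1 - P(A|B) \\
L(B^c|A) &= P(A|B^c), & L(B^c|A^c) &= P(A^c|B^c) = 1 - P(A|B^c)
\end{aligned}
$$

The likelihoods within a row, such as $L(B|A)$ and $L(B^c|A)$, need not sum to one, because they condition on different events.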