adds more work based on chapter two
R/ch2.qmd
@@ -1,11 +1,16 @@
---
title: "Chapter 2 Notes"
author: "Emanuel Rodriguez"
execute:
  message: false
  warning: false
format:
  html:
    mainfont: arial
    monofont: "Cascadia Mono"
    highlight-style: gruvbox-dark
    css: styles.css
    callout-icon: false
    callout-appearance: simple
---

In this chapter we step through an example
@@ -18,6 +23,8 @@ of real vs fake of a new news article titled "The President has a secret!"
# libraries
library(bayesrules)
library(dplyr)
library(tidyr)
library(gt)
data(fake_news)
fake_news <- tibble::as_tibble(fake_news)
```
@@ -45,4 +52,66 @@ This is the first "clue" or set of data that we have to build into our framework
Namely, the majority of articles are "real", so we could simply predict that
the new article is "real". This updated sense of reality now becomes our prior.

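The prior described above can be computed directly from the data as the share of each article type. This is a minimal sketch. It assumes the `bayesrules` and `dplyr` packages are installed and that `type` takes the values `"fake"` and `"real"`. The object name `prior` is illustrative only.

```r
library(bayesrules)
library(dplyr)

data(fake_news)

# Prior: the share of each article type among all articles,
# i.e. how often "fake" vs "real" occurs before looking at any title
# (the object name `prior` is illustrative, not from the original notes)
prior <- fake_news |>
  count(type) |>
  mutate(prior = n / sum(n))

prior
```

The `prior` column sums to 1 across the two types. Following the reasoning above, the larger value belongs to "real".
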
Next, we gather additional data and update our prior based on it. The new
observation concerns the use of exclamation marks ("!"): "!" appears more
frequently in the titles of news articles labeled "fake". We want to incorporate
this into our framework when deciding whether the new incoming article should be
labeled real or fake.

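A quick way to check the claim about "!" is to cross-tabulate article type against the `title_has_excl` column. This sketch uses only base R's `table()` plus the `bayesrules` data. It assumes `title_has_excl` is a TRUE/FALSE column, and `excl_counts` is a hypothetical name.

```r
library(bayesrules)

data(fake_news)

# Count articles by type and by whether the title contains "!"
# (`excl_counts` is an illustrative name, not from the original notes)
excl_counts <- table(
  type = fake_news$type,
  title_has_excl = fake_news$title_has_excl
)

excl_counts
```

There are more real than fake articles overall, so these raw counts are best read alongside proportions computed within each type.
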
### Likelihood

:::{.callout-note}
## Probability and Likelihood

When the event $B$ is known, we can evaluate the uncertainty of events
$A$ and $A^c$ given $B$:

$$P(A|B) \text{ vs } P(A^c|B)$$

If, on the other hand, we know event $A$, then we can evaluate the relative
compatibility of data $A$ with $B$ and $B^c$ using likelihood functions:

$$L(B|A) = P(A|B) \quad \text{vs} \quad L(B^c|A) = P(A|B^c)$$

:::

So in our case, we don't know whether this new incoming article is real or fake,
but we do know that its title has an exclamation mark. Let $B$ be the event that
an article is fake (so $B^c$ is the event that it is real) and $A$ the event that
its title contains "!". Since $A$ is observed, we can use likelihood functions to
evaluate how compatible an "!" in the title is with the article being fake versus
real:

$$L(B|A) \text{ vs } L(B^c|A)$$

We can compute these likelihoods in R as follows:

```{r}
# For each article type (fake/real), compute the share of titles
# with and without "!"
prop_of_excl_within_type <- fake_news |>
  group_by(type, title_has_excl) |>
  summarise(total = n(), .groups = "drop") |>
  group_by(type) |>
  # mutate() keeps one row per type/"!" combination; returning several
  # rows per group from summarise() is deprecated in recent dplyr
  mutate(prop_within_type = total / sum(total)) |>
  ungroup() |>
  select(type, has_excl = title_has_excl, prop_within_type)
```

```{r}
# Reshape to one column per article type and render as a formatted table
prop_of_excl_within_type |>
  pivot_wider(names_from = "type", values_from = prop_within_type) |>
  gt() |>
  gt::cols_label(
    has_excl = "Contains Exclamation",
    fake = "Fake",
    real = "Real"
  ) |>
  gt::fmt_number(columns = c("fake", "real"), decimals = 3) |>
  gt::cols_width(everything() ~ px(100))
```

The table above also shows the likelihoods for the case when the article's
title does not contain an exclamation point.
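The sketch below recomputes the same conditional proportions as a sanity check on what the table contains. It assumes `bayesrules` and `dplyr` are installed, and the names `cond`, `col_sums` and `lik_excl` are hypothetical. Within each type, the "!" and no-"!" proportions sum to 1. The two likelihoods for the same observation (fake vs real, given a title with "!") are evaluated under different conditions, so they are not required to sum to 1.

```r
library(bayesrules)
library(dplyr)

data(fake_news)

# Conditional proportions of titles with/without "!" within each type
# (object names here are illustrative, not from the original notes)
cond <- fake_news |>
  count(type, title_has_excl) |>
  group_by(type) |>
  mutate(prob = n / sum(n)) |>
  ungroup()

# Within each type the proportions form a probability distribution (sum to 1)
col_sums <- cond |>
  group_by(type) |>
  summarise(total = sum(prob))

# Likelihoods for the observed data (title contains "!"), fake vs real;
# these come from different conditional distributions
lik_excl <- cond |>
  filter(title_has_excl == TRUE) |>
  select(type, prob)

col_sums
sum(lik_excl$prob)
```
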