---
title: "Chapter 2 Notes"
author: "Emanuel Rodriguez"
format:
  html:
    mainfont: arial
    monofont: "Cascadia Mono"
    highlight-style: ayu-dark
---

In this chapter we step through an example of "fake" vs. "real" news to build a framework for determining the probability that a new article, titled "The President has a secret!", is real or fake.

```{r}
#| message: false
#| warning: false
# libraries
library(bayesrules)
library(dplyr)
data(fake_news)
fake_news <- tibble::as_tibble(fake_news)
```

What proportion of the news articles were labeled fake vs. real?

```{r}
fake_news |> glimpse()

fake_news |>
  group_by(type) |>
  summarise(
    total = n(),
    prop = total / nrow(fake_news)
  )
```

If we let $B$ be the event that a news article is "fake" news, and $B^c$ be the event that a news article is "real", we can write the following:

$$P(B) = 0.4$$
$$P(B^c) = 0.6$$
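
As a quick sanity check, these priors are simply the observed proportions of each `type` in the data (no assumptions beyond the `fake_news` data loaded above):

```{r}
# observed prior probabilities of fake vs. real
mean(fake_news$type == "fake")
mean(fake_news$type == "real")
```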

This is the first "clue", or set of data, that we have to build into our framework. Namely, the majority of articles are "real", so we could simply predict that the new article is "real". This sense of reality becomes our prior.

The next step is to get additional data and update our prior based on it.
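
For instance, one additional piece of data could be whether an article's title uses an exclamation point, as the new article's title does. Below is a minimal sketch of that update, assuming the `fake_news` data contains a logical `title_has_excl` column (an assumption about the dataset, not something established above):

```{r}
# Likelihoods: proportion of fake and real articles whose title uses an
# exclamation point (assumes `title_has_excl` is a logical column).
likelihoods <- fake_news |>
  group_by(type) |>
  summarise(prop_excl = mean(title_has_excl))
likelihoods

# Bayes' rule:
# P(fake | excl) = P(excl | fake) P(fake) /
#                  [P(excl | fake) P(fake) + P(excl | real) P(real)]
prior_fake <- 0.4
prior_real <- 0.6
like_fake  <- likelihoods$prop_excl[likelihoods$type == "fake"]
like_real  <- likelihoods$prop_excl[likelihoods$type == "real"]

(like_fake * prior_fake) / (like_fake * prior_fake + like_real * prior_real)
```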