adds more work
This commit is contained in:
parent
5c071fcbb2
commit
c01c507087
689
R/ch2.html
|
@ -138,8 +138,10 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
|
|||
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span>
|
||||
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tidyr)</span>
|
||||
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(gt)</span>
|
||||
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="fu">data</span>(fake_news)</span>
|
||||
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>fake_news <span class="ot"><-</span> tibble<span class="sc">::</span><span class="fu">as_tibble</span>(fake_news)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tibble)</span>
|
||||
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(ggplot2)</span>
|
||||
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="fu">data</span>(fake_news)</span>
|
||||
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>fake_news <span class="ot"><-</span> tibble<span class="sc">::</span><span class="fu">as_tibble</span>(fake_news)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
<p>What proportion of news articles were labeled fake vs. real?</p>
|
||||
<div class="cell">
|
||||
|
@ -243,12 +245,12 @@ Probability and Likelihood
|
|||
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> gt<span class="sc">::</span><span class="fu">cols_width</span>(<span class="fu">everything</span>() <span class="sc">~</span> <span class="fu">px</span>(<span class="dv">100</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output-display">
|
||||
|
||||
<div id="jtimbozsld" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
|
||||
<div id="cgeetizxio" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
|
||||
<style>html {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_table {
|
||||
#cgeetizxio .gt_table {
|
||||
display: table;
|
||||
border-collapse: collapse;
|
||||
margin-left: auto;
|
||||
|
@ -273,7 +275,7 @@ Probability and Likelihood
|
|||
border-left-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_heading {
|
||||
#cgeetizxio .gt_heading {
|
||||
background-color: #FFFFFF;
|
||||
text-align: center;
|
||||
border-bottom-color: #FFFFFF;
|
||||
|
@ -285,7 +287,7 @@ Probability and Likelihood
|
|||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_title {
|
||||
#cgeetizxio .gt_title {
|
||||
color: #333333;
|
||||
font-size: 125%;
|
||||
font-weight: initial;
|
||||
|
@ -297,7 +299,7 @@ Probability and Likelihood
|
|||
border-bottom-width: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_subtitle {
|
||||
#cgeetizxio .gt_subtitle {
|
||||
color: #333333;
|
||||
font-size: 85%;
|
||||
font-weight: initial;
|
||||
|
@ -309,13 +311,13 @@ Probability and Likelihood
|
|||
border-top-width: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_bottom_border {
|
||||
#cgeetizxio .gt_bottom_border {
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_col_headings {
|
||||
#cgeetizxio .gt_col_headings {
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #D3D3D3;
|
||||
|
@ -330,7 +332,7 @@ Probability and Likelihood
|
|||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_col_heading {
|
||||
#cgeetizxio .gt_col_heading {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
|
@ -350,7 +352,7 @@ Probability and Likelihood
|
|||
overflow-x: hidden;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_column_spanner_outer {
|
||||
#cgeetizxio .gt_column_spanner_outer {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
|
@ -362,15 +364,15 @@ Probability and Likelihood
|
|||
padding-right: 4px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_column_spanner_outer:first-child {
|
||||
#cgeetizxio .gt_column_spanner_outer:first-child {
|
||||
padding-left: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_column_spanner_outer:last-child {
|
||||
#cgeetizxio .gt_column_spanner_outer:last-child {
|
||||
padding-right: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_column_spanner {
|
||||
#cgeetizxio .gt_column_spanner {
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
|
@ -382,7 +384,7 @@ Probability and Likelihood
|
|||
width: 100%;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_group_heading {
|
||||
#cgeetizxio .gt_group_heading {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
|
@ -407,7 +409,7 @@ Probability and Likelihood
|
|||
vertical-align: middle;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_empty_group_heading {
|
||||
#cgeetizxio .gt_empty_group_heading {
|
||||
padding: 0.5px;
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
|
@ -422,15 +424,15 @@ Probability and Likelihood
|
|||
vertical-align: middle;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_from_md > :first-child {
|
||||
#cgeetizxio .gt_from_md > :first-child {
|
||||
margin-top: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_from_md > :last-child {
|
||||
#cgeetizxio .gt_from_md > :last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_row {
|
||||
#cgeetizxio .gt_row {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
|
@ -449,7 +451,7 @@ Probability and Likelihood
|
|||
overflow-x: hidden;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_stub {
|
||||
#cgeetizxio .gt_stub {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
|
@ -462,7 +464,7 @@ Probability and Likelihood
|
|||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_stub_row_group {
|
||||
#cgeetizxio .gt_stub_row_group {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
|
@ -476,11 +478,11 @@ Probability and Likelihood
|
|||
vertical-align: top;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_row_group_first td {
|
||||
#cgeetizxio .gt_row_group_first td {
|
||||
border-top-width: 2px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_summary_row {
|
||||
#cgeetizxio .gt_summary_row {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
text-transform: inherit;
|
||||
|
@ -490,16 +492,16 @@ Probability and Likelihood
|
|||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_first_summary_row {
|
||||
#cgeetizxio .gt_first_summary_row {
|
||||
border-top-style: solid;
|
||||
border-top-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_first_summary_row.thick {
|
||||
#cgeetizxio .gt_first_summary_row.thick {
|
||||
border-top-width: 2px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_last_summary_row {
|
||||
#cgeetizxio .gt_last_summary_row {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
|
@ -509,7 +511,7 @@ Probability and Likelihood
|
|||
border-bottom-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_grand_summary_row {
|
||||
#cgeetizxio .gt_grand_summary_row {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
text-transform: inherit;
|
||||
|
@ -519,7 +521,7 @@ Probability and Likelihood
|
|||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_first_grand_summary_row {
|
||||
#cgeetizxio .gt_first_grand_summary_row {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
|
@ -529,11 +531,11 @@ Probability and Likelihood
|
|||
border-top-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_striped {
|
||||
#cgeetizxio .gt_striped {
|
||||
background-color: rgba(128, 128, 128, 0.05);
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_table_body {
|
||||
#cgeetizxio .gt_table_body {
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #D3D3D3;
|
||||
|
@ -542,7 +544,7 @@ Probability and Likelihood
|
|||
border-bottom-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_footnotes {
|
||||
#cgeetizxio .gt_footnotes {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
border-bottom-style: none;
|
||||
|
@ -556,7 +558,7 @@ Probability and Likelihood
|
|||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_footnote {
|
||||
#cgeetizxio .gt_footnote {
|
||||
margin: 0px;
|
||||
font-size: 90%;
|
||||
padding-left: 4px;
|
||||
|
@ -565,7 +567,7 @@ Probability and Likelihood
|
|||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_sourcenotes {
|
||||
#cgeetizxio .gt_sourcenotes {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
border-bottom-style: none;
|
||||
|
@ -579,7 +581,7 @@ Probability and Likelihood
|
|||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_sourcenote {
|
||||
#cgeetizxio .gt_sourcenote {
|
||||
font-size: 90%;
|
||||
padding-top: 4px;
|
||||
padding-bottom: 4px;
|
||||
|
@ -587,64 +589,64 @@ Probability and Likelihood
|
|||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_left {
|
||||
#cgeetizxio .gt_left {
|
||||
text-align: left;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_center {
|
||||
#cgeetizxio .gt_center {
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_right {
|
||||
#cgeetizxio .gt_right {
|
||||
text-align: right;
|
||||
font-variant-numeric: tabular-nums;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_font_normal {
|
||||
#cgeetizxio .gt_font_normal {
|
||||
font-weight: normal;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_font_bold {
|
||||
#cgeetizxio .gt_font_bold {
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_font_italic {
|
||||
#cgeetizxio .gt_font_italic {
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_super {
|
||||
#cgeetizxio .gt_super {
|
||||
font-size: 65%;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_footnote_marks {
|
||||
#cgeetizxio .gt_footnote_marks {
|
||||
font-style: italic;
|
||||
font-weight: normal;
|
||||
font-size: 75%;
|
||||
vertical-align: 0.4em;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_asterisk {
|
||||
#cgeetizxio .gt_asterisk {
|
||||
font-size: 100%;
|
||||
vertical-align: 0;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_indent_1 {
|
||||
#cgeetizxio .gt_indent_1 {
|
||||
text-indent: 5px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_indent_2 {
|
||||
#cgeetizxio .gt_indent_2 {
|
||||
text-indent: 10px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_indent_3 {
|
||||
#cgeetizxio .gt_indent_3 {
|
||||
text-indent: 15px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_indent_4 {
|
||||
#cgeetizxio .gt_indent_4 {
|
||||
text-indent: 20px;
|
||||
}
|
||||
|
||||
#jtimbozsld .gt_indent_5 {
|
||||
#cgeetizxio .gt_indent_5 {
|
||||
text-indent: 25px;
|
||||
}
|
||||
</style>
|
||||
|
@ -787,6 +789,597 @@ total probability
|
|||
<p><span class="math display">\[P(B^c) = P(A \cap B^c) + P(A^c \cap B^c)\]</span> <span class="math display">\[=P(A|B^c)P(B^c) + P(A^c|B^c)P(B^c)\]</span> <span class="math display">\[=.0132 + .5868 = .6\]</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<p>In the above calculations we also step through <strong>joint probabilities</strong>.</p>
|
||||
<div class="callout-note callout callout-style-default no-icon callout-captioned">
|
||||
<div class="callout-header d-flex align-content-center">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon no-icon"></i>
|
||||
</div>
|
||||
<div class="callout-caption-container flex-fill">
|
||||
Joint and conditional probability
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-body-container callout-body">
|
||||
<p><span class="math display">\[P(A \cap B) = P(A|B)P(B)\]</span></p>
|
||||
<p><span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> are said to be independent events, if and only if</p>
|
||||
<p><span class="math display">\[P(A \cap B) = P(A)P(B)\]</span></p>
|
||||
<p>Rearranging the first identity, we can also derive the definition of a conditional probability, valid whenever <span class="math inline">\(P(B) \neq 0\)</span>:</p>
|
||||
<p><span class="math display">\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]</span></p>
|
||||
</div>
|
||||
</div>
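<p>As a quick illustration of the independence condition, and assuming (as in the surrounding example) that <span class="math inline">\(B\)</span> is the event that an article is fake with <span class="math inline">\(P(B) = .4\)</span> and <span class="math inline">\(A\)</span> is the event that its title contains an exclamation point with <span class="math inline">\(P(A) = .12\)</span>, the joint probability does not factor into the product of the marginals:</p>
<p><span class="math display">\[P(A \cap B) = .1068 \neq .048 = (.12)(.4) = P(A)P(B)\]</span></p>
<p>so exclamation point usage and article type are not independent in this example.</p>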
|
||||
<p>At this point we are able to answer the question, “What is the probability that the new article is fake?” Given that the new article has an exclamation point, we can zoom in on the top row of the table of probabilities. Within this row we have probabilities <span class="math inline">\(.1068/.12 = .89\)</span> for fake and <span class="math inline">\(.0132 / .12 = .11\)</span> for real.</p>
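<p>Written out with the conditional probability definition, and taking <span class="math inline">\(B\)</span> to be "fake" and <span class="math inline">\(A\)</span> to be "has an exclamation point" as in the total probability calculation, the two divisions look like this. The two results should sum to 1, which is a useful check on the arithmetic:</p>
<p><span class="math display">\[P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{.1068}{.12} = .89\]</span> <span class="math display">\[P(B^c|A) = \frac{P(A \cap B^c)}{P(A)} = \frac{.0132}{.12} = .11\]</span></p>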
|
||||
<p>This is essentially Bayes’ Rule. We developed a posterior probability for an event <span class="math inline">\(B\)</span> given some observation <span class="math inline">\(A\)</span>. We did so by combining the likelihood of event <span class="math inline">\(B\)</span> given the new data <span class="math inline">\(A\)</span> with the prior probability of event <span class="math inline">\(B\)</span>. More formally, we have the following definition:</p>
|
||||
<div class="callout-note callout callout-style-default no-icon callout-captioned">
|
||||
<div class="callout-header d-flex align-content-center">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon no-icon"></i>
|
||||
</div>
|
||||
<div class="callout-caption-container flex-fill">
|
||||
Bayes’ Rule
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-body-container callout-body">
|
||||
<p>The posterior probability of an event <span class="math inline">\(B\)</span> given an event <span class="math inline">\(A\)</span> is:</p>
|
||||
<p><span class="math display">\[ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{L(B|A)P(B)}{P(A)}\]</span></p>
|
||||
<p>where <span class="math inline">\(L\)</span> is the likelihood function <span class="math inline">\(L(B|A) = P(A|B)\)</span> and <span class="math inline">\(P(A)\)</span> is the total probability of <span class="math inline">\(A\)</span>.</p>
|
||||
<p>More generally,</p>
|
||||
<p><span class="math display">\[ \text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{normalizing constant}}\]</span></p>
|
||||
</div>
|
||||
</div>
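<p>The rule can be checked numerically with a minimal sketch. The prior (.4 for fake) and the likelihoods (.267 and .022) are the values used elsewhere in this chapter; the variable names are illustrative only and are not part of the original analysis.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Prior P(B): probability that an article is fake.
prior_fake <- 0.4
prior_real <- 1 - prior_fake

# Likelihoods L(.|A) = P(A | .): probability of an exclamation point
# in the title, given the article type.
lik_fake <- 0.267
lik_real <- 0.022

# Normalizing constant P(A), by the law of total probability.
p_exclaim <- lik_fake * prior_fake + lik_real * prior_real

# Posterior P(B | A) by Bayes' Rule.
posterior_fake <- lik_fake * prior_fake / p_exclaim
round(posterior_fake, 3)</code></pre></div>
</div>
<p>Under these inputs the normalizing constant works out to .12 and the posterior to roughly .89, matching the top row of the table.</p>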
|
||||
</section>
|
||||
<section id="simualation" class="level3">
|
||||
<h3 class="anchored" data-anchor-id="simualation">Simulation</h3>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>articles <span class="ot"><-</span> tibble<span class="sc">::</span><span class="fu">tibble</span>(<span class="at">type =</span> <span class="fu">c</span>(<span class="st">"real"</span>, <span class="st">"fake"</span>))</span>
|
||||
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>priors <span class="ot"><-</span> <span class="fu">c</span>(.<span class="dv">6</span>, .<span class="dv">4</span>)</span>
|
||||
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="ot"><-</span> <span class="fu">sample_n</span>(articles, <span class="dv">10000</span>, <span class="at">replace =</span> <span class="cn">TRUE</span>, <span class="at">weight =</span> priors)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">ggplot</span>(<span class="fu">aes</span>(<span class="at">x =</span> type)) <span class="sc">+</span> <span class="fu">geom_bar</span>()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="ch2_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid" width="672"></p>
|
||||
</div>
|
||||
</div>
|
||||
<p>And a summary table:</p>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">group_by</span>(type) <span class="sc">|></span></span>
|
||||
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarise</span>(</span>
|
||||
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="at">total =</span> <span class="fu">n</span>(), </span>
|
||||
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="at">prop =</span> total <span class="sc">/</span> <span class="fu">nrow</span>(articles_sim)</span>
|
||||
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> ) <span class="sc">|></span></span>
|
||||
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">gt</span>()<span class="sc">|></span></span>
|
||||
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> gt<span class="sc">::</span><span class="fu">cols_width</span>(<span class="fu">everything</span>() <span class="sc">~</span> <span class="fu">px</span>(<span class="dv">100</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output-display">
|
||||
|
||||
<div id="riybaxjrki" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
|
||||
<style>html {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_table {
|
||||
display: table;
|
||||
border-collapse: collapse;
|
||||
margin-left: auto;
|
||||
margin-right: auto;
|
||||
color: #333333;
|
||||
font-size: 16px;
|
||||
font-weight: normal;
|
||||
font-style: normal;
|
||||
background-color: #FFFFFF;
|
||||
width: auto;
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #A8A8A8;
|
||||
border-right-style: none;
|
||||
border-right-width: 2px;
|
||||
border-right-color: #D3D3D3;
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #A8A8A8;
|
||||
border-left-style: none;
|
||||
border-left-width: 2px;
|
||||
border-left-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_heading {
|
||||
background-color: #FFFFFF;
|
||||
text-align: center;
|
||||
border-bottom-color: #FFFFFF;
|
||||
border-left-style: none;
|
||||
border-left-width: 1px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 1px;
|
||||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_title {
|
||||
color: #333333;
|
||||
font-size: 125%;
|
||||
font-weight: initial;
|
||||
padding-top: 4px;
|
||||
padding-bottom: 4px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
border-bottom-color: #FFFFFF;
|
||||
border-bottom-width: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_subtitle {
|
||||
color: #333333;
|
||||
font-size: 85%;
|
||||
font-weight: initial;
|
||||
padding-top: 0;
|
||||
padding-bottom: 6px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
border-top-color: #FFFFFF;
|
||||
border-top-width: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_bottom_border {
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_col_headings {
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #D3D3D3;
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
border-left-style: none;
|
||||
border-left-width: 1px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 1px;
|
||||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_col_heading {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
font-weight: normal;
|
||||
text-transform: inherit;
|
||||
border-left-style: none;
|
||||
border-left-width: 1px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 1px;
|
||||
border-right-color: #D3D3D3;
|
||||
vertical-align: bottom;
|
||||
padding-top: 5px;
|
||||
padding-bottom: 6px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
overflow-x: hidden;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_column_spanner_outer {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
font-weight: normal;
|
||||
text-transform: inherit;
|
||||
padding-top: 0;
|
||||
padding-bottom: 0;
|
||||
padding-left: 4px;
|
||||
padding-right: 4px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_column_spanner_outer:first-child {
|
||||
padding-left: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_column_spanner_outer:last-child {
|
||||
padding-right: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_column_spanner {
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
vertical-align: bottom;
|
||||
padding-top: 5px;
|
||||
padding-bottom: 5px;
|
||||
overflow-x: hidden;
|
||||
display: inline-block;
|
||||
width: 100%;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_group_heading {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
font-weight: initial;
|
||||
text-transform: inherit;
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #D3D3D3;
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
border-left-style: none;
|
||||
border-left-width: 1px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 1px;
|
||||
border-right-color: #D3D3D3;
|
||||
vertical-align: middle;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_empty_group_heading {
|
||||
padding: 0.5px;
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
font-weight: initial;
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #D3D3D3;
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
vertical-align: middle;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_from_md > :first-child {
|
||||
margin-top: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_from_md > :last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_row {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
margin: 10px;
|
||||
border-top-style: solid;
|
||||
border-top-width: 1px;
|
||||
border-top-color: #D3D3D3;
|
||||
border-left-style: none;
|
||||
border-left-width: 1px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 1px;
|
||||
border-right-color: #D3D3D3;
|
||||
vertical-align: middle;
|
||||
overflow-x: hidden;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_stub {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
font-weight: initial;
|
||||
text-transform: inherit;
|
||||
border-right-style: solid;
|
||||
border-right-width: 2px;
|
||||
border-right-color: #D3D3D3;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_stub_row_group {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
font-size: 100%;
|
||||
font-weight: initial;
|
||||
text-transform: inherit;
|
||||
border-right-style: solid;
|
||||
border-right-width: 2px;
|
||||
border-right-color: #D3D3D3;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
vertical-align: top;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_row_group_first td {
|
||||
border-top-width: 2px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_summary_row {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
text-transform: inherit;
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_first_summary_row {
|
||||
border-top-style: solid;
|
||||
border-top-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_first_summary_row.thick {
|
||||
border-top-width: 2px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_last_summary_row {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_grand_summary_row {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
text-transform: inherit;
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_first_grand_summary_row {
|
||||
padding-top: 8px;
|
||||
padding-bottom: 8px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
border-top-style: double;
|
||||
border-top-width: 6px;
|
||||
border-top-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_striped {
|
||||
background-color: rgba(128, 128, 128, 0.05);
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_table_body {
|
||||
border-top-style: solid;
|
||||
border-top-width: 2px;
|
||||
border-top-color: #D3D3D3;
|
||||
border-bottom-style: solid;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_footnotes {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
border-bottom-style: none;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
border-left-style: none;
|
||||
border-left-width: 2px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 2px;
|
||||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_footnote {
|
||||
margin: 0px;
|
||||
font-size: 90%;
|
||||
padding-left: 4px;
|
||||
padding-right: 4px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_sourcenotes {
|
||||
color: #333333;
|
||||
background-color: #FFFFFF;
|
||||
border-bottom-style: none;
|
||||
border-bottom-width: 2px;
|
||||
border-bottom-color: #D3D3D3;
|
||||
border-left-style: none;
|
||||
border-left-width: 2px;
|
||||
border-left-color: #D3D3D3;
|
||||
border-right-style: none;
|
||||
border-right-width: 2px;
|
||||
border-right-color: #D3D3D3;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_sourcenote {
|
||||
font-size: 90%;
|
||||
padding-top: 4px;
|
||||
padding-bottom: 4px;
|
||||
padding-left: 5px;
|
||||
padding-right: 5px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_left {
|
||||
text-align: left;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_center {
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_right {
|
||||
text-align: right;
|
||||
font-variant-numeric: tabular-nums;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_font_normal {
|
||||
font-weight: normal;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_font_bold {
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_font_italic {
|
||||
font-style: italic;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_super {
|
||||
font-size: 65%;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_footnote_marks {
|
||||
font-style: italic;
|
||||
font-weight: normal;
|
||||
font-size: 75%;
|
||||
vertical-align: 0.4em;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_asterisk {
|
||||
font-size: 100%;
|
||||
vertical-align: 0;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_indent_1 {
|
||||
text-indent: 5px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_indent_2 {
|
||||
text-indent: 10px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_indent_3 {
|
||||
text-indent: 15px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_indent_4 {
|
||||
text-indent: 20px;
|
||||
}
|
||||
|
||||
#riybaxjrki .gt_indent_5 {
|
||||
text-indent: 25px;
|
||||
}
|
||||
</style>
|
||||
<table class="gt_table" style="table-layout: fixed;; width: 0px">
|
||||
<colgroup>
|
||||
<col style="width:100px;">
|
||||
<col style="width:100px;">
|
||||
<col style="width:100px;">
|
||||
</colgroup>
|
||||
|
||||
<thead class="gt_col_headings">
|
||||
<tr>
|
||||
<th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col">type</th>
|
||||
<th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col">total</th>
|
||||
<th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col">prop</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody class="gt_table_body">
|
||||
<tr><td class="gt_row gt_left">fake</td>
|
||||
<td class="gt_row gt_right">3941</td>
|
||||
<td class="gt_row gt_right">0.3941</td></tr>
|
||||
<tr><td class="gt_row gt_left">real</td>
|
||||
<td class="gt_row gt_right">6059</td>
|
||||
<td class="gt_row gt_right">0.6059</td></tr>
|
||||
</tbody>
|
||||
|
||||
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<p>The simulation of 10,000 articles gives very nearly the same proportions as the priors we had from the data. We can now add the exclamation point usage to the simulated data.</p>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="ot"><-</span> articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">model_data =</span> <span class="fu">case_when</span>(</span>
|
||||
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> type <span class="sc">==</span> <span class="st">"fake"</span> <span class="sc">~</span> .<span class="dv">267</span>, </span>
|
||||
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> type <span class="sc">==</span> <span class="st">"real"</span> <span class="sc">~</span> .<span class="dv">022</span></span>
|
||||
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> ))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
<p>The plan here is to iterate through the 10,000 samples and use the <code>model_data</code> value to assign either “yes” or “no” using the <code>sample</code> function.</p>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb12"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>data <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"yes"</span>, <span class="st">"no"</span>)</span>
|
||||
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="ot"><-</span> articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">id =</span> <span class="fu">row_number</span>()) <span class="sc">|></span></span>
|
||||
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">group_by</span>(id) <span class="sc">|></span></span>
|
||||
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">usage =</span> <span class="fu">sample</span>(data, <span class="dv">1</span>, <span class="at">prob =</span> <span class="fu">c</span>(model_data, <span class="dv">1</span> <span class="sc">-</span> model_data)))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb13"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">group_by</span>(usage, type) <span class="sc">|></span></span>
|
||||
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarise</span>(</span>
|
||||
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> <span class="at">total =</span> <span class="fu">n</span>()</span>
|
||||
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> ) <span class="sc">|></span></span>
|
||||
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">pivot_wider</span>(<span class="at">names_from =</span> type, <span class="at">values_from =</span> total)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output cell-output-stdout">
|
||||
<pre><code># A tibble: 2 × 3
|
||||
# Groups: usage [2]
|
||||
usage fake real
|
||||
<chr> <int> <int>
|
||||
1 no 2936 5932
|
||||
2 yes 1005 127</code></pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb15"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">ggplot</span>(<span class="fu">aes</span>(<span class="at">x =</span> type, <span class="at">fill =</span> usage)) <span class="sc">+</span> </span>
|
||||
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_bar</span>() <span class="sc">+</span> </span>
|
||||
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">scale_fill_discrete</span>(<span class="at">type =</span> <span class="fu">c</span>(<span class="st">"gray8"</span>, <span class="st">"dodgerblue4"</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="ch2_files/figure-html/unnamed-chunk-11-1.png" class="img-fluid" width="672"></p>
|
||||
</div>
|
||||
</div>
|
||||
<p>Now that we have computed both the priors and the likelihoods, we can simply filter our data to reflect the incoming article and determine our posterior.</p>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb16"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>articles_sim <span class="sc">|></span></span>
|
||||
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">filter</span>(usage <span class="sc">==</span> <span class="st">"yes"</span>) <span class="sc">|></span></span>
|
||||
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">group_by</span>(type) <span class="sc">|></span></span>
|
||||
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarise</span>(</span>
|
||||
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> <span class="at">total =</span> <span class="fu">n</span>()</span>
|
||||
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> ) <span class="sc">|></span></span>
|
||||
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(</span>
|
||||
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="at">prop =</span> total <span class="sc">/</span> <span class="fu">sum</span>(total)</span>
|
||||
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> )</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output cell-output-stdout">
|
||||
<pre><code># A tibble: 2 × 3
|
||||
type total prop
|
||||
<chr> <int> <dbl>
|
||||
1 fake 1005 0.888
|
||||
2 real 127 0.112</code></pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-note callout callout-style-default no-icon callout-captioned">
|
||||
<div class="callout-header d-flex align-content-center">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon no-icon"></i>
|
||||
</div>
|
||||
<div class="callout-caption-container flex-fill">
|
||||
Discrete Probability Model
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-body-container callout-body">
|
||||
<p>Let <span class="math inline">\(Y\)</span> be a discrete random variable. The probability model for <span class="math inline">\(Y\)</span> is described by a <strong>probability mass function</strong> (pmf) defined as: <span class="math display">\[f(y) = P(Y = y)\]</span></p>
|
||||
<p>and has the following properties</p>
|
||||
<ol type="1">
|
||||
<li><span class="math inline">\(0 \leq f(y) \leq 1\;\; \forall y\)</span></li>
|
||||
<li><span class="math inline">\(\sum_{\forall y}f(y) = 1\)</span></li>
|
||||
</ol>
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-caution callout callout-style-default no-icon callout-captioned">
|
||||
<div class="callout-header d-flex align-content-center">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon no-icon"></i>
|
||||
</div>
|
||||
<div class="callout-caption-container flex-fill">
|
||||
in emanuel’s words
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-body-container callout-body">
|
||||
<p>What does this mean? It is very straightforward: a pmf is a function that takes in some value <span class="math inline">\(y\)</span> and outputs the probability that the random variable <span class="math inline">\(Y\)</span> equals <span class="math inline">\(y\)</span>.</p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
</main>
|
||||
|
|
156
R/ch2.qmd
|
@ -25,6 +25,8 @@ library(bayesrules)
|
|||
library(dplyr)
|
||||
library(tidyr)
|
||||
library(gt)
|
||||
library(tibble)
|
||||
library(ggplot2)
|
||||
data(fake_news)
|
||||
fake_news <- tibble::as_tibble(fake_news)
|
||||
```
|
||||
|
@ -180,4 +182,158 @@ parts. Namely
|
|||
$$P(B^c) = P(A \cap B^c) + P(A^c \cap B^c)$$
|
||||
$$=P(A|B^c)P(B^c) + P(A^c|B^c)P(B^c)$$
|
||||
$$=.0132 + .5868 = .6$$
|
||||
:::
|
||||
|
||||
In the above calculations we also step through **joint probabilities**
|
||||
|
||||
:::{.callout-note}
|
||||
## Joint and conditional probability
|
||||
|
||||
$$P(A \cap B) = P(A|B)P(B)$$
|
||||
|
||||
$A$ and $B$ are said to be independent events, if and only if
|
||||
|
||||
$$P(A \cap B) = P(A)P(B)$$
|
||||
|
||||
From the first identity we can also derive the definition of a conditional probability
|
||||
|
||||
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
|
||||
|
||||
:::
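As a quick worked check with the numbers used in this chapter, read $B$ as "the article is fake" with $P(B) = .4$ and $A$ as "the title has an exclamation point" with $P(A|B) = .267$ and $P(A) = .12$ (this labeling is inferred from the calculations above):

$$P(A \cap B) = P(A|B)P(B) = .267 \cdot .4 = .1068$$

$$P(A)P(B) = .12 \cdot .4 = .048 \neq .1068$$

so exclamation usage and article type are not independent.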
|
||||
|
||||
At this point we are able to answer the question, "What is the probability that
|
||||
the new article is fake?". Given that the new article has an exclamation
|
||||
point, we can zoom into the top row of the table of probabilities. Within
|
||||
this row we have probabilities $.1068/.12 = .89$ for fake and $.0132 / .12 = .11$
|
||||
for real.
|
||||
|
||||
This is essentially Bayes' Rule. We developed a posterior probability for an event
|
||||
$B$ given some observation $A$. We did so by combining the likelihood of event $B$
|
||||
given some new data $A$ and the prior probability of event $B$. More formally we
|
||||
have the following definition:
|
||||
|
||||
:::{.callout-note}
|
||||
## Bayes' Rule
|
||||
|
||||
The posterior probability of an event $B$ given an event $A$ is:
|
||||
|
||||
$$ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{L(B|A)P(B)}{P(A)}$$
|
||||
|
||||
where $L$ is the likelihood function $L(B|A) = P(A|B)$ and $P(A)$ is the
|
||||
total probability of $A$.
|
||||
|
||||
More generally,
|
||||
|
||||
$$ \text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{normalizing constant}}$$
|
||||
:::
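The rule can also be checked numerically. The following is a minimal sketch that plugs in the priors (.4 and .6) and likelihoods (.267 and .022) used in this chapter; the object names `prior`, `likelihood`, `normalizing_constant`, and `posterior` are chosen here for illustration only.

```{r}
# Priors and likelihoods taken from the text above
prior      <- c(fake = 0.4, real = 0.6)      # P(B), P(B^c)
likelihood <- c(fake = 0.267, real = 0.022)  # P(A | B), P(A | B^c)

# Normalizing constant P(A) via the law of total probability
normalizing_constant <- sum(likelihood * prior)

# Posterior P(type | exclamation point)
posterior <- likelihood * prior / normalizing_constant
round(posterior, 3)
```

The two posterior values should agree with the hand calculation above, roughly .89 for fake and .11 for real.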
|
||||
|
||||
### Simulation
|
||||
|
||||
|
||||
```{r}
|
||||
articles <- tibble::tibble(type = c("real", "fake"))
|
||||
|
||||
priors <- c(.6, .4)
|
||||
|
||||
articles_sim <- sample_n(articles, 10000, replace = TRUE, weight = priors)
|
||||
```
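The same draw could be written with `slice_sample()`, which supersedes `sample_n()` in current versions of dplyr. This is only a sketch: the seed value is arbitrary (it is there so the draw is reproducible) and `articles_sim_alt` is a hypothetical name that keeps the object above untouched.

```{r}
library(dplyr)
library(tibble)

set.seed(84735)  # arbitrary seed, only so the draw is reproducible

articles <- tibble(type = c("real", "fake"))
priors <- c(.6, .4)

# weight_by plays the role of sample_n()'s weight argument
articles_sim_alt <- slice_sample(
  articles,
  n = 10000,
  weight_by = priors,
  replace = TRUE
)

# Counts per type; roughly 60% real and 40% fake is expected
count(articles_sim_alt, type)
```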
|
||||
|
||||
```{r}
|
||||
articles_sim |>
|
||||
ggplot(aes(x = type)) + geom_bar()
|
||||
```
|
||||
|
||||
and a summary table
|
||||
|
||||
```{r}
|
||||
articles_sim |>
|
||||
group_by(type) |>
|
||||
summarise(
|
||||
total = n(),
|
||||
prop = total / nrow(articles_sim)
|
||||
) |>
|
||||
gt()|>
|
||||
gt::cols_width(everything() ~ px(100))
|
||||
```
|
||||
|
||||
The simulation of 10,000 articles shows us very nearly
|
||||
the same priors we had from the data. We can now add
|
||||
the exclamation usage into the data.
|
||||
|
||||
```{r}
|
||||
|
||||
articles_sim <- articles_sim |>
|
||||
mutate(model_data = case_when(
|
||||
type == "fake" ~ .267,
|
||||
type == "real" ~ .022
|
||||
))
|
||||
```
|
||||
|
||||
|
||||
The plan here is to iterate through the 10,000 samples
|
||||
and use the `model_data` value to assign either "yes" or
|
||||
"no" using the `sample` function.
|
||||
|
||||
```{r}
|
||||
data <- c("yes", "no")
|
||||
|
||||
articles_sim <- articles_sim |>
|
||||
mutate(id = row_number()) |>
|
||||
group_by(id) |>
|
||||
mutate(usage = sample(data, 1, prob = c(model_data, 1 - model_data)))
|
||||
```
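The row-by-row `sample()` call works, but it leaves the tibble grouped by `id` (an `ungroup()` afterwards would clear that). A vectorized alternative is sketched below under the assumption that one uniform draw per row is acceptable; `sim` is a hypothetical stand-in built with the same columns as `articles_sim`, and the seed is arbitrary.

```{r}
library(dplyr)
library(tibble)

set.seed(84735)  # arbitrary seed for reproducibility

# Stand-in for articles_sim: article type plus its exclamation likelihood
sim <- tibble(
  type = sample(c("real", "fake"), 10000, replace = TRUE, prob = c(.6, .4))
) |>
  mutate(model_data = if_else(type == "fake", .267, .022))

# One uniform draw per row: "yes" with probability model_data, otherwise "no"
sim <- sim |>
  mutate(usage = if_else(runif(n()) < model_data, "yes", "no"))

count(sim, type, usage)
```

Because `runif(n())` produces one draw per row, no grouping is needed and the result stays an ungrouped tibble.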
|
||||
|
||||
|
||||
```{r}
|
||||
articles_sim |>
|
||||
group_by(usage, type) |>
|
||||
summarise(
|
||||
total = n()
|
||||
) |>
|
||||
pivot_wider(names_from = type, values_from = total)
|
||||
```
|
||||
|
||||
```{r}
|
||||
articles_sim |>
|
||||
ggplot(aes(x = type, fill = usage)) +
|
||||
geom_bar() +
|
||||
scale_fill_discrete(type = c("gray8", "dodgerblue4"))
|
||||
```
|
||||
|
||||
Now that we have computed both the priors and likelihoods, we can simply
|
||||
filter our data to reflect the incoming article and determine our
|
||||
posterior.
|
||||
|
||||
```{r}
|
||||
articles_sim |>
|
||||
filter(usage == "yes") |>
|
||||
group_by(type) |>
|
||||
summarise(
|
||||
total = n()
|
||||
) |>
|
||||
mutate(
|
||||
prop = total / sum(total)
|
||||
)
|
||||
```
|
||||
|
||||
|
||||
:::{.callout-note}
|
||||
## Discrete Probability Model
|
||||
|
||||
Let $Y$ be a discrete random variable. The probability model for $Y$ is
|
||||
described by a **probability mass function** (pmf) defined as:
|
||||
$$f(y) = P(Y = y)$$
|
||||
|
||||
and has the following properties
|
||||
|
||||
1. $0 \leq f(y) \leq 1\;\; \forall y$
|
||||
2. $\sum_{\forall y}f(y) = 1$
|
||||
:::
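As a small illustration, the prior over article type can be treated as a pmf and both properties checked directly. This is a sketch only; taking $Y$ to be the article type and storing the pmf in a named vector `f` are choices made here for the example.

```{r}
# pmf of article type under the prior, stored as a named vector
f <- c(fake = 0.4, real = 0.6)

# Property 1: every value lies in [0, 1]
all(f >= 0 & f <= 1)

# Property 2: the values sum to 1
isTRUE(all.equal(sum(f), 1))
```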
|
||||
|
||||
|
||||
:::{.callout-caution}
|
||||
## in emanuel's words
|
||||
What does this mean? It is very straightforward: a pmf is a function that takes
|
||||
in some value $y$ and outputs the probability that the random variable
|
||||
$Y$ equals $y$.
|
||||
:::
|
Binary file not shown.
After Width: | Height: | Size: 16 KiB |
Binary file not shown.
After Width: | Height: | Size: 12 KiB |