Merge pull request #93 from marimo-team/haleshot/refine
- probability/10_probability_mass_function.py +13 -19
- probability/11_expectation.py +5 -5
- probability/13_bernoulli_distribution.py +23 -29
- probability/14_binomial_distribution.py +5 -7
- probability/15_poisson_distribution.py +20 -16
- probability/16_continuous_distribution.py +39 -40
- probability/18_central_limit_theorem.py +57 -53
- probability/19_maximum_likelihood_estimation.py +2 -0
probability/10_probability_mass_function.py
CHANGED
@@ -10,7 +10,7 @@
|
|
10 |
|
11 |
import marimo
|
12 |
|
13 |
-
__generated_with = "0.
|
14 |
app = marimo.App(width="medium", app_title="Probability Mass Functions")
|
15 |
|
16 |
|
@@ -22,9 +22,9 @@ def _(mo):
|
|
22 |
|
23 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/pmf/), by Stanford professor Chris Piech._
|
24 |
|
25 |
-
|
26 |
|
27 |
-
|
28 |
"""
|
29 |
)
|
30 |
return
|
@@ -36,18 +36,12 @@ def _(mo):
|
|
36 |
r"""
|
37 |
## Properties of a PMF
|
38 |
|
39 |
-
For a function $p_X(x)$ to be a valid PMF
|
40 |
|
41 |
-
1. **Non-negativity**: $p_X(x) \geq 0$ for all $x$
|
42 |
-
2. **Unit total probability**: $\sum_x p_X(x) = 1$
|
43 |
|
44 |
-
|
45 |
-
|
46 |
-
For a variable (call it $X$) to be a proper random variable, it must be the case that if you summed up the values of $P(X=x)$ for all possible values $x$ that $X$ can take on, the result must be 1:
|
47 |
-
|
48 |
-
$$\sum_x P(X=x) = 1$$
|
49 |
-
|
50 |
-
This is because a random variable taking on a value is an event (for example $X=3$). Each of those events is mutually exclusive because a random variable will take on exactly one value. Those mutually exclusive cases define an entire sample space. Why? Because $X$ must take on some value.
|
51 |
"""
|
52 |
)
|
53 |
return
|
@@ -125,11 +119,11 @@ def _(np, plt):
|
|
125 |
def _(mo):
|
126 |
mo.md(
|
127 |
r"""
|
128 |
-
|
129 |
|
130 |
-
|
131 |
|
132 |
-
|
133 |
"""
|
134 |
)
|
135 |
return
|
@@ -220,7 +214,7 @@ def _(mo):
|
|
220 |
r"""
|
221 |
## Data to Histograms to Probability Mass Functions
|
222 |
|
223 |
-
|
224 |
"""
|
225 |
)
|
226 |
return
|
@@ -323,9 +317,9 @@ def _(collections, np, plt, sim_dice_sums):
|
|
323 |
def _(mo):
|
324 |
mo.md(
|
325 |
r"""
|
326 |
-
|
327 |
|
328 |
-
|
329 |
"""
|
330 |
)
|
331 |
return
|
|
|
10 |
|
11 |
import marimo
|
12 |
|
13 |
+
__generated_with = "0.12.6"
|
14 |
app = marimo.App(width="medium", app_title="Probability Mass Functions")
|
15 |
|
16 |
|
|
|
22 |
|
23 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/pmf/), by Stanford professor Chris Piech._
|
24 |
|
25 |
+
PMFs are really important in discrete probability. They tell us how likely each possible outcome is for a discrete random variable.
|
26 |
|
27 |
+
What's interesting about PMFs is that they can be represented in multiple ways - equations, graphs, or even empirical data. The core idea is simple: they map each possible value to its probability.
|
28 |
"""
|
29 |
)
|
30 |
return
|
|
|
36 |
r"""
|
37 |
## Properties of a PMF
|
38 |
|
39 |
+
For a function $p_X(x)$ to be a valid PMF:
|
40 |
|
41 |
+
1. **Non-negativity**: probability can't be negative, so $p_X(x) \geq 0$ for all $x$
|
42 |
+
2. **Unit total probability**: all probabilities sum to 1, i.e., $\sum_x p_X(x) = 1$
|
43 |
|
44 |
+
The second property makes intuitive sense - a random variable must take some value, so the probabilities of all its possible values have to add up to 1 (that is, 100%).
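To make this concrete, here's a minimal standalone check (not one of the notebook's cells) that verifies both properties for a fair six-sided die, where $p_X(x) = \frac{1}{6}$ for $x \in \{1, \dots, 6\}$:

```python
# Hypothetical sketch: validate the two PMF properties for a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())      # 1. non-negativity
assert abs(sum(pmf.values()) - 1.0) < 1e-9    # 2. probabilities sum to 1
```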
|
|
|
|
|
|
|
|
|
|
|
|
|
45 |
"""
|
46 |
)
|
47 |
return
|
|
|
119 |
def _(mo):
|
120 |
mo.md(
|
121 |
r"""
|
122 |
+
These graphs really show us how likely each value is when we roll the dice.
|
123 |
|
124 |
+
Looking at the right graph, when we see "6" on the $x$-axis with probability $\frac{5}{36}$ on the $y$-axis, that's telling us there's a $\frac{5}{36}$ chance of rolling a sum of 6 with two dice. More formally: $P(Y = 6) = \frac{5}{36}$.
|
125 |
|
126 |
+
Similarly, the value "2" has probability $\frac{1}{36}$ - that's because there's only one way to get a sum of 2 (rolling 1 on both dice). You'll also notice there's no value for "1", since you can't get a sum of 1 with two dice - the minimum possible sum is 2.
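Those probabilities come straight from counting outcomes. Here's a short standalone sketch (not one of the notebook's cells) that enumerates all 36 equally likely rolls of two fair dice and recovers $P(Y=6)=\frac{5}{36}$ and $P(Y=2)=\frac{1}{36}$:

```python
from fractions import Fraction
from itertools import product

# Enumerate every equally likely (die 1, die 2) outcome and tally the sums.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

pmf = {total: Fraction(count, 36) for total, count in counts.items()}
print(pmf[6])  # 5/36
print(pmf[2])  # 1/36
```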
|
127 |
"""
|
128 |
)
|
129 |
return
|
|
|
214 |
r"""
|
215 |
## Data to Histograms to Probability Mass Functions
|
216 |
|
217 |
+
Here's something I find interesting: one way to represent a probability mass function is just through raw data. Instead of a mathematical formula, we can approximate a PMF by collecting data points. Let's see this in action by simulating lots of dice rolls and building an empirical PMF:
|
218 |
"""
|
219 |
)
|
220 |
return
|
|
|
317 |
def _(mo):
|
318 |
mo.md(
|
319 |
r"""
|
320 |
+
When we normalize a histogram (divide each count by the total sample size), we get a pretty good approximation of the true PMF. It's a simple yet powerful idea - count how many times each value appears, then divide by the total number of trials.
|
321 |
|
322 |
+
Let's make this concrete. Say we want to estimate $P(Y=3)$ - the probability of rolling a sum of 3 with two dice. We just count how many 3's show up in our simulated rolls and divide by the total number of rolls:
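A standalone sketch of that estimate (not the notebook's own cell; the seed and the number of rolls are arbitrary choices here):

```python
import numpy as np

rng = np.random.default_rng(42)                  # arbitrary seed for reproducibility
rolls = rng.integers(1, 7, size=(100_000, 2))    # 100,000 simulated rolls of two dice
sums = rolls.sum(axis=1)

p_3_hat = (sums == 3).mean()   # count of 3s divided by the total number of rolls
print(p_3_hat)                 # close to the true value 2/36 ≈ 0.056
```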
|
323 |
"""
|
324 |
)
|
325 |
return
|
probability/11_expectation.py
CHANGED
@@ -10,7 +10,7 @@
|
|
10 |
|
11 |
import marimo
|
12 |
|
13 |
-
__generated_with = "0.
|
14 |
app = marimo.App(width="medium", app_title="Expectation")
|
15 |
|
16 |
|
@@ -22,9 +22,9 @@ def _(mo):
|
|
22 |
|
23 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/expectation/), by Stanford professor Chris Piech._
|
24 |
|
25 |
-
|
26 |
|
27 |
-
|
28 |
"""
|
29 |
)
|
30 |
return
|
@@ -36,11 +36,11 @@ def _(mo):
|
|
36 |
r"""
|
37 |
## Definition of Expectation
|
38 |
|
39 |
-
|
40 |
|
41 |
$$E[X] = \sum_x x \cdot P(X=x)$$
|
42 |
|
43 |
-
|
44 |
"""
|
45 |
)
|
46 |
return
|
|
|
10 |
|
11 |
import marimo
|
12 |
|
13 |
+
__generated_with = "0.12.6"
|
14 |
app = marimo.App(width="medium", app_title="Expectation")
|
15 |
|
16 |
|
|
|
22 |
|
23 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/expectation/), by Stanford professor Chris Piech._
|
24 |
|
25 |
+
Expectations are fascinating: they represent the "center of mass" of a probability distribution. While they're often called "expected values" or "averages," they don't always match our intuition about what's "expected" to happen.
|
26 |
|
27 |
+
For me, the most interesting part about expectations is how they quantify what happens "on average" in the long run, even if that average isn't a possible outcome (like expecting 3.5 on a standard die roll).
|
28 |
"""
|
29 |
)
|
30 |
return
|
|
|
36 |
r"""
|
37 |
## Definition of Expectation
|
38 |
|
39 |
+
Expectation (written as $E[X]$) is basically the "average outcome" of a random variable, but with a twist - we weight each possible value by how likely it is to occur. I like to think of it as the "center of gravity" for probability.
|
40 |
|
41 |
$$E[X] = \sum_x x \cdot P(X=x)$$
|
42 |
|
43 |
+
People call this concept by different names - mean, weighted average, center of mass, or 1st moment if you're being fancy. They're all calculated the same way, though: multiply each value by its probability, then add everything up.
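As a quick illustration (a hypothetical snippet, not one of the notebook's cells), here's that recipe applied to a fair six-sided die:

```python
from fractions import Fraction

# Multiply each value by its probability, then add everything up.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expectation = sum(x * p for x, p in pmf.items())
print(expectation)   # 7/2 = 3.5, which isn't itself a value the die can show
```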
|
44 |
"""
|
45 |
)
|
46 |
return
|
probability/13_bernoulli_distribution.py
CHANGED
@@ -10,7 +10,7 @@
|
|
10 |
|
11 |
import marimo
|
12 |
|
13 |
-
__generated_with = "0.
|
14 |
app = marimo.App(width="medium", app_title="Bernoulli Distribution")
|
15 |
|
16 |
|
@@ -20,15 +20,15 @@ def _(mo):
|
|
20 |
r"""
|
21 |
# Bernoulli Distribution
|
22 |
|
23 |
-
|
24 |
|
25 |
## Parametric Random Variables
|
26 |
|
27 |
-
|
28 |
|
29 |
-
|
30 |
|
31 |
-
|
32 |
"""
|
33 |
)
|
34 |
return
|
@@ -40,18 +40,16 @@ def _(mo):
|
|
40 |
r"""
|
41 |
## Bernoulli Random Variables
|
42 |
|
43 |
-
A
|
44 |
|
45 |
-
|
46 |
|
47 |
-
|
|
|
|
|
|
|
48 |
|
49 |
-
|
50 |
-
- A random binary digit
|
51 |
-
- Whether a disk drive crashed
|
52 |
-
- Whether someone likes a Netflix movie
|
53 |
-
|
54 |
-
Here $p$ is the parameter, but different instances of Bernoulli random variables might have different values of $p$.
|
55 |
"""
|
56 |
)
|
57 |
return
|
@@ -167,9 +165,11 @@ def _(expected_value, p_slider, plt, probabilities, values, variance):
|
|
167 |
def _(mo):
|
168 |
mo.md(
|
169 |
r"""
|
170 |
-
##
|
|
|
|
|
171 |
|
172 |
-
|
173 |
|
174 |
\begin{align}
|
175 |
E[X] &= \sum_x x \cdot P(X=x) && \text{Definition of expectation} \\
|
@@ -178,11 +178,7 @@ def _(mo):
|
|
178 |
&= p && \text{Remove the 0 term}
|
179 |
\end{align}
|
180 |
|
181 |
-
|
182 |
-
|
183 |
-
If $X$ is a Bernoulli with parameter $p$, $X \sim \text{Bern}(p)$:
|
184 |
-
|
185 |
-
To compute variance, first compute $E[X^2]$:
|
186 |
|
187 |
\begin{align}
|
188 |
E[X^2]
|
@@ -206,18 +202,16 @@ def _(mo):
|
|
206 |
def _(mo):
|
207 |
mo.md(
|
208 |
r"""
|
209 |
-
## Indicator Random
|
210 |
-
|
211 |
-
> **Definition**: An indicator variable is a Bernoulli random variable which takes on the value 1 if an **underlying event occurs**, and 0 _otherwise_.
|
212 |
|
213 |
-
Indicator
|
214 |
|
215 |
-
|
216 |
|
217 |
-
|
218 |
|
219 |
-
- $P(I=1)=P(A)$
|
220 |
-
- $E[I]=P(A)$
|
221 |
"""
|
222 |
)
|
223 |
return
|
|
|
10 |
|
11 |
import marimo
|
12 |
|
13 |
+
__generated_with = "0.12.6"
|
14 |
app = marimo.App(width="medium", app_title="Bernoulli Distribution")
|
15 |
|
16 |
|
|
|
20 |
r"""
|
21 |
# Bernoulli Distribution
|
22 |
|
23 |
+
> _Note:_ This notebook builds on concepts from ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/bernoulli/) by Chris Piech.
|
24 |
|
25 |
## Parametric Random Variables
|
26 |
|
27 |
+
Probability has a bunch of classic random variable patterns that show up over and over. Let's explore some of the most important parametric discrete distributions.
|
28 |
|
29 |
+
Bernoulli is honestly the simplest distribution you'll ever see, but it's ridiculously powerful in practice. What makes it fascinating to me is how it captures any yes/no scenario: success/failure, heads/tails, 1/0.
|
30 |
|
31 |
+
I think of these distributions as the atoms of probability — they're the fundamental building blocks that everything else is made from.
|
32 |
"""
|
33 |
)
|
34 |
return
|
|
|
40 |
r"""
|
41 |
## Bernoulli Random Variables
|
42 |
|
43 |
+
A Bernoulli random variable boils down to just two possible values: 1 (success) or 0 (failure). Dead simple, but incredibly useful.
|
44 |
|
45 |
+
Some everyday examples where I see these:
|
46 |
|
47 |
+
- Coin flip (heads=1, tails=0)
|
48 |
+
- Whether that sketchy email is spam
|
49 |
+
- If someone actually clicks my ad
|
50 |
+
- Whether my code compiles first try (almost always 0 for me)
|
51 |
|
52 |
+
All you need to write one down (the classic parametric form) is a single parameter $p$ - the probability of success.
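For intuition, here's a minimal simulation sketch (not one of the notebook's cells; the value $p = 0.3$ and the seed are arbitrary) showing that the fraction of successes settles near $p$:

```python
import numpy as np

p = 0.3                                            # illustrative success probability
rng = np.random.default_rng(0)
samples = (rng.random(100_000) < p).astype(int)    # each trial is 1 with probability p, else 0

print(samples.mean())   # ≈ 0.3 -- the long-run fraction of successes approaches p
```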
|
|
|
|
|
|
|
|
|
|
|
53 |
"""
|
54 |
)
|
55 |
return
|
|
|
165 |
def _(mo):
|
166 |
mo.md(
|
167 |
r"""
|
168 |
+
## Expectation and Variance of a Bernoulli
|
169 |
+
|
170 |
+
> _Note:_ The following derivations are included as reference material. The credit for these mathematical formulations belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/bernoulli/) by Chris Piech.
|
171 |
|
172 |
+
Let's work through why $E[X] = p$ for a Bernoulli:
|
173 |
|
174 |
\begin{align}
|
175 |
E[X] &= \sum_x x \cdot P(X=x) && \text{Definition of expectation} \\
|
|
|
178 |
&= p && \text{Remove the 0 term}
|
179 |
\end{align}
|
180 |
|
181 |
+
And for variance, we first need $E[X^2]$:
|
|
|
|
|
|
|
|
|
182 |
|
183 |
\begin{align}
|
184 |
E[X^2]
|
|
|
202 |
def _(mo):
|
203 |
mo.md(
|
204 |
r"""
|
205 |
+
## Indicator Random Variables
|
|
|
|
|
206 |
|
207 |
+
Indicator variables are a clever trick I like to use — they turn events into numbers. Instead of dealing with "did the event happen?" (yes/no), we get "1" if it happened and "0" if it didn't.
|
208 |
|
209 |
+
Formally: an indicator variable $I$ for event $A$ equals 1 when $A$ occurs and 0 otherwise. These are just Bernoulli variables where $p = P(A)$. People often use notation like $I_A$ to name them.
|
210 |
|
211 |
+
Two key properties that make them super useful:
|
212 |
|
213 |
+
- $P(I=1)=P(A)$ - probability of getting a 1 is just the probability of the event
|
214 |
+
- $E[I]=P(A)$ - the expected value equals the probability (this one's a game-changer!)
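A quick numeric check of that last property (a hypothetical sketch, not one of the notebook's cells; the event and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)

# Indicator for the event A = "the roll is even": 1 when A occurs, 0 otherwise.
indicator = (rolls % 2 == 0).astype(int)

print(indicator.mean())   # ≈ 0.5 = P(A), illustrating E[I] = P(A)
```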
|
215 |
"""
|
216 |
)
|
217 |
return
|
probability/14_binomial_distribution.py
CHANGED
@@ -13,7 +13,7 @@
|
|
13 |
|
14 |
import marimo
|
15 |
|
16 |
-
__generated_with = "0.
|
17 |
app = marimo.App(width="medium", app_title="Binomial Distribution")
|
18 |
|
19 |
|
@@ -25,11 +25,9 @@ def _(mo):
|
|
25 |
|
26 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/binomial/), by Stanford professor Chris Piech._
|
27 |
|
28 |
-
|
29 |
|
30 |
-
|
31 |
-
|
32 |
-
This situation is truly common in the natural world, and as such, there has been a lot of research into such phenomena. Random variables like $X$ are called **binomial random variables**. If you can identify that a process fits this description, you can inherit many already proved properties such as the PMF formula, expectation, and variance!
|
33 |
"""
|
34 |
)
|
35 |
return
|
@@ -197,11 +195,11 @@ def _(mo):
|
|
197 |
r"""
|
198 |
## Relationship to Bernoulli Random Variables
|
199 |
|
200 |
-
One way to think
|
201 |
|
202 |
$$X = \sum_{i=1}^n Y_i$$
|
203 |
|
204 |
-
|
205 |
"""
|
206 |
)
|
207 |
return
|
|
|
13 |
|
14 |
import marimo
|
15 |
|
16 |
+
__generated_with = "0.12.6"
|
17 |
app = marimo.App(width="medium", app_title="Binomial Distribution")
|
18 |
|
19 |
|
|
|
25 |
|
26 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/binomial/), by Stanford professor Chris Piech._
|
27 |
|
28 |
+
The binomial distribution is essentially what happens when you run multiple Bernoulli trials and count the successes. I love this distribution because it appears everywhere in practical scenarios.
|
29 |
|
30 |
+
Think about it: whenever you're counting how many times something happens across a fixed number of independent attempts, each with the same success probability, you're likely dealing with a binomial. Website conversions, A/B testing results, even counting heads in multiple coin flips: all binomial!
|
|
|
|
|
31 |
"""
|
32 |
)
|
33 |
return
|
|
|
195 |
r"""
|
196 |
## Relationship to Bernoulli Random Variables
|
197 |
|
198 |
+
One way I like to think about the binomial: it's just adding up a bunch of Bernoullis. If each $Y_i$ is a Bernoulli that tells us if the $i$-th trial succeeded, then:
|
199 |
|
200 |
$$X = \sum_{i=1}^n Y_i$$
|
201 |
|
202 |
+
This makes the distribution really intuitive to me - we're just counting 1s across our $n$ experiments.
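A small simulation sketch (not one of the notebook's cells; $n$, $p$, and the seed are arbitrary) showing that summing Bernoullis behaves just like sampling a binomial directly:

```python
import numpy as np

n, p = 20, 0.3                     # illustrative parameters
rng = np.random.default_rng(2)

# Sum n Bernoulli(p) indicators per experiment...
bernoulli_sums = (rng.random((50_000, n)) < p).sum(axis=1)
# ...and compare with draws taken directly from Binomial(n, p).
binomial_draws = rng.binomial(n, p, size=50_000)

print(bernoulli_sums.mean(), binomial_draws.mean())   # both ≈ n * p = 6
```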
|
203 |
"""
|
204 |
)
|
205 |
return
|
probability/15_poisson_distribution.py
CHANGED
@@ -13,7 +13,7 @@
|
|
13 |
|
14 |
import marimo
|
15 |
|
16 |
-
__generated_with = "0.
|
17 |
app = marimo.App(width="medium", app_title="Poisson Distribution")
|
18 |
|
19 |
|
@@ -25,7 +25,9 @@ def _(mo):
|
|
25 |
|
26 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/), by Stanford professor Chris Piech._
|
27 |
|
28 |
-
|
|
|
|
|
29 |
"""
|
30 |
)
|
31 |
return
|
@@ -180,11 +182,11 @@ def _(mo):
|
|
180 |
r"""
|
181 |
## Poisson Intuition: Relation to Binomial Distribution
|
182 |
|
183 |
-
The Poisson distribution can be derived as a limiting case of the
|
184 |
|
185 |
-
Let's work
|
186 |
|
187 |
-
We could
|
188 |
|
189 |
Let's visualize this concept:
|
190 |
"""
|
@@ -231,7 +233,7 @@ def _(fig_to_image, mo, plt):
|
|
231 |
_explanation = mo.md(
|
232 |
r"""
|
233 |
In this visualization:
|
234 |
-
|
235 |
- Each rectangle represents a 1-second interval
|
236 |
- Blue rectangles indicate intervals where an event occurred
|
237 |
- Red dots show the actual event times (2.75s and 7.12s)
|
@@ -247,9 +249,9 @@ def _(fig_to_image, mo, plt):
|
|
247 |
def _(mo):
|
248 |
mo.md(
|
249 |
r"""
|
250 |
-
The total number of requests received over the minute can be approximated as the sum of
|
251 |
|
252 |
-
|
253 |
|
254 |
\begin{align}
|
255 |
\lambda &= E[X] && \text{Expectation matches historical average} \\
|
@@ -257,7 +259,7 @@ def _(mo):
|
|
257 |
p &= \frac{\lambda}{n} && \text{Solving for $p$}
|
258 |
\end{align}
|
259 |
|
260 |
-
|
261 |
|
262 |
$P(X = x) = {n \choose x} p^x (1-p)^{n-x}$
|
263 |
|
@@ -269,7 +271,7 @@ def _(mo):
|
|
269 |
P(X=3) &= {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389
|
270 |
\end{align}
|
271 |
|
272 |
-
This
|
273 |
"""
|
274 |
)
|
275 |
return
|
@@ -283,7 +285,7 @@ def _(fig_to_image, mo, plt):
|
|
283 |
|
284 |
# Example events at 2.75s and 7.12s (convert to deciseconds)
|
285 |
events = [27.5, 71.2]
|
286 |
-
|
287 |
for i in range(100):
|
288 |
color = 'royalblue' if any(i <= event_val < i + 1 for event_val in events) else 'lightgray'
|
289 |
ax.add_patch(plt.Rectangle((i, 0), 0.9, 1, color=color))
|
@@ -434,21 +436,23 @@ def _(df, fig, fig_to_image, mo, n, p):
|
|
434 |
def _(mo):
|
435 |
mo.md(
|
436 |
r"""
|
437 |
-
As
|
438 |
|
439 |
- The number of trials $n$ approaches infinity
|
440 |
- The probability of success $p$ approaches zero
|
441 |
- The product $np = \lambda$ remains constant
|
442 |
|
443 |
-
This
|
444 |
|
445 |
## Derivation of the Poisson PMF
|
446 |
|
447 |
-
|
|
|
|
|
448 |
|
449 |
$P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}$
|
450 |
|
451 |
-
|
452 |
|
453 |
\begin{align}
|
454 |
P(X=x)
|
@@ -495,7 +499,7 @@ def _(mo):
|
|
495 |
&& \text{Simplifying}\\
|
496 |
\end{align}
|
497 |
|
498 |
-
This gives us
|
499 |
"""
|
500 |
)
|
501 |
return
|
|
|
13 |
|
14 |
import marimo
|
15 |
|
16 |
+
__generated_with = "0.12.6"
|
17 |
app = marimo.App(width="medium", app_title="Poisson Distribution")
|
18 |
|
19 |
|
|
|
25 |
|
26 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/), by Stanford professor Chris Piech._
|
27 |
|
28 |
+
The Poisson distribution is my go-to for modeling random events occurring over time or space. What makes it cool is that it only needs a single parameter λ (lambda), which represents both the mean and variance.
|
29 |
+
|
30 |
+
I find it particularly useful when events happen rarely but the opportunities for them to occur are numerous — like modeling website visits, dust/particle emissions or even typos in a document.
|
31 |
"""
|
32 |
)
|
33 |
return
|
|
|
182 |
r"""
|
183 |
## Poisson Intuition: Relation to Binomial Distribution
|
184 |
|
185 |
+
The Poisson distribution can be derived as a limiting case of the binomial distribution. I find this connection fascinating because it shows how seemingly different distributions are actually related.
|
186 |
|
187 |
+
Let's work through a practical example: predicting ride-sharing requests in a specific area over a one-minute interval. From historical data, we know that the average number of requests per minute is $\lambda = 5$.
|
188 |
|
189 |
+
We could model this using a binomial distribution by dividing our minute into smaller intervals. For example, splitting a minute into 60 seconds, where each second is a Bernoulli trial — either a request arrives (success) or it doesn't (failure).
|
190 |
|
191 |
Let's visualize this concept:
|
192 |
"""
|
|
|
233 |
_explanation = mo.md(
|
234 |
r"""
|
235 |
In this visualization:
|
236 |
+
|
237 |
- Each rectangle represents a 1-second interval
|
238 |
- Blue rectangles indicate intervals where an event occurred
|
239 |
- Red dots show the actual event times (2.75s and 7.12s)
|
|
|
249 |
def _(mo):
|
250 |
mo.md(
|
251 |
r"""
|
252 |
+
The total number of requests received over the minute can be approximated as the sum of sixty indicator variables, which aligns perfectly with the binomial distribution — a sum of Bernoullis.
|
253 |
|
254 |
+
If we define $X$ as the number of requests in a minute, $X$ follows a binomial with $n=60$ trials. To determine the success probability $p$, we need to match the expected value with our historical average $\lambda$:
|
255 |
|
256 |
\begin{align}
|
257 |
\lambda &= E[X] && \text{Expectation matches historical average} \\
|
|
|
259 |
p &= \frac{\lambda}{n} && \text{Solving for $p$}
|
260 |
\end{align}
|
261 |
|
262 |
+
With $\lambda=5$ and $n=60$, we get $p=\frac{5}{60}=\frac{1}{12}$, so $X \sim \text{Bin}(n=60, p=\frac{5}{60})$. Using the binomial PMF:
|
263 |
|
264 |
$P(X = x) = {n \choose x} p^x (1-p)^{n-x}$
|
265 |
|
|
|
271 |
P(X=3) &= {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389
|
272 |
\end{align}
|
273 |
|
274 |
+
This approximation works well, but it doesn't account for multiple events occurring in a single second. To address this limitation, we can use even finer intervals — perhaps 600 deciseconds (tenths of a second):
|
275 |
"""
|
276 |
)
|
277 |
return
|
|
|
285 |
|
286 |
# Example events at 2.75s and 7.12s (convert to deciseconds)
|
287 |
events = [27.5, 71.2]
|
288 |
+
|
289 |
for i in range(100):
|
290 |
color = 'royalblue' if any(i <= event_val < i + 1 for event_val in events) else 'lightgray'
|
291 |
ax.add_patch(plt.Rectangle((i, 0), 0.9, 1, color=color))
|
|
|
436 |
def _(mo):
|
437 |
mo.md(
|
438 |
r"""
|
439 |
+
As our interactive comparison demonstrates, the binomial distribution converges to the Poisson distribution as we increase the number of intervals! This remarkable relationship exists because the Poisson distribution is actually the limiting case of the binomial when:
|
440 |
|
441 |
- The number of trials $n$ approaches infinity
|
442 |
- The probability of success $p$ approaches zero
|
443 |
- The product $np = \lambda$ remains constant
|
444 |
|
445 |
+
This elegance is why I find the Poisson distribution so powerful — it simplifies what would otherwise be a cumbersome binomial with numerous trials and tiny success probabilities.
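Assuming `scipy` is available (it may not be among this notebook's declared dependencies), a short sketch makes the convergence concrete for the ride-sharing example with $\lambda = 5$ and $x = 3$:

```python
from scipy import stats

lam, x = 5, 3

# Binomial(n, lam/n) with finer and finer sub-intervals...
for n in (60, 600, 6000):
    print(n, stats.binom.pmf(x, n, lam / n))

# ...approaches the Poisson(lam) probability in the limit.
print(stats.poisson.pmf(x, lam))   # ≈ 0.1404
```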
|
446 |
|
447 |
## Derivation of the Poisson PMF
|
448 |
|
449 |
+
> _Note:_ The following mathematical derivation is included as reference material. The credit for this formulation belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/) by Chris Piech.
|
450 |
+
|
451 |
+
The Poisson PMF can be derived by taking the limit of the binomial PMF as $n \to \infty$:
|
452 |
|
453 |
$P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}$
|
454 |
|
455 |
+
Through a series of algebraic manipulations:
|
456 |
|
457 |
\begin{align}
|
458 |
P(X=x)
|
|
|
499 |
&& \text{Simplifying}\\
|
500 |
\end{align}
|
501 |
|
502 |
+
This gives us the elegant Poisson PMF formula: $P(X=x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
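A tiny sanity check of the formula (a hypothetical snippet, not one of the notebook's cells), evaluating it directly and confirming it behaves like a PMF:

```python
import math

lam = 5   # illustrative rate

def poisson_pmf(x: int) -> float:
    """Direct evaluation of P(X = x) = lam^x * e^(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

print(poisson_pmf(3))                          # ≈ 0.1404
print(sum(poisson_pmf(x) for x in range(60)))  # ≈ 1.0, as any valid PMF must
```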
|
503 |
"""
|
504 |
)
|
505 |
return
|
probability/16_continuous_distribution.py
CHANGED
@@ -14,7 +14,7 @@
|
|
14 |
|
15 |
import marimo
|
16 |
|
17 |
-
__generated_with = "0.
|
18 |
app = marimo.App(width="medium")
|
19 |
|
20 |
|
@@ -26,7 +26,9 @@ def _(mo):
|
|
26 |
|
27 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/continuous/), by Stanford professor Chris Piech._
|
28 |
|
29 |
-
|
|
|
|
|
30 |
"""
|
31 |
)
|
32 |
return
|
@@ -38,20 +40,17 @@ def _(mo):
|
|
38 |
r"""
|
39 |
## From Discrete to Continuous
|
40 |
|
41 |
-
|
42 |
-
|
43 |
-
> Imagine you're running to catch a bus. You know you'll arrive at 2:15pm, but you don't know exactly when the bus will arrive. You want to model the bus arrival time (in minutes past 2pm) as a random variable $T$ so you can calculate the probability that you'll wait more than five minutes: $P(15 < T < 20)$.
|
44 |
|
45 |
-
|
46 |
|
47 |
-
|
48 |
|
|
|
49 |
- What's the probability the bus arrives at exactly 2:17pm and 12.12333911102389234 seconds?
|
50 |
-
- What's the probability
|
51 |
|
52 |
-
These questions
|
53 |
-
|
54 |
-
### Visualizing the Transition
|
55 |
|
56 |
Let's visualize this transition from discrete to continuous:
|
57 |
"""
|
@@ -150,44 +149,43 @@ def _(mo):
|
|
150 |
r"""
|
151 |
## Probability Density Functions
|
152 |
|
153 |
-
|
154 |
|
155 |
-
|
156 |
|
157 |
$$f(X=x) \quad \text{or simply} \quad f(x)$$
|
158 |
|
159 |
-
Where the lowercase $x$
|
160 |
|
161 |
### Key Properties of PDFs
|
162 |
|
163 |
-
|
164 |
|
165 |
-
1. The probability that $X$
|
166 |
|
167 |
$$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$
|
168 |
|
169 |
-
2.
|
170 |
|
171 |
$$f(x) \geq 0 \text{ for all } x$$
|
172 |
|
173 |
-
3.
|
174 |
|
175 |
$$\int_{-\infty}^{\infty} f(x) \, dx = 1$$
|
176 |
|
177 |
-
4. The probability
|
178 |
|
179 |
$$P(X = a) = \int_a^a f(x) \, dx = 0$$
|
180 |
|
181 |
-
This last property
|
182 |
-
|
183 |
-
### Caution: Density ≠ Probability
|
184 |
|
185 |
-
|
186 |
|
187 |
-
|
188 |
|
189 |
-
|
190 |
-
|
|
|
191 |
"""
|
192 |
)
|
193 |
return
|
@@ -665,16 +663,18 @@ def _(fig_to_image, mo, np, plt, sympy):
|
|
665 |
# Detailed calculations for our example
|
666 |
_calculations = mo.md(
|
667 |
f"""
|
668 |
-
###
|
669 |
|
670 |
-
|
|
|
|
|
671 |
|
672 |
$$f(x) = \\begin{{cases}}
|
673 |
\\frac{{3}}{{8}}(4x - 2x^2) & \\text{{when }} 0 < x < 2 \\\\
|
674 |
0 & \\text{{otherwise}}
|
675 |
\\end{{cases}}$$
|
676 |
|
677 |
-
#### Expectation
|
678 |
|
679 |
$$E[X] = \\int_{{-\\infty}}^{{\\infty}} x \\cdot f(x) \\, dx = \\int_0^2 x \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
|
680 |
|
@@ -684,9 +684,9 @@ def _(fig_to_image, mo, np, plt, sympy):
|
|
684 |
|
685 |
$$E[X] = \\frac{{3}}{{8}} \\cdot \\frac{{32 - 24}}{{3}} = \\frac{{3}}{{8}} \\cdot \\frac{{8}}{{3}} = {E_X}$$
|
686 |
|
687 |
-
#### Variance
|
688 |
|
689 |
-
|
690 |
|
691 |
$$E[X^2] = \\int_{{-\\infty}}^{{\\infty}} x^2 \\cdot f(x) \\, dx = \\int_0^2 x^2 \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
|
692 |
|
@@ -696,11 +696,11 @@ def _(fig_to_image, mo, np, plt, sympy):
|
|
696 |
|
697 |
$$E[X^2] = \\frac{{3}}{{8}} \\cdot \\left(16 - \\frac{{64}}{{5}}\\right) = \\frac{{3}}{{8}} \\cdot \\frac{{16}}{{5}} = {E_X2}$$
|
698 |
|
699 |
-
Now we
|
700 |
|
701 |
$$\\text{{Var}}(X) = E[X^2] - (E[X])^2 = {E_X2} - ({E_X})^2 = {Var_X}$$
|
702 |
|
703 |
-
|
704 |
"""
|
705 |
)
|
706 |
mo.vstack([_img, _calculations])
|
@@ -765,11 +765,11 @@ def _(mo):
|
|
765 |
|
766 |
Some key points to remember:
|
767 |
|
768 |
-
|
769 |
-
|
770 |
-
|
771 |
-
|
772 |
-
|
773 |
|
774 |
This foundation will serve you well as we explore specific continuous distributions like normal, exponential, and beta in future notebooks. These distributions are the workhorses of probability theory and statistics, appearing everywhere from quality control to financial modeling.
|
775 |
|
@@ -779,7 +779,7 @@ def _(mo):
|
|
779 |
return
|
780 |
|
781 |
|
782 |
-
@app.cell
|
783 |
def _(mo):
|
784 |
mo.md(r"""Appendix code (helper functions, variables, etc.):""")
|
785 |
return
|
@@ -971,7 +971,6 @@ def _(np, plt, sympy):
|
|
971 |
1. Total probability: ∫₀² {C}(4x - 2x²) dx = {total_prob}
|
972 |
2. P(X > 1): ∫₁² {C}(4x - 2x²) dx = {prob_gt_1}
|
973 |
"""
|
974 |
-
|
975 |
return create_example_pdf_visualization, symbolic_calculation
|
976 |
|
977 |
|
|
|
14 |
|
15 |
import marimo
|
16 |
|
17 |
+
__generated_with = "0.12.6"
|
18 |
app = marimo.App(width="medium")
|
19 |
|
20 |
|
|
|
26 |
|
27 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/continuous/), by Stanford professor Chris Piech._
|
28 |
|
29 |
+
Continuous distributions are what we need when dealing with random variables that can take any value in a range, rather than just discrete values.
|
30 |
+
|
31 |
+
The key difference here is that we work with probability density functions (PDFs) instead of probability mass functions (PMFs). It took me a while to really get this - the PDF at a point isn't actually a probability, but rather a density.
|
32 |
"""
|
33 |
)
|
34 |
return
|
|
|
40 |
r"""
|
41 |
## From Discrete to Continuous
|
42 |
|
43 |
+
Making the jump from discrete to continuous random variables requires a fundamental shift in thinking. Let me walk you through a thought experiment:
|
|
|
|
|
44 |
|
45 |
+
> You're rushing to catch a bus. You know you'll arrive at 2:15pm, but the bus arrival time is uncertain. If you model the bus arrival time (in minutes past 2pm) as a random variable $T$, how would you calculate the probability of waiting more than five minutes: $P(15 < T < 20)$?
|
46 |
|
47 |
+
This highlights a crucial difference from discrete distributions. With discrete distributions, we calculated probabilities for exact values, but this approach breaks down with continuous values like time.
|
48 |
|
49 |
+
Consider these questions:
|
50 |
- What's the probability the bus arrives at exactly 2:17pm and 12.12333911102389234 seconds?
|
51 |
+
- What's the probability a newborn weighs exactly 3.523112342234 kilograms?
|
52 |
|
53 |
+
These questions have no meaningful answers because continuous measurements can have infinite precision. In the continuous world, the probability of a random variable taking any specific exact value is actually zero!
|
|
|
|
|
54 |
|
55 |
Let's visualize this transition from discrete to continuous:
|
56 |
"""
|
|
|
149 |
r"""
|
150 |
## Probability Density Functions
|
151 |
|
152 |
+
While discrete random variables use Probability Mass Functions (PMFs), continuous random variables require a different approach — Probability Density Functions (PDFs).
|
153 |
|
154 |
+
A PDF defines the relative likelihood of a continuous random variable taking particular values. We typically denote this with $f$ and write it as:
|
155 |
|
156 |
$$f(X=x) \quad \text{or simply} \quad f(x)$$
|
157 |
|
158 |
+
Where the lowercase $x$ represents a specific value our random variable $X$ might take.
|
159 |
|
160 |
### Key Properties of PDFs
|
161 |
|
162 |
+
For a PDF $f(x)$ to be valid, it must satisfy these properties:
|
163 |
|
164 |
+
1. The probability that $X$ falls within interval $[a, b]$ is:
|
165 |
|
166 |
$$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$
|
167 |
|
168 |
+
2. Non-negativity — the PDF can't be negative:
|
169 |
|
170 |
$$f(x) \geq 0 \text{ for all } x$$
|
171 |
|
172 |
+
3. Total probability equals 1:
|
173 |
|
174 |
$$\int_{-\infty}^{\infty} f(x) \, dx = 1$$
|
175 |
|
176 |
+
4. The probability of any exact value is zero:
|
177 |
|
178 |
$$P(X = a) = \int_a^a f(x) \, dx = 0$$
|
179 |
|
180 |
+
This last property reveals a fundamental difference from discrete distributions — with continuous random variables, probabilities only make sense for ranges, not specific points.
|
|
|
|
|
181 |
|
182 |
+
### Important Distinction: Density ≠ Probability
|
183 |
|
184 |
+
One common mistake is interpreting $f(x)$ as a probability. It's actually a **density** — representing probability per unit of $x$. This is why $f(x)$ values can exceed 1, provided the total area under the curve equals 1.
|
185 |
|
186 |
+
The true meaning of $f(x)$ emerges only when:
|
187 |
+
1. We integrate over a range to obtain an actual probability, or
|
188 |
+
2. We compare densities at different points to understand relative likelihoods.
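To see the "density can exceed 1" point in action, here's a minimal sketch (not one of the notebook's cells) using a Uniform(0, 0.5) density, whose value is 2 everywhere on its support; `scipy.integrate` is assumed to be available:

```python
from scipy import integrate

def f(x):
    """Density of Uniform(0, 0.5): constant 2 on the support [0, 0.5]."""
    return 2.0

total_prob, _ = integrate.quad(f, 0.0, 0.5)      # area over the whole support
interval_prob, _ = integrate.quad(f, 0.1, 0.3)   # P(0.1 <= X <= 0.3)

print(total_prob)      # 1.0 -- the density is 2 everywhere, yet total probability is 1
print(interval_prob)   # 0.4
```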
|
189 |
"""
|
190 |
)
|
191 |
return
|
|
|
663 |
# Detailed calculations for our example
|
664 |
_calculations = mo.md(
|
665 |
f"""
|
666 |
+
### Computing Expectation and Variance
|
667 |
|
668 |
+
> _Note:_ The following mathematical derivation is included as reference material. The credit for this approach belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/continuous/) by Chris Piech.
|
669 |
+
|
670 |
+
Let's work through the calculations for our PDF:
|
671 |
|
672 |
$$f(x) = \\begin{{cases}}
|
673 |
\\frac{{3}}{{8}}(4x - 2x^2) & \\text{{when }} 0 < x < 2 \\\\
|
674 |
0 & \\text{{otherwise}}
|
675 |
\\end{{cases}}$$
|
676 |
|
677 |
+
#### Finding the Expectation
|
678 |
|
679 |
$$E[X] = \\int_{{-\\infty}}^{{\\infty}} x \\cdot f(x) \\, dx = \\int_0^2 x \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
|
680 |
|
|
|
684 |
|
685 |
$$E[X] = \\frac{{3}}{{8}} \\cdot \\frac{{32 - 24}}{{3}} = \\frac{{3}}{{8}} \\cdot \\frac{{8}}{{3}} = {E_X}$$
|
686 |
|
687 |
+
#### Computing the Variance
|
688 |
|
689 |
+
We first need $E[X^2]$:
|
690 |
|
691 |
$$E[X^2] = \\int_{{-\\infty}}^{{\\infty}} x^2 \\cdot f(x) \\, dx = \\int_0^2 x^2 \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
|
692 |
|
|
|
696 |
|
697 |
$$E[X^2] = \\frac{{3}}{{8}} \\cdot \\left(16 - \\frac{{64}}{{5}}\\right) = \\frac{{3}}{{8}} \\cdot \\frac{{16}}{{5}} = {E_X2}$$
|
698 |
|
699 |
+
Now we calculate variance using the formula $Var(X) = E[X^2] - (E[X])^2$:
|
700 |
|
701 |
$$\\text{{Var}}(X) = E[X^2] - (E[X])^2 = {E_X2} - ({E_X})^2 = {Var_X}$$
|
702 |
|
703 |
+
This gives us a standard deviation of $\\sqrt{{\\text{{Var}}(X)}} = {Std_X}$.
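For a standalone cross-check of these integrals (a sketch separate from the notebook's own `sympy` cell), the symbolic computation looks like this:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Rational(3, 8) * (4 * x - 2 * x**2)   # the example PDF on (0, 2)

E_X = sp.integrate(x * f, (x, 0, 2))         # expectation
E_X2 = sp.integrate(x**2 * f, (x, 0, 2))     # second moment
Var_X = sp.simplify(E_X2 - E_X**2)

print(E_X, E_X2, Var_X)                      # 1, 6/5, 1/5
```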
|
704 |
"""
|
705 |
)
|
706 |
mo.vstack([_img, _calculations])
|
|
|
765 |
|
766 |
Some key points to remember:
|
767 |
|
768 |
+
- PDFs give us relative likelihood, not actual probabilities - that's why they can exceed 1
|
769 |
+
- The probability between two points is the area under the PDF curve
|
770 |
+
- CDFs offer a convenient shortcut to find probabilities without integrating
|
771 |
+
- Expectation and variance work similarly to discrete variables, just with integrals instead of sums
|
772 |
+
- Constants in PDFs are determined by ensuring the total probability equals 1
|
773 |
|
774 |
This foundation will serve you well as we explore specific continuous distributions like normal, exponential, and beta in future notebooks. These distributions are the workhorses of probability theory and statistics, appearing everywhere from quality control to financial modeling.
|
775 |
|
|
|
779 |
return
|
780 |
|
781 |
|
782 |
+
@app.cell(hide_code=True)
|
783 |
def _(mo):
|
784 |
mo.md(r"""Appendix code (helper functions, variables, etc.):""")
|
785 |
return
|
|
|
971 |
1. Total probability: ∫₀² {C}(4x - 2x²) dx = {total_prob}
|
972 |
2. P(X > 1): ∫₁² {C}(4x - 2x²) dx = {prob_gt_1}
|
973 |
"""
|
|
|
974 |
return create_example_pdf_visualization, symbolic_calculation
|
975 |
|
976 |
|
probability/18_central_limit_theorem.py
CHANGED
@@ -6,12 +6,13 @@
|
|
6 |
# "scipy==1.15.2",
|
7 |
# "numpy==2.2.4",
|
8 |
# "plotly==5.18.0",
|
|
|
9 |
# ]
|
10 |
# ///
|
11 |
|
12 |
import marimo
|
13 |
|
14 |
-
__generated_with = "0.
|
15 |
app = marimo.App(width="medium", app_title="Central Limit Theorem")
|
16 |
|
17 |
|
@@ -23,7 +24,20 @@ def _(mo):
|
|
23 |
|
24 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part4/clt/), by Stanford professor Chris Piech._
|
25 |
|
26 |
-
The
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
27 |
"""
|
28 |
)
|
29 |
return
|
@@ -41,7 +55,7 @@ def _(mo):
|
|
41 |
|
42 |
Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The sum of these random variables approaches a normal distribution as $n \rightarrow \infty$:
|
43 |
|
44 |
-
|
45 |
|
46 |
Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$. Since each $X_i$ is identically distributed, they share the same expectation and variance.
|
47 |
|
@@ -49,7 +63,7 @@ def _(mo):
|
|
49 |
|
50 |
Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The average of these random variables approaches a normal distribution as $n \rightarrow \infty$:
|
51 |
|
52 |
-
$\frac{1}{n}
|
53 |
|
54 |
Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$.
|
55 |
|
@@ -311,41 +325,37 @@ def _(mo):
|
|
311 |
r"""
|
312 |
### Example 1: Dice Game
|
313 |
|
314 |
-
|
315 |
|
316 |
-
|
317 |
|
|
|
318 |
- $E[X_i] = 3.5$
|
319 |
- $\text{Var}(X_i) = \frac{35}{12}$
|
320 |
|
321 |
-
**Solution:**
|
322 |
-
|
323 |
-
Let $Y$ be the approximating normal distribution. By the Central Limit Theorem:
|
324 |
-
|
325 |
-
$Y \sim \mathcal{N}(10 \cdot E[X_i], 10 \cdot \text{Var}(X_i))$
|
326 |
|
327 |
-
|
328 |
|
329 |
-
$Y
|
330 |
|
331 |
-
Now
|
332 |
|
333 |
-
$P(X
|
334 |
|
335 |
-
|
336 |
|
337 |
-
|
338 |
|
339 |
-
|
340 |
|
341 |
-
|
342 |
|
343 |
-
|
344 |
|
345 |
-
|
346 |
|
347 |
-
|
348 |
-
So, the probability of winning the game is approximately 7.8%.
|
349 |
"""
|
350 |
)
|
351 |
return
|
@@ -359,17 +369,17 @@ def _(create_dice_game_visualization, fig_to_image, mo):
|
|
359 |
|
360 |
dice_explanation = mo.md(
|
361 |
r"""
|
362 |
-
**Visualization
|
363 |
|
364 |
-
|
365 |
|
366 |
-
|
367 |
- The left region where $X \leq 25$
|
368 |
- The right region where $X \geq 45$
|
369 |
|
370 |
-
|
371 |
|
372 |
-
|
373 |
"""
|
374 |
)
|
375 |
|
@@ -383,50 +393,45 @@ def _(mo):
|
|
383 |
r"""
|
384 |
### Example 2: Algorithm Runtime Estimation
|
385 |
|
386 |
-
|
|
|
|
|
387 |
|
388 |
-
|
389 |
|
390 |
-
Let $X_i$
|
391 |
|
392 |
**Solution:**
|
393 |
|
394 |
We need to find $n$ such that:
|
395 |
|
396 |
-
$0.95
|
397 |
-
|
398 |
-
By the central limit theorem, the sample mean follows a normal distribution.
|
399 |
-
We can standardize this to work with the standard normal:
|
400 |
-
|
401 |
-
$Z = \frac{\left(\sum_{i=1}^n X_i\right) - n\mu}{\sigma \sqrt{n}}$
|
402 |
|
403 |
-
|
404 |
|
405 |
-
|
406 |
|
407 |
-
|
408 |
|
409 |
-
|
410 |
|
411 |
-
|
412 |
|
413 |
-
$0.95
|
414 |
-
|
415 |
-
$= \Phi\left(\frac{\sqrt{n}}{4}\right) - \left(1 - \Phi\left(\frac{\sqrt{n}}{4}\right)\right)$
|
416 |
-
|
417 |
-
$= 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$
|
418 |
|
419 |
Solving for $\Phi\left(\frac{\sqrt{n}}{4}\right)$:
|
420 |
|
421 |
-
$0.975
|
422 |
|
423 |
-
|
424 |
|
425 |
-
|
426 |
|
427 |
-
$
|
428 |
|
429 |
-
|
|
|
|
|
430 |
"""
|
431 |
)
|
432 |
return
|
@@ -929,7 +934,6 @@ def _(mo):
|
|
929 |
mo.vstack([distribution_type, sample_size, sim_count_slider]),
|
930 |
run_explorer_button
|
931 |
], justify='space-around')
|
932 |
-
|
933 |
return (
|
934 |
controls,
|
935 |
distribution_type,
|
|
|
6 |
# "scipy==1.15.2",
|
7 |
# "numpy==2.2.4",
|
8 |
# "plotly==5.18.0",
|
9 |
+
# "wigglystuff==0.1.13",
|
10 |
# ]
|
11 |
# ///
|
12 |
|
13 |
import marimo
|
14 |
|
15 |
+
__generated_with = "0.12.6"
|
16 |
app = marimo.App(width="medium", app_title="Central Limit Theorem")
|
17 |
|
18 |
|
|
|
24 |
|
25 |
_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part4/clt/), by Stanford professor Chris Piech._
|
26 |
|
27 |
+
The central limit theorem is honestly mind-blowing — it's like magic that no matter what distribution you start with, the sampling distribution of means approaches a normal distribution as sample size increases.
|
28 |
+
|
29 |
+
Mathematically, if we have:
|
30 |
+
|
31 |
+
$X_1, X_2, \ldots, X_n$ as independent, identically distributed random variables with:
|
32 |
+
|
33 |
+
- Mean: $\mu$
|
34 |
+
- Variance: $\sigma^2 < \infty$
|
35 |
+
|
36 |
+
Then as $n \to \infty$:
|
37 |
+
|
38 |
+
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$$
|
39 |
+
|
40 |
+
> _Note:_ The above LaTeX derivation is included as a reference. Credit for this formulation goes to the original source linked at the top of the notebook.
|
41 |
"""
|
42 |
)
|
43 |
return
|
|
|
55 |
|
56 |
Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The sum of these random variables approaches a normal distribution as $n \rightarrow \infty$:
|
57 |
|
58 |
+
$\sum_{i=1}^{n}X_i \sim \mathcal{N}(n \cdot \mu, n \cdot \sigma^2)$
|
59 |
|
60 |
Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$. Since each $X_i$ is identically distributed, they share the same expectation and variance.
|
61 |
|
|
|
63 |
|
64 |
Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The average of these random variables approaches a normal distribution as $n \rightarrow \infty$:
|
65 |
|
66 |
+
$\frac{1}{n}\sum_{i=1}^{n}X_i \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$
|
67 |
|
68 |
Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$.
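A quick simulation sketch (not one of the notebook's interactive cells; the exponential distribution, $n = 50$, and the seed are arbitrary choices) showing the sample-mean version in action:

```python
import numpy as np

rng = np.random.default_rng(3)

# Averages of n draws from a clearly non-normal distribution (Exponential with mu = 1, sigma^2 = 1).
n, trials = 50, 20_000
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(sample_means.mean())   # ≈ mu = 1
print(sample_means.var())    # ≈ sigma^2 / n = 1 / 50 = 0.02
```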
|
69 |
|
|
|
325 |
r"""
|
326 |
### Example 1: Dice Game
|
327 |
|
328 |
+
> _Note:_ The following application demonstrates the practical use of the Central Limit Theorem. The mathematical derivation is based on concepts from ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/clt/) by Chris Piech.
|
329 |
|
330 |
+
Let's solve a fun probability problem: You roll a 6-sided die 10 times and let $X$ represent the total value of all 10 dice: $X = X_1 + X_2 + \dots + X_{10}$. You win if $X \leq 25$ or $X \geq 45$. What's your probability of winning?
|
331 |
|
332 |
+
For a single die roll $X_i$, we know:
|
333 |
- $E[X_i] = 3.5$
|
334 |
- $\text{Var}(X_i) = \frac{35}{12}$
|
335 |
|
336 |
+
**Solution Approach:**
|
|
|
|
|
|
|
|
|
337 |
|
338 |
+
This is where the Central Limit Theorem shines! Since we're summing 10 independent, identically distributed random variables, we can approximate this sum with a normal distribution $Y$:
|
339 |
|
340 |
+
$Y \sim \mathcal{N}(10 \cdot E[X_i], 10 \cdot \text{Var}(X_i)) = \mathcal{N}(35, 29.2)$
|
341 |
|
342 |
+
Now calculating our winning probability:
|
343 |
|
344 |
+
$P(X \leq 25 \text{ or } X \geq 45) = P(X \leq 25) + P(X \geq 45)$
|
345 |
|
346 |
+
Since we're approximating a discrete distribution with a continuous one, we apply a continuity correction:
|
347 |
|
348 |
+
$\approx P(Y < 25.5) + P(Y > 44.5) = P(Y < 25.5) + [1 - P(Y < 44.5)]$
|
349 |
|
350 |
+
Converting to standard normal form:
|
351 |
|
352 |
+
$\approx \Phi\left(\frac{25.5 - 35}{\sqrt{29.2}}\right) + \left[1 - \Phi\left(\frac{44.5 - 35}{\sqrt{29.2}}\right)\right]$
|
353 |
|
354 |
+
$\approx \Phi(-1.76) + [1 - \Phi(1.76)]$
|
355 |
|
356 |
+
$\approx 0.039 + (1 - 0.961) \approx 0.078$
|
357 |
|
358 |
+
So your chance of winning is about 7.8% — not great odds, but that's probability for you!
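Here's a compact check of that number (a hypothetical snippet, not one of the notebook's cells), reproducing the normal approximation with the continuity correction via `scipy`:

```python
from math import sqrt
from scipy import stats

mu = 10 * 3.5                # mean of the sum of 10 dice
sigma = sqrt(10 * 35 / 12)   # standard deviation of the sum

# Normal approximation with continuity correction: P(X <= 25) + P(X >= 45).
p_win = stats.norm.cdf(25.5, mu, sigma) + (1 - stats.norm.cdf(44.5, mu, sigma))
print(p_win)                 # ≈ 0.078
```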
|
|
|
359 |
"""
|
360 |
)
|
361 |
return
|
|
|
369 |
|
370 |
dice_explanation = mo.md(
|
371 |
r"""
|
372 |
+
**Understanding the Visualization:**
|
373 |
|
374 |
+
This graph shows our dice game in action. The blue bars represent the exact probability distribution for summing 10 dice, while the red curve shows our normal approximation from the Central Limit Theorem.
|
375 |
|
376 |
+
I've highlighted the winning regions in orange:
|
377 |
- The left region where $X \leq 25$
|
378 |
- The right region where $X \geq 45$
|
379 |
|
380 |
+
Together these regions cover about 7.8% of the total probability.
|
381 |
|
382 |
+
What's fascinating here is how closely the normal curve approximates the actual discrete distribution — this is the Central Limit Theorem working its magic, even with just 10 random variables.
|
383 |
"""
|
384 |
)
|
385 |
|
|
|
393 |
r"""
|
394 |
### Example 2: Algorithm Runtime Estimation
|
395 |
|
396 |
+
> _Note:_ The following derivation demonstrates the practical application of the Central Limit Theorem for experimental design. The mathematical approach is based on concepts from ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/clt/) by Chris Piech.
|
397 |
+
|
398 |
+
Here's a practical problem I encounter in performance testing: You've developed a new algorithm and want to measure its average runtime. You know the variance is $\sigma^2 = 4 \text{ sec}^2$, but need to estimate the true mean runtime $t$.
|
399 |
|
400 |
+
The question: How many test runs do you need to be 95% confident your estimated mean is within ±0.5 seconds of the true value?
|
401 |
|
402 |
+
Let $X_i$ represent the runtime of the $i$-th test (for $1 \leq i \leq n$).
|
403 |
|
404 |
**Solution:**
|
405 |
|
406 |
We need to find $n$ such that:
|
407 |
|
408 |
+
$0.95 = P\left(-0.5 \leq \frac{\sum_{i=1}^n X_i}{n} - t \leq 0.5\right)$
|
|
|
|
|
|
|
|
|
|
|
409 |
|
410 |
+
The Central Limit Theorem tells us that as $n$ increases, the sample mean approaches a normal distribution. Let's standardize this to work with the standard normal distribution:
|
411 |
|
412 |
+
$Z = \frac{\left(\sum_{i=1}^n X_i\right) - n\mu}{\sigma \sqrt{n}} = \frac{\left(\sum_{i=1}^n X_i\right) - nt}{2 \sqrt{n}}$
|
413 |
|
414 |
+
Rewriting our probability constraint in terms of $Z$:
|
415 |
|
416 |
+
$0.95 = P\left(-0.5 \leq \frac{\sum_{i=1}^n X_i}{n} - t \leq 0.5\right) = P\left(\frac{-0.5 \sqrt{n}}{2} \leq Z \leq \frac{0.5 \sqrt{n}}{2}\right)$
|
417 |
|
418 |
+
Using the properties of the standard normal CDF:
|
419 |
|
420 |
+
$0.95 = \Phi\left(\frac{\sqrt{n}}{4}\right) - \Phi\left(-\frac{\sqrt{n}}{4}\right) = 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$
|
|
|
|
|
|
|
|
|
421 |
|
422 |
Solving for $\Phi\left(\frac{\sqrt{n}}{4}\right)$:
|
423 |
|
424 |
+
$0.975 = \Phi\left(\frac{\sqrt{n}}{4}\right)$
|
425 |
|
426 |
+
Using the inverse CDF:
|
427 |
|
428 |
+
$\Phi^{-1}(0.975) = \frac{\sqrt{n}}{4}$
|
429 |
|
430 |
+
$1.96 = \frac{\sqrt{n}}{4}$
|
431 |
|
432 |
+
$n = 61.4$
|
433 |
+
|
434 |
+
Rounding up, we need 62 test runs to achieve our desired confidence interval — a practical result we can immediately apply to our testing protocol.
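The same arithmetic in code form (a sketch, not one of the notebook's cells; it just replays the algebra above using `scipy`'s inverse normal CDF):

```python
import math
from scipy import stats

sigma, half_width, confidence = 2.0, 0.5, 0.95

z = stats.norm.ppf(1 - (1 - confidence) / 2)   # ≈ 1.96
n = (z * sigma / half_width) ** 2              # from z = half_width * sqrt(n) / sigma
print(n, math.ceil(n))                         # a little over 61, so 62 runs
```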
|
435 |
"""
|
436 |
)
|
437 |
return
|
|
|
934 |
mo.vstack([distribution_type, sample_size, sim_count_slider]),
|
935 |
run_explorer_button
|
936 |
], justify='space-around')
|
|
|
937 |
return (
|
938 |
controls,
|
939 |
distribution_type,
|
probability/19_maximum_likelihood_estimation.py
CHANGED
@@ -133,6 +133,8 @@ def _(mo):
|
|
133 |
r"""
|
134 |
## MLE for Bernoulli Distribution
|
135 |
|
|
|
|
|
136 |
Let's start with a simple example: estimating the parameter $p$ of a Bernoulli distribution.
|
137 |
|
138 |
### The Model
|
|
|
133 |
r"""
|
134 |
## MLE for Bernoulli Distribution
|
135 |
|
136 |
+
> _Note:_ The following derivation is included as reference material. The credit for this mathematical formulation belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part5/mle/) by Chris Piech.
|
137 |
+
|
138 |
Let's start with a simple example: estimating the parameter $p$ of a Bernoulli distribution.
|
139 |
|
140 |
### The Model
|