Akshay Agrawal committed on
Commit
a797de9
·
unverified ·
2 Parent(s): 1c791c5 4996422

Merge pull request #93 from marimo-team/haleshot/refine

probability/10_probability_mass_function.py CHANGED
@@ -10,7 +10,7 @@
10
 
11
  import marimo
12
 
13
- __generated_with = "0.11.17"
14
  app = marimo.App(width="medium", app_title="Probability Mass Functions")
15
 
16
 
@@ -22,9 +22,9 @@ def _(mo):
22
 
23
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/pmf/), by Stanford professor Chris Piech._
24
 
25
- For a random variable, the most important thing to know is: how likely is each outcome? For a discrete random variable, this information is called the "**Probability Mass Function**". The probability mass function (PMF) provides the "mass" (i.e. amount) of "probability" for each possible assignment of the random variable.
26
 
27
- Formally, the Probability Mass Function is a mapping between the values that the random variable could take on and the probability of the random variable taking on said value. In mathematics, we call these associations functions. There are many different ways of representing functions: you can write an equation, you can make a graph, you can even store many samples in a list.
28
  """
29
  )
30
  return
@@ -36,18 +36,12 @@ def _(mo):
36
  r"""
37
  ## Properties of a PMF
38
 
39
- For a function $p_X(x)$ to be a valid PMF, it must satisfy:
40
 
41
- 1. **Non-negativity**: $p_X(x) \geq 0$ for all $x$
42
- 2. **Unit total probability**: $\sum_x p_X(x) = 1$
43
 
44
- ### Probabilities Must Sum to 1
45
-
46
- For a variable (call it $X$) to be a proper random variable, it must be the case that if you summed up the values of $P(X=x)$ for all possible values $x$ that $X$ can take on, the result must be 1:
47
-
48
- $$\sum_x P(X=x) = 1$$
49
-
50
- This is because a random variable taking on a value is an event (for example $X=3$). Each of those events is mutually exclusive because a random variable will take on exactly one value. Those mutually exclusive cases define an entire sample space. Why? Because $X$ must take on some value.
51
  """
52
  )
53
  return
@@ -125,11 +119,11 @@ def _(np, plt):
125
  def _(mo):
126
  mo.md(
127
  r"""
128
- The information provided in these graphs shows the likelihood of a random variable taking on different values.
129
 
130
- In the graph on the right, the value "6" on the $x$-axis is associated with the probability $\frac{5}{36}$ on the $y$-axis. This $x$-axis refers to the event "the sum of two dice is 6" or $Y = 6$. The $y$-axis tells us that the probability of that event is $\frac{5}{36}$. In full: $P(Y = 6) = \frac{5}{36}$.
131
 
132
- The value "2" is associated with "$\frac{1}{36}$" which tells us that, $P(Y = 2) = \frac{1}{36}$, the probability that two dice sum to 2 is $\frac{1}{36}$. There is no value associated with "1" because the sum of two dice cannot be 1.
133
  """
134
  )
135
  return
@@ -220,7 +214,7 @@ def _(mo):
220
  r"""
221
  ## Data to Histograms to Probability Mass Functions
222
 
223
- One surprising way to store a likelihood function (recall that a PMF is the name of the likelihood function for discrete random variables) is simply a list of data. Let's simulate summing two dice many times to create an empirical PMF:
224
  """
225
  )
226
  return
@@ -323,9 +317,9 @@ def _(collections, np, plt, sim_dice_sums):
323
  def _(mo):
324
  mo.md(
325
  r"""
326
- A normalized histogram (where each value is divided by the length of your data list) is an approximation of the PMF. For a dataset of discrete numbers, a histogram shows the count of each value. By the definition of probability, if you divide this count by the number of experiments run, you arrive at an approximation of the probability of the event $P(Y=y)$.
327
 
328
- Let's look at a specific example. If we want to approximate $P(Y=3)$ (the probability that the sum of two dice is 3), we can count the number of times that "3" occurs in our data and divide by the total number of trials:
329
  """
330
  )
331
  return
 
10
 
11
  import marimo
12
 
13
+ __generated_with = "0.12.6"
14
  app = marimo.App(width="medium", app_title="Probability Mass Functions")
15
 
16
 
 
22
 
23
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/pmf/), by Stanford professor Chris Piech._
24
 
25
+ PMFs are really important in discrete probability. They tell us how likely each possible outcome is for a discrete random variable.
26
 
27
+ What's interesting about PMFs is that they can be represented in multiple ways - equations, graphs, or even empirical data. The core idea is simple: they map each possible value to its probability.
28
  """
29
  )
30
  return
 
36
  r"""
37
  ## Properties of a PMF
38
 
39
+ For a function $p_X(x)$ to be a valid PMF:
40
 
41
+ 1. **Non-negativity**: probability can't be negative, so $p_X(x) \geq 0$ for all $x$
42
+ 2. **Unit total probability**: all probabilities sum to 1, i.e., $\sum_x p_X(x) = 1$
43
 
44
+ The second property makes intuitive sense: a random variable must take some value, so the probabilities of all its possible values must add up to 1 (that is, 100%).
 
 
 
 
 
 
45
  """
46
  )
47
  return
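To make these two properties concrete, here is a minimal sketch in plain Python (not part of the notebook cells shown in this diff) that builds the PMF of the sum of two fair dice and checks both conditions:

```python
# Minimal sketch: build the PMF of Y = sum of two fair dice and verify
# the two PMF properties (non-negativity, probabilities summing to 1).
from fractions import Fraction

pmf = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        y = d1 + d2
        pmf[y] = pmf.get(y, Fraction(0)) + Fraction(1, 36)

# Property 1: non-negativity.
assert all(p >= 0 for p in pmf.values())
# Property 2: unit total probability.
assert sum(pmf.values()) == 1

print(pmf[6])  # Fraction(5, 36), i.e. P(Y = 6)
```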
 
119
  def _(mo):
120
  mo.md(
121
  r"""
122
+ These graphs show how likely each possible value of the dice sum is.
123
 
124
+ Looking at the right graph, when we see "6" on the $x$-axis with probability $\frac{5}{36}$ on the $y$-axis, that's telling us there's a $\frac{5}{36}$ chance of rolling a sum of 6 with two dice. Or, more formally: $P(Y = 6) = \frac{5}{36}$.
125
 
126
+ Similarly, the value "2" has probability "$\frac{1}{36}$" because there's only one way to get a sum of 2 (rolling 1 on both dice). And you'll notice there's no value for "1", since you can't get a sum of 1 with two dice: the minimum possible is 2.
127
  """
128
  )
129
  return
 
214
  r"""
215
  ## Data to Histograms to Probability Mass Functions
216
 
217
+ Here's something I find interesting: one way to represent a likelihood function is just through raw data. Instead of mathematical formulas, we can approximate a PMF by collecting data points. Let's see this in action by simulating lots of dice rolls and building an empirical PMF:
218
  """
219
  )
220
  return
 
317
  def _(mo):
318
  mo.md(
319
  r"""
320
+ When we normalize a histogram (divide each count by the total sample size), we get a good approximation of the true PMF. It's a simple yet powerful idea: count how many times each value appears, then divide by the total number of trials.
321
 
322
+ Let's make this concrete. Say we want to estimate $P(Y=3)$, the probability of rolling a sum of 3 with two dice. We just count how many 3's show up in our simulated rolls and divide by the total number of rolls:
323
  """
324
  )
325
  return
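As a rough sketch of the approach described above (assuming numpy, which the notebook already uses), the empirical PMF can be built like this:

```python
# Minimal sketch: simulate many two-dice rolls and turn the counts into
# an empirical PMF by dividing by the number of trials.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000
sums = rng.integers(1, 7, size=n_trials) + rng.integers(1, 7, size=n_trials)

values, counts = np.unique(sums, return_counts=True)
empirical_pmf = dict(zip(values.tolist(), (counts / n_trials).tolist()))

# The estimate of P(Y = 3) should land near the true value 2/36 (about 0.056).
print(empirical_pmf[3])
```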
probability/11_expectation.py CHANGED
@@ -10,7 +10,7 @@
10
 
11
  import marimo
12
 
13
- __generated_with = "0.11.19"
14
  app = marimo.App(width="medium", app_title="Expectation")
15
 
16
 
@@ -22,9 +22,9 @@ def _(mo):
22
 
23
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/expectation/), by Stanford professor Chris Piech._
24
 
25
- A random variable is fully represented by its Probability Mass Function (PMF), which describes each value the random variable can take on and the corresponding probabilities. However, a PMF can contain a lot of information. Sometimes it's useful to summarize a random variable with a single value!
26
 
27
- The most common, and arguably the most useful, summary of a random variable is its **Expectation** (also called the expected value or mean).
28
  """
29
  )
30
  return
@@ -36,11 +36,11 @@ def _(mo):
36
  r"""
37
  ## Definition of Expectation
38
 
39
- The expectation of a random variable $X$, written $E[X]$, is the average of all the values the random variable can take on, each weighted by the probability that the random variable will take on that value.
40
 
41
  $$E[X] = \sum_x x \cdot P(X=x)$$
42
 
43
- Expectation goes by many other names: Mean, Weighted Average, Center of Mass, 1st Moment. All of these are calculated using the same formula.
44
  """
45
  )
46
  return
 
10
 
11
  import marimo
12
 
13
+ __generated_with = "0.12.6"
14
  app = marimo.App(width="medium", app_title="Expectation")
15
 
16
 
 
22
 
23
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/expectation/), by Stanford professor Chris Piech._
24
 
25
+ Expectations are fascinating: they represent the "center of mass" of a probability distribution. While they're often called "expected values" or "averages," they don't always match our intuition about what's "expected" to happen.
26
 
27
+ For me, the most interesting part about expectations is how they quantify what happens "on average" in the long run, even if that average isn't a possible outcome (like expecting 3.5 on a standard die roll).
28
  """
29
  )
30
  return
 
36
  r"""
37
  ## Definition of Expectation
38
 
39
+ Expectation (written as $E[X]$) is basically the "average outcome" of a random variable, but with a twist - we weight each possible value by how likely it is to occur. I like to think of it as the "center of gravity" for probability.
40
 
41
  $$E[X] = \sum_x x \cdot P(X=x)$$
42
 
43
+ People call this concept by different names - mean, weighted average, center of mass, or 1st moment if you're being fancy. They're all calculated the same way, though: multiply each value by its probability, then add everything up.
44
  """
45
  )
46
  return
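To see the weighted-average definition in action, here is a minimal sketch computing the expectation of a fair six-sided die directly from its PMF:

```python
# Minimal sketch: E[X] = sum over x of x * P(X = x) for a fair die.
values = [1, 2, 3, 4, 5, 6]
pmf = {x: 1 / 6 for x in values}

expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 3.5, a value the die itself can never show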
probability/13_bernoulli_distribution.py CHANGED
@@ -10,7 +10,7 @@
10
 
11
  import marimo
12
 
13
- __generated_with = "0.11.22"
14
  app = marimo.App(width="medium", app_title="Bernoulli Distribution")
15
 
16
 
@@ -20,15 +20,15 @@ def _(mo):
20
  r"""
21
  # Bernoulli Distribution
22
 
23
- _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/bernoulli/), by Stanford professor Chris Piech._
24
 
25
  ## Parametric Random Variables
26
 
27
- There are many classic and commonly-seen random variable abstractions that show up in the world of probability. At this point, we'll learn about several of the most significant parametric discrete distributions.
28
 
29
- When solving problems, if you can recognize that a random variable fits one of these formats, then you can use its pre-derived Probability Mass Function (PMF), expectation, variance, and other properties. Random variables of this sort are called **parametric random variables**. If you can argue that a random variable falls under one of the studied parametric types, you simply need to provide parameters.
30
 
31
- > A good analogy is a `class` in programming. Creating a parametric random variable is very similar to calling a constructor with input parameters.
32
  """
33
  )
34
  return
@@ -40,18 +40,16 @@ def _(mo):
40
  r"""
41
  ## Bernoulli Random Variables
42
 
43
- A **Bernoulli random variable** (also called a boolean or indicator random variable) is the simplest kind of parametric random variable. It can take on two values: 1 and 0.
44
 
45
- It takes on a 1 if an experiment with probability $p$ resulted in success and a 0 otherwise.
46
 
47
- Some example uses include:
 
 
 
48
 
49
- - A coin flip (heads = 1, tails = 0)
50
- - A random binary digit
51
- - Whether a disk drive crashed
52
- - Whether someone likes a Netflix movie
53
-
54
- Here $p$ is the parameter, but different instances of Bernoulli random variables might have different values of $p$.
55
  """
56
  )
57
  return
@@ -167,9 +165,11 @@ def _(expected_value, p_slider, plt, probabilities, values, variance):
167
  def _(mo):
168
  mo.md(
169
  r"""
170
- ## Proof: Expectation of a Bernoulli
 
 
171
 
172
- If $X$ is a Bernoulli with parameter $p$, $X \sim \text{Bern}(p)$:
173
 
174
  \begin{align}
175
  E[X] &= \sum_x x \cdot P(X=x) && \text{Definition of expectation} \\
@@ -178,11 +178,7 @@ def _(mo):
178
  &= p && \text{Remove the 0 term}
179
  \end{align}
180
 
181
- ## Proof: Variance of a Bernoulli
182
-
183
- If $X$ is a Bernoulli with parameter $p$, $X \sim \text{Bern}(p)$:
184
-
185
- To compute variance, first compute $E[X^2]$:
186
 
187
  \begin{align}
188
  E[X^2]
@@ -206,18 +202,16 @@ def _(mo):
206
  def _(mo):
207
  mo.md(
208
  r"""
209
- ## Indicator Random Variable
210
-
211
- > **Definition**: An indicator variable is a Bernoulli random variable which takes on the value 1 if an **underlying event occurs**, and 0 _otherwise_.
212
 
213
- Indicator random variables are a convenient way to convert the "true/false" outcome of an event into a number. That number may be easier to incorporate into an equation.
214
 
215
- A random variable $I$ is an indicator variable for an event $A$ if $I = 1$ when $A$ occurs and $I = 0$ if $A$ does not occur. Indicator random variables are Bernoulli random variables, with $p = P(A)$. $I_A$ is a common choice of name for an indicator random variable.
216
 
217
- Here are some properties of indicator random variables:
218
 
219
- - $P(I=1)=P(A)$
220
- - $E[I]=P(A)$
221
  """
222
  )
223
  return
 
10
 
11
  import marimo
12
 
13
+ __generated_with = "0.12.6"
14
  app = marimo.App(width="medium", app_title="Bernoulli Distribution")
15
 
16
 
 
20
  r"""
21
  # Bernoulli Distribution
22
 
23
+ > _Note:_ This notebook builds on concepts from ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/bernoulli/) by Chris Piech.
24
 
25
  ## Parametric Random Variables
26
 
27
+ Probability has a bunch of classic random variable patterns that show up over and over. Let's explore some of the most important parametric discrete distributions.
28
 
29
+ Bernoulli is honestly the simplest distribution you'll ever see, but it's ridiculously powerful in practice. What makes it fascinating to me is how it captures any yes/no scenario: success/failure, heads/tails, 1/0.
30
 
31
+ I think of these distributions as the atoms of probability: they're the fundamental building blocks that everything else is made from.
32
  """
33
  )
34
  return
 
40
  r"""
41
  ## Bernoulli Random Variables
42
 
43
+ A Bernoulli random variable boils down to just two possible values: 1 (success) or 0 (failure). Dead simple, but incredibly useful.
44
 
45
+ Some everyday examples where I see these:
46
 
47
+ - Coin flip (heads=1, tails=0)
48
+ - Whether that sketchy email is spam
49
+ - If someone actually clicks my ad
50
+ - Whether my code compiles first try (almost always 0 for me)
51
 
52
+ All you need is a single parameter, $p$: the probability of success.
 
 
 
 
 
53
  """
54
  )
55
  return
 
165
  def _(mo):
166
  mo.md(
167
  r"""
168
+ ## Expectation and Variance of a Bernoulli
169
+
170
+ > _Note:_ The following derivations are included as reference material. The credit for these mathematical formulations belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/bernoulli/) by Chris Piech.
171
 
172
+ Let's work through why $E[X] = p$ for a Bernoulli:
173
 
174
  \begin{align}
175
  E[X] &= \sum_x x \cdot P(X=x) && \text{Definition of expectation} \\
 
178
  &= p && \text{Remove the 0 term}
179
  \end{align}
180
 
181
+ And for variance, we first need $E[X^2]$:
 
 
 
 
182
 
183
  \begin{align}
184
  E[X^2]
 
202
  def _(mo):
203
  mo.md(
204
  r"""
205
+ ## Indicator Random Variables
 
 
206
 
207
+ Indicator variables are a clever trick I like to use: they turn events into numbers. Instead of dealing with "did the event happen?" (yes/no), we get "1" if it happened and "0" if it didn't.
208
 
209
+ Formally: an indicator variable $I$ for event $A$ equals 1 when $A$ occurs and 0 otherwise. These are just Bernoulli variables where $p = P(A)$. People often use notation like $I_A$ to name them.
210
 
211
+ Two key properties that make them super useful:
212
 
213
+ - $P(I=1)=P(A)$ - probability of getting a 1 is just the probability of the event
214
+ - $E[I]=P(A)$ - the expected value equals the probability (this one's a game-changer!)
215
  """
216
  )
217
  return
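A small simulation (using numpy, which the notebooks already depend on) makes both properties easy to see; this is only an illustrative sketch with an arbitrarily chosen $p$:

```python
# Minimal sketch: an indicator variable I for an event A with P(A) = 0.3.
# Its sample mean approaches E[I] = P(A), and its variance approaches p(1 - p).
import numpy as np

rng = np.random.default_rng(1)
p = 0.3                               # P(A), chosen arbitrarily for illustration
indicator = rng.random(200_000) < p   # I = 1 when A occurs, 0 otherwise

print(indicator.mean())               # approximately 0.3  = P(A) = E[I]
print(indicator.var())                # approximately 0.21 = p * (1 - p)
```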
probability/14_binomial_distribution.py CHANGED
@@ -13,7 +13,7 @@
13
 
14
  import marimo
15
 
16
- __generated_with = "0.11.24"
17
  app = marimo.App(width="medium", app_title="Binomial Distribution")
18
 
19
 
@@ -25,11 +25,9 @@ def _(mo):
25
 
26
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/binomial/), by Stanford professor Chris Piech._
27
 
28
- In this section, we will discuss the binomial distribution. To start, imagine the following example:
29
 
30
- Consider $n$ independent trials of an experiment where each trial is a "success" with probability $p$. Let $X$ be the number of successes in $n$ trials.
31
-
32
- This situation is truly common in the natural world, and as such, there has been a lot of research into such phenomena. Random variables like $X$ are called **binomial random variables**. If you can identify that a process fits this description, you can inherit many already proved properties such as the PMF formula, expectation, and variance!
33
  """
34
  )
35
  return
@@ -197,11 +195,11 @@ def _(mo):
197
  r"""
198
  ## Relationship to Bernoulli Random Variables
199
 
200
- One way to think of the binomial is as the sum of $n$ Bernoulli variables. Say that $Y_i$ is an indicator Bernoulli random variable which is 1 if experiment $i$ is a success. Then if $X$ is the total number of successes in $n$ experiments, $X \sim \text{Bin}(n, p)$:
201
 
202
  $$X = \sum_{i=1}^n Y_i$$
203
 
204
- Recall that the outcome of $Y_i$ will be 1 or 0, so one way to think of $X$ is as the sum of those 1s and 0s.
205
  """
206
  )
207
  return
 
13
 
14
  import marimo
15
 
16
+ __generated_with = "0.12.6"
17
  app = marimo.App(width="medium", app_title="Binomial Distribution")
18
 
19
 
 
25
 
26
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/binomial/), by Stanford professor Chris Piech._
27
 
28
+ The binomial distribution is essentially what happens when you run multiple Bernoulli trials and count the successes. I love this distribution because it appears everywhere in practical scenarios.
29
 
30
+ Think about it: whenever you're counting how many times something happens across a fixed number of independent attempts, each with the same probability of success, you're dealing with a binomial. Website conversions, A/B testing results, even counting heads in multiple coin flips: all binomial!
 
 
31
  """
32
  )
33
  return
 
195
  r"""
196
  ## Relationship to Bernoulli Random Variables
197
 
198
+ One way I like to think about the binomial: it's just adding up a bunch of Bernoullis. If each $Y_i$ is a Bernoulli that tells us if the $i$-th trial succeeded, then:
199
 
200
  $$X = \sum_{i=1}^n Y_i$$
201
 
202
+ This makes the distribution really intuitive to me - we're just counting 1s across our $n$ experiments.
203
  """
204
  )
205
  return
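Here is a quick sketch (using numpy and scipy, both already used in these notebooks) showing that summing Bernoulli draws really does behave like a binomial:

```python
# Minimal sketch: X = sum of n Bernoulli(p) trials behaves like Binomial(n, p).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, samples = 20, 0.25, 100_000

trials = rng.random((samples, n)) < p     # each row: n Bernoulli(p) outcomes
x_from_bernoullis = trials.sum(axis=1)    # X = Y_1 + ... + Y_n for each sample

print(x_from_bernoullis.mean())           # approximately n * p = 5.0
print(stats.binom(n, p).mean())           # exact binomial mean, also 5.0
```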
probability/15_poisson_distribution.py CHANGED
@@ -13,7 +13,7 @@
13
 
14
  import marimo
15
 
16
- __generated_with = "0.11.25"
17
  app = marimo.App(width="medium", app_title="Poisson Distribution")
18
 
19
 
@@ -25,7 +25,9 @@ def _(mo):
25
 
26
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/), by Stanford professor Chris Piech._
27
 
28
- A Poisson random variable gives the probability of a given number of events in a fixed interval of time (or space). It makes the Poisson assumption that events occur with a known constant mean rate and independently of the time since the last event.
 
 
29
  """
30
  )
31
  return
@@ -180,11 +182,11 @@ def _(mo):
180
  r"""
181
  ## Poisson Intuition: Relation to Binomial Distribution
182
 
183
- The Poisson distribution can be derived as a limiting case of the [binomial distribution](http://marimo.app/https://github.com/marimo-team/learn/blob/main/probability/14_binomial_distribution.py).
184
 
185
- Let's work on a practical example: predicting the number of ride-sharing requests in a specific area over a one-minute interval. From historical data, we know that the average number of requests per minute is $\lambda = 5$.
186
 
187
- We could approximate this using a binomial distribution by dividing our minute into smaller intervals. For example, we can divide a minute into 60 seconds and treat each second as a [Bernoulli trial](http://marimo.app/https://github.com/marimo-team/learn/blob/main/probability/13_bernoulli_distribution.py) - either there's a request (success) or there isn't (failure).
188
 
189
  Let's visualize this concept:
190
  """
@@ -231,7 +233,7 @@ def _(fig_to_image, mo, plt):
231
  _explanation = mo.md(
232
  r"""
233
  In this visualization:
234
-
235
  - Each rectangle represents a 1-second interval
236
  - Blue rectangles indicate intervals where an event occurred
237
  - Red dots show the actual event times (2.75s and 7.12s)
@@ -247,9 +249,9 @@ def _(fig_to_image, mo, plt):
247
  def _(mo):
248
  mo.md(
249
  r"""
250
- The total number of requests received over the minute can be approximated as the sum of the sixty indicator variables, which conveniently matches the description of a binomial — a sum of Bernoullis.
251
 
252
- Specifically, if we define $X$ to be the number of requests in a minute, $X$ is a binomial with $n=60$ trials. What is the probability, $p$, of a success on a single trial? To make the expectation of $X$ equal the observed historical average $\lambda$, we should choose $p$ so that:
253
 
254
  \begin{align}
255
  \lambda &= E[X] && \text{Expectation matches historical average} \\
@@ -257,7 +259,7 @@ def _(mo):
257
  p &= \frac{\lambda}{n} && \text{Solving for $p$}
258
  \end{align}
259
 
260
- In this case, since $\lambda=5$ and $n=60$, we should choose $p=\frac{5}{60}=\frac{1}{12}$ and state that $X \sim \text{Bin}(n=60, p=\frac{5}{60})$. Now we can calculate the probability of different numbers of requests using the binomial PMF:
261
 
262
  $P(X = x) = {n \choose x} p^x (1-p)^{n-x}$
263
 
@@ -269,7 +271,7 @@ def _(mo):
269
  P(X=3) &= {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389
270
  \end{align}
271
 
272
- This is a good approximation, but it doesn't account for the possibility of multiple events in a single second. One solution is to divide our minute into even more fine-grained intervals. Let's try 600 deciseconds (tenths of a second):
273
  """
274
  )
275
  return
@@ -283,7 +285,7 @@ def _(fig_to_image, mo, plt):
283
 
284
  # Example events at 2.75s and 7.12s (convert to deciseconds)
285
  events = [27.5, 71.2]
286
-
287
  for i in range(100):
288
  color = 'royalblue' if any(i <= event_val < i + 1 for event_val in events) else 'lightgray'
289
  ax.add_patch(plt.Rectangle((i, 0), 0.9, 1, color=color))
@@ -434,21 +436,23 @@ def _(df, fig, fig_to_image, mo, n, p):
434
  def _(mo):
435
  mo.md(
436
  r"""
437
- As you can see from the interactive comparison above, as the number of intervals increases, the binomial distribution approaches the Poisson distribution! This is not a coincidence - the Poisson distribution is actually the limiting case of the binomial distribution when:
438
 
439
  - The number of trials $n$ approaches infinity
440
  - The probability of success $p$ approaches zero
441
  - The product $np = \lambda$ remains constant
442
 
443
- This relationship is why the Poisson distribution is so useful - it's easier to work with than a binomial with a very large number of trials and a very small probability of success.
444
 
445
  ## Derivation of the Poisson PMF
446
 
447
- Let's derive the Poisson PMF by taking the limit of the binomial PMF as $n \to \infty$. We start with:
 
 
448
 
449
  $P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}$
450
 
451
- While this expression looks intimidating, it simplifies nicely:
452
 
453
  \begin{align}
454
  P(X=x)
@@ -495,7 +499,7 @@ def _(mo):
495
  && \text{Simplifying}\\
496
  \end{align}
497
 
498
- This gives us our elegant Poisson PMF formula: $P(X=x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
499
  """
500
  )
501
  return
 
13
 
14
  import marimo
15
 
16
+ __generated_with = "0.12.6"
17
  app = marimo.App(width="medium", app_title="Poisson Distribution")
18
 
19
 
 
25
 
26
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/), by Stanford professor Chris Piech._
27
 
28
+ The Poisson distribution is my go-to for modeling random events occurring over time or space. What makes it cool is that it only needs a single parameter λ (lambda), which represents both the mean and variance.
29
+
30
+ I find it particularly useful when events happen rarely but the opportunities for them to occur are numerous, like modeling website visits, particle emissions, or even typos in a document.
31
  """
32
  )
33
  return
 
182
  r"""
183
  ## Poisson Intuition: Relation to Binomial Distribution
184
 
185
+ The Poisson distribution can be derived as a limiting case of the binomial distribution. I find this connection fascinating because it shows how seemingly different distributions are actually related.
186
 
187
+ Let's work through a practical example: predicting ride-sharing requests in a specific area over a one-minute interval. From historical data, we know that the average number of requests per minute is $\lambda = 5$.
188
 
189
+ We could model this using a binomial distribution by dividing our minute into smaller intervals. For example, splitting a minute into 60 seconds, where each second is a Bernoulli trial: either a request arrives (success) or it doesn't (failure).
190
 
191
  Let's visualize this concept:
192
  """
 
233
  _explanation = mo.md(
234
  r"""
235
  In this visualization:
236
+
237
  - Each rectangle represents a 1-second interval
238
  - Blue rectangles indicate intervals where an event occurred
239
  - Red dots show the actual event times (2.75s and 7.12s)
 
249
  def _(mo):
250
  mo.md(
251
  r"""
252
+ The total number of requests received over the minute can be approximated as the sum of sixty indicator variables, which aligns perfectly with the binomial distribution — a sum of Bernoullis.
253
 
254
+ If we define $X$ as the number of requests in a minute, $X$ follows a binomial with $n=60$ trials. To determine the success probability $p$, we need to match the expected value with our historical average $\lambda$:
255
 
256
  \begin{align}
257
  \lambda &= E[X] && \text{Expectation matches historical average} \\
 
259
  p &= \frac{\lambda}{n} && \text{Solving for $p$}
260
  \end{align}
261
 
262
+ With $\lambda=5$ and $n=60$, we get $p=\frac{5}{60}=\frac{1}{12}$, so $X \sim \text{Bin}(n=60, p=\frac{5}{60})$. Using the binomial PMF:
263
 
264
  $P(X = x) = {n \choose x} p^x (1-p)^{n-x}$
265
 
 
271
  P(X=3) &= {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389
272
  \end{align}
273
 
274
+ This approximation works well, but it doesn't account for multiple events occurring in a single second. To address this limitation, we can use even finer intervals, perhaps 600 deciseconds (tenths of a second):
275
  """
276
  )
277
  return
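For readers who want to check these numbers, here is a minimal sketch (using scipy.stats) that evaluates the binomial approximation with $n=60$, $p=5/60$ and compares it against the Poisson($\lambda=5$) PMF it converges to:

```python
# Minimal sketch: binomial approximation (n = 60, p = 5/60) vs. Poisson(5).
from scipy import stats

lam, n = 5, 60
binom = stats.binom(n, lam / n)
poisson = stats.poisson(lam)

for x in (1, 2, 3):
    print(x, round(binom.pmf(x), 4), round(poisson.pmf(x), 4))
# For x = 3 the binomial gives about 0.1389, close to the Poisson's 0.1404.
```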
 
285
 
286
  # Example events at 2.75s and 7.12s (convert to deciseconds)
287
  events = [27.5, 71.2]
288
+
289
  for i in range(100):
290
  color = 'royalblue' if any(i <= event_val < i + 1 for event_val in events) else 'lightgray'
291
  ax.add_patch(plt.Rectangle((i, 0), 0.9, 1, color=color))
 
436
  def _(mo):
437
  mo.md(
438
  r"""
439
+ As our interactive comparison demonstrates, the binomial distribution converges to the Poisson distribution as we increase the number of intervals! This remarkable relationship exists because the Poisson distribution is actually the limiting case of the binomial when:
440
 
441
  - The number of trials $n$ approaches infinity
442
  - The probability of success $p$ approaches zero
443
  - The product $np = \lambda$ remains constant
444
 
445
+ This elegance is why I find the Poisson distribution so powerful: it simplifies what would otherwise be a cumbersome binomial with numerous trials and tiny success probabilities.
446
 
447
  ## Derivation of the Poisson PMF
448
 
449
+ > _Note:_ The following mathematical derivation is included as reference material. The credit for this formulation belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/) by Chris Piech.
450
+
451
+ The Poisson PMF can be derived by taking the limit of the binomial PMF as $n \to \infty$:
452
 
453
  $P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}$
454
 
455
+ Through a series of algebraic manipulations:
456
 
457
  \begin{align}
458
  P(X=x)
 
499
  && \text{Simplifying}\\
500
  \end{align}
501
 
502
+ This gives us the elegant Poisson PMF formula: $P(X=x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
503
  """
504
  )
505
  return
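As a quick sanity check, the formula can be implemented directly and compared against scipy; this is just an illustrative sketch:

```python
# Minimal sketch: the Poisson PMF, P(X = x) = lambda^x * e^(-lambda) / x!,
# implemented by hand and cross-checked against scipy.stats.
import math
from scipy import stats

def poisson_pmf(x: int, lam: float) -> float:
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 5
print(poisson_pmf(3, lam))         # about 0.1404
print(stats.poisson(lam).pmf(3))   # same value from scipy
```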
probability/16_continuous_distribution.py CHANGED
@@ -14,7 +14,7 @@
14
 
15
  import marimo
16
 
17
- __generated_with = "0.11.26"
18
  app = marimo.App(width="medium")
19
 
20
 
@@ -26,7 +26,9 @@ def _(mo):
26
 
27
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/continuous/), by Stanford professor Chris Piech._
28
 
29
- So far, all the random variables we've explored have been discrete, taking on only specific values (usually integers). Now we'll move into the world of **continuous random variables**, which can take on any real number value. Continuous random variables are used to model measurements with arbitrary precision like height, weight, time, and many natural phenomena.
 
 
30
  """
31
  )
32
  return
@@ -38,20 +40,17 @@ def _(mo):
38
  r"""
39
  ## From Discrete to Continuous
40
 
41
- To make the transition from discrete to continuous random variables, let's start with a thought experiment:
42
-
43
- > Imagine you're running to catch a bus. You know you'll arrive at 2:15pm, but you don't know exactly when the bus will arrive. You want to model the bus arrival time (in minutes past 2pm) as a random variable $T$ so you can calculate the probability that you'll wait more than five minutes: $P(15 < T < 20)$.
44
 
45
- This immediately highlights a key difference from discrete distributions. For discrete distributions, we described the probability that a random variable takes on exact values. But this doesn't make sense for continuous values like time.
46
 
47
- For example:
48
 
 
49
  - What's the probability the bus arrives at exactly 2:17pm and 12.12333911102389234 seconds?
50
- - What's the probability of a child being born weighing exactly 3.523112342234 kilograms?
51
 
52
- These questions don't have meaningful answers because real-world measurements can have infinite precision. The probability of a continuous random variable taking on any specific exact value is actually zero!
53
-
54
- ### Visualizing the Transition
55
 
56
  Let's visualize this transition from discrete to continuous:
57
  """
@@ -150,44 +149,43 @@ def _(mo):
150
  r"""
151
  ## Probability Density Functions
152
 
153
- In the world of discrete random variables, we used **Probability Mass Functions (PMFs)** to describe the probability of a random variable taking on specific values. In the continuous world, we need a different approach.
154
 
155
- For continuous random variables, we use a **Probability Density Function (PDF)** which defines the relative likelihood that a random variable takes on a particular value. We traditionally denote the PDF with the symbol $f$ and write it as:
156
 
157
  $$f(X=x) \quad \text{or simply} \quad f(x)$$
158
 
159
- Where the lowercase $x$ implies that we're talking about the relative likelihood of a continuous random variable which is the uppercase $X$.
160
 
161
  ### Key Properties of PDFs
162
 
163
- A **Probability Density Function (PDF)** $f(x)$ for a continuous random variable $X$ has these key properties:
164
 
165
- 1. The probability that $X$ takes a value in the interval $[a, b]$ is:
166
 
167
  $$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$
168
 
169
- 2. The PDF must be non-negative everywhere:
170
 
171
  $$f(x) \geq 0 \text{ for all } x$$
172
 
173
- 3. The total probability must sum to 1:
174
 
175
  $$\int_{-\infty}^{\infty} f(x) \, dx = 1$$
176
 
177
- 4. The probability that $X$ takes any specific exact value is 0:
178
 
179
  $$P(X = a) = \int_a^a f(x) \, dx = 0$$
180
 
181
- This last property highlights a key difference from discrete distributions: the probability of a continuous random variable taking on an exact value is always 0. Probabilities only make sense when talking about ranges of values.
182
-
183
- ### Caution: Density ≠ Probability
184
 
185
- A common misconception is to think of $f(x)$ as a probability. It is instead a **probability density**, representing probability per unit of $x$. The values of $f(x)$ can actually exceed 1, as long as the total area under the curve equals 1.
186
 
187
- The interpretation of $f(x)$ is only meaningful when:
188
 
189
- 1. We integrate over a range to get a probability, or
190
- 2. We compare densities at different points to determine relative likelihoods.
 
191
  """
192
  )
193
  return
@@ -665,16 +663,18 @@ def _(fig_to_image, mo, np, plt, sympy):
665
  # Detailed calculations for our example
666
  _calculations = mo.md(
667
  f"""
668
- ### Calculating Expectation and Variance for Our Example
669
 
670
- Let's calculate the expectation and variance for the PDF:
 
 
671
 
672
  $$f(x) = \\begin{{cases}}
673
  \\frac{{3}}{{8}}(4x - 2x^2) & \\text{{when }} 0 < x < 2 \\\\
674
  0 & \\text{{otherwise}}
675
  \\end{{cases}}$$
676
 
677
- #### Expectation Calculation
678
 
679
  $$E[X] = \\int_{{-\\infty}}^{{\\infty}} x \\cdot f(x) \\, dx = \\int_0^2 x \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
680
 
@@ -684,9 +684,9 @@ def _(fig_to_image, mo, np, plt, sympy):
684
 
685
  $$E[X] = \\frac{{3}}{{8}} \\cdot \\left(\\frac{{32}}{{3}} - 8\\right) = \\frac{{3}}{{8}} \\cdot \\frac{{8}}{{3}} = {E_X}$$
686
 
687
- #### Variance Calculation
688
 
689
- First, we need $E[X^2]$:
690
 
691
  $$E[X^2] = \\int_{{-\\infty}}^{{\\infty}} x^2 \\cdot f(x) \\, dx = \\int_0^2 x^2 \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
692
 
@@ -696,11 +696,11 @@ def _(fig_to_image, mo, np, plt, sympy):
696
 
697
  $$E[X^2] = \\frac{{3}}{{8}} \\cdot \\left(16 - \\frac{{64}}{{5}}\\right) = \\frac{{3}}{{8}} \\cdot \\frac{{16}}{{5}} = {E_X2}$$
698
 
699
- Now we can calculate the variance:
700
 
701
  $$\\text{{Var}}(X) = E[X^2] - (E[X])^2 = {E_X2} - ({E_X})^2 = {Var_X}$$
702
 
703
- Therefore, the standard deviation is $\\sqrt{{\\text{{Var}}(X)}} = {Std_X}$.
704
  """
705
  )
706
  mo.vstack([_img, _calculations])
@@ -765,11 +765,11 @@ def _(mo):
765
 
766
  Some key points to remember:
767
 
768
- PDFs give us relative likelihood, not actual probabilities - that's why they can exceed 1
769
- The probability between two points is the area under the PDF curve
770
- CDFs offer a convenient shortcut to find probabilities without integrating
771
- Expectation and variance work similarly to discrete variables, just with integrals instead of sums
772
- Constants in PDFs are determined by ensuring the total probability equals 1
773
 
774
  This foundation will serve you well as we explore specific continuous distributions like normal, exponential, and beta in future notebooks. These distributions are the workhorses of probability theory and statistics, appearing everywhere from quality control to financial modeling.
775
 
@@ -779,7 +779,7 @@ def _(mo):
779
  return
780
 
781
 
782
- @app.cell
783
  def _(mo):
784
  mo.md(r"""Appendix code (helper functions, variables, etc.):""")
785
  return
@@ -971,7 +971,6 @@ def _(np, plt, sympy):
971
  1. Total probability: ∫₀² {C}(4x - 2x²) dx = {total_prob}
972
  2. P(X > 1): ∫₁² {C}(4x - 2x²) dx = {prob_gt_1}
973
  """
974
-
975
  return create_example_pdf_visualization, symbolic_calculation
976
 
977
 
 
14
 
15
  import marimo
16
 
17
+ __generated_with = "0.12.6"
18
  app = marimo.App(width="medium")
19
 
20
 
 
26
 
27
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/continuous/), by Stanford professor Chris Piech._
28
 
29
+ Continuous distributions are what we need when dealing with random variables that can take any value in a range, rather than just discrete values.
30
+
31
+ The key difference here is that we work with probability density functions (PDFs) instead of probability mass functions (PMFs). It took me a while to really get this - the PDF at a point isn't actually a probability, but rather a density.
32
  """
33
  )
34
  return
 
40
  r"""
41
  ## From Discrete to Continuous
42
 
43
+ Making the jump from discrete to continuous random variables requires a fundamental shift in thinking. Let me walk you through a thought experiment:
 
 
44
 
45
+ > You're rushing to catch a bus. You know you'll arrive at 2:15pm, but the bus arrival time is uncertain. If you model the bus arrival time (in minutes past 2pm) as a random variable $T$, how would you calculate the probability of waiting less than five minutes: $P(15 < T < 20)$?
46
 
47
+ This highlights a crucial difference from discrete distributions. With discrete distributions, we calculated probabilities for exact values, but this approach breaks down with continuous values like time.
48
 
49
+ Consider these questions:
50
  - What's the probability the bus arrives at exactly 2:17pm and 12.12333911102389234 seconds?
51
+ - What's the probability a newborn weighs exactly 3.523112342234 kilograms?
52
 
53
+ These questions have no meaningful answers because continuous measurements can have infinite precision. In the continuous world, the probability of a random variable taking any specific exact value is actually zero!
 
 
54
 
55
  Let's visualize this transition from discrete to continuous:
56
  """
 
149
  r"""
150
  ## Probability Density Functions
151
 
152
+ While discrete random variables use Probability Mass Functions (PMFs), continuous random variables require a different approach: Probability Density Functions (PDFs).
153
 
154
+ A PDF defines the relative likelihood of a continuous random variable taking particular values. We typically denote this with $f$ and write it as:
155
 
156
  $$f(X=x) \quad \text{or simply} \quad f(x)$$
157
 
158
+ Where the lowercase $x$ represents a specific value our random variable $X$ might take.
159
 
160
  ### Key Properties of PDFs
161
 
162
+ For a PDF $f(x)$ to be valid, it must satisfy these properties:
163
 
164
+ 1. The probability that $X$ falls within interval $[a, b]$ is:
165
 
166
  $$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$
167
 
168
+ 2. Non-negativity — the PDF can't be negative:
169
 
170
  $$f(x) \geq 0 \text{ for all } x$$
171
 
172
+ 3. Total probability equals 1:
173
 
174
  $$\int_{-\infty}^{\infty} f(x) \, dx = 1$$
175
 
176
+ 4. The probability of any exact value is zero:
177
 
178
  $$P(X = a) = \int_a^a f(x) \, dx = 0$$
179
 
180
+ This last property reveals a fundamental difference from discrete distributions: with continuous random variables, probabilities only make sense for ranges, not specific points.
 
 
181
 
182
+ ### Important Distinction: Density ≠ Probability
183
 
184
+ One common mistake is interpreting $f(x)$ as a probability. It's actually a **density** — representing probability per unit of $x$. This is why $f(x)$ values can exceed 1, provided the total area under the curve equals 1.
185
 
186
+ The true meaning of $f(x)$ emerges only when:
187
+ 1. We integrate over a range to obtain an actual probability, or
188
+ 2. We compare densities at different points to understand relative likelihoods.
189
  """
190
  )
191
  return
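These properties can also be checked numerically. The sketch below (using scipy's quadrature) does so for the example density $f(x) = \frac{3}{8}(4x - 2x^2)$ on $(0, 2)$ that appears later in this notebook:

```python
# Minimal sketch: numerically verify the PDF properties for
# f(x) = (3/8)(4x - 2x^2) on (0, 2), and 0 elsewhere.
from scipy import integrate

def f(x):
    return 3 / 8 * (4 * x - 2 * x**2) if 0 < x < 2 else 0.0

total, _ = integrate.quad(f, 0, 2)        # total probability (f is 0 outside (0, 2))
prob_1_to_2, _ = integrate.quad(f, 1, 2)  # P(1 <= X <= 2)
point_mass, _ = integrate.quad(f, 1, 1)   # P(X = 1), a single exact value

print(round(total, 4))        # 1.0 -> property 3
print(round(prob_1_to_2, 4))  # 0.5 -> property 1 gives an actual probability
print(point_mass)             # 0.0 -> property 4: exact values have probability 0
```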
 
663
  # Detailed calculations for our example
664
  _calculations = mo.md(
665
  f"""
666
+ ### Computing Expectation and Variance
667
 
668
+ > _Note:_ The following mathematical derivation is included as reference material. The credit for this approach belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/continuous/) by Chris Piech.
669
+
670
+ Let's work through the calculations for our PDF:
671
 
672
  $$f(x) = \\begin{{cases}}
673
  \\frac{{3}}{{8}}(4x - 2x^2) & \\text{{when }} 0 < x < 2 \\\\
674
  0 & \\text{{otherwise}}
675
  \\end{{cases}}$$
676
 
677
+ #### Finding the Expectation
678
 
679
  $$E[X] = \\int_{{-\\infty}}^{{\\infty}} x \\cdot f(x) \\, dx = \\int_0^2 x \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
680
 
 
684
 
685
  $$E[X] = \\frac{{3}}{{8}} \\cdot \\left(\\frac{{32}}{{3}} - 8\\right) = \\frac{{3}}{{8}} \\cdot \\frac{{8}}{{3}} = {E_X}$$
686
 
687
+ #### Computing the Variance
688
 
689
+ We first need $E[X^2]$:
690
 
691
  $$E[X^2] = \\int_{{-\\infty}}^{{\\infty}} x^2 \\cdot f(x) \\, dx = \\int_0^2 x^2 \\cdot \\frac{{3}}{{8}}(4x - 2x^2) \\, dx$$
692
 
 
696
 
697
  $$E[X^2] = \\frac{{3}}{{8}} \\cdot \\left(16 - \\frac{{64}}{{5}}\\right) = \\frac{{3}}{{8}} \\cdot \\frac{{16}}{{5}} = {E_X2}$$
698
 
699
+ Now we calculate variance using the formula $Var(X) = E[X^2] - (E[X])^2$:
700
 
701
  $$\\text{{Var}}(X) = E[X^2] - (E[X])^2 = {E_X2} - ({E_X})^2 = {Var_X}$$
702
 
703
+ This gives us a standard deviation of $\\sqrt{{\\text{{Var}}(X)}} = {Std_X}$.
704
  """
705
  )
706
  mo.vstack([_img, _calculations])
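As a quick symbolic cross-check of the numbers above (assuming sympy, which this notebook already imports), a minimal sketch:

```python
# Minimal sketch: compute E[X], E[X^2], and Var(X) symbolically for
# f(x) = (3/8)(4x - 2x^2) on (0, 2).
import sympy

x = sympy.symbols("x")
f = sympy.Rational(3, 8) * (4 * x - 2 * x**2)

E_X = sympy.integrate(x * f, (x, 0, 2))       # 1
E_X2 = sympy.integrate(x**2 * f, (x, 0, 2))   # 6/5
Var_X = E_X2 - E_X**2                         # 1/5

print(E_X, E_X2, Var_X)
```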
 
765
 
766
  Some key points to remember:
767
 
768
+ - PDFs give us relative likelihood, not actual probabilities - that's why they can exceed 1
769
+ - The probability between two points is the area under the PDF curve
770
+ - CDFs offer a convenient shortcut to find probabilities without integrating
771
+ - Expectation and variance work similarly to discrete variables, just with integrals instead of sums
772
+ - Constants in PDFs are determined by ensuring the total probability equals 1
773
 
774
  This foundation will serve you well as we explore specific continuous distributions like normal, exponential, and beta in future notebooks. These distributions are the workhorses of probability theory and statistics, appearing everywhere from quality control to financial modeling.
775
 
 
779
  return
780
 
781
 
782
+ @app.cell(hide_code=True)
783
  def _(mo):
784
  mo.md(r"""Appendix code (helper functions, variables, etc.):""")
785
  return
 
971
  1. Total probability: ∫₀² {C}(4x - 2x²) dx = {total_prob}
972
  2. P(X > 1): ∫₁² {C}(4x - 2x²) dx = {prob_gt_1}
973
  """
 
974
  return create_example_pdf_visualization, symbolic_calculation
975
 
976
 
probability/18_central_limit_theorem.py CHANGED
@@ -6,12 +6,13 @@
6
  # "scipy==1.15.2",
7
  # "numpy==2.2.4",
8
  # "plotly==5.18.0",
 
9
  # ]
10
  # ///
11
 
12
  import marimo
13
 
14
- __generated_with = "0.11.30"
15
  app = marimo.App(width="medium", app_title="Central Limit Theorem")
16
 
17
 
@@ -23,7 +24,20 @@ def _(mo):
23
 
24
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part4/clt/), by Stanford professor Chris Piech._
25
 
26
- The Central Limit Theorem (CLT) is one of the most important concepts in probability theory and statistics. It explains why many real-world distributions tend to be normal, even when the underlying processes are not.
 
 
 
 
 
 
 
 
 
 
 
 
 
27
  """
28
  )
29
  return
@@ -41,7 +55,7 @@ def _(mo):
41
 
42
  Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The sum of these random variables approaches a normal distribution as $n \rightarrow \infty$:
43
 
44
- $n∑i=1Xi∼N(n⋅μ,n⋅σ2)\sum_{i=1}^{n}X_i \sim \mathcal{N}(n \cdot \mu, n \cdot \sigma^2)$
45
 
46
  Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$. Since each $X_i$ is identically distributed, they share the same expectation and variance.
47
 
@@ -49,7 +63,7 @@ def _(mo):
49
 
50
  Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The average of these random variables approaches a normal distribution as $n \rightarrow \infty$:
51
 
52
- $\frac{1}{n} ∑i=1Xi∼N(μ,σ2n)\frac{1}{n}\sum_{i=1}^{n}X_i \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$
53
 
54
  Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$.
55
 
@@ -311,41 +325,37 @@ def _(mo):
311
  r"""
312
  ### Example 1: Dice Game
313
 
314
- You will roll a 6-sided dice 10 times. Let $X$ be the total value of all 10 dice: $X = X_1 + X_2 + \dots + X_{10}$. You win the game if $X \leq 25$ or $X \geq 45$. Use the central limit theorem to calculate the probability that you win.
315
 
316
- Recall that for a single die roll $X_i$:
317
 
 
318
  - $E[X_i] = 3.5$
319
  - $\text{Var}(X_i) = \frac{35}{12}$
320
 
321
- **Solution:**
322
-
323
- Let $Y$ be the approximating normal distribution. By the Central Limit Theorem:
324
-
325
- $Y∼N(10⋅E[Xi],10⋅Var(Xi))Y \sim \mathcal{N}(10 \cdot E[X_i], 10 \cdot \text{Var}(X_i))$
326
 
327
- Substituting in the known values:
328
 
329
- $Y∼N(10⋅3.5,10⋅3512)=N(35,29.2)Y \sim \mathcal{N}(10 \cdot 3.5, 10 \cdot \frac{35}{12}) = \mathcal{N}(35, 29.2)$
330
 
331
- Now we calculate the probability:
332
 
333
- $P(X25 or X45)P(X \leq 25 \text{ or } X \geq 45)$
334
 
335
- $=P(X≤25)+P(X≥45)= P(X \leq 25) + P(X \geq 45)$
336
 
337
- $≈P(Y<25.5)+P(Y>44.5) (Continuity Correction)\approx P(Y < 25.5) + P(Y > 44.5) \text{ (Continuity Correction)}$
338
 
339
- $≈P(Y<25.5)+[1−P(Y<44.5)]\approx P(Y < 25.5) + [1 - P(Y < 44.5)]$
340
 
341
- $≈Φ(25.5−35√29.2)+[1−Φ(44.5−35√29.2)]\approx \Phi\left(\frac{25.5 - 35}{\sqrt{29.2}}\right) + \left[1 - \Phi\left(\frac{44.5 - 35}{\sqrt{29.2}}\right)\right]$
342
 
343
- $≈Φ(−1.76)+[1−Φ(1.76)]\approx \Phi(-1.76) + [1 - \Phi(1.76)]$
344
 
345
- $≈0.039+(1−0.961)\approx 0.039 + (1 - 0.961)$
346
 
347
- $≈0.078\approx 0.078$
348
- So, the probability of winning the game is approximately 7.8%.
349
  """
350
  )
351
  return
@@ -359,17 +369,17 @@ def _(create_dice_game_visualization, fig_to_image, mo):
359
 
360
  dice_explanation = mo.md(
361
  r"""
362
- **Visualization Explanation:**
363
 
364
- The graph shows the distribution of the sum of 10 dice rolls. The blue bars represent the actual probability mass function (PMF), while the red curve shows the normal approximation using the Central Limit Theorem.
365
 
366
- The winning regions are shaded in orange:
367
  - The left region where $X \leq 25$
368
  - The right region where $X \geq 45$
369
 
370
- The total probability of these regions is approximately 0.078 or 7.8%.
371
 
372
- Notice how the normal approximation provides a good fit to the discrete distribution, demonstrating the power of the Central Limit Theorem.
373
  """
374
  )
375
 
@@ -383,50 +393,45 @@ def _(mo):
383
  r"""
384
  ### Example 2: Algorithm Runtime Estimation
385
 
386
- Say you have a new algorithm and you want to test its running time. You know the variance of the algorithm's run time is $\sigma^2 = 4 \text{ sec}^2$, but you want to estimate the mean run time $t$ in seconds.
 
 
387
 
388
- You can run the algorithm repeatedly (IID trials). How many trials do you have to run so that your estimated runtime is within ±0.5 seconds of $t$ with 95% certainty?
389
 
390
- Let $X_i$ be the run time of the $i$-th run (for $1 \leq i \leq n$).
391
 
392
  **Solution:**
393
 
394
  We need to find $n$ such that:
395
 
396
- $0.95=P(−0.5≤∑ni=1Xin−t≤0.5)0.95 = P\left(-0.5 \leq \frac{\sum_{i=1}^n X_i}{n} - t \leq 0.5\right)$
397
-
398
- By the central limit theorem, the sample mean follows a normal distribution.
399
- We can standardize this to work with the standard normal:
400
-
401
- $Z=(∑ni=1Xi)−nμσ√nZ = \frac{\left(\sum_{i=1}^n X_i\right) - n\mu}{\sigma \sqrt{n}}$
402
 
403
- $=(∑ni=1Xi)−nt2√n= \frac{\left(\sum_{i=1}^n X_i\right) - nt}{2 \sqrt{n}}$
404
 
405
- Rewriting our probability inequality so that the central term is $Z$:
406
 
407
- $0.95=P(−0.5≤∑ni=1Xin−t≤0.5)0.95 = P\left(-0.5 \leq \frac{\sum_{i=1}^n X_i}{n} - t \leq 0.5\right)$
408
 
409
- $=P(0.5√n2≤Z≤0.5√n2)= P\left(\frac{-0.5 \sqrt{n}}{2} \leq Z \leq \frac{0.5 \sqrt{n}}{2}\right)$
410
 
411
- And now we find the value of $n$ that makes this equation hold:
412
 
413
- $0.95=Φ(√n4)−Φ(−√n4)0.95 = \Phi\left(\frac{\sqrt{n}}{4}\right) - \Phi\left(-\frac{\sqrt{n}}{4}\right)$
414
-
415
- $4=Φ(√n4)−(1−Φ(√n4))= \Phi\left(\frac{\sqrt{n}}{4}\right) - \left(1 - \Phi\left(\frac{\sqrt{n}}{4}\right)\right)$
416
-
417
- $=2Φ(√n4)−1= 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$
418
 
419
  Solving for $\Phi\left(\frac{\sqrt{n}}{4}\right)$:
420
 
421
- $0.975=Φ(√n4)0.975 = \Phi\left(\frac{\sqrt{n}}{4}\right)$
422
 
423
- $Φ−1(0.975)=√n4\Phi^{-1}(0.975) = \frac{\sqrt{n}}{4}$
424
 
425
- $1.96=√n41.96 = \frac{\sqrt{n}}{4}$
426
 
427
- $n=61.4n = 61.4$
428
 
429
- Therefore, we need to run the algorithm 62 times to estimate the mean runtime within ±0.5 seconds with 95% confidence.
 
 
430
  """
431
  )
432
  return
@@ -929,7 +934,6 @@ def _(mo):
929
  mo.vstack([distribution_type, sample_size, sim_count_slider]),
930
  run_explorer_button
931
  ], justify='space-around')
932
-
933
  return (
934
  controls,
935
  distribution_type,
 
6
  # "scipy==1.15.2",
7
  # "numpy==2.2.4",
8
  # "plotly==5.18.0",
9
+ # "wigglystuff==0.1.13",
10
  # ]
11
  # ///
12
 
13
  import marimo
14
 
15
+ __generated_with = "0.12.6"
16
  app = marimo.App(width="medium", app_title="Central Limit Theorem")
17
 
18
 
 
24
 
25
  _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part4/clt/), by Stanford professor Chris Piech._
26
 
27
+ The central limit theorem is honestly mind-blowing: no matter what distribution you start with, the sampling distribution of the mean approaches a normal distribution as the sample size increases.
28
+
29
+ Mathematically, if we have:
30
+
31
+ $X_1, X_2, \ldots, X_n$ as independent, identically distributed random variables with:
32
+
33
+ - Mean: $\mu$
34
+ - Variance: $\sigma^2 < \infty$
35
+
36
+ Then as $n \to \infty$:
37
+
38
+ $$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$$
39
+
40
+ > _Note:_ The above LaTeX derivation is included as a reference. Credit for this formulation goes to the original source linked at the top of the notebook.
41
  """
42
  )
43
  return
 
55
 
56
  Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The sum of these random variables approaches a normal distribution as $n \rightarrow \infty$:
57
 
58
+ $\sum_{i=1}^{n}X_i \sim \mathcal{N}(n \cdot \mu, n \cdot \sigma^2)$
59
 
60
  Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$. Since each $X_i$ is identically distributed, they share the same expectation and variance.
61
 
 
63
 
64
  Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables. The average of these random variables approaches a normal distribution as $n \rightarrow \infty$:
65
 
66
+ $\frac{1}{n}\sum_{i=1}^{n}X_i \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$
67
 
68
  Where $\mu = E[X_i]$ and $\sigma^2 = \text{Var}(X_i)$.
69
 
 
325
  r"""
326
  ### Example 1: Dice Game
327
 
328
+ > _Note:_ The following application demonstrates the practical use of the Central Limit Theorem. The mathematical derivation is based on concepts from ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part4/clt/) by Chris Piech.
329
 
330
+ Let's solve a fun probability problem: You roll a 6-sided die 10 times and let $X$ represent the total value of all 10 dice: $X = X_1 + X_2 + \dots + X_{10}$. You win if $X \leq 25$ or $X \geq 45$. What's your probability of winning?
331
 
332
+ For a single die roll $X_i$, we know:
333
  - $E[X_i] = 3.5$
334
  - $\text{Var}(X_i) = \frac{35}{12}$
335
 
336
+ **Solution Approach:**
 
 
 
 
337
 
338
+ This is where the Central Limit Theorem shines! Since we're summing 10 independent, identically distributed random variables, we can approximate this sum with a normal distribution $Y$:
339
 
340
+ $Y \sim \mathcal{N}(10 \cdot E[X_i], 10 \cdot \text{Var}(X_i)) = \mathcal{N}(35, 29.2)$
341
 
342
+ Now calculating our winning probability:
343
 
344
+ $P(X \leq 25 \text{ or } X \geq 45) = P(X \leq 25) + P(X \geq 45)$
345
 
346
+ Since we're approximating a discrete distribution with a continuous one, we apply a continuity correction:
347
 
348
+ $\approx P(Y < 25.5) + P(Y > 44.5) = P(Y < 25.5) + [1 - P(Y < 44.5)]$
349
 
350
+ Converting to standard normal form:
351
 
352
+ $\approx \Phi\left(\frac{25.5 - 35}{\sqrt{29.2}}\right) + \left[1 - \Phi\left(\frac{44.5 - 35}{\sqrt{29.2}}\right)\right]$
353
 
354
+ $\approx \Phi(-1.76) + [1 - \Phi(1.76)]$
355
 
356
+ $\approx 0.039 + (1 - 0.961) \approx 0.078$
357
 
358
+ So your chance of winning is about 7.8% — not great odds, but that's probability for you!
 
359
  """
360
  )
361
  return
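The same answer can be reproduced numerically; here is a minimal sketch using scipy's normal distribution:

```python
# Minimal sketch: normal approximation (with continuity correction) for the
# probability of winning the dice game above.
import math
from scipy import stats

mu = 10 * 3.5              # E[X] for the sum of 10 dice
var = 10 * 35 / 12         # Var(X) for the sum of 10 dice
y = stats.norm(mu, math.sqrt(var))

# P(X <= 25) ~ P(Y < 25.5) and P(X >= 45) ~ P(Y > 44.5)
p_win = y.cdf(25.5) + (1 - y.cdf(44.5))
print(round(p_win, 3))     # approximately 0.078
```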
 
369
 
370
  dice_explanation = mo.md(
371
  r"""
372
+ **Understanding the Visualization:**
373
 
374
+ This graph shows our dice game in action. The blue bars represent the exact probability distribution for summing 10 dice, while the red curve shows our normal approximation from the Central Limit Theorem.
375
 
376
+ I've highlighted the winning regions in orange:
377
  - The left region where $X \leq 25$
378
  - The right region where $X \geq 45$
379
 
380
+ Together these regions cover about 7.8% of the total probability.
381
 
382
+ What's fascinating here is how closely the normal curve approximates the actual discrete distribution: this is the Central Limit Theorem working its magic, even with just 10 random variables.
383
  """
384
  )
385
 
 
393
  r"""
394
  ### Example 2: Algorithm Runtime Estimation
395
 
396
+ > _Note:_ The following derivation demonstrates the practical application of the Central Limit Theorem for experimental design. The mathematical approach is based on concepts from ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part4/clt/) by Chris Piech.
397
+
398
+ Here's a practical problem I encounter in performance testing: You've developed a new algorithm and want to measure its average runtime. You know the variance is $\sigma^2 = 4 \text{ sec}^2$, but need to estimate the true mean runtime $t$.
399
 
400
+ The question: How many test runs do you need to be 95% confident your estimated mean is within ±0.5 seconds of the true value?
401
 
402
+ Let $X_i$ represent the runtime of the $i$-th test (for $1 \leq i \leq n$).
403
 
404
  **Solution:**
405
 
406
  We need to find $n$ such that:
407
 
408
+ $0.95 = P\left(-0.5 \leq \frac{\sum_{i=1}^n X_i}{n} - t \leq 0.5\right)$
 
 
 
 
 
409
 
410
+ The Central Limit Theorem tells us that as $n$ increases, the sample mean approaches a normal distribution. Let's standardize this to work with the standard normal distribution:
411
 
412
+ $Z = \frac{\left(\sum_{i=1}^n X_i\right) - n\mu}{\sigma \sqrt{n}} = \frac{\left(\sum_{i=1}^n X_i\right) - nt}{2 \sqrt{n}}$
413
 
414
+ Rewriting our probability constraint in terms of $Z$:
415
 
416
+ $0.95 = P\left(-0.5 \leq \frac{\sum_{i=1}^n X_i}{n} - t \leq 0.5\right) = P\left(\frac{-0.5 \sqrt{n}}{2} \leq Z \leq \frac{0.5 \sqrt{n}}{2}\right)$
417
 
418
+ Using the properties of the standard normal CDF:
419
 
420
+ $0.95 = \Phi\left(\frac{\sqrt{n}}{4}\right) - \Phi\left(-\frac{\sqrt{n}}{4}\right) = 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$
 
 
 
 
421
 
422
  Solving for $\Phi\left(\frac{\sqrt{n}}{4}\right)$:
423
 
424
+ $0.975 = \Phi\left(\frac{\sqrt{n}}{4}\right)$
425
 
426
+ Using the inverse CDF:
427
 
428
+ $\Phi^{-1}(0.975) = \frac{\sqrt{n}}{4}$
429
 
430
+ $1.96 = \frac{\sqrt{n}}{4}$
431
 
432
+ $n = 61.4$
433
+
434
+ Rounding up, we need 62 test runs to achieve our desired confidence interval — a practical result we can immediately apply to our testing protocol.
435
  """
436
  )
437
  return
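The last few steps can also be done numerically; a minimal sketch using the inverse normal CDF from scipy:

```python
# Minimal sketch: solve for the number of runs n needed for a 95% confidence
# interval of half-width 0.5 seconds when the runtime variance is 4 sec^2.
import math
from scipy import stats

sigma = 2.0          # standard deviation, sqrt(4 sec^2)
half_width = 0.5     # desired +/- precision in seconds
confidence = 0.95

z = stats.norm.ppf(1 - (1 - confidence) / 2)   # about 1.96
n = (z * sigma / half_width) ** 2
print(round(n, 1), math.ceil(n))               # roughly 61.5, so 62 runs
```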
 
934
  mo.vstack([distribution_type, sample_size, sim_count_slider]),
935
  run_explorer_button
936
  ], justify='space-around')
 
937
  return (
938
  controls,
939
  distribution_type,
probability/19_maximum_likelihood_estimation.py CHANGED
@@ -133,6 +133,8 @@ def _(mo):
133
  r"""
134
  ## MLE for Bernoulli Distribution
135
 
 
 
136
  Let's start with a simple example: estimating the parameter $p$ of a Bernoulli distribution.
137
 
138
  ### The Model
 
133
  r"""
134
  ## MLE for Bernoulli Distribution
135
 
136
+ > _Note:_ The following derivation is included as reference material. The credit for this mathematical formulation belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part5/mle/) by Chris Piech.
137
+
138
  Let's start with a simple example: estimating the parameter $p$ of a Bernoulli distribution.
139
 
140
  ### The Model