go over explanations
probability/15_poisson_distribution.py
CHANGED
@@ -13,7 +13,7 @@
 
 import marimo
 
-__generated_with = "0.
+__generated_with = "0.12.6"
 app = marimo.App(width="medium", app_title="Poisson Distribution")
 
 
@@ -25,7 +25,9 @@ def _(mo):
 
         _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/), by Stanford professor Chris Piech._
 
-
+        The Poisson distribution is my go-to for modeling random events occurring over time or space. What makes it cool is that it only needs a single parameter λ (lambda), which represents both the mean and variance.
+
+        I find it particularly useful when events happen rarely but the opportunities for them to occur are numerous — like modeling website visits, radioactive particle emissions, or even typos in a document.
         """
     )
     return
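The claim that λ is both the mean and the variance is easy to verify empirically. Here is a minimal standalone sketch, assuming `numpy` is installed (this snippet is illustrative and not part of the notebook):

```python
import numpy as np

# Illustrative sketch, not from the notebook: sample a Poisson
# distribution with lambda = 5 and confirm mean ≈ variance ≈ lambda.
rng = np.random.default_rng(seed=42)
lam = 5
sample = rng.poisson(lam, size=100_000)

print(f"mean     ≈ {sample.mean():.3f}")  # close to 5
print(f"variance ≈ {sample.var():.3f}")   # also close to 5
```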
@@ -180,11 +182,11 @@ def _(mo):
         r"""
         ## Poisson Intuition: Relation to Binomial Distribution
 
-        The Poisson distribution can be derived as a limiting case of the
+        The Poisson distribution can be derived as a limiting case of the binomial distribution. I find this connection fascinating because it shows how seemingly different distributions are actually related.
 
-        Let's work
+        Let's work through a practical example: predicting ride-sharing requests in a specific area over a one-minute interval. From historical data, we know that the average number of requests per minute is $\lambda = 5$.
 
-        We could
+        We could model this using a binomial distribution by dividing our minute into smaller intervals. For example, splitting a minute into 60 seconds, where each second is a Bernoulli trial — either a request arrives (success) or it doesn't (failure).
 
         Let's visualize this concept:
         """
@@ -231,7 +233,7 @@ def _(fig_to_image, mo, plt):
     _explanation = mo.md(
         r"""
         In this visualization:
-
+
         - Each rectangle represents a 1-second interval
         - Blue rectangles indicate intervals where an event occurred
        - Red dots show the actual event times (2.75s and 7.12s)
@@ -247,9 +249,9 @@ def _(fig_to_image, mo, plt):
 def _(mo):
     mo.md(
         r"""
-        The total number of requests received over the minute can be approximated as the sum of
+        The total number of requests received over the minute can be approximated as the sum of sixty indicator variables, which aligns perfectly with the binomial distribution — a sum of Bernoullis.
 
-
+        If we define $X$ as the number of requests in a minute, $X$ follows a binomial with $n=60$ trials. To determine the success probability $p$, we need to match the expected value with our historical average $\lambda$:
 
         \begin{align}
         \lambda &= E[X] && \text{Expectation matches historical average} \\
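The expectation-matching step above can be checked with a short simulation: sum 60 Bernoulli(λ/n) indicators per window and confirm the average count is λ. A sketch assuming `numpy` (illustrative, not from the notebook):

```python
import numpy as np

# Illustrative sketch: one minute = 60 Bernoulli(p) trials with
# p = lambda / n, so E[X] = np = lambda by construction.
rng = np.random.default_rng(seed=0)
n, lam = 60, 5
p = lam / n

# 100,000 simulated minutes, each the sum of 60 indicator variables.
windows = rng.binomial(n=1, p=p, size=(100_000, n)).sum(axis=1)
print(f"empirical E[X] ≈ {windows.mean():.3f}")  # close to lambda = 5
```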
@@ -257,7 +259,7 @@ def _(mo):
         p &= \frac{\lambda}{n} && \text{Solving for $p$}
         \end{align}
 
-
+        With $\lambda=5$ and $n=60$, we get $p=\frac{5}{60}=\frac{1}{12}$, so $X \sim \text{Bin}(n=60, p=\frac{5}{60})$. Using the binomial PMF:
 
         $P(X = x) = {n \choose x} p^x (1-p)^{n-x}$
 
@@ -269,7 +271,7 @@ def _(mo):
         P(X=3) &= {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389
         \end{align}
 
-        This
+        This approximation works well, but it doesn't account for multiple events occurring in a single second. To address this limitation, we can use even finer intervals — perhaps 600 deciseconds (tenths of a second):
         """
     )
     return
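The $P(X=3) \approx 0.1389$ figure, and the effect of refining to 600 intervals, can be reproduced with the standard library alone. A quick check (the helper name here is mine, not the notebook's):

```python
from math import comb, exp, factorial

lam, x = 5, 3

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial PMF: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Poisson PMF at x = 3 for comparison: lambda^x * e^(-lambda) / x!
poisson = lam**x * exp(-lam) / factorial(x)

print(f"n=60:    {binom_pmf(x, 60, lam / 60):.4f}")    # 0.1389
print(f"n=600:   {binom_pmf(x, 600, lam / 600):.4f}")  # 0.1402
print(f"Poisson: {poisson:.4f}")                       # 0.1404
```

Finer intervals push the binomial answer toward the Poisson value, which is exactly the refinement this hunk describes.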
@@ -283,7 +285,7 @@ def _(fig_to_image, mo, plt):
 
     # Example events at 2.75s and 7.12s (convert to deciseconds)
     events = [27.5, 71.2]
-
+
     for i in range(100):
         color = 'royalblue' if any(i <= event_val < i + 1 for event_val in events) else 'lightgray'
         ax.add_patch(plt.Rectangle((i, 0), 0.9, 1, color=color))
@@ -434,21 +436,23 @@ def _(df, fig, fig_to_image, mo, n, p):
 def _(mo):
     mo.md(
         r"""
-        As
+        As our interactive comparison demonstrates, the binomial distribution converges to the Poisson distribution as we increase the number of intervals! This remarkable relationship exists because the Poisson distribution is actually the limiting case of the binomial when:
 
         - The number of trials $n$ approaches infinity
         - The probability of success $p$ approaches zero
         - The product $np = \lambda$ remains constant
 
-        This
+        This elegance is why I find the Poisson distribution so powerful — it simplifies what would otherwise be a cumbersome binomial with numerous trials and tiny success probabilities.
 
         ## Derivation of the Poisson PMF
 
-
+        > _Note:_ The following mathematical derivation is included as reference material. The credit for this formulation belongs to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/) by Chris Piech.
+
+        The Poisson PMF can be derived by taking the limit of the binomial PMF as $n \to \infty$:
 
         $P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}$
 
-
+        Through a series of algebraic manipulations:
 
         \begin{align}
         P(X=x)
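The limit in this derivation can also be watched numerically: hold $np = \lambda$ fixed while $n$ grows. A small standard-library sketch (illustrative only):

```python
from math import comb, exp, factorial

lam, x = 5, 3
poisson = lam**x * exp(-lam) / factorial(x)

# Hold np = lambda fixed while n grows: the binomial PMF at x
# approaches the Poisson PMF, as in the derivation above.
for n in (10, 100, 1_000, 10_000):
    p = lam / n
    b = comb(n, x) * p**x * (1 - p) ** (n - x)
    print(f"n={n:>6}: binomial={b:.6f}  gap={abs(b - poisson):.6f}")

print(f"Poisson limit: {poisson:.6f}")
```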
@@ -495,7 +499,7 @@ def _(mo):
         && \text{Simplifying}\\
         \end{align}
 
-        This gives us
+        This gives us the elegant Poisson PMF formula: $P(X=x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
         """
     )
     return
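The final formula drops straight into code. A minimal sketch using only the standard library (the function name is mine, not the notebook's):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lambda^x * e^(-lambda) / x!, the PMF derived above."""
    return lam**x * exp(-lam) / factorial(x)

# Ride-sharing example: P(X = 3) with lambda = 5.
print(f"{poisson_pmf(3, 5):.4f}")  # 0.1404

# Sanity check: probabilities over a wide range sum to ~1.
print(sum(poisson_pmf(k, 5) for k in range(60)))  # ≈ 1.0
```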