3blue1brown from patreon

Update

Hey everyone,

I wanted to share a bit about what videos have been brewing and what to expect in the hopefully-not-too-distant future.

The next video out will discuss how to reason about the probabilities surrounding medical tests.  At the base, I think it would be helpful to simply visualize what all the relevant jargon means, terms like sensitivity, specificity, PPV, etc.  But beyond that, it seems worth clarifying what exactly these tests tell us, what they don't, and to address some of the misconceptions and fallacies around them.

At the heart, it's another video about Bayes' rule, which has been covered on the channel before.  But with such a concrete and topical example this gives a good opportunity to talk about this rule not just as a formula, but as a way to think, and to hopefully have it sink in more viscerally than it might for more abstract examples.  Also, even for those who are very comfortable with Bayes' rule, there remain a number of fallacies and misconceptions around applying it to medical tests.

Of course, the risk with such a topical example that matters a lot in people's lives is that it's terrifying to say anything concrete or put up any hard numbers.  There's so much uncertainty around these numbers, such a wide variety of constantly changing tests looking for different things, and so much nuance in the underlying reasons for false negatives and false positives that plugging in hard numbers runs the risk of being downright irresponsible without appropriate caveats.  On the other hand, it would be deeply unsatisfying to completely throw up our arms and say nothing about how the central lessons of the video apply to current tests.

I've talked with a few doctors and an epidemiologist about this topic, and what's felt substantive about these conversations is not so much a reduction in the error bars around, say, the sensitivity of a given RT-PCR test, but instead some added clarity on why things are so inherently uncertain.  What I feel most comfortable talking about is how all the relevant uncertainties relate to each other, and how certain assumptions get us to certain conclusions, so long as it's crystal clear what's being assumed.  Anyway, I'll save that all for the video itself.

--

Before I turned to this I had been working for longer than is reasonable on the third chapter of the "probabilities of probabilities" sequence.  As it happens, this also involves Bayes' rule, but in the continuous setting of updating a probability density function rather than simple yes/no hypotheses (e.g. whether or not you have a given disease).  More specifically, it aims to explain the beta distribution.  I can't put my finger on why exactly that one has proven so difficult to write, especially given that in the end, I hope for it to come off as a relatively simple concept.
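For a sense of what continuous updating looks like, here is a minimal sketch that relies on the standard fact that a Beta(a, b) prior combined with binomial data (s successes, f failures) gives a Beta(a + s, b + f) posterior. The function names and the numbers are illustrative assumptions, not necessarily how the video will present it.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumption: a uniform prior, Beta(1, 1), over the unknown success rate.
prior = (1.0, 1.0)

# Hypothetical data: 48 successes and 2 failures.
posterior = beta_update(*prior, successes=48, failures=2)
posterior_mean = beta_mean(*posterior)

print(posterior)                # (49.0, 3.0)
print(f"{posterior_mean:.3f}")  # 0.942
```

The update only adds counts to the parameters, which is part of why the concept should feel simple in the end.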

--

One other probability-related topic on the burner is to explain why pi shows up in the formula for a Gaussian distribution.  There is a standard integral trick for showing why the area under the curve exp(-x^2) is the square root of pi, which is typically what's given as an explanation of pi's presence in a Gaussian.  But the aim of this video would be to get at something deeper.  The relevant integral trick only works because of certain properties that this function exp(-x^2) has with respect to a kind of rotational symmetry, so a satisfying answer here would involve explaining why we should expect that the function which comes up so much in statistics, i.e. the one which will result from the central limit theorem, happens to have this relevant property related to rotational symmetry.
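For reference, the standard integral trick mentioned above can be sketched as follows. The symbol $I$ is introduced here just for the derivation; the step that matters is the second equality, where the product of the two one-dimensional functions depends only on the radius $r$.

```latex
\begin{aligned}
I &= \int_{-\infty}^{\infty} e^{-x^2}\,dx \\
I^2 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x^2} e^{-y^2}\,dx\,dy
     = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy \\
    &= \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2}\, r\,dr\,d\theta
     = 2\pi \cdot \tfrac{1}{2} = \pi \\
I &= \sqrt{\pi}
\end{aligned}
```

The factor of $2\pi$ comes from integrating over the angle, and that is only possible because $e^{-x^2}e^{-y^2}$ is rotationally symmetric.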

Feel free to share thoughts you have about any of these topics, I'm all ears!

-Grant

Comments

Never mind the probabilities of probabilities got cut off in the list and I missed it. Oops

These all sound pretty great. I have been looking forward to the beta distribution one. Was a finalized version of part 2 ever posted to YouTube?

My prof just went over stochastic processes and steady-state probabilities used to describe the long-run behavior of a Markov chain for my Decision Analytics class. He showed the transition matrix for a steady-state ergodic chain ultimately reached a limit as n approaches infinity where all the elements in the matrix are pi !!!!! ... Or actually, they were asymptotically approaching pi, since it was a limit. I don't know how you could work something like that into the video, but maybe it would be a good example of places where the Gaussian distribution becomes pi. I don't know enough to say if that's actually an example of how you meant "pi's presence in a Gaussian". If so, I'd love to know!
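A note on notation that may help here: in standard Markov chain notation, π usually denotes the stationary distribution, a vector of probabilities, rather than the constant 3.14159... The sketch below uses a hypothetical two-state transition matrix (not one from the class described) to show the rows of P^n approaching that stationary vector.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(p, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Hypothetical transition matrix for a two-state ergodic chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]

P_50 = mat_pow(P, 50)
for row in P_50:
    print([round(x, 6) for x in row])
```

For this matrix the stationary distribution solves π = πP, which gives π = (5/6, 1/6). Every row of P^n converges to those two probabilities.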

Sharing statistical concepts that can help the public understand how misleading certain discoveries appear sometimes is important. I like this 2014 article on computational Biology: Colquhoun D. 2014 An investigation of the false discovery rate and the misinterpretation of p-values. R. Soc. open sci. 1: 140216. http://dx.doi.org/10.1098/rsos.140216

I would like you to show "best practices" in presenting statistical information to the general public. Many studies show even the medical profession is confused when statistics are presented in the way statisticians use with each other. For example, see: https://www.bbc.com/news/magazine-28166019

William Walters

Can you please elaborate how false positive (Type I error) and false negative (Type II error) can be equal?

A practical example from the news recently: - A certain (cancer) test is 90% accurate (10% false negatives / 10% false positives). - Historically, 0.4% of people who get the test truly have that type of cancer. - You get the test, and it comes back positive. Q: What's the likelihood that you actually have that cancer? A: 3.5% !! (Nowhere near the 90% that even the doctors might guess.)
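The 3.5% figure can be checked with a few lines applying Bayes' rule. This is a minimal sketch assuming that "90% accurate" means both sensitivity and specificity are 0.90, as the comment states; the function name is illustrative.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


# Numbers from the example: 0.4% prevalence, 10% false negatives,
# 10% false positives.
ppv = posterior_given_positive(prior=0.004, sensitivity=0.90, specificity=0.90)
print(f"{ppv:.1%}")  # 3.5%
```

With so few true cases, the 10% of healthy people who test positive vastly outnumber the 90% of sick people who do.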

I think explaining how the various formulas for probability distributions were derived is incredibly important. One of the things that took me from "statistics/probability is this weird thing that I don't understand" to "oh, this makes so much sense" was the realization that the formula for a Poisson distribution was derived from a Normal distribution, and why. It might sound simple, but before that I always felt that probability/stats was some kind of magical thing with a bunch of random formulas, this gave me a sense of the *reasoning* behind it.

Edan Maor

I think this video will be VERY important for premed and medical students. As an oncologist myself, I frequently remind my residents and fellows that the majority (if not all really) of the medical decision making (MDM) is essentially an approximation of Bayes. I almost always reference your prior video as a great intro to the topic. For example, I constantly update patient risk of cancer recurrence, the potential toxicity of a regimen, and the likely benefit of a new therapy based on published data and individual performance status and comorbidities. I would go a step further and maybe hope you (briefly) mention the Bayesian methods of human inference. Many are aware of the inductive nature of human perception, classification, and day-to-day decision making, but very few will actually relay that to the Bayesian mathematical inference of such process, where beautiful models such as the free energy principle exist. Some nice reviews include Friston. Nat Rev Neurosci 2010, Gardner Nat Rev Neurosci 2019, Yon et al Nat comm 2020.

Yes, that's basically the idea. But I think you need the scaling exponent to be 1/2 in order for the argument to give you rotational invariance. A clarification: "Stable" actually does have a very specific meaning (https://en.wikipedia.org/wiki/Stable_distribution). Looking at your comment made me realize my first question sounded like I was asking if we can give "stable" a precise meaning. I edited it to ask: can we make more precise the analogy between dynamical systems and distributions? It seems like the idea in both situations is you repeatedly do the same thing to a large set of objects, and the result approaches an object that is invariant under what you're doing. (It's broader than that; it's also the same thing that happens with ratios of successive terms of the Fibonacci sequence.) Part of the reason for this phenomenon is that if there's a limit at all, it has to be stable, but still, when and why should you expect to get any sort of limit?

Daniel and Rebekah Slonim

I think I've followed this thread, and I really, really like it. I'm going to write it out here another way to make sure I understand it and also give another choice of words / steps. First, take two 'stable' distributions X and Y, and draw out the plot of their joint probability f(x, y) = P(X=x)P(Y=y). Because X and Y are independent and X+Y is their sum, by the properties of the 'stable' distribution, the distribution of X+Y should be the same 'shape' (roughly speaking) as X and Y. Geometrically, this means that if you integrate in the joint probability space along the lines defined by x+y = c (for some constant c), you get the same 'stable' distribution. Visually, I think of it as 'looking' along the diagonal produces the same shape as 'looking' at this lump in either of the horizontal directions (x or y). But by the same argument, so should 2X+Y also be this 'stable' distribution. And that visually means that looking mostly along X but slightly along Y, you should see a very similar shape. But what's special about 2 and 1? How about 5X + 4Y, or 1000X+1Y, etc? So of course a distribution 'stable' in this way will be rotationally symmetric. There is of course a lot of fuzziness in defining 'stable' and what 'same shape' means, but perhaps that can be addressed with some algebra.

Really looking forward to the pi the Gaussian video!!

Looking forward to the way you visualize rotational invariance of the normal distribution. A nice visual intuition I have for this topic comes from thinking about 2-dimensional random walks. There it is intuitively clear that the distribution for a random walk should be roughly circular, only depending on Euclidean distance and not on fine-grained details of the lattice. The 2D random walk can be described by a pair of IID binomials. It would be nice to visualize the probability distribution and see how it becomes more and more rotationally invariant as the distributions converge to normals.
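To put a number on that intuition, here is a small sketch. It assumes a walk in which each step moves ±1 independently in both coordinates, so the two coordinates are independent binomial-type sums; that is one way to read "a pair of IID binomials". It then compares the exact probability of two lattice points at the same Euclidean distance from the origin, and the function names are illustrative.

```python
from math import comb


def coord_pmf(n, x):
    """P(a sum of n independent +/-1 steps equals x)."""
    if (n + x) % 2 or abs(x) > n:
        return 0.0
    return comb(n, (n + x) // 2) / 2 ** n


def joint_pmf(n, x, y):
    """P(walk is at (x, y) after n steps), coordinates independent."""
    return coord_pmf(n, x) * coord_pmf(n, y)


def symmetry_ratio(n):
    """Ratio of probabilities at (10, 0) and (6, 8), both at distance 10."""
    return joint_pmf(n, 10, 0) / joint_pmf(n, 6, 8)


ratio_short = symmetry_ratio(20)   # noticeably below 1
ratio_long = symmetry_ratio(200)   # very close to 1
print(ratio_short, ratio_long)
```

A perfectly rotationally symmetric distribution would give a ratio of exactly 1; the ratio drifts toward 1 as the number of steps grows, which is the convergence the comment describes.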

Eric Severson

Nice! I'm a young intern in medicine and I'm really looking forward to your take on these tests. Thanks for tackling this topic. Even though we study it for many years, I still don't feel like I've grasped the essence of those tests!

4. All of this ultimately has to do with the Pythagorean Theorem. Can we pinpoint exactly where that theorem fits into the above analysis? If the Pythagorean Theorem were different, how would the Central Limit Theorem be different? (Or is that even a meaningful question?) Somehow, it seems like Pythagoras is at the heart of the CLT. I think that's really amazing. The CLT is one of the most monumental and profound results in all of probability, with an extremely wide range of applications in the real world, and somehow it all boils down to the oldest, most famous, simplest, most surprising, and most elegant mathematical theorem of all time.

Daniel and Rebekah Slonim

3. Where do other stable distributions fit into this picture? There's a stable law for any rescaling exponent $1/a$ if $a<2$, but why is 2 the cutoff?

Daniel and Rebekah Slonim

Here are some things I'd still like to sort out, if I can: 1. Can we make the analogy between stability of distributions and stability of dynamical systems more precise? 2. Is there some way to turn the above discussion into a rigorous proof of the Central Limit Theorem, with the density of the normal distribution as a corollary? If so, it might involve something like defining Brownian motion as a limit of scaled sums of iid Rademacher random variables, and then using an embedding theorem like the one in Chapter 8 of Durrett (https://services.math.duke.edu/~rtd/PTE/PTE5_011119.pdf).

Daniel and Rebekah Slonim

I'm still trying to think through all of these things. This is all a very rough explanation, as I'm still in some sense at the beginning of my thinking about this stuff. I'd like to sit down at some point and figure it out more thoroughly, and hopefully turn it into a talk or video of my own, but for now, since you're thinking about similar questions, I wanted to share what I'm thinking. 

Daniel and Rebekah Slonim

The stability is very connected to the rotational invariance. I think it also has to do with the fact that the scaling exponent is 1/2. For now, define "normal" to mean "invariant over sums of iid random variables with the right scaling", and assume we don't know anything else about the normal distribution. If $X$, $Y$, and $Z$ are all iid normal with standard deviation 1, then $aX$ and $bY$ are normal with standard deviation $a$ and $b$, respectively. For now assume $a$ and $b$ are rationals; we can think of $aX$ and $bY$ both as sums of a ton of iid normal random variables with really tiny standard deviation. By our invariance assumption  $aX+bY$ should be normal with standard deviation $\sqrt{a^2+b^2}$. Now think about $X$ and $Y$ on a cartesian plane, and think about the lines of form $aX+bY

Daniel and Rebekah Slonim

So I think the reason the normal distribution is the one that has the CLT is that it's invariant under the appropriate addition and rescaling. Very loosely, just like dynamical systems tend to asymptotically approach stable states, when you repeatedly perform some sort of addition and rescaling operation, you end up with something stable.

Daniel and Rebekah Slonim

Can't wait for beta distribution video (and secretly hope to plant the seed for gamma, exponential (memorylessness!), and possibly Weibull?!). A boy can dream. :)

Glad to see you around again, I was just starting to wonder when I'd last seen your updates, and worry whether you were ok. Good to see you're fine and just working a bit longer :)

Boudewijn Redeker

which reminded me of "measure with micrometer; mark with chalk ; cut with axe" from ye old murphy's law book, methinks...

It would help if you could mention statistical significance and the importance of proper sampling. In particular, The Lancet published the Russian Sputnik vaccine study, a "successful test" with a population of 38+38 (control) within the Russian career military, who are not representative of general population health, nor is the sample size meaningful for anything. See https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30709-X/fulltext and within it the reference to https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31866-3/fulltext which is pure BS from a stats point of view....

I've been curious what it would mean if different vaccines have different effectiveness. What if we have a situation with multiple vaccines available, but some are more expensive and require refrigeration and multiple doses, versus a cheap single-dose shelf stable option that's less effective. Where should we invest? Is there a world where we buy both for different populations?

Adam Berkan

I've been waiting for "Probability of probabilities" for a while now, and it's a good thing that it's announced! If I learnt one thing from your videos about pi^2/6 and colliding blocks, it's that pi implies a hidden circle. I'm looking forward to discovering where it's located... can't wait!!! Thanks for the news and for your very high-quality work.

Oltarus

Wow... this is almost exactly what I prayed for after I wrote my blog post inspired by your previous videos. I just want more clarity around what I experienced so that I can trust doctors and their findings. In my personal experience, the medical profession relies heavily on past publications they have read to determine treatment rather than using Bayes' theorem directly. I really hope this video can help technical professionals develop better diagnostic tools for doctors. I feel that, given the right tools, the medical profession can better diagnose individuals on a case-by-case basis. This type of diagnosis would likely be more effective than trying to classify individuals into a group for which there is already a prescribed method of treatment. If the right tools had been available to my doctors, I have no reason to doubt that I would still have a working thyroid today.

Joshua Davis

If you haven't already come across https://www.rapidtests.org/ I highly recommend taking a look in case it leads you down an interesting path you hadn't explored yet.

Both videos sound great! I am particularly excited to watch the one about pi and the Gaussian distribution. Just curious, are you planning or currently working on another essentials of series?

I am so excited to see the "probabilities of probabilities" series grow. Thinking in probabilities and not just binary/discrete outcomes, I think, is so important yet is not very widespread. You always do such a phenomenal job getting at the intuition of the subject, I bet I'll be forwarding a lot of your videos to my colleagues here!

Zachariah Rosenberg

No pressure, but definitely don't shy away from grounding in the real world. I've shared your earlier Covid modeling and exponential videos widely and they have really helped many non-mathematically literate people get a sense of what's going on. Of course, I'm also excited to see why pi shows up in the Gaussian distribution :)

Super excited to see pi in the Gaussian distribution, I've always wondered about that!

thanks dad

