XXX4Fans
3blue1brown from patreon
3blue1brown

patreon


Any requests for what to include in a central limit theorem video?

Hey everyone,

What began as the follow-on to the convolutions video has become a video about the central limit theorem. In particular, I'd like to show how a certain perspective on convolutions makes it easier to "see" why the function e^(-x^2) is the one that shows up so prominently in statistics, as opposed to some other function that bulges in the middle and tapers out in the ends.

The explanation is loosely connected to a delightful, and somewhat classic, argument for why the area under this curve is the square root of π, which in turn explains why the formula for a normal distribution has a π in it. Right now, the plan is to pair the main CLT explainer together with a second video talking more about why that pi is there, and how a kind of circular symmetry more core to this distribution than it might initially seem.

Given that pi day is coming up, it's only fitting to target publishing the two of these around that time.

In the meantime, I'm putting together a draft of the first one to share early here on Patreon, and as I do so I thought I'd ask whether any of you have any requests. For those of you who have seen the central limit theorem before, was there anything you found confusing upon first learning it? Do you have any favorite examples that helped to illustrate what it is saying? Do you have any lingering questions about the theorem, normal distributions, etc.? 

For you're early viewing pleasure, here are a couple of simulations I rendered for the project. Let me know if you have thoughts for other such simulations in a similar vein.

Cheers,
Grant

Comments

I believe 2 emerges from the Euclidean distance https://en.wikipedia.org/wiki/Euclidean_distance , which is still 2 regardless of the dimensionality of the space. But I'll leave it to Grant to get at a more satisfying answer.

Hey Grant, Finally you're making a video about something that I know something about :) These thoughts came to mind in no particular order, I won't be surprised if some of them are already in your plans. At the same time, some may be completely out of scope for this video, but one hopes they make it into a future video anyway. Your existing plan sounds delightful. I think the emergence of pi is something that many people (including me) puzzle about and don't quite grasp on an intuitive level (I've seen the derivations, I saw pi emerge, and yet ... ). One think I believe is worth focus on its own is the formula of the normal distribution and why it emerges in all kinds of situations, not just with averaging as in CLT, but also being derived in several different ways. Gauss' own derivation is brilliancy incarnate (but may not lend itself to visualization). The way it emerges as an approximation to the binomial is perhaps the most closely related to the CLT. But the most striking in my view is the Herschel-Maxwell derivation of the normal. This also shows where pi may have emerged and I won't be surprised if it was in your original plans. Other than derivations, it would be cool to touch on some history of the curve and how it used to be parameterized differently at different times and how this specific form emerged over time. May be an overkill, but fascinating in its own right. Another cool thing would be linking the formula to least squares and Euclidean distance in n-dimensional space. I'm very excited to see this video, and in fact, would be delighted to see a whole series about probability distributions, statistical inference, and statistical models. Happy to even help with the content of such series if you ever embark on one. deeb

All right, I don't think I'm ever getting a tatoo, but if I do, now I know what I'm getting.

Daniel and Rebekah Slonim

Yes, the fact that asymmetric distributions approach something symmetric is part of the magic of CLT, and it would be a shame to leave that out.

Daniel and Rebekah Slonim

I question I'd like to see you address, to the extent you're able, is what is so special about the number 2? There are alpha stable distributions for any 0<\alpha\leq2, but suddenly at 2, the picture changes. I don't understand the picture super well for \alpha<2, but my best understanding is that for a lot of random variables, you can ad them up and divide by n^{1/alpha}, for some alpha corresponding to the decay of the tails, and they'll converge to a stable law. But if the tails decay fast enough, then at some point the correct alpha stops changing. Tails can decay like x^{-3}, x^{-4}, x^{-5}, or e^{-x}, and it doesn't matter, your alpha is 2. And suddenly you don't just have a large class of i.i.d. sums converging, you have i.i.d. sums of all random variables with finite second moment converging. Why is 2 so special? You can pose a very related question without specifically addressing alpha-stable random variables. I like to show students this slideshow: https://docs.google.com/presentation/d/1pXZkdV8K6sixdMtRpIX4jPu7PtvBEtPMdn_6unWeMMg/edit?usp=sharing It represents the probabilities of a simple symmetric 1D random walk started at 0 being at various sites after n steps. If you adjust the y-axis to make everything stay visible (e.g., adjust so that visually, the shaded area stays the same), but don't adjust the x-axis, everything gets flat, and approaches a uniform distribution on a finite set, corresponding to the fact that the probabilities get small together, becoming (geometrically as well as arithmetically) closer and closer to one another as they decrease. On the other hand, if you zoom out linearly, letting your x-axis show on the order of n sites after n steps, then what you see will start to approach a dirac distribution at 0, corresponding to the weak law of large numbers. Somewhere in between, there's a rate of zooming out that gives you a nice, clean convergence to a nice-looking curve, and that rate is the square root of n. The two major questions this picture raises are 1. Why e^{-x^2/2}, and 2. Why square root of n? Why not n^{0.7386}*(log(n))^{-3} or something? It sounds like you're planning to address the first question, and I'm looking forward to seeing what you've got. But I'm equally interested in the second question, and I'd like to see you at least raise and discuss the question, if not answer it. It's worth noting that outside of sums of i.i.d. random variables, there are lots of very natural situations where you want to scale something that depends on a large number of random variables, but the correct scaling is on the order of something other than n^{1/2}. For example, if you assign i.i.d. exponential weights to every site on the 2D integer lattice, and then uniformly choose one of the (2n choose n) up-right paths from (0,0) to (n,n), the amount of weight you collect and the distance your path wanders from the straight line y=x will both fluctuate on the order of n^{1/2}. However, if you look for the path that collects the most weight possible, and take that path, then the amount of weight you collect will fluctuate on the order of n^{1/3}, while your wandering from the straight line will be on the order of n^{2/3}. There are lots of mathematicians out there who would love nothing more than to prove that these fluctuation orders are 1/3 and 2/3 for general weights, not just exponential, but this is an extremely difficult problem.

Daniel and Rebekah Slonim

A word of advice for this video: You don't have to be accurate, as long as you are eventually accurate.

For me the most natural way to think of the central limit theorem is its connection (through thermodynamics second law) to information theory and as such a fundamental way in which we can mathematically model any phyiscal system through the way we can gain information about the individual elements inside it. And by the way I can't wait to see your take on this, as always I wish I had seen your videos when I was in highschool.

As an Engineering student with heavy math background, I found it quite interesting how the convolutions of the pdfs (after normalization) always end up degenerating in the Gaussian pdf. Note: the convolution trick can be used because the random variables are independent. So, for example, if we start with a rectangular pulse (Heaviside Pi) and apply convolution many times with itself, we end up with a Gaussian shape. A quick search found me some nice pictures of that: http://oscar6echo.blogspot.com/2012/10/convolve-n-square-pulses-to-gaussian.html This is closely related with how the (discrete) binomial distribution can be approximated with a (continuous) Gaussian distribution. Also, some mention of the connection to the (weak?) law of large numbers would be interesting IMHO.

jp42

I have a tattoo of this integral. https://i.postimg.cc/PJ9ynTrR/4325704-E-BF9-B-4-D44-B7-C9-FA8-D1-F196780.jpg

The Great Quux

Somewhat ironically I have a student currently doing their final project on the Normal Distribution and the central limit theorem. I would have recommended this resource if it existed! :) One suggestion... I am glad you will be showing the normal distribution arising from very not-normal population distributions. I found such simulations to be very dramatic and motivational when I first studied this, they were convincing in a viceral way that the more formal arguments were not. But the example of a skewed distribution you show is symmetric. Probably you are doing others, but I just want to note that the fact that an asymmetric and bimodal distribution also works was the most striking of these demonstrations for me, and indeed the theorem still holds just fine for most such distributions.

Philip Freeman

I think you should include one example of when it does not apply. My favourite is p(x) = A/(1+x^2) (infinite variance and other issues too)

Bob Dowling

Coming at this from the perspective of an AP Stats teacher, one question I've never been able to answer for my students in a satisfying way is why the textbooks insist on $n \ge 30$. Obviously, if 30 is adequate, 29 is not terrible, but I don't have a calculation that pops out "30" at the end (or 29.6 or 11*e or what have you). I found a document that said $n&lt;8$ is okay if the parent distribution is itself normal; $8 \le n &lt; 15$ is okay if the parent is "strongly normal," but outliers, e.g., will break things with such a low n; that $15 \le n &lt; 30$ "shouldn't be an issue except if there is a strange sample distribution or extreme outliers." This to say that there is more going on than just "30" -- and I agree with a previous post that the symmetry of your loaded dice simulation might mask how impressive the CLT is. As a final thought, when saying that a binomial distribution is "approximately normal," another number that seemingly comes from the blue is that the expected number of success and failures, np and n(1-p) respectively, must both be ten or more (the "Large Counts Condition"). When creating confidence intervals, students are told to use the LCC for proportions and the CLT for means, but there's nothing to my mind that says you couldn't use the CLT for both. Yet if you have a binomial distribution where 90% of the time you get 1 and 10% of the time you get 0, the CLT says $n \ge 30$ is fine but the LCC implies you need $n \ge 100$. That contradiction irks me. Finally, just wanted to mention that my son was one of the throng of teenagers who came to see your Saturday talk at the JMM this year.

Hi Grant, I love the simulation where you have balls falling on either side of a series of pegs and forming a physical depiction of the normal curve (I've seen them in real life in museums and such). But in your video, the bins are wider than they need to be in that even if a ball takes a right (or left) turn at every possible junction, it will land far from the ends. If you could limit the width of bins to match the possible outcomes, I think you'd get a bit more visual clarity.

Marc Cohen


Related Creators