3blue1brown from patreon

Share your thoughts on probability

Hello Patrons, 

Now that calculus is wrapped up, the next big "Essence of" project will be on probability.  Even though I won't start working on it for a few weeks, instead focussing on two upcoming stand-alone videos, I'd love to hear your thoughts on what you'd like to see in the probability series.

What topics do you want?  What perspectives made things click?  What do you think is not represented in other resources about probability?  What aspects of the study have never quite settled in your mind? 

Throughout the calculus project, the most significant influence of your feedback was the way that comments on earlier videos shaped how I wrote later videos, so in that spirit comments you make here can carry quite a bit of influence.  

-Grant

Comments

I think a chapter or two on classical and widely used fitting methods (e.g. linear/logistic regression, chi-squared, etc.) would be helpful for a lot of people, especially if you tie them in with the more fundamental theory you are presenting.

Hobart Young

Thank you very much! I have been waiting for probability a lot!

Sergiu Coroi

Basically, everything in Deep Learning by Ian Goodfellow on probability theory.

Jason Benn

@John Rauser: Could you recommend some resources to get started with/learn more about simulation?

Zac

Just as a suggestion, using Thompson sampling for multi-armed bandit problems/AB testing is a great demonstration of some of those concepts. Here was one of my first introductions: <a href="https://learnforeverlearn.com/bandits/" rel="nofollow noopener" target="_blank">https://learnforeverlearn.com/bandits/</a>

Zac

+1 for everything that Jonathan mentioned above. I'd really like to see some coverage on modeling/simulation (especially with MCMC). It would be great if you could also cover convolution and some Bayesian Inference as well. Thanks for all of your great work! I love your videos!

Zac

Information Theory (entropy, KL divergence, cross-entropy, etc), in-depth discussion of probability mass function vs probability density function, mixtures of distributions, chain/product rule of conditional probabilities, factorization of a probability distribution with a graph, conditional independence, maximum likelihood estimation

Jonathan

For me, things really started to click when I got permission to simulate instead of doing calculus. This happened after I read George Cobb's article "The Introductory Statistics Course: A Ptolemaic Curriculum" (<a href="http://escholarship.org/uc/item/6hb3k0nz" rel="nofollow noopener" target="_blank">http://escholarship.org/uc/item/6hb3k0nz</a>). Granted, that article is more about statistics than probability, but the ideas are relevant. I wrote a talk (<a href="https://www.youtube.com/watch?v=5Dnw46eC-0o" rel="nofollow noopener" target="_blank">https://www.youtube.com/watch?v=5Dnw46eC-0o</a>) where I tried to convey some of Cobb's sentiment, encouraging people to examine statistics by watching processes unfold via simulation. My goal there was to dispel people's fears, which result from statistical concepts being presented as complete mathematical objects, devoid of any story about how they were created or why they are as they are. The talk I've always wanted to write, but haven't worked up to yet, is one that walks through the historical development of the LLN, including Bernoulli's proof, and then proceeds to develop the CLT via de Moivre's reasoning. Maybe you can cover that material? (Please, please!) Stigler's History of Statistics is good reference material.

John Rauser

I'm not sure if this is too late, but I would appreciate it if you considered topics like Gaussian/normal distributions and probability density functions. These topics are often used in AI for analyzing data. Where does the formula come from, for example, and how do you combine two Gaussian distributions?

Stef van der Struijk

I think it's a great idea. They are intrinsically difficult topics, so putting together a good explanation for any one of them seems both difficult and fruitful.

3blue1brown

That's a very charming way to describe things. I might just have to steal it :)

3blue1brown

Ha, I know the feeling. What makes Bayes hard to cover is that it's easy to make it feel reasonable in the moment, but making it stick is strangely elusive.

3blue1brown

Chris Olah's work in general is fantastic.

3blue1brown

Is measure theory in the context of probability asking for too much? I really hope that someday you'll cover analysis and number theory as series in their own right.

Tavpritesh Sethi

How e^-(x^2) connects to Pascal's triangle and the whole idea of a normal distribution is something I'm yet to understand.

Myles Buckley

The whole thing is puzzling to me. It is just so hard. My main interest is in statistics and machine learning.

normal, binomial and Poisson distributions

Royal with Cheese

The generalization of mean and variance as moments is interesting to me.

Devin Neal

Maybe you could also speak about information/entropy?

In addition to some very good input so far, I'd like to see some discussion on distance/divergence measures between probability distributions. Also, I believe the phenomenon of explaining away in graphical models is very important. Depending on how advanced you get you may want to talk about the concept of a 'distribution over distributions' as in a Dirichlet process. In addition, Jensen's inequality would be nice. Also, the concept of mean, median and mode being associated with the minimisation of different loss functions is a key idea I think.

Sanjeevan Ahilan

tf is variance

Matthew Hausmann

The interesting thing is that statistics becomes very intuitive when you look at extreme examples. Take the "well, that doesn't seem right" conclusion from flipping a penny 100 times and only getting heads: "maybe something is up." Then we try to quantify that notion. From there confidence intervals become pretty intuitive. It is the range for which you say, "yeah, that is fine." These confidence intervals can vary based on how sure you need to be. A banker flipping a penny to check if it is okay doesn't need to be as confident in the fairness of the penny as a coin flipping champion of the world. Then you look at test versus control situations, where we assume that the test is the same as the control (null hypothesis). Then if the test starts acting differently, we say "maybe we did something that worked."

Jason Caggiano
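
To put a number on the "maybe something is up" feeling in the comment above, here is a minimal Python sketch that computes the exact chance of seeing at least k heads in 100 flips of a fair coin. The function name `prob_at_least` and the chosen thresholds are illustrative assumptions, not anything from the thread.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) when X counts heads in n independent flips with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # The further the observed count is from 50, the harder "fair coin" is to believe.
    for heads in (55, 60, 70, 100):
        print(f"P(at least {heads} heads in 100 flips) = {prob_at_least(heads, 100):.3g}")
```

How small this number has to be before you call the penny unfair is exactly the "how sure do you need to be" choice the comment describes.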

I would love to see a focus on statistics as well. Maybe something about QM and Probability Amplitudes too?

Sidhant Rastogi

Just get ET Jaynes "Probability Theory" and work through that. It gets very complicated so pick out the simple parts. Someone else in the comments made the point that Jaynes proved that you don't need balls in urns with replacement to derive probability theory - it's jaw dropping. Also, Richard T Cox "Algebra of Probable Inference." Very, very good.

Peter Dulimov

What I'm almost always confused by are things like the chi-squared test: to determine whether things are fair (or more generally, what distribution they have) based on a limited set of observations. Same goes for determining whether things are different statistically speaking. This could be a great opportunity to dispel some misconceptions about (Shannon) entropy. I also didn't understand Bayesian methods until I read ch 6 of "Algorithms to Live By"; it starts with the German tank problem, then does Laplace's Law before coming to Bayes' Rule. It even covers power-law, normal and Erlang distributions, which is really fun. While I'm at it, the book also presents the optimal stopping problem and the multi-armed bandit, which are great applications.

Okuno Zankoku

Thanks for your video! For me, I always wonder why (for example) the Gaussian distribution is calculated using that crazy equation. How did mathematicians invent it in the first place?

waterWhite

Super rudimentary, but something I've encountered in tutoring sessions is the need to hammer in the idea that even if you have N possible outcomes, they're not all equally likely. Maybe useful to later tie in with Bayes' theorem? Also the idea of modeling a distribution based on data (reverse-engineering the true probability distribution) and the idea of "Given this assumption, how likely was this outcome?". These are crucial in my eyes, since they form a basis for critical reading of science news.

Henri Taarisen

Ok, I hope that this comment is still in time. When I was new to probability theory those various drawing models didn't really click with me. The number of possibilities to draw with or without repetition was something I just learned for the test instead of getting a feeling for it. I would hope that actually displaying all subgroups of k items in a set of n items might help here. What bugs me today in many probability riddles is that they implicitly assume one of various possible probability distributions. One classic is "proving" a probability by constructing a bijection from a continuous domain to some geometric figure where a probability p is easily "seen". What the explainer often forgets is that numerous such bijections are possible. Otherwise: regression, model fitting, X^2, .... When I let my imagination fly on topics you could cover in higher chapters I immediately think of the probabilistic method. It would certainly help to mention how it is a more general argument than the pigeonhole principle. But maybe that's already too combinatorial.

Lionel Pöffel

The expectation of a function evaluated across a random variable is not necessarily the same as the function evaluated at the expectation of the random variable. I am struggling to fit this into my understanding of calibration, where we compare measured data to our model and find what we think the values of the unknown parameters should be. I would like you to help me properly think about the calibration problem in domains of high uncertainty.

Julian
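
The first sentence of the comment above can be checked on a tiny example. This is a minimal sketch assuming a fair six-sided die and the function f(x) = x^2; both are illustrative choices, not something the commenter specified. It uses exact fractions so no rounding gets in the way.

```python
from fractions import Fraction

FACES = range(1, 7)   # outcomes of a fair six-sided die (assumed example)
P = Fraction(1, 6)    # each face is equally likely


def expectation(g):
    """E[g(X)] for the fair die: a probability-weighted sum over all faces."""
    return sum(P * g(x) for x in FACES)


def square(x):
    return x * x


mean = expectation(lambda x: x)   # E[X]    = 7/2
e_of_f = expectation(square)      # E[f(X)] = 91/6
f_of_e = square(mean)             # f(E[X]) = 49/4

if __name__ == "__main__":
    print("E[f(X)] =", e_of_f, " f(E[X]) =", f_of_e, " gap =", e_of_f - f_of_e)
```

For this particular f the gap, 35/12, is the variance of the die, which is one way to see why the two quantities agree only when there is no spread at all.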

Here is a thought on Bayes' Theorem, if it is helpful: <a href="https://docs.google.com/document/d/1Lz-zAWKOKbdCY_CxOKjOpQcxt4-cJ3bGQll9O8qXc7U/edit" rel="nofollow noopener" target="_blank">https://docs.google.com/document/d/1Lz-zAWKOKbdCY_CxOKjOpQcxt4-cJ3bGQll9O8qXc7U/edit</a>

Nicholas Sterling

The Monty Hall problem, since it is counter intuitive, is probably worth a mention.

Also, another Essence series you could do is multivariable calculus, just the basics, like topology of Euclidean spaces or maybe some partial derivatives or Stokes' Theorem, something like that. These series are great. I hope you get around to doing them eventually and I hope someday I can become a bigger patron. Wish you well. -Oscar

Oscar Ivan Miranda Alcocer

I just want to give you a little tip from my own desire: I think you should make some videos explaining the Millennium Prize problems like you did with the Riemann Hypothesis. Maybe you should start with the only one that has been solved, the Poincaré Conjecture. The Navier-Stokes equations should be fun to watch animated. I hope you have the time, and I know I'm not a big enough patron to ask for anything, but just think about it.

Oscar Ivan Miranda Alcocer

Hey Grant, I'm going to start binge watching your calculus series; I waited to watch everything all together. Anyway, probability is a good topic. I don't know much about it, but I'm eager to learn.

Oscar Ivan Miranda Alcocer

If possible, could you link it to measure/integral theory? E.g., define expected value as an integral.

Andreas Blatter

It would be nice to have some intuition for the big theorems (law of large numbers, central limit theorem)

Andreas Blatter

Ok, firstly I really need the basic stuff (what is a random variable, what does variance mean?)

Andreas Blatter

First of all, thanks for the awesome videos you're making. For me it's Bayes' theorem.

Ammar hameed

Definitely explain the difference between independence and mutual exclusivity.

Mr. IntelliGent

Some issues that were really confusing when I first "learned" probability: 1. Independence. I find the intuition pretty simple and so is the formal definition; however, I had never been able to connect the two with a simple diagram or example. 2. The difference between random variables (functions that go from a probability space to the real numbers) and CDFs (functions from the reals to the reals that describe a random variable) 3. Probability is not transitive <a href="https://www.youtube.com/watch?v=zzKGnuvX6IQ&t=66s" rel="nofollow noopener" target="_blank">https://www.youtube.com/watch?v=zzKGnuvX6IQ&t=66s</a> 4. The difference between probability and an estimator, i.e. the difference between a sample average and the expected value

Héctor Arturo
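
Point 4 in the comment above, the gap between a sample average and the expected value, is easy to watch in a simulation. This is a minimal sketch assuming a fair six-sided die (expected value 3.5); the helper name `sample_mean` and the seed are illustrative assumptions.

```python
import random

EXPECTED_VALUE = 3.5  # (1 + 2 + ... + 6) / 6, a fixed property of the fair die


def sample_mean(n: int, rng: random.Random) -> float:
    """Average of n simulated rolls: an estimate that changes from run to run."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the run is repeatable
    for n in (10, 1_000, 100_000):
        estimate = sample_mean(n, rng)
        print(f"n={n:>7}: sample mean {estimate:.4f}, off by {abs(estimate - EXPECTED_VALUE):.4f}")
```

The expected value never moves; the sample average is itself random and only tends to settle near 3.5 as n grows.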

Reading people's comments is making me salivate! :) What great ideas! +1 for Bayes. By the way, here is an interesting probability problem that came up in the workplace, and although it seems very simple, apparently multiple statisticians came up with the wrong answer. A Monte Carlo simulation and a mathematical analysis are provided. Perhaps it would be a useful example of how our intuition can be misleading. <a href="https://www.linkedin.com/pulse/monte-carlo-simulation-scala-nicholas-sterling" rel="nofollow noopener" target="_blank">https://www.linkedin.com/pulse/monte-carlo-simulation-scala-nicholas-sterling</a>

Nicholas Sterling

What is the holy grail for a teacher or communicator of probability, especially with Bayesian concepts being mentioned? Explain the Monty Hall puzzle in such a way that most people can intuit it after hearing the explanation for the first time. I also like John's idea for St. Petersburg. Unraveling a paradox usually seems to inform pretty well.

Lee Whitney
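
For readers who want to test their intuition on the Monty Hall puzzle mentioned above before hearing any explanation, here is a minimal simulation sketch. It assumes the standard rules (three doors, the host always opens an unpicked door hiding no prize, choosing at random when two are available); the function names are illustrative.

```python
import random

DOORS = (0, 1, 2)


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the prize."""
    prize = rng.choice(DOORS)
    pick = rng.choice(DOORS)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in DOORS if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == prize


def win_rate(switch: bool, trials: int, seed: int = 0) -> float:
    """Fraction of rounds won when always switching (or always staying)."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(False, 100_000))
    print("switch:", win_rate(True, 100_000))
```

Staying wins only when the first pick was right (1 in 3), so switching should win in roughly 2 out of 3 rounds.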

Some interesting topics that are less covered: 1. Fat-tailed vs thin-tailed distributions. (Cauchy distribution is symmetric and has a median but no mean!) 2. Linearity of expectation – even when random variables have complex conditional probabilities their expected values combine linearly. 3. If you take N items and sample with replacement N times, as N approaches infinity, the probability that a given item will not be sampled approaches 1/e. 4. St. Petersburg paradox. 5. Huffman codes.

john kraemer
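
Item 3 in the list above follows from a one-line calculation: a given item is missed on a single draw with probability 1 - 1/N, so it is missed on all N independent draws with probability (1 - 1/N)^N, which tends to 1/e. Here is a minimal sketch comparing that formula with a simulation; the function names are illustrative assumptions.

```python
import math
import random


def exact_miss_probability(n: int) -> float:
    """P(a given item is never drawn) in n draws with replacement from n items."""
    return (1 - 1 / n) ** n


def simulated_miss_fraction(n: int, rng: random.Random) -> float:
    """Draw n times with replacement and report the fraction of items never drawn."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("1/e =", 1 / math.e)
    for n in (10, 1_000, 100_000):
        print(n, exact_miss_probability(n), simulated_miss_fraction(n, rng))
```

The simulated fraction of untouched items is an estimate of the per-item miss probability, so both columns should drift toward 1/e as N grows.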

1. The same as many people here: some intuitions about Bayes' theorem. I know there are some videos on YouTube explaining this, but I thought you might have some good insights. 2. I really need some lessons about how and when to use different distributions. Everything I learned as an undergrad is incomplete. I have never seen the whole picture and don't know how to apply these distributions to calculate probabilities in real life.

ChocoKung Chinworawatana

An explanation of Gauss's intuitions and the logic used to create his probability function. Likewise, in general, the morphing of the discrete into the continuous, specifically with regard to probability, and how summing sine waves creates a picture of reality. This may be beyond the scope of what most expect, but I can't help but ask, since your explanatory angles are uniquely acute, and astute.

Guillermo Suarez

Along with what @KevinNorris and @AndreasGwilt have said, I would like to see a discussion of conditional probability. This can of course be taken two different ways rather quickly (towards Frequentist or Bayesian statistics). Given that you can arrive at Bayes' Theorem quite quickly from just the Kolmogorov axioms, it would be interesting to see how one then begins to think about how one can interpret that equality from different viewpoints.

Matthew Feickert

Personally, I'd love to see a link between our intuition of probability and the formal mathematics based on Kolmogorov's axioms.

MadTux

These explanations made it really click for me. Especially the way all formulas in combinatorics are unified.

slzb

<a href="http://gdaymath.com/lessons/powerarea/4-1-what-we-like-to-believe-about-probability/" rel="nofollow noopener" target="_blank">http://gdaymath.com/lessons/powerarea/4-1-what-we-like-to-believe-about-probability/</a>

slzb

<a href="http://gdaymath.com/courses/permutations-and-combinations/" rel="nofollow noopener" target="_blank">http://gdaymath.com/courses/permutations-and-combinations/</a>

slzb

+1 for probability distributions. I think understanding them intuitively can really transform the way you see the world. I'd love to see your take on them.

Eric Doi

Others have brought up many specific points, more than I can think of. However, to me, what makes your work outstanding is not just the animations but more importantly the creativity, insight, simplicity and wisdom behind them. Try to bring us out of the frame of textbooks and the division of subjects. Distill the mechanisms for us and don't get too close to specific applications. Try to bridge over to ideas in your linear algebra and calculus videos when necessary. Be patient and don't squeeze too much stuff together. After watching your previous videos, I lost (or never had) the ability to evaluate what I don't know. Just amaze me with anything you think matters!

Duo Xu

Central Limit Theorem I think would be a big hit! You could get one of your trademark 'ohhhhhhh RIGHT. That's so cool!' moments pretty easily :))

Darcy Myring

If you did something visually with Bayes I think that would be neat to see. I haven't seen anything like that, and by the sounds of it lots of other people want to see something like that too. I think doing a deep dive into correlation, partial correlation, and regression would be interesting as well. They're easy concepts but I feel like I learned them without fully understanding the intuition behind them.

Josh Armstrong

People have mentioned frequentism and Bayesianism several times above, and I want to add to that: You can't have an intuitive understanding of what probability is unless you either pick one of these schools of thought exclusively, or explain both of them. Here's how I think about it: Frequentists think of probability as describing things like roulette wheels, which you can spin many times and measure how often different events happen (then talk about law of large numbers in simple terms). Bayesians think of probability as describing things like a game of poker, where different people may have different sets of information available to them, so that you can have a probability of something even after the shuffle has already happened and the order of the cards has been fully determined.

Kevin

Yea I figure for an "essence of" series, the focus should be building intuition of fundamentals and pointing out common mistakes. Alongside the series, it would be neat to have isolated videos about specific applications of the theory or deeper related topics.

Duncan Fairbanks

I would like to see something about the bell curve as the sum of periodic functions, a la Fourier.

Mark LaJoie

One idea I haven't seen mentioned here could be covering amplitudes (often seen in a quantum computing context). I have never seen a good, intuitive explanation of what they are (although I think I understand them pretty well). It doesn't seem worthy of an entire video, though, since in my experience they don't show up too often.

Andre Popovitch

This is a nice list. I didn't shy away from listing applications just because I was worried that concentrating on probability itself would seem dry when the majority of the audience are probably interested in probability in a very applied context (as a precursor to statistics or machine learning, as part of science undergrad, or as a CS prerequisite). But in a way a deep dive into combinatorics and the other core areas listed here might be a more unique creation than a more superficial survey of statistical applications.

Dan Davison

As a follow up on @EdanMaor's post-- the notion of a random variable as a function was (and still is) very difficult for me to grasp intuitively. Also, when first learning about probability, the idea that PDF of an RV gave a probability *density* and not a concrete probability was a little foreign, even though it's in the name, so perhaps a mention of that distinction may help.

Ben Granger

Some things that would be cool: - Bayes' Theorem - Graphical Models, D-Separation, and all things Pearl - Moment-generating functions - Exponential Families - Markov chain Monte Carlo (MCMC) inference - variational inference - the simplex - stochastic processes - Pitman-Yor/Dirichlet processes

Brian McMahan

I would personally like to have a better intuitive interpretation of moment-generating functions and the central limit theorem. But I will cast my vote for Bayes' theorem because it is important that people understand it.

Jonathan Gjertsen

I would love a measure-theoretic view on probability. This is something not usually touched upon in probability courses online. It made me so much more interested in probability when I found out what random variables are made of -- they are just measurable functions! Discrete random variables, continuous random variables, they are all just different functions. Conditional probabilities are pretty tough. What is a conditional probability distribution of a random variable if we condition on an event with zero probability? How do we formulate this from a measure-theoretic perspective? How can we reconcile this with intuition? These are some things I've been struggling with on my own from David Williams's book (<a href="https://www.amazon.co.uk/Probability-Martingales-Cambridge-Mathematical-Textbooks/dp/0521406056" rel="nofollow noopener" target="_blank">https://www.amazon.co.uk/Probability-Martingales-Cambridge-Mathematical-Textbooks/dp/0521406056</a>).

Tuan Anh Le

If you can add something about the dangers of P-value hunting in a way that makes the whole thing feel intuitive, I think that would be both a nice example of how statistics has implications beyond abstract mathematics and a very useful daily-life skill/bit of awareness (since viewers become more aware of bad pop-science reporting).

Job van der Zwan

Axioms should be in there, like the central limit theorem. :) Venn diagrams and basic set theory are quite helpful for visualization, especially for getting closer to Bayesian statistics, e.g., in the typical medical example. I think it would help to get into the major schism in statistics as early as possible: aleatory vs. epistemic uncertainty, likewise frequentists vs. Bayesians. A short and neat explanation is in: Tony O'Hagan: Dicing with the unknown, Significance, 2004. // also, it might be helpful not to spend too much time on the p-value hacking hype ;-) Frequentist statistics is more intuitive and kind of forms the basis of the entire domain. Thinking about probability as expected frequency, would it be possible to include a part on which statistical tests and distance measures are appropriate for which data? Not so much as "if A then B", more as guidance to get the basic intuition to be able to dig deeper by oneself (without feeling too lost in the big forest). As a computer scientist researching machine learning, I would love to see some "essence of" probability basics for machine learning and decision making. It would really be helpful to explain which data properties require which distribution. I found some good summaries and introductions for practitioners, but getting into the essence of "why" would be neat - for my PhD research, Prince's book is quite helpful, i.e. chapters 2-9 of: Prince: Computer Vision: Models Learning and Inference, Cambridge University Press, 2012. online: <a href="http://computervisionmodels.com/" rel="nofollow noopener" target="_blank">http://computervisionmodels.com/</a> Especially on the concepts of maximum likelihood, marginalization and joint probability. Personally, I like the idea of Gaussian mixture models being able to emulate almost any distribution (also from a Fourier transform perspective); however, where is the crossing point for you from "essentials" to deeper topics, i.e. will Markov chains and HMMs play a role? Can the concept of statistical moments be linked to the concept of velocity, acceleration, etc., i.e. derivatives? Would be awesome :) Supposedly, I'm already getting off the track from "essential", however coming back to the beginning - aleatory vs. epistemic uncertainty - an example I'd like to see some emphasis on: when one argues about likelihood ratio tests, exhaustivity is important (frequentist) - however, when arguing by comparing two likelihood ratios and depending on prior odds, the underlying hypotheses don't need to be exhaustive, they solely need to be mutually exclusive (Bayesians). Are proper scoring rules related? A basic example could be the Brier score for assessing the accuracy of weather forecasts. Going a bit into the direction of Bayes risk. When comparing (score) distributions, statistical tests are one option, another are divergence and uncertainty measures, such as the Kullback-Leibler divergence and cross-entropy. Though, this is going more into the direction of information theory. Actually, this is a thing I started appreciating about probability: it brings together many fields of maths, from set theory and combinatorics over machine learning to information theory. Though it might be out of scope for essentials, I'd be curious about your thoughts on the Pool Adjacent Violators algorithm. I'm researching making good decisions in binary classification, where calibration of Bayesian systems is an issue (in "uncertain" environments).
Regarding "expected" decision costs (frequentist point of view), this book chapter provides a good overview: <a href="https://sites.google.com/site/nikobrummer/appindepeval-chapter.pdf?attredirects=0" rel="nofollow noopener" target="_blank">https://sites.google.com/site/nikobrummer/appindepeval-chapter.pdf?attredirects=0</a> Regarding information theory motivated performance, this paper utilizes the KL divergence: <a href="http://www.isca-speech.org/archive_open/archive_papers/odyssey_2008/papers/od08_004.pdf" rel="nofollow noopener" target="_blank">http://www.isca-speech.org/archive_open/archive_papers/odyssey_2008/papers/od08_004.pdf</a> - however, this goes too far for "essential theory" - but it also reflects my main motivation for contributing to your channel: I'm curious about the basics that I missed on the way ;-) // btw, Cllr as binary decision performance is actually a neat measure which can be motivated from either perspective - frequentist and Bayesian :) Regarding decision making, risk assessment and economics, approximations by Taylor series are interesting, as well as convex hulls, when two event PDFs (such as error rates) are compared. Many applications thrive on probability, and contribute back to its basics; it is an interesting line to draw. From my intuition, quantum mechanics should be in there, but maybe it appears to be too advanced to tackle, e.g., quantum teleportation: <a href="https://www.youtube.com/watch?v=DxQK1WDYI_k" rel="nofollow noopener" target="_blank">https://www.youtube.com/watch?v=DxQK1WDYI_k</a> Would you like to put an initial outline of "Essence of probability" and a more "Advanced probability" up for discussion? // btw, your latest videos are helping our younger (PhD) students right now! :) - thanks

Andreas Nautsch

I'd like it if you did something on: - Deriving probability theory from axioms. - [Markov chain] Monte Carlo methods. - Expected value (and why it's undefined for some distributions). Maybe you could touch on the Gambler's Ruin problem. - Negative probability -- something I've heard of as useful for something but know nothing about. - Information and entropy. Something I'd like to know more about. - Bayes factors. - Measure theory stuff. Maybe none of this is basic enough (apart from expected value), but I'd love it if you touched on anything even a bit. Looking forward to the series -- good luck!

Jake Palmer

I too am in favor of basic measure theory. It's actually an incredibly concise subject, and I think animation could make it VERY accessible.

Jacob Mirra

I'm very happy you're doing a probability course. Some random notes: 1. One of the hardest things for me when studying probability the first time was to understand what the objects actually were. E.g. E[X] is a number (not a function). X, a random variable, is actually a function mapping outcomes to numbers. Etc. A lot of the notation here is deeply unintuitive IMO. 2. I'd *love* a good, visual guide to the different statistical tests and what they're for. I can never remember this. 3. I'd love a good explanation of the difference between frequentist and Bayesian approaches (not sure this is suitable though). 4. One of the things that made probability click for me was understanding where distributions come from. E.g. when I learned the first time, the definition of the Normal dist, Poisson dist et al. was like some random formula from out of thin air. Only when I realized that these were arrived at by being limits of other distributions in a way that makes total sense, did probability start clicking and even being beautiful.

Edan Maor
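
Point 4 in the comment above can be made concrete with the best-known case: a Binomial(n, λ/n) distribution approaches a Poisson(λ) distribution as n grows. This is a minimal sketch, assuming λ = 3 and comparing the two probability mass functions over k = 0..20; the function names and those numbers are illustrative choices.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


def max_gap(n: int, lam: float, k_max: int = 20) -> float:
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam) for k <= k_max."""
    return max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(f"n={n:>6}: max gap = {max_gap(n, 3.0):.2e}")
```

Many rare, independent chances with a fixed average count: that is the story behind the Poisson formula, and the shrinking gap shows the formula emerging as a limit rather than out of thin air.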

(I should mention that I don't think this topic is right for an introductory course, because afaik there are still lots of ideas around how to answer this).

Edan Maor

Here's a harder question for you - if I asked you what the probability is that the billionth digit of pi is 5, what would you say? Intuitively (assuming pi's digits are distributed uniformly, which I think is the best guess today), you would say 1/10. But of course, this, even more than the flipping a coin example, is something that has an exact answer that you could theoretically sit down and calculate right now, no omniscience necessary. Btw, I *think* the Frequentist answer would be that you can run this experiment a million times and get a million different results. You can't in practice run this experiment with the same initial conditions.

Edan Maor

Like any confusing field of study, I think it's important to look at "paradoxes" encountered in the history of the field. <a href="https://en.wikipedia.org/wiki/Category:Probability_theory_paradoxes" rel="nofollow noopener" target="_blank">https://en.wikipedia.org/wiki/Category:Probability_theory_paradoxes</a> I personally like the Sleeping Beauty problem, Simpson's paradox, the Birthday problem, the Monty Hall problem, and violation of Bell's Inequality. Not all of these would be considered paradoxes, but some are rather non-intuitive. Another issue to deal with is that the axioms of probability require some understanding of set theory. I would probably just introduce the concepts of naive set theory as they are needed. As for topics to cover, I would go in this order:
0. Counting (product rule, permutations, combinations, binomial theorem, stars & bars)
1. Finite & countable sample spaces & Kolmogorov axioms
2. Uncountable sample spaces and sigma algebras (example: uniform [0,1] distribution)
3. Conditional probability & independence (law of total probability & Bayes' rule)
4. Discrete random variables (PMF, CDF, expectation & variance (moments), examples: Bernoulli & binomial)
5. Continuous random variables (zero-probability events and probability density, PDF, CDF, examples: uniform, normal)
6. Joint & conditional distributions (independence, marginalization, covariance & correlation)
7. Central Limit Theorem
At this point, the viewer should have a solid understanding of the basics of probability theory. I would sprinkle those "paradoxes" into your lessons as problems for testing understanding. If you have time for more videos beyond these, you could look at applications, like maximum likelihood estimation, information theory, probabilistic graphical models, principal components analysis, (your favorite machine learning model here), etc.
When I took my first probability theory class, I did not do well. I think the hardest part for me was counting events. I remember exam questions like, "what is the probability of dealing such and such hands of cards to 4 poker players," and it was just overwhelming because I didn't have a good feel for how to count and when to mod out indistinguishable events. I think my ideal class would focus heavily on combinatorics.

Duncan Fairbanks

Snap - I linked this further down.

Hi! The Bayesian answer to this is that uncertainty is a property of minds, not of the environment. In the coin example, the universe is deterministic, but you in your mind do not know the outcome. As such, probabilities are things that help you make decisions surrounding events you are uncertain about, but they do not describe reality at its deepest level. (I also believe in the Everett interpretation of quantum mechanics, which means I believe what I just said to be literally true.)

A combinatorial approach sometimes helps me when it comes to evaluating the significance of results. I just find it easier to get a handle on what the metric is defining as expected. (Particularly true when I was trying to get my head around Krippendorff's Alpha, although I wouldn't suggest you go there)

David Honour

Stuff on Bayes theorem would be neat.

Marc Person

I am SOOO happy you are doing a series about probability!!!! It was always very hard for me to wrap my head around this topic intuitively. Please start from the most basic thing: random variables! a) what is the difference between a random variable and the distribution of a random variable b) how can you perform operations on random variables like addition, multiplication etc. c) where do distributions like Chi-squared, t, F come from and why are they used in some tests? d) what is a moment generating function

Kuba Okrzesa

I would imagine you could build visual intuition around densities and masses and around how we can think of the probability of some event being related to that event's "size". You could then unpack the seemingly simple but actually very complicated notion of what size might mean in the context of the reals. I'd really like to see the following topics covered: probability spaces, conditional probabilities, expectations and conditional expectations (conditioned on events, sigma algebras, random variables). Some coverage of how a random variable can be said to be measurable with respect to a sigma algebra would be nice. Some animations of stochastic processes and their properties (informal coverage of martingales, for instance) would be fantastic. Really looking forward to this, whatever angle you choose.

David Boyle

Hi Grant, here are some quick notes from me. (In addition to probability I'd also love to see EOLA2 / multivariable calculus!)
** Basics: discrete probability distributions
- Conditional probability
- Law of total probability
- Simulating from discrete probability distributions
- Example: Multinomial distribution
** Basics: continuous probability distributions
- Probability "mass"
- Example: Normal distribution
** Statistical Inference and Machine Learning
- Classical hypothesis testing: What is a p-value? Essential -- all scientists need to know this.
- The (pedagogically disastrous) "recipe book" approach that scientists are exposed to: Chi-squared test, t-test, F-test, ANOVA, linear regression, Z-scores etc etc
- What's the difference between "Machine Learning" and "Statistics"? What's a "model"?
- What does it mean for a data set to "contain information" about some question of interest? Likelihood and connection to information theory.
- The idea of "latent variables" in statistical models and the connection to marginalization / law of total probability. HMMs?
- Bayes rule; prior and posterior probabilities
- Maximum likelihood inference, Bayesian inference
** Simulation
- To generate data sets
- To estimate (approximate an integral)
- If you can program a simulation then you can write down the likelihood
** Information Theory
- Shannon information / entropy / likelihood / inference
** Scientific Applications
- Statistical Mechanics?
- Quantum Mechanics?
- Evolution and population genetics?

Dan Davison

Here's something I would really like for you to touch on. This is something I've wondered a lot about and haven't been able to come up with a satisfying answer. Let's say you flip a coin. Typically we say that the coin has a 50/50 chance of landing heads. But now let's think about it from a physics perspective: If we knew the exact amount of force going into the coin, the exact direction, the exact rotational momentum, etc. we would be able to do the math and determine how it would land. Suppose it landed heads. From this perspective, there was a ZERO percent chance that it lands tails. How do we resolve these two perspectives?

Connor Alexander McCranie

You could mention Bayesian networks: <a href="https://en.wikipedia.org/wiki/Bayesian_network" rel="nofollow noopener" target="_blank">https://en.wikipedia.org/wiki/Bayesian_network</a>

Michał Łuniewski

I think that, for the purposes of understanding the axioms of probability, the definition of conditional probability (and why it makes sense), as well as understanding what the theorem of total probability and Bayes' Theorem are saying (and why they, too, make sense), it's really helpful to see probability functions presented as (what the philosopher Bas van Fraassen calls) "muddy Venn diagrams". In ordinary Venn diagrams, you have a bunch of propositions or events or whatever you want to call them (sets of possible outcomes), laid over each other in such a way as to make the logical relations between them clear: if landing 2 entails landing even, then the "landing 2" circle will be inside of the "landing even" circle; if it's possible for the die to land both odd and land greater than 4, then the "landing greater than 4" and "landing odd" circles will intersect. From this standpoint, all a probability function does is /enrich/ that representation of the logical relations between those propositions/events (a representation which tells us what is possibly true/what could possibly happen and what is not possibly true/what could not possibly happen) with some additional information, telling us how likely something is to be true---which we can represent with some mud. You can think about it like this: you're given 1 kg of mud, and you get to spread it out over that Venn diagram however you like. If you put more mud on top of a circle in the Venn diagram, then that circle is more likely. If you put less mud on top of a circle in the Venn diagram, then that circle is less likely. To calculate the probability of a proposition/event, according to a particular probability function, you just look at what percentage of the mud is sitting on top of its circle in the muddy Venn diagram. All the probability axioms are telling you is that the probability of an event must be determined by the amount of mud sitting on top of it in some muddy Venn diagram.
This makes some consequences of the axioms just jump right out at you: e.g., if the A circle is inside the B circle, then you can’t have a higher probability for A than for B---since all the mud which sits on A must also be sitting on B. The conditional probability of A, given B, is just that percentage of the mud which remains on A, once you sweep away all the mud not on B. It makes sense to call this the probability of A, under the supposition that B, since it seems very natural to say that sweeping away all the mud not on B is a way of supposing that B is true. This also allows for pretty visualizations of Bayes’ Theorem (though I think that the undeniably best way to introduce Bayes theorem is with the base rate fallacy), and other consequences of the probability axioms like the theorem of total probability.
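
A rough sketch of the mud picture in code, for anyone who wants to play with it. It assumes a fair six-sided die, so the mud is spread evenly over the six outcomes; the names `mud`, `prob` and `cond_prob` are made up for this illustration and come from no library.

```python
from fractions import Fraction

# Assumption for illustration: 1 kg of mud spread evenly over the six faces of a die.
mud = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, mud):
    """Share of the total mud sitting on the outcomes in `event`."""
    total = sum(mud.values())
    return sum(m for outcome, m in mud.items() if outcome in event) / total

def cond_prob(a, b, mud):
    """Sweep away all mud not on `b`, then measure the share left on `a`."""
    swept = {outcome: m for outcome, m in mud.items() if outcome in b}
    return prob(a, swept)

# The circles from the die example in the comment.
two = {2}
even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_4 = {5, 6}
```

Since the "landing 2" circle sits inside the "landing even" circle, `prob(two, mud)` can never exceed `prob(even, mud)`, however the mud is redistributed. `cond_prob(two, even, mud)` is the share of mud left on "landing 2" once everything off "landing even" has been swept away.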

J. Dmitri Gallow

Another controversial one: explain why sharing likelihood ratios is better than p-values.

I'm not sure where to post general suggestions that are not for a specific series, so I hope you don't mind if I make a non-probabilistic suggestion here. I think that one of the most useful ideas in math is constrained optimization. Although I took a lot of high-level math and physics courses as an undergrad, I never encountered linear optimization (linear programming), nor was I exposed to its deep connections with variational calculus. This is a beautiful subject that is very simple at its essence, and is enormously powerful.

Norman Margolus

Nothing in probability makes sense to me. People with different information will assign different probabilities to events, and I never learned how to think about that. In particular I never learned how to distinguish cases where probabilities can be assigned from cases where they can't. Concrete example: You hand me an envelope stuffed with cash, labeled A. You show me a second envelope, labeled B; tell me it either has 2x or 1/2x as much cash in it as A; and offer to trade B for A. Should I trade? If yes, then *after* trading, say you show me A and offer to trade back. Should I trade? Why not? Probability gets entangled with time, ontology, and epistemology in ways other fields of math don't. That is where I get stuck.
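
One way to make the envelope question concrete, as a sketch under an added assumption that is not in the comment: the two envelopes hold fixed amounts x and 2x, and envelope A is equally likely to be either one. Under that model, keeping and switching have the same expected payoff. This does not settle the deeper question of when a probability can be assigned at all. The function name `expected_payoffs` is hypothetical.

```python
from fractions import Fraction

def expected_payoffs(x):
    """Exact expected payoff of keeping A vs. switching to B.

    Assumption of this sketch: the envelopes hold x and 2x, and A is
    equally likely to be the smaller or the larger one.
    """
    # (amount in A, amount in B); each case has probability 1/2
    cases = [(x, 2 * x), (2 * x, x)]
    keep = sum(Fraction(a) for a, _ in cases) / len(cases)
    switch = sum(Fraction(b) for _, b in cases) / len(cases)
    return keep, switch
```

The "B is worth 2x or x/2 with equal odds" argument treats the amount in A as fixed while letting the pair vary. The model above fixes the pair and lets A vary, which is why the two pictures give different answers.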

Jason Orendorff

I've always found it really useful to test edge cases when reasoning about probability. Like N choose N should be 1. So whatever formula N choose K is should be 1 when K == N. Also, reframing probability problems in terms of 1 minus some other probability that's easier to calculate. A section about how to lie with statistics would be fun. Simpson's paradox and whatnot. Difference between Bayesian and Frequentist. Moving from discrete to continuous functions.
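
The two habits mentioned above (checking edge cases, and computing 1 minus an easier probability) can be sketched in a few lines. `math.comb` is the standard-library binomial coefficient; the function name `p_at_least_one_six` and the fair-die assumption are made up for this illustration.

```python
from math import comb

# Edge cases: choosing all N items, or none of them, can be done in exactly one way.
for n in range(10):
    assert comb(n, n) == 1
    assert comb(n, 0) == 1

def p_at_least_one_six(rolls):
    """P(at least one six in `rolls` fair-die rolls), computed as 1 - P(no six at all)."""
    return 1 - (5 / 6) ** rolls
```

Counting the "at least one six" outcomes directly means handling exactly one six, exactly two, and so on; the complement needs only the single case of no sixes.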

Shelby Doolittle

In first year Engineering we had a Physics class, and the first lab was SOO boring. We had a loop about a meter across and we had to measure its thickness and diameter 10 times, and then plot the results, and calculate the mean thickness and diameter, as well as calculate the std deviation. I hated it.... But then for the rest of my science life I keep thinking back to it. That stupid ring actually was surprisingly hard to measure. There really was some uncertainty due to the measuring equipment. I wonder how big that ring really was supposed to be... I still think of that ring when I think of statistics.

Adam Berkan

Sorry, I have not seen the integration videos yet, but Lebesgue measure would be interesting in a probability context.

Stan Serebryakov

Bayes theorem. Man, I don't know what it is with that thing. I've learned it multiple times, looked at it from different directions, but it never sticks. It always *seems* intuitive at the moment, but then for whatever reason I can't hold on to the intuition and the next time I need it I have to re-learn the whole thing from scratch.

jason black

When i first encountered probability/statistics in high school, i was turned off by the kind of ad-hoc approach of applying this or that model/test because the data met conditions x, y, z, without any deeper exploration of the basis of it all. It was only later when i got into the real good stuff (someone else mentioned Jaynes, i'm also a fan) that connected the models to the proofs, and then explained how they made intuitive sense, that things started to click for me. I don't have anything specific to say, except that i'm really looking forward to this series!

baikal

I think these guys have a nice set of visualizations for probability: <a href="http://students.brown.edu/seeing-theory/" rel="nofollow noopener" target="_blank">http://students.brown.edu/seeing-theory/</a> And there is also "Statistics for hackers" talk by Jake Vanderplas: Vid: <a href="https://www.youtube.com/watch?v=Iq9DzN6mvYA" rel="nofollow noopener" target="_blank">https://www.youtube.com/watch?v=Iq9DzN6mvYA</a> Slides: <a href="https://speakerdeck.com/jakevdp/statistics-for-hackers" rel="nofollow noopener" target="_blank">https://speakerdeck.com/jakevdp/statistics-for-hackers</a> Or his talk regarding Frequentism vs Bayesianism: Vid: <a href="https://www.youtube.com/watch?v=KhAUfqhLakw" rel="nofollow noopener" target="_blank">https://www.youtube.com/watch?v=KhAUfqhLakw</a> Slides: <a href="https://speakerdeck.com/jakevdp/frequentism-and-bayesianism-whats-the-big-deal-scipy-2014" rel="nofollow noopener" target="_blank">https://speakerdeck.com/jakevdp/frequentism-and-bayesianism-whats-the-big-deal-scipy-2014</a> The last video of Veritasium regarding Bayes' Theorem was also very simple to follow.

FJNav

Classic entropy (ideal gas) would look wonderful in your videos. I wonder, though, when/if you will ever tackle differential geometry.

João Streibel

I feel like Bayes' Rule is necessary. Explaining different probability distributions would be awesome. And then maybe explaining some statistical tests/estimation?

faisal

Statistics is one of those subjects I studied and excelled in, but never actually understood. More so than calculus, it will be difficult to explain. I'd like at each stage of the video if you did a short bit on what 'seems' obvious to the human brain, and why - followed up with how things really work. I suppose what I'm saying is that explaining why the brain expects things to work one way, will help when you explain why it works another. Just my 2c.

Christopher Burke

I have 2 exams on this in two weeks, I don't suppose you can finish the series by then can you? :P

Jack Shaw

This is the best explanation of Bayes' theorem I've found, and was really helpful for me: <a href="https://arbital.com/p/bayes_rule/?l=1zq" rel="nofollow noopener" target="_blank">https://arbital.com/p/bayes_rule/?l=1zq</a>

Bayes theorem cannot be hit too hard -- because it is so often counter-intuitive and people often confuse P(A|B) with P(B|A). Also, I'd love to see a discussion of Chi-Squared and Poisson distributions.

Jason Paul DeMont

This is my favourite explanation of information theory, and it's very visual! <a href="http://colah.github.io/posts/2015-09-Visual-Information/" rel="nofollow noopener" target="_blank">http://colah.github.io/posts/2015-09-Visual-Information/</a>

Frequentist vs Bayesian perspective, i.e., objective frequencies as N->oo vs. subjective beliefs. How one can use bets to measure this belief (this was an eye opener for me). Bayes(-Price-Laplace)' theorem and its use in decision making, search, etc. Marginalization (could be great to visualize). Correlation vs causality. Probability mass vs density and Dirac mixtures as the link between the two. Expectations and higher order moments and why a mean alone is almost always useless. Different types of variables (nominal vs ordinal vs interval and ratio scale). And a bit unrealistic, but this was a big wtf for me: Simpson's paradox.

Matthias Richter

Grant, I think that shorter videos are better. "Chunk" the information as they say.

Jason Paul DeMont

I've never heard of the theorem before, not having taken any courses focusing on probability, but I have a feeling it could be proven intuitively using geometry (with arbitrary dimensions, of course).

Darwin Kim

I'd love to get your take on the density convolution formula and in general operations on random variables. After that, definitely intuitions and other angles on Bayesian inference.

Peter Bork

The most powerful way I've found for thinking about probability distributions is as programs that generate data using some number of random decisions. Or more concretely, the program involves occasionally flipping a coin and using the random answer. Then the entropy of the distribution is the average number of coinflips taken to generate a datapoint. Seeing distributions as probabilistic programs is a great way to build on knowledge that people may already have about programming.
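
A small sketch of that "distribution as a coin-flipping program" view, using a made-up example distribution {a: 1/2, b: 1/4, c: 1/4}. For probabilities that are powers of 1/2 like these, the average number of flips matches the entropy exactly; for other distributions the average flip count is at least the entropy rather than equal to it. The function names are hypothetical.

```python
import math
import random
from itertools import product

def sample(flip):
    """Generate one symbol from {a: 1/2, b: 1/4, c: 1/4} using fair coin flips.

    `flip` is any zero-argument function returning True (heads) or False (tails).
    Returns the symbol and the number of flips used.
    """
    if flip():          # first flip heads -> 'a' (probability 1/2)
        return "a", 1
    if flip():          # tails then heads -> 'b' (probability 1/4)
        return "b", 2
    return "c", 2       # tails twice -> 'c' (probability 1/4)

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_flips():
    """Exact average flip count, found by enumerating all equally likely two-flip sequences."""
    counts = []
    for flips in product([True, False], repeat=2):
        it = iter(flips)
        _, used = sample(lambda: next(it))
        counts.append(used)
    return sum(counts) / len(counts)

# Drawing one random data point with a simulated fair coin:
symbol, used = sample(lambda: random.random() < 0.5)
```

Here `expected_flips()` and `entropy_bits([0.5, 0.25, 0.25])` both evaluate to 1.5.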

Richard Futrell

<a href="https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)" rel="nofollow noopener" target="_blank">https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)</a> I think is important because it shows just how difficult it is to be rigorous in our statement of (what seem like straightforward) probability questions.

Manuel Garcia

I would love to see explanations of Bayes' theorem, Bayesian Inference, and in particular, applications involving Markov Chain/Monte Carlo methods. This one idea is such a magnificently powerful and adaptable tool that, once I really understood it, I've found myriad applications for these types of algorithms in my work (am a Software Engineer).

Vecht

If you want to include a more practical video on probability, something on Bayesianism and Frequentism, and perhaps the scientific implications, would be really cool. OTOH I feel like moving in this direction would be like succumbing to the forces of talking about controversial things (relative to pure maths). Interpretations of probability aren't quite as controversial as interpretations of quantum mechanics, but they still cause a lot of confusion and disagreement.

The big focus I have always had in probability is the counting angle: the probability is always the number of ways something can happen over the number of ways anything could happen. No matter what maths you need or choose to use to enumerate things clearly, it always comes back to that beautiful fraction. Probability trees, nCr, nPr, factorials and all the rest have their own stories to tell but within probability they are always just the means to find either the top or the bottom of that fraction. Of course this approach has its own limits and blind spots, such as no appreciation for the negation and what to do with the fraction once you find it. But overall I think that this fraction is the core of probability as I see it, even to this day. From that ideal fraction you can then go on to "best approximation of it" that marks the transition to statistics; because outside of maths questions or very limited events the concept at the heart of the denominator becomes infeasible. In the real world we have no good handle on 'the number of ways anything could happen'. My piece on probability spoken, I eagerly await each new video. Keep doing what you do!

Kerigorrical

Bayes nets might be a neat one off somewhere in there but nothing more. In general I think it's cool to understand a) what a probability distribution is and b) understand some of the common ones. I use the binomial distribution pretty often nowadays. There's all these little cases where it comes up. Say I want to know how long it's going to take me to catch pokemon X in one of the pokemon games. I look up the probability of running into one and then I use the binomial distribution to figure out where the mean is. Or sometimes there isn't a particular pokemon where I'm looking for it and the binomial distribution lets me know when I should say "well this isn't very probable, maybe I should check to make sure I didn't make a mistake". Also I think showing how you can think geometrically about probability distributions is neat. If you think about a triangle you can assign every point on a triangle to a distribution. In my little works I found this really useful. It let me take convex optimization techniques and apply them to finding optimal probability distributions. I think talking about the Kolmogorov axioms is important. Also showing how probability distributions are defined by measures not functions is important IMO. That about sums it up for me: 1) show how distributions can be used in everyday life. 2) show how geometric distributions are 3) explain a bit more of the formalism behind distributions and include a basic bit of measure theory.

Jake Ehrlich

Entropy didn't make sense to me until I read (and spent a long time thinking about!) Gromov's explanation of where it comes from: <a href="https://plus.google.com/+johncbaez999/posts/RUSdib9dnQa" rel="nofollow noopener" target="_blank">https://plus.google.com/+johncbaez999/posts/RUSdib9dnQa</a>

Sebastien Zany

I'd like to see some abstract definitions and proofs, alongside concrete examples. Off the top of my head, a good first topic might be tree diagrams, which animation can really do justice. Motivate the study of combinatorics with an example like "what's the probability of having 2 boys and 2 girls out of 4 children?" You can do it with a tree diagram, but then point out that it's better to view all outcomes as possibilities in an "event space" of 16, and the solution is best found by finding a good way to "count" how many "events" fall into the "2-boys-2-girls" subset. A sad thing for me to see is little or no unification of discrete and continuous probability distributions. I'm sure you'll do that justice. As a great example, might I recommend something like, "find the probability that if I roll two 6-sided dice, I get a number greater than 8." Once again, the tree diagram is bad, but the problem is very nice if you visualize the event space as a 6-by-6 grid. Then you see that what we really want to do is measure the size of the "greater-than-8" subset relative to the size of the event space. With some animation magic, you can go from this problem to an engineering problem I've seen: suppose a satellite will die when both of its on-board computers fail. The satellite goes up with two computers. When the first one fails, the second one turns on. Both computers have a random uniformly-distributed life expectancy of 0 - 10 years. What is the probability the satellite stays alive for more than 16 years? I don't think you need me to explain how this can be explained by discretization, and I think is about as good of a motivation as you can get for studying multi-variable Riemann sums. What other topics in probability did you have in mind though, Grant?
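
Both problems above come down to measuring a subset of a square, which can be sketched as follows. The dice case counts cells of the 6-by-6 grid; the satellite case cuts the 10-by-10 square of lifetimes into a fine grid and counts cells whose midpoint sum exceeds 16 (a midpoint Riemann sum). The sketch assumes the two computer lifetimes are independent, which the comment does not state outright, and the function names are made up.

```python
from fractions import Fraction

def dice_greater_than(threshold, sides=6):
    """Fraction of the sides-by-sides grid of outcomes whose sum exceeds `threshold`."""
    hits = sum(
        1
        for a in range(1, sides + 1)
        for b in range(1, sides + 1)
        if a + b > threshold
    )
    return Fraction(hits, sides * sides)

def satellite_survives(years=16, max_life=10, n=200):
    """Midpoint Riemann-sum estimate of P(X + Y > years).

    Assumption: X and Y are independent and uniform on [0, max_life].
    The square is cut into an n-by-n grid and a cell is counted when the
    sum of its midpoint coordinates exceeds `years`.
    """
    hits = 0
    for i in range(n):
        for j in range(n):
            # Midpoints are (i + 0.5) * max_life / n and (j + 0.5) * max_life / n.
            # Their sum exceeds `years` exactly when the integer test below holds,
            # which avoids floating-point ties on the boundary.
            if (i + j + 1) * max_life > years * n:
                hits += 1
    return hits / (n * n)
```

Under the independence assumption the exact answer for the satellite is the area of a right triangle with legs of length 4 divided by the area of the square, 8/100. The grid estimate approaches that value as `n` grows.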

Jacob Mirra

Related to the second point, what does it intuitively mean for a distribution not to have an expectation, like power laws, whose mean rises as you continue to take a random element (in expectation)? (Intuitive explanations of different distributions and when to use them would be *delightful*.)

Expected value is a really important one with a lot of application to decisions people make. Basically, just try to include as many examples as possible of how terrible humans are at estimating and weighting probabilities--all the fallacies in thinking. Another thing that is hard to square sometimes is thinking about probabilities theoretically vs experimentally (Bayesian vs. Frequentist interpretations)

Chuck Larrieu

One of my favorite posts, about information theory but grazes on probability: <a href="http://colah.github.io/posts/2015-09-Visual-Information/" rel="nofollow noopener" target="_blank">http://colah.github.io/posts/2015-09-Visual-Information/</a>

Darwin Kim

1) Visual explanation of Bayes theorem (probably a given to be included). 2) Visual explanation of conditional probability, joint probability, etc. (also probably a given to be included) 3) Connection to statistics. Probably not a given to be included, but I feel relevant. 4) Central Limit Theorem. Probably a given. 5) Connection to probability in everyday life: polls (stats again!), scientific studies, gambling.

Matthew O'Connor

I don't know how deeply into measure theory you want to go with these videos, but the direct connection between "measures" or volumes/areas and the abstract probabilities we learned about in high school definitely helped it click for me. and of course, this helps with the visualization of probability spaces as subsets (i guess like the Venn diagrams we started with back then)

Majed Samad

I've always found the Buffon needle problem to be a very concrete and intuitive way of understanding probability densities, and it has definitely helped my understanding of the more abstract realm of quantum mechanics. Speaking of which... it would be really cool to see a visual take on the uncertainty principle! But I totally get it if that would take the series too far afield. Also: I think a discussion of Bertrand's paradox would be really great!

Jeffrey Samuelson

I'm sure you don't need to be told to do this, but helping people to build intuition for things, which I think you did really well with Essence of Calculus, would definitely be really useful. Probability's probably more accessible than calculus for building up intuition, but I've seen a sizeable minority of maths students not really understanding why things in probability are the way they are, and I think it's harder for them to "get away" with applying things blindly in probability than in calculus (might be just me). Thanks!

Maximum likelihood estimation please!

Abay Bektursun

For those keen on exploring CLT and comfortable with programming, check out Peter Norvig's notebook. It appears around the end. <a href="http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb" rel="nofollow noopener" target="_blank">http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb</a>

Bigyan Bhar

Defying the powers that be!

Alexey Badalov

I think Bayes' Theorem has to be there for sure.

Biran Falk-Dotan

I agree. Even if not a proof, definitely explore why the Central Limit Theorem is true.

Scott Ramsay

Congratulations on completing the Calculus series. I am about halfway through. Great work. I have enjoyed Norvig's ipython notebook on probability: <a href="http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb" rel="nofollow noopener" target="_blank">http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb</a> . Something along those lines which covers distributions would be a valuable resource for many. Principles of Statistics by Bulmer is another great resource which clarifies basic ideas; animations and infographics on that would be wonderful. Being a programmer, I enjoy the ability to code up some of these concepts. Links to some examples along those lines would be highly appreciated. Keep up the good work!

Bigyan Bhar

I did well in my college statistics class, which was not calculus based. But I haven't been able to see how statistics makes it possible to understand quantum mechanics, thermodynamics, and the other sciences that apparently are less soluble when looking for discrete answers.

Burt Humburg

Why does nature tend to the normal distribution, why is the Cauchy distribution so weird and why isn't E(X) the same as the average value of a function? Great series in calc btw, but I wished you would've covered integration a bit more, like integration by parts and by substitution. Love you man.

Jack

Something that always excited me was reading E.T.Jaynes' derivation of the axioms of probability theory from verbal descriptions of what we needed a calculus of probability for. But that involves a little multi-variable calculus, so is probably beyond the scope. Something that people often get confused by in my experience is Bayes' theorem, which I'd love there to be an intuitive, video explanation of :-)

Maybe this is a lot to ask, but it would be great to see a proof of the Central Limit Theorem. I've seen it presented without proof in classes, so it would be nice to get a feel for where it comes from if that's something you can do visually.

Biran Falk-Dotan

