3blue1brown from patreon


Bayes' rule!

Hey everyone,

Although I have yet to decide on the full table of contents for the probability series, the most obvious video to include is one covering Bayes' rule. So I decided to create things out of order and begin with this one. I expect it will show up as chapter 3 or 4 in the series, and it will definitely be preceded by some discussion of probability basics, conditional probabilities, and likely something on counting.

In fact, I decided to include two videos on Bayes' rule, since it's just that important. This one covers examples where the rule aligns with intuitive judgements, and the second one discusses the classic example where it does not, which I describe more in its own post. The main focus of this one is to provide a visual that helps you intuit how a posterior distribution changes as you tweak the prior and the conditional probabilities affecting it.
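
For anyone who wants to play with that tweaking numerically, here is a minimal sketch for a binary hypothesis. The function name `posterior` and all the numbers are assumptions chosen for illustration; they are not the values used in the video.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis: returns P(hypothesis | evidence)."""
    # Joint probability of (hypothesis true, evidence seen).
    joint_true = prior * p_evidence_if_true
    # Joint probability of (hypothesis false, evidence seen).
    joint_false = (1.0 - prior) * p_evidence_if_false
    # Normalize by the total probability of seeing the evidence.
    return joint_true / (joint_true + joint_false)


# Hypothetical conditionals: the evidence shows up 80% of the time when the
# hypothesis is true and 20% of the time when it is false.
for prior in (0.1, 0.5, 0.9):
    print(prior, round(posterior(prior, 0.8, 0.2), 3))
```

Holding the two conditionals fixed and sweeping the prior, or the reverse, gives a rough numerical counterpart to sliding the rectangles around.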

As always, I'm eager to hear your thoughts/suggestions/hopes for future videos.

-Grant

Comments

Great video! When will the Bayesian Network video be released?

Roshni

This is also a very interesting resource on Bayes' rule: https://arbital.com/p/bayes_rule_guide/

Fela Winkelmolen

I like to think of Bayes' rule as made up of two factors: (1) the prior probability, multiplied by (2) a factor that is greater than 1 if the observed information is more likely under the given belief than over all beliefs, and smaller than 1 if it is less likely under the given belief than over all beliefs. The latter factor can be written as P(I|B)/P(I), which makes sense given its meaning: it indicates how much more likely the observed information is under the given belief, compared to not assuming any belief, which should intuitively be proportional to how much and in what direction we update our belief. For me this makes Bayes' rule very easy to remember.

Fela Winkelmolen
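
Written out, the two-factor reading in the comment above looks roughly like this, with B the belief and I the observed information. The sum over B' (every competing belief) is an added assumption that the beliefs are mutually exclusive and exhaustive.

```latex
P(B \mid I) \;=\; \underbrace{P(B)}_{\text{prior}} \cdot
\underbrace{\frac{P(I \mid B)}{P(I)}}_{\text{update factor}},
\qquad
P(I) \;=\; \sum_{B'} P(I \mid B')\, P(B')
```

The update factor exceeds 1 exactly when P(I|B) > P(I), that is, when the information is more likely under B than it is on average over all beliefs.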

I like thinking about "likelihood" in a slightly more intuitive way than as a subject of statistical inference. If P(B) is a prior probability and P(B|I) is a posterior probability, then P(I|B), the likelihood, is simply the value (the impact) of the new information we received, which transforms our beliefs. In other words: "How critical is this information, given what we want to measure?" If new information transforms our beliefs only slightly, it is not critical, and vice versa.

Arsen Torosyan

I'm not comfortable with the cards, poker and "flush" lingo that you used in the video, and I don't think it's as popular in Asia as it is in the US. Could you try to explain Bayes' rule with a very general example like population, diseases, education, etc.? I think that would cater to a much wider audience in the world. This is just what I think, and I'll probably watch this and learn the rule again even if you can't make the video again, so I'm glad, anyway, that you made this video for us.

Sunil Subramanian

Cards could get accidentally exposed though.

Roman Odaisky

I'm a bit uncomfortable with the way you dismiss the term "likelihood". I do agree that it's a very confusing term, as many beginners think "likelihood" is just another synonym for probability. But the point about the likelihood term in Bayes' rule is that P($$|flush) is *not* a conditional probability distribution; rather, it's a probability-valued function which takes values in a set of hypotheses/parameters {flush, no-flush}, and for each one, outputs the probability of the observed data, which is fixed ($$). So I think suggesting to people that it's a probability with a confusing name doesn't quite live up to the standards you've set regarding the teaching of subtle concepts in EOC and EOLA. It really is a fundamental concept in inference -- one reason is that it is the only place in the mathematics where the data-generating process enters; it is solely responsible for capturing information from scientific observations and feeding them into the inference process. I believe that, under certain assumptions (e.g. the model is correct!), then if I can give you the likelihood function over a set of hypotheses/parameter space for a fixed data set, then that contains all the information in the data set regarding the parameters, and you could discard the data set. (I think the Neyman-Pearson lemma is in that territory).

Dan Davison
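
The distinction drawn in the comment above can be checked with a small sketch. The `model` table and its numbers are hypothetical, not taken from the video: for a fixed hypothesis, the probabilities over possible data sum to 1, but with the data fixed at "bet" the values across hypotheses need not.

```python
# Hypothetical P(data | hypothesis) table; each row is a proper distribution.
model = {
    "flush": {"bet": 0.9, "no bet": 0.1},
    "no flush": {"bet": 0.3, "no bet": 0.7},
}


def likelihood(observed):
    """Likelihood function: maps each hypothesis to P(observed | hypothesis)."""
    # The data is held fixed; only the hypothesis varies.
    return {hypothesis: dist[observed] for hypothesis, dist in model.items()}


lik = likelihood("bet")
print(lik)                           # one value per hypothesis
print(sum(lik.values()))             # not constrained to equal 1
print(sum(model["flush"].values()))  # a row of the table does sum to 1
```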

Good point. I guess there's a certain assumption of objective quality here that doesn't really apply to music.

3blue1brown

That's a great point. One's friends might indeed sincerely believe that hearing something from the next earth-shaking musical shift is distasteful upon first hearing.

Don Sanderson

Great question, I'll try to get to parameter estimation in a later video. At least for this one intro, it seemed best just to start with judging the truth/falsehood of a binary hypothesis.

3blue1brown

I am amazed time and again by how you come up with ideas for visualizing mathematical relationships. Looking forward to the next episodes.

Lionel Pöffel

I'm having a hard time connecting this to scientific inference. If I want to estimate a parameter from different data sets, what would P(x) mean? I imagine updating an entire probability distribution, not a single number.

Julian
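
One way to picture the question above is a grid approximation, sketched here for a coin's unknown heads probability. The function `update_grid`, the 11-point grid, and the 7-heads-3-tails data are all assumptions for illustration. Under this reading, P(x) is the normalizing constant: the probability of the data averaged over every candidate parameter value.

```python
def update_grid(prior, thetas, heads, tails):
    """Update a whole discrete distribution over a coin's heads probability."""
    # Prior weight times the likelihood of the observed flips at each theta.
    unnormalized = [
        p * theta**heads * (1 - theta) ** tails
        for p, theta in zip(prior, thetas)
    ]
    # This sum plays the role of P(x), up to a constant binomial factor
    # that cancels when normalizing.
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


thetas = [i / 10 for i in range(11)]     # candidate parameter values 0.0 .. 1.0
prior = [1 / len(thetas)] * len(thetas)  # flat prior over the grid
post = update_grid(prior, thetas, heads=7, tails=3)
best = thetas[post.index(max(post))]     # grid point with the most posterior mass
print(best)
```

Feeding `post` back in as the prior for a second data set gives the same result as updating on both data sets at once, which is the sense in which the whole distribution, not a single number, gets updated.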

I don't know how easy it is for most mortals to "just change these rectangles to add up to 1 unit high and have the same width without changing their area". For example, suppose you showed a layperson a square and asked them to draw 2 or 3 rectangles on top of that square, sharing one corner chunk and of a different aspect ratio (e.g. narrower or taller) but with the same total area. I expect that they would at least know that wider has to couple with shorter and taller has to couple with skinnier, but if they don't have a chance to really think the edge cases through, they would form corners distal from the fixed corner that trace a locus almost like a line instead of a hyperbola. That is, they would favor "take one unit from the height and add one unit to the width" as their sense of fair redistribution, because they are more instinctively accustomed to additive redistribution than multiplicative. Thus they would wind up intuitively drawn to conserving perimeter instead of area, and find that to be a lot less mental work and to come more naturally. Showing them an edge case like "if you keep doing that, won't you wind up with a toothpick sliver only twice as long as the square, so basically zero area?" leads them to understand that eventually small shifts in a narrow dimension lead to great leaps in the long one. But that's still not how their intuitions are best trained to redistribute.

Jesse Thompson

Excellent presentation as always, Grant. One minor difficulty I had with it is the 1% probability of the musician's friends saying that the music is bad when it is really good. This was described as the possibility of them being sadistic jerks, but that seems like only the smaller part of that space. They might also really not like the music but not be good judges of what might sell. Had my best friend played the first heavy metal piece for me back in the day, I would have told him that I hated it. Empirically, the probability that I will actually think a random track is something I would want to hear again is not that high, even if it is considered good work by others. I suppose this is a small thing, but I found myself having to spend mental energy reversing the designation of the small rectangle as "they are sadistic jerks" to include the missing possibilities that the music is actually good but they don't think so, and ultimately had to back up the video to the point where my concentration was derailed.

Nicholas Sterling

I'm going to be honest: the look on the pi creature's face when he was contemplating the idea that his music might be bad actually made me cry. (I am feeling kind of tired and emotional at the moment, in fairness, but still.)

Tom Hawking

Agreed, it does seem silly to have a name. I guess that's kind of the point, though: in the same way that you shouldn't "memorize" the butterfly method or FOIL and should instead just reason through the relevant problem, the same applies here. The double counting perspective you bring up is indeed a great one, and I plan to mention it in a video on conditional probabilities to precede this. The main benefit I see to calling out the Bayes' rule formula itself as something special is the idea of framing problems in terms of priors, new information and posteriors.

3blue1brown

The top post on this Stack Exchange question is interesting. It seems most people write/say Bayes' because that's how it was in earlier works before things were standardized. https://english.stackexchange.com/questions/92267/bayes-theorem-or-bayess-theorem-similarly-charles-law-or-charless-law Most things I read have it spelled as Bayes', but I'll look into it more to see if the other is really more correct.

3blue1brown

Numerical Analysis is probably the biggest one for me. Also things that I love as tools but never excited me on their own: Commutative Algebra, dynamical systems, harmonic analysis.

Sean Bibby

There are some forms of poker where some cards are dealt face up. But you never see their face down cards in any version until the end if they go to showdown. You never see a folded hand.

Scott

Just a note on the 'I don't expect you to remember this': I've actually never found Bayes' rule hard to remember (actually I'd put it in the 'why does this have a name again?' bucket with things like the butterfly method or FOIL or what have you). The way I think about it that makes it so obvious is an easy double counting argument. You count the probability of both events occurring (e.g. P(hearts and $$)). On the one hand this is P(hearts)*P($$ given hearts); on the other hand it's also P($$)*P(hearts given $$), since that's (literally?) the definition of conditional probability. So both terms are equal and Bayes' rule follows by rearrangement.

Jan Nienhaus
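
Spelled out in symbols, the double counting argument above runs as follows. The only added assumption is that P($$) is nonzero, so that dividing by it is allowed.

```latex
\begin{align*}
P(\text{hearts} \cap \$\$)
  &= P(\text{hearts})\, P(\$\$ \mid \text{hearts})
   = P(\$\$)\, P(\text{hearts} \mid \$\$) \\[4pt]
\implies\quad
P(\text{hearts} \mid \$\$)
  &= \frac{P(\text{hearts})\, P(\$\$ \mid \text{hearts})}{P(\$\$)}
\end{align*}
```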

Great video as always. But I think there is a typo in "Bayes' rule": the correct form according to the OED (https://en.oxforddictionaries.com/punctuation/apostrophe#apostrophes_showing_possession) is "Bayes's rule".

Owen Allemang

In a hand with lots of players, do some versions of poker allow you to see/know other players' folded hands so as to have a partial "count" of the other hearts out there?

Don Sanderson

Ah, right, forgot about that.

3blue1brown

Commenting while the video goes. I think the example is good, and the subjectivity doesn't bother me. It's better than the "standard" False Positive / False Negative analysis for disease testing I'm sure you'd find in every textbook. I'm going to assume at this point in the series you've already covered binomial notation. Sure. But based on the illustration at 3:06, I'm hoping you've also thoroughly covered conditional probability, because these different regions and shading would be otherwise bewildering. 3:23 is a great moment. I was also thinking, couldn't she also get an estimate on how likely I am to call her bluff.... and on-and-on it would go. I wonder if I might model this with a Markov chain and find some kind of equilibrium point... whatever, that's not the point of the video. 5:59 visualizing the result of adjusting one's subjective assumptions, is wonderful. "A thumbs up from an honest friend means a lot more" Nice. And on this topic, Grant. Your work sucks. Stop doing it. jk. Nice work. No major complaints, and in fact you just made me really want to write AI.

Jacob Mirra

Indeed I do have a copy of Cox's book. It should be a good source when I get to entropy and the principle of maximizing entropy.

3blue1brown

Totally agree, this is why I want to include them even if they are not "essential" to probability. It's just such a nice structure.

3blue1brown

Good catch! Accidentally froze the frame during a blink :)

3blue1brown

Thanks so much Sean. What's a topic you always wish was less dry?

3blue1brown

@6:20, people don't count cards in poker, just in blackjack. There's a new shuffle every deal in poker.

Scott

This is just so so great. Can't wait to see more from this probability series!

Mark Mulvey

At 2:16 (first sighting of Bayes' rule), P(B) should be P(Belief) for consistency. IMO the music example is better than the poker example and could come first. In addition, the poker example might be a bit confusing: in a poker game, you would consider not only whether there is a bet, but also the amount of the bet, i.e., the data term is a continuous RV. The example hints at this ("high bet"), but treats the data term as a discrete, binary RV. Even in this example, the data term would have at least three outcomes: "high bet", "low bet", "no bet". To fix this, you could just say "bet" without quantifying it. Other than that, I like the example, because it illustrates that you can quantify beliefs by letting people bet on outcomes, and it could pave the way for a discussion of risk (= expected cost) and risk minimization.

Matthias Richter

Good video. I especially liked the practical example at the end. It was much more engaging than the pretty standard introduction with a slight geometric tweak. To catch the viewer's interest and attention, I think it would be better to swap the two: bring the guitar example first and then the more specific one afterwards.

Hi Grant, if you want some ideas for the very early videos in the series, see if you can get your hands on a copy of a book called "The Algebra of Probable Inference" by Richard T. Cox. It's a slim book, quite old, but written in a plain style. I found the middle and last chapters (on entropy and expectation) to be very helpful.

Peter Dulimov

https://media0.giphy.com/media/11sBLVxNs7v6WA/giphy.gif I love you! I'm currently writing a master's thesis about Bayesian Networks and I can't wait to see how you explain them. I would love to see them from a different perspective, as there are unfortunately limited resources about them on the internet.

Michał Łuniewski

At 10:20 the eyes of the third pi creature from the left disappear and then eventually return.

I am hoping you can do something about statistical hypothesis testing: how it works and what it tells us when it doesn't work. And maybe throw in regression analysis as well, although that deserves a series of its own. Or maybe even category theory.

Anton Novikov

Life changing videos. I feel smarter every time I watch one of these and feel like I was being cheated this whole time!

You stud muffin. These vids are lit.

Ian Wessen

Wow, I was worried that a probability series would be kind of dry (since I so far have found every exposition on probability dry, probably why I've studied it the least out of any math). But this is really great! The conversational tone is more effective than ever here, and the pacing is spot on. I'm thrilled for the rest of the series, Grant.

Sean Bibby

Really great way to visualize Bayes' rule, I love it!

Edan Maor

