Chapter 4: Expecting the Improbable  
"Attention passengers."  
[1:] "This train is being held in the station due to police activity at 50th Street; someone has apparently fallen onto the tracks. We will be moving as soon as it's possible." 

[2:] Thus began one of my most striking and memorable encounters with probability: an important object lesson in how abstract mathematical concepts should intrude on everyday life. 

[3:] I had just been asked to write an article for a science magazine on an astronomical controversy concerning quasars. The standard model, described in Chapter 3, holds that quasars are very distant sources of enormous power driven by the accretion of material onto a massive black hole. This model invokes extreme conditions (a black hole of a billion solar masses or more, ingesting more than 100 million trillion tons of matter each day), but the laws of physics used to describe these processes are standard ones that have been tested in Earthbound laboratories and through other astronomical observations. 

[4:] Some time ago, however, there was an alternative hypothesis: that quasars are relatively nearby objects ejected from neighboring galaxies. Their proximity, astronomically speaking, meant they had to produce far less power to appear as bright as they do. This made their energy requirements much less extreme, but at a significant cost: to explain the large redshifts evident in their light, "new" (and unspecified) physics needed to be invoked. The principal advocate for this view was Halton Arp. In support of his unconventional viewpoint, he produced striking discoveries of quasars aligned with nearby galaxies, strongly suggesting a physical association between the two. Furthermore, Arp found improbable numerical relationships among adjacent objects. In one of his celebrated examples, two triplets of quasars were found lying in straight lines on the sky with separations in the ratio 1:3:4. His claim was that these alignments were too statistically improbable to be coincidences and so must represent some new and unknown physics. 

[5:] I must confess to being quite impressed by this sort of coincidence and was pondering its potential significance for my article's conclusion as I walked to the subway station near my apartment to await the train to Columbia. At the first stop after boarding the train, I heard the announcement quoted above. Later that day, as I was riding the Staten Island ferry across New York harbor to my mother-in-law's house, I thought about the statistical likelihood of the subway accident: What were the odds of my arriving within fifteen minutes of such an event? 

[6:] In New York City each year, roughly ten people fall, jump, or are pushed in front of trains from one of more than 400 platforms. The probability of my witnessing such an event involves the number of events per year, the number of subway stations in New York, and the fraction of time I spend each year in the subway system. The answer is about 1 in 4000 (HUH?). Thus, one would have to live in New York City for about 2000 years to have a 50:50 chance of witnessing one such event. Yet I had just done so! 

[7:] Two days later, on my way back to Manhattan, the train I was riding toward the Staten Island terminal of the ferry stopped for an unusually long time at one of the stations. Someone had just fallen under the southbound train on the adjacent track, and heavy equipment was being called in to free the victim. 

[8:] As you will learn below, the probability of several unlikely events occurring by chance is the product of the probabilities for each event individually. My chances of being present at two such subway accidents in two days at two separate stations are about 1 in 16 million (HUH?). 

[9:] Now, suppose you were on a jury for which the prosecutor intoned: "The odds of the defendant (me, in this case) being present at two subway deaths in two days are 1 in 16 million...unless, of course, he is the perpetrator." Would you vote to convict? 

[10:] The purpose of this chapter is to equip you with the skills to make such a judgement. It may also help you decide whether or not to buy lottery tickets, provide you with a quantitative basis for skepticism about claims made in the media, and even offer the opportunity to profit modestly at the expense of your friends who have failed to read what follows. 

PROBABILITY DEFINED  
[11:] Probability and statistics have a terrible reputation that has endured for centuries. The title of the next chapter ("There are lies, damn lies, and statistics..."), coined by the nineteenth-century British Prime Minister Benjamin Disraeli, evinces a particularly robust notion: that probability and statistics are used primarily as tools for manipulation and deception. In addition, the two have a reputation for being abstruse, boring, or both. I would ask, however, that you suspend your belief in these cultural biases for at least a few pages, in order that I might attempt to win you over. For probability now shapes our understanding of the physical world, and statistics stands as the arbiter between our scientific models and the real Universe they purport to describe. They are a central feature of a scientific mind and a bulwark against skulduggery and exploitation. 

[13:] Probability P = (number of outcomes of interest)/(number of possible outcomes) 

[14:] and is therefore always a number between zero and one. Defining carefully all the outcomes of interest, and assuring a complete census of all the possible outcomes, is sometimes the tricky part. But the basic mathematics is a simple ratio. 

[15:] For example, the probability of pulling a King from a deck of 52 playing cards is calculated as follows: 

[16:] number of outcomes of interest: king of clubs + king of diamonds + king of hearts + king of spades = 4 

[17:] number of possible outcomes in pulling one card from 52 different cards = 52 

[18:] Probability of getting a (any) king, P(king) = 4/52 = 1/13. 

[19:] Likewise, the probability of a new classmate you meet having a birthday in the same month as yours is given by 

[20:] P(same birth month) = 1 outcome of interest (your birth month) / 12 possible outcomes (the months) = 1/12. 

[22:] Actually, this last example is not perfectly accurate, since there are not the same number of days in each month, and not the same number of births on each day of the year. For example, even if births were perfectly evenly distributed over all the days, the chances of a February birthday are 28.25/365.25 = 0.0773, whereas a March birthday has a probability of 31/365.25 = 0.0849. Neither of these is equal to 1/12 = 0.0833, although both are close. The approximation 1/12 may well be good enough in many situations. (See Chapter 2 on approximation as a tool of science.) 

[23:] Adding two simple rules about combining probabilities to the basic definition provides a powerful set of tools to predict or assess the likely outcome in many situations. 

[25:] Rule 1. The total probability of any one of several possible mutually exclusive outcomes happening is the sum of the probabilities of each outcome. In a more mathematical form, 

P(A or B or C ...) = P(A) + P(B) + P(C) + ... 

[27:] For example, the probability of pulling a king OR a queen from a deck of cards is 

[28:] P(K or Q) = 4/52 (for a king, as above) + 4/52 (for a queen) = 8/52 = 2/13; 

[29:] the probability of a new classmate having the same birthday month as either you or your mother (assuming hers is different from yours) is 1/12 + 1/12 = 1/6. 

[30:] This rule is consistent with the definition of probability, since the probabilities of all the possible outcomes must sum to 1.0. The probability of getting a six when rolling a die is 

[31:] 1 outcome of interest (a 6) / 6 possible outcomes = 1/6. 

[32:] The probability of getting a 1 or a 2 or a 3 or a 4 or a 5 or a 6 when rolling a die is 1/6+1/6+1/6+1/6+1/6+1/6 = 6/6 = 1. 
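As a quick check on the definition and on Rule 1, the same arithmetic can be sketched in Python with exact fractions. The helper name `probability` is purely illustrative and not part of the text; it simply encodes the ratio of outcomes of interest to possible outcomes.

```python
from fractions import Fraction


def probability(outcomes_of_interest, possible_outcomes):
    """Basic definition: a ratio of two counts, kept exact as a fraction."""
    return Fraction(outcomes_of_interest, possible_outcomes)


# Four kings (or four queens) among 52 cards.
p_king = probability(4, 52)
p_queen = probability(4, 52)

# Rule 1: mutually exclusive outcomes add.
p_king_or_queen = p_king + p_queen

# Adding the six mutually exclusive faces of a die must cover every outcome.
p_any_face = sum(probability(1, 6) for _ in range(6))

print(p_king, p_king_or_queen, p_any_face)
```

Using `Fraction` rather than floating-point numbers keeps results such as 2/13 in the same form the text uses.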

[33:] Rule 2. The total probability of several independent outcomes all happening is the product of the probabilities of each outcome. Mathematically speaking, 

P(A and B and C ...) = P(A) x P(B) x P(C) x ... 

[35:] For example, the probability of the first two new classmates you meet both sharing your birth month is, then, 1/12 x 1/12 = 1/144 = 0.0069, less than 1 chance in a hundred (assuming, of course, that your first two acquaintances are not twins). If this actually happened to you, you might well find it a remarkable coincidence, just as Dr. Arp did when he found all those aligned quasars, and you might even attach some cosmic significance to the event. If that would be your inclination, be sure to read the section below entitled "Rare Things Happen All the Time". 

[36:] So, is the probability of pulling a king AND a queen from a deck of cards 

P(K and Q) = 4/52 x 4/52 = 16/2704 = 0.0059? It is not. 

[38:] This is a good example of how careful one must be in defining both the outcomes of interest and the total number of possible outcomes when calculating a probability. Let's examine the situation carefully. If you start with a deck of 52 cards containing four queens and four kings and your goal is to draw two cards and end up with one of each, the first card drawn can be either a king OR a queen. Using rule number 1, the probability of that outcome is 

[39:] 4/52 (for kings) + 4/52 (for queens) = 8/52. Having produced one of the outcomes of interest (say, a queen), one then draws from a deck of only 51 cards (and so, 51 possible outcomes) with 4 remaining outcomes of interest (four possible kings). Thus, the correct probability calculation for getting one king and one queen is 

P(K and Q) = 8/52 x 4/51 = 32/2652 = 0.0121, 

[41:] more than twice the probability of the naive (and incorrect) initial calculation. Needless to say, if your casino ran a game based on pulling kings and queens, and you based the betting odds on the first calculation, you would not be in business long. 

[42:] It is also essential in calculating probabilities to make certain that the question you are asking is clearly stated. For example, the probability of obtaining a king then a queen on two successive draws is different from the probability of holding a king and a queen after two draws. If the order matters (e.g., first a king, then a queen), the probability would be 

[43:] P(K then Q) = 4/52 x 4/51 = 16/2652 = 0.0060. Note that even this ordered draw has a (slightly) higher probability than the initial naive calculation would suggest. 
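One way to guard against miscounting is to let a computer enumerate every possible two-card draw. The following is a minimal sketch under the assumption of a standard 52-card deck. The variable names are illustrative, and the "naive" figure is taken to be the simple product 4/52 x 4/52, which is one reading of the naive calculation mentioned above.

```python
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Every ordered way of drawing two different cards: 52 x 51 possibilities.
ordered_draws = list(permutations(deck, 2))

# Order matters: a king first, then a queen.
king_then_queen = sum(
    1 for first, second in ordered_draws
    if first[0] == "K" and second[0] == "Q"
)

# Order does not matter: holding one king and one queen after two draws.
one_of_each = sum(
    1 for first, second in ordered_draws
    if {first[0], second[0]} == {"K", "Q"}
)

p_king_then_queen = king_then_queen / len(ordered_draws)
p_one_of_each = one_of_each / len(ordered_draws)
p_naive = (4 / 52) * (4 / 52)  # assumed form of the naive calculation

print(round(p_king_then_queen, 4), round(p_one_of_each, 4), round(p_naive, 4))
```

Counting outcomes directly sidesteps the question of which rule applies, at the cost of only working for problems small enough to enumerate.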

[44:] Obviously, the same calculation applies for drawing a queen then a king: 

P(Q then K) = 4/52 x 4/51 = 16/2652 = 0.0060, 

[46:] and, if either order is okay, then, according to rule 1, we add the two and get back to 

P(K and Q) = 16/2652 + 16/2652 = 32/2652 = 0.0121. 

[48:] What is the practical application of all this? Well, how about making money off friends and acquaintances? Or, in my case, my students. For the past twenty years, I have attempted to reinforce my lessons in probability by lightening the wallets of my students using the following demonstration. 

[49:] Let's say there are 60 students in the class. There are 366 possible birthdays for those sixty students (although Feb 29 occurs only 25% as often as the other 365). I offer odds of 20:1 that at least two will have the same birthday, and emphasize the point by laying $20 bills all across the lecture table at the front of the room and inviting people to cover the bets with $1 each. 

[50:] It sounds like an almost irresistible deal: only 60 total birthdays and 366 days to choose from. It seems as though I should be demanding 6:1 odds (since 60/366 is roughly 1 in 6) in my favor rather than giving 20:1 odds to the class. But probabilities are often not what they "seem" to be. Let's follow the above definition and rules, and calculate the odds of two people having the same birthday. 

[51:] A straightforward consequence of our basic definition is that the probability of something happening is one minus the probability of it not happening: the probabilities of all the possible outcomes (in this case, an event happening or not happening, which logically covers all the possibilities) must sum to 1. In this problem, as in many others, it is most convenient to frame the calculation in this way. That is, the probability of two people having the same birthday is one minus the probability of each having a different birthday, or 

[52:] P(two identical birthdays) = 1 - P(all different birthdays). 

[53:] So, let's start with the first person. The odds of them having their own birthday is 1. For the second person, the odds of having a different birthday than person #1 is (ignoring leap years to keep it simple) 364/365 (there are 364 outcomes of interest, each a different birthday, out of 365 possible outcomes). With two dates now taken, the odds of a third person having yet a different birthday is 363/365, and so forth. If we want person #2's date to be different from person #1's and person #3's to be different from #1's and #2's, according to rule 2 we multiply the probabilities. Thus, the probability of two people in any random group having the same birthday is 

[54:] P(two identical birthdays) = 1 - [1 x 364/365 x 363/365 x 362/365 x ... x (365 - N + 1)/365] 

[55:] where N is the total number of people in the group. If you carry this out, you will find that for N=23, the odds are roughly 50:50 that two will have the same birthday. Yes, only 23. So while you may find it amazing when you discover that two people on your dorm floor have the same birthday, if there are more than 23 students on the floor, probability PREDICTS that this should happen more than half the time. In my class of 60, the odds are 

[56:] P(two identical birthdays) = 1 - 0.0059 = 99.4%, or nearly 200:1 in my favor, which explains why I have never lost in twenty years of fleecing my students. 
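The product in the birthday formula is tedious by hand but takes only a few lines of code. This sketch follows the same simplification as the text (365 equally likely birthdays, no leap years); the function name `p_shared_birthday` is an illustrative choice.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every one of `days` birthdays is equally likely."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


for group_size in (22, 23, 60):
    print(group_size, round(p_shared_birthday(group_size), 4))
```

Running it for a range of group sizes shows how quickly the probability climbs: it crosses one half between 22 and 23 people and is within a percent of certainty by 60.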

[57:] The message here (in addition to an obvious moneymaking scheme to perpetrate on your probabilistically challenged relatives at Thanksgiving) is that your "common sense" or "gut feeling" about the likelihood of some "coincidences" may often be very misleading. Beware. 

RARE THINGS HAPPEN ALL THE TIME  
[58:] This section proposes to convince you that its title is not oxymoronic (or just plain moronic). Along the way, I will try to induce some healthy skepticism informed by your growing knowledge of probability. 

[59:] The human mind has evolved over the past million years or so to look for patterns. There are many instances in which this highly developed ability is important for the survival of an individual or of the species. The ability of the eye-brain combination to quickly recognize tiger stripes is undoubtedly an advantage to an individual who wishes to pass on his genetic code to future generations (see Chapter 3). The recognition of recurrent patterns in the length of the day, the temperature, and the amount of rainfall clearly enhances the ability of a society to hunt more efficiently, or even to plant crops for food. But in an age of caged tigers and Krispy Kremes, some of our primitive cognitive skills are no longer essential for survival, and their application in a modern technological society can lead us astray. Indeed, our predilection for seeking patterns can make truly random events seem highly ordered and in need of an "explanation." 

[60:] In fact, humans have a remarkable penchant for "explaining" random events, as well as for simply accepting nonrandom events at face value. Both tendencies can get you into trouble. Take, for example, a simple (and oft-used) stock swindle. 

[61:] Suppose among the mountains of spam in your email inbox, you find a note from me, your trusted science professor. Naturally, you open it. To your surprise, it does not contain an incomprehensible problem set or the announcement of yet another review session for the exam, but a stock tip. In fact, I suggest that, over the next two weeks, you watch a particular stock; I predict it will climb significantly. Thinking this might be an assignment in disguise, you look up the stock online and track it. Indeed, it does go up. 

[62:] Two weeks later, you get another email on the stock. Keep watching that stock, it says; in the next two weeks it will rise again. And it does. A third message cites inside information about the company that leads your moonlighting science professor to suggest a significant selloff in the stock. Lo and behold, right again! In fact, this continues throughout the semester. By the time you go home for break, I'm six for six, and you convince your parents to put the money for your next three years' tuition in my hands; you'll be able to pay for Columbia and buy that red Porsche at graduation. 

[64:] I begin by sending 4,000 letters, one to each Columbia College undergraduate. In 2,000 of them, I predict my stock will go up in the next two weeks, and in the other 2,000 I predict it will plummet. Almost inevitably in this artificially volatile market, it will do one or the other. Let's say it goes up. I then delete the addresses of the 2,000 students to whom I sent the wrong advice and send a new letter to the 2,000 to whom I sent a correct prediction. For 1,000 of them, I predict the stock will continue to rise, while for the remainder I forecast a selloff. Two weeks later, I am starting to look pretty good to 1,000 people, while the remainder never hear from me again. 

[65:] I repeat this process four more times. For 62 people (HUH?) I have reached guru status: I've been right six times in a row. The odds of being that good by chance are only 1/64. The next email to this select group is a request for $300 to remain on my stock newsletter mailing list. And I won't be disappointed: many recipients are likely to accept my purely random guessing (plus my selective application of it) as a sign of deep insight. 
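The bookkeeping behind the swindle is nothing more than repeated halving. Here is a minimal sketch, assuming the mailing list is split evenly each round and the odd person out is dropped; the function name is illustrative.

```python
def surviving_recipients(initial, rounds):
    """Track how many recipients have seen only correct predictions
    after each round, when half the list gets each prediction."""
    history = [initial]
    for _ in range(rounds):
        # Whichever way the stock moves, half the remaining list was told so.
        history.append(history[-1] // 2)
    return history


history = surviving_recipients(4000, 6)
print(history)
print("chance of six correct guesses in a row:", (1 / 2) ** 6)
```

The final entry in the list is the group that has watched six correct calls in a row, even though no prediction involved any skill.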

[66:] This instance of a rare event (picking the direction of a stock price successfully six times in a row) is, of course, artificially produced. And while I am not suggesting that this is standard operating procedure in the world of equities trading, many stock brokers and buyers alike believe they can predict the future. For a while, some even do. But does this necessarily imply either clairvoyance or a genuinely profound understanding of market trends? Not necessarily. 

[67:] To be more precise about the title of this section, I will parse it into two statements: 

[70:] Rare events are certainly always improbable, but improbable events are, as we shall see, not necessarily rare at all. 

[71:] As an example of point 1, take the New York State lottery. The odds against winning when you buy a ticket vary depending on the number of tickets sold, but are usually at least five million to one. What a remarkably improbable event winning is: one chance in five million. It's almost as improbable as being present at two subway deaths in two days without causing either of them. And yet, of course, it happens almost every day. Someone (or often more than one someone) wins. How can we call a daily event improbable? 

[72:] The probability of something happening must, as noted above, be carefully defined. The odds of you buying a ticket and winning may be 5 million to one, but they are also 5 million to one for everyone else buying tickets, and if 10 million tickets are sold, it is hardly surprising that someone wins. Improbable things happen all the time. 
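To put a number on "hardly surprising," one can combine Rule 2 with the one-minus trick from the birthday problem. The sketch below makes a simplifying assumption not stated in the text: every ticket is treated as an independent 1-in-5-million shot. The function name is illustrative.

```python
def p_at_least_one_winner(p_win, tickets):
    """Chance that at least one ticket wins, assuming each ticket
    independently wins with probability p_win."""
    p_nobody_wins = (1.0 - p_win) ** tickets  # Rule 2, applied to every ticket
    return 1.0 - p_nobody_wins


p_someone_wins = p_at_least_one_winner(1 / 5_000_000, 10_000_000)
expected_winners = 10_000_000 / 5_000_000

print(round(p_someone_wins, 3), expected_winners)
```

Under these assumptions a winner turns up in the large majority of drawings, even though each individual ticket remains a very long shot.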

[73:] And rare things happen, largely owing to the fact that a very large number of "things" happen every day, and selective memory and/or news reporting emphasizes the unusual. For example, on November 4, 2000, the New York Times ran the following Reuters newswire story: 

[74:] "Remembering the Elian Gonzales case was the ticket to a winning lottery number for 192 people in Miami. 

[75:] On Thursday, one of his relatives said he would buy the home at 2319 NW Second Street, where the Cuban boy stayed during his custody fight. 

[76:] That prompted many people to choose 2319 when picking numbers, the Miami Herald reported today. Those numbers turned out to be the winning combination in Florida's Play 4 lottery on Thursday, and the winners each receive $5,000." 

[77:] I will leave aside the question as to whether or not this should be considered newsworthy. But I have no doubt that many winners (and losing newspaper readers) found great significance in this event. Of course, over one million other people bet numbers on that Thursday (HOW do I know this?). Some may have bet on their own birthdays, and an octogenarian born on February 3, 1919 might have won as well. Would the "explanation" of her win be different from the explanation for those who bet on Elian's street address? If someone bet on George Bush's birthday in the Florida lottery (his brother, for example) and won, would that make the news? Does it require an explanation? 

[78:] There is nothing wrong with seeing patterns and looking for explanations in the world around you. As noted above, it is a natural consequence of evolution that our brains do this. But applying it compulsively (like my spouse's dog sniffing every lamppost to see if another dog has passed recently) can be unproductive. Some "patterns" are illusory, and searching hard for explanations can lead both individuals and societies to adopt irrational beliefs. 

[79:] For example, consider the following rare and wildly improbable event: 

[80:] You arrive at college anxious to meet your roommate. Your name is Brianna, you were born on February 23, and you have a brother named Jonathan. Encountering your roommate for the first time, you are dumbstruck. Her name is also Brianna. Her birthday is also February 23 and, yes, you guessed it, she has a brother Jonathan too. 

[81:] Clearly, cosmic forces are at work (after all, you are both the same astrological sign!). But let's calculate the odds of this happening by chance. 

[82:] Googling "frequency of girls' names," I immediately found some useful data. A recent sample of 85,000 given names found that of the 41,465 girls born in a single year, 232 were named Brianna (1 in 179), while of the 43,740 boys born in the same year (WHY so many more?), 341 were named Jonathan (1 in 128). So, the probability of this event is calculated as follows: 

[83:] P(you being named Brianna) = 1 (that's your name!); P(another random person you encounter named Brianna) = 1/179; P(your brother being named Jonathan) = 1 (that's his name); P(your roommate's brother being named Jonathan) = 1/128; P(your birthday being February 23) = 1; P(your roommate's birthday being February 23) = 1/365 

[84:] Thus, the probability of all these independent things being true at the same time is given by their product: 

[85:] 1 x 1/179 x 1 x 1/128 x 1 x 1/365 = 1.2 x 10^{-7}, or nearly a one in 10 million chance! 

[86:] Even rarer than winning the lottery. It must be in the stars. 

[88:] There are now nearly 300 million people in the US. Roughly 1 in 75 is the age of a first-year college student (HUH?), about 40% of people this age attend college, perhaps 1/3 of these live in dorms, with roughly 2/3 in doubles. So there are roughly 

[89:] 3 x 10^{8} x 1/75 x 0.4 x 1/3 x 2/3 = 360,000 students 

[90:] in your situation: showing up at a residential college and meeting a roommate. This means there are 360,000/2 or 180,000 pairs of people finding out for the first time their roommate's name, birthday, and brother's name each fall. If we wanted to know how likely it is that the exact outcome I have postulated occurred, we simply multiply the probability of it happening times the number of times we perform the experiment: 

[91:] 1.2 x 10^{-7} x 1.80 x 10^{5} = 2 x 10^{-2}, or one chance in 50. 

[92:] However, you would no doubt find the coincidence of first names, birthdays, and brothers' names equally astounding if it were two Emilys, both born on December 7, and both with brothers named Jacob. In fact, in the sample I used, Emily and Jacob were the two most common names, and the odds of that happening are therefore higher: about 1 in 10. To find the probability of this triple coincidence occurring for any set of names, all I need to do according to our rules is to add the probabilities of each set of names. In the end, one such coincidence is more than likely. In a strictly random universe bereft of astrological influences, such a remarkable event will almost certainly occur. 

[93:] Indeed, if we leave out the brothers, it will occur quite often. A student named Brianna with a birthday on February 23 can be expected to find her roommate named Brianna with the same birthday two or three times each year in US colleges! 
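The roommate arithmetic can be retraced in a few lines. This sketch uses the name frequencies and the rounded figure of 180,000 roommate pairs quoted above; the variable names are illustrative, and the independence of names, birthdays, and siblings is the same assumption the text makes.

```python
# Frequencies quoted in the text.
p_roommate_named_brianna = 1 / 179
p_roommate_brother_jonathan = 1 / 128
p_roommate_same_birthday = 1 / 365

# Rule 2: independent coincidences multiply.
p_triple = (p_roommate_named_brianna
            * p_roommate_brother_jonathan
            * p_roommate_same_birthday)

roommate_pairs = 180_000  # rounded estimate from the text

# Expected number of occurrences per year = probability x number of trials.
expected_triple = roommate_pairs * p_triple
expected_without_brothers = (roommate_pairs
                             * p_roommate_named_brianna
                             * p_roommate_same_birthday)

print(f"{p_triple:.2e}", round(expected_triple, 3), round(expected_without_brothers, 2))
```

Dropping one of the three conditions raises the expected count by a factor of 128, which is why the two-way coincidence moves from "one chance in 50" to something that should happen a few times every fall.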

[94:] Yes, it is a rare event, but no, it does not require an explanation. Unless nonrandom factors intervene (e.g., first-year room assignment committees disallow roommates with the same first name), we should expect such a coincidence to happen. If it happens to you... yes, well, it has to happen to somebody. 

[95:] This illustration brings up another concept of central importance in statistics: a priori vs. a posteriori calculations. These Latin phrases simply refer to calculations done before and after the fact, respectively. Confronted with a roommate of the same name and birthdate with matching sibling names, you might well do the calculation above and tout it as a one in 10 million chance. But suppose it were your mothers rather than your brothers with the same name? Or your fathers? Or uncles? Or dogs? There is a very large number of "coincidences" that could have occurred. In this case, you have first defined the exact set of coincidences and then have done the calculation: a clear case of a posteriori statistics. If, on the other hand, you defined in advance all of the coincidences you would describe as remarkable and then added the probabilities together ("I would find this OR this OR this... strange"), the chance of something strange happening would be appropriately judged as much more likely. 

[96:] In science, we place much more value on a priori calculations than on a posteriori ones. As noted in the Introduction, Halton Arp found "remarkable" coincidences between quasar separations and positions, and the longer he looked for such events, the more spectacular the coincidences became. But we should expect this even if there is no physical connection at all between quasars and nearby galaxies. If one flips a coin a hundred times, it is very unlikely that ten consecutive heads will come up. The chance of ten heads on any particular ten flips is, according to our rules for combining independent events, 1/2 x 1/2 x 1/2 x ... = (1/2)^{10} = 1 in 1024, so in 91 tries (HUH?) there is less than a 10% chance of it happening. In a million flips, however, this "unlikely" event should be expected to occur. Eventually, the ardent coin flipper will cease to react with surprise to ten heads in a row, but might still find "remarkable" a run of twenty (What are the odds against that?). It is of great importance to define the rules of the game before beginning to play. Dr. Arp only reported his quasar-galaxy coincidences after he found them. Thus, although each new report was more stunning than the last, his hypothesis of nearby quasars soon faded from the scientific scene; he never produced an a priori prediction, and his a posteriori statistics no longer impress anyone. 
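The coin-flip claim can be checked exactly rather than estimated. The 91 overlapping "tries" in a hundred flips are not independent, so 91/1024 is best read as an upper limit. The sketch below computes the exact chance of at least one run of ten heads by tracking the length of the current streak after each flip. The function name and this particular bookkeeping are illustrative choices, not something taken from the text.

```python
def p_run_of_heads(flips, run_length):
    """Exact probability that a fair coin shows at least one run of
    `run_length` consecutive heads somewhere in `flips` flips."""
    # state[k] = probability that the current streak of heads has length k
    # and no qualifying run has occurred so far.
    state = [0.0] * run_length
    state[0] = 1.0
    p_hit = 0.0
    for _ in range(flips):
        new_state = [0.0] * run_length
        for streak, p in enumerate(state):
            new_state[0] += 0.5 * p            # tails resets the streak
            if streak + 1 == run_length:
                p_hit += 0.5 * p               # heads completes the run
            else:
                new_state[streak + 1] += 0.5 * p
        state = new_state
    return p_hit


p_100 = p_run_of_heads(100, 10)
print(p_100, "upper limit:", 91 / 1024)
print("odds against twenty heads on a given try: 1 in", 2 ** 20)
```

Calling the same function with a million flips takes noticeably longer but returns a value very close to 1, in line with the statement that the "unlikely" run should then be expected.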

A MATHEMATICAL DEFINITION OF "LUCK"  
[97:] What is luck? Is it determined genetically? By the stars and planets? By zip code? 

[98:] In my view, some of these explanations are more likely than others. But consider the following scenario based on a story in John Allen Paulos' charming book Innumeracy. 

[99:] Two people who lunch together every day decide that instead of bothering to figure out how to divide each bill, they will flip a coin and keep a running total of heads and tails. If, on any given day, there are more heads, Brianna will pay, whereas if tails are in the lead, Jonathan will pay. If heads = tails, they'll simply split the bill in half. Notice this system is subtly different from one in which each day's flip determines who pays. But it still sounds like a fair enough system, yes? 

[100:] Well, after three years, one of the two is going to think of herself as a real winner, and the other is going to demand an end to the practice. How do I know this? 

[101:] After 1000 flips, it is much more likely that one person pays more than 90% of the time than it is that they are splitting as closely as 45:55, and it is even more likely that one pays 96% of the time than they are as close to even as 48:52. 

[102:] But how can this be? For a truly random process such as flipping a fair coin, you would expect the ratio of heads to tails to get closer and closer to the average, or "mean" value. One flip and it will be 100% heads and 0% tails, and after 10 flips it may be 60% H and 40% T. But after 100 flips, you would not expect 60 H and 40 T; probably more like 54 H and 46 T. And these expectations are correct. They even have a name: "regression to the mean," sometimes referred to colloquially as "the law of averages." But suppose you flip six heads in a row. What is the probability of getting a head on the next flip? 

[103:] If you have had some exposure to probability before (or if you have completely internalized and accepted the foregoing) you will respond 50:50. There is always a 1 in 2 chance of getting heads (one outcome of interest over two possible outcomes). Intellectually you accept this. But do you really believe it? If you flipped 11 heads in a row, don't you feel strongly that the next flip is likely to be a tail? 

[104:] This "feeling" also has a name: the Gambler's Fallacy. If you have lost several hands of blackjack in a row, you just know your luck has to change, so you keep playing. Casinos rely heavily on this "feeling." 

[105:] At first, you might appeal to regression to the mean: If I have 11 heads in a row, and I know if I keep going I have to get closer and closer to 50:50, then clearly it must be the case that a tail is more likely. Wrong. It is wrong to think of regression to the mean as a rubber band. This is why the lunch partners are sure to split in anger after a while. Say the score is 519:481 for Brianna and Jonathan after a thousand flips. That's pretty close to 50:50, within 2%. But the odds of a head on the next flip are still 50:50; Brianna's advantage is just as likely to grow as it is to shrink. 

[106:] It is true that in the next 1000 flips, the ratio of heads to tails is likely to grow ever closer to 50:50. But the difference between heads and tails tends to grow with time, and the lead changes become less and less frequent. Brianna begins to feel like a lucky person; she believes there is such a thing as a free lunch, since she gets one every day. Jonathan resigns himself to being a perennial loser. The psychological effects are self-reinforcing. But psychology is beyond the scope of this chapter. Brianna's "luck" is just a consequence of purely random events obeying the rules of probability. 
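The lunch-bill claim is easy to test by simulation. The following is a rough sketch, not a proof: it replays the running-total scheme for many imaginary pairs of lunch partners and compares how often one partner pays on more than 90% of the decided days with how often the split is 45:55 or closer. The function names, the 400 trials, and the fixed random seed are all arbitrary, illustrative choices.

```python
import random


def days_in_the_lead(flips, rng):
    """Flip a fair coin `flips` times, keeping a running total, and count
    the days on which heads (or tails) is strictly ahead."""
    heads = tails = 0
    heads_ahead = tails_ahead = 0
    for _ in range(flips):
        if rng.random() < 0.5:
            heads += 1
        else:
            tails += 1
        if heads > tails:
            heads_ahead += 1
        elif tails > heads:
            tails_ahead += 1
        # When the totals are tied, the bill is split and nobody "loses."
    return heads_ahead, tails_ahead


def simulate(trials=400, flips=1000, seed=0):
    """Return the fraction of trials that end lopsided (one partner paying
    on more than 90% of decided days) and the fraction ending near even
    (a split of 45:55 or closer)."""
    rng = random.Random(seed)
    lopsided = near_even = 0
    for _ in range(trials):
        heads_ahead, tails_ahead = days_in_the_lead(flips, rng)
        decided_days = heads_ahead + tails_ahead
        larger_share = max(heads_ahead, tails_ahead) / decided_days
        if larger_share > 0.90:
            lopsided += 1
        elif larger_share <= 0.55:
            near_even += 1
    return lopsided / trials, near_even / trials


lopsided_share, near_even_share = simulate()
print("one partner pays >90% of the time:", lopsided_share)
print("split is 45:55 or closer:         ", near_even_share)
```

Changing the seed or the number of trials shifts the two fractions somewhat, but the lopsided outcome remains much more common than the near-even one, which is the point of the story.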