Book Review: The Signal and the Noise

Updated on October 17, 2015

"All models are wrong, but some models are useful." - George E. P. Box

This book centers on the idea of using big data to make predictions, and on why we so often fail at it. Even with the explosion of information now available to us, more data isn’t very helpful if we can’t differentiate between the signal and the noise. The problem is that we see patterns where there are none, or we start from assumptions that are wrong. Good predictions ultimately require not only seeing the patterns but also making correct assumptions.

Examples of Models

Nate Silver runs through a number of examples where predictions are made using big data. These include predicting weather, earthquakes, political elections, and baseball players’ stats, to name just a few. Silver is more than qualified to speak on this subject given his body of work. He has created a couple of notable models over the years. The first one he talks about is PECOTA, the model he uses to predict baseball players’ stats. It compares each player to other players who had a similar build and abilities. Then, by looking at how those similar players fared in the majors, Silver could predict up-and-coming stars. Another model that he created, FiveThirtyEight, is used to predict political elections, and it turned into a successful website as well.

Bayesian Theory

All of Silver’s models were created using a thought process called Bayesian reasoning. The idea is essentially that the more we learn, the closer our approximations get to the truth. The equation that Silver uses is a simple one with 3 variables, shown below.

Description | Variables/Equation
Prior Probability | x
Probability of something happening if hypothesis is correct | y
Probability of something happening if hypothesis is incorrect | z
Posterior Probability | (x*y)/(x*y+z*(1-x))

As new information becomes available, you continue to update your predictions. On each new iteration, your Posterior Probability becomes your new Prior Probability. This loop continues as your predictions get more and more accurate (hopefully).
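The update loop described above can be sketched in a few lines of Python. This is only an illustration, not code from the book: the function name `update`, the starting prior, and the evidence values are all made up for the example, and the sketch assumes each new piece of evidence is independent of the earlier ones.

```python
def update(prior, p_if_true, p_if_false):
    """Return the posterior probability using the equation above.

    prior      -- x, the prior probability of the hypothesis
    p_if_true  -- y, probability of the evidence if the hypothesis is correct
    p_if_false -- z, probability of the evidence if the hypothesis is incorrect
    """
    numerator = prior * p_if_true
    return numerator / (numerator + p_if_false * (1 - prior))


# Hypothetical evidence: each tuple is (y, z) for one new observation.
evidence = [(0.6, 0.3), (0.7, 0.2), (0.5, 0.5)]

belief = 0.10  # hypothetical starting prior
history = [belief]
for y, z in evidence:
    belief = update(belief, y, z)  # the posterior becomes the next prior
    history.append(belief)

for step, p in enumerate(history):
    print(f"after {step} observations: {p:.3f}")
```

The third made-up observation is equally likely whether or not the hypothesis is true (y equals z), so it leaves the belief where it was. Evidence only moves the estimate when it is more likely under one outcome than the other.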

Silver’s first example calculation using this equation is actually an odd one: cheating. It just shows that this reasoning can be used for many different scenarios in life. He starts the scenario by saying, “How likely is it that your spouse is cheating on you if you find a pair of panties that aren’t yours in the house?” In order to calculate the probability mathematically, there are 3 variables that need to be estimated.

First you need to estimate the probability of the panties showing up if you are being cheated on. You think he would be more careful if he actually was cheating, so let’s just say 50%. Second, what is the probability of the panties showing up if you are not being cheated on? There are plenty of innocent reasons for the panties showing up as well, but they are less likely, so let’s estimate 5%. The final step is to determine the probability that you were being cheated on before you found the panties. This is very difficult to estimate objectively, so we can look at statistical information. Some studies show that 4% of married partners cheat on their spouse in a given year, so we can use that number. Putting these numbers into the equation, we arrive at the new probability:

Description | Variables/Equation
Prior Probability | x = 4%
Probability of something happening if hypothesis is correct | y = 50%
Probability of something happening if hypothesis is incorrect | z = 5%
Posterior Probability | (x*y)/(x*y+z*(1-x)) = (.04*.50)/(.04*.50+.05*(1-.04)) = .294 = 29.4%
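Since the prior is the hardest of the three numbers to pin down, it can help to see how much the answer depends on it. The short Python sketch below reruns the same equation with a few alternative priors. Only the 4% figure comes from the example above; the other priors are arbitrary values chosen for illustration, and the function name `posterior` is my own.

```python
def posterior(x, y, z):
    """Posterior probability from prior x and likelihoods y and z."""
    return (x * y) / (x * y + z * (1 - x))


y = 0.50  # probability of the evidence if the hypothesis is correct
z = 0.05  # probability of the evidence if the hypothesis is incorrect

# 0.04 is the prior used above; the others are hypothetical alternatives.
priors = [0.01, 0.04, 0.10, 0.25]
results = {x: posterior(x, y, z) for x in priors}

for x, p in results.items():
    print(f"prior {x:.0%} -> posterior {p:.1%}")
```

With the 4% prior this reproduces the 29.4% from the calculation above. The other rows show that the same evidence leads to very different conclusions depending on where you start, which is why a defensible prior matters.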

Interesting Side Note

One section of the book talked about the stock market. It mentioned how Shiller was able to look statistically at the P/E ratio of the S&P 500 over the years and show the probable returns. This method could also help to indicate bubbles in the market. Of the 8 periods where the S&P 500 has increased by 2x its long-term average, 5 were followed by a severe and notorious crash (such as the Great Depression, the dot-com bubble, and Black Monday in 1987).

The second half of the book is a little more practical in terms of application, but ultimately there were nuggets of good information throughout the whole book. I was personally reading this book in hopes of making my own models, so I was a little disappointed in how little he talked through actual numbers. I would have enjoyed going over more mathematical examples, but I know those aren’t everyone’s cup of tea. Some of the more practical tips that I was able to pull out of the book are listed below.

  • 3 basic principles for creating a model:
  1. Think probabilistically
  2. Make the best prediction with the information you have today and update as new information becomes available
  3. Look for consensus on your results
  • Always calibrate your model. Adjust it based on how its past predictions compare with what actually happened. For example, if your model consistently predicts values higher than what actually happens, then you know that it predicts high and you can adjust accordingly.
  • The very act of making predictions may change the way people act, thus altering the outcome and making your model incorrect.
  • When something follows a power-law distribution, it has a very useful property: you can forecast the number of large-scale events from the small-scale ones (for example, earthquakes and terrorist attacks).
  • Often the most complicated models make worse predictions than a more simplified version. But models can’t be oversimplified either.
  • Watch out for causation vs. correlation: From 1967 through 1997, the outcome of the Super Bowl "predicted" the direction of the stock market. If a team from the AFC won, then the market would be down that year, and if a team from the NFC won, then the market would be up. There is of course no actual causation between the stock market and the Super Bowl. Be sure not to make this same mistake.
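The calibration tip from the list above can be made concrete with a small Python sketch. The numbers are invented, and subtracting the average past error is only one simple way to adjust; the book does not prescribe this particular method.

```python
from statistics import mean

# Hypothetical past forecasts and what actually happened.
predicted = [72.0, 65.0, 80.0, 58.0, 77.0]
actual = [69.0, 63.0, 76.0, 56.0, 73.0]

# A positive bias means the model has been predicting too high on average.
errors = [p - a for p, a in zip(predicted, actual)]
bias = mean(errors)


def calibrated(raw_forecast):
    """Shift a new forecast by the average past error."""
    return raw_forecast - bias


print(f"average error: {bias:+.1f}")
print(f"raw 70.0 -> calibrated {calibrated(70.0):.1f}")
```

A constant shift like this only helps when the model is off by about the same amount every time. If the error grows or shrinks with the size of the forecast, a different correction would be needed.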
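The power-law tip from the list above can also be sketched in Python. The counts below are invented, and the sketch assumes earthquake-style magnitudes, which are already a logarithmic scale, so a power law shows up as a straight line when the logarithm of the count is plotted against magnitude. It fits that line to the small events and extends it to a size that never appears in the data.

```python
import math

# Hypothetical yearly counts of events at or above each magnitude.
# The numbers are invented so that each step up in magnitude is
# ten times rarer, which is the pattern a power law produces.
counts = {4.0: 1000, 5.0: 100, 6.0: 10}

xs = list(counts.keys())
ys = [math.log10(n) for n in counts.values()]

# Least-squares fit of log10(count) = intercept + slope * magnitude.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar


def expected_per_year(magnitude):
    """Extrapolate the fitted line to a magnitude outside the data."""
    return 10 ** (intercept + slope * magnitude)


rate = expected_per_year(8.0)
print(f"expected magnitude 8+ events per year: {rate:.2f}")
print(f"roughly one every {1 / rate:.0f} years")
```

With these made-up counts, the fit suggests about one magnitude 8 event per decade even though none appears in the data. The forecast is only as good as the assumption that the same straight line holds at sizes that have not been observed.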

Conclusion

Ultimately, models are simplifications of the universe and aren’t meant to be the universe. The best model of a cat is a cat. The usefulness of any model depends on how accurate our assumptions and simplifications are. Finding patterns is something even mediocre gamblers are able to do. The key is determining whether those patterns are signals or just noise.
