
Big Data - What Is It and How Does It Affect Me?

Updated on July 1, 2014

What Is “Big Data”?

“Big Data” is a computer term that refers to very large data sets. These sets are so large that the typical database management tools aren’t able to process them in a reasonable amount of time. The complexity of the data is also a defining characteristic of Big Data.

“Big” means different things to different organizations. For small companies used to dealing with data in spreadsheets, a relational database in the dozens of gigabytes may be considered “big” since they don’t have the tools or experience to efficiently deal with it. For larger organizations like Google or Facebook, dealing with many hundreds of terabytes is routine.


Who Uses Big Data?

The amount of data being captured keeps increasing steadily. So too do computing power and processing techniques. Together, these have increased the number of groups that can take advantage of what can be found in Big Data.

These groups include:

  • Governments

  • Scientists

  • Business

  • Military

  • Industry

Data Size Terminology

1,000 bytes = 1 Kilobyte (KB)
1,000 KB    = 1 Megabyte (MB)
1,000 MB    = 1 Gigabyte (GB)
1,000 GB    = 1 Terabyte (TB)
1,000 TB    = 1 Petabyte (PB)
1,000 PB    = 1 Exabyte (EB)
1 EB        = 1,000,000,000,000,000,000 bytes

Data sizes using hard disk manufacturers' terms.
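
To make these units concrete, here is a minimal Python sketch that converts a raw byte count into the decimal (power-of-1,000) units listed above. The function name format_bytes is my own choice for illustration; it is not part of any library.

```python
# Decimal (power-of-1,000) units, as used by hard disk manufacturers.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def format_bytes(n):
    """Return n bytes as text, using the largest unit that keeps the value at 1 or more."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit we know.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(format_bytes(500))                      # 500 bytes
print(format_bytes(2_500_000_000_000_000))    # 2.5 PB
print(format_bytes(10**18))                   # 1 EB
```

Operating systems often report sizes in powers of 1,024 instead, which is why a new "1 TB" drive can appear smaller than advertised.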

What Does Big Data Do?

By itself, Big Data doesn’t do anything. There may be lots of useful information hidden away in the data, but it just sits there until someone uses tools to discover it. These analytical tools can find trends and correlations that may not be noticeable with smaller sets of data.

One such tool is Apache Hadoop.

Hadoop is an open source framework for processing large data sets using clusters of commonly available hardware. It uses HDFS (Hadoop Distributed File System) to spread the data across multiple computers and replicates the data in multiple places to protect against loss in case of hardware failure. Hadoop also spreads the processing of the data over multiple computers - possibly thousands - to crunch the data.
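
Hadoop's classic processing model is called MapReduce: each computer works on its own slice of the data (the "map" step), and the partial results are then combined (the "reduce" step). The sketch below imitates that idea on a single machine in plain Python. It does not use any Hadoop API, and the sample search terms and function names are invented purely for illustration.

```python
from collections import Counter
from functools import reduce

# Made-up data: each inner list stands in for the slice of a search log
# that would be stored on one machine in the cluster.
slices = [
    ["weather boston", "big data", "umbrella"],
    ["big data", "hadoop", "big data"],
    ["umbrella", "weather boston"],
]

def map_slice(terms):
    """Map step: count the terms in one slice, independently of the others."""
    return Counter(terms)

def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right

def count_terms(all_slices):
    # On a real cluster the map step would run in parallel, one slice per machine.
    partial_counts = [map_slice(s) for s in all_slices]
    return reduce(merge_counts, partial_counts, Counter())

totals = count_terms(slices)
print(totals.most_common(1))  # [('big data', 3)]
```

Because each slice is counted on its own, adding more machines lets the same job handle more data without any single computer having to see all of it.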

Where Does This Data Come From?

The world today is very connected and very monitored. Data is constantly being generated and captured in many different ways. A short list of examples includes:

  • Internet search terms

  • Transaction information from stores

  • Traffic monitors in cities and on highways

  • Weather stations

  • Telephone call logging

  • Television viewing information from cable companies

  • Medical tests

  • Scientific instruments

  • Usage data from electric and gas utilities

Examples of Big Data in Action

The following examples are just a few ways that different groups are using Big Data for different purposes.



Google

Google is one of the original Big Data companies. The size of Google’s data collection isn’t publicly acknowledged. Because they constantly scan and index almost everything accessible on the Internet, they’re likely to have at least hundreds of terabytes of data, and probably far more.

They also closely track the people who use their services. You pay for these “free” services with information about your Internet habits gleaned from your search terms and browser cookies. This information becomes part of the Big Data and is used to select advertisements to display to you. In effect, the customer that Google sells to is its advertisers and the product it sells is information about you!

Large Hadron Collider

The Large Hadron Collider, built and operated by CERN, was designed to conduct experiments in particle and high-energy physics. In 2012, its particle collisions were generating data at a rate of 25 petabytes per year. This Big Data is analyzed using the world’s largest computing grid, made up of 170 facilities in a worldwide network across 36 countries.

Among other things, this has confirmed the existence of the elementary particle called the Higgs boson (the “God particle”), which was first theorized in 1964.



Amazon

Amazon has transaction records for over 215 million active customer accounts and 1.5 billion items in its on-line store. It also has information about shipping, product availability, product reviews, supply, demand, pricing, and many other things. It uses this information to give its customers a better shopping experience and to make suggestions to get them to spend more. Amazon also packages this information and sells it to marketers, who use it to display advertising tailored to you.

Health Care

Big Data is used in many areas of health care. There are four general sources of this data:

  1. Clinical data (e.g. patient records)

  2. Pharmaceutical research data (e.g. clinical trial results)

  3. Activity and cost data

  4. Patient behavior data

Big Data provides the tools to correlate information from these different sources to identify patients that are more at risk for certain medical conditions. It helps researchers understand which treatments are more or less effective for certain conditions and certain people. The cost effectiveness for different treatments is also researched.



Weather

Weather monitoring and prediction is a Big Data application. Weather is very complex, and the more information you have from as many monitoring sources as possible, the better the predictions will be.

As of 2013, The Weather Company, the parent company of The Weather Channel and other weather-related outlets, takes in 2.2 million weather data points from around the world 4 times per hour. That’s over 211 million data points daily. The new system that they’re in the process of implementing will increase that to 2.5 billion data points 15 times per hour - more than 4,200 times as much data.
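
The arithmetic behind those figures can be checked with a few lines of Python. This is just a worked calculation using the numbers quoted above; the function and variable names are my own.

```python
HOURS_PER_DAY = 24

def points_per_day(points_per_reading, readings_per_hour):
    """Data points collected in one day."""
    return points_per_reading * readings_per_hour * HOURS_PER_DAY

current = points_per_day(2_200_000, 4)        # 2.2 million points, 4 times per hour
planned = points_per_day(2_500_000_000, 15)   # 2.5 billion points, 15 times per hour

print(f"current: {current:,} per day")        # current: 211,200,000 per day
print(f"planned: {planned:,} per day")        # planned: 900,000,000,000 per day
print(f"ratio:   {planned / current:,.0f}x")  # ratio:   4,261x
```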

In addition to using all this data to predict the weather, The Weather Company also uses it to select and present relevant advertising to local areas affected by the weather. If rain is in your forecast, don’t be surprised to see advertisements for umbrellas.


Wal-Mart

One of the largest retail chains in the world, Wal-Mart handles more than 1 million customer transactions every hour. These transactions are fed into databases estimated at more than 2.5 petabytes and include information on the purchasing activity of over 145 million Americans. They use this information in their customer relationship management tools not only to track all of your purchases, but also to make predictions about your future interests.

A division of Wal-Mart called @WalmartLabs developed a large database called the “Social Genome”. They describe the Social Genome as “a vast, constantly changing, up-to-date knowledge base with hundreds of millions of entities and relationships. We then use the Social Genome to perform semantic analysis of social media and to power a broad array of e-commerce applications.”

Among other things, this information is used to determine how to best market products to its customers and the best time to mark down prices in different locations to maximize sales.

Privacy Concerns

While there is no question that there are benefits to Big Data, there are also risks. The biggest concern that most people have is privacy.

Because of recent leaks, it is now known that the US government, in cooperation with the governments of Canada, Great Britain, Australia, and New Zealand, has been gathering huge amounts of data regarding phone calls, and information from Google and Yahoo accounts and other sources. While the stated purpose of this Big Data is to track down terrorists, it includes information about many millions of ordinary citizens.

All the Big Data being processed and sold to marketers for the purpose of getting you to buy their products also seems to border on invasions of privacy. In one famous case, the department store chain Target was able to use the data it gathered about the purchasing habits of a high school girl to determine that she was pregnant. Her father, who was unaware of her condition, was quite surprised when she started receiving baby-related coupons.

I recently encountered a web site that had records including my name, age, address, phone number, and the same information about 10 relatives in my state all grouped as probable family members. This was all found in publicly-available records - Big Data.


How Can I Protect Myself?

There is little you can do to stay completely out of these data sets without impacting your usual way of doing business and interacting on-line, but there are a few things you can do that help.

- Pay Cash

Paying by cash instead of using credit or debit cards will allow you to avoid having certain transactions tracked. While this works for brick-and-mortar stores, it doesn't help when shopping on-line.

- Disable Browser Cookies

You can disable cookies in your browser, but this is likely to affect your browsing experience. Some sites won't work at all. Plus, there are ways other than cookies that web sites use to track you.

- Keep Your Phone Number To Yourself

Many stores routinely ask you for your phone number at the cash register. I'm always surprised at how many people automatically give it out. Cashiers are often shocked when I refuse to give it to them.

- Don't Use Store "Customer Loyalty" Cards

The sole purpose of these cards is to feed as much information as possible about you into the store's database. I don't mind losing the discount to keep a little privacy.

- Social Media

Be mindful of what you post on social media sites. Assume that everything you post will be examined by Big Data tools for marketing and other purposes. Social media marketing is one use of Big Data.


Big Data is a relatively new thing; we’ve only been able to process such large amounts of data in a useful time frame for a few years. Like all tools, Big Data can be used for both positive and negative purposes. Society will need to decide what kind of limitations it wants to put on its use and pressure government to enact laws enforcing those limits.




