
How Data Mining Affects You People: Freakonomics was right, Terrorists Have Banking Patterns Too

Updated on December 21, 2011

Data Mining Is All Around You

Data mining is a generic term for examining large amounts of data to find relationships and patterns that prove useful. With the advent of computers that can store, access, and process large amounts of data, and increasingly sophisticated tools to do so, people are finding far more patterns that affect you and me.

When data mining is mentioned, most people think of how Amazon predicts which books you would like, or how Google knows what sort of ads to serve you. Some may even know that credit card issuers use data mining to detect fraudulent activity. However, data mining is far more widespread than you might suspect: it affects elections, cheating in schools, cheating online, law enforcement, counter-terrorism, and much more.


Data Mining for Election Cheaters

November 2nd was election day in the US, and data mining is helping people spot "astroturf" (fake grass-roots) surges of sentiment for or against a candidate on social media such as Twitter. The Truthy Project at Indiana University spotted a few users who were "generating" sentiment by promoting each other's tweets.

By studying the Twitter "feeds", which show all tweets, and using the public API that Twitter provides for looking up users, scholars were able to perform "network analysis": tracing tweets and retweets back to the originator of a tweet and, from there, looking up his or her information. By looking at the relationships between the users (friends and such), you can estimate whether the sentiment is actually spontaneous or was instigated by someone to look spontaneous.

They found that some people were using Twitter to generate attack messages and send them to all connected users; the messages were traced back to a group of accounts all created within a few minutes of each other. Twitter found out and suspended the accounts, but the tweets had already reached over 60,000 people. Truly spontaneous sentiment would come from widely disparate locations around the nation, from people who are not friends with each other.
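The two signals mentioned above (accounts created within minutes of each other, and users promoting each other's tweets) can be sketched in a few lines of Python. This is only an illustration, not the Truthy Project's actual method: the account names, timestamps, five-minute window, and function names are all made up for the example.

```python
from datetime import datetime, timedelta

def creation_bursts(accounts, window=timedelta(minutes=5), min_size=3):
    """Group accounts whose creation times follow each other within `window`.

    `accounts` maps account name -> creation time. The window and minimum
    group size are arbitrary illustrative thresholds.
    """
    ordered = sorted(accounts.items(), key=lambda item: item[1])
    bursts, current = [], []
    for name, created in ordered:
        # A long gap since the previous account closes the current group.
        if current and created - current[-1][1] > window:
            if len(current) >= min_size:
                bursts.append([n for n, _ in current])
            current = []
        current.append((name, created))
    if len(current) >= min_size:
        bursts.append([n for n, _ in current])
    return bursts

def mutual_retweet_share(burst, retweets):
    """Share of retweets made by group members that promote other members."""
    members = set(burst)
    made = [(src, dst) for src, dst in retweets if src in members]
    if not made:
        return 0.0
    return sum(1 for _, dst in made if dst in members) / len(made)

# Hypothetical data: three accounts created minutes apart, two ordinary users.
accounts = {
    "a1": datetime(2010, 10, 1, 12, 0),
    "a2": datetime(2010, 10, 1, 12, 2),
    "a3": datetime(2010, 10, 1, 12, 3),
    "bob": datetime(2009, 3, 5, 8, 0),
    "carol": datetime(2010, 6, 20, 17, 30),
}
# Each pair is (retweeter, original author).
retweets = [("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("a1", "bob"), ("bob", "carol")]

for burst in creation_bursts(accounts):
    print(burst, mutual_retweet_share(burst, retweets))
```

A group that was created in a burst and mostly retweets itself is only a candidate for closer inspection; real analysis would need far more context than this toy check.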

For more information, see Busted! Astroturf Campaign on Twitter

Data Mining for Cheating Teachers

There's a chapter in "Freakonomics" about how the Chicago schools archived results of achievement tests from every student for many years, and analysis of those results shows that some teachers cheated. But first we have to explain how a test is written.

In almost ALL standardized achievement tests, from the SAT down to STAR or whatever test your state administers at whatever grade level, the questions are arranged in order of ascending difficulty. The first question will be much easier than the last. Thus, a student is far more likely to miss the later questions than the earlier ones.

Analysis of several classes that showed massive improvements reveals that some of the test results are inconsistent with the predicted curve. Many students missed the initial easy questions, got a lot of the middle questions right even though they are more difficult, then missed the hard questions at the end. The same students, when transferred to other classes (no longer under the same teacher), no longer showed such odd test patterns.

The conclusion is undeniable: the teacher erased the students' wrong answers and "fixed up" the results, artificially inflating the test scores. Several teachers were fired, according to the book, when the results were made public.
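To make the odd pattern concrete, here is a rough Python sketch that flags answer sheets where the middle questions were answered much better than the easy opening questions. It is a simplified illustration of the idea described above, not the analysis used in the book; the split into thirds, the 0.3 margin, and the sample answer sheets are assumptions invented for the example.

```python
def block_accuracy(answers, start, end):
    """Fraction of correct answers (1 = right, 0 = wrong) in answers[start:end]."""
    block = answers[start:end]
    return sum(block) / len(block)

def looks_odd(answers, margin=0.3):
    """Flag a sheet whose middle third beats its (easier) first third by `margin`.

    `answers` must be ordered from easiest to hardest question.
    """
    if len(answers) < 3:
        raise ValueError("need at least three questions")
    third = len(answers) // 3
    easy = block_accuracy(answers, 0, third)
    middle = block_accuracy(answers, third, 2 * third)
    return middle - easy >= margin

def flag_classroom(classroom):
    """Return the share of flagged sheets and the names of the flagged students."""
    flagged = [name for name, answers in classroom.items() if looks_odd(answers)]
    return len(flagged) / len(classroom), flagged

# Hypothetical answer sheets, twelve questions each, easiest first.
classroom = {
    "s1": [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # misses easy ones, aces the middle
    "s2": [0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0],  # same odd shape
    "s3": [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0],  # typical: accuracy falls with difficulty
}

share, flagged = flag_classroom(classroom)
print(flagged, round(share, 2))
```

One odd sheet proves nothing, since students guess and have bad days. What matters is a whole classroom showing the same shape in the same year and then losing it under a different teacher.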

Read an excerpt from the book Freakonomics!  (PDF download)

Data Mining for Terrorists

Counterterrorism doesn't always happen with guns blazing. A lot of work goes into figuring out what characterizes a terrorist so one can try to identify them before they strike.

A chapter of the book SuperFreakonomics identifies what sort of banking characteristics a terrorist would have, based on patterns identified in the US and UK from the terrorists behind 9/11 and 7/7, respectively. Some of the characteristics are obvious: Muslim names, lack of life insurance, lack of "Friday after-work ATM visits", lack of normal living expenses paid through checks or debit cards, many foreign wire transfers, and so on.

Let's just say that not all characteristics are as obvious as these, and only data mining could have discovered patterns that perhaps not even the terrorists were aware they had in common.

In 2006, it was suspected that major phone companies such as BellSouth, AT&T, and Verizon had turned over terabytes of call records to help the NSA populate a database that was supposed to help it detect terrorist phone usage patterns. Based on post-9/11 investigations, 206 international calls were made by the 19 terrorists who conducted the attacks.

Critics argue that terrorism doesn't happen often enough to lend itself to data mining, but that is up for debate.
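The heart of that criticism is a base-rate problem, which a little arithmetic makes visible. The Python sketch below uses purely hypothetical numbers (a population of 50 million, 500 actual terrorists, and a screen that is right 99% of the time in both directions); none of these figures come from a real system.

```python
def screening_outcome(population, actual, sensitivity, false_positive_rate):
    """Return (true hits, false alarms, share of flagged people who are real hits)."""
    true_hits = actual * sensitivity
    false_alarms = (population - actual) * false_positive_rate
    precision = true_hits / (true_hits + false_alarms)
    return true_hits, false_alarms, precision

# Hypothetical inputs, chosen only to illustrate the base-rate effect.
hits, alarms, precision = screening_outcome(
    population=50_000_000,
    actual=500,
    sensitivity=0.99,          # 99% of real terrorists get flagged
    false_positive_rate=0.01,  # 1% of innocent people get flagged too
)

print(f"real hits: {hits:,.0f}")
print(f"false alarms: {alarms:,.0f}")
print(f"chance a flagged person is a real hit: {precision:.2%}")
```

With these made-up inputs the screen catches nearly all 500 targets, but it also flags roughly half a million innocent people, so fewer than one flagged person in a thousand is a real hit. That is why rare events are hard to mine for, and why the banking characteristics above only help if they are far more selective than 99%.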

Data Mining for Criminals

Police are now heavily involved in data mining, identifying patterns so they can concentrate enforcement on the "worst" parts of the city and increase patrols there during the worst hours, getting more results for the same amount of resources. Many of the statistics are now available online, either through the local police department, or through a public website where you can find the crimes reported all around you, from simple theft to vandalism all the way up to assault and murder.

The NYPD was probably the first to develop such a management system, called CompStat, which relies so heavily on software that it is often mistaken for the software system itself (and vice versa). It was later adopted by various police departments around the country. The idea is to locate patterns such as crime-heavy locations and hours and saturate those areas with police presence to deter crime.

Even Microsoft and IBM have gotten into the business. Microsoft gave away several software suites to Interpol to help track down exploited children.
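At its simplest, finding crime-heavy locations and hours is just counting. The Python sketch below tallies incident reports by area and hour and lists the busiest combinations. It is a toy illustration, not how any police system actually works; the incident records and area names are invented.

```python
from collections import Counter

def hot_spots(incidents, top=3):
    """Count incidents per (area, hour) and return the `top` busiest pairs."""
    counts = Counter((area, hour) for area, hour, _kind in incidents)
    return counts.most_common(top)

# Hypothetical incident reports: (area, hour of day, type of crime).
incidents = [
    ("Block A", 23, "theft"),
    ("Block A", 23, "assault"),
    ("Block A", 23, "vandalism"),
    ("Block B", 14, "theft"),
    ("Block B", 14, "theft"),
    ("Block C", 9, "vandalism"),
]

for (area, hour), count in hot_spots(incidents, top=2):
    print(f"{area} around {hour}:00 had {count} reported incidents")
```

Real systems work with map coordinates, crime categories, and trends over time, but the principle is the same: find where and when reports cluster, then send the patrols there.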

Visit and see crimes in your area


As more data gets tabulated, patterns can be teased out of it to identify groups, trends, and other relations that even the people involved may not be aware of. That means you and me.

While privacy is important, data mining is not all bad. It is necessary to understand how the data is collected, used, and disseminated.



