
Using Open Source Libraries for Sentiment Analysis on Social Media

Updated on February 14, 2016
Real-time data analytics by
Statista as a prominent and key portal for statistical data

Sentiment analysis is widely used by research scholars and others. A number of tools and technologies are available for fetching live data sets, tweets and emotional attributes. Using these tools, real-time tweets and messages can be extracted from Twitter, Facebook, WhatsApp and many other social media portals. This article presents the fetching of live tweets from Twitter using Python programming.
With this method, the emotional attributes of Internet users on social media portals can be analysed, conclusions drawn and predictions made. Suppose we want to evaluate the overall cumulative score of a celebrity.
For this, Python or PHP scripts can fetch live tweets about that celebrity from Twitter. The fetched tweets or messages can then be analysed with natural language processing toolkits to assess the popularity of that person or movie more accurately.
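As a rough illustration of the scoring step, the sketch below gives each tweet a score from two small hand-made word lists and sums the scores for one subject. The word lists, the function names `score_tweet` and `cumulative_score`, and the sample tweets are all assumptions made for this example; a real study would rely on a natural language processing toolkit and a full sentiment lexicon.

```python
import re

# Hypothetical mini-lexicons; a real lexicon would be far larger.
POSITIVE = {"love", "great", "amazing", "good", "best"}
NEGATIVE = {"hate", "bad", "boring", "worst", "awful"}

def score_tweet(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def cumulative_score(tweets):
    """Sum the per-tweet scores to get one overall score."""
    return sum(score_tweet(t) for t in tweets)

# Invented sample tweets about one movie
sample_tweets = [
    "I love the new movie, great acting!",
    "Worst film of the year, so boring",
    "It was good",
]
print(cumulative_score(sample_tweets))
```

Word counting of this kind ignores negation and sarcasm, so it is only a starting point for the analysis described above.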

The following statistics describe the volume of real-time data on social media and related Web portals.
Around 350 million tweets flow daily from more than 500 million accounts on Twitter. Around 571 new websites are hosted every minute on the World Wide Web. There are more than 5 billion users on their mobile phones concurrently.
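To get a feel for what these volumes mean for a program that consumes the data, the daily and per-minute figures can be converted to other time units. The short calculation below uses only the two figures quoted above; the variable names are chosen for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

tweets_per_day = 350_000_000   # figure quoted in the article
new_sites_per_minute = 571     # figure quoted in the article

tweets_per_second = tweets_per_day / SECONDS_PER_DAY
new_sites_per_day = new_sites_per_minute * MINUTES_PER_DAY

print(round(tweets_per_second))  # roughly 4,051 tweets every second
print(new_sites_per_day)         # 822,240 new websites every day
```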

On WhatsApp, there are 700 million active users, with more than 1 million new user registrations every month. Around 30 billion messages are sent and 34 billion received every day on WhatsApp. On Facebook, five new profiles are created every second. There are also around 83 million fake profiles. Around 300 million photos are uploaded every day by 890 million daily active users. About 320TB of data is processed daily, and every user spends 21 minutes on the site on average.
Now, the question is how to do research on these datasets, and which technologies can be used to fetch them in real time. Live streaming data can be fetched using Python, PHP, Perl, Java and many other languages used for network programming.

Live tweets fetched from Twitter in JSON format

Fetching live streaming data from Twitter using Python code

Two Python packages, Tweepy and Twitter, are required to fetch live tweets from Twitter. Once they are installed, Python code can fetch live data from Twitter. They can be installed using the pip command as follows:

$ python -m pip install tweepy
$ python -m pip install Twitter

The code to fetch live tweets from Twitter is:

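# StreamListener is available in Tweepy releases before 4.0
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# Credentials issued for your Twitter app (placeholders)
my_app_consumerkey = 'XXXXXXXXXXXXX'
my_app_consumersecret = 'XXXXXXXXXXXXX'
my_app_accesstoken = 'XXXXXXXXXXXXX'
my_app_accesssecret = 'XXXXXXXXXXXXX'

class TweetListener(StreamListener):
    def on_data(self, mydata):
        # Each call receives one raw JSON message from the stream
        print(mydata)
        return True

    def on_error(self, status):
        # Print the HTTP status code reported by Twitter
        print(status)

auth = OAuthHandler(my_app_consumerkey, my_app_consumersecret)
auth.set_access_token(my_app_accesstoken, my_app_accesssecret)
stream = Stream(auth, TweetListener())
# Replace the placeholder with the term to follow
stream.filter(track=['Name of the Celebrity or Movie or Person'])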
OpenRefine tool for processing of messy datasets

When this script is executed, the output dataset is fetched in JSON format. The OpenRefine tool can convert the JSON file to XML, CSV or any other format readable by data mining and machine learning tools.
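Each message delivered by the stream is one JSON object in text form. A minimal sketch of reading such records with Python's standard `json` module follows. The two sample records are invented and keep only a few fields (`id_str`, `text` and the nested `user.screen_name`), and the helper name `parse_tweets` is an assumption made for this example.

```python
import json

# Invented sample: one JSON object per line, as a listener might save them.
raw_lines = [
    '{"id_str": "1", "text": "Great movie!", "user": {"screen_name": "alice"}}',
    '',  # saved files may contain blank lines, which are skipped below
    '{"id_str": "2", "text": "Not my favourite", "user": {"screen_name": "bob"}}',
]

def parse_tweets(lines):
    """Turn JSON text lines into (id, screen_name, text) tuples."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # nothing to decode on an empty line
        record = json.loads(line)
        tweets.append((record["id_str"], record["user"]["screen_name"], record["text"]))
    return tweets

for tweet in parse_tweets(raw_lines):
    print(tweet)
```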
OpenRefine is a powerful and effective tool for processing Big Data and JSON files. In a similar way, the followers of any person or Twitter ID can be fetched using the following code:

import tweepy

# Credentials issued for your Twitter app (placeholders)
my_app_consumerkey = 'XXXXXXXXXXXXX'
my_app_consumersecret = 'XXXXXXXXXXXXX'
my_app_accesstoken = 'XXXXXXXXXXXXX'
my_app_accesssecret = 'XXXXXXXXXXXXX'
auth = tweepy.auth.OAuthHandler(my_app_consumerkey, my_app_consumersecret)
auth.set_access_token(my_app_accesstoken, my_app_accesssecret)
api = tweepy.API(auth)
print('Connected to Twitter Server')
# The screen name is cut off in the source; 'XXXXXXXXXXXXX' is a placeholder.
# api.followers is the name this method has in Tweepy releases before 4.0.
followers = tweepy.Cursor(api.followers, screen_name='XXXXXXXXXXXXX').items()
with open('Twitter.txt', 'w') as outfile:
    # Write one follower screen name per line until the cursor is exhausted
    for u in followers:
        outfile.write(u.screen_name + '\n')

The following Python script can be used to convert the JSON to CSV format:

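# JSON to CSV parser
# Reads a JSON array of flat objects from stdin (or from files named on the
# command line) and writes CSV to stdout.
import fileinput
import json
import csv
import sys

l = []
for currentline in fileinput.input():
    l.append(currentline)
currentjson = json.loads(''.join(l))

# Collect every key that appears in any record to build the CSV header
keys = {}
for i in currentjson:
    for k in i.keys():
        keys[k] = 1

mycsv = csv.DictWriter(sys.stdout, fieldnames=list(keys.keys()))
mycsv.writeheader()
for row in currentjson:
    mycsv.writerow(row)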
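
The script above expects its input to be a single JSON array. Tweets saved from a stream are often stored as one JSON object per line instead, so a variant that handles that layout may be useful. The sketch below is one possible approach, not part of the original script: the function name `tweets_to_csv`, the default column choice and the sample records are assumptions, and nested fields such as the user object are left out for brevity.

```python
import csv
import io
import json

def tweets_to_csv(json_lines, columns=("id_str", "text")):
    """Write the chosen top-level fields of each JSON record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns),
                            extrasaction="ignore",  # drop fields not listed
                            lineterminator="\n")
    writer.writeheader()
    for line in json_lines:
        if line.strip():  # skip blank lines
            writer.writerow(json.loads(line))
    return buffer.getvalue()

# Invented sample records, one JSON object per line
sample = [
    '{"id_str": "1", "text": "Great movie!", "lang": "en"}',
    '{"id_str": "2", "text": "Hello, world", "lang": "en"}',
]
print(tweets_to_csv(sample))
```

The `csv` module quotes any value that contains a comma, so tweet text with punctuation stays in a single column.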

Fetching data from Twitter using PHP code

For fetching live tweets using PHP code, the TwitterAPIExchange class is required.
Once this class is included in the PHP code, the script can interact directly with the Twitter servers and the live data.

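// Load the TwitterAPIExchange class; the path is an assumption
require_once('TwitterAPIExchange.php');

// OAuth credentials for your Twitter app (placeholders)
$settings = array(
    'oauth_access_token' => "XXXXXXXXXXXXXXXXXX",
    'oauth_access_token_secret' => "XXXXXXXXXXXXXXXXXX",
    'consumer_key' => "XXXXXXXXXXXXXXXXXX",
    'consumer_secret' => "XXXXXXXXXXXXXXXXXX"
);
// The endpoint URL is cut off in the source; set it to the Twitter REST endpoint to query
$url = "";
$requestMethod = "GET";
$getfield = '?screen_name=gauravkumarin&count=20';
$Twitter = new TwitterAPIExchange($settings);
$string = json_decode($Twitter->setGetfield($getfield)
    ->buildOauth($url, $requestMethod)
    ->performRequest(), true);
if (isset($string["errors"][0]["message"])) {
    echo "<h3>Sorry, there was a problem.</h3><p>Twitter returned the following error message:</p><p>"
        . $string["errors"][0]["message"] . "</p>";
    exit();
}
foreach ($string as $items) {
    // Field names follow the Twitter REST API v1.1 tweet object
    echo "Tweeted by: " . $items['user']['name'] . "<br />";
    echo "Screen name: " . $items['user']['screen_name'] . "<br />";
    echo "Tweet: " . $items['text'] . "<br />";
    echo "Time and Date of Tweet: " . $items['created_at'] . "<br />";
    echo "Tweet ID: " . $items['id_str'] . "<br />";
    echo "Followers: " . $items['user']['followers_count'] . "<br /><hr />";
    echo insertTweetsDB($items['user']['name'], $items['user']['screen_name'],
        $items['text'], $items['created_at'], $items['id_str'],
        $items['user']['followers_count']);
}

// Stores one tweet in MySQL. CURRENTDBHOST and TWEETTABLE appear in the source;
// CURRENTDBUSER, CURRENTDBPASS and CURRENTDBNAME are assumed names for the
// connection constants that are cut off there. Define all of them before use.
function insertTweetsDB($name, $screen_name, $text, $timestamp, $id_str, $followers) {
    $mysqli = new mysqli(CURRENTDBHOST, CURRENTDBUSER, CURRENTDBPASS, CURRENTDBNAME);
    if ($mysqli->connect_errno) {
        return 'Failed to connect to Database: (' . $mysqli->connect_errno . ') ' . $mysqli->connect_error;
    }
    $QueryStmt = 'INSERT INTO ' . TWEETTABLE . ' (name, screen_name, text, timestamp, id_str, followers) VALUES (?,?,?,?,?,?);';
    if ($insert_stmt = $mysqli->prepare($QueryStmt)) {
        // Bind the six values to the ? placeholders (five strings, one integer)
        $insert_stmt->bind_param('sssssi', $name, $screen_name, $text, $timestamp, $id_str, $followers);
        if (!$insert_stmt->execute()) {
            return 'Tweet Creation cannot be done at this moment.';
        }
        if ($insert_stmt->affected_rows > 0) {
            return 'Tweet Added.';
        }
        return 'No Tweets were Added.';
    }
    return 'Prepare failed: (' . $mysqli->errno . ') ' . $mysqli->error;
}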
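
For readers following the Python examples, the same storage step can be sketched with a parameterised INSERT in Python. This sketch swaps MySQL for the `sqlite3` module from the standard library so that it runs without a database server; the table name `tweets`, the function name `insert_tweet` and the sample row are assumptions made for this example, while the column list mirrors the PHP code.

```python
import sqlite3

def insert_tweet(conn, name, screen_name, text, timestamp, id_str, followers):
    """Insert one tweet; the ? placeholders keep values out of the SQL text."""
    conn.execute(
        "INSERT INTO tweets (name, screen_name, text, timestamp, id_str, followers) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, screen_name, text, timestamp, id_str, followers),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE tweets (name TEXT, screen_name TEXT, text TEXT, "
    "timestamp TEXT, id_str TEXT PRIMARY KEY, followers INTEGER)"
)
# Invented sample row; the apostrophe shows that quoting is handled safely
insert_tweet(conn, "Alice", "alice", "It's a great movie", "2016-02-14", "1", 42)
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])
```

Declaring `id_str` as the primary key makes a second insert of the same tweet fail instead of creating a duplicate row, which matters when the same timeline is fetched repeatedly.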

Using these technologies, real-time tweets can be parsed and processed, and their association with a particular event can be mapped to support predictions. News channels adopt these technologies for exit polls, which help to estimate the probability of a political party or candidate winning. In a similar manner, the success of a movie can be predicted after careful analysis of the live streaming data.
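One simple way to compare subjects once tweets have been labelled is to compute the share of positive tweets for each. The sketch below assumes the tweets are already labelled `pos` or `neg`; the function name `positive_share`, the labels and the sample data are invented for this example. A positive share is only an indicator and should not be read as a probability of winning.

```python
from collections import Counter

def positive_share(labelled_tweets):
    """Map each topic to the fraction of its tweets labelled 'pos'."""
    totals, positives = Counter(), Counter()
    for topic, label in labelled_tweets:
        totals[topic] += 1
        if label == "pos":
            positives[topic] += 1
    return {topic: positives[topic] / totals[topic] for topic in totals}

# Invented (topic, label) pairs
sample = [
    ("Party A", "pos"), ("Party A", "neg"), ("Party A", "pos"), ("Party A", "pos"),
    ("Party B", "pos"), ("Party B", "neg"),
]
print(positive_share(sample))
```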
Research scholars can work on such real-life topics in Big Data analytics to produce effective and presentable research.

