ArtsAutosBooksBusinessEducationEntertainmentFamilyFashionFoodGamesGenderHealthHolidaysHomeHubPagesPersonal FinancePetsPoliticsReligionSportsTechnologyTravel
  • »
  • Technology»
  • Internet & the Web

Using Open Source Libraries for Sentiment Analysis on Social Media

Updated on February 14, 2016
Real-time data analytics by InternetLiveStats.com
Real-time data analytics by InternetLiveStats.com
Statista as a prominent and key portal for statistical data
Statista as a prominent and key portal for statistical data

Sentiment analysis is widely used by research scholars and others. In this approach, there are a number of tools and technologies available for fetching live data sets, tweets, emotional attributes, etc. Using these tools, real-time tweets and messages can be extracted from Twitter, Facebook, Whats App and many other social media portals. This article presents the fetching of live tweets from Twitter using Python programming.
The emotional attributes of Internet users on social media portals can be analysed, and certain conclusions arrived at and predictions made using this method. Let us suppose that we want to evaluate the overall cumulative score of a celebrity.
For this, Python or PHP based programming scripts can fetch live tweets about that celebrity from Twitter. After that, using natural language processing toolkits, the fetched data in the form of tweets or messages can be analysed and the popularity of that particular person or movie or celebrity can be more accurately assessed.

The following are the statistical reports from InternetLiveStats.com and Statista.com about the real time data on social media and related Web portals.
Around 350 million tweets flow daily from more than 500 million accounts on Twitter. Around 571 new websites are hosted every minute on the World Wide Web. There are more than 5 billion users on their mobile phones concurrently.

On WhatsApp, there are 700 million active users. There are more than 1 million new user registrations every month.Around 30 billion messages are sent and 34 billion received every day on WhatsApp. On Facebook, five new profiles are created every second. There are also around 83 billion fake profiles. Around 300 billion photos are uploaded every day by 890 billion daily active users. About 320TB of data is processed daily, with 21 minutes being spent by every user, on an average.
Now, the question is: how to do research on these datasets? Also, which technologies can be used to fetch the real-time datasets? The live streaming data can be fetched using Python, PHP, Perl, Java and many others used for network programming.

Live tweets fetched from Twitter in JSON format
Live tweets fetched from Twitter in JSON format

Fetching live streaming data from Twitter using Python code

Specific packages named Tweepy and Twitter with Python are required to fetch live tweets from Twitter. After these packages are installed, the Python code will be able to fetch live data from Twitter. These can be installed using the Pip command as follows:

$ python -m pip install tweepy
$ python -m pip install Twitter

The code to fetch live tweets from Twitter is:

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
my_app_consumerkey =
‘XXXXXXXXXXXXXXXXXXXXXXXXXX’
my_app_consumersecret = ‘
XXXXXXXXXXXXXXXXXXXXXXXXXX ‘
my_app_accesstoken = ‘
XXXXXXXXXXXXXXXXXXXXXXXXXX ‘
my_app_accesssecret = ‘
XXXXXXXXXXXXXXXXXXXXXXXXXX ‘
class TweetListener(StreamListener):
def on_data(self, mydata):
print mydata
return True
def on_error(self, status):
print status
auth = OAuthHandler(my_app_consumerkey,
my_app_consumersecret)
auth.set_my_app_accesstoken(my_app_
accesstoken, my_app_accesssecret)
stream = Stream(auth, TweetListener())
stream.filter(track=[Name of the Celebrity
or Movie or Person’])
OpenRefine tool for processing of messy datasets
OpenRefine tool for processing of messy datasets

After execution of this script, the output dataset is fetched in JSON file format. The JSON file can be parsed using the OpenRefine tool in the XML, CSV or any other readable format by the data mining and machine learning tools.
OpenRefine is a powerful and effective tool used for processing the Big Data and JSON file formats. In a similar way, the timeline of any person or Twitter ID can be fetched using the following code:

import tweepy
import time
my_app_consumerkey = ‘XXXXXXXXXXXXX’
my_app_consumersecret = ‘ XXXXXXXXXXXXX ‘
my_app_accesstoken = ‘ XXXXXXXXXXXXX ‘
my_app_accesssecret = ‘ XXXXXXXXXXXXX ‘
auth = tweepy.auth.OAuthHandler(my_app_consumerkey, my_app_
consumersecret)
auth.set_my_app_accesstoken(my_app_accesstoken, my_app_
accesssecret)
api = tweepy.API(auth)
list= open(‘Twitter.txt’,’w’)
if(api.verify_credentials):
print ‘Connected to Twitter Server’
currentuser = tweepy.Cursor(api.followers, screen_
name=”gauravkumarin”).item()
while True:
try:
u = next(currentuser)
list.write(u.screen_name +’ \n’)
except:
time.sleep(15*60)
u = next(currentuser)
list.write(u.screen_name +’ \n’)
list.close()

The following script of Python can be used to parse the JSON to CSV format:

JSON - CSV Parser
import fileinput
import json
import csv
import sys
l = []
for currentline in fileinput.input():
l.append(currentline)
currentjson = json.loads(‘’.join(l))
keys = {}
for i in currentjson:
for k in i.keys():
keys[k] = 1
mycsv = csv.DictWriter(sys.stdout, fieldnames=keys.keys(),
quoting=csv.QUOTE_MINIMAL)
mycsv.writeheader()
for row in currentjson:
mycsv.writerow(row)

Fetching data from Twitter using PHP code

For fetching live tweets using PHP code, the API TwitterAPIExchange is required.
After including this API in this PHP code, the script will directly interact with the Twitter servers and live streaming data.

<?php
error_reporting(0);
define(‘CURRENTDBHOST’,’localho
st’);
define(‘CURRENTDBUSERNAME’,’root’);
define(‘CURRENTCURRENTDBPASSWORD’
,’’);
define(‘ CURRENTDBPASSWORD
‘,’Twitter’);
define(‘CURRENTTWEETTABLE’,’Twitte
rtable’);
require_once(‘TwitterAPIExchange.
php’);
$settings = array(
‘oauth_my_app_accesstoken’ =>
“XXXXXXXXXXXXXXXXXX”,
‘oauth_my_app_accesstoken_secret’ => “
XXXXXXXXXXXXXXXXXX “,
‘my_app_consumerkey’ => “ XXXXXXXXXXXXXXXXXX “,
‘my_app_consumersecret’ => “ XXXXXXXXXXXXXXXXXX “
);
$url = “https://api.Twitter.com/1.1/statuses/user_
timeline.json”;
$myrequestMethod = “GET”;
$getfield = ‘?screen_name=gauravkumarin&count=20’;
$Twitter = new TwitterAPIExchange($settings);
$string = json_decode($Twitter->setGetfield($getfield)
->buildOauth($url, $requestMethod)
->performRequest(),$assoc = TRUE);
if($string[“errors”][0][“message”] != “”)
{echo “<h3>Sorry, there was a problem.</
h3><p>Twitter returned the following error message:</p><p>
<em>”.$string[errors][0][“message”].”</em></p>”;exit();}
foreach($string as $items)
{
echo “Tweeted by: “. $items[‘currentuser’]
[‘name’].”<br />”;
echo “Screen name: “. $items[‘currentuser’]
[‘screen_name’].”<br />”;
echo “Tweet: “. $items[‘text’].”<br />”;
echo “Time and Date of Tweet:
“.$items[‘timestamp’].”<br />”;
echo “Tweet ID: “.$items[‘id_str’].”<br />”;
echo “Followers: “. $items[‘currentuser’]
[‘followers’].”<br /><hr />”;
echo insertTweetsDB($items[‘currentuser’]
[‘name’],$items[‘currentuser’][‘screen_name’],$items[‘text’]
,$items[‘timestamp’],$items[‘id_str’],$items[‘currentuser’]
[‘followers’]);
}
function insertTweetsDB($name,$screen_
name,$text,$timestamp,$id_str,$followers){
$mysqli = new mysqli(CURRENTDBHOST,
CURRENTDBUSERNAME, CURRENTCURRENTDBPASSWORD, MYDBNAME);
if ($mysqli->connect_errno) {
return ‘Failed to connect to Database: (‘ .
$mysqli->connect_errno . ‘) ‘ . $mysqli->connect_error;
}
$QueryStmt=’INSERT INTO ‘.MYDBNAME.’.’.CURRENT
TWEETTABLE.’ (name, screen_name, text, timestamp, id_str,
followers) VALUES (?,?,?,?,?,?);’;
if ($insert_stmt = $mysqli->prepare($QueryStmt)){
$insert_stmt->bind_param(‘ssssid’,
$name,$screen_name,$text,$timestamp,$id_str,$followers);
if (!$insert_stmt->execute()) {
$insert_stmt->close();
return ‘Tweet Creation cannot be done at
this moment.’;
}elseif($insert_stmt->affected_rows>0){
$insert_stmt->close();
return ‘Tweet Added.’;
}else{
$insert_stmt->close();
return ‘No Tweet were Added.’;
}
}else{
return ‘Prepare failed: (‘ . $mysqli->errno .
‘) ‘ . $mysqli->error;
}
}

Using these technologies, the parsing, processing and predictions on real-time tweets and their association with a particular event can be mapped. News channels adopt these technologies for exit polls, which help to predict the probability of a political party or candidate winning. In a similar manner, the success of a movie can be predicted after careful analysis of the live streaming data.
Research scholars can work on such real life topics related to Big Data analytics, so that effective and presentable research work can be accomplished.

Comments

    0 of 8192 characters used
    Post Comment

    No comments yet.