DATA MINING IN SOCIAL NETWORKING
Data Mining In Social Network (General Overview)
Ramani and Nancy (2011) explain that data mining is a broad area that integrates methods from several fields, including statistics, machine learning, pattern recognition, database systems, and artificial intelligence, to analyze massive data. Applications that analyze social network data have developed significantly within the past few years, partly because of the growing trend of users interacting with one another on the Internet. A large number of data mining algorithms rooted in these fields have been developed to perform different data analysis tasks in social networks.
Information technologies have brought new ways of connecting individuals through social networking for the purpose of community collaboration. Social computing is a new and developing model that takes a multidisciplinary approach to modeling and analyzing social behavior on various media platforms, with the aim of producing intelligent and interactive applications that achieve effective results. King, Li, and Chan (2009) surveyed specific computational and machine learning techniques employed in social computing. They assessed various social network platforms, such as social websites, games, social bookmarking, and information sites, where computational technology is needed to gather, extract, mine, process, and visualize the data these networks generate. The researchers then surveyed instances and techniques of computational tasks, such as social network analysis, mining, link modeling, ranking, and sentiment analysis, that are employed on these platforms to obtain relevant outcomes. They also presented a selected list of more than 150 references relevant to the latest progress in social computing.
The multidisciplinary framework in this respect includes sociology, organization theory, computing, social psychology, human-computer interaction, communication theory, and so on. Wang et al. explain that computational support for social studies and human social dynamics, as well as the design and use of information systems, ought to be grounded in the social context. Although general computational methods such as clustering, classification, and regression can be applied to social network data, one main element distinguishes the input data of social computing: the extra layer of social network and contextual information. In other words, the input data carry hidden relational information that can be exploited when formulating results, so it is no longer sufficient to treat every record in the input space as self-contained. Intelligent computation methods are therefore necessary, because the information is dynamic, nonlinear, voluminous, and complicated, and the outcome needs to be both intelligent and adaptive. For this reason, social computing has an advantage over traditional computation methods.
The data acquired from social media sites in most cases differ from the conventional attribute-value data used in typical data mining, mainly because social media data are largely user-generated content. Social media data are vast, distributed, noisy, dynamic, and unstructured. These features pose challenges to data mining tasks and call for new, effective methods and algorithms. For instance, social networks such as Twitter and Facebook report traffic from more than 150 million users in the USA alone.
Recent research in two closely related areas of computer science, data mining and machine learning, has established methods for constructing statistical models of network data. Such categories of data include relational databases, social networks, networks of web pages, and data on the interrelations of people, things, places, and events extracted from various texts. Social networks influence their users and give rise to correlated behaviors that need to be modeled and analyzed appropriately. Given all the attention surrounding social networks, there is no doubt that the area provides an ideal environment for extensive data mining research and development.
Data Mining Algorithms in Social Networks
Yassine and Hajj (2010) sought to establish a basis for mining emotions from texts posted in social media. The authors proposed a new framework for characterizing emotional relations and then used these features to distinguish acquaintances from friends. Their main focus was to mine the emotional content of texts on social networking sites, and they wanted to find out whether these texts reflect the user's emotions. For this purpose, text mining methods were applied to comments collected from social networks. The framework of the study consisted of data collection, database schemas, data mining, and data processing.
The informal language commonly used in social networking was taken into account before the actual text mining process; according to the authors, this is why the framework included the development of specific lexicons. In particular, the study presented a novel perspective for evaluating friendship relationships and emotional expression in social media, considering both the language and the nature of these sites. A Lebanese Facebook site served as the case study. The technique was unsupervised, specifically the K-means clustering algorithm. The findings indicated that the model achieved high accuracy both in predicting friendship and in determining subjectivity.
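The clustering step can be illustrated with a minimal k-means sketch in Python. The study's actual lexicon and features are not described here, so the three numbers per user below (hypothetical counts of positive words, negative words, and emoticons) are assumptions, and the code is plain Lloyd's k-means rather than the authors' implementation.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on lists of numeric feature vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical per-user features: [positive words, negative words, emoticons]
labels, _ = kmeans([[5, 0, 3], [4, 1, 2], [0, 4, 0], [1, 5, 1]], k=2)
```

With these toy vectors the two users with mostly positive vocabulary end up in one cluster and the other two in the second; which cluster receives label 0 depends on the random initialization.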
A paper by Ignatov (2012) evaluated how biclustering and triclustering can be applied to analyze online social networking data, using VKontakte (a Russian social network) as the example. The authors used biclustering to extract groups of users who share the same interests and to identify users belonging to the same community. With triclustering, they treated users as tags and used them to describe the groups within VKontakte categories. Following this social tagging procedure, the researchers recommended groups in relevant categories so that users could find new friends in groups matching their tastes. They found that these methods are very efficient for analyzing massive data.
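A common formalization in this line of work, borrowed from formal concept analysis, is the object-attribute (OA) bicluster: for a user u who has interest m, take all users who share m and all interests of u, then measure how filled that users-by-interests rectangle is. The sketch below assumes that definition and a toy user-interest table; it illustrates the idea and is not the paper's code.

```python
def oa_bicluster(context, user, interest):
    """Object-attribute bicluster generated by one (user, interest) incidence.

    context: dict mapping each user to the set of interests that user has.
    """
    if interest not in context[user]:
        raise ValueError("the user must have the interest to generate a bicluster")
    users_with_interest = {u for u, items in context.items() if interest in items}
    interests_of_user = set(context[user])
    # Density: fraction of filled cells in the users x interests rectangle.
    filled = sum(len(context[u] & interests_of_user) for u in users_with_interest)
    density = filled / (len(users_with_interest) * len(interests_of_user))
    return users_with_interest, interests_of_user, density

# Hypothetical user-interest table
context = {
    "ann": {"football", "rock", "chess"},
    "bob": {"football", "rock"},
    "cat": {"chess", "jazz"},
    "dan": {"football", "jazz"},
}
users, interests, density = oa_bicluster(context, "bob", "rock")
```

A density of 1.0 means every selected user has every selected interest; in practice, biclusters below a chosen density threshold would be discarded.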
Premchand and Suseela (2007) demonstrated through examples how a fuzzy model can be applied to the analysis of social networking data. They pointed out that social network analysis represents meaningful relationships among entities as a graph. These entities may include events, people, symbols in a text, organizations, verbal sounds, states of the world, and so on. The authors introduced a binary operation, termed the consolidation operation, for consolidating the constituents of the information in a fuzzy graph.
The consolidation operator introduced in the paper has distinctive properties, such as zero gain, incremental effect, and solid evidence, in addition to closure, associativity, and commutativity. According to the authors, this consolidation operator is far better than the conventional maximum-value operator for data fusion problems.
According to Rajput et al. (2012), clustering has proved to be one of the most important techniques, widely used in applications such as the analysis of social networking sites, company performance, and aircraft accidents. In recent times, communication, marketing, and advertising through social networking have become more common and offer users an interactive strategy. The study was concerned with large-scale measurement and analysis of the efficiency of such communication strategies, and with analyzing information on usage and on individuals' interest in promoting and marketing brands through social networking sites. The researchers used a preprocessing technique to clean the data and then performed clustering to generate patterns that serve as heuristics for building more efficient social networking sites.
Ramani and Nancy (2011) compared the performance of various data mining classification algorithms on the dataset “Social side of the Internet”. The dataset was first divided into three subsets. The full attribute set contained 162 attributes, which is very large, so feature reduction was performed to identify the attributes most relevant to the target variable. The selected attributes were given as input to specific data mining classification algorithms, and their error rates were compared and analyzed. The findings indicated that in all the subsets considered, the RndTree algorithm produced lower error rates than the other algorithms.
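The source does not say which feature-reduction method the study used. As one common filter-style option, the sketch below ranks categorical attributes by information gain with respect to the target and keeps the top ones; the toy attribute names and rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting rows on a categorical attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def rank_attributes(rows, attributes, target, keep):
    """Keep the `keep` attributes with the highest information gain."""
    ranked = sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)
    return ranked[:keep]

# Hypothetical survey rows: one informative attribute, one irrelevant one.
rows = [
    {"uses_daily": "yes", "color": "red", "joins": "yes"},
    {"uses_daily": "yes", "color": "blue", "joins": "yes"},
    {"uses_daily": "no", "color": "red", "joins": "no"},
    {"uses_daily": "no", "color": "blue", "joins": "no"},
]
selected = rank_attributes(rows, ["color", "uses_daily"], "joins", keep=1)
```

The reduced attribute list would then be passed to each classifier, and their error rates compared on held-out data.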
Kvernvik and Hildorsson (2010) observe that most algorithms employed in social network analysis are computationally costly to run. They therefore suggest that improving scalability is an effective route for social network analysis. According to the authors, scalability can be improved by introducing approximations into existing algorithms or by performing the analysis only on the closest neighbors of the target node. Using a proof-of-concept prototype, the researchers showed that it is possible to analyze large chunks of data within a limited time frame.
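Restricting an analysis to the closest neighbors of a target node can be sketched as follows: a breadth-first search collects the nodes within a chosen radius, and a measure (here, edge density) is computed only on that local subgraph. The adjacency format and the choice of density as the measure are illustrative assumptions, not details of the authors' prototype.

```python
from collections import deque

def ego_network(adjacency, target, radius=1):
    """Nodes within `radius` hops of `target`, found by breadth-first search."""
    seen = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if seen[node] == radius:
            continue  # do not expand past the chosen radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return set(seen)

def local_density(adjacency, target, radius=1):
    """Edge density of the subgraph induced by the target's neighborhood."""
    nodes = ego_network(adjacency, target, radius)
    edges = {frozenset((u, v)) for u in nodes for v in adjacency.get(u, ()) if v in nodes and u != v}
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0

# Hypothetical undirected graph as symmetric adjacency sets
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
```

Cost grows with the size of the neighborhood rather than the whole graph, trading a global view for speed.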
Computational applications now go beyond personal computing to facilitate collaboration and social relations. Social computing is an area of information technology concerned with the intersection of social and human studies with computer science. Tavakolifard and Almeroth (2012) surveyed three popular social computing services: recommender systems, reputation/trust systems, and social networks. They approached these services from the perspective of data representation and the major challenges they face, such as the cold-start problem and network sparsity. The researchers presented a new graph model offering a novel abstract taxonomy and a representation model common to the three services. The graph model makes it clearer that data from diverse contexts can be related, so that solutions can be explored across services, shedding light on and streamlining the problems identified above. The findings could also trigger new research.
Aligning users in social networks is an important process that improves user matching and activity recommendations. Clustering can be employed to group users in social networks. However, prevalent general-purpose clustering algorithms are not very efficient on social network data because of the special nature of user data. One reason is the constraints involved in aligning users in social networks; another is the need to capture massive amounts of data and information about these users, which increases an algorithm's computational complexity. Nayak et al. (2012) proposed an effective and scalable constraint-based clustering algorithm built on a global similarity measure that takes into account users' constraints and their relevance in social networks. The importance of each constraint was measured by how often it occurs in the database. Both external and internal evaluation measures were used to assess the algorithm's performance on a dataset obtained from a dating website. The findings indicated that the proposed algorithm increased the accuracy of aligning users by 10% compared with other algorithms.
In summary, data mining is a broad area that integrates methods from fields such as pattern recognition, artificial intelligence, machine learning, statistics, and database systems to analyze massive data. Numerous data mining algorithms have been developed to perform various data analysis tasks. Some of those commonly used in the social network context include clustering, decision trees, recommender systems, the RndTree algorithm, reputation/trust systems, and the K-means clustering algorithm.
The Essence of Data Analysis in Social Network
A weblog is a website on which entries are written in diary form, normally by a single author (the blogger), and displayed in reverse chronological order. Because weblog publishing is convenient and unrestricted, this medium gives terror groups a platform to promote their ideologies and to plan and organize criminal activities. Yang and Tobun (2012) presented a framework for visualizing and analyzing the weblog social networks embedded in relevant weblogs gathered through topic-based exploration. The authors used link analysis to identify relationships among bloggers when constructing the weblog social network. They also used content analysis to relate similar blog messages, revealing implicit semantic relationships and further improving the analysis. Users can apply various information visualization techniques to explore specific aspects of the underlying social networks at different levels of abstraction. With the ability to visualize and analyze weblog social networks related to terrorism and crime, law enforcement and intelligence agencies gain additional tools for ensuring state security.
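The content-analysis step, relating similar blog messages to expose implicit relationships, can be approximated with bag-of-words cosine similarity. This is a minimal sketch under that assumption; the tokenization, the 0.5 threshold, and the toy posts are hypothetical, and the original framework may use a different similarity model.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def implicit_links(posts, threshold=0.5):
    """Pairs of bloggers whose posts are similar enough to suggest a hidden relationship."""
    names = sorted(posts)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if cosine_similarity(posts[x], posts[y]) >= threshold]

# Hypothetical posts keyed by blogger
posts = {
    "blogA": "rally planned downtown saturday",
    "blogB": "saturday rally downtown planned again",
    "blogC": "recipe for apple pie",
}
```

The resulting pairs would be added as semantic edges alongside the explicit links found by link analysis.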
A weblog collection is built upon the interactive relationship between bloggers and the information flowing within the network. Blog analysis aims to offer insight into the combined influence of bloggers' interactions over time, their content contributions, and the varying degrees of influence of the thoughts they express. Link analysis emphasizes understanding the people in the underlying social networks, whereas content analysis emphasizes understanding the context and content of weblog messages and the flow of information within the network.
Social network analysis views social interactions as nodes and links. Nodes represent the bloggers, that is, persons, organizations, and groups within the network, while links represent interactions or associations between them. The associations found in weblogs include friendships, subscriptions, comments, recommendations, and so on. The direction, number, and type of links between nodes can be used to analyze the social capital of bloggers in a social network (Portes, 1998, pp. 55-57).
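Using the direction, number, and type of links to gauge a blogger's social capital can be sketched as a weighted in-degree over typed, directed links. The link weights below are hypothetical placeholders, not values from the cited work.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each association type is assumed to signal social capital.
LINK_WEIGHTS = {"friend": 1.0, "subscription": 0.5, "comment": 0.3, "recommendation": 0.8}

def weighted_in_degree(links, weights=LINK_WEIGHTS):
    """Sum of incoming link weights per blogger.

    links: iterable of (source, target, link_type) triples; direction matters,
    so only the target of each link gains score. Unknown types count as 0.
    """
    score = defaultdict(float)
    for source, target, link_type in links:
        score[target] += weights.get(link_type, 0.0)
    return dict(score)

links = [
    ("ann", "bob", "friend"),
    ("cat", "bob", "comment"),
    ("cat", "bob", "comment"),  # repeated links add up: quantity matters
    ("bob", "ann", "subscription"),
]
capital = weighted_in_degree(links)
```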
The ability to recognize and evaluate patterns of activity is important in the behavioral and social sciences. In the past, this was usually done with visual techniques or statistical methods. Automated sequential pattern analysis using sophisticated data mining, visualization, and evaluation tools, by contrast, opens up possibilities for interactive exploration of the data. Cooper et al. (2007) explored adding a sequential pattern identification method to the visual activity analysis tool VISUAL-TimePAcTS and evaluated its effectiveness for analyzing patterns in social science diary data. The findings revealed that the method was very accurate in identifying patterns and conveyed them to social scientists in a way that made their significance quick and easy to understand.
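A much simplified version of sequential pattern identification counts contiguous activity sequences across diaries and keeps those that appear in at least a minimum number of diaries (their support). This sketch assumes diaries are lists of activity labels; it is not the method implemented in VISUAL-TimePAcTS.

```python
from collections import Counter

def frequent_sequences(diaries, length, min_support):
    """Contiguous activity sequences of a given length found in at least min_support diaries."""
    support = Counter()
    for diary in diaries:
        # Count each distinct pattern once per diary, so support = number of diaries containing it.
        patterns = {tuple(diary[i:i + length]) for i in range(len(diary) - length + 1)}
        support.update(patterns)
    return {p: n for p, n in support.items() if n >= min_support}

# Hypothetical time-use diaries
diaries = [
    ["sleep", "breakfast", "commute", "work"],
    ["sleep", "breakfast", "commute", "shop"],
    ["sleep", "gym", "breakfast", "commute"],
]
common = frequent_sequences(diaries, length=2, min_support=3)
```

Real sequential pattern miners also allow gaps between activities and account for timing, which this sketch ignores.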
Surma and Furmanek (2010) postulate that social networks have created high expectations about their potential business value. In their study on improving marketing response through data mining in social networks, they tested the hypothesis that even a basic application of data mining methods could significantly improve the accuracy of marketing response within a virtual community. The classification and regression tree technique was used to build a classification tree from which specific rules for identifying the proper target groups were derived. Empirical experiments based on actual social media data established that marketing response could be improved in this way.
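The following is a minimal CART-style classification tree in Python: each node picks the numeric threshold that most reduces Gini impurity, and each path from root to leaf reads as a targeting rule. The features (number of friends, posts per week) and labels are hypothetical toy data, not from Surma and Furmanek's study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, features, target):
    """Return (feature, threshold) minimizing weighted Gini impurity, or None if nothing improves."""
    best, best_score = None, gini([r[target] for r in rows])
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] <= threshold]
            right = [r[target] for r in rows if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, features, target, depth=2):
    """Grow a small binary tree; internal nodes are (feature, threshold, left, right), leaves are classes."""
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    split = best_split(rows, features, target) if depth > 0 else None
    if split is None:
        return majority
    f, t = split
    return (f, t,
            build_tree([r for r in rows if r[f] <= t], features, target, depth - 1),
            build_tree([r for r in rows if r[f] > t], features, target, depth - 1))

def predict(tree, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

rows = [
    {"friends": 10, "posts": 1, "responded": "no"},
    {"friends": 20, "posts": 2, "responded": "no"},
    {"friends": 150, "posts": 9, "responded": "yes"},
    {"friends": 200, "posts": 12, "responded": "yes"},
]
tree = build_tree(rows, ["friends", "posts"], "responded")
```

The learned root split here reads as the rule "if friends > 20, predict a response", the kind of rule used to pick target groups.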
In recent years, the exponential growth in the use of social digital media has increased the popularity of social networking and advanced social computing. A social network is a structure made up of social entities (such as individuals) connected by some particular type of interdependency, such as acquaintance. Most users of social media platforms such as Google+, Facebook, Twitter, MySpace, and LinkedIn have many connections in the form of friends, followers, and so on, and some of these links are more significant than others. For example, some acquaintances of a particular user may be casual ones met at some point in life. Other friends may be so close that they care about the user, consistently posting on his wall, viewing his updated profile, sending him frequent messages, sending him invitations, and following his tweets. In "Identifying Strong categories of acquaintances among a group of friends in Social Networks", Leung and Tanbeer (2011) analyzed data mining methods that users of social digital media could employ to distinguish important friends from the large number of connections in their social networks.
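One simple way to separate strong friends from casual acquaintances, borrowing the support-threshold idea from frequent pattern mining, is to keep friends who interact with the user in at least a given fraction of time periods. This is an illustrative assumption, not Leung and Tanbeer's algorithm; the weekly interaction sets are hypothetical.

```python
from collections import Counter

def strong_friends(weekly_interactions, min_support):
    """Friends who interacted with the user in at least min_support of the weeks.

    weekly_interactions: list of sets, one set of friend names per week
    (any wall post, message, profile view, etc. in that week).
    """
    weeks = len(weekly_interactions)
    # Each week is a set, so a friend is counted at most once per week.
    counts = Counter(friend for week in weekly_interactions for friend in week)
    return {friend for friend, n in counts.items() if n / weeks >= min_support}

weeks = [{"ann", "bob"}, {"ann"}, {"ann", "cat"}, {"ann", "bob"}]
close = strong_friends(weeks, min_support=0.5)
```

Raising the threshold shrinks the set toward the most consistently engaged friends.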
For adaptive intelligent systems, it is important to build user models that can be used for adaptation. Personality is an interesting user characteristic to include in such models, since it can help reveal the user's needs and preferences in different circumstances; eliciting the user's personality is therefore required. This information should be acquired as unobtrusively as possible without compromising the reliability of the models. Ortigosa et al. (2011) presented a method for eliciting user personality by analyzing users' interactions on Facebook, with the objective of mining behavioral patterns. The researchers developed TP2010, a Facebook application based on the ZKPQ-50-cc questionnaire, to extract information on users' personalities and their Facebook interactions. The classifier model was built by analyzing data from more than 10,000 users. The findings indicated that it is practical to obtain information about a user's personality by analyzing data gathered from social network interactions.
Social network analysis is typically based on the assumption that the relationships among interacting units are important. As the literature reviewed above shows, the social network perspective encompasses models, theories, and applications articulated in terms of relational concepts or processes. Alongside growing interest and increasing network use, there appears to be consensus on the underlying principles of data mining from the social network perspective. Besides relational concepts, these principles include the following: actors and their actions are viewed as interdependent rather than as independent, autonomous units; relational ties between actors are channels for the flow of resources; and network models focusing on individuals view the network environment as providing opportunities for, or constraints on, individual action.
Finally, network models conceptualize structure (economic, social, political, and so on) as lasting patterns of relations among actors. The unit of analysis in social network analysis is therefore not the individual but an entity consisting of a collection of individuals and the linkages among them. Social network methods focus on dyads (two actors and their ties), triads (three actors and their ties), or larger systems (subgroups or the entire network).
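Counting dyads and triads is straightforward on a small undirected graph: the sketch below counts edges (connected dyads) and classifies every set of three actors as an open triad (two ties) or a closed triad (three ties). It enumerates all triples and assumes every actor appears as a key in a symmetric adjacency map, so it is only meant for small illustrative networks.

```python
from itertools import combinations

def dyads_and_triads(adjacency):
    """Return (connected dyads, open triads, closed triads) for an undirected graph."""
    edges = {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs if u != v}
    closed = open_ = 0
    for a, b, c in combinations(sorted(adjacency), 3):
        ties = sum(frozenset(p) in edges for p in ((a, b), (a, c), (b, c)))
        if ties == 3:
            closed += 1  # every pair tied: a triangle
        elif ties == 2:
            open_ += 1   # one pair missing: an open triad
    return len(edges), open_, closed

# Hypothetical network: a triangle a-b-c with d attached to c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
counts = dyads_and_triads(adj)
```

The ratio of closed to connected triads is the basis of clustering (transitivity) measures.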
The literature reviewed here suggests that social network analysis focuses on uncovering patterns of interaction among users. It is grounded in the assumption that these patterns are important features of the lives of the individuals who display them. Social network analysis also holds that an individual's life depends largely on how that individual is embedded in the wider web of social connections, and that the success or failure of organizations and societies depends on the patterns of their internal structure. In the literature reviewed, the social network perspective has been based on theories formulated in mathematical terms and on the systematic analysis of data. Graph-theoretic analysis of social data has developed rapidly owing to the emergence of supercomputers.
Considerations in Mining and Exposing Social Network Data
As interest in mining and sharing social network data grows, so does the demand for preserving privacy in published social network data. Watanabe, Amagasa and Liu (2011) evaluated the privacy risks associated with publishing social network data and the design policies for developing countermeasures. The study had three objectives. First, it attempted to define the utility of the released data in terms of query types and levels of exposure. The authors argued that levels of exposure can serve as a general metric for characterizing the utility of anonymized data, while query types can serve as a usage-driven utility metric. Second, the authors identified two knowledge-based inference attacks that can break some of the most representative graph-permutation-based anonymization methods, causing anonymity violations. Third, the paper described design considerations for developing countermeasures for privacy-preserving social network data publishing.
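To illustrate why permuting node identifiers alone may not protect anonymity, the sketch below replaces names with shuffled integer ids and then shows an attacker who already knows a target's number of friends narrowing the candidates by degree. The paper's two specific attacks are not detailed in this review; degree knowledge is used here only as a simple, well-known example of background-knowledge inference, and the edge list is hypothetical.

```python
import random

def naive_anonymize(edges, seed=0):
    """Replace node names with shuffled integer ids (a simple graph permutation)."""
    nodes = sorted({n for e in edges for n in e})
    ids = list(range(len(nodes)))
    random.Random(seed).shuffle(ids)
    mapping = dict(zip(nodes, ids))  # kept secret by the publisher
    return [(mapping[u], mapping[v]) for u, v in edges], mapping

def degree_attack(anon_edges, known_degree):
    """Candidate ids for a target whose degree the attacker already knows."""
    degree = {}
    for u, v in anon_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {n for n, d in degree.items() if d == known_degree}

edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
anon, mapping = naive_anonymize(edges)
candidates = degree_attack(anon, known_degree=3)
```

Here "alice" is the only node of degree 3, so she is re-identified despite the relabeling, while "bob" and "carol" remain hidden in a group of two.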
Chen and Li (2010) presented the ecosystem, concepts, research challenges, and directions of social services computing. According to the authors, social services computing is a new paradigm sweeping through social computing, service computing, the Internet, and cloud computing systems. Computer systems, individuals, and physical things are related through complex control services and dedicated communication, which may belong to various individuals, organizations, or entities; these entities form social networks by associating with one another. The major functions of social network computing include service clustering and classification, service migration, service composition, service recommendation, and service discovery and publishing in the context of social networks. The authors emphasize the social aspects of service computing within the framework of large-scale data in a massive cyberspace. According to this assessment, new programming and business models are emerging within the paradigm of virtual social services computing systems.
Ting, Wu and Chang (2009) observe that social computing has become a popular Internet application. The researchers developed a methodology for collecting and analyzing multi-source social data in order to extract information from social networks, and used a system to show how such data can be extracted, processed, and analyzed. The system proved very effective in helping users treat the data as a resource for supporting personal decisions. The methodology was designed to help understand the relationships between “actors”, which may be individuals, organizations, objects, or events (Borgatti, Everett, 200). In a social network, every actor is represented as a node, and related pairs of nodes are connected by lines indicating their relationship, so the graph structure is built from nodes and lines. Social network analysis is therefore a methodology for understanding the actors and relationships in a social network. Actors are the most important components of a social network, representing individuals, objects, or events. Ties construct the interrelations among these actors, linking them either directly or indirectly through paths. Ties are categorized as strong or weak according to the strength of the relationship, and they are essential for identifying subgroups in a social network. Relationships depict the connections and interactions between actors, and different kinds of relationships may give the network specific features (Wellman, Berkowitz, 1988, p60).
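The role of ties in identifying subgroups can be sketched by keeping only ties at or above a strength threshold and taking the connected components of what remains. The numeric tie strengths and the threshold are hypothetical; real studies derive tie strength from interaction data.

```python
def subgroups_from_strong_ties(ties, min_strength):
    """Connected components formed by ties at least as strong as min_strength.

    ties: iterable of (actor_a, actor_b, strength) for undirected relationships.
    """
    adjacency = {}
    for a, b, strength in ties:
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if strength >= min_strength:  # weak ties are dropped from the subgroup graph
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every actor reachable through strong ties.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(component)
    return groups

ties = [("a", "b", 5), ("b", "c", 4), ("c", "d", 1), ("d", "e", 6)]
groups = subgroups_from_strong_ties(ties, min_strength=3)
```

Lowering the threshold to 1 would merge the two groups through the weak c-d tie.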
Scott (2002) explains that the most important social network analysis (SNA) measures include network size, density, diameter, centrality, and structural holes. Size counts the number of nodes or links in a network. Density measures how closely knit a network is, that is, the proportion of possible links that are actually present. These measures are widely employed in social network studies.
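The measures above can be computed on a small connected undirected graph as follows. Size is reported as node and link counts, density as the fraction of possible links present, diameter as the longest shortest-path distance, and degree centrality as each node's degree divided by n - 1; structural holes are omitted. The graph is a toy example.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_measures(adjacency):
    """Size, density, diameter and normalized degree centrality of an undirected graph."""
    n = len(adjacency)
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2  # each edge is listed twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Diameter: longest shortest path; this assumes the graph is connected.
    diameter = max(max(bfs_distances(adjacency, s).values()) for s in adjacency)
    centrality = {v: len(nbrs) / (n - 1) for v, nbrs in adjacency.items()} if n > 1 else {}
    return {"nodes": n, "links": m, "density": density,
            "diameter": diameter, "degree_centrality": centrality}

# Hypothetical path network a-b-c-d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
measures = network_measures(adj)
```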
Visual representation of a social network is essential for understanding network data and conveying the results of an analysis. Many of the social network analysis packages discussed in this literature review include modules for visualizing networks. Data exploration and collection can be supported by displaying ties and nodes in specific layouts and by mapping size, color, and other properties to the nodes. Visualizing networks is a powerful technique for conveying complicated information. However, great care should be taken when interpreting graph and node properties from visualization alone, because a drawing may misrepresent structural properties that are better captured through quantitative analysis.
Researchers and analysts working with social network data should be aware that texts found in social media have their own characteristics and are often informal, and this must be taken into account when performing data mining. Users of these sites commonly communicate with friends in less structured, informal language. Their sentences may also lack proper syntax and contain misspelled words, and the use of languages other than English is common (Corney, 2002, 25).
Corney proposed, as a concept for future work, testing whether considering sentence usage and structure, together with syntactic cues extracted with a parser, could produce better results. According to him, it is important to extract the context of a comment and apply real-world knowledge when evaluating the emotion it expresses. The context of a comment, together with a systematic understanding of it, could offer more insight into the nature of the relationship between two people. Regarding the challenge of dealing with multiple languages when mining social network data, Corney posits that it is crucial to develop new and innovative ways of learning and coping with language variation in social media and chat.
Granovetter (1973) adds that, beyond analyzing the social capital of individual users, it is essential when performing data mining to understand how useful or influential the network is to its users. Small, tight networks could be less influential or useful to their members than networks with many weak or loose connections to users outside the main network.
How Data Mining Facilitates Knowledge Sharing Among Users in Social Networks
Building data mining applications in decentralized environments requires services both for the needed resources, such as data, computing nodes, and algorithms, and for sharing the inferred knowledge among users once the data mining process is complete. Efficient execution of these functionalities is especially important in large-scale scenarios such as the Grid, where centralized approaches do not scale.
To address these issues, Saberi, Trunfio and Talia evaluated decentralized peer-to-peer (P2P) approaches, namely Semantic Overlay Networks (SONs) and social networks (SNs), for defining a set of mechanisms and services for sharing both information and knowledge in the Knowledge Grid for distributed data mining applications. The Knowledge Grid offers services for publishing and retrieving metadata across a set of nodes to support the design of data mining applications. Because its search mechanism has proved efficient mainly in small-scale scenarios, SON and SN approaches could make such services more effective in large-scale ones. Specifically, the authors presented a two-layer model in which an SN is built on top of a SON to share knowledge and resources effectively.
A social entity may be connected to another as a friend, collaborator, relative, coworker, co-author, classmate, business partner, team member, and so on. Identifying entities, or groups of entities, that are connected to a large number of other entities may therefore offer useful information to users. For instance, among the friends of a user k, some may be very popular in the sense that they are highly connected. Identifying these popular friends could provide valuable knowledge to k, since they have many social connections and may therefore be very influential within the social group. Similarly, a new user may want to be introduced to highly connected people in order to get to know more people in a shorter time. The same applies to users of other social networking sites.
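Identifying k's most connected friends reduces, in its simplest form, to ranking the friends by degree. The sketch assumes an undirected adjacency map with hypothetical users; ties in degree are broken alphabetically so the output is deterministic.

```python
def popular_friends(adjacency, user, top=3):
    """Rank the user's friends by how many connections each friend has."""
    friends = adjacency.get(user, set())
    # Sort by descending degree, then by name to break ties deterministically.
    return sorted(friends, key=lambda f: (-len(adjacency.get(f, ())), f))[:top]

adj = {
    "k": {"a", "b", "c"},
    "a": {"k", "b", "c", "d", "e"},
    "b": {"k", "a"},
    "c": {"k", "a", "f"},
    "d": {"a"},
    "e": {"a"},
    "f": {"c"},
}
top_two = popular_friends(adj, "k", top=2)
```

Degree is only one notion of popularity; weighting by interaction volume, as the message-count idea suggests, would change the ranking.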
A question that has attracted attention recently is how, given a social network, to identify popular social entities or friends, especially when a user has a large number of friends or connections. Going through the entire network manually is impractical, which is where a more systematic approach becomes useful. Data mining methods have recently been applied to social computing to retrieve implicit, previously unknown, and potentially useful knowledge, for instance, by identifying significant or strong friends in social networks. The number of messages posted by users on social sites such as Facebook could help in identifying strong or significant friends (Effendy, 2011, 810).
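A message-count heuristic of the kind mentioned above can be sketched as follows. This is a minimal illustration, not Effendy's actual method: it assumes that each message is logged as a hypothetical (sender, receiver) pair and that a "strong" friend is one who posted at least `min_messages` messages to the user, where the threshold is an arbitrary, hypothetical parameter.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def strong_friends(
    messages: Iterable[Tuple[str, str]], user: str, min_messages: int = 2
) -> List[Tuple[str, int]]:
    """Return (friend, count) pairs for friends who posted at least min_messages to user."""
    # Count only messages addressed to `user`, ignoring messages the user posted to themself.
    counts = Counter(
        sender for sender, receiver in messages if receiver == user and sender != user
    )
    # Keep friends at or above the threshold, most active first, ties broken by name.
    return sorted(
        ((friend, c) for friend, c in counts.items() if c >= min_messages),
        key=lambda fc: (-fc[1], fc[0]),
    )


log = [
    ("ann", "k"), ("bob", "k"), ("ann", "k"), ("cat", "k"),
    ("ann", "k"), ("bob", "k"), ("bob", "dan"),
]
print(strong_friends(log, "k"))  # → [('ann', 3), ('bob', 2)]
```

The message from "bob" to "dan" is ignored because it is not addressed to "k", and "cat" falls below the threshold with a single message.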
Social networks, which consist of social entities such as individual users linked by some specific type of interdependency such as acquaintance, are very popular for facilitating both collaboration and knowledge sharing among users. Such interdependencies or interactions can be influenced by, or depend on, characteristics of users such as their centrality, connectivity, weight, relevancy, and activity in the social network. From this perspective, some users in a social network are considered highly influential compared with others. The literature reviewed here shows how computational models for mining social network data could help users discover close or influential friends in their social networks.
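To show how connectivity and weight can lead to different judgments of influence, the sketch below computes each user's weighted degree (the sum of the weights on that user's links). It is a minimal example assuming hypothetical edge weights that stand for interaction strength. The names `weighted_degree` and `most_influential` are illustrative only and do not come from the works reviewed.

```python
from typing import Dict, List, Tuple

# Hypothetical weighted edges: (user, user, interaction weight).
Edge = Tuple[str, str, float]


def weighted_degree(edges: List[Edge]) -> Dict[str, float]:
    """Sum the weights of all edges incident to each user (node strength)."""
    strength: Dict[str, float] = {}
    for a, b, w in edges:
        strength[a] = strength.get(a, 0.0) + w
        strength[b] = strength.get(b, 0.0) + w
    return strength


def most_influential(edges: List[Edge], top_n: int = 2) -> List[str]:
    """Rank users by weighted degree, highest first, ties broken by name."""
    strength = weighted_degree(edges)
    return sorted(strength, key=lambda u: (-strength[u], u))[:top_n]


edges = [("ann", "bob", 5.0), ("ann", "cat", 1.0), ("bob", "cat", 2.0), ("cat", "dan", 1.0)]
print(most_influential(edges))  # → ['bob', 'ann']
```

In this toy network "cat" has the most links (three), yet "bob" ranks first once interaction weight is taken into account. This illustrates why connectivity and weight are treated as separate characteristics of users.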
Research in data mining has successfully produced various techniques, algorithms, and tools for handling massive amounts of data to solve real-world problems. The main objectives of the data mining process are to handle large-scale data efficiently, gain insightful knowledge, and extract actionable information. Since social media is widely used for many purposes, a large amount of user-generated data exists, and this data can be made available for data mining. In social media, data mining could expand researchers' ability to understand new phenomena arising from the use of social media, and it could improve business intelligence, helping to provide enhanced services and to create innovative opportunities. For instance, data mining methods could assist in identifying close or influential people in a large blogosphere, detecting implicit or hidden groups on a networking site, sensing users' sentiments for proactive planning, making new friends, understanding network evolution, protecting privacy, analyzing changes in entity relationships, building trust between or among users or entities, and supporting security. Mining data in social networks is therefore a rapidly growing multidisciplinary area in which researchers and scholars from various backgrounds could make crucial contributions that influence and trigger research and development in social media.
Social media sites are dynamic and continuously developing. For instance, Facebook has recently introduced many features, such as the user's timeline and the creation of groups for users, as well as various changes to its user policies. The dynamic nature of social media data, driven by the continuous and rapid evolution of these sites, is a crucial challenge. There are many interesting questions about human behavior that could be studied and analyzed using social network data. These sites have also proved very helpful to advertisers in influencing potential customers and in maximizing the market reach of their products within their budgets. In addition, through data mining, social networking and social media can help sociologists and social data analysts uncover human behavior, including both the in-group and out-group behavior of users.