Aspect Based Sentiment Analysis Using Real Time Social Media Data (PhD Thesis)

Material type: Text
Language: English
Publication details: Karachi : NED University of Engineering and Technology, Department of Computer Science and Information Technology, 2019
Description: XX, 153 p. : ill.
DDC classification: 006.312378242 QAZ
Holdings
Item type: Reference Collection
Current library: Government Document Section
Shelving location: Govt Publication Section
Call number: 006.312378242 QAZ
Status: Available
Barcode: 96736

Abstract:

The Internet is rapidly becoming the main source of information. With the growth of online social networks (OSN) and related platforms such as online news, review sites and forums, it has also become a read-write platform. This has made user-generated content (UGC), in the form of unstructured text, available to all Internet users, especially social media users, and this content is gaining significant attention because of its importance to many businesses. Text mining is necessary to extract information and discover facts from the enormous amount of textual data available on social media networks. Working with text data requires a good understanding of the text, and Natural Language Processing (NLP) techniques are very helpful for deeper analysis of textual data.

Sentiment Analysis (SA) is a well-known text mining technique that uses computational methods to study people's opinions towards individuals, events or any other topic. The goal of SA is to find the thoughts and feelings expressed in a text document by identifying the opinions it contains and then classifying their polarity.

The thesis offers a framework for Aspect Based Sentiment Analysis (ABSA) of social media data, analyzing English-language text extracted from Twitter. The framework comprises several units. The first unit is the preprocessing and data cleaning unit, which performs various functions on the collected data. Data annotation is then performed on the preprocessed data. The second unit applies several NLP techniques to the dataset: removing stopwords, replacing each negation word and the word that follows it with the antonym of the negated word, and applying part-of-speech tags to selected words (verbs and adjectives). During this research, Twitter increased its tweet size from 140 to 280 characters, which enabled users to write more text in a tweet, and it was observed that people started using idioms in their tweets. We therefore included a method for detecting idioms in the text, which helped us obtain better results in our analysis. The third unit implements classification using various machine learning techniques, for example Naive Bayes and SVM. It also uses three feature selection approaches (unigrams, bigrams and trigrams) to perform aspect based sentiment analysis on real-time data from Twitter.
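
The text-processing steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the stopword list, negation list and antonym table are tiny hypothetical stand-ins, and the function names (`swap_negations`, `preprocess`, `ngrams`) are chosen here purely for illustration.

```python
# Illustrative sketch only. STOPWORDS, NEGATIONS and ANTONYMS are small,
# hypothetical stand-ins, not the resources used in the thesis.
STOPWORDS = {"the", "a", "an", "is", "was", "to", "of"}
NEGATIONS = {"not", "never", "no"}
ANTONYMS = {"good": "bad", "happy": "unhappy", "helpful": "unhelpful"}


def swap_negations(tokens):
    """Replace a negation word plus the word after it with that word's antonym."""
    out = []
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] in NEGATIONS and nxt in ANTONYMS:
            out.append(ANTONYMS[nxt])
            i += 2  # consume both the negation and the negated word
        else:
            out.append(tokens[i])
            i += 1
    return out


def preprocess(text):
    """Lowercase, split on whitespace, drop stopwords, then swap negations."""
    tokens = text.lower().split()
    # Negation words are deliberately kept out of STOPWORDS so that they
    # survive this filter and can still be swapped afterwards.
    tokens = [t for t in tokens if t not in STOPWORDS]
    return swap_negations(tokens)


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The flight was not good")
print(tokens)             # ['flight', 'bad']
print(ngrams(tokens, 1))  # [('flight',), ('bad',)]
print(ngrams(tokens, 2))  # [('flight', 'bad')]
```

Unigram, bigram and trigram features correspond to `n = 1`, `2` and `3`. Idiom detection and part-of-speech tagging are left out of this sketch because they depend on external resources.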

The framework components were analyzed at each stage to explore different configurations of each component and to find which configuration suits the data used in different scenarios. In addition to the dataset prepared during this research from Twitter, the framework was also applied to the English-language benchmark corpus known as the Twitter Airline dataset to strengthen the analysis.

The framework was applied and tested on English-language corpora. The experiments performed during this research show that OSN data is extremely unbalanced and that the accuracy of the results is affected by buzzwords and spam present in tweets. This was addressed by applying different text processing techniques. The results of the experiments show that the classification accuracy of the Naive Bayes (NB) classifier improved and that the training time of the classifiers was reduced. Performance was measured by accuracy, F-measure and training time. The results also demonstrate that SVM and Boosting classifiers give better results on noisy data than other machine learning classifiers.
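
As a rough illustration of why F-measure is reported alongside accuracy on unbalanced data, the following Python sketch computes both from scratch. The labels and predictions are made-up toy values, not results from the thesis, and the per-class F1 definition used here is an assumption, since the abstract does not say which F-measure variant was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, unbalanced example: a classifier that always predicts "neg".
y_true = ["neg", "neg", "neg", "neg", "pos", "neu"]
y_pred = ["neg", "neg", "neg", "neg", "neg", "neg"]

print(round(accuracy(y_true, y_pred), 3))          # 0.667
print(round(f_measure(y_true, y_pred, "neg"), 3))  # 0.8
print(f_measure(y_true, y_pred, "pos"))            # 0.0
```

Here the accuracy looks acceptable even though the minority classes are never predicted, which the per-class F-measure makes visible.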

Keywords: Text Mining, Sentiment Analysis, Natural Language Processing.