A conceptual framework for mining and analysis of social media data

Abstract:

Data has become essential to every individual, group, industry, financial system, business enterprise and society. In this era, where social networks yield vast volumes of data for different purposes on a daily basis, an improved approach to data analysis is required to extract the information that best matches user interest. One way of achieving this is through data mining. Data mining can help any consumer or producer of information to make knowledge-driven decisions. It helps with the analysis of texts from different perspectives, turning them into valuable information that can be fine-tuned into useful organisational solutions and used to forecast future trends. Some work has previously been done in the area of social media data analysis, most of which places emphasis on the development of computational methods. The literature has argued that, while computational methods are good with regard to statistical and algorithmic analytics, they are limited in that they cannot capture in-depth meanings and semantics from data. Content from social media requires human interpretation and intervention for proper analysis. Therefore, the aim of this study was to develop a Conceptual Framework that offers a generic approach to addressing the limitations of computational methods in social media data analysis. In addition, the proposed Framework provides a guide on other aspects such as data gathering and text pre-processing. The application of the proposed Framework was then demonstrated through classification of data. Even though only one application scenario was demonstrated, the Framework provides for other applications such as regression, association and clustering. The proposed Conceptual Framework was evaluated in two stages using an example case: data on the political landscape of Botswana collected from the Facebook and Twitter platforms.
Firstly, a user study was carried out in which focus groups applied the Inductive Content Analysis (ICA) process to the collected data. Additionally, a questionnaire was administered to evaluate the usability of ICA as perceived by the participants. This was mainly performed to help mitigate the failure of computational methods to capture in-depth meanings from data. Secondly, an experimental study was conducted in which data that had been pre-processed through ICA was classified through data modelling, and the data mining algorithms were evaluated with metrics that measured their performance and accuracy. The ICA process was evaluated based on the “Usability” component of the ISO 9126 model. The aspects of Usability evaluated were Learnability, Ease of use, Perceived usefulness, Satisfaction and Flexibility. Each evaluation metric had five questions. The totals of the positive responses (strongly agree and agree) for the five questions of each metric are as follows. Learnability scored between 91.7% and 100%. Ease of use scored between 95.8% and 100%, with the exception of the one question that asked whether the overall process took a lot of time, with which the majority (87.5%) disagreed. Perceived usefulness scored between 91.7% and 100%, Flexibility between 95.8% and 100%, and Satisfaction between 95.8% and 100%. The classification accuracy was 94.6% for both the Instance-Based learning with parameter K (IBK) classifier for K-Nearest Neighbour and the Sequential Minimal Optimization (SMO) classifier for Support Vector Machine, 82.8% for the Naïve Bayes (NB) classifier for Naïve Bayes multinomial, and 78% for the J48 classifier for the C4.5 decision tree. The results from the experimental study show that data mining algorithms had higher accuracy in classifying data when supplied with data from the ICA process.
That is, through the application of the ICA process, data mining algorithms were able to overcome the failure to capture in-depth meanings and semantics within data. Overall, the results of this study, including the proposed Conceptual Framework, are useful to scholars and practitioners who wish to conduct research on social media data mining and analysis. The Framework serves as a guide to the mining and analysis of social media data in a systematic manner.
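
The usability scores reported above are totals of "strongly agree" and "agree" responses per question. A minimal sketch of that calculation follows, assuming a five-point Likert scale and a hypothetical group of 24 respondents; the abstract does not state the number of participants, and the function name `positive_share` and the sample responses are illustrative only, not the study's data.

```python
from collections import Counter

# Labels counted as positive responses, as described in the abstract.
POSITIVE = {"strongly agree", "agree"}


def positive_share(responses):
    """Return the percentage of positive responses, rounded to one decimal."""
    if not responses:
        raise ValueError("no responses supplied")
    # Normalise case and whitespace before counting each label.
    counts = Counter(r.strip().lower() for r in responses)
    positive = sum(counts[label] for label in POSITIVE)
    return round(100 * positive / len(responses), 1)


# Illustrative data only: 24 hypothetical respondents for one question.
item = ["strongly agree"] * 14 + ["agree"] * 8 + ["neutral"] * 2
print(positive_share(item))  # 22 of 24 positive -> 91.7
```

Reporting the lowest and highest value of this share across the five questions of a metric gives a range of the kind quoted for each usability aspect.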