Clustering Data National Examinations based on Social Media Using K-Means Methods

Chandra Eko Wahyudi Utomo; Mochamad Hariadi; Surya Sumpeno

doi:10.12962/j25796216.v4.i2.152

Published: Oct 16, 2020

Chandra Eko Wahyudi Utomo

Institut Teknologi Sepuluh Nopember; Pranata Komputer at University of Jember

http://orcid.org/0000-0002-1806-3611

Mochamad Hariadi

Institut Teknologi Sepuluh Nopember

Surya Sumpeno

Institut Teknologi Sepuluh Nopember

Abstract

Social media is an increasingly interesting source of data for research. This study focuses on Twitter, one of the most widely accessed social media platforms in Indonesia. People's behavior can be studied by collecting and processing such data, including the sentiments or opinions about the national examination in Indonesia that Twitter users express in their comments. This study aims to analyze the sentiments of social media users about the National Examination in Indonesia. Data are retrieved by crawling the Twitter API, then preprocessed, and features are extracted using TF-IDF. Because text data on Twitter is unstructured and highly varied, the data must first be grouped. Grouping is performed with K-Means clustering on Spark, which is used to handle the clustering of very large and complex data. With the number of clusters varied between 2 and 50, the elbow was detected at 3 clusters, so the clustering process produced 3 clusters. The resulting 3 large groups were further processed with classification techniques to obtain a comparison of positive and negative sentiment in social media users' comments about the national exam. These results provide recommendations and new knowledge about community behavior regarding the national exam, based on social media.

Keywords: clustering, K-Means, national exam, sentiment analysis, social media.
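
The pipeline described in the abstract (TF-IDF features, K-Means clustering, elbow detection over a range of cluster counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the study runs K-Means on Spark, whereas the code below is a small pure-Python stand-in intended only to show the idea. The function names (tfidf, kmeans, elbow_k), the smoothed IDF formula, the "largest second difference" elbow rule, and the toy tweets are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors for tokenized, non-empty documents (hypothetical helper)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Assumed IDF variant: smoothed, log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, wcss). Requires k <= len(points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares, the quantity plotted in an elbow curve
    wcss = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, wcss

def elbow_k(wcss_by_k):
    """Assumed heuristic: pick the k with the largest second difference in WCSS."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (wcss_by_k[prev] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

if __name__ == "__main__":
    # Toy tokenized tweets, invented for illustration only
    tweets = [["un", "sulit", "sekali"], ["un", "mudah"], ["soal", "un", "sulit"],
              ["jadwal", "un"], ["jadwal", "ujian"], ["un", "mudah", "sekali"]]
    _, vectors = tfidf(tweets)
    curve = {k: kmeans(vectors, k)[2] for k in range(2, 6)}
    print("WCSS by k:", curve)
    print("suggested k:", elbow_k(curve))
```

On a real corpus the same steps would be run with a distributed implementation and a much wider range of cluster counts (the abstract reports 2 to 50); the second-difference rule is only one of several ways to locate an elbow, and a visual check of the curve is usually advisable.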

Issue

Vol. 4 No. 2 (2020): October

Section

Articles

Copyright

Submission of a manuscript implies that the submitted work has not been published before (except as part of a thesis, report, or abstract); that it is not under consideration for publication elsewhere; and that its publication has been approved by all co-authors. If and when the manuscript is accepted for publication, the author(s) still hold the copyright and retain publishing rights without restrictions. Authors and others may reproduce the article as long as it is not for commercial purposes. For new inventions, authors are advised to arrange patent protection before publication. The license type is CC-BY-NC 4.0.

Disclaimer

No responsibility is assumed by publisher and co-publishers, nor by the editors for any injury and/or damage to persons or property as a result of any actual or alleged libelous statements, infringement of intellectual property or privacy rights, or products liability, whether resulting from negligence or otherwise, or from any use or operation of any ideas, instructions, procedures, products or methods contained in the material therein.

Author Biographies

Chandra Eko Wahyudi Utomo, Institut Teknologi Sepuluh Nopember; Pranata Komputer at University of Jember

Department of Electrical Engineering

Mochamad Hariadi, Institut Teknologi Sepuluh Nopember

Department of Computer Engineering; Department of Electrical Engineering

Surya Sumpeno, Institut Teknologi Sepuluh Nopember

Department of Computer Engineering; Department of Electrical Engineering

