Clustering Data National Examinations based on Social Media Using K-Means Methods

Chandra Eko Wahyudi Utomo, Mochamad Hariadi, Surya Sumpeno


Social media is an increasingly interesting source of data for research. This study focuses on Twitter, one of the most widely accessed social media platforms in Indonesia. People's behavior can be studied by collecting and processing their posts, including the sentiments or opinions they express about the national examination in Indonesia. This study aims to analyze the sentiments of social media users toward the National Examination in Indonesia. Data are retrieved by crawling the Twitter API, then preprocessed, and features are extracted using TF-IDF. Because Twitter text is unstructured and highly varied, the data must be grouped first. Grouping is performed with K-Means clustering on Spark, which is used to handle clustering of very large and complex data. Testing cluster counts between 2 and 50, the elbow was detected at three clusters, so the data were grouped into 3 clusters. These three large groups were then processed further with classification techniques to compare positive and negative sentiment in user comments about the national exam. The results provide recommendations and new knowledge about community behavior regarding the National Examination as reflected on social media.

Keywords: clustering, K-Means, national exam, sentiment analysis, social media.
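
The pipeline described in the abstract (TF-IDF features, K-Means clustering, elbow detection over a range of cluster counts) can be sketched in plain Python. This is an illustrative stand-in, not the authors' Spark implementation: the function names `tfidf`, `kmeans`, and `elbow`, the smoothed IDF formula, the second-difference elbow heuristic, and the toy tokenized "tweets" are all assumptions introduced here for illustration.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Turn tokenized, non-empty documents into dense TF-IDF vectors."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    # Smoothed IDF variant (an assumption; the abstract gives no formula).
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    # Within-cluster sum of squares, the quantity plotted in an elbow curve.
    wcss = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, wcss


def elbow(wcss_by_k):
    """Pick the k where the WCSS curve bends most (largest second difference).

    Assumes keys are at least three consecutive integers; this is one simple
    heuristic, not necessarily the detection rule used in the study.
    """
    ks = sorted(wcss_by_k)
    return max(
        ks[1:-1],
        key=lambda k: wcss_by_k[k - 1] - 2 * wcss_by_k[k] + wcss_by_k[k + 1],
    )


if __name__ == "__main__":
    # Invented toy "tweets", already tokenized; not data from the study.
    docs = [
        ["exam", "hard", "stress"],
        ["exam", "stress", "fail"],
        ["exam", "easy", "happy"],
        ["exam", "happy", "pass"],
        ["schedule", "exam", "date"],
        ["schedule", "date", "announce"],
    ]
    vectors = tfidf(docs)
    wcss_by_k = {k: kmeans(vectors, k)[2] for k in range(2, 6)}
    print("WCSS by k:", wcss_by_k)
    print("elbow at k =", elbow(wcss_by_k))
```

The study scans cluster counts from 2 to 50 on Spark; the toy run above scans only 2 to 5 because it has six documents. On a real corpus, a distributed implementation such as Spark's K-Means would replace the `kmeans` function here, while the elbow scan over the per-k cost would follow the same pattern.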








This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.