Clustering Data National Examinations based on Social Media Using K-Means Methods

Chandra Eko Wahyudi Utomo; Mochamad Hariadi; Surya Sumpeno

doi:10.12962/j25796216.v4.i2.152

Authors

Chandra Eko Wahyudi Utomo Institut Teknologi Sepuluh Nopember; Pranata Komputer at University of Jember http://orcid.org/0000-0002-1806-3611
Mochamad Hariadi Institut Teknologi Sepuluh Nopember
Surya Sumpeno Institut Teknologi Sepuluh Nopember

Abstract

Social media is an increasingly interesting source of data for research. This study focuses on Twitter, one of the most widely accessed social media platforms in Indonesia. People's behavior can be studied by collecting and processing such data, including their sentiments or opinions about the national examination in Indonesia as expressed in Twitter users' comments. This study aims to analyze the public sentiment of social media users toward the National Examination in Indonesia. Data are retrieved by crawling the Twitter API, then preprocessed, and features are extracted using TF-IDF. Because text data on Twitter are unstructured and highly varied, a grouping stage must be carried out first. Grouping is performed with K-Means clustering on Spark, which is used to handle the clustering of very large and complex data. Testing cluster counts between 2 and 50, the elbow was detected at 3 clusters, so the clustering process produced 3 clusters. These 3 large groups were then processed further with classification techniques to compare positive and negative sentiment in social media users' comments about the national exam. The results provide recommendations and new knowledge about community behavior regarding the national exam, based on social media.

Keywords: clustering, K-Means, national exam, sentiment analysis, social media.
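
The pipeline outlined in the abstract (TF-IDF features, K-Means clustering, elbow detection over a range of cluster counts) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' Spark implementation: the function names, the smoothed IDF formula, the second-difference elbow heuristic, and the toy token lists are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one dense TF-IDF vector per tokenized (non-empty) document."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Assumption: smoothed IDF, so a term found in every document keeps weight 1.
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, wcss


def elbow_k(ks, wcss):
    """Pick the k where the WCSS curve bends most (largest second difference).

    Assumption: this heuristic stands in for the paper's elbow detection,
    whose exact rule is not given in the abstract. Needs at least three ks.
    """
    best = max(range(1, len(ks) - 1),
               key=lambda i: wcss[i - 1] - 2 * wcss[i] + wcss[i + 1])
    return ks[best]


if __name__ == "__main__":
    # Hypothetical, already-preprocessed token lists (not data from the study).
    docs = [["un", "sulit", "stres"], ["un", "sulit"], ["un", "mudah", "lulus"],
            ["un", "mudah"], ["jadwal", "un", "ujian"], ["jadwal", "ujian"]]
    vectors = tfidf(docs)
    ks = [2, 3, 4]
    scores = [kmeans(vectors, k)[2] for k in ks]
    print("WCSS per k:", dict(zip(ks, scores)))
    print("elbow at k =", elbow_k(ks, scores))
```

On a corpus of the size the study describes, the same steps would normally be delegated to a distributed library such as Spark's MLlib rather than run in a single Python process. The study scans cluster counts from 2 to 50, whereas the toy example scans only 2 to 4.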

Author Biographies

Chandra Eko Wahyudi Utomo, Institut Teknologi Sepuluh Nopember; Pranata Komputer at University of Jember

Department of Electrical Engineering

Mochamad Hariadi, Institut Teknologi Sepuluh Nopember

Department of Computer Engineering; Department of Electrical Engineering

Surya Sumpeno, Institut Teknologi Sepuluh Nopember

Department of Computer Engineering; Department of Electrical Engineering
