Abstract
Imputing missing values is a problem frequently encountered in Machine Learning and Data Mining, and one that researchers need to study. Many computer-based analysis algorithms operate under the assumption that the data contain no missing values. Insufficient examination of missing values by researchers can negatively affect the quality of analysis results. This study used a data set of 52 variables compiled to measure the Corporate Sustainability performance of district municipalities in Istanbul. Little’s MCAR test was applied to the 17 variables containing missing values, and the missing values were found to be missing completely at random (MCAR). Cluster analysis was then applied to the 35 variables without missing values, and the missing values were imputed based on the resulting clusters. The cluster labels that the municipalities received from the 35 complete variables did not change when the analysis was repeated on the full 52-variable data set after imputation. This stability of the cluster labels indicates that the imputed data set does not diverge from the original data, that is, the data structure is not disrupted. Consequently, cluster analysis can be said to be effective for imputing more representative values for missing data.
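The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the imputation rule, so the plain k-means routine, the cluster-mean imputation, the function names `kmeans` and `impute_by_cluster`, and the toy data below are all assumptions made for illustration.

```python
from statistics import mean


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centers[c])),
    )


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; the first k points are the initial centers.

    Assumption: the source does not say which clustering method was used.
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def impute_by_cluster(complete, incomplete, k):
    """Cluster on complete variables, then fill each missing value (None)
    with the mean of the observed values in the same cluster.

    Assumption: cluster-mean imputation; falls back to the overall mean
    when a cluster has no observed values for the variable.
    """
    labels = kmeans(complete, k)
    overall = mean(v for v in incomplete if v is not None)
    filled = []
    for value, label in zip(incomplete, labels):
        if value is None:
            observed = [
                v for v, lab in zip(incomplete, labels)
                if lab == label and v is not None
            ]
            value = mean(observed) if observed else overall
        filled.append(value)
    return labels, filled


# Hypothetical toy data: two complete variables and one variable with gaps.
complete = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.2], [0.9, 1.1], [9.1, 8.9]]
incomplete = [10.0, 50.0, None, 52.0, 12.0, None]

labels_before, filled = impute_by_cluster(complete, incomplete, k=2)

# Stability check in the spirit of the abstract: re-cluster on all variables
# after imputation and compare the labels with those from the complete ones.
full = [row + [v] for row, v in zip(complete, filled)]
labels_after = kmeans(full, k=2)
print(labels_before == labels_after)
```

Comparing the label lists directly is meaningful here only because both runs use the same deterministic initialization; with random initialization the cluster numbering could be permuted, and a permutation-invariant comparison would be needed instead.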
Keywords: Cluster Analysis, K-Nearest Neighbor Imputation Methods, Little’s MCAR Test, Missing Value Analysis
JEL Classification: C46
Suggested citation
Arcagök, U., & Arıcıgil Çilan, Ç. (2021). A Proposal Method for Missing Value Analysis: Cluster Analysis Approach. Alphanumeric Journal, 9(2), 299-310. http://dx.doi.org/10.17093/alphanumeric.970448
2021.09.02.STAT.04
alphanumeric journal
Pages 299-310
Received: July 12, 2021
Accepted: Dec. 31, 2021
Published: Dec. 31, 2021
© 2021 Arcagök, U., Arıcıgil Çilan, Ç.
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Faculty of Transportation and Logistics, Istanbul University
Beyazit Campus 34452 Fatih/Istanbul/TURKEY
Bahadır Fatih Yıldırım, Ph.D.
editor@alphanumericjournal.com
+ 90 (212) 440 00 00 - 13219
alphanumeric journal has been published every six months as an "International Peer-Reviewed Journal" since 2013. It serves as a vehicle for researchers and practitioners in the field of quantitative methods and enables sharing across all fields related to operations research, statistics, econometrics, and management information systems in order to enhance quality on a global scale.