• ISSN: 2148-2225 (online)


alphanumeric journal

The Journal of Operations Research, Statistics, Econometrics and Management Information Systems

Classification of News Texts by Categories Using Machine Learning Methods


Mehmet Kayakuş, Ph.D.

Fatma Yiğit Açıkgöz


Abstract

As technology advances, readers increasingly prefer digital journalism to print journalism. Digital journalism delivers news quickly, keeps it up to date, and is accessible from anywhere, so it reaches more readers. Alongside these advantages, it poses some difficulties that print journalism does not: preparing the news and delivering it to the user requires more technological knowledge and equipment. In the preparation phase, the processes of headline selection, text creation, photo selection, and assignment of the appropriate news category are designed to be both faster and more user-friendly than in print publishing. A news item prepared for a target audience may belong to one or more categories, such as economy, politics, sports, technology, and health. Placing the news in the appropriate category makes it easier to reach the right audience and to archive the news correctly. In this study, news texts were classified by category using machine learning methods. The dataset comprised 10,500 news items drawn from five newspapers across three categories, and a Bayesian classifier and a decision tree were used to classify them. The results showed that the Bayesian classifier classified the news by category more successfully than the decision tree.

Keywords: Category, Classification, Machine Learning, News

JEL Classification: C46
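
The Bayesian classification described in the abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model with add-one smoothing and whitespace tokenization; the abstract does not state which Bayesian variant, preprocessing steps, or category names the authors used. The class name `NaiveBayesTextClassifier`, the helper `tokenize`, and the six toy headlines are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        # Words never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            # log P(category) + sum of log P(word | category)
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy corpus; the category names are illustrative assumptions.
training_data = [
    ("central bank raises interest rates as inflation climbs", "economy"),
    ("stock market rallies after strong export figures", "economy"),
    ("striker scores twice as home team wins the league match", "sports"),
    ("national team coach names squad for the cup final", "sports"),
    ("hospital reports rise in seasonal flu cases", "health"),
    ("doctors recommend vaccine ahead of flu season", "health"),
]
texts = [text for text, _ in training_data]
labels = [label for _, label in training_data]

model = NaiveBayesTextClassifier().fit(texts, labels)
prediction = model.predict("bank cuts interest rates")
print(prediction)  # prints "economy"
```

Scores are summed in log space so that multiplying many small word probabilities does not underflow, and the smoothing term keeps a word that is absent from one category from forcing that category's probability to zero.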


Suggested citation

Kayakuş, M. & Yiğit Açıkgöz, F. (2022). Classification of News Texts by Categories Using Machine Learning Methods. Alphanumeric Journal, 10(2), 155-166. https://doi.org/10.17093/alphanumeric.1149753



Volume 10, Issue 2, 2022

2022.10.02.MIS.02


Pages 155-166

Received: July 27, 2022

Accepted: Oct. 20, 2022

Published: Dec. 31, 2022


2022 Kayakuş, M., Yiğit Açıkgöz, F.

This is an Open Access article, licensed under Creative Commons Attribution-NonCommercial 4.0 International License.




alphanumeric journal has been published every six months as an international peer-reviewed journal since 2013. It serves as a venue for researchers and practitioners in the field of quantitative methods, and supports the sharing of work across all fields related to operations research, statistics, econometrics, and management information systems in order to raise quality on a global scale.