A NEW MACHINE LEARNING CLASSIFICATION ALGORITHM FOR PHISHING URLS DETECTION

ALARBI, ABDALRAOUF ALMAHDI MOHAMMED

dc.contributor.author	ALARBI, ABDALRAOUF ALMAHDI MOHAMMED
dc.date.accessioned	2023-07-18T07:49:29Z
dc.date.available	2023-07-18T07:49:29Z
dc.date.issued	2023-06
dc.identifier.uri	http://acikerisim.karabuk.edu.tr:8080/xmlui/handle/123456789/2765
dc.description.abstract	ABSTRACT As online threats keep growing, identifying phishing URLs has become a critical task for keeping users safe and protecting sensitive information. Attackers have become adept at creating deceptive websites that mimic legitimate ones, making it hard for users to distinguish genuine URLs from fraudulent ones. This has created an urgent need for robust techniques to detect phishing attacks and mitigate their risks. Using machine learning models, cybersecurity researchers continue to improve the accuracy and efficiency of phishing URL detection systems. The study in this thesis consists of three stages. In the first stage, we propose a new classification algorithm called the Core Classification Algorithm (CCA), which is derived from the K-nearest neighbor algorithm (KNN) and hybridized with the unsupervised algorithm K-means. The primary objective is to find similarities while overcoming the challenge of excluding non-representative cores from the clusters. The hybridization combines the two algorithms and iteratively modifies outcomes to reach the best possible solutions. This strategy improves the efficiency and accuracy of classifying data into two or more clusters based on their labels. In the second stage, we introduce the Improved Core Classification Algorithm (ICCA), an adaptation of the algorithm from the first stage. Instead of relying on a single core point, ICCA employs active sets, and it yields more accurate results than the various other algorithms it was compared with. In the third stage, we analyzed phishing URLs using a comprehensive dataset consisting of 549,346 entries. 
Among these entries, 392,897 URLs were identified as phishing attempts, while 114,299 URLs were classified as legitimate. We carried out several preprocessing steps, including data cleaning, feature engineering, and feature selection, to improve the overall quality of the analysis. These steps gave us deeper insight into the data and allowed us to extract critical features. We then evaluated our algorithms, and the findings showed encouraging prediction accuracy. ÖZET (Turkish abstract) With online dangers constantly increasing, identifying phishing URLs is becoming an ever more important task for keeping users safe and protecting sensitive information. As a solution to these problems, this thesis proposes a new classification algorithm that we call the Core Classification Algorithm (CCA). The algorithm is derived through hybridization with the K-means algorithm. The hybridization process combines two different algorithms, iteratively modifying results to reach the best possible solutions. This strategy improves both the accuracy and the efficiency of classifying data into two or more clusters based on their labels. The next part of the thesis presents the Improved Core Classification Algorithm (ICCA), an adaptation of the algorithm used in the previous part. Instead of relying on a single core point, this version uses active sets. Compared with various other algorithms in the literature, this method was found to give better results. In the final part of the thesis, we analyzed phishing URLs using a comprehensive dataset containing 549,346 entries. Among these entries, 392,897 URLs were identified as phishing attempts and 114,299 URLs were considered legitimate. 
To improve the overall quality of our analysis, we carried out a series of preprocessing steps such as data cleaning, feature engineering, and exploratory data analysis (EDA). These steps gave us deeper insight into the data and let us extract the critically important features. We then evaluated our algorithms, and the findings were encouraging in terms of prediction accuracy.	en_EN
dc.language.iso	en	en_EN
dc.subject	Classification; Phishing attacks; K-means; Hybridization; Core point; Active set; Clustering.	en_EN
dc.subject	Sınıflandırma; Kimlik avı saldırıları; K-anlamı; Hibridizasyon; Çekirdek nokta; Aktif küme; Kümeleme.	en_EN
dc.title	A NEW MACHINE LEARNING CLASSIFICATION ALGORITHM FOR PHISHING URLS DETECTION	en_EN
dc.title.alternative	KİMLİK AVI URL TESPİT İÇİN YENİ BİR MAKİNE ÖĞRENİMİ SINIFLANDIRMA ALGORİTMASI TASARIMI	en_EN
dc.type	Thesis	en_EN
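
The abstract describes CCA only at a high level: a KNN-derived classifier hybridized with K-means that assigns data to labeled clusters through core points. As a rough illustration of that idea, and not the thesis's actual algorithm, the Python sketch below runs a plain K-means separately on each class to obtain candidate cores, drops cores backed by fewer than `min_members` training points, and labels a new sample by its nearest remaining core. The function names, the `k` and `min_members` parameters, the deterministic initialization, and the toy two-dimensional feature vectors are all assumptions made for this example.

```python
import math
from collections import defaultdict


def assign(points, centroids):
    """Group each point under the index of its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        groups[i].append(p)
    return groups


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means, initialized from the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = assign(points, centroids)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else old
            for g, old in zip(groups, centroids)
        ]
    sizes = [len(g) for g in assign(points, centroids)]
    return centroids, sizes


def fit_cores(X, y, k=2, min_members=1):
    """Cluster each class on its own and keep sufficiently supported centroids as cores."""
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    cores = []
    for label, pts in by_label.items():
        centroids, sizes = kmeans(pts, k)
        for centroid, size in zip(centroids, sizes):
            if size >= min_members:  # hypothetical filter for non-representative cores
                cores.append((centroid, label))
    return cores


def predict(cores, x):
    """Return the label of the core closest to x (a 1-nearest-neighbor rule over cores)."""
    _, label = min(cores, key=lambda core: math.dist(x, core[0]))
    return label


# Toy feature vectors standing in for numeric URL features (made up for illustration).
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = ["legitimate"] * 3 + ["phishing"] * 3
cores = fit_cores(X, y, k=2)
```

Calling `predict(cores, (0.5, 0.5))` then classifies a new point by its closest core. Classifying against a handful of cores rather than every training sample is what makes this kind of hybrid cheaper than plain KNN at prediction time; how the thesis selects cores, and how ICCA's active sets replace single core points, is not specified in this record.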