Visualizing Realistic Benchmarked IDS Dataset: CIRA-CIC-DoHBrw-2020

dc.authorid: Sepelev, Vladimir/0000-0002-1143-2031
dc.authorid: MOHD YUSOF, MOHAMMAD HAFIZ/0000-0002-9586-6485
dc.contributor.author: Yusof, Mohammad Hafiz Mohd
dc.contributor.author: Almohammedi, Akram A.
dc.contributor.author: Shepelev, Vladimir
dc.contributor.author: Ahmed, Osman
dc.date.accessioned: 2024-09-29T16:03:27Z
dc.date.available: 2024-09-29T16:03:27Z
dc.date.issued: 2022
dc.department: Karabük Üniversitesi (en_US)
dc.description.abstract: Intrusion Detection System (IDS) datasets are crucial for detecting the lateral movement of cyber-attacks: they are used to train IDS classifier models to achieve the earliest possible detection. A good, near-realistic public dataset is therefore essential for developing advanced IDS classifier models. However, the available public IDS datasets have long been criticized for failing to reflect real low-footprint cyber threats, render real-time network scenarios, or reflect recent malware attacks over the newly developed DNS-over-HTTPS (DoH) protocol; for disregarding layer 3 information; and for producing contradictory classification and analysis results across studies, which makes those results non-reproducible and non-shareable. This problem can be addressed by applying sophisticated visualization to a new realistic, real-time, low-footprint and up-to-date benchmarked dataset, because visualization helps detect data deformation before an optimized, highly accurate classifier model is designed. Therefore, this study reviews a new realistic benchmarked IDS dataset and applies sophisticated techniques to visualize it. The review starts by carefully examining production network features, which are then compared with those of various well-established public IDS datasets. Many of these datasets contain static, unrealistic meta-features and, with the exception of CIRA-CIC-DoHBrw-2020, disregard source and destination Internet Protocol (IP) information. The study then applies the Eigenvector Centrality (EC) technique from graph theory to visualize this layer 3 (L3) information. Finally, using techniques such as Principal Component Analysis (PCA) and the Gaussian Mixture Model (GMM), the study further analyzes and visualizes the data. Results show that CIRA-CIC-DoHBrw-2020 simulates recent malware attacks and is highly imbalanced, which reflects realistic low-footprint cyber-attacks. The centrality graph clearly visualizes the IPs compromised by recent DoH attacks in real time, and the study concludes decisively that smaller packet lengths, of 1000 to 2000 bytes, fit an attack trait. (en_US)
dc.description.sponsorship: Research Management Centre (RMC), Faculty of Computer and Mathematical Sciences (FSKM), Universiti Teknologi MARA (UiTM); [FRGS/1/2021/ICT07/UITM/02/3] (en_US)
dc.description.sponsorship: This work was supported in part by the Research Management Centre (RMC), Faculty of Computer and Mathematical Sciences (FSKM), Universiti Teknologi MARA (UiTM); and in part by the Research Project under Grant FRGS/1/2021/ICT07/UITM/02/3. (en_US)
dc.identifier.doi: 10.1109/ACCESS.2022.3204690
dc.identifier.endpage: 94642 (en_US)
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-85137882960 (en_US)
dc.identifier.scopusquality: Q1 (en_US)
dc.identifier.startpage: 94624 (en_US)
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2022.3204690
dc.identifier.uri: https://hdl.handle.net/20.500.14619/6100
dc.identifier.volume: 10 (en_US)
dc.identifier.wos: WOS:000873921600001 (en_US)
dc.identifier.wosquality: Q2 (en_US)
dc.indekslendigikaynak: Web of Science (en_US)
dc.indekslendigikaynak: Scopus (en_US)
dc.language.iso: en (en_US)
dc.publisher: IEEE-Inst Electrical Electronics Engineers Inc (en_US)
dc.relation.ispartof: IEEE Access (en_US)
dc.relation.publicationcategory: Other (en_US)
dc.rights: info:eu-repo/semantics/openAccess (en_US)
dc.subject: Data visualization (en_US)
dc.subject: Real-time systems (en_US)
dc.subject: Protocols (en_US)
dc.subject: Principal component analysis (en_US)
dc.subject: Benchmark testing (en_US)
dc.subject: IP networks (en_US)
dc.subject: Intrusion detection (en_US)
dc.subject: Machine learning (en_US)
dc.subject: Computer security (en_US)
dc.subject: Cyberattack (en_US)
dc.subject: Intrusion detection system (IDS) (en_US)
dc.subject: IDS dataset review (en_US)
dc.subject: imbalanced dataset (en_US)
dc.subject: data visualization (en_US)
dc.subject: machine learning in cybersecurity (en_US)
dc.title: Visualizing Realistic Benchmarked IDS Dataset: CIRA-CIC-DoHBrw-2020 (en_US)
dc.type: Review (en_US)
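The abstract applies the EC (eigenvector centrality) technique from graph theory to the source and destination IP addresses, that is, the layer 3 information. A minimal sketch of that idea follows, assuming a graph whose nodes are IP addresses and whose edge weights count the flows observed between two addresses. It is an illustration rather than the authors' pipeline: the function name `eigenvector_centrality` and the flow list are hypothetical, and the addresses come from documentation-only ranges, not from CIRA-CIC-DoHBrw-2020.

```python
import math
from collections import defaultdict


def eigenvector_centrality(flows, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of an undirected IP graph built from (src, dst) pairs.

    Assumption: edge weight = number of flows seen between two IPs.
    Power iteration runs on (A + I) instead of A; both share eigenvectors,
    but the shift stops bipartite graphs (one host talking to many) from
    oscillating between two states.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for src, dst in flows:
        if src != dst:  # ignore self-loops
            adj[src][dst] += 1.0
            adj[dst][src] += 1.0
    nodes = list(adj)
    if not nodes:
        return {}
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        y = dict(x)  # identity term of (A + I) x
        for n in nodes:
            for nbr, w in adj[n].items():
                y[nbr] += w * x[n]  # adjacency term A x
        norm = math.sqrt(sum(v * v for v in y.values()))
        y = {n: v / norm for n, v in y.items()}  # unit L2 norm
        if sum(abs(y[n] - x[n]) for n in nodes) < len(nodes) * tol:
            return y
        x = y
    return x


# Hypothetical flows (documentation-range IPs), not taken from the dataset.
flows = [
    ("192.0.2.10", "198.51.100.1"),
    ("192.0.2.11", "198.51.100.1"),
    ("192.0.2.12", "198.51.100.1"),
    ("192.0.2.13", "198.51.100.1"),
    ("192.0.2.10", "192.0.2.11"),
]
scores = eigenvector_centrality(flows)
for ip, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{ip:15s} {score:.3f}")
```

A host that exchanges flows with many other well-connected hosts ends up with the largest score, which is why a centrality graph can make a heavily contacted address stand out; NetworkX's `eigenvector_centrality` uses the same (A + I) shift and would normally be preferred outside a sketch.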
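The abstract also uses Principal Component Analysis (PCA) and a Gaussian Mixture Model (GMM) to analyze the data. The sketch below is an assumption about how such a step might look, not the study's method: it standardizes the features, projects them onto the first principal component by power iteration, and fits a two-component one-dimensional GMM to the projected scores with expectation-maximization. The helper names (`standardize`, `pca_1d`, `gmm_1d`) are hypothetical, and the data are synthetic, deliberately imbalanced (300 versus 100 points) to echo the low-footprint setting; they are not drawn from CIRA-CIC-DoHBrw-2020.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each column so features on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def pca_1d(rows, iters=500, seed=0):
    """Scores on the first principal component, plus the component itself."""
    z = standardize(rows)  # columns now have mean 0, so no re-centring needed
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start avoids symmetric traps
    for _ in range(iters):  # power iteration -> leading eigenvector of cov
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(r, v)) for r in z], v


def gmm_1d(xs, k=2, iters=200):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    lo, hi = min(xs), max(xs)
    mus = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]  # spread over range
    vars_ = [statistics.pvariance(xs)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow
        resp = []
        for x in xs:
            logp = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, mus, vars_)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            s = sum(p)
            resp.append([q / s for q in p])
        # M-step: re-estimate mixing weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, mus, vars_


# Synthetic, hypothetical data: 300 "majority" and 100 "minority" points in two
# unnamed features. NOT values from CIRA-CIC-DoHBrw-2020.
rng = random.Random(42)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
rows += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(100)]
pc_scores, component = pca_1d(rows)
weights, means, variances = gmm_1d(pc_scores)
print(sorted(round(w, 2) for w in weights))
```

The fitted mixing weights recover the 3:1 class imbalance of the synthetic data, which is the kind of structure a GMM can expose in projected data. In practice a library such as scikit-learn (`PCA`, `GaussianMixture`) would normally be used; the standard-library version only makes each step visible.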
