Resampling is primarily used in imbalanced datasets to modify the data distribution in the training dataset [2]. It is considered an effective way of obtaining a more balanced data distribution in imbalanced datasets [3]. But before starting resampling, it is essential to understand the need for resampling. ML classifiers get skewed towards classes that have more data. And, since the majority class(es) have more data, classifiers often give high accuracy, but this is because they are predicting only the majority class(es) and not really classifying the minority classes. Hence, to accurately identify network attacks, which are usually the minority classes, using an ML classifier, it is necessary to balance the classes, and therefore the need for resampling.

Types

Reference List

  1. Bagui, S. S., Mink, D., Bagui, S. C., & Subramaniam, S. (2023). Determining Resampling Ratios Using BSMOTE and SVM-SMOTE for Identifying Rare Attacks in Imbalanced Cybersecurity Data. Computers, 12(10), 204.
  2. Brownlee, J. Random Oversampling and Undersampling for Imbalanced Classification. 2021. Available online: https://machinelearningm astery.com/random-oversampling-and-undersampling-for-imbalanced-classification/ (accessed on 17 April 2023).
  3. Branco, P.S.; Torgo, L.; Ribeiro, R.A. A Survey of Predictive Modelling under Imbalanced Distributions. arXiv 2015. Available online: http://export.arxiv.org/pdf/1505.01658 (accessed on 17 April 2023).