Oversampling addresses the minority classes in the dataset. It either creates duplicates of the existing minority class samples, as in the case of Random Oversampling (RO) or generates new synthetic points in the feature space, as is the case of the SMOTE family of methods. Oversampling of the minority class is done to scale up to the count of the majority class. That is, 10% oversampling means the existing minority class count is increased to match ten percent of the majority class.

In this work, resampling methods were not applied to the whole dataset but were limited to the training dataset. For ML models, it is imperative to monitor the class distribution in the training set to observe the patterns of the real population. Stratified sampling was used to ensure that, after the split, each class was represented in the training as well as testing data.

Reference List

  1. Bagui, S. S., Mink, D., Bagui, S. C., & Subramaniam, S. (2023). Determining Resampling Ratios Using BSMOTE and SVM-SMOTE for Identifying Rare Attacks in Imbalanced Cybersecurity Data. Computers, 12(10), 204.