Integrate SMOTE in Cross-Validation

  • Question

  • Good afternoon,

    I'm trying to build a binary classification model on a highly imbalanced dataset.

    To choose the best set of hyperparameters and to avoid overfitting, I would like to perform 5-fold cross-validation with oversampling.

    The idea is to use SMOTE during the cross-validation process to oversample only the folds used for training, while the fold used for testing keeps the original class distribution. Because cross-validation is an iterative process, the training and testing folds change on each iteration.

    I've already checked the solution presented here: https://social.msdn.microsoft.com/Forums/en-US/88a5a1b5-4f43-4144-b686-2d37c64747b9/using-smote-with-partition-and-sample-amp-cross-validate-model?forum=MachineLearning . However, it does not work for me, because that solution uses only 2 folds and the author uses the option to pick a fold, which is not desirable in my case because the training and testing folds should change.

    Could someone help me?

    Thanks in advance,

    Kind regards,

    Miguel Simões Rosa

    Thursday, October 17, 2019 5:34 PM
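
For illustration, the procedure described in the question can be sketched in plain Python, outside of any particular tool. This is only a minimal sketch under stated assumptions, not a solution from the thread: the helpers `stratified_folds`, `smote` and `smote_cv` are hypothetical names, the data is synthetic, the labels are assumed to be binary with `1` as the minority class, and `smote` is a simplified version of the technique (linear interpolation between a minority point and one of its nearest minority neighbours). The point it shows is that oversampling happens inside the loop, on the training folds only, so every held-out fold keeps the original class distribution.

```python
import random
from collections import Counter


def stratified_folds(y, n_splits=5, seed=0):
    """Assign sample indices to n_splits folds, preserving class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_splits].append(i)
    return folds


def smote(X, y, minority, k=5, seed=0):
    """Simplified SMOTE for binary labels: add synthetic minority points
    until both classes have the same size. Needs >= 2 minority points."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(minority_pts)) - len(minority_pts)
    X_res, y_res = list(X), list(y)
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority_pts)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_res.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_res.append(minority)
    return X_res, y_res


def smote_cv(X, y, minority=1, n_splits=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) per fold.
    Only the training part is oversampled; the test fold is untouched."""
    for test_idx in stratified_folds(y, n_splits, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = smote([X[i] for i in train_idx],
                           [y[i] for i in train_idx], minority, seed=seed)
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]


# Synthetic, made-up data: 90 majority (0) and 10 minority (1) samples.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)])
y = [0] * 90 + [1] * 10

splits = list(smote_cv(X, y))
for X_tr, y_tr, X_te, y_te in splits:
    # A model would be fitted on (X_tr, y_tr) and scored on (X_te, y_te) here.
    print("train:", dict(Counter(y_tr)), "test:", dict(Counter(y_te)))
```

Each candidate set of hyperparameters would be evaluated by averaging its score over the five held-out folds. In scikit-learn-based workflows, the `Pipeline` class from the imbalanced-learn package applies samplers such as `SMOTE` only while fitting, so passing such a pipeline to a cross-validation routine should give the same per-fold behaviour without hand-written loops.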

All replies