Cross Validation and its Importance

In this post, we will take a look at cross validation and its importance in building machine learning models.

Overview:
There are multiple approaches to splitting a dataset for training a model and estimating its accuracy. One common approach is to split the dataset into three sets: a training set, a validation set and a test set. The training set is used for training the model, the validation set is used for choosing the best model from among many candidate models, and the test set is used at the end to estimate the model's performance on unseen data. Now, what if our split resulted in some important data points ending up only in the validation and test sets? Our model would never be trained on those points, resulting in a poorer model. To alleviate this, we use cross validation. There are multiple flavors of cross validation, viz., K-fold cross validation, stratified cross validation and leave-one-out cross validation (LOOCV).

K-Fold cross validation:
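The following is a minimal sketch of K-fold cross validation using scikit-learn. The dataset (Iris), the model (logistic regression) and the choice of k=5 are assumptions made purely for illustration and are not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Illustrative dataset and model; swap in your own data and estimator.
X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # In each fold a different slice of the data is held out for validation,
    # so every data point is trained on in k-1 folds and validated on exactly once.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[val_idx], model.predict(X[val_idx]))
    scores.append(acc)
    print(f"Fold {fold}: accuracy = {acc:.3f}")

# Averaging the per-fold scores gives a more robust estimate than a single split.
print(f"Mean cross-validated accuracy: {sum(scores) / len(scores):.3f}")
```

Because every data point appears in the training data for most of the folds, no important observation is excluded from training the way it can be with a single fixed split.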