Cross Validation
Here we discuss methods that help you choose the parameters you would otherwise have to guess at, and whose values no one can tell you in advance.
K-Fold Cross Validation:
K-Fold is popular and easy to understand, and it generally gives a less biased estimate of model performance compared to other methods, because it ensures that every observation from the original dataset has a chance of appearing in both the training and the test set. This makes it one of the best approaches when the input data is limited. The method follows these steps:
- Randomly split your entire dataset into k "folds"
- For each fold, build your model on the other k - 1 folds of the dataset, then test the model on the kth fold to check its effectiveness
- Record the error of the model's predictions on that fold
- Repeat this until each of the k-folds has served as the test set
- The average of your k recorded errors is called the cross-validation error and will serve as your performance metric for the model
This takes more time, but every part of the data is used for training and every part is used for validation.
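As a concrete illustration, here is a minimal sketch of the steps above using scikit-learn's KFold. The synthetic dataset, the linear regression model, and mean squared error as the metric are assumptions chosen purely for illustration; any model and error metric can be slotted into the same loop.

```python
# A minimal sketch of the k-fold procedure described above (assumes scikit-learn).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic data for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # randomly split into k folds

fold_errors = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression()
    model.fit(X[train_idx], y[train_idx])        # build the model on k - 1 folds
    preds = model.predict(X[test_idx])           # test on the held-out kth fold
    fold_errors.append(mean_squared_error(y[test_idx], preds))  # record the error

cv_error = np.mean(fold_errors)  # cross-validation error: average of the k recorded errors
print(f"Per-fold MSE: {fold_errors}")
print(f"Cross-validation error (mean MSE): {cv_error:.2f}")
```

If you do not need the per-fold details, scikit-learn's `cross_val_score` wraps this whole loop in a single call; the explicit loop above just makes each step of the procedure visible.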