Introduction

Overfitting may be the most frustrating issue of Machine Learning. In this article, we're going to see what it is, how to spot it, and most importantly how to prevent it from happening.

What is overfitting?

The word overfitting refers to a model that models the training data too well. Instead of learning the general distribution of the data, the model learns the expected output for every data point.

This is the same as memorizing the answers to a maths quiz instead of knowing the formulas. Because of this, the model cannot generalize. Everything is all good as long as you are in familiar territory, but as soon as you step outside, you're lost.

Looks like this little guy doesn't know how to do a multiplication. He only remembers the answers to the questions he has already seen.

The tricky part is that, at first glance, it may seem that your model is performing well because it has a very small error on the training data. However, as soon as you ask it to predict new data points, it will fail.

How to detect overfitting

As stated above, overfitting is characterized by the inability of the model to generalize. To test this ability, a simple method consists in splitting the dataset into two parts: the training set and the test set. When selecting models, you might want to split the dataset in three; I explain why here.

The training set represents about 80% of the available data, and is used to train the model (you don't say?!).

The test set consists of the remaining 20% of the dataset, and is used to test the accuracy of the model on data it has never seen before.

With this split we can check the performance of the model on each set to gain insight on how the training process is going, and spot overfitting when it happens. This table shows the different cases.

Overfitting can be seen as the difference between the training and testing error.

Note: for this technique to work, you need to make sure both parts are representative of your data. A good practice is to shuffle the order of the dataset before splitting.

Overfitting can be pretty discouraging because it raises your hopes just before brutally crushing them. Fortunately, there are a few tricks to prevent it from happening.

How to prevent overfitting - Model & Data

First, we can try to look at the components of our system to find solutions. This means changing which data we are using, or which model.

Gather more data

Your model can only store so much information. This means that the more training data you feed it, the less likely it is to overfit. The reason is that, as you add more data, the model becomes unable to overfit all the samples, and is forced to generalize to make progress.

Collecting more examples should be the first step in every data science task, as more data will result in an increased accuracy of the model, while reducing the chance of overfitting.

The more data you get, the less likely the model is to overfit.

Data augmentation & Noise

Collecting more data is a tedious and expensive process. If you can't do it, you should try to make your data appear as if it was more diverse. To do that, use data augmentation techniques so that each time a sample is processed by the model, it's slightly different from the previous time. This will make it harder for the model to learn parameters for each sample.

Each iteration sees a different variation of the original sample.
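To make this concrete, here is a minimal sketch of what such an augmentation pipeline could look like for images, assuming PyTorch/torchvision is available. The specific transforms and their magnitudes are illustrative choices of mine, not prescriptions from this article:

```python
# A minimal image augmentation pipeline, assuming PyTorch / torchvision.
# The chosen transforms and their magnitudes are illustrative, not
# recommendations from the article.
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),         # mirror half the samples
    transforms.RandomRotation(degrees=10),          # small random rotations
    transforms.ColorJitter(brightness=0.2,          # slight lighting changes
                           contrast=0.2),
    transforms.ToTensor(),
])
# Each time a sample is drawn during training, the pipeline produces a
# slightly different variation of the original image.
```

Because the transforms are random, the network never sees exactly the same pixels twice, even over many epochs, which makes it much harder to memorize individual samples.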
Another good practice is to add noise:

To the input: this serves the same purpose as data augmentation, but will also work toward making the model robust to natural perturbations it could encounter in the wild.

To the output: again, this will make the training more diversified.

Note: in both cases, you need to make sure that the magnitude of the noise is not too great. Otherwise, you could end up respectively drowning the information of the input in the noise, or making the output incorrect. Both will hinder the training process.

Simplify the model

If, even with all the data you now have, your model still manages to overfit your training dataset, it may be that the model is too powerful. You could then try to reduce the complexity of the model.

As stated previously, a model can only overfit that much data. By progressively reducing its complexity (the number of estimators in a random forest, the number of parameters in a neural network, etc.), you can make the model simple enough that it doesn't overfit, but complex enough to learn from your data. To do that, it's convenient to look at the error on both datasets depending on the model complexity.

This also has the advantage of making the model lighter, train faster and run faster.

On the left, the model is too simple. On the right, it overfits.

How to prevent overfitting - Training Process

A second possibility is to change the way the training is done. This includes altering the loss function, or the way the model functions during training.

Early Termination

In most cases, the model starts by learning a correct distribution of the data and, at some point, starts to overfit the data. By identifying the moment where this shift occurs, you can stop the learning process before the overfitting happens. As before, this is done by looking at the training error over time.

When the testing error starts to increase, it's time to stop!

How to prevent overfitting - Regularization

Regularization is a process of constraining the learning of the model to reduce overfitting. It can take many different forms, and we will see a couple of them.

L1 and L2 regularization

One of the most powerful and well-known techniques of regularization is to add a penalty to the loss function. The most common are called L1 and L2:

The L1 penalty aims to minimize the absolute value of the weights.

The L2 penalty aims to minimize the squared magnitude of the weights.

With the penalty, the model is forced to make compromises on its weights, as it can no longer make them arbitrarily large. This makes the model more general, which helps combat overfitting. (A minimal code sketch of these penalties appears just before the conclusion.)

The L1 penalty has the added advantage that it enforces feature selection, which means that it has a tendency to set the less useful parameters to 0. This helps identify the most relevant features in a dataset. The downside is that it is often not as computationally efficient as the L2 penalty.

Here is what the weight matrices would look like. Note how the L1 matrix is sparse with many zeros, and the L2 matrix has slightly smaller weights.

Another possibility is to add noise to the parameters during the training, which helps generalization.

For Deep Learning: Dropout and Dropconnect

This extremely effective technique is specific to Deep Learning, as it relies on the fact that neural networks process the information from one layer to the next. The idea is to randomly deactivate either neurons (dropout) or connections (dropconnect) during the training.

This forces the network to become redundant, as it can no longer rely on specific neurons or connections to extract specific features. Once the training is done, all neurons and connections are restored. It has been shown that this technique is somewhat equivalent to having an ensemble approach, which favors generalization, thus reducing overfitting.
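As a quick illustration, here is a minimal sketch of dropout in a small PyTorch network. The layer sizes and the dropout rate of 0.5 are illustrative choices of mine, not values from the article:

```python
# A minimal dropout sketch, assuming PyTorch is available. The layer sizes
# and the dropout rate (0.5) are illustrative choices.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each forward pass randomly zeroes ~half the neurons
    nn.Linear(256, 10),
)

model.train()  # training mode: dropout is active, neurons are deactivated
model.eval()   # evaluation mode: dropout is off, all neurons are restored
```

Note the train/eval switch: dropout only deactivates neurons during training, and the full network is restored at inference time, exactly as described above.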
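And, circling back to the L1 and L2 penalties from the regularization section, here is a minimal sketch of how such a penalty could be added to a training loss by hand, again assuming PyTorch. The coefficient values are arbitrary illustrations:

```python
# A hand-rolled version of the L1/L2 penalties, assuming PyTorch. The
# lambda coefficients are arbitrary illustrative values.
import torch
import torch.nn as nn

def penalized_loss(base_loss: torch.Tensor, model: nn.Module,
                   l1_lambda: float = 0.0,
                   l2_lambda: float = 1e-4) -> torch.Tensor:
    # L1: sum of absolute values of the weights (pushes parameters to 0)
    l1 = sum(p.abs().sum() for p in model.parameters())
    # L2: sum of squared weights (keeps weights small but rarely exactly 0)
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return base_loss + l1_lambda * l1 + l2_lambda * l2
```

In practice you rarely need to write this yourself: most frameworks ship it built in. In PyTorch, for example, the weight_decay argument of the optimizers applies an L2-style penalty.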
Conclusion

As you know by now, overfitting is one of the main issues the Data Scientist has to face. It can be a real pain to deal with if you don't know how to stop it. With the techniques presented in this article, you should now be able to prevent your models from cheating the learning process, and get the results you deserve!

🎉 You've reached the end! I hope you enjoyed this article. If you did, feel free to like it, share it, explain it to your cat, follow me on medium, or do whatever you feel like doing! 🎉

If you like Data Science and Artificial Intelligence, subscribe to the newsletter to receive updates on articles and much more!