Do it right, buddy! — Fit is everything!
The toy challenge
The robots are back with another learning session, and again there are two scenarios to explore! This time, we look at a game with a set of toys in different colors: the toys are shown to a player, who is asked to identify the ones matching a described property, for example, an orange-colored toy with wheels, and select all the toys that are orange and have wheels. Phew! Pretty simple, isn’t it? We’ll see!
Scenario #1: The tired robot and toys
Let us assume the learner robot L… Well, I was told I need to name them better, so here goes nothing: I’m going to call the learner L “Jerry” and the teacher T “Tom” from this day on (as a tribute to my favorite character and his companion, the mouse. Or shall I call them by their original names, Jasper and Jinx?). So, Jerry seems to be exhausted from all the learning, but Tom is adamant that Jerry must learn the color game before he rests. Well, robots don’t get tired, do they? Computers do get exhausted! Memory, power, processing capability, and sensors can all get tired (or used up, or fail), wearing out a computing system, and the brain of a robot is a computer, isn’t it?
Coming back to the training, the tired and “sleepy” Jerry doesn’t pay much attention when Tom tries to teach the game. As part of the training, Tom picks up different toys, shows them to Jerry, and asks him to identify each one, but no matter the shape, color, or overall look, Jerry responds, “It’s a toy!” That is, he doesn’t try to learn the differences between them. How does this end? It ends with a learner who cannot tell the toys apart by their features. You can’t complain if Tom picks up a green toy dinosaur, asks whether it is “an orange toy with wheels,” and hears “yes,” because for every sample toy Jerry only ever learned “it’s a toy,” and Tom never corrected him. Jerry learned a single category and applied it to every toy he was shown.
As there was no effort to learn the differences, he learned one statement that covers all of the samples, and irrespective of what he is shown, he can only classify it, correctly (or incorrectly?), as a toy or not a toy. That means for toys that do not look like the samples he saw, he is unable to classify them as toys even when they are (or the other way around, in other cases). This is UNDERFITTING. Projecting this to machine learning, underfitting refers to a model’s inability to capture the underlying patterns in the training data: the model is so oversimplified that it performs poorly even on the training data itself, and it fails to generalize to new, unseen data.
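To make the machine learning side a little more concrete, here is a minimal, illustrative sketch in Python (my own toy example, assuming NumPy and scikit-learn are available, nothing from the robots’ world): a straight-line model asked to fit curved data scores poorly even on the data it was trained on, which is the signature of underfitting.

```python
# Illustrative underfitting sketch: a linear model on curved data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)  # the true pattern is a curve

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

too_simple = LinearRegression().fit(X_train, y_train)  # a straight line can't follow the curve
print("train R^2:", round(too_simple.score(X_train, y_train), 3))  # low even on training data
print("test R^2: ", round(too_simple.score(X_test, y_test), 3))    # low on unseen data too
```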
Scenario #2: Jerry’s so good at this! Or, is he?
Now assume Jerry is still in training and has become a great detective. Let us say he sees a lot of red dinosaurs, blue tractors, and green army tanks. He gets very good at these exact examples, and during the test Tom gives him, he sees a yellow toy car and says, “It’s a blue tractor!”, ever so confidently. He is so stuck on what he already knows that he doesn’t handle new things well. This happens in every case after training: he correctly categorizes everything he is already familiar with but confidently fails on items he has never seen. The funny thing is that he correctly re-identifies every sample he learned from. This is OVERFITTING. Projecting this to machine learning, overfitting refers to a model learning the training data too closely, capturing its noise and random fluctuations, and thus performing poorly on new, unseen data because of an overly complex representation of the training set.
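And the flip side, again as a hedged little sketch (my own example, assuming scikit-learn): a wildly flexible polynomial given only a handful of noisy points can memorize them almost perfectly, yet falls apart on points it has never seen.

```python
# Illustrative overfitting sketch: a degree-14 polynomial memorizing 15 noisy points.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(15, 1))
y_train = X_train.ravel() ** 2 + rng.normal(scale=1.0, size=15)  # curve + noise
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)                   # points Jerry never saw
y_test = X_test.ravel() ** 2

memorizer = make_pipeline(PolynomialFeatures(degree=14), LinearRegression())
memorizer.fit(X_train, y_train)
print("train R^2:", round(memorizer.score(X_train, y_train), 3))  # ~1.0: memorized the samples
print("test R^2: ", round(memorizer.score(X_test, y_test), 3))    # much worse, often negative
```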
And the fix is…
How to fix underfitting?
Jerry fails to grasp the nuances and differences among various toys, generalizing them all as simply “toys.” This lack of attention to detail and failure to learn distinctions between different toy categories mirrors the underfitting problem in machine learning, where a model oversimplifies and fails to capture the complexity of the data it is trained on. Now, Jerry needs to fix this by being more attentive and curious. One good way is to show him a lot of different toys, varying in shapes, features, and colors, enabling him to focus and learn the unique features of each toy.
Another helpful activity is a quiz: show Jerry a randomly selected subset of toys that weren’t used in the training, evaluate his responses, and re-train until the error is as small as we can get it. This is essentially how most machine learning models are trained and evaluated. It is important to ensure that the samples used for testing are not used in training; we will look at that problem soon. :) The more attention Jerry pays, the better he will get at recognizing toys, and hence the less he underfits!
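Here is a rough sketch of that quiz idea in code (my own illustration with scikit-learn, not a prescribed recipe): hold some samples out of training, retrain with gradually more capacity, and keep the version that does best on the held-out quiz.

```python
# Illustrative "quiz" loop: evaluate on held-out samples and keep the best model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

# The quiz samples are never shown during training.
X_train, X_quiz, y_train, y_quiz = train_test_split(X, y, test_size=0.25, random_state=2)

best_degree, best_error = None, float("inf")
for degree in range(1, 8):  # gradually let the model notice more detail
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    quiz_error = mean_squared_error(y_quiz, model.predict(X_quiz))
    print(f"degree {degree}: quiz error {quiz_error:.3f}")
    if quiz_error < best_error:
        best_degree, best_error = degree, quiz_error

print("best degree according to the quiz:", best_degree)
```

With a curved underlying pattern like this, the quiz usually favors a small degree around 2, which is exactly the “pay enough attention, but not too much” behavior we want from Jerry.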
How to fix overfitting?
To fix overfitting, we need to teach Jerry to be a bit more flexible. What’s the best way to do this? Let’s say we have Jerry look at a wide variety of shapes, sizes, and memorable features across the shapes and toy types. We need to encourage him to learn the general idea of what defines each shape and toy, not just memorize specific examples. This way he learns what the shape of a tractor is, what a car looks like, and so on, instead of learning that “something shaped like a tractor and colored blue is a blue tractor” and treating that as its own category. This ensures good classifications and avoids overly specific categories. So now, when we show him a yellow car, he correctly classifies it.
So, by providing more variety to learn from and encouraging him to prefer general ideas over highly specific examples, we can reduce overfitting. Now, when shown a new shape, Jerry can make a good guess based on what he has learned about toys in general.
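In machine learning terms, one common way to push a model toward “general ideas” rather than memorized quirks is regularization (my addition here; the toy story doesn’t name a specific technique). A hedged sketch, reusing the memorizer setup from earlier but with scikit-learn’s Ridge regression penalizing overly specific rules:

```python
# Illustrative fix for overfitting: the same flexible polynomial, but regularized.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(15, 1))
y_train = X_train.ravel() ** 2 + rng.normal(scale=1.0, size=15)
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
y_test = X_test.ravel() ** 2

# Ridge shrinks extreme coefficients, discouraging "blue + wheels = blue tractor"-style
# rules that only exist to explain individual training samples.
generalizer = make_pipeline(PolynomialFeatures(degree=14), Ridge(alpha=1.0))
generalizer.fit(X_train, y_train)
print("train R^2:", round(generalizer.score(X_train, y_train), 3))  # no longer a perfect memorizer
print("test R^2: ", round(generalizer.score(X_test, y_test), 3))    # typically far better on new points
```

Gathering more (and more varied) training samples works on the same principle: the more toys Jerry sees, the harder it is for any single quirky example to dominate what he learns.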
How do these connect?
If you don’t know what Bias and Variance are, I tried explaining them with an example in my previous article: Learning Throw-and-Catch — The (Re-)Awakening.
Underfitting and Bias:
Underfitting happens when a model is too “straightforward”, missing out on the subtle patterns in the data. Models that underfit often lean towards high bias, making overly simple assumptions and facing difficulty in grasping the intricate nature of the true relationships within the data.
Overfitting and Variance:
Overfitting comes into play when a model becomes “overly intricate”, adhering too closely to the details of the training data and picking up on noise and fluctuations. Models that overfit tend to exhibit high variance, being overly responsive to the specific quirks and fluctuations present in the training data, which can hinder their ability to generalize effectively to new, unseen data.
The balanced cradle
There’s a delicate balance between bias and variance. When we increase a machine learning model’s complexity, bias tends to decrease but variance goes up, and vice versa. The real challenge lies in finding that sweet spot, the optimal balance that minimizes both bias and variance. Striking this balance is crucial, as it paves the way for solid generalization performance on new, unseen data.
Underfit models (high bias, low variance) and overfit models (low bias, high variance) represent the extremes, and the optimal model complexity lies in between.
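For readers who like a formula, this balance is usually written as the textbook bias-variance decomposition of a model’s expected error on a new point (a standard result, not something derived from our toy game). Here f is the true relationship, f̂ is the trained model, y is a noisy observation of f(x), and σ² is noise no model can remove:

```latex
\underbrace{\mathbb{E}\big[(y - \hat{f}(x))^2\big]}_{\text{expected error}}
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Jerry’s “it’s a toy!” phase keeps the bias term large, his “everything blue with wheels is a blue tractor” phase inflates the variance term, and the sweet spot is the model complexity that minimizes their sum (the noise term is there no matter what we do).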
So, the key is to find the right balance — make sure Jerry pays enough attention to learn well (avoiding underfitting) but not so much that he gets too stuck on specific examples (avoiding overfitting). It’s like finding the sweet spot for learning!