From "It Works" to "Why It Works": A Call for Deeper Understanding in Data Science
A Deep Dive into Convolutional Neural Networks

Sometimes, the most valuable lessons come from unexpected moments. I was attending a data science workshop recently, and a brief discussion served as a powerful reminder of a crucial question we must ask ourselves: are we content with knowing that something works, or do we strive to understand why it works? It's the difference between being a technician and an engineer, and it is essential for building robust, reliable solutions.
This question feels more relevant than ever. It's never been easier to get amazing results in data science. We can build powerful models that were cutting-edge just a few years ago with only a few lines of code. But this ease of use brings a hidden risk. We're often tempted to treat these powerful tools like "black boxes," focusing only on the final accuracy score without really knowing what’s happening inside.
Don't Skip the "Why": The Soul of the CNN
Let's use a classic example, the Convolutional Neural Network (CNN). Too often, tutorials and talks jump straight into the architecture, talking about layers, filters, and code, but they skip the most important question of all: why do we even use them?
The reason we use CNNs for images instead of a standard neural network comes down to a couple of brilliant ideas:

Translation Invariance: A picture of a cat is still a picture of a cat, whether the cat is in the top left or the bottom right. A basic fully connected network would struggle with this, needing to learn a "top-left cat" and a "bottom-right cat" separately, which is incredibly inefficient. CNNs address this by sliding the same filters across the image, so a feature is detected wherever it appears. (Strictly speaking, the convolution itself is translation-equivariant; pooling and later layers add the invariance.)
Parameter Efficiency: By using these sliding filters, a CNN reuses the same weights across the entire image. This drastically cuts down on the number of parameters the model has to learn, which means it trains faster and is less likely to overfit.
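Both ideas can be made concrete with a few lines of plain Python. This is a toy sketch: a one-dimensional "image" and made-up layer sizes stand in for the real thing, and the filter and dimensions are illustrative assumptions, not values from any particular model.

```python
# Idea 1: a sliding filter finds the same pattern wherever it appears.
def correlate1d(signal, kernel):
    """Slide `kernel` across `signal` and record the response at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edge = [-1, 0, 1]                       # a tiny "rising edge" detector

early_step = [0, 0, 1, 1, 0, 0, 0, 0]   # step near the start of the signal
late_step  = [0, 0, 0, 0, 0, 1, 1, 0]   # the same step near the end

print(correlate1d(early_step, edge))    # [1, 1, -1, -1, 0, 0]
print(correlate1d(late_step, edge))     # [0, 0, 0, 1, 1, -1]
# The response pattern is the same; it simply shifts along with the input.

# Idea 2: weight sharing keeps the parameter count tiny.
in_pixels = 224 * 224 * 3               # a flattened 224x224 RGB image
hidden_units = 1024                     # hypothetical dense-layer width
dense_params = in_pixels * hidden_units + hidden_units   # weights + biases

kernel, in_ch, out_ch = 3, 3, 64        # 64 filters of size 3x3 over 3 channels
conv_params = kernel * kernel * in_ch * out_ch + out_ch  # weights + biases

print(f"{dense_params:,} vs {conv_params:,}")  # 154,141,696 vs 1,792
```

One hypothetical dense layer on a modest image needs over 150 million parameters; a convolutional layer gets by with fewer than two thousand, because the same small set of weights is reused at every position.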
Understanding this "why" isn't just for textbooks. It’s the very soul of the architecture. It helps you make better design choices and explain your work with real confidence.
The Anatomy of a Convolution: More Than Just Guesswork
This need for understanding goes all the way down to the basic building blocks. When we set up a convolutional layer, we have to pick its kernel size, padding, and stride. These are not just random numbers to guess. They are key design decisions that have a huge impact on what your model learns.
Let's quickly break them down:
Kernel Size: Think of the kernel as the network's magnifying glass.
A small kernel (like 3x3) is great for spotting fine details like sharp edges and textures. Most modern models use these to build up a complex picture from small pieces.
A large kernel (like 7x7) sees bigger patterns at once, like the general shape of an object. It’s less common now but can be useful for capturing broader strokes.
Padding: This means adding a border of pixels around the image.
Without padding, the image gets smaller with every layer, and information at the edges can get lost.
With padding, you can keep the image size the same. This lets you build deeper networks and makes sure the features at the borders are treated fairly.
Stride: This is the step size the kernel takes as it moves across the image.
A stride of 1 is very thorough, moving one pixel at a time. It captures the most information but costs the most computation.
A stride of 2 or more makes the kernel jump, shrinking the output size quickly. It’s a fast way to down-sample and helps the network see the bigger picture, but you lose some fine-grained detail.
Choosing these values is an act of engineering, not a lucky guess. You are actively deciding how your model sees the world.
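All three knobs interact through one simple piece of arithmetic: the standard output-size formula for a convolution. Here is a small sketch (the helper function and the example sizes are my own, chosen for illustration):

```python
# Standard convolution output-size formula (floor division):
#   out = (in + 2*padding - kernel) // stride + 1

def conv_out_size(in_size, kernel, padding=0, stride=1):
    """Spatial output size of a conv layer along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1

# A 3x3 kernel with no padding shaves a pixel off each border:
print(conv_out_size(32, kernel=3))                       # 30

# "Same" padding (padding = kernel // 2 for odd kernels) keeps the size:
print(conv_out_size(32, kernel=3, padding=1))            # 32
print(conv_out_size(32, kernel=7, padding=3))            # 32

# A stride of 2 roughly halves the spatial size (fast down-sampling):
print(conv_out_size(32, kernel=3, padding=1, stride=2))  # 16
```

Run a few configurations through this formula before you build a network and you will know exactly why your feature maps have the shapes they do.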
The Case for Building from the Ground Up
So, how do we get this deeper knowledge? We can do this by fighting the urge to always use the fanciest, most automated tools first. This is why I'm a huge believer in trying to build models from a more fundamental level.
A framework like PyTorch is perfect for this. While it handles the heavy lifting of calculus for you through automatic differentiation, it doesn't hide everything. You still define your network layer by layer and write the training loop yourself: the forward pass, the loss calculation, the backward pass, and the weight update.
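Here is a minimal sketch of what that looks like. The network is a tiny made-up example, and random tensors stand in for a real dataset; the point is the shape of the loop, not the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny CNN with illustrative (not recommended) layer sizes.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1 input channel -> 8 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # 10-class classifier head
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random tensors stand in for a real dataset here.
x = torch.randn(32, 1, 28, 28)          # a batch of 32 grayscale "images"
y = torch.randint(0, 10, (32,))         # fake labels

for step in range(20):
    optimizer.zero_grad()               # clear gradients from the last step
    logits = model(x)                   # forward pass
    loss = loss_fn(logits, y)           # compute the loss
    loss.backward()                     # backward pass (autograd)
    optimizer.step()                    # update the weights
```

Every line of the loop corresponds to one of the steps named above, and there is nowhere for the mechanics to hide: if a tensor shape is wrong, the forward pass tells you exactly where.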
Going through this process connects you directly to the mechanics. You see how the data changes shape as it flows through the network, and you finally understand why each step is necessary. Your model stops being a magic box and becomes a logical system you created.
Conclusion
At the end of the day, our job is to solve problems with tools that are reliable and that we can explain. That kind of work isn’t built on trial and error. It’s built on rigor, intention, and a real curiosity to learn.
So, the next time you start a project, I encourage you to ask "why." Why this model? Why this setting? The best models, and the best data scientists, are made when we step away from the easy abstractions and get our hands dirty with the fundamentals.



