Eyes, but with Code? Intro to CNN’s

Aryan Saha
5 min readMay 17, 2021

Convolutional Neural Networks

Humans are great at visually identifying things. We can identify certain features with relative ease, and designate what something is. It’s not hard for us to tell a dog apart from a building, a horse from an airplane, and usually a cat from a dog. What if a computer could do that?

With the power of a new self-learning computational tool, Convolutional Neural Networks can recognize and classify different images. For example, let’s look at the CIFAR-10 Dataset.

To our own eyes, it’s obvious of what each image is. But for Computers, all they really see are just arrays of RGB pixels. Using a CNN, I was able to train a network to classify the images to a 75% success rate overall.

Of course, with a few errors:

The fact that these error still occur, is still insane on how it can happen. Let’s dive right in on how CNN’s work.

In typical neural network fashion, there’s usually a couple nodes that lead to each other, and then into an output. In convolutional neural networks, the process involves more of compressing the network into a longer, yet shorter and thinner piece.

cs231 Image

If we break down this diagram into the convolutional layers and pooling, we get this:

A big forward before we start going over these parts of a CNN, is to understand how computer’s see these images. Images are converted into arrays of numbers. Each pixel has a value based on the intensity of brightness and such. 0 is the darkest, with 255 as the highest value.

Mathanraj Sharma


Throughout each of these convolutional layers, different filters are applied to extract different pieces of information. These filters range from detecting intensity changes, edges from different sides, horizontal and vertical edges, and many more filters. These filters, along with more and more convolutional layers, will learn to detect higher and higher complex features.

Let’s take this image of bread for example:

Now let’s add a custom edge detection filter, a Sobel operator.

And we get this image as an output.

What the filter has done, is apply the matrix onto the image. As you can see for sober_y, there’s a clear line of 0’s that will diminish the value of those pixels. The above image ran sobel_y, and applied the matrix over the image.

We can also try to get output of different sides of edges, such as left, right, top, and bottom edges. Using the same bread image, and different filters, we get these 4 outputs:

We also have the associated filter code:

Filter 1: 
[[-1 -1 1 1]
[-1 -1 1 1]
[-1 -1 1 1]
[-1 -1 1 1]]
Filter 2:
[[ 1 1 -1 -1]
[ 1 1 -1 -1]
[ 1 1 -1 -1]
[ 1 1 -1 -1]]
Filter 3:
[[-1 -1 -1 -1]
[-1 -1 -1 -1]
[ 1 1 1 1]
[ 1 1 1 1]]
Filter 4:
[[ 1 1 1 1]
[ 1 1 1 1]
[-1 -1 -1 -1]
[-1 -1 -1 -1]]

These filters fit in for the Convolutional Layers, and allow for features to learn. Let’s check out how these matrices are actually applied to the images.

Let’s remember how images are read as an array of values. The filter that we have, is here in blue (a.k.a. the convolutional layer). This is where the magic works. The filter will slide around the input, and calculate the product of each cell. In other words, the filter convolutes the input.

After these Convolutional Layers, we’ll encounter pooling.

Pooling allows for the depth to be created, and for the feature finding filters to output a stronger output. Let’s recall the edge detection that we carried out before. This is the top edge detection:

Now, after pooling the layer, we get this:

After many of these Convolutional Layers, pooling and other activation functions will be used to created a new matrix, one that will be turned into a tensor and computed using a classic neural network, such as the one used in the MNIST data set project.

Thanks for reading! I’m Aryan Saha, a 15-year-old working to use technology to help solve climate change. If you would like to contact me, reach out at aryannsaha@gmail.com. If you would like to connect, visit my linkedin.