AlexNet, an innovative convolutional neural network (CNN), inserts max pooling between some of its convolutional layers. Max pooling is what experts call a “non-linear downsampling strategy”: it shrinks the feature maps that pass through the network, which streamlines the work the network does on each image and can help limit overfitting.
AlexNet is widely regarded as a landmark CNN. It won the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge), which is seen as a watershed event for machine learning and neural network progress (some call the competition the “Olympics” of computer vision).
The network, whose training is split across two GPUs, has five convolutional layers and three fully connected layers. Three max pooling layers sit among the convolutional layers, following the first, second and fifth of them.
Essentially, max pooling slides a small window over the outputs of a group of neighboring neurons and passes only the largest value in each window, the maximum of the “pool,” on to the next layer. Another way to understand this is that max pooling consolidates and simplifies values: it keeps the strongest response in each region and discards the rest, which can help the model fit more appropriately.
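To make the windowing idea concrete, here is a minimal sketch in plain Python. The function name `max_pool_2d`, its parameters and the sample values are illustrative assumptions, not code from AlexNet; it simply keeps the largest value in each window of a small 2D grid.

```python
def max_pool_2d(feature_map, window=2, stride=2):
    """Slide a window x window square over a 2D list and keep each window's maximum.

    Hypothetical helper for illustration; real frameworks do this on tensors.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    # Move the window's top-left corner in steps of `stride`.
    for top in range(0, height - window + 1, stride):
        row = []
        for left in range(0, width - window + 1, stride):
            # Gather every value covered by the current window.
            patch = [
                feature_map[top + dy][left + dx]
                for dy in range(window)
                for dx in range(window)
            ]
            row.append(max(patch))
        pooled.append(row)
    return pooled


# Made-up activations from a 4 x 4 patch of neurons.
activations = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2d(activations)
print(pooled)  # [[6, 2], [2, 9]]
```

Sixteen values go in and four come out, each one the strongest response in its 2 x 2 region.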
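Max pooling also makes training cheaper: with fewer values in each feature map, later layers have fewer activations to compute and fewer gradients to propagate. One could say that it “reduces the computation burden” or “shrinks overfitting.” Through downsampling, max pooling engages what’s called “dimensionality reduction.”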
Dimensionality reduction deals with the issue of having an overcomplicated representation that is expensive to run through a neural network. Imagine a complex shape, with many small jagged contours, and every little bit of its outline represented by a data point. With dimensionality reduction, the engineers are helping the machine learning program to “zoom out” or sample fewer data points, to make the representation as a whole simpler. That’s why if you look at a max pooling layer and its output, you can sometimes see a coarser, more pixelated version of the input, which is the dimensionality reduction at work.
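The amount of “zooming out” can be worked out with simple arithmetic. The sketch below assumes the pooling setup described in the original AlexNet paper (3 x 3 windows moved 2 pixels at a time, applied to feature maps 55, 27 and 13 pixels wide); the helper name `pooled_size` is hypothetical.

```python
def pooled_size(input_size, window, stride):
    """Width (or height) of a feature map after pooling with no padding."""
    return (input_size - window) // stride + 1


# Assumed feature-map widths entering AlexNet's three max pooling layers.
for size in (55, 27, 13):
    out = pooled_size(size, window=3, stride=2)
    print(f"{size} x {size} -> {out} x {out}")
# 55 x 55 -> 27 x 27
# 27 x 27 -> 13 x 13
# 13 x 13 -> 6 x 6
```

Each pooling layer roughly halves the width and height, so each pooled map holds about a quarter as many values as the map that went in.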
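AlexNet also uses an activation function called the rectified linear unit (ReLU), which replaces negative values with zero. Max pooling complements this technique in processing images through the CNN: ReLU is applied to a convolutional layer’s outputs first, and max pooling then keeps the strongest of the remaining responses in each region.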
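As a rough illustration of how the two steps fit together, the sketch below applies ReLU to a made-up 4 x 4 grid of convolution outputs and then max-pools the result with 2 x 2 windows. The helper names and values are assumptions for illustration only, and the 2 x 2 window is chosen for brevity rather than to match AlexNet’s settings.

```python
def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 block."""
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Made-up outputs of a convolutional layer, before activation.
conv_output = [
    [-2, 1, -1, -3],
    [0, 3, -4, -1],
    [5, -6, 2, 2],
    [-1, -2, 0, 8],
]
activated = relu(conv_output)     # negatives become 0
pooled = max_pool_2x2(activated)  # strongest response per block
print(pooled)  # [[3, 0], [5, 8]]
```

Because ReLU never changes which value in a window is largest, pooling first and applying ReLU afterward gives the same result here.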
Experts and those involved in the project have delivered abundant visual models, equations and other details to show the specific build of AlexNet, but in a general sense, you can think about max pooling as coalescing or consolidating the output of multiple artificial neurons. This strategy is part of the overall build of the CNN, which has become synonymous with cutting-edge machine vision and image classification.