Question

How does max pooling help make AlexNet a great technology for image processing?

Answer
By Justin Stoltzfus | Last updated: April 26, 2018

In AlexNet, an innovative convolutional neural network (CNN), max pooling layers are placed among multiple convolutional layers. They are there partly to help with fitting and partly to streamline the work the network does on images, using what experts call a “non-linear downsampling strategy.”

AlexNet is widely regarded as a landmark CNN. It won the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge), a result seen as a watershed event for machine learning and neural network progress (some call the competition the “Olympics” of computer vision).

AlexNet’s training is split across two GPUs. The network has five convolutional layers, three fully connected layers and three max pooling layers, which follow the first, second and fifth convolutional layers.

Essentially, max pooling slides a small window over the outputs of a group of neighboring neurons (the “pool”) and passes only the largest value in each window on to the next layer. Put another way, max pooling consolidates and simplifies values, which helps the model fit the data more appropriately.
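To make the idea concrete, here is a minimal NumPy sketch of max pooling on a single 2D feature map. The function name `max_pool2d`, the 2x2 window, the stride of 2 and the sample values are all illustrative assumptions, not AlexNet’s actual code or settings.

```python
import numpy as np

def max_pool2d(x, window=2, stride=2):
    """Slide a window over a 2D array and keep the largest value at each position."""
    out_h = (x.shape[0] - window) // stride + 1
    out_w = (x.shape[1] - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + window, c:c + window].max()
    return out

# Hypothetical 4x4 feature map (made-up values for illustration).
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
])

pooled = max_pool2d(feature_map)
print(pooled)
# [[6 2]
#  [2 9]]
```

Each 2x2 block of the input is replaced by its single largest value, so sixteen numbers become four.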

Max pooling also leaves fewer values to carry through the network, so there is less to compute when gradients are calculated during training. One could say that it “reduces the computation burden” or “shrinks overfitting”: through downsampling, max pooling performs what’s called “dimensionality reduction.”
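One way to see the link to gradients: during backpropagation, a max pooling layer hands each incoming gradient back only to the input that held its window’s maximum, and every other input in that window gets zero. The sketch below assumes a 2x2 window with stride 2. The function name `max_pool_backward` and the values are hypothetical.

```python
import numpy as np

def max_pool_backward(x, grad_out, window=2, stride=2):
    """Route each upstream gradient to the position of the max in its window."""
    grad_in = np.zeros_like(x, dtype=float)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            r, c = i * stride, j * stride
            patch = x[r:r + window, c:c + window]
            # Index of the (first) largest value inside this window.
            dr, dc = np.unravel_index(np.argmax(patch), patch.shape)
            grad_in[r + dr, c + dc] += grad_out[i, j]
    return grad_in

# Hypothetical 4x4 layer input and a gradient of 1 for each pooled output.
x = np.array([
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 6.0, 1.0, 3.0],
    [0.0, 2.0, 9.0, 5.0],
    [1.0, 1.0, 3.0, 7.0],
])
grad_out = np.ones((2, 2))

grad_in = max_pool_backward(x, grad_out)
print(grad_in)
```

Only four of the sixteen input positions receive a non-zero gradient, one per pooling window.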

Dimensionality reduction deals with the problem of data that is too detailed to process efficiently in a neural network. Imagine a complex shape with many small jagged contours, with every little bit of its outline represented by a data point. With dimensionality reduction, engineers help the machine learning program “zoom out,” or sample fewer data points, so that the representation as a whole is simpler. That’s why the output of a max pooling layer can sometimes look like a coarser, more pixelated version of its input.
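The size of this reduction can be worked out with simple arithmetic. The sketch below assumes the pooling setup commonly quoted for AlexNet: a 3x3 window with stride 2, no padding, and feature maps of 55x55x96, 27x27x256 and 13x13x256 entering the three pooling layers. Those figures are assumptions brought in for illustration and do not come from the text above.

```python
def pooled_size(n, window=3, stride=2):
    """Output width/height of a pooling layer with no padding."""
    return (n - window) // stride + 1

# (name, spatial size, channels) entering each pooling layer. These are
# commonly quoted AlexNet figures, assumed here for illustration.
layers = [("pool1", 55, 96), ("pool2", 27, 256), ("pool3", 13, 256)]

results = []
for name, size, channels in layers:
    out = pooled_size(size)
    before = size * size * channels
    after = out * out * channels
    results.append((name, out, before, after))
    print(f"{name}: {size}x{size}x{channels} -> {out}x{out}x{channels} "
          f"({before} -> {after} values)")
```

Under these assumptions, each pooling layer cuts the number of values passed forward to a bit under a quarter of what came in.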

AlexNet also uses an activation function called the rectified linear unit (ReLU), which sets negative values to zero and leaves positive values unchanged. Max pooling can be complementary to this technique in processing images through the CNN.
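One small illustration of how the two operations fit together: ReLU computes max(0, x), and max pooling takes the maximum over a window, so applying them in either order gives the same result. The sketch below checks this on made-up values. The helper names `relu` and `max_pool_2x2` are hypothetical, and the 2x2 non-overlapping window is an assumption chosen for simplicity, not AlexNet’s configuration.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling for arrays with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical pre-activation values from a convolutional layer.
pre_activation = np.array([
    [-1.0,  2.0, -3.0, -0.5],
    [ 0.5, -2.0, -1.0, -4.0],
    [ 3.0,  1.0,  0.0,  2.5],
    [-2.0,  4.0, -1.5,  1.0],
])

relu_then_pool = max_pool_2x2(relu(pre_activation))
pool_then_relu = relu(max_pool_2x2(pre_activation))

print(relu_then_pool)
print(np.array_equal(relu_then_pool, pool_then_relu))  # True
```

Both steps keep the strongest positive responses and discard the rest, which is one sense in which they can be called complementary.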

Experts and those involved in the project have delivered abundant visual models, equations and other details to show the specific build of AlexNet, but in a general sense, you can think about max pooling as coalescing or consolidating the output of multiple artificial neurons. This strategy is part of the overall build of the CNN, which has become synonymous with cutting-edge machine vision and image classification.

Written by Justin Stoltzfus | Contributor, Reviewer

Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.
