Question

Why are artificial recurrent neural networks often hard to train?

Answer
By Justin Stoltzfus | Last updated: June 27, 2018

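The difficulty of training artificial recurrent neural networks comes largely from their structure: signals, and the error gradients used for training, pass through the same connections many times.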

One of the simplest ways to explain why recurrent neural networks are hard to train is that they are not feedforward neural networks.

In feedforward neural networks, signals only move one way. The signal moves from an input layer to various hidden layers, and forward, to the output layer of a system.

By contrast, recurrent neural networks have more complex signal movements. Classed as “feedback” networks, they can have signals traveling both forward and back, and may contain “loops” where values computed at one step are fed back into the network at the next step. Experts point to these loops as the source of a recurrent neural network's memory.
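The feedback loop described above can be illustrated with a very small sketch. This is not any particular library's implementation: the names `rnn_step` and `run_rnn`, the single-number hidden state, and the weight values are all assumptions chosen to keep the example readable.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, b):
    """One recurrent step: the new state depends on the input AND the previous state."""
    return math.tanh(w_in * x + w_rec * h_prev + b)

def run_rnn(inputs, w_in=0.5, w_rec=0.9, b=0.0):
    """Feed a sequence through the loop, one time step at a time."""
    h = 0.0  # hidden state starts empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec, b)  # h is fed back in on the next pass
        states.append(h)
    return states

# A single non-zero input followed by zeros: later states are still non-zero
# because the earlier value keeps circulating through the loop.
states = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0])
```

In a feedforward network the zero inputs would produce the same output regardless of what came before. Here, the later states still carry a trace of the first input, which is the "memory" the loop provides.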

In addition, there's another type of complexity affecting recurrent neural networks. One excellent example of this is in the field of natural language processing.

In sophisticated natural language processing, the neural network needs to be able to remember things. It needs to take inputs in context, too. Suppose a program is meant to analyze or predict a word within a sentence of other words. There may be, for example, a fixed length of five words for the system to evaluate. That means the neural network has to have inputs for each of these words, along with the ability to “remember” the context of the words that came before. For those and other similar reasons, recurrent neural networks typically have these little hidden loops and feedbacks in the system.
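As a rough illustration of the five-word example, the sketch below runs two five-word sequences through the same tiny recurrent loop. The `VOCAB` mapping from words to single numbers is purely hypothetical (real systems use learned vectors), and `encode_sentence` and its weights are assumptions made for this example.

```python
import math

# Hypothetical toy encoding: one number per word, for illustration only.
VOCAB = {"the": 0.1, "dog": 0.7, "bit": -0.4, "man": 0.9, "today": 0.3}

def encode_sentence(words, w_in=1.0, w_rec=0.8):
    """Read words in order; the hidden state carries context from earlier words."""
    h = 0.0
    for word in words:
        h = math.tanh(w_in * VOCAB[word] + w_rec * h)
    return h

# Same five words, different order.
a = encode_sentence(["the", "dog", "bit", "man", "today"])
b = encode_sentence(["the", "man", "bit", "dog", "today"])
```

Because each step mixes the current word with the state left by the previous words, `a` and `b` come out different even though both sentences contain the same five words. That sensitivity to order is the kind of context the answer refers to.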

Experts lament that these complications make it difficult to train the networks. One of the most common ways to explain this is by citing the exploding and vanishing gradient problem. Essentially, because the training signal is multiplied by the same recurrent weights at every time step, it tends to either explode or vanish when the number of steps is large.

Neural network pioneer Geoff Hinton explains this phenomenon on the web by noting that the backward pass is linear: if the weights are small, the gradients shrink exponentially, and if the weights are large, the gradients grow exponentially.
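The exponential behavior is easy to see in a deliberately simplified case. The sketch below assumes a linear recurrence with a single recurrent weight, h_t = w_rec * h_(t-1), so the gradient of the last state with respect to the first is just w_rec multiplied by itself once per time step. The function name `backprop_factor` and the weight values are assumptions for illustration.

```python
def backprop_factor(w_rec, steps):
    """Gradient of the final state w.r.t. the initial state for h_t = w_rec * h_(t-1).

    Each backward step multiplies the gradient by the same weight.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w_rec
    return grad

small = backprop_factor(0.5, 50)  # weight below 1: gradient vanishes
large = backprop_factor(1.5, 50)  # weight above 1: gradient explodes
```

After 50 steps, `small` is far too close to zero to drive any learning, while `large` is in the hundreds of millions. Real networks use weight matrices and nonlinear activations, but the same repeated multiplication is at work.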

This problem, he continues, gets worse with long sequences that have many time steps, over which the signals grow or decay. Careful weight initialization may help, but those challenges are built into the recurrent neural network model. There's always going to be that issue attached to their particular design and build. Essentially, some of the more complex types of neural networks really defy our ability to easily manage them. We can create a practically infinite amount of complexity, but we often see predictability and scalability challenges grow.
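To make the effect of sequence length and starting weights more concrete, here is a minimal sketch for a scalar tanh recurrence, h_t = tanh(w_in * x_t + w_rec * h_(t-1)). By the chain rule, each step contributes a factor of w_rec * (1 - h_t^2) to the gradient. The function name `gradient_through_time`, the constant inputs, and the weight values are assumptions chosen for illustration, not a recommended setup.

```python
import math

def gradient_through_time(w_rec, inputs, w_in=1.0):
    """Return |d h_T / d h_0| for h_t = tanh(w_in * x_t + w_rec * h_(t-1))."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        grad *= w_rec * (1.0 - h * h)  # chain rule: d h_t / d h_(t-1)
    return abs(grad)

# Same starting weight, different sequence lengths.
short_seq = gradient_through_time(0.9, [0.1] * 10)
long_seq = gradient_through_time(0.9, [0.1] * 100)

# Same sequence length, different starting weights.
low_init = gradient_through_time(0.5, [0.1] * 20)
high_init = gradient_through_time(0.9, [0.1] * 20)
```

In this toy setting the gradient over 100 steps is much smaller than over 10 steps, and a larger starting weight slows the decay without removing it. That is consistent with the point above: initialization can help, but the decay or growth comes from the recurrent design itself.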

Written by Justin Stoltzfus | Contributor, Reviewer

Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.
