Reinforcement Learning Vs. Deep Reinforcement Learning: What’s the Difference?

Reinforcement learning and deep reinforcement learning have many similarities, but the differences are important to understand.

Machine learning algorithms can make life and work easier, freeing us from redundant tasks while working faster—and smarter—than entire teams of people. However, there are different types of machine learning. For example, there’s reinforcement learning and deep reinforcement learning.

“Even though reinforcement learning and deep reinforcement learning are both machine learning techniques which learn autonomously, there are some differences,” according to Dr. Kiho Lim, an assistant professor of computer science at William Paterson University in Wayne, New Jersey.

“Reinforcement learning is dynamically learning with a trial and error method to maximize the outcome, while deep reinforcement learning is learning from existing knowledge and applying it to a new data set.”

But what, exactly, does that mean? We went to the experts – and asked them to provide plenty of examples!

What is Reinforcement Learning?

As Lim says, reinforcement learning means learning by trial and error, and by practice. According to Hunaid Hameed, a data scientist trainee at Data Science Dojo in Redmond, WA:

“In this discipline, a model learns in deployment by incrementally being rewarded for a correct prediction and penalized for incorrect predictions.”

Hameed gives the example: “Reinforcement learning is commonly seen in AI playing games and improving in playing the game over time.” (Read also: Reinforcement Learning Can Give a Nice Dynamic Spin to Marketing.)

The three essential components in reinforcement learning are an agent, action, and reward. “Reinforcement learning adheres to a specific methodology and determines the best means to obtain the best result,” according to Dr. Ankur Taly, head of data science at Fiddler Labs in Mountain View, CA.


“It’s very similar to the structure of how we play a video game, in which the character (agent) engages in a series of trials (actions) to obtain the highest score (reward).”

However, it’s an autonomous self-teaching system. Using the video game example, Taly says that positive rewards may come from increasing the score or points, and negative rewards may result from running into obstacles or making unfavorable moves.
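To make the agent, action, and reward idea concrete, here is a minimal sketch in Python. The game, the three actions, and their success chances are hypothetical values chosen for illustration, not anything the experts quoted here described. The agent simply tries actions, collects +1 for scoring or -1 for hitting an obstacle, and keeps a running average of how each action has paid off.

```python
import random

# Hypothetical game: each action has a hidden chance of scoring (+1)
# versus running into an obstacle (-1). These numbers are assumptions.
SUCCESS_CHANCE = {"jump": 0.8, "duck": 0.4, "run": 0.1}

def play(action, rng):
    """Environment: return +1 (points scored) or -1 (hit an obstacle)."""
    return 1 if rng.random() < SUCCESS_CHANCE[action] else -1

def train(trials=2000, explore=0.1, seed=0):
    """Agent: learn by trial and error which action earns the most reward."""
    rng = random.Random(seed)
    actions = list(SUCCESS_CHANCE)
    value = {a: 0.0 for a in actions}  # running estimate of each action's reward
    count = {a: 0 for a in actions}
    for _ in range(trials):
        # Mostly repeat the best-known action, but occasionally explore another.
        if rng.random() < explore:
            action = rng.choice(actions)
        else:
            action = max(actions, key=value.get)
        reward = play(action, rng)
        count[action] += 1
        # Incremental average of the rewards seen so far for this action.
        value[action] += (reward - value[action]) / count[action]
    return value

estimates = train()
best_action = max(estimates, key=estimates.get)
print(best_action, estimates)
```

Nobody tells the agent that jumping is the best move. It works that out from the rewards and penalties alone, which is the self-teaching behavior Taly describes.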

Chris Nicholson, CEO of San Francisco, CA-based Skymind, builds on the example of how algorithms learn by trial and error. “Imagine playing Super Mario Brothers for the first time, and trying to find out how to win: you explore the space, you duck, jump, hit a coin, land on a turtle, and then you see what happens.”

By learning the good actions and the bad actions, the game teaches you how to behave. “Reinforcement learning does that in any situation: video games, board games, simulations of real-world use cases.” In fact, Nicholson says his organization uses reinforcement learning and simulations to help companies figure out the best decision path through a complex situation.

In reinforcement learning, an agent makes several smaller decisions to achieve a larger goal. Yet another example is teaching a robot to walk. “Instead of hard-coding directions to lift one foot, bend the knee, put it down, and so on, a reinforcement learning approach might have the robot experiment with different sequences of movements and find out which combinations are the most successful at making it move forward,” says Stephen Bailey, data scientist and analytics tool expert at Immuta in College Park, MD.

Aside from video games and robotics, there are other examples that can help explain how reinforcement learning works. Brandon Haynie, chief data scientist at Babel Street in Washington, DC, compares it to a human learning to ride a bicycle. “If you’re stationary and lift your feet without pedaling, a fall – or penalty – is imminent.”

However, if you start to pedal, then you will remain on the bike – reward – and progress to the next state. Haynie says:

“Reinforcement learning has applications spanning several sectors, including financial decisions, chemistry, manufacturing, and of course, robotics.”

What is Deep Reinforcement Learning?

However, the decisions can become too complex for the reinforcement learning approach on its own. Haynie says it can be overwhelming for the algorithm to learn from all states and determine the reward path. “This is where deep reinforcement learning can assist: the ‘deep’ portion refers to the application of a neural network to estimate the states instead of having to map every solution, creating a more manageable solution space in the decision process.”

It’s not a new concept. Haynie says it has existed since the 1970s. “But with the advent of cheap and powerful computing, the additional advantages of neural networks can now assist with tackling areas to reduce the complexity of a solution,” he explains. (Read What is the difference between artificial intelligence and neural networks?)

So, how does this work? According to Peter MacKenzie, AI team lead, Americas at Teradata, it’s too much information to store in tables, and tabular methods would require the agent to visit every state and action combination.
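A small sketch may help show what a tabular method looks like. It uses Q-learning, one common tabular algorithm, on a hypothetical six-cell corridor where the agent earns a reward only at the far end. The corridor, the reward, and the learning settings are all assumptions made for illustration.

```python
import random

N_STATES = 6        # hypothetical corridor cells 0..5; cell 5 is the goal
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Move along the corridor; reaching the last cell pays +1 and ends the episode."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The table needs one entry for every (state, action) combination.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties at random so an untrained agent still moves around.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
    return q

def greedy_policy(q):
    """Best-known action in every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]

q_table = q_learning()
policy = greedy_policy(q_table)
print(len(q_table), policy)
```

Six states and two actions need only 12 table entries, so the table is easy to fill. The same approach breaks down when the number of states runs into the millions, because every entry has to be stored and visited.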

However, deep reinforcement learning replaces tabular methods of estimating state values with function approximation. MacKenzie goes on to say:

“Function approximation not only eliminates the need to store all state and value pairs in a table, it enables the agent to generalize the value of states it has never seen before, or has partial information about, by using the values of similar states. Much of the exciting advancements in deep reinforcement learning have come about because of the strong ability of neural networks to generalize across enormous state spaces.”
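The generalization MacKenzie describes can be sketched with a far simpler stand-in for a neural network: a two-weight linear model. Everything in this example is an assumption for illustration, including the 101-state corridor, the state values, and the training settings. The model sees the values of only 11 states, yet it can still estimate the value of a state it never visited.

```python
import random

def features(state):
    """Describe a state by a bias term and its normalized position (0 to 1)."""
    return (1.0, state / 100.0)

def predict(weights, state):
    """Estimated value of a state: a weighted sum of its features."""
    return sum(w * x for w, x in zip(weights, features(state)))

def fit(samples, epochs=2000, lr=0.1, seed=0):
    """Fit the two weights by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for state, target in samples:
            error = target - predict(weights, state)
            for i, x in enumerate(features(state)):
                weights[i] += lr * error * x
    return weights

# Hypothetical values observed at only 11 of the 101 states (0, 10, ..., 100).
visited = [(s, s / 100.0) for s in range(0, 101, 10)]
table = dict(visited)   # what a tabular method would have stored
weights = fit(visited)  # what a function approximator stores instead

unseen = 37
print(unseen in table)           # False: the table has no entry for state 37
print(predict(weights, unseen))  # close to 0.37, inferred from similar states
```

A deep reinforcement learning system replaces this two-weight model with a neural network, which can pick up far less regular patterns, but the principle is the same: store a compact function instead of one table entry per state.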

And MacKenzie notes that deep reinforcement learning has been used in programs that have beaten some of the best human competitors in games such as chess and Go, and is also responsible for many of the advancements in robotics. (Read 7 Women Leaders in AI, Machine Learning and Robotics.)

Bailey agrees and adds, “Earlier this year, an AI agent named AlphaStar beat the world's best StarCraft II player – and this is particularly interesting because unlike games like Chess and Go, players in StarCraft don't know what their opponent is doing.” Instead, he says, players have to form an initial strategy, then adapt as they find out what their opponent is planning.

But how is that even possible? If a model has a neural network of more than five layers, Hameed says it has the ability to handle high-dimensional data. “Due to this, the model can learn to identify patterns on its own without having a human engineer curate and select the variables which should be input into the model to learn,” he explains.

In open-ended scenarios, you can really see the beauty of deep reinforcement learning. Taly uses the example of booking a table at a restaurant or placing an order for an item—situations in which the agent has to respond to any input from the other end.

“Deep reinforcement learning may be used to train a conversational agent directly from the text or audio signal from the other end,” he says. “When using an audio signal, the agent may also learn to pick up on subtle cues in the audio such as pauses, intonation, et cetera—this is the power of deep reinforcement learning.”

And new applications of deep reinforcement learning continue to emerge. In determining the next best action to engage with a customer, MacKenzie says “the state and actions could include all the combinations of products, offers and messaging across all the different channels, with each message being personalized—wording, images, colors, fonts.”
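A quick back-of-the-envelope calculation shows how fast those combinations multiply. The counts below are hypothetical, picked only to illustrate the arithmetic, and are not figures from MacKenzie.

```python
from math import prod

# Hypothetical number of options for each part of one customer message.
options = {
    "products": 50,
    "offers": 10,
    "channels": 5,
    "wordings": 20,
    "images": 30,
    "colors": 8,
    "fonts": 6,
}

# Every combination of choices is a distinct action the agent could take.
n_actions = prod(options.values())
print(f"{n_actions:,} possible actions")  # 72,000,000 possible actions
```

Even with these modest counts, the agent faces 72 million possible actions for a single message, far too many to list and test one by one in a table.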

Another example is supply chain optimization, such as delivering perishable products across the U.S. “The possible states include the current location of all the different types of transportation, the inventory in all the plants, warehouses and retail outlets, and the demand forecast for all the stores,” MacKenzie says.

“Using deep learning to represent the state and action space enables the agent to make better logistic decisions that result in more timely shipments at a lower cost.”


Terri Williams

Terri is a freelance journalist who also writes for The Economist, Time, Women 2.0, and the American Bar Association Journal. In addition, she has bylines at USA Today, Yahoo, U.S. News & World Report, Verizon, The Houston Chronicle, and several other companies you've probably heard of. Terri has a B.A. in English from the University of Alabama at Birmingham.