Modern data-driven applications depend on insights derived from the enormous volumes of data they handle every day. To extract those insights, applications need to send complex queries, and the database must be able to answer them efficiently. Traditional relational database management systems (RDBMSs) that rely on SQL can struggle with queries that span many levels of relationships, because each additional level typically requires another join. Graph databases address this problem by storing objects and the relationships between them directly, which makes it possible to extract deep insights from connected data. The use of graph databases is still limited, but there are clear signs that they will play an important role as businesses rely more and more on insights. (For more on databases in general, see Introduction to Databases.)
What Is a Graph Database?
To understand graph databases, let us use the example below:
Bill and his family want to plan a vacation to a place that offers great Asian cuisine. Bill has started planning early, and one obvious way to find information is Google. While the information from Google is credible and good, Bill wants information that is as specific as possible, so he starts asking his friends, acquaintances and colleagues. Let us assume that Bill asks Ryan, Sheena and John, who are his primary contacts (contact level 1). All three promise to respond as soon as possible. Ryan asks his friend Greg, who asks his cousin Martin, who has been to Bangkok a few times. Martin shares the names and details of his favorite eateries in Bangkok known for their Asian dishes. This information is relayed back to Bill.
You have just seen a real-life example of a complex query based on objects and relationships. The graph database works on the same principle. It is about the network, the objects and their relationships in the network.
In short, a graph database can store and traverse highly connected data, and it can answer relationship-heavy questions that are difficult or slow to express as SQL queries in a relational database. That is the unique selling point of graph databases.
How Does a Graph Database Work?
The above description of a graph database gives some idea about the principles that a graph database applies when it searches for information or insights. Basically, it traverses the network of objects and relationships based on the query, and returns the results.
How would a graph database go about its job in Bill’s example? The example contains several nodes and relationships. Measured in hops from Bill, the distances look like this:
Bill = 0 (the origin)
Ryan = 1
Sheena = 1
John = 1
Greg = 2
Martin = 3
In real life, the node that provides the information could be even further from the origin (distance zero). That is how the network works.
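The hop distances listed above can be reproduced with a breadth-first traversal. The following is a minimal in-memory sketch, not how any particular graph database is implemented. The class name `HopDistance` and its methods are hypothetical, and the edges assume that Bill knows Ryan, Sheena and John, that Ryan knows Greg, and that Greg knows Martin, as in the story.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Neo4j or any other product.
class HopDistance {
    // Undirected adjacency list: person -> people they know directly.
    private final Map<String, List<String>> contacts = new LinkedHashMap<>();

    void addContact(String a, String b) {
        contacts.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        contacts.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search: number of hops from the origin to every reachable person.
    Map<String, Integer> distancesFrom(String origin) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(origin, 0);
        queue.add(origin);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : contacts.getOrDefault(current, Collections.emptyList())) {
                if (!distance.containsKey(next)) {
                    // First time we reach this person, so this is the shortest hop count.
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return distance;
    }

    // Builds the network assumed from Bill's example.
    static HopDistance billsNetwork() {
        HopDistance network = new HopDistance();
        network.addContact("Bill", "Ryan");
        network.addContact("Bill", "Sheena");
        network.addContact("Bill", "John");
        network.addContact("Ryan", "Greg");
        network.addContact("Greg", "Martin");
        return network;
    }
}
```

Under these assumptions, `HopDistance.billsNetwork().distancesFrom("Bill")` gives the same distances as the list: 0 for Bill, 1 for Ryan, Sheena and John, 2 for Greg and 3 for Martin.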
Imagine an application sending a query based on Bill’s requirement. It would be something like:
Find all friends who are connected with five friends who like Asian food, who have visited Thailand and who live within 5 miles of Dallas.
Many graph databases are available on the market, and Neo4j is among the most popular. Neo4j owes its popularity to being both efficient and open source. When you send a query to Neo4j to solve Bill’s problem, it could look something like this:
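// Select friends and friends of friends of the given user (1 to 2 FRIEND hops), ordered by the depth of the relationship.
// The node is looked up by its internal ID, passed in as the userNode parameter.
// Filtering on keywords such as "Asian food" or "Bangkok" is not part of this query and would need additional conditions.
String findFriendsQuery = "MATCH p = (person)-[:FRIEND*1..2]-(friend) WHERE id(person) = $userNode RETURN DISTINCT p ORDER BY length(p)";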
Based on the query, Neo4j is going to search through its available network and find the closest matches.
Difference Between Graph Databases and Relational Databases
Relational and graph databases are mainly compared on query speed, that is, how fast each can process a complex query on a large data set.
Emil Eifrem, the CEO of Neo Technology, the company behind Neo4j, measured the performance of both relational and graph databases on multiple parameters. The query was: in a data set of 1,000 users, each with 50 or more friends, determine whether one user is connected to another in four or fewer hops. The results are given below:
- A popular open-source relational database took 200 ms to process the query while a graph database took 2 ms.
- When the same query was run on a user base of 1,000,000 users, the graph database took 2 ms while the relational database had to be aborted after a few days of never-ending processing.
The main reason the relational database took so long is that it has to work through the data set as a whole, joining the friendship records once for every hop in the query, so the work grows quickly with the size of the data. The graph database, on the other hand, looks only at records that are directly connected to the record where the traversal starts, then at their connections, and so on. If it is allowed a specific number of hops, it stops precisely there. That is why a graph database can process complex queries on huge data sets relatively easily and return results faster. (To learn more about working with databases, see Database Administration Careers 101.)
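The benchmark question, whether one user is connected to another in four or fewer hops, can be sketched as a hop-limited traversal. This is an illustrative in-memory version, not Neo4j's implementation. The class `BoundedTraversal`, its methods and the integer user IDs are assumptions made for the example. It shows that the work done depends on the neighborhood reached within the hop limit, not on the total number of users stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; user IDs are plain integers.
class BoundedTraversal {
    private final Map<Integer, Set<Integer>> friends = new HashMap<>();

    void addFriendship(int a, int b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Returns true if target can be reached from start in at most maxHops hops.
    boolean connectedWithin(int start, int target, int maxHops) {
        if (start == target) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(start);
        List<Integer> frontier = new ArrayList<>();
        frontier.add(start);
        // Expand one hop at a time and stop as soon as the hop limit is used up.
        for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
            List<Integer> nextFrontier = new ArrayList<>();
            for (int user : frontier) {
                for (int friend : friends.getOrDefault(user, Collections.emptySet())) {
                    if (friend == target) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        nextFrontier.add(friend);
                    }
                }
            }
            frontier = nextFrontier;
        }
        return false;
    }
}
```

Only users within `maxHops` of the starting user are ever visited, so adding unrelated users elsewhere in the data set does not add work to this check.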
Graph Database Case Studies
Graph databases have been applied successfully in many industries, and large companies have led the way in building products on graph database principles. Because the model is about nodes and relationships, it was initially thought that mainly industries such as social media would benefit. However, other sectors such as online dating, manufacturing and online job portals have benefited as well. A few examples are given below:
- Facebook has successfully used a graph database in building its product. Today, you can search for information by traversing your network of friends, their friends and so on.
- LinkedIn has been working on its much-publicized Economic Graph, which aims to provide suitable opportunities to all users by connecting them with companies and their profiles up to a certain level.
- Recommendation systems, an important tool for many online retailers, use graph database principles to provide effective, relevant recommendations to potential consumers. A recommendation engine searches the network of customers who have made similar purchases over a period of time and assumes that a customer browsing similar products will have the same tastes and preferences.
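The recommendation idea in the last example can be sketched as a two-hop walk from a customer to the products they bought, to other customers who bought the same products, and on to what those customers bought. This is a deliberately simplified illustration, not how any retailer's engine works. The class `PurchaseRecommender`, its methods and the ranking rule (number of similar customers who bought a product) are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class for purchase-based recommendations.
class PurchaseRecommender {
    // customer -> products that customer has bought
    private final Map<String, Set<String>> purchases = new HashMap<>();

    void recordPurchase(String customer, String product) {
        purchases.computeIfAbsent(customer, k -> new HashSet<>()).add(product);
    }

    // Recommends products bought by customers who share at least one purchase
    // with the given customer, most frequently bought first.
    List<String> recommendFor(String customer) {
        Set<String> own = purchases.getOrDefault(customer, Collections.emptySet());
        Map<String, Integer> score = new TreeMap<>(); // sorted keys give a stable tie order
        for (Map.Entry<String, Set<String>> other : purchases.entrySet()) {
            if (other.getKey().equals(customer)) {
                continue;
            }
            if (Collections.disjoint(own, other.getValue())) {
                continue; // no purchase in common, so not a similar customer
            }
            for (String product : other.getValue()) {
                if (!own.contains(product)) {
                    score.merge(product, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // List.sort is stable, so products with equal scores stay in alphabetical order.
        ranked.sort((a, b) -> score.get(b) - score.get(a));
        return ranked;
    }
}
```

A real engine would also weigh how recent and how similar the purchases are. The sketch only counts how many similar customers bought each product.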
Summary
For all the potential of graph databases, many companies are still playing catch-up with the trend, so it will be a while before graph databases are widely adopted. While their potential for solving complex problems is no longer in doubt, the position of the relational database is not threatened in any way. One of the strongest points in favor of the graph database is that it is available as open-source technology. It is up to industry to leverage the benefits.