Representing knowledge, and the reasoning behind the conclusions drawn from it, has been a cornerstone of artificial intelligence (AI) for decades. A knowledge graph (KG) is a data structure that represents information as a graph. DBpedia, an open source knowledge graph, defines a knowledge graph as "a special kind of database which stores knowledge in a machine-readable form and provides a means for information to be collected, organized, shared, searched and utilised." Most importantly, a knowledge graph, which is often stored in a graph database, facilitates relational reasoning between any of its data points.
Formally, a KG is a directed labeled graph that represents relations between data points. Each node represents a data point, or entity, such as a person, a place or a webpage. Each edge represents the relationship between a pair of data points (for example, a companionship relation between two people, or a link between two webpages).
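As a minimal sketch of this definition, a directed labeled graph can be stored as a list of (head, relation, tail) triples. The entity names, relation labels and the `neighbors` helper below are invented for illustration and do not come from any real KG or library.

```python
from collections import defaultdict

# Each edge is a (head, relation, tail) triple; all names are hypothetical.
triples = [
    ("Alice", "friend_of", "Bob"),
    ("Bob", "lives_in", "Paris"),
    ("page_a", "links_to", "page_b"),
]

# Index outgoing edges by head node so lookups do not scan every triple.
outgoing = defaultdict(list)
for head, relation, tail in triples:
    outgoing[head].append((relation, tail))


def neighbors(node, relation=None):
    """Return the tails of edges leaving `node`, optionally filtered by label."""
    return [t for r, t in outgoing.get(node, []) if relation is None or r == relation]


print(neighbors("Alice"))            # tails of every edge leaving Alice
print(neighbors("Bob", "lives_in"))  # only edges labeled lives_in
```

Because the edges are directed, `neighbors("Paris")` returns an empty list here: the graph records that Bob lives in Paris, but no edge leaves the Paris node.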
In this respect, a KG resembles a relational database, since both store data together with the relations between data points (i.e. entities). However, the two technologies arguably differ in their reasoning objectives: relational databases center on reasoning with the attributes of a data point (i.e. the columns of data tables), whereas a KG focuses on reasoning with the data points themselves.
A Brief History of Knowledge Graphs
In 1956, the semantic network, a well-known ancestor of the KG, was first developed as an "interlingua" for machine translation of natural languages. In the late 1980s, the University of Groningen and the University of Twente in the Netherlands jointly initiated a project called “Knowledge Graphs.” These KGs were mainly semantic networks with additional constraints to enable algebraic operations. In 2012, Google introduced its own knowledge base under the name Google Knowledge Graph.
Key Uses of KGs
Over the last decade or so, big tech companies like Google, Amazon and Facebook have spent millions building KGs that enrich their search engines, help them understand the context of a query and give them a sense of specific user intent.
Google uses its KG to enhance search results with information collected from varied sources. Information from the KG is presented to users in a knowledge panel next to the search results. When you perform a search, Google uses the KG to combine previous results for your query with what other people might have found, in order to better serve your query.
Facebook uses a KG to monitor networks of people and the links between socially relevant entities, such as the things its users chat about most. Besides using KGs to discover social connections among users and to recommend social interests, Facebook’s graph search feature uses a KG to answer users’ natural language queries. An important reason KGs have become so vital is the realization that the relations between data points are as valuable as the data points themselves, especially when building social networks.
Netflix uses a KG to organize information about its huge catalog of content, inferring links between TV shows, movies and the directors, producers and actors who connect them. The KG then helps infer what users might like to watch next, nurturing the "binge-watch" business model.
Siemens uses a KG to construct models of the data it produces and stores, and employs it for risk management and process monitoring applications. The company also uses a KG to build “digital twins,” which are simulated versions of real-world systems, and uses the graph to design, prototype and train. KGs are also being used in the financial sector to monitor fraudulent transactions and for tasks such as investment analytics and marketing.
However, storing and maintaining most real-world KGs is becoming challenging because of their rapidly growing, unprecedented scale. For example, a recent version of Wikidata had over 80 million objects and over 1 billion relationships. Several industry knowledge graphs are even bigger: a recent version of the Google Knowledge Graph had over 570 million entities and over 18 billion relationships. At this scale, the efficiency and scalability of graph algorithms become paramount.
Examples of Knowledge Graphs
In addition to the special-purpose KGs mentioned above, some commonly used KGs are:
WordNet is a collection of English dictionary and thesaurus words linked in pairs by relations such as type_of, part_of and has_part. It is typically used to improve the performance of natural language processing (NLP) tasks.
DBpedia covers an encyclopedic range of entities such as places, people, books, films, organizations, diseases and species. The KG leverages the built-in structure of Wikipedia infoboxes to build an ontology consisting of 4.58 million things.
GeoNames is a KG of 25 million geographical entities and features.
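Relations such as type_of are usually treated as transitive, which is what lets a KG yield facts that are never stored explicitly. The sketch below follows type_of edges upward to collect every broader category of a term. The data is a toy example made up for illustration, not taken from WordNet, and the `ancestors` helper is hypothetical.

```python
# Toy type_of edges (narrower term -> broader term); hypothetical data.
type_of = {
    "poodle": "dog",
    "dog": "mammal",
    "mammal": "animal",
}


def ancestors(term):
    """Follow type_of edges upward and collect every broader category."""
    found = []
    seen = {term}
    while term in type_of:
        term = type_of[term]
        if term in seen:  # guard against accidental cycles in the data
            break
        seen.add(term)
        found.append(term)
    return found


print(ancestors("poodle"))  # ['dog', 'mammal', 'animal']
```

Only three edges are stored, yet the traversal recovers that a poodle is also a mammal and an animal. For simplicity this sketch assumes each term has a single broader term.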
KGs and Machine Learning
To deal with large-scale KGs, the AI community has been using machine learning (ML), not only to build and structure KGs quickly, but also to infer links between data points that would otherwise go unnoticed. This is not a one-sided affair, however: ML also benefits from KGs, which help it understand data such as text, video and audio that do not fit well into a relational database.
This marriage of KGs and ML is in the spotlight because of its many advantages. KG-based word embedding methods have become a standard way to represent symbolic data as input. KGs have been layered on top of ML models to make AI systems more transparent and interpretable. KGs also bring the ability to reason about things, which is vital for answering questions, understanding images and retrieving information.
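To make the idea of inferring links from embeddings concrete, here is a minimal sketch in the style of translation-based embedding models, where a triple is considered plausible if the head vector plus the relation vector lands close to the tail vector. The article does not name a specific method, so this choice is an assumption. The vectors are hand-picked for illustration rather than learned, and `score` and `predict_tail` are hypothetical helper names.

```python
import math

# Hand-picked 2-D vectors for illustration only; a real system would learn them.
entity_vec = {
    "Paris": (1.0, 0.0),
    "France": (1.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}
relation_vec = {"capital_of": (0.0, 1.0)}


def score(head, relation, tail):
    """Distance between (head + relation) and tail; lower means more plausible."""
    h, r, t = entity_vec[head], relation_vec[relation], entity_vec[tail]
    return math.dist((h[0] + r[0], h[1] + r[1]), t)


def predict_tail(head, relation):
    """Rank every other entity as a candidate tail and return the closest one."""
    candidates = [e for e in entity_vec if e != head]
    return min(candidates, key=lambda e: score(head, relation, e))


print(predict_tail("Berlin", "capital_of"))  # Germany
```

Even if the edge (Berlin, capital_of, Germany) were missing from the graph, the geometry of the vectors would suggest it. In practice the vectors are trained from the existing edges and have far more than two dimensions.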
There is still enormous untapped potential in combining KGs and ML. For example, a massive amount of knowledge represented on the Internet (e.g. Wikipedia), for instance as Resource Description Framework (RDF) data, is available essentially for free but is not being leveraged by current AI systems. A hybrid KG and ML system could benefit hugely from this knowledge to better understand the world and to organize and infer missing knowledge.