Lila Rajabion provides four examples of how KGs can help leaders advance their understanding of the business environment in which their company sits. These include merging data silos to create a company overview across divisions, connecting different types of data in meaningful ways, aiding informed decision making by narrowing searches and contextualizing information, and showing interconnections that help leaders gain perspective. Next, she dives into how Google, LinkedIn, eBay, and IBM are using KGs and explains how other companies could follow suit. She then addresses four challenges currently faced by companies looking to leverage KGs, followed by a look at specific business efficiencies enabled by KGs, including making data more accessible for employees, helping leaders make data-driven decisions, and assisting companies in deploying AI technology.
A majority of businesses collect and store a substantial volume of data, but many don’t adequately harness it to enhance their decision making or fuel new opportunities.1 The sheer volume of data makes it difficult for companies to manage; this is compounded by the multiple silos in which data is stored. In this article, we’ll look at how knowledge graphs (KGs) can help solve that problem, opening up avenues to improved decision making, better employee data access, and easier deployment of artificial intelligence (AI) technology. We’ll also examine some real-world examples of KGs (including Google and others) and look at some of the challenges faced by companies as they develop KGs.
KGs have been around for quite a while, but they didn’t receive much attention until Google began integrating them into its search engines. Today, large companies like Google, LinkedIn, and Amazon use KGs to optimize searches, but companies of any size can use them to improve data accessibility and searchability.
Today’s emphasis on searchability is forcing content marketing and search engine optimization (SEO) experts to create rich networks of informative and instructional materials to satisfy customers during the buyer journey. Companies that don’t excel at searching and retrieving data for their customers have trouble remaining competitive.2 Using a systematic approach like KGs to manage that data more efficiently thus becomes a strategic advantage.
For example, if a person wants to search Google for his or her favorite place to eat but only knows the location and not the name of the restaurant, Google, with the help of its KG, can provide relevant suggestions in real time. Similarly, KGs can improve a company’s content marketing and SEO by: (1) unambiguously defining content for search engines and (2) building robust information environments around products and services for prospects and customers.3
One of the most important KG functions is creating linkages across multiple data sets. By providing a visual representation of the underlying connections between data nodes, KGs help leaders advance their understanding of their environment so they can make intelligent business choices.4 Here are four examples:
By providing a way to merge data silos, KGs create a valuable overview of all knowledge in a company, both within departments/divisions and across them. This is helpful for companies with multiple divisions, especially if they’re located in different regions or countries.
KGs have the ability to connect different kinds of data in meaningful ways.5 For example, academic graphs include people, papers, research topics, and conferences to help users detect connections between researchers and pieces of research.
By narrowing searches and contextualizing information, KGs can help business leaders make more informed decisions faster.6
By having each topic or item represented just once (with all its connections) in context with all other subjects and their relationships, KGs clearly show how each node is interconnected. This helps leaders gain perspective on how important ideas relate to one another.
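To make the idea of connecting different kinds of data concrete, here is a minimal sketch of an academic graph stored as subject-relation-object triples. Every name in it (the people, papers, relation labels, and helper functions) is hypothetical and chosen only for illustration. Each entity appears once as a node, and a connection between two researchers is found by following edges.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object); all names are illustrative.
TRIPLES = [
    ("alice", "authored", "paper_1"),
    ("bob", "authored", "paper_1"),
    ("bob", "authored", "paper_2"),
    ("paper_1", "has_topic", "entity_linking"),
    ("paper_2", "has_topic", "graph_embeddings"),
    ("paper_2", "presented_at", "conf_x"),
]


def build_index(triples):
    """Map each node to its (relation, neighbor) pairs, in both directions."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[subj].add((rel, obj))
        index[obj].add((f"inverse_{rel}", subj))
    return index


def coauthors(index, person):
    """Return the people who share at least one paper with `person`."""
    papers = {node for rel, node in index[person] if rel == "authored"}
    found = set()
    for paper in papers:
        for rel, node in index[paper]:
            if rel == "inverse_authored" and node != person:
                found.add(node)
    return found


index = build_index(TRIPLES)
print(coauthors(index, "alice"))
```

Because papers, topics, and conferences all live in the same structure, the same index could answer other questions (for example, which topics a researcher has published on) without a separate data store.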
The benefits of KGs are not limited to large tech companies. In fact, any company with a significant amount of data can benefit from them. Following are some examples of how companies are using KGs to improve content management and user-centric services — and how other companies could follow suit.
With the help of its KG, Google’s search results page can answer many questions directly. Because Google does not create content itself, the results it displays originate from credible sources that are dispersed over the Internet yet organized and linked.7 The voice-activated Google Assistant and Google Home use the same KG to answer verbal inquiries.
In other words, Google’s KG is a knowledge base designed to improve its search engine results using information acquired from a variety of sources. Following its launch in 2012, Google’s KG saw tremendous growth, more than tripling in a matter of months to reach 570 million entities and 18 billion facts by its most recent count.8
Rather than crawling through or indexing websites, Google uses its KG to organize the world’s information by topic; advantages for the company include scale, data integrity, and speed. Google can easily harness user behavior data to understand what topics are significant to individuals and suggest topics based on user history. Other companies could use this approach, leveraging data to better understand customer behavior in order to improve products and/or marketing.
Amazon Web Services (AWS) KGs are a mechanism for modeling and conveying knowledge about the company’s services. This concept has been around for a while, but the development of scalable graph databases has made it more applicable.9 Compared to data management systems like relational databases, KGs are extraordinarily adaptable, capable of accounting for the variety and heterogeneity of data in the real world.
Using a collection of concepts, the properties of those concepts, the relationships between them, and the logical constraints that are expected to hold, AWS KGs can capture the semantics of a specific domain.10 Because this model includes logic, we can reason about graphs and the information included within them, making the information implicit in the graph readily available. The process of information asset consolidation includes integrating an organization’s information assets and making them easily accessible to all members of an organization.11
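The following is a small sketch of what "making implicit information available" can mean in practice. It is not AWS's implementation and does not use any Amazon Neptune API; the class names, the entity name, and the `infer_types` helper are assumptions made for illustration. Given a class hierarchy and an entity's directly stated type, it derives every broader type the entity must also belong to.

```python
def infer_types(instance_of, subclass_of):
    """Expand each entity's stated types with all ancestor classes."""
    inferred = {}
    for entity, direct_types in instance_of.items():
        seen = set()
        stack = list(direct_types)
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue  # already visited; also guards against cycles
            seen.add(cls)
            stack.extend(subclass_of.get(cls, ()))
        inferred[entity] = seen
    return inferred


# Hypothetical hierarchy: each class maps to its direct parent classes.
subclass_of = {"GraphDatabase": ["Database"], "Database": ["Service"]}
# Only the most specific type is stated explicitly.
instance_of = {"service_a": ["GraphDatabase"]}

types = infer_types(instance_of, subclass_of)
print(sorted(types["service_a"]))
```

Only one fact about `service_a` was stored, yet a query for all entities of type `Service` can now find it. Production reasoners support far richer rules, but the principle is the same.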
AWS KGs open the door to a variety of applications, most of which are helpful on their own, not only for the company but for its clients. For example, Amazon could turn the data it gathers into a more helpful resource by using an enterprise KG. Furthermore, it could develop corporate knowledge graphs by using the built-in federated query functionalities of the Amazon Neptune graph database.12 Public data from the Internet could be used to enrich the information already included within these graphs. Other companies can similarly use KGs to help them organize information from dissimilar data sources to enable more intelligent search. Ultimately, KGs can help organizations make their data more understandable by using business terms rather than ambiguous codes.
LinkedIn’s KG is an enormous knowledge base constructed from entities such as members, jobs, titles, skills, companies, geographical locations, schools, and the connections between them.13 LinkedIn uses this ontology to improve its recommendation system; search, monetization, and consumer product offerings; and business and consumer analytics.
Developing this type of comprehensive knowledge base proved extremely challenging. Websites like Wikipedia and Freebase are almost entirely dependent on user contributions.14 LinkedIn took a different approach. LinkedIn’s KG is primarily derived from the large quantity of content provided by corporate administrators, recruiters, advertisers, and other users.15
The KG grows constantly as individuals sign up for the platform, employment opportunities become available, new companies join, new skills are added, and new titles surface in user profiles and job ads.
Moreover, the company uses machine learning (ML) methods to help find solutions to its KG network challenges.16 This is essentially a process of data standardization on user-generated content and external data sources. ML is applied to entity taxonomy construction, entity-relationship inference, data representation for downstream consumers, insight extraction from the graph, and interactive data acquisition from users to validate inferences.17
New entities are continuously added to the KG, and new connections are forged between existing entities. Existing relationships can also change. For instance, when a member gets a new position, the mapping from her previous title to her present one is updated accordingly. The LinkedIn KG must be updated in real time whenever member profiles are modified or entities are added. Other companies could similarly take advantage of ML to help them improve their data quality and KGs.
eBay’s product knowledge graph encodes semantic knowledge about items, entities, and their connections. This information is vital to eBay’s marketplace technology, which automatically connects sellers and buyers. eBay uses KGs to describe products, schedule deliveries, and service customers through virtual assistants. eBay’s KG sometimes links items to real-world entities, establishing a product’s identity and value to a customer.
The KG also links goods. For example, if a person looks for Lionel Messi memorabilia, and the KG shows he plays football (soccer) for FC Barcelona, that person may also be interested in FC Barcelona items or items like signed jerseys from other Barcelona players.
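The memorabilia example can be sketched as a short graph traversal. The facts below mirror the example in the text, while the item identifiers, the `player_b` entity, the relation names, and the `related_items` function are hypothetical; this is an illustration of the idea, not eBay's system.

```python
# Hypothetical facts (subject, relation, object) mirroring the example above.
FACTS = [
    ("lionel_messi", "plays_for", "fc_barcelona"),
    ("player_b", "plays_for", "fc_barcelona"),
    ("item_messi_jersey", "about", "lionel_messi"),
    ("item_club_scarf", "about", "fc_barcelona"),
    ("item_player_b_jersey", "about", "player_b"),
    ("item_unrelated_mug", "about", "other_club"),
]


def related_items(facts, entity):
    """Find items about the entity, its team, or its teammates."""
    teams = {o for s, r, o in facts if s == entity and r == "plays_for"}
    teammates = {
        s for s, r, o in facts
        if r == "plays_for" and o in teams and s != entity
    }
    targets = {entity} | teams | teammates
    return sorted(s for s, r, o in facts if r == "about" and o in targets)


print(related_items(FACTS, "lionel_messi"))
# prints ['item_club_scarf', 'item_messi_jersey', 'item_player_b_jersey']
```

The unrelated mug is never suggested because no path in the graph connects it to the player the shopper searched for.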
For eBay, understanding product connections is as important as entity interactions, and the knowledge network must answer a search query in milliseconds. Because large graph queries can take hours to complete, eBay engineers built a flexible, universal architecture. The KG keeps track of every entry and change, and the data is organized in a log. This enables a variety of back-end data storage options, such as low-latency document storage and a graph store for long-running analysis. To keep the graph in chronological order, each store adds its operations to the write log, resulting in more consistent results for customers.
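One common way to keep several back-end stores consistent is to treat an append-only log as the single ordered record of changes and let each store replay it. The sketch below illustrates that general pattern under simplifying assumptions; the `WriteLog` and `TripleStore` classes and the sample data are hypothetical and do not describe eBay's actual architecture.

```python
class WriteLog:
    """Append-only list of operations; list position defines the order."""

    def __init__(self):
        self.entries = []

    def append(self, op, subj, rel, obj):
        self.entries.append((op, subj, rel, obj))
        return len(self.entries) - 1  # sequence number of the new entry


class TripleStore:
    """A toy back-end store that builds its state by replaying the log."""

    def __init__(self):
        self.triples = set()
        self.applied = 0  # number of log entries already applied

    def catch_up(self, log):
        for op, subj, rel, obj in log.entries[self.applied:]:
            if op == "add":
                self.triples.add((subj, rel, obj))
            elif op == "remove":
                self.triples.discard((subj, rel, obj))
        self.applied = len(log.entries)


log = WriteLog()
log.append("add", "item_1", "brand", "brand_a")
log.append("add", "item_1", "color", "red")
log.append("remove", "item_1", "color", "red")
log.append("add", "item_1", "color", "blue")

# Two independent stores (say, one for fast lookups, one for analysis)
# replay the same log and therefore converge on the same state.
fast_store, analysis_store = TripleStore(), TripleStore()
fast_store.catch_up(log)
analysis_store.catch_up(log)
print(fast_store.triples == analysis_store.triples)
```

Because every store applies the same operations in the same order, a store that falls behind can catch up later without producing a different answer.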
Other e-commerce companies could similarly use KGs, leveraging entity relations to better understand their products’ relationships (e.g., suggesting an iPhone case to someone who just purchased an iPhone and successfully modeling various phone sizes and cases in order to offer a case that fits the phone bought).
The Watson Discovery service uses IBM’s KG framework in two ways. First, the framework directly supports Watson Discovery, leveraging structured and unstructured knowledge to discover new information. Second, it allows individuals to construct KGs based on the prebuilt KG. Discovery creates knowledge not present in existing documents or available data sources. Examples include connections between entities (e.g., drug side effects, acquisition targets, and sales leads), new important entities in the domain (e.g., an investor for a specific investment area), or changes in the significance of an existing entity (e.g., an increasing interaction between a person of interest and a criminal).18 Other companies could similarly leverage KGs to identify prospects, current customers who might be interested in other products, and potential investors.
KGs have been used to improve search results across a variety of search engines, including Google and Bing, and to provide support for a large number of applications. Amazon is developing a product graph that will serve as an official KG for all the items in the world. Constructing such a graph presents significant challenges: the thousands of product verticals to model, the vast number of data sources to extract knowledge from, the enormous volume of new products to handle every day, and the number of applications (search, discovery, personalization, and voice) to support. KGs vary greatly in scope and design, but the challenges in creating them are similar for most implementations.
Disambiguation & Control of Individual Identities
Resolving ambiguity between entities is a significant difficulty for the Semantic Web and KGs. Problems arise when a name or mention of an entity is not mapped to its own normalized identity and type in the context in which it appears. Many automatically extracted entities, such as people with similar names and book/movie titles, have surface forms that closely resemble one another. Likewise, comparable products may be listed under various headings. A lack of appropriate linking and disambiguation can lead to inaccurate conclusions about entities. The difficulty of identity management rises sharply when many contributors are involved at large scale.
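A toy example may help show what disambiguation involves. The sketch below assumes two entities that share the surface form "Jaguar" and picks between them by counting keywords shared with the surrounding context. The entity records, keyword sets, and the `disambiguate` function are all hypothetical, and real entity-linking systems use much richer signals than keyword overlap.

```python
# Hypothetical entity records that share the same surface form.
ENTITIES = {
    "e1": {"name": "Jaguar", "type": "animal",
           "keywords": {"cat", "jungle", "predator"}},
    "e2": {"name": "Jaguar", "type": "car_brand",
           "keywords": {"car", "engine", "sedan"}},
}


def disambiguate(mention, context_words, entities):
    """Return the id of the best-matching entity, or None without evidence."""
    best_id, best_score = None, 0
    for entity_id, entity in entities.items():
        if entity["name"].lower() != mention.lower():
            continue  # different surface form, not a candidate
        score = len(entity["keywords"] & set(context_words))
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id


print(disambiguate("jaguar", ["new", "sedan", "engine"], ENTITIES))  # prints e2
print(disambiguate("jaguar", ["logo"], ENTITIES))  # prints None
```

Returning `None` when the context offers no evidence is a deliberate choice in this sketch: linking a mention to the wrong entity can be more harmful than leaving it unlinked.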
Resolutions & Membership
There are several types of entities in most KGs. For instance, Angelina Jolie is a human, an actor, and a humanitarian; she’s better known for acting than her humanitarian efforts. A KG might employ a particular set of attributes based on a user’s job, and early on, the criteria for being a class member might be easy to understand. As the number of instances grows, it becomes harder to enforce these criteria while maintaining semantic stability. For example, e-sports did not exist when Google set up the category for sports in its KG. So how does Google keep the sports category separate from e-sports while still including them?
A good entity-linking system must grow naturally based on the data it receives, which is constantly changing. Companies can merge or split, and new scientific discoveries can make one thing into more than one. KG frameworks are getting better at storing and managing changes, but they still cannot manage highly dynamic information.
Maintaining several stores (e.g., IBM’s polymorphic stores) may be difficult. There are many things to think about regarding the integrity of the update process, its eventual consistency, updates that conflict, and runtime performance in general. The answer may be different kinds of distributed data stores that are already set up to handle incremental cascade updates. It’s also essential to keep track of changing schemas and type systems without making the knowledge already in the system inconsistent. Google solves this problem by thinking of the metamodel layer as composed of several layers. The lower layers stay mostly the same while the higher levels are made up of meta types, which are just instances of types that can be used to improve the type system.19
Extraction of Information from a Variety of Organized & Unorganized Sources
Although there have been recent improvements in understanding natural language, it’s still hard to pull out structured knowledge, which includes entities, their types, attributes, and relationships. To grow KGs at scale, we must use manual methods alongside unsupervised and semi-supervised methods to extract knowledge from unstructured data in open domains. For example, eBay’s product knowledge graph gets many of its graph relationships from the unstructured text in listings and seller catalogs. The IBM Discovery KG gets its facts from documents.
Training knowledge-extraction systems in traditional supervised ML frameworks is difficult and time-consuming. This can be mitigated by using fully unsupervised or semi-supervised approaches. Entity recognition, classification, and text and entity embeddings are all useful ways to connect unstructured text to graph entities.
Can KGs Improve Business Efficiency?
There are several ways KGs can improve business efficiency. The first is by creating an advanced way for business leaders to merge, sort, and view data.20 KGs create a web of information on a subject, pulling from multiple sources and merging various data types to help leaders better understand their company’s reality and make data-driven decisions.
The second way is helping employees quickly gain access to the information they need. KGs make it easier to understand internal assets, such as benefits, tax information, organizational structure, and more.
The third way is helping companies deploy AI technology, such as chatbots and advanced search. KGs can act as inputs for ML, since ML algorithms achieve better results if they include domain knowledge. KGs help capture domain knowledge, but ML algorithms require that any discrete structure, such as a graph, first be converted to a numerical format.
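As a rough illustration of the conversion step, the sketch below maps each entity and relation name to an integer ID so that a graph becomes a list of numeric triples. This is only the first step of a typical pipeline (embedding models are usually trained on such IDs afterward), and the `encode_triples` function and sample data are assumptions made for the example.

```python
def encode_triples(triples):
    """Assign integer IDs to entities and relations in order of appearance."""
    entity_ids, relation_ids = {}, {}
    encoded = []
    for subj, rel, obj in triples:
        s = entity_ids.setdefault(subj, len(entity_ids))
        r = relation_ids.setdefault(rel, len(relation_ids))
        o = entity_ids.setdefault(obj, len(entity_ids))
        encoded.append((s, r, o))
    return encoded, entity_ids, relation_ids


# Hypothetical triples; names are illustrative only.
triples = [
    ("employee_a", "works_in", "dept_x"),
    ("dept_x", "part_of", "division_y"),
    ("employee_a", "part_of", "division_y"),
]

encoded, entity_ids, relation_ids = encode_triples(triples)
print(encoded)  # prints [(0, 0, 1), (1, 1, 2), (0, 1, 2)]
```

Keeping the two dictionaries makes it possible to translate a model's numeric output back into the business terms people recognize.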
A KG could help chatbots memorize, associate, and reason about the semantic connections between entities, bridging the gap from perceived intelligence to cognitive intelligence. However, we must keep in mind that logical reasoning remains a challenge for KGs. For example, a medical chatbot could collect symptoms and offer basic medical advice but is not intended to replace a physician’s diagnosis or advice.
KGs can play an important role in business, particularly in improving decision making and boosting efficiency. By merging data silos, KGs create a valuable overview of all the knowledge in a company, both within and across departments/divisions. Similarly, by narrowing searches and contextualizing information, KGs can help business leaders make more informed decisions faster. Because each topic or item in a KG is represented just once in context with all other subjects and their relationships, KGs show node interconnections that help leaders gain perspective on how important ideas relate to one another.
Companies such as Google, AWS, LinkedIn, eBay, and IBM are already using KGs to improve searches, make data more accessible to leaders and employees, improve product suggestions made to customers, and much more. KGs can also help companies with their AI deployments, including chatbots. KGs act as ML inputs, adding domain knowledge to help ML algorithms achieve better results.
As customers become ever more accustomed to fast, accurate product searches and expect to consistently receive useful suggestions for ancillary purchases, KGs are an important tool for companies hoping to successfully satisfy them during the buyer journey.
1 Nickel, Maximilian, et al. “A Review of Relational Machine Learning for Knowledge Graphs.” Proceedings of the IEEE, Vol. 104, No. 1, December 2015.
2 Collarana, Diego, et al. “FuhSen: A Federated Hybrid Search Engine for Building a Knowledge Graph On-Demand (Short Paper).” On the Move to Meaningful Internet Systems: OTM 2016 Conferences. Springer, 2016.
3 Nickel et al. (see 1).
4 Kejriwal, Mayank. “Knowledge Graphs.” In Applied Data Science in Tourism: Tourism on the Verge, edited by Roman Egger. Springer, 2022.
5 Grangel-González, Irlán, Felix Lösch, and Anees ul Mehdi. “Knowledge Graphs for Efficient Integration and Access of Manufacturing Data.” Proceedings of the 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE, 2020.
6 Galkin, Mikhail, et al. “Enterprise Knowledge Graphs: A Semantic Approach for Knowledge Management in the Next Generation of Enterprise Information Systems.” Proceedings of the 19th International Conference on Enterprise Information Systems (ICEIS 2017). SciTePress, 2017.
7 Heist, Nicolas, et al. “Knowledge Graphs on the Web — An Overview.” Cornell University, March 2020.
8 Balog, Krisztian, and Tom Kenter. “Personal Knowledge Graphs: A Research Agenda.” Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR 2019). ACM, 2019.
9 Zheng, Da, et al. “DGL-KE: Training Knowledge Graph Embeddings at Scale.” Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2020.
10 Zheng et al. (see 9).
11 Baclawski, Ken, et al. “Ontology Summit 2020 Communiqué: Knowledge Graphs.” Applied Ontology, Vol. 16, No. 2, 2021.
12 Szekely, Pedro, et al. “Building and Using a Knowledge Graph to Combat Human Trafficking.” Proceedings of the Semantic Web — International Semantic Web Conference (ISWC) 2015. Springer, 2015.
13 Chen, Xi, et al. “How LinkedIn Economic Graph Bonds Information and Product: Applications in LinkedIn Salary.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). ACM, 2018.
14 Dong, Xin, et al. “Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion.” Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2014.
15 Auradkar, Aditya, et al. “Data Infrastructure at LinkedIn.” Proceedings of the IEEE 28th International Conference on Data Engineering. IEEE, 2012.
16 Auradkar et al. (see 15).
17 Auradkar et al. (see 15).
18 Kejriwal (see 4).
20 Yahya, Muhammad, John G. Breslin, and Muhammad Intizar Ali. “Semantic Web and Knowledge Graphs for Industry 4.0.” Applied Sciences, Vol. 11, No. 11, 31 May 2021.