Updated: 24 Khordad 1404
How to Leverage your Schema.org Knowledge Graph for LLMs
You can use structured data, also known as Schema Markup, to describe the content and entities on each page. You can also use structured data to connect the different topics on your site or link them to external authoritative knowledge bases (e.g., Wikidata).
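As a rough illustration of that linking, the sketch below builds JSON-LD for a physician page and ties the physician's specialty to an external entity through `sameAs`. The function name, the example URLs and the Wikidata identifier are placeholders chosen for this sketch, and the choice of `knowsAbout` to carry the topic is an assumption, not a recommendation from Schema App.

```python
import json


def physician_markup(page_url, name, specialty_name, specialty_entity_url):
    """Build JSON-LD for a physician page (hypothetical helper for illustration)."""
    return {
        "@context": "https://schema.org",
        "@type": "Physician",
        "@id": page_url + "#physician",  # stable identifier other pages can point to
        "name": name,
        "url": page_url,
        "knowsAbout": {
            "@type": "Thing",
            "name": specialty_name,
            # sameAs links the on-site topic to an external knowledge base entry
            "sameAs": specialty_entity_url,
        },
    }


markup = physician_markup(
    "https://example.com/physicians/dr-a",         # placeholder page URL
    "Dr. A",                                        # placeholder name
    "Neurology",
    "https://www.wikidata.org/wiki/Q_PLACEHOLDER",  # placeholder, not a real entity ID
)
print(json.dumps(markup, indent=2))
```

Reusing the same `@id` wherever the physician is mentioned is what lets separate pages resolve to one entity rather than several disconnected ones.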
Here is how it works:
- The Schema App application loads the content model from your content knowledge graph. This is the set of all Schema.org data types and properties that exist within your website knowledge graph.
- The user then asks the Schema App application a question.
- The Schema App application combines the question with the content model and asks the LLM to write a SPARQL query. Note: the only thing the LLM does is transform the question into a query.
- The Schema App application then executes the SPARQL query against your content knowledge graph and either displays the results directly or uses the LLM to format them into a response.
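The steps above can be sketched in a few lines of Python. This is a minimal sketch, not Schema App's implementation: `build_prompt` and `answer` are hypothetical names, the content model is assumed to be a list of (type, property) pairs, and the LLM call and the SPARQL endpoint are passed in as plain functions so that no particular vendor API is assumed.

```python
from typing import Callable, Iterable


def build_prompt(content_model: Iterable[tuple[str, str]], question: str) -> str:
    """Combine the content model (type/property pairs) with the user's question."""
    lines = "\n".join(
        f"- schema:{type_name} has schema:{prop}"
        for type_name, prop in sorted(set(content_model))
    )
    return (
        "Write a SPARQL query that answers the question, using only these "
        "Schema.org types and properties:\n"
        f"{lines}\n"
        f"Question: {question}\n"
        "Return only the SPARQL query."
    )


def answer(
    question: str,
    content_model: Iterable[tuple[str, str]],
    llm: Callable[[str], str],                # assumed: prompt in, SPARQL text out
    run_sparql: Callable[[str], list[dict]],  # assumed: SPARQL text in, result rows out
) -> list[dict]:
    prompt = build_prompt(content_model, question)
    query = llm(prompt)       # the LLM only translates the question into a query
    return run_sparql(query)  # the facts come from the knowledge graph, not the LLM
```

Because the prompt contains only the content model and the question, its size depends on how many types and properties the graph uses, not on how much data the graph holds.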
Let’s illustrate how your content knowledge graph can train and inform your AI Chatbot.
A Large Language Model (LLM) is a type of generative artificial intelligence (AI) that relies on deep learning and massive data sets to understand, summarize, translate, predict and generate new content.
A healthcare network in the US has a website with pages on its physicians, locations, specializations, services, etc. Each physician page has content relating to that physician’s specialties, ratings, service areas and opening hours.
In summary, the integration of knowledge graphs with LLMs can significantly enhance decision-making accuracy, especially in the realm of Marketing.
For example, LLMs have token limits, which restrict the number of words that can be included in the input and output. This approach eliminates this problem by using the LLM to build the query and using the knowledge graph to answer it. Since SPARQL queries can query gigabytes of data, they don’t have any token limitations. This means you can use an entire content knowledge graph without worrying about the word limit.
To mitigate this issue, businesses can use their content knowledge graphs to train and ground the LLM for specific use cases. In the case of an AI chatbot, the LLMs would need an understanding of what entities and relations you have in your business to provide accurate responses to your customers.
By using the LLM for the sole purpose of querying the knowledge graph, you can achieve your AI outcomes in an elegant, cost-effective manner and have control of your data while also overcoming some of the current LLM restrictions.
Optimizing LLMs by Managing Data in the form of a Knowledge Graph
For a business to thrive in this technological age, connecting with customers through their preferred channel is crucial. LLM-powered AI experiences that answer questions in an automated, context-aware manner can support multi-channel digital strategies. By leveraging AI to support multiple channels, businesses can serve their customers through their preferred channels without having to hire more employees.
“You can machine learn Obama’s birthplace every time you need it, but it costs a lot and you’re never sure it is correct.” – Jamie Taylor, Google Knowledge Graph
This approach of generating answers through the LLM is less complicated, less expensive and more scalable. All you need is a content knowledge graph and a SPARQL endpoint. (Good news, Schema App offers both of these.)
By doing this, the LLM doesn’t have to hold the data in memory or be trained on the data because the answers exist within the content knowledge graph, which makes it a stateless and less resource-intensive solution. Furthermore, companies can avoid providing all their data to the LLM, because this method gives the knowledge graph owner a control point: only the questions they approve can be asked of their data.
Instead of training the LLM, you can use the LLM to generate the queries to get the answers directly from your content knowledge graph.
The content knowledge graph is also readily available, so you can quickly deploy your knowledge graph and train your LLM. If you are a Schema App customer, we can easily export your content knowledge graph for you to train your LLM.
Using LLMs to Query Your Knowledge Graph
Unstructured data that the LLM is trained on can also cause inefficiencies in the retrieval of information and high inference costs. Therefore, converting unstructured data such as documents and web pages into a knowledge graph can reduce information retrieval time and produce more reliable facts.
That said, if you want to leverage an AI chatbot to serve your customers, you want it to be providing your customers with the right answers at all times. However, LLMs don’t have the ability to perform a fact check. They generate responses based on patterns and probabilities. This results in issues such as inaccurate responses and hallucinations.
The content knowledge graph is an excellent foundation to leverage schema data in LLM tools, leading to more AI-ready platforms. It’s an investment that could pay off handsomely, especially in a world increasingly reliant on AI and knowledge management.
At Schema App, we can help you quickly implement your Schema Markup data layer and develop a semantically relevant and ready-to-use content knowledge graph to prepare your organization for AI.
It’s no secret that the AI revolution is well underway. According to a report by Accenture, 42% of companies want to make a large investment in ChatGPT in 2023.
Businesses can reduce the inference cost of the LLM by storing the historical responses or knowledge generated by the LLM in the form of a knowledge graph. That way, if a question is asked again, the LLM does not have to exhaust resources to regenerate the same answer. It can simply look up the answer stored in the knowledge graph.
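The look-up-before-generate idea can be sketched as follows. This is a simplified illustration under stated assumptions: a plain dictionary stands in for the knowledge graph, `CachedAnswers` is a hypothetical name, and questions are matched only after lowercasing and collapsing whitespace, which is far cruder than matching on entities in a real graph.

```python
from typing import Callable


class CachedAnswers:
    """Look up a stored answer before calling the (expensive) LLM."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate         # assumed: question in, answer text out
        self._store: dict[str, str] = {}  # stand-in for the knowledge graph
        self.llm_calls = 0                # counts how often the LLM was actually used

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings match
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key not in self._store:
            self.llm_calls += 1
            self._store[key] = self._generate(question)
        return self._store[key]
```

With this in place, asking the same question twice triggers only one call to `generate`; the second answer is read back from the store.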
In comparison to a traditional query, LLMs like ChatGPT have to run on expensive GPUs to answer queries ($0.36 per query according to research), which can eat into profits in the long run.
Instead, you should create your Schema Markup in a connected, scalable way that updates dynamically. That way, you’ll have an up-to-date knowledge graph that can be used not only for SEO but also to accelerate your AI experiences and initiatives.
Synergy Between Knowledge Graphs and LLMs
However, as the adoption of generative AI accelerates, companies will need to fine-tune their Large Language Models (LLMs) using their own data sets to maximize the value of the technology and address their unique needs. There is an opportunity for organizations to leverage their content knowledge graphs to accelerate their AI initiatives and get SEO benefits at the same time.
So what is an LLM?
This method is possible because the LLMs have a great understanding of SPARQL and can help translate the question from natural language to a SPARQL query.
Thankfully, you can use knowledge graphs to help mitigate some of these issues and provide structured and reliable information for the LLMs to use.
What is a Knowledge Graph?
Despite the efficiency and benefits they offer, LLMs also have their challenges.
LLMs are known for their tendency to ‘hallucinate’ and produce erroneous outputs that are not grounded in the training data or that are based on misinterpretations of the input prompt. They are expensive to train and run, hard to audit and explain, and often provide inconsistent answers.
Most organizations are trying to stay competitive by embracing the AI changes in the market and identifying ways to leverage “off-the-shelf” Large Language Models (LLMs) to optimize tasks and automate business processes.
To develop your content knowledge graph, you can create your Schema Markup to represent your content. One of the new ways SEOs can achieve this is to use an LLM to generate Schema Markup for a page. This sounds great in theory; however, there are several risks and challenges associated with this approach.
In addition to creating duplicate entities, LLMs lack the ability to manage your Schema Markup at scale. They can only produce static Schema Markup for each page. If you make changes to the content on your site, your Schema Markup will not update dynamically, which results in schema drift.
With all the risks and challenges to this piecemeal approach, the Schema Markup created by the LLM is static, unconnected Schema Markup for a page – it doesn’t help you develop your content knowledge graph.
If the healthcare network has a content knowledge graph that captures all the information on their site, when a user asks the AI Chatbot “I want to book a morning appointment with a neurologist in Minnesota this week”, the AI Chatbot can find the answer by accessing the healthcare network’s content knowledge graph. The response would be the names of the neurologists who serve patients in Minnesota and have morning appointments available, along with their booking links.
One such risk is property hallucination. This happens when the LLM makes up properties that don’t exist in the Schema.org vocabulary. Secondly, the LLM is likely unaware of Google’s required and recommended structured data properties, so it will guess at them, which can jeopardize your chances of achieving a rich result. To overcome this, you need a human to verify the structured data properties generated by the LLM.
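Part of that verification can be automated before a human looks at the markup. The sketch below flags properties that are not on an allowlist for the node's type. The function name and the tiny allowlist are assumptions for illustration only; a real check would load the full Schema.org vocabulary, account for inherited properties, and separately compare against Google's documented required and recommended properties.

```python
def unknown_properties(node: dict, allowed: dict[str, set[str]]) -> list[str]:
    """Return 'Type.property' labels for properties missing from the allowlist."""
    raw_type = node.get("@type", [])
    types = [raw_type] if isinstance(raw_type, str) else list(raw_type)
    known = set().union(*(allowed.get(t, set()) for t in types))
    label = "/".join(types) or "(untyped)"

    problems = []
    for key, value in node.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @context and @id are not properties
        if key not in known:
            problems.append(f"{label}.{key}")
        # Recurse into nested nodes, whether given singly or as a list
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                problems.extend(unknown_properties(child, allowed))
    return problems


# Hypothetical, deliberately incomplete allowlist for this example
ALLOWED = {
    "Physician": {"name", "medicalSpecialty", "telephone", "address"},
    "PostalAddress": {"addressLocality", "addressRegion"},
}

generated = {
    "@context": "https://schema.org",
    "@type": "Physician",
    "name": "Dr. A",
    "specialtyRating": 5,  # made-up property, as an LLM might produce
    "address": {"@type": "PostalAddress", "addressRegion": "MN", "stateCode": "MN"},
}
print(unknown_properties(generated, ALLOWED))
```

A check like this only catches names that are absent from the vocabulary; it cannot tell whether a valid property is used with the right meaning, so human review is still needed.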
Mark van Berkel is the co-founder and COO of Hunch Manifest and the creator of Schema App. Schema App is an end-to-end Schema Markup solution that helps enterprise SEO teams create, deploy and manage Schema Markup to stand out in search. He is an expert in Semantic Technology and Semantic Search Marketing. Mark built Schema App to solve his own challenges in writing and validating schema markup.