AllegroGraph 8.0 Incorporates Neuro-Symbolic AI, a Pathway to AGI
Franz has updated its flagship AllegroGraph triplestore graph database to include vector generation and vector storage capabilities. The combination lets organizations draw on three forms of AI: statistical machine learning, non-statistical reasoning and large language models (LLMs) trained on internet-scale text.
With all of these approaches available within a knowledge graph framework, organizations can readily implement retrieval augmented generation (RAG) to improve the accuracy of language model output. More importantly, they can employ these three branches of AI to counterbalance one another so that the strength of one method offsets the drawbacks of another.
The result is a natural language querying system in which the grand vision of AI, statistical and non-statistical, is finally realized.
According to Franz CEO Jans Aasman, “The point of a neuro-symbolic system is you can do amazing things when you combine these systems, and get better results than you could with any of these systems alone.”
Thus, organizations can combine the explainability of logic and rules techniques with the vast information LLMs have learned, while adding the probabilistic pattern recognition of advanced machine learning to improve accuracy across domains and use cases.
Checks and Balances
For the enterprise, combining these three forms of AI pays off most visibly in predictions.
In health care, those might involve a patient’s likely outcome from a specific form of treatment or series of maladies. In finance, they might entail the results of trading or loan opportunities in volatile markets. Or they might be as simple as the ideal insurance rates for a prospect with a particular history and demographics.
For each of these applications and others, AllegroGraph’s semantic knowledge graph underpinnings support a simple event-based schema well suited to predicting future events from present ones. Organizations can rely on the graph database for deep learning predictions about a patient based on that patient’s specific data. Alternatively, they can query the vector database about what might happen, using natural language to ask an LLM like ChatGPT, “which is based on 36 million PubMed articles,” Aasman noted.
The best results occur when these AI branches are used in tandem.
“Say you’ve got a conclusion from machine learning that says something is going to happen,” Aasman postulated. “You can A, use ChatGPT to ask ‘Is this in line with what the medical literature says, or [is this] something new?’ Or B, ‘Explain to me why this might be the case.’”
The results of each of these AI techniques can be input into a knowledge graph (the 8.0 release includes a knowledge graph as a service offering) in the RDF triplestore, increasing the concrete knowledge organizations have for their domains.
This mechanism supports rules-based reasoning techniques that are 100 percent explainable. “Machine learning can be explained by LLMs, or computations by an LLM can be supported by machine learning or refuted by machine learning,” Aasman said.
Vector Generation, Vector Retrieval
There are several points of interest about AllegroGraph 8.0’s vector store. Not only can organizations embed content with external models from their own or third-party sources, such as Amazon Bedrock, LangChain, etc., but they can also do so within AllegroGraph.
According to Aasman, the database supports embeddings with LLaMA or ChatGPT; the latter also gives users natural language querying capabilities through SPARQL. “You just tell AllegroGraph where the files are, how big your text fragments should be, what machine you want to use, and it does it all for you,” Aasman said.
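To make the "how big your text fragments should be" setting concrete, here is a minimal sketch of splitting a document into fixed-size fragments before embedding. It is plain Python, not AllegroGraph's own loader; the function name, the word-based sizing and the `overlap` parameter are assumptions made for illustration.

```python
def split_into_fragments(text, max_words=50, overlap=0):
    """Split text into fragments of at most max_words words.

    overlap is the number of words repeated between consecutive
    fragments (an illustrative option, not a documented setting).
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be at least 0 and less than max_words")

    words = text.split()
    step = max_words - overlap
    fragments = []
    for start in range(0, len(words), step):
        fragments.append(" ".join(words[start:start + max_words]))
        # Stop once the current fragment reaches the end of the text.
        if start + max_words >= len(words):
            break
    return fragments
```

Smaller fragments tend to give more precise matches, while larger ones keep more context per embedding, so the fragment size is a tuning choice rather than a fixed rule.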
In addition to creating the embeddings, AllegroGraph also indexes the content with FLAT indexes and Approximate Nearest Neighbors Oh Yeah (Annoy). Significantly, the indexes and the vectors don’t have to remain in memory. Costs in dedicated vector stores increase considerably when this information stays in memory.
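As a rough illustration of what a flat index does, the sketch below compares a query vector against every stored vector and returns the closest matches. This is the general exhaustive-search idea only, written in plain Python with hypothetical names; it does not show AllegroGraph's implementation. An approximate index such as Annoy trades some accuracy for speed by avoiding this full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_search(query, vectors, k=3):
    """Score every stored vector against the query and keep the top k.

    vectors maps a fragment identifier to its embedding.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]
```

Because every vector is scored, the result is exact, but the cost grows linearly with the number of stored fragments.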
“We can put these vectors on disk,” Aasman added. “But, if you have fast SSDs and use a technique called memory mapping, it will try to get as much in memory as it can.” Metadata about vectors is also generated during the embedding and indexing process in AllegroGraph, which is useful for filtering search results. The knowledge graph environment the database supports is highly effective in this regard.
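The memory mapping technique Aasman mentions can be sketched with Python's standard `mmap` module. The file layout here (fixed-size float32 records), the tiny dimension and all function names are assumptions for illustration; the article does not describe AllegroGraph's on-disk format.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # illustrative dimension; real embeddings are far larger
RECORD = struct.Struct(f"<{DIM}f")  # one vector = DIM little-endian float32 values


def write_vectors(path, vectors):
    """Store each vector as a fixed-size binary record."""
    with open(path, "wb") as handle:
        for vector in vectors:
            handle.write(RECORD.pack(*vector))


def read_vector(path, index):
    """Read one vector by position through a read-only memory map.

    The operating system pages in only the part of the file that is
    touched, and keeps recently used pages cached in memory.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return RECORD.unpack_from(mapped, index * RECORD.size)


def demo():
    """Write two vectors to a temporary file and read the second back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "vectors.bin")
        write_vectors(path, [(1.0, 2.0, 3.0, 4.0), (0.5, 0.25, 0.125, 0.0)])
        return read_vector(path, 1)
```

With this approach the vectors live on disk, and how much of them ends up in memory depends on available RAM and access patterns rather than on the size of the whole store.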
With dedicated vector database solutions, “For every text fragment, you can have a limited list of metadata elements and you can, of course, filter before or after [vector searches],” Aasman noted. “But for us, we can literally use the entire graph for pre-filtering.”
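The idea of using the graph for pre-filtering can be sketched as follows: first select candidate fragments with a graph pattern, then run the similarity search only over those candidates. In AllegroGraph the filter would presumably be expressed in SPARQL; the tuple-based triples, the predicate name and the functions below are hypothetical stand-ins in plain Python.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def prefilter_subjects(triples, predicate, obj):
    """Return the subjects of all triples matching (?, predicate, obj)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def filtered_search(query, embeddings, allowed, k=2):
    """Rank only the fragments that passed the graph pre-filter."""
    scored = [
        (cosine(query, vec), key)
        for key, vec in embeddings.items()
        if key in allowed
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Illustrative data: doc2 is closer to the query than doc3,
# but it is excluded because the graph says it is about cardiology.
triples = [
    ("doc1", "about", "oncology"),
    ("doc2", "about", "cardiology"),
    ("doc3", "about", "oncology"),
]
embeddings = {"doc1": [1.0, 0.0], "doc2": [1.0, 0.1], "doc3": [0.0, 1.0]}
allowed = prefilter_subjects(triples, "about", "oncology")
results = filtered_search([1.0, 0.0], embeddings, allowed)
```

The point of filtering first is that any relationship in the graph can narrow the candidate set, not just a short list of metadata fields attached to each fragment.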
Retrieval Augmented Generation
The 8.0 release enables users to employ LLMs to populate the knowledge graph and construct ontologies and taxonomies for particular domains. There’s also an API, SerpApi, through which SPARQL can directly access Google’s search engine, which is helpful for verifying the results of LLMs.
The Google results come back as text fragments. For a use case in which one is collecting prices for the most expensive cars for a particular demographic from Google via SerpApi, “I ask the LLM, have it read each snippet of information from Google, and fish out the price and put it in the database,” Aasman explained. Organizations can check any disparities between the two sources and have people investigate as needed.
Although AllegroGraph uses SerpApi with Google, the API supports several sources for search including Bing, DuckDuckGo, Yahoo and Walmart.
The chief means of implementing RAG in AllegroGraph 8.0 is to have the language model issue natural language SPARQL queries against the knowledge graph itself, which can be populated from language models, Google results, internal documents and ontologies, and then vetted by humans.
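The disparity check between what an LLM reports and what a search returns could look something like the sketch below. The dictionaries, the 5 percent tolerance and the function name are all assumptions for illustration; the article does not say how such comparisons are actually implemented.

```python
def find_disparities(llm_values, search_values, tolerance=0.05):
    """Flag items whose two values differ by more than the tolerance.

    Both arguments map an item name to a numeric value, for example a
    price. Items present in only one source are skipped here.
    """
    flagged = []
    for key in sorted(llm_values.keys() & search_values.keys()):
        a, b = llm_values[key], search_values[key]
        baseline = max(abs(a), abs(b))
        if baseline == 0:
            continue  # both values are zero, so there is nothing to compare
        if abs(a - b) / baseline > tolerance:
            flagged.append((key, a, b))
    return flagged
```

Anything this returns is a candidate for the human review the article describes, rather than an automatic correction.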
Therefore, for a life sciences use case, “You can take unstructured text that you might have in your knowledge graph, like clinical trials, and you extract medical entities out of that text and relationships between terms,” Aasman said. “Because we have a vector store built in, you can actually ask questions about any clinical trial you might have in your knowledge graph, and you get an answer based on your own private documentation that’s in your knowledge graph.”
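The final step of the RAG flow described here, answering from one's own documents, usually comes down to placing the retrieved fragments into the prompt sent to the language model. A minimal sketch, assuming the fragments have already been retrieved from the vector store; the prompt wording and function name are illustrative, not taken from AllegroGraph.

```python
def build_rag_prompt(question, fragments, max_fragments=3):
    """Assemble a prompt that grounds the answer in retrieved fragments.

    fragments should be ordered from most to least relevant.
    """
    selected = fragments[:max_fragments]
    context = "\n".join(
        f"[{number}] {text}" for number, text in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the fragments makes it possible to ask the model to cite which fragment supports each statement, which helps when people vet the answers.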
The Overarching Significance
AllegroGraph 8.0 also features a new web interface called AGWebView, enhanced sharding capabilities and an update to its visualization construct, which now enables visualizations of RDF* annotations for labeled properties in semantic graphs.
However, the true import of this version of the database is its ability to consolidate facets of non-statistical AI, machine learning and information learned from LLMs in a knowledge graph environment augmented by a vector database. Each of these branches of AI can fortify the strengths of the others while redressing their shortfalls, which has always been the grand vision of AI as a whole, not just of its statistical side.