Cockroach Labs Chief Targets LLMs with Vector Encoding

The co-founder and CEO of Cockroach Labs has said AI is incapable of writing a modern enterprise database but will help finally wean financial firms and other major users off aging legacy data platforms.
Spencer Kimball also said the firm was working on vector encoding capabilities that would help LLMs establish “context” in key applications.
Speaking to The New Stack in the heart of London’s financial district, Kimball said AI “seems to be getting quite good. We’re looking at it very intentionally.”
Reimagined Products
GPT-4 has set the stage for an AI boom, said Kimball, in which “a lot of products can be reimagined, I think, in ways that are not obvious yet.”
As with every other technological advance, “You see the immediate benefit. And then what happens in the next five years, you wouldn’t necessarily have been able to predict. I think the same will be true.”
This could mean the rapid rise and fall of successive players, as platforms come to prominence only to be quickly displaced by newer entrants.
By contrast, he continued, “I think it’s a good time to be a sophisticated operational database. There’s vector databases and things like that that are directly supported with the AI use case. But ultimately, you need to store all the normal metadata, all the stuff that you need to do for any of these use cases.”
Vector Encoding
He said that just as Cockroach Labs has added geospatial and analytics capabilities over time, “We will add the vector encoding capability as well.”
Kimball described these vectors as “A simplified representation of what is going through one of these big neural networks or LLMs. So, you can actually put something in there, and it has a certain state, and then you’re taking that state, and you’re getting it as a much smaller representation.”
He cited the example of someone typing in questions while dealing with a support ticket. “You can actually represent the successive states of that as these vectors. So, you can recreate that history. And so, the AI can be primed to understand this customer’s context as they ask a new question.”
LLMs in Context
There’s no way to keep that information in an LLM, Kimball said, but by having the vector encoding capability, “If they come back two days later, and ask a new question, you’re actually able to recreate a lot of that history.”
Vectors can be compared for similarity, he added: “And then you actually have some really key suggestions for where there might be previous resolutions that can help answer the current question.”
The technology also offers a “more thoughtful way” to handle recommendations, instead of simply spotting that a customer has bought a product and then highlighting other products purchased by customers who bought the same product.
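To make the similarity comparison concrete, here is a minimal sketch in Python. It is illustrative only and does not reflect Cockroach Labs' planned implementation: the three-dimensional vectors, the ticket IDs, and the helper names `cosine_similarity` and `most_similar` are all assumptions made for this example, and cosine similarity is just one common way to compare embeddings. Real embeddings produced by a model would have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, stored, top_k=2):
    """Rank stored vectors by similarity to the query, best match first."""
    scored = [
        (cosine_similarity(query, vec), ticket_id)
        for ticket_id, vec in stored.items()
    ]
    scored.sort(reverse=True)
    return [(ticket_id, score) for score, ticket_id in scored[:top_k]]


# Hypothetical embeddings of previously resolved support tickets.
past_tickets = {
    "ticket-101": [0.9, 0.1, 0.0],
    "ticket-102": [0.0, 1.0, 0.2],
    "ticket-103": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of the customer's new question.
new_question = [1.0, 0.2, 0.0]

matches = most_similar(new_question, past_tickets)
for ticket_id, score in matches:
    print(f"{ticket_id}: {score:.3f}")
```

In this toy data, the two tickets whose vectors point in nearly the same direction as the new question rank first, which is the sense in which earlier resolutions could be surfaced as suggestions.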
Cockroach Labs’ core financial and large enterprise customers were not necessarily asking for these capabilities yet, he said: “But they will be.”
For all the talk of coders being squeezed out by generative AI, Kimball said he couldn’t foresee the prospect of AI writing a database. However, he said, there were some key ways AI could augment current databases.
One obvious application is education and support, and he said Cockroach Labs was using AI for its education programs both to evolve lesson plans and to have them delivered by “human-like actors”.
“It’s kind of crappy compared to when you do it with humans. It’s not quite there yet,” he said. “But the speed at which you can do it is pretty mind-blowing. So, we’re experimenting in areas like that.”
More directly, he said Cockroach Labs was examining how AI could be used for optimizing queries. “We often get issues where somebody writes kind of a boneheaded query. And when it actually goes to execute, even the optimizer can’t do a good job on it. And so, it provides really bad results.”
Sometimes the user can rewrite the offending query or escalate it to Cockroach Labs to help.
However, he said, this was a task that really lent itself to being taken on by AI. “What you can do is you can put the AI before the optimizer inside the database. Those queries come in, [and] can be rewritten by the AI to turn something boneheaded into something good that the optimizer can do the right thing with.”
Another major use case is in database consolidation, something most of Cockroach Labs’ big customers are looking to achieve.
“They want to move a bunch of old use cases that are increasingly long in the tooth and hard to support into the new world, onto the cloud, onto Cockroach, from Oracle, from Db2 mainframes.” But the cost of moving software to a new database is not trivial.
The Limits of LLMs
AI code generation was “nowhere close” to being able to produce a database, Kimball said. “It can’t reason about how these complex systems should be put together.”
But, Kimball continued, “If you want it to do a point-by-point migration, from one database index to another as an example, that is a solved problem.”
“So, we’re thinking of this as a way to accelerate adoption pretty dramatically.”
This would be particularly appealing to the financial-sector companies that are the key market for Cockroach Labs. They often run multiple legacy databases, which are notoriously difficult to shake off.
They also appreciated the “open source exit ramp” Cockroach effectively offers, Kimball said.
At the high end of the market where Cockroach Labs operates, “When you’re choosing a database these days, you want to be careful about what kind of lock-in you’re going to entertain.”
Rubs the Wrong Way
When it comes to AI, by contrast, Kimball said it was “a little bit disturbing” how much dominance Microsoft and OpenAI have gained so quickly.
Despite the “open” moniker, he said, “It’s very protected, and not only that, they’re suggesting because of this AI ‘doomerism,’ that nobody else should be able to make these big models.”
He noted, “That doesn’t rub me the right way.”
But, he added, “I have extreme doubts about how imminent Artificial General Intelligence is. And that’s contra many super intelligent people in the field … I actually think there’s something more unique about human cognition that has not been anywhere near captured in these LLMs.”