
Lambeq, a Toolkit for Quantum Natural Language Processing

12 Nov 2021 4:00am

Natural language processing (NLP) is becoming increasingly important to high-performance computing (HPC) and to enterprises, in everything from automating routine tasks to gaining insights from the massive amounts of data being generated.

A component of artificial intelligence (AI), NLP essentially improves the relationship between people and their machines by enabling systems to more easily understand human language and decipher the meaning of the words and phrases used. According to market research firm Statista, revenues in the global NLP space are expected to grow from more than $17.5 billion this year to almost $43.3 billion in 2025.

Now a startup in the UK is taking steps to enable NLP in the quantum computing field. Cambridge Quantum recently released what company officials said is the first toolkit and library for quantum natural language processing (QNLP). Called lambeq — after the late mathematician and linguist Joachim Lambek — the software toolkit is designed to convert sentences into a quantum circuit.

Real-World Quantum NLP Applications

The goal is to help developers create real-world QNLP applications for such tasks as automated dialogue, text mining, language translation, bioinformatics and text-to-speech. Cambridge Quantum has released lambeq to the open source community for broader use by developers and researchers. It works with the company’s TKET quantum software development platform, which also is open sourced.

Lambeq “automates tasks that are necessary for the large-scale implementation of QML [quantum machine learning] pipelines designed in terms of compositional models for language,” Bob Coecke, chief scientist at Cambridge Quantum, told The New Stack. “Any researcher in academia or industry who is interested in exploring the potential of QC for NLP can use our toolkit to this end.”

After decades of discussion and experimentation, the worldwide quantum computing space is expected to grow rapidly in the coming years. According to Verified Market Research analysts, the space will jump from $252.2 million last year to almost $1.8 billion by 2028, driven in part by the rise in compute power, expanding data center workloads and the ongoing shift to software-as-a-service (SaaS).

Quantum vs. Classical Computers

Classical computers use conventional bits, each of which is stored as either a 1 or a 0. Quantum systems rely on qubits, which can exist in a superposition of 1 and 0 at the same time, vastly expanding the computational capabilities of the computers. Established companies like Honeywell, IBM, Microsoft and Google are being joined by a range of startups in pushing the quantum computing field forward.
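To make the bit-versus-qubit distinction concrete, here is a minimal sketch in plain Python. It is not taken from lambeq or TKET; the function names are hypothetical, and it assumes the standard textbook picture in which a single qubit is described by two amplitudes and a Hadamard gate puts it into an equal superposition.

```python
import math

# A single qubit is described by two amplitudes, one for |0> and one for |1>.
# A classical bit would always be exactly one of these two basis states.
ZERO = (1.0, 0.0)
ONE = (0.0, 1.0)


def hadamard(state):
    """Apply a Hadamard gate, which maps a basis state to an equal superposition."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))


def probabilities(state):
    """Born rule: the chance of reading 0 or 1 is the squared amplitude."""
    a, b = state
    return (abs(a) ** 2, abs(b) ** 2)


def amplitudes_needed(n_qubits):
    """Describing an n-qubit register takes 2**n amplitudes."""
    return 2 ** n_qubits


plus = hadamard(ZERO)          # equal superposition of 0 and 1
p0, p1 = probabilities(plus)   # both outcomes are equally likely
```

The `amplitudes_needed` helper hints at where the extra capacity comes from: the description of a quantum register doubles in size with every added qubit.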

Cambridge Quantum, launched in 2014, is one of those startups. The company announced in June that it is merging with Honeywell’s quantum business, Honeywell Quantum Solutions, to form a new company that will focus on the ion trapping methodology of running quantum computers. Honeywell has been an investor in Cambridge Quantum since 2019. IBM has been another investor in the startup.

Along with its TKET software platform, Cambridge Quantum also has developed software for such industries as chemistry, machine learning, finance and cybersecurity. Now the company has taken a step into NLP. Converting an ordinary sentence into a quantum circuit is a complex process.

NLP on Quantum a Complex Task

Processing natural language on a quantum system is no easy feat. According to a blog post by scientists at Cambridge Quantum, the process involves first converting a sentence into a syntax tree and then converting that parse tree into a string diagram, which among other things expresses the grammatical structure of the sentence. The scientists used the sentence “We are explaining how lambeq works” as an example of a string diagram, seen below:

[Figure: how lambeq parses a sentence string.]

“The string diagram can be simplified or transformed by the application of rewrite rules,” they wrote. “One might want to do this for example to make the diagram easier to transform to a suitable circuit for the currently available quantum hardware. A rewritten string diagram is then converted into an actual quantum circuit or tensor network, depending on the choice of whether it is executed on a quantum or classical computer, respectively.”

Developers can then pass this to the TKET platform to be moved to a quantum simulator or a quantum computer.
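The stages described above (sentence, string diagram, then a circuit or a tensor network) can be sketched in a few lines of plain Python. This is an illustration only and does not use the lambeq or TKET APIs. The lexicon, the function names, the pregroup-style wire types and the one-qubit-per-wire mapping are all assumptions made for the example, and the rewrite step is left out for brevity.

```python
# Toy lexicon (hypothetical): each word maps to a list of typed wires.
# A wire is (base type, adjoint order): 0 is a plain type,
# +1 a right adjoint, -1 a left adjoint.
N, S = "n", "s"
LEXICON = {
    "Alice": [(N, 0)],
    "Bob": [(N, 0)],
    "likes": [(N, 1), (S, 0), (N, -1)],  # transitive verb
}


def to_string_diagram(sentence):
    """Lay out the wires of every word, then connect adjacent matching wires with cups."""
    wires = []
    for word in sentence.split():
        for base, adjoint in LEXICON[word]:
            wires.append((word, base, adjoint))

    stack, cups = [], []
    for idx, (_, base, adjoint) in enumerate(wires):
        if stack:
            _, top_base, top_adjoint = wires[stack[-1]]
            # A type followed by its right adjoint (x, x.r) or a left
            # adjoint followed by its type (x.l, x) cancels out.
            if top_base == base and adjoint == top_adjoint + 1:
                cups.append((stack.pop(), idx))
                continue
        stack.append(idx)
    return {"wires": wires, "cups": cups, "open": stack}


def compile_diagram(diagram, target):
    """Pick the output form depending on where the model will run."""
    if target == "quantum":
        # Assumption: one qubit per wire, each cup becomes a Bell measurement.
        return {
            "kind": "circuit",
            "qubits": len(diagram["wires"]),
            "bell_measurements": diagram["cups"],
        }
    if target == "classical":
        # Each cup becomes a contraction between two tensor indices.
        return {
            "kind": "tensor network",
            "contractions": diagram["cups"],
            "output_wires": diagram["open"],
        }
    raise ValueError(f"unknown target: {target}")


diagram = to_string_diagram("Alice likes Bob")
circuit = compile_diagram(diagram, "quantum")
network = compile_diagram(diagram, "classical")
```

In this toy run, the only wire left open after the cups are drawn is the verb's sentence wire, which is how the diagram records that the words form a grammatical sentence.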

The modular design of lambeq lets users swap components in the model and supports flexible architecture designs.

“The language models that we use attempt to combine distributional (vector-space based and probabilistic) methods for embedding meaning with compositional (formal, symbolic, structural) methods, which describe how meanings flow and interact within text,” Coecke said. “The grammatical and syntactic structures present in our brand of NLP are abstractly described by the same mathematics as that which describes process theories. One of the applicable process theories is quantum theory. Therefore, we can make formal mathematical analogies and construct quantum language models under this compositional framework.”
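One way to picture the combination Coecke describes is a toy model in which nouns are vectors (the distributional part) and a transitive verb is a tensor that the grammar contracts with its subject and object (the compositional part). The sketch below is plain Python with made-up numbers, not lambeq code; the names, the two-dimensional spaces and the "true"/"false" reading of the sentence vector are assumptions chosen for illustration.

```python
# Distributional part: words are embedded as vectors (made-up toy values).
alice = [1.0, 0.0]
bob = [0.0, 1.0]

# A transitive verb is a tensor indexed as likes[subject][sentence][object].
# Sentence space (assumed): index 0 reads as "true", index 1 as "false".
likes = [
    [[0.0, 1.0], [0.0, 0.0]],  # subject = Alice: "Alice likes Bob" is true
    [[0.0, 0.0], [1.0, 0.0]],  # subject = Bob: "Bob likes Alice" is false
]


def transitive_sentence(subject, verb, obj):
    """Compositional part: contract the verb tensor with its subject and object."""
    sentence_dim = len(verb[0])
    meaning = [0.0] * sentence_dim
    for i, s_i in enumerate(subject):
        for k in range(sentence_dim):
            for j, o_j in enumerate(obj):
                meaning[k] += s_i * verb[i][k][j] * o_j
    return meaning


alice_likes_bob = transitive_sentence(alice, likes, bob)
bob_likes_alice = transitive_sentence(bob, likes, alice)
```

The same words in a different order give a different sentence vector, because the grammar decides which index of the verb tensor each noun is plugged into.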

From the Sequential to Two-Dimensional

NLP in quantum computing is a complex undertaking that moves from the sequential nature of the spoken word to something less one-dimensional.

“The point is that humans evolved language after they evolved a mouth-hole for breathing and eating,” Coecke said. “This physical restriction forces us to speak one word at a time in sequence. This is how we write, too. However, the concepts that we express, the stories we tell, the information we convey to each other, form a dependency network whose connectivity is higher than one-dimensional. Even syntax trees … that you learn in school, that encode dependency information inside a sentence, are two-dimensional structures. Going further, connecting sentences together forms a large network of dependencies between meanings. Telling a story means doing a walk over this network, and this time-ordering gives rise to what I call a ‘language circuit.’”

Quantum computers are better suited than classical systems for running NLP workloads, he said.

“The differences regard in what type of models one uses and what type of hardware they are more natural for,” Coecke said. “To go back to the ‘quantum nativeness,’ we believe that a particular approach to language processing can be naturally mapped to quantum computations. As this is a new area of research in QAI [quantum AI], the potential gains are still being explored. Then they would simply impact organizations for which this approach to NLP is useful. Regarding developers, we repeat the point that releasing lambeq is important for developers interested in NISQ [noisy intermediate-scale quantum] applications.”

Lambeq has been released as a conventional Python repository on GitHub, and the quantum circuits generated by lambeq have been run on quantum computers from IBM, which Cambridge Quantum scientists have used for most of their experiments.
