Test Run: Google’s Machine Learning-Based ‘Talk to Books’ Service
Google Research recently used machine learning to train an application to answer questions in a unique way. It scans more than 100,000 books and identifies the sentences that seem to offer the best response. And it does this by analyzing what it sees as the meaning of the question rather than just matching up keywords. It’s Google’s way of showcasing improved ways of processing natural-language queries.
But does it teach us anything about the current state of AI-enhanced applications?
The company’s new “Semantic Experiences” website seems to suggest that it’s all a clever approximation. “[T]he AI is simply considering what you type to be an opening statement and looking across a pool of many possible responses to find the ones that would most likely follow.”
First, Google’s researchers identified billions of pairs of statements where the second one responds to the first. They then used machine learning to develop an algorithm for identifying and selecting (or “predicting”) the best responses, that is, what would most likely come next in an actual conversation. The “Talk to Books” application applies that model to over 100,000 books, finding sentences that seem, almost magically, to respond to the semantic meaning of user questions.
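The selection step can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Google's model: the three-number vectors are hand-written stand-ins for what a trained sentence encoder would produce, the candidate sentences are shortened snippets, and the helper names `cosine` and `rank_responses` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_responses(query_vec, candidates):
    """Return candidate sentences sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), text) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]

# Hand-made toy vectors (assumption): a real system would get these
# from a learned sentence encoder, not from a lookup table.
CANDIDATES = [
    ("You can learn a great deal from books.", [0.9, 0.1, 0.0]),
    ("Portland was great.", [0.0, 0.2, 0.9]),
    ("There is no one optimal set of proportions.", [0.1, 0.9, 0.1]),
]

# Pretend encoding of the question "What can you learn from a book?"
QUERY_VEC = [0.8, 0.2, 0.1]

ranked = rank_responses(QUERY_VEC, CANDIDATES)
print(ranked[0])
```

Because the score depends only on how close two vectors are, a sentence can rank first without sharing any keywords with the question, which is the behavior the article describes.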
Engadget calls it “a fun glimpse of how far natural language processing in artificial intelligence has come.”
The team behind the application includes the famous futurist and inventor Ray Kurzweil, now 70, who became Google’s director of engineering back in 2012. “I’ve always worked to create practical systems that will make a difference in people’s lives,” Kurzweil said at the time, pointing out that back in the 1970s he’d invented the first print-to-speech reading machine for the blind.
“In 1999, I said that in about a decade we would see technologies such as self-driving cars and mobile phones that could answer your questions, and people criticized these predictions as unrealistic. Fast forward a decade — Google has demonstrated self-driving cars, and people are indeed asking questions of their Android phones.
“It’s easy to shrug our collective shoulders as if these technologies have always been around, but we’re really on a remarkable trajectory of quickening innovation, and Google is at the forefront of much of this development.”
This month Kurzweil co-authored a blog post with Rachel Bernstein, a Google Research product manager, to explain just how today’s computers are getting even better at understanding our human languages. Google has already used one recently developed technique, “hierarchical vector models,” to improve Gmail’s ability to suggest pre-filled responses, and the company is currently exploring other uses.
So it was last week, on Friday the 13th, that Google Research finally showcased its latest capabilities on a new Semantic Experiences website. Its tagline? “Experiences in understanding language.” An explanatory page on the site calls it “simply a demonstration of research that enables an AI to find statements that look like probable responses to your input…” adding “You may need to play around with it to get the most out of it.”
I was one of several brave web surfers who decided to accept that challenge.
What Happens When You Talk to Books?
I’d recently read the original “Legend of Sleepy Hollow” by Washington Irving, as well as a biography of the author. So how would Google’s mysterious tool respond when I asked: “How would you describe Washington Irving?”
“Upright, straightforward, industrious and enterprising, he was highly respected by a very wide circle of the best people of Hancock County.”
Very good — except that’s a description of Jonathan Parker, an obscure figure in an obscure book called “History of Hancock County, Ohio.” (Which is nowhere near Irving’s beloved home in Tarrytown, New York.)
I was baffled. “Semantic search is based on searching meaning, rather than on keywords or phrases,” Kurzweil had explained in a blog post. But it seemed to be missing a key part of my meaning here: which person I was asking about.
The tool offered descriptions from other books — which were all also apparently of other people. (For example, “The Marble Man: Robert E. Lee and His Image in American Society” and “Pen Pictures of St. Paul, Minnesota, and Biographical Sketches of Old Settlers.”)
Google has suggested the site as a way of finding books you might want to read — and I have to admit that I did end up reading a few paragraphs from this unusual selection of books. Getting these oddball not-what-you’re-looking-for matches brought back some of the serendipity from the early days of the web — when web searches sometimes surprised me by offering up something delightfully random.
And I experienced the same thing in my next search — although the tool seemed to be performing a little better.
When I asked, “What can you learn from a book?” I found a surprisingly discouraging response from “The Ultimate Bushcraft Survival Manual.” “You can learn a great deal from books, but that only relates someone else’s experiences instead of learning yourself.”
The tool suggested five books without providing an excerpt, plus a few excerpts that didn’t seem to match the question. Still, a consensus did seem to be forming when I read a surprisingly relevant excerpt from “Difficult Conversations: How to Discuss What Matters Most.”
“There are limits to how much you can learn about human interactions from a book.”
I’m not the only one playing with Google’s new tool. The site Search Engine Journal asked it the question that torments every web marketer: “How do I rank first on Google?” — and received some appropriate responses.
“There is no one optimal set of proportions for achieving a top rank in Google.”
“You can bury your head in the sand and hope that Google rewards you with higher rankings…”
But maybe the application is better at handling certain kinds of questions. I have to admit that a Portland newspaper seemed to have much better luck when it asked the site, “What are the best memories of Portland?” The best of the results returned seemed vivid and appropriate.
“Portland was great: 500 suburban punks dancing, Nirvana trashing a few guitars through frustration, Tad thrashing with his usual aplomb, and the whole place like a Palm Springs pinball machine that no one could touch me on,” read one excerpt, which came from “Nirvana: The Biography,” by Everett True.
On YouTube, information systems professor James Gaskin described it as “Best literature review tool ever!” suggesting it could be very handy in one particular use case — assembling the “literature review” part of scholarly papers. He demonstrated this by asking it several legitimately academic questions, like “What is the relationship between burnout and turnover intention?”
“We used to go to the library and sort through physical copies of literature, which was a pain,” Gaskin explains in the video. “And then we started using Google Scholar, which was way better but still kind of a shot in the dark using keywords. Now, all we have to do is ask our question…
“Type it here, and Google will produce a set of books that will essentially guide a response to your research question.”
Behind the Scenes
The “Semantic Experiences” site also features a page for developers, which includes more details about how the tool was put together. “These models are trained using English language examples, but the same approach can and has been used for other languages,” the page explains.
And Google’s blog post offers more hints about how the application actually works. “You may notice that being well-known does not make a book sort to the top; this experiment looks only at how well the individual sentences match up.”
Interestingly, the site’s page for developers also includes a section about biases in language understanding models, warning that language understanding models “can also reflect human cognitive biases.” The application takes a step to control for that: “In Talk to Books, while we can’t manually vet each sentence of 100,000 volumes, we use a popularity measure which increases the proportion of volumes that are published by professional publishing houses.” Google also acknowledges there are more “bias-impact mitigation” steps that are available, but which weren’t used for Talk to Books. “These experiences demonstrate the AI’s full capabilities and weaknesses.
“It will be possible to find offensive associations within these experiences. We encourage you to report offensive associations using the feedback tool so that we can improve future models… We don’t yet (and may never) have a complete solution to identifying and mitigating unwanted associations.”
Author Andrew Tobias also watched Kurzweil’s presentation of the technology at a TED conference in Vancouver and supplies some additional details. Tobias concludes that the service “is clunky: many of the searches you try will produce super dumb results,” but after listening to Kurzweil reports that “they’re just getting started. It will get better.”
It’s important to remember that the results are returned quickly — within half a second — which is why the application was limited to scanning just 120,000 books. It could just as easily have scanned a trove of one million books, but that would’ve required more time to return the results, and “Kurzweil told us they knew people wouldn’t put up with waiting six seconds.”
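As a rough sanity check on those figures, here is a back-of-envelope estimate in Python. It assumes, purely for illustration, that response time grows in direct proportion to the number of books scanned and treats "within half a second" as exactly 0.5 seconds. The real system need not scale this way, and the function name `linear_estimate` is hypothetical.

```python
# Figures quoted in the article.
BOOKS_SCANNED = 120_000
SECONDS_OBSERVED = 0.5  # "within half a second", treated here as an upper bound
BOOKS_HYPOTHETICAL = 1_000_000

def linear_estimate(books, base_books=BOOKS_SCANNED, base_seconds=SECONDS_OBSERVED):
    """Estimate latency assuming time grows in direct proportion to corpus size."""
    return base_seconds * books / base_books

estimate = linear_estimate(BOOKS_HYPOTHETICAL)
print(f"{estimate:.1f} seconds")
```

Under that naive assumption a million books would take a little over four seconds, which is in the same ballpark as the six-second wait Kurzweil reportedly wanted to avoid.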
The official TED blog also notes Kurzweil’s own pithy observation that this is still faster than humans.
“It takes me hours to read a hundred thousand books.”
- The FDA approves its first cloud-based AI diagnostic device for doctors.
- Google uses AI to isolate voices from a crowd.
- A startup believes machine learning can bring high-yield farms into major cities — one of several “indoor vertical farming” startups.
- How Stack Overflow uses “a dusting of gamification” to engage its users.
- A developer podcast interviews the legendary Stack Overflow contributor with a karma over 1 million.
- Mozilla releases its first “Internet Health Report.”
- The 12th annual “State of Agile” survey finds few organizations with a high level of organization-wide agile competency.
Yet 71 percent have a DevOps initiative or plan one within the next year.
- GitHub turns 10.
Git turns 13.
- Whatever happened to the founder of MySpace?
- A TED speaker examines how smart homes “spy” on their owners.
Would you use a free internet-connected toothbrush from your dental insurance company?
Feature image: Powell’s bookstore in Portland, Oregon