The nuances of human interactions and natural language have always been difficult for computers to grasp. That’s why, in conversations with virtual assistants, it is always the people, rather than the AI, who are forced to adapt their speaking style and intonation.
Google CEO Sundar Pichai unveiled the next stage of that often frustrating relationship onstage at I/O 2018: a new capability of Google Assistant that allows the system to make phone calls on behalf of its user.
In itself, that’s not groundbreaking. More unusual is that the assistant will be able to do so without the person on the other end of the line realising, at first, that they are speaking with a robot.
In the demo (video courtesy of YouTube user Jeffrey Grubb), Pichai plays an interaction between a hair salon and the Duplex system, in which the assistant calls up to arrange an appointment for its ‘client’. What follows is an eerily smooth conversation, in which the assistant deploys natural-sounding intonation, the occasional filler, and even the reassuring “ums” and “mm-hmms” we are all familiar with.
An AI system for accomplishing tasks by phone
A phone call or voice message from a machine is invariably easy to spot. The intonation is rigid, and there are none of the fillers, mistakes, or hesitations that pepper human beings’ everyday speech.
This lack of humanity can lead to frustration: a feeling of being undervalued by the company providing the service, along with a sense that the organisation is remote and focused on cost. Many of the interactions that people have with interactive voice response (IVR) or contact centre systems prove the point.
These typical human conversational markers also make it difficult for virtual assistants both to understand natural language and, in turn, to deploy those markers in a way that doesn’t grate on the ear.
Machines also struggle to grasp implication, irony, or hidden meaning. Although we may not be consciously aware of it, humans can communicate a huge amount while saying very little. Often the content, emotion, and context that are omitted from any transcript of a conversation are the real keys to understanding its meaning.
Google’s Duplex system promises to be the next step in making machine-to-human conversations more natural and, as a result, actually achieving more for the user. In a blog post on the project’s progress, the company says this is “thanks to advances in understanding, interacting, timing, and speaking”.
Part of the solution for Google was to train Duplex to handle specific tasks, homing in on certain things rather than expecting it to master the entirety of natural language. For example, it is comfortable scheduling appointments and booking restaurant tables.
“One of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively. Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations,” write Yaniv Leviathan, principal engineer, and Yossi Matias, VP of engineering, in a Google blog post.
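To make the closed-domain idea concrete, here is a minimal sketch of a constrained dialogue handler. It is illustrative only and not Google’s implementation: the intent names, cue phrases, and escalation behaviour are all invented. The point is simply that a system trained on a narrow domain matches incoming utterances against a small set of known intents, and must refuse (or escalate) anything outside them.

```python
from typing import Optional

# Invented cue phrases for a single closed domain: appointment booking.
# A real system would use a trained classifier, not substring matching.
APPOINTMENT_INTENTS = {
    "request_hours": ["what time", "when are you open"],
    "propose_slot": ["how about", "work for you"],
    "confirm": ["that works", "sounds good", "see you then"],
}

def classify(utterance: str) -> Optional[str]:
    """Match an utterance against the closed domain's known intents."""
    text = utterance.lower()
    for intent, cues in APPOINTMENT_INTENTS.items():
        if any(cue in text for cue in cues):
            return intent
    return None  # outside the trained domain

def respond(utterance: str) -> str:
    intent = classify(utterance)
    if intent is None:
        # The system cannot hold a general conversation; anything
        # outside the domain is refused rather than improvised.
        return "ESCALATE: outside closed domain"
    return f"HANDLE: {intent}"

print(respond("Does 10am work for you?"))          # HANDLE: propose_slot
print(respond("What did you think of the game?"))  # ESCALATE: outside closed domain
```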
Potential uses of Google Duplex
Google anticipates that Duplex could be used in a number of time-saving scenarios, from calling businesses to check their opening hours, to reducing appointment no-shows by reminding customers of their bookings. It could also be used to automate updates to Google Maps.
The system is by no means perfect and is still under development, but Google says that it is able to self-monitor and recognise when a task arises that it cannot complete on its own. In these cases, Duplex hands over to a human operator and learns from ‘real-time supervised training’.
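In outline, that self-monitoring behaviour resembles a confidence-gated handoff. The sketch below is an assumption-laden illustration, not Duplex’s actual mechanism: the stub classifier, the 0.8 threshold, and the operator routine are all hypothetical. It shows the general pattern of executing a task only when the model is confident, and otherwise routing the call to a person whose handling could later serve as a supervised training example.

```python
CONFIDENCE_THRESHOLD = 0.8  # invented threshold, for illustration only

class StubModel:
    """Stand-in for a real intent classifier; returns (intent, confidence)."""
    def predict(self, utterance: str) -> tuple[str, float]:
        if "appointment" in utterance.lower():
            return "book_appointment", 0.95
        return "unknown", 0.30

def hand_off_to_operator(utterance: str) -> str:
    # In a deployed system this would route the live call to a person;
    # the operator's resolution could then be logged as a supervised example.
    return f"OPERATOR: please take over ({utterance!r})"

def handle_turn(utterance: str, model: StubModel) -> str:
    intent, confidence = model.predict(utterance)
    if confidence < CONFIDENCE_THRESHOLD:
        # Self-monitoring: the system knows it cannot finish this task alone.
        return hand_off_to_operator(utterance)
    return f"BOT: proceeding with intent {intent!r}"

model = StubModel()
print(handle_turn("I'd like an appointment on Friday", model))
print(handle_turn("Can I ask about your returns policy?", model))
```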
The goal is eventually to have a system capable of performing all of the dull but necessary tasks that take up our time every day, said Google. “Allowing people to interact with technology as naturally as they interact with each other has been a longstanding promise. Google Duplex takes a step in this direction, making interaction with technology via natural conversation a reality in specific scenarios.
“We hope that these technology advances will ultimately contribute to a meaningful improvement in people’s experience in day-to-day interactions with computers,” it added.
Internet of Business says
In itself, this is a promising avenue of research, even if the outcome is unsettling.
Often in AI, natural language processing, and robotics, huge resources are thrown at teaching artificial systems to understand human communication in the broadest sense, without focusing on specifics. By training systems in a limited but deep way, rather than a broad and general one, workable solutions may arise more quickly.
In some robotics experiments, for example, humanoid robots have been trained to mimic human conversations without having any basic understanding of human speech and meaning, using AI to recognise patterns and respond accordingly.
For example, if an AI-enabled robot in a shop has been trained with every possible customer query and answer, then it doesn’t need to understand human language at all to be able to use it convincingly. It’s just a matter of crunching the data.
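As a toy illustration of that pattern-matching approach, here is a retrieval-based responder that ‘converses’ with no model of language at all: it simply returns the canned answer whose stored query most resembles the input. The query/answer table and the string-similarity matching are invented for this example.

```python
from difflib import SequenceMatcher

# Invented query/answer table for a hypothetical in-store robot.
CANNED = {
    "where are the fitting rooms": "The fitting rooms are at the back, on the left.",
    "do you have this in a size 10": "Let me check our stock for a size 10.",
    "what time do you close": "We close at 8pm today.",
}

def best_answer(query: str) -> str:
    """Return the answer whose stored query most resembles the input.

    No parsing, no semantics: just string similarity over canned data.
    """
    match = max(
        CANNED,
        key=lambda stored: SequenceMatcher(None, query.lower(), stored).ratio(),
    )
    return CANNED[match]

print(best_answer("When do you close tonight?"))  # -> "We close at 8pm today."
```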
A simulation of intelligence, it seems, is often enough to convince most people: a problem that many of us have become all too familiar with in 2018.
However, all of this raises a number of important questions. For example, why not simply employ a person? Why design technologies to replace people, rather than augment their abilities? And shouldn’t bots make their ‘botness’ explicit, rather than engage in computerised deceit?
To answer at least one of these questions, Google has said that it will ensure that its AI identifies itself as a robot in future.
Google believes people will feel more comfortable engaging with bots that communicate more like us, and less like machines – despite the possibility that this may create the aural equivalent of the ‘uncanny valley’ problem that afflicts CGI characters and androids: the unease caused by something that is almost lifelike, but not alive.
The ‘replacement, not augmentation’ argument is a different matter, however. Most enterprise AI makers, including IBM and Microsoft, are adamant that their technologies are designed to complement human skill and ingenuity, rather than offer users a binary choice between man and machine.
However, the overwhelming driver of many enterprise technology decisions is cost, which is why many organisations may find it irresistible to choose solutions that are almost, but not quite, human, moving wholesale into the uncanny valley to save a few bucks.