Knowing the context when designing for voice

Enabling computers to hold a conversation is a hard problem. Many assistants claim to be conversational, but what they really do is handle a series of one-off queries that simulate a conversation. One-offs like ‘set an alarm for 7 AM’ can be very useful, but it would be awkward to say ‘when should I leave for the airport for my wife’s flight number 747 from Bangalore?’ A much more natural conversation would go something like this:
User: Tell me when my wife’s flight lands.
Assistant: Oh, the flight from Bangalore? It lands at 9 PM. She should be out by 9:30.
User: Cool. When should I leave?
Assistant: It’s better if you leave at 8:30. Should I book you a cab?
User: Umm, no. I’ll drive today.
… after a few hours
User: Hey, can you book me the cab?
Assistant: Sure! I’ll book it right away.
There are a few key components to a good conversation: contextual awareness, memory of previous interactions and an exchange of appropriate information. The computer needs to know the context in which the conversation is happening. If I shout at it, it could respond loudly. But if I whisper, can it whisper back?
Maintaining an illusion of awareness
Users are more likely to engage with a system if they realize that it is aware of their presence, so the assistant needs strategies for maintaining an illusion of awareness. Harry Gottlieb wrote “The Jack Principles of the Interactive Conversation Interface” in 2002. In this paper, Gottlieb outlines tips for creating the illusion of awareness in a conversational system; specifically, he suggests responding with human intelligence and emotion to the following:
- The user’s actions
- The user’s inactions
- The user’s past actions
- A series of the user’s actions
- The actual time and space that the user is in (time is self-explanatory; space can refer to a geographical location, the app the user is in, or a room in the house such as the kitchen or living room)
- The comparison of different users’ situations and actions
Gottlieb also outlined tips for maintaining the illusion of awareness:
- Use dialog that conveys a sense of intimacy
- Ensure that characters act appropriately while the user is interacting
- Ensure that dialog never seems to repeat
- Be aware of the number of simultaneous users
- Be aware of the gender of the users
- Ensure that the performance of the dialog is seamless
- Avoid the presence of characters when user input cannot be evaluated
There are a few other ways of maintaining this illusion of awareness when designing modern voice user interfaces (VUIs). Above all, keep the user’s context in mind when designing any conversational interaction.
The user’s physical location
If the assistant knows where the user is and responds accordingly, it will seem more aware. Knowing the user’s location has multiple advantages, chief among them that the assistant can answer queries in context. For example, when a user asks for “party places,” the assistant can suggest clubs near the user rather than returning irrelevant places from around the world.
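To make the “near the user” idea concrete, here is a minimal sketch that keeps only the places within a fixed radius of the user and sorts them nearest first. It assumes the assistant already has the user’s latitude and longitude and a list of candidate places with coordinates; the 5 km radius, the place names and the coordinates are all hypothetical, and the haversine great-circle formula is simply one common way to measure the distance, not something prescribed by any particular assistant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(places, user_lat, user_lon, radius_km=5.0):
    """Return (name, distance_km) for places within radius_km, nearest first.

    radius_km is a hypothetical default; a real assistant would tune it.
    """
    hits = []
    for name, lat, lon in places:
        distance = haversine_km(user_lat, user_lon, lat, lon)
        if distance <= radius_km:
            hits.append((name, round(distance, 1)))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical results for "party places"; names and coordinates are made up.
places = [
    ("Club A", 12.975, 77.605),   # close to the user
    ("Club B", 51.507, -0.128),   # on another continent
    ("Club C", 12.990, 77.570),   # a few kilometres away
]
print(nearby(places, 12.971, 77.594))  # only Club A and Club C survive
```

Filtering before ranking keeps the far-away result out of the response entirely, which is what makes the assistant feel aware of where the user is.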
Type of user
It is a good strategy for the system to be aware of whether the user is interacting with it for the first time or uses it regularly. For example, a life-logging app might ask users to log their mood every day. This is how the app could prompt different users:
Beginner
Assistant: “How are you feeling today? Make sure you take a minute to think about your day and select one or more options that correspond with your mood.”
Advanced
Assistant: “Hey there! How’s it going?”
To determine a user’s proficiency, measure how frequently they use the assistant rather than the total number of times they have used it. A user who has used the assistant only once a month over a long period may have an impressive total count while still being unfamiliar with it.
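A minimal sketch of this frequency-based check, assuming the app stores a timestamp for every session: only sessions inside a recent window count, so a long but sparse history does not make someone “advanced.” The function name `classify_user`, the 30-day window and the threshold of 8 sessions are hypothetical choices for illustration; the two prompts are the ones from the life-logging example above.

```python
from datetime import datetime, timedelta

# Prompts from the life-logging example.
PROMPTS = {
    "beginner": ("How are you feeling today? Make sure you take a minute to think "
                 "about your day and select one or more options that correspond "
                 "with your mood."),
    "advanced": "Hey there! How's it going?",
}

def classify_user(session_times, now, window_days=30, min_sessions=8):
    """Hypothetical heuristic: a user is 'advanced' if they had at least
    min_sessions sessions in the last window_days days, regardless of
    how many sessions they have had in total."""
    window_start = now - timedelta(days=window_days)
    recent = [t for t in session_times if window_start <= t <= now]
    return "advanced" if len(recent) >= min_sessions else "beginner"

now = datetime(2018, 1, 31)
# Monthly user: 24 sessions in total, but only one in the last 30 days.
monthly_user = [now - timedelta(days=31 * i) for i in range(24)]
# Regular user: only 12 sessions in total, one every other day.
regular_user = [now - timedelta(days=2 * i) for i in range(12)]

print(PROMPTS[classify_user(monthly_user, now)])  # the detailed beginner prompt
print(PROMPTS[classify_user(regular_user, now)])  # Hey there! How's it going?
```

Note that the monthly user has twice as many sessions overall, yet still gets the more guided prompt, which is exactly the distinction between frequency and total count.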
It is also important for the system to adapt to the user’s behavior rather than simply nudging them to use the assistant. For example, the system can learn when the user typically uses the assistant and prompt only during those times.
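One way the system could learn those times is sketched below, assuming it keeps a history of session timestamps. It counts sessions per hour of the day and only prompts when the current hour is among the user’s most common ones. The helper names and the choice of the top two hours are hypothetical; a real assistant would likely use a richer model of the user’s routine.

```python
from collections import Counter
from datetime import datetime

def usual_hours(session_times, top_n=2):
    """Hours of the day (0-23) in which the user most often opens the assistant."""
    counts = Counter(t.hour for t in session_times)
    return {hour for hour, _ in counts.most_common(top_n)}

def should_prompt(now, session_times, top_n=2):
    """Prompt only during hours when the user habitually uses the assistant."""
    return now.hour in usual_hours(session_times, top_n)

# Hypothetical history: mostly mornings around 8 AM and evenings around 9 PM.
history = [
    datetime(2018, 1, 1, 8, 5), datetime(2018, 1, 2, 8, 20),
    datetime(2018, 1, 3, 21, 40), datetime(2018, 1, 4, 21, 10),
    datetime(2018, 1, 5, 8, 45), datetime(2018, 1, 6, 14, 0),
]
print(should_prompt(datetime(2018, 1, 7, 8, 30), history))   # True
print(should_prompt(datetime(2018, 1, 7, 14, 30), history))  # False
```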
Priming
In psychology, priming is a technique whereby exposure to one stimulus influences a response to a subsequent stimulus, without conscious guidance or intention. For example, you are more likely to answer ‘Mongolia’ when asked to name a place that starts with the letter ‘M’ if you’ve just seen a documentary about Mongolia.
Letting the user know what to expect is also a form of priming: it tells users how to prepare themselves. Priming can also be subtle. If the VUI responds to the query “Can you play me ‘Fix You’?” with “Playing ‘Fix You’ by Coldplay,” the next time the user can simply say “Play ‘Fix You’ by Coldplay.”
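The ‘Fix You’ example boils down to one rule: confirm the request using the full, canonical phrasing you would like the user to adopt. A minimal sketch, assuming the song title has already been extracted from the user’s query and that a hypothetical `CATALOG` maps normalized titles to a canonical title and artist:

```python
# Hypothetical catalog: normalized title -> (canonical title, artist).
CATALOG = {
    "fix you": ("Fix You", "Coldplay"),
    "yellow": ("Yellow", "Coldplay"),
}

def play_response(requested_title):
    """Confirm playback as "Playing '<title>' by <artist>."

    Echoing the complete "title by artist" form primes the user to
    phrase their next request the same way.
    """
    match = CATALOG.get(requested_title.strip().lower())
    if match is None:
        return f"Sorry, I couldn't find '{requested_title.strip()}'."
    title, artist = match
    return f"Playing '{title}' by {artist}."

print(play_response("fix you"))  # Playing 'Fix You' by Coldplay.
```

Storing the canonical title rather than re-capitalizing the user’s input keeps the confirmation consistent no matter how the request was phrased.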
Type of device
Voice interfaces have moved beyond interactive voice response (IVR) systems to smartphones, smart speakers, cars and smartwatches, and will soon be an integral part of our lives. When designing for voice, we are designing for two different things: an input mechanism through voice and an output mechanism that is not necessarily through voice. Although the input is voice, the output should depend on the context in which the solution is being used.

If you ask an assistant on your phone for “the top ten movies of 2017,” reading out the entire list would place a heavy cognitive load on you. It would be much better to simply show the list on screen.
Doing this has advantages beyond reducing cognitive load: the assistant can present much more information about each movie than just its title, such as the actors, director and awards received, which would be difficult to convey by voice alone. It is important to remember that there are interfaces beyond a speaker and a microphone.
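The decision above can be sketched as a small function that picks the output modality from the device’s capabilities. This assumes the assistant knows whether the current device has a screen; the `Device` type, the three-item spoken limit and the placeholder movie entries are hypothetical and stand in for real device metadata and search results.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    has_screen: bool

def render_list(items, device, spoken_limit=3):
    """Return (speech, display) for a list of (title, details) results.

    Hypothetical rule: devices with a screen get a short spoken summary plus
    the full list with details; voice-only devices read just the first few
    titles and offer to continue, to keep the cognitive load low.
    """
    if device.has_screen:
        speech = f"Here are the top {len(items)} results."
        display = [f"{title} ({details})" for title, details in items]
        return speech, display
    titles = [title for title, _ in items[:spoken_limit]]
    speech = "The top results are " + ", ".join(titles) + ". Want to hear more?"
    return speech, None

# Placeholder results; a real assistant would fill in cast, director and awards.
movies = [(f"Movie {i}", "cast, director, awards") for i in range(1, 11)]
phone = Device("phone", has_screen=True)
speaker = Device("smart speaker", has_screen=False)

print(render_list(movies, phone)[0])  # Here are the top 10 results.
print(render_list(movies, speaker)[0])
```

The same query produces a brief spoken answer in both cases; only the screen-equipped device gets the detailed list.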