How Nielsen’s 10 usability heuristics apply to Voice UI

With the improvement in speech recognition technology, voice user interfaces have gained a lot of popularity recently.

Published in UX Collective · 6 min read · Apr 2, 2018

“40% of adults now use voice search once per day” according to Location World
“41% of people using voice search have only started in the last 6 months” according to MindMeld
“50% of all searches will be voice searches by 2020” according to Comscore

As the usage of voice search increases, applications with voice as their output modality will see huge growth. This shift from visual to auditory output brings with it a whole new interactive experience for users, and Voice User Interface designers need to design for it. Does that mean we need to reinvent the wheel and come up with new design methods and principles?

Human needs haven’t changed over the years but what has changed is technology and context. Voice interfaces will not free us from the most substantial problems of user interface design. Thus, the same interaction design principles still hold true for voice interfaces. As designers, what we need to understand is how to tweak these principles to apply them in different contexts.

Jakob Nielsen laid down 10 usability heuristics for interaction design. Let’s see how these apply to Voice User Interface (VUI) design.

01. Visibility of system status

The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.

For VUIs: The user should know when the system is listening, when it is processing and when it is speaking. Since voice interfaces have to deal with a lot of misunderstandings, the system should also inform users when something is wrong. Visual cues (like an animated light ring on Alexa) and nonverbal audio (like earcons) can help in communicating the system status.

The animated light ring in Alexa and dots in Google Home communicate the system status https://goo.gl/images/vpULzf

02. Match between system and the real world

The system should speak the users’ language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.

For VUIs: Humans speak intuitively without realizing the complexity of their language, but machines need to account for this complexity. Voice systems should be trained to understand the basic rules of conversation. However, they shouldn’t raise users’ expectations by sounding too natural. It is better to design a voice interface with genuine but limited cognitive, linguistic and behavioral abilities than one that merely appears to have human-level abilities.

03. User control and freedom

Users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.

For VUIs: Voice systems should allow users to barge in when they say something by mistake and want to exit or initiate a correction. In Alexa and Google Home Mini, users can exit the current dialogue at any time by saying “Stop”.

04. Consistency and standards

Users should not have to wonder whether different words, situations, or actions mean the same thing.

For VUIs: While a consistent vocabulary helps GUIs, users will get frustrated if a VUI repeats the same sentence every single time, so responses should vary in wording while keeping the same meaning. Users also phrase the same request in many different ways, which is why the voice system should understand input that’s phrased in many alternate ways.
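As a rough illustration of both halves of this idea, the sketch below maps several phrasings onto one intent and picks among several wordings for the reply. The intent name, sample utterances and response templates are all hypothetical, and the exact-match lookup is only a stand-in for the natural language understanding a real voice platform would provide.

```python
import random
from typing import Optional

# Hypothetical sample utterances: many phrasings, one intent.
SAMPLE_UTTERANCES = {
    "check_timer": (
        "how much time is left",
        "how long is left on my timer",
        "what's the time remaining",
    ),
}

# Hypothetical response variants: one meaning, several wordings.
RESPONSE_VARIANTS = {
    "check_timer": (
        "You have {t} remaining.",
        "There's {t} left.",
        "Your timer has {t} to go.",
    ),
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose sample utterances contain the normalized input."""
    text = utterance.lower().strip(" ?.!")
    for intent, samples in SAMPLE_UTTERANCES.items():
        if text in samples:
            return intent
    return None


def vary_response(intent: str, t: str, rng: random.Random) -> str:
    """Pick one of the equivalent wordings so replies do not sound repetitive."""
    return rng.choice(RESPONSE_VARIANTS[intent]).format(t=t)
```

The vocabulary the system accepts stays consistent in meaning (every phrasing resolves to the same intent), while the wording the system speaks is allowed to change.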

05. Error prevention

Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.

For VUIs: To prevent errors, the voice system should confirm its hypothesis with the user before taking a major action (like placing an order). This can be done in two ways:

Implicit confirmation: Let the user know what was understood, but do not ask them to confirm. Ex. “Ok, setting an alarm for 4:00 am”

Three-tiered confidence: Use the recognition confidence score to decide whether confirmation is needed. If confidence is high, act without confirming. If it is lower but still above the reject threshold, confirm with the user. Otherwise, reject the input.
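The three-tiered rule can be sketched in a few lines. This is a minimal sketch, not a platform API: the threshold values, the `Action` names and the wording of the prompts are assumptions, and real thresholds would have to be tuned for the recognizer in use.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # act without asking
    CONFIRM = "confirm"  # ask the user to confirm first
    REJECT = "reject"    # treat the input as not understood


# Hypothetical thresholds on a confidence score in [0, 1].
CONFIRM_THRESHOLD = 0.80
REJECT_THRESHOLD = 0.45


def decide(confidence: float) -> Action:
    """Map a recognition confidence score to one of the three tiers."""
    if confidence >= CONFIRM_THRESHOLD:
        return Action.ACCEPT
    if confidence >= REJECT_THRESHOLD:
        return Action.CONFIRM
    return Action.REJECT


def respond(hypothesis: str, confidence: float) -> str:
    """Build the spoken reply for a hypothesis such as 'setting an alarm for 4:00 am'."""
    action = decide(confidence)
    if action is Action.ACCEPT:
        # High confidence: implicit confirmation only, no question asked.
        return f"Ok, {hypothesis}."
    if action is Action.CONFIRM:
        return f"I heard: {hypothesis}. Is that right?"
    return "Sorry, I didn't catch that."
```

With these assumed thresholds, `respond("setting an alarm for 4:00 am", 0.95)` acts immediately, a score of 0.6 triggers an explicit question, and a score of 0.2 is rejected.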

06. Recognition rather than recall

Minimize the user’s memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.

For VUIs: In GUIs, it is recommended to chunk information into blocks of 7±2 items. However, in VUIs, because of the ephemeral nature of speech, this value is 3±1. To help focus the user’s attention on what is important, new information should be placed at or near the end of the sentence. This is called the End Focus Principle. For example:

Remaining time in your timer is 4 minutes and 5 seconds (New information is at the end)
vs
You have 4 minutes and 5 seconds remaining in your timer

07. Flexibility and efficiency of use

Accelerators — unseen by the novice user — may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

For VUIs: GUIs have shortcuts (like advanced filters and keyboard shortcuts) for power users. VUIs can have shortcuts too, like skipping the welcome message for someone who uses the system more than 5 times a day and avoiding unnecessary confirmations. Amazon introduced a brief mode for Alexa in which it speaks less and, for some simple messages, plays a short sound instead of a voice response. To improve efficiency, Amazon also introduced a follow-up mode for Alexa, which allows users to ask follow-up questions without repeating the wake word.

Error recovery can also be handled differently for beginners and experts. For advanced users, rapid re-prompts (a short re-prompt that does not immediately spell out what the user should say) might work better than escalating detail (providing examples of what the user should say).
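One way to picture these accelerators is a user profile that switches prompts once usage crosses a threshold. The sketch below assumes a hypothetical "Timer Helper" app and a simple daily usage counter; the class, function names and prompt wording are illustrative only.

```python
from dataclasses import dataclass

# Cut-off taken from the "more than 5 times a day" example in the text.
FREQUENT_USES_PER_DAY = 5


@dataclass
class UserProfile:
    uses_today: int = 0

    @property
    def is_expert(self) -> bool:
        """Treat anyone above the daily usage cut-off as an experienced user."""
        return self.uses_today > FREQUENT_USES_PER_DAY


def greeting(user: UserProfile) -> str:
    """Skip the welcome message for frequent users."""
    if user.is_expert:
        return "What would you like to do?"
    return (
        "Welcome to Timer Helper. You can set, check, or cancel a timer. "
        "What would you like to do?"
    )


def reprompt(user: UserProfile) -> str:
    """Rapid re-prompt for experts, escalating detail for beginners."""
    if user.is_expert:
        return "Sorry, what was that?"
    return "Sorry, I didn't get that. You can say, for example, 'set a timer for ten minutes'."
```

A real system would likely judge expertise over a longer history than a single day, but the branching stays the same: shorter prompts for people who already know what to say.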

08. Aesthetic and minimalist design

Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.

For VUIs: Paul Grice defined four basic rules of cooperative conversation:

Quality — Do not say that for which you lack adequate evidence
Quantity — Don’t be more or less informative than needed
Relevance — Only say things relevant to the topic
Manner — Be brief, get to the point, and avoid ambiguity and obscurity

Aesthetic and minimalist design in VUIs lies in the clarity as well as the brevity of the dialogues being delivered.

09. Help users recognize, diagnose, and recover from errors

Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

For VUIs: Voice systems should have robust error recovery strategies for dealing with recognition errors. They should tell users as clearly as they can what went wrong (this goes back to heuristic 01, visibility of system status) and what they can do to keep the conversation moving forward. See my article on error handling strategies for VUIs.

10. Help and documentation

Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.

For VUIs: If there are more than 3 errors in a row and the error recovery strategies aren’t getting the user back on track, the system should offer “you can say/ask” messages. It can also provide escalated help by pointing users to documentation they can refer to, such as the intent examples in the app’s description and its sample interactions.
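The escalation rule can be sketched as a counter of consecutive errors that resets on any successful turn. The class name, prompt wording and sample utterances below are hypothetical; only the "more than 3 errors in a row" threshold comes from the text.

```python
MAX_ERRORS_IN_A_ROW = 3  # escalate once this many consecutive errors is exceeded


class HelpEscalator:
    """Track consecutive recognition errors and escalate to help when needed."""

    def __init__(self, sample_utterances):
        self.sample_utterances = list(sample_utterances)
        self.consecutive_errors = 0

    def on_success(self) -> None:
        """A successful turn puts the user back on track, so reset the count."""
        self.consecutive_errors = 0

    def on_error(self) -> str:
        """Return the next prompt after a failed recognition."""
        self.consecutive_errors += 1
        if self.consecutive_errors > MAX_ERRORS_IN_A_ROW:
            examples = ", ".join(f'"{u}"' for u in self.sample_utterances)
            return (
                f"You can say {examples}. "
                "More examples are listed in the app description."
            )
        return "Sorry, I didn't understand. Could you say that again?"
```

For example, with `HelpEscalator(["set a timer", "cancel my timer"])`, the first three failed turns produce the ordinary re-prompt and the fourth produces the "you can say" help message.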