Privacy in voice-based user interfaces: introductions to guest users

Tom Bäckström
Published in UX Collective
Oct 22, 2019

Photo by Chris Liverani on Unsplash

Voice assistants in smart devices are sometimes referred to as personal digital assistants. This highlights a feature of their design philosophy: they are personal, that is, they were initially envisioned as single-user services. Recently, however, the range of smart devices has grown to include multi-user devices such as smart speakers and TVs. In terms of privacy, this presents a significant problem.

The first stumbling block is the end-user license agreement (EULA). The owner or manager of the device clicks “Ok” when installing it. But what about the other users? Even if this were a legally valid contract, from the point of view of the other users it is certainly not acceptable. For example, suppose a friend of yours has bought a smart TV and you go for a visit. Would your friend then require you to sign the EULA before entering the house? If not, then the service provider is potentially storing and analysing your voice, your opinions and your behaviour without your knowledge and without your consent. Your friend is thus essentially giving away your voice to the smart TV manufacturer without asking for your permission.

Photo by Christopher Campbell on Unsplash

The above description, however, remains vague and abstract. To get an intuitive feel for the situation, we can do a thought experiment in which we replace the device with a person, whom I will call “Greg”. Suppose that you arrive at your friend’s home, and there, in the living room, Greg is standing silently against the wall. Your friend pays no attention to Greg and behaves as if the two of you were alone in the room.

How would you react to such a situation? A silent third person standing there, observing? I would certainly feel uncomfortable. Why is he there? Why is he taking notes of our discussion? What is he going to do with the notes? It seems very odd that my friend did not introduce me to Greg when I arrived, doesn’t it?

The same thought experiment can also help us find a solution: a user-interface feature for smart devices that resolves the awkwardness. In other words, if the device were Greg, how should Greg behave to minimize the awkwardness? Well, first of all, he should obviously introduce himself when I arrive. “Hey, my name is Greg, I’ll be here in the corner. Let me know if I can help you.” In the same way, a smart device should introduce itself when it observes that a new person has arrived. “Hey, I am a personal digital assistant. Let me know if I can help you.” Much better already.
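To make the idea a little more concrete, here is a minimal sketch of a single device that introduces itself the first time it hears an unfamiliar voice. It assumes a speaker-recognition front end (not shown, and hypothetical here) that maps each voice to a stable speaker ID string; the class `VoiceDevice` and its `observe` method are illustrative names, not any real product’s API.

```python
from typing import Optional, Set


class VoiceDevice:
    """Hypothetical voice device that introduces itself to unfamiliar speakers.

    Assumes some speaker-recognition component (not shown) turns each
    voice into a stable speaker ID string.
    """

    def __init__(self, name: str, role: str) -> None:
        self.name = name
        self.role = role
        # Speakers this device has already introduced itself to.
        self.known_speakers: Set[str] = set()

    def observe(self, speaker_id: str) -> Optional[str]:
        """Return an introduction the first time a speaker is heard, else None."""
        if speaker_id in self.known_speakers:
            return None
        self.known_speakers.add(speaker_id)
        return f"Hey, I am {self.name}, {self.role}. Let me know if I can help you."


tv = VoiceDevice("Greg", "the smart TV over here")
print(tv.observe("guest-1"))  # introduction for a new guest
print(tv.observe("guest-1"))  # None: this guest has already been introduced
```

Remembering who has been introduced is what keeps the handshake from repeating every time the same guest speaks.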

In other words, if smart devices behaved like polite people, they would introduce themselves. They would offer a metaphorical handshake.

This does, however, raise an obvious question: wouldn’t it be very impractical and obtrusive if every voice-operated device introduced itself when you step into your friend’s house? “Hey, I’m Greg, I’m the smart TV over here. Hey, I’m Sophie, I’m the computer over here. Hey, I’m James, I’m the microwave in the kitchen...” We would be overwhelmed by the number of introductions. That is a concern, but I think that rather than a user-interface problem, it highlights a privacy problem. Currently you are not aware that your privacy is compromised. All voice user-interfaces are potential eavesdroppers. If we required them to introduce themselves, this problem would become visible. We should not hide the problem by skipping introductions; instead, we should fix the underlying problem.

The underlying problem is that we have many independent voice interfaces which are not aware of each other. In addition to adding introductions, we should therefore fix the lack of collaboration between voice user-interfaces.
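One way to picture such collaboration is a group of devices that share a single record of which guests have been introduced and greet each new guest with one joint introduction. The sketch below assumes the devices can somehow share their membership list and that record (how they communicate is not shown); `Device`, `DeviceGroup` and `observe` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Device:
    name: str
    role: str


@dataclass
class DeviceGroup:
    """Hypothetical group of voice devices in one home that know about each other.

    Assumes the devices share their member list and the set of speaker IDs
    already introduced; the communication mechanism is not shown.
    """
    devices: List[Device] = field(default_factory=list)
    introduced: Set[str] = field(default_factory=set)

    def register(self, device: Device) -> None:
        self.devices.append(device)

    def observe(self, speaker_id: str) -> Optional[str]:
        """Return one joint introduction for a new speaker, else None."""
        # Nothing to introduce if the guest is known or no devices are registered.
        if speaker_id in self.introduced or not self.devices:
            return None
        self.introduced.add(speaker_id)
        members = ", ".join(f"{d.name} ({d.role})" for d in self.devices)
        return f"Hey, the voice assistants here are {members}. Let me know if we can help you."


home = DeviceGroup()
for device in [Device("Greg", "smart TV"), Device("Sophie", "computer"), Device("James", "microwave")]:
    home.register(device)
print(home.observe("guest-1"))  # one combined introduction
print(home.observe("guest-1"))  # None: the group already introduced itself
```

The design choice here is that the guest hears a single introduction listing every device, instead of one interruption per device.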

Photo by Adam Solomon on Unsplash

In conclusion, I think that current implementations of EULAs in smart devices with voice user-interfaces are non-intuitive. They do not take multi-user scenarios into account. In fact, though I am not a legal expert, my guess is that some devices might be illegal. A user-interface solution would be to implement a metaphorical handshake, “Hello, I’m device X”, every time a device notices an unknown user in its vicinity. To avoid overwhelming users with introductions, a practical implementation would, however, also require that when multiple devices are present, they are aware of each other.

An excited researcher of life and everything. Associate Professor in Speech and Language Technology at Aalto University, Finland.