Voice User Interfaces (VUIs)

Check out this post on Voice User Interfaces that I wrote for the University of Basel blog Sci Five.



Speech Interfaces like Alexa are trying to change the way we use language. They won’t succeed.


Alexa, are you listening?

She is.

She’s listening so closely, in fact, that you can’t talk about her.

I was over at a good buddy’s house, and he has an Amazon Echo. I had the gall to ask: “How do you like Alex...”

“Shhh!” he interrupted me, darting his finger to his mouth.


“You can’t say her name. The other day, she mistakenly ordered me some chocolate because we were talking about her.”

“We just call her ‘the robot,’” his wife added. “I’m uncomfortable with robots getting more advanced than her. In fact, I don’t like assigning her a gender. It.”

There’s a lot to unpack here. First and foremost, at least in my friend’s house, Alexa is a presence. She’s a family member: the one that nobody wants to talk about.

Let me phrase this another way. Alexa has changed the language that my friends use at home. She’s a new toy, and they can’t talk about her in their own house. This is a linguistic user experience problem.

The problem stems from the trigger word: Alexa.

It’s a problem on two levels. First, it prevents people from being able to talk about Alexa. Second, it’s extremely unnatural from a linguistic perspective.

The Alexa Elephant in the Room Problem

Users want to talk about their toys. Amazon wants users to talk about its toy, and has given her a name and an identity. But this same identity is a pain point for users. Nobody likes tiptoeing around a subject in their own home, let alone around a presence, especially when it’s something they’re excited to talk about.

In essence, then, Amazon is presently shooting itself in the foot.

Is this problem presently hurting Amazon sales? Sure doesn’t sound like it. But as customers become more aware of the power of Alexa (and of other speech interfaces), their expectations will become more sophisticated. In other words, they will grow less tolerant of having to avoid a topic in their own home.

This segues nicely into the second problem.

The Name Problem

On the surface, using the name “Alexa” as a trigger word makes sense. After all, when I want someone’s attention, I use their name.

Consequently, designers on the Amazon UX team are doubling down on this idea, requiring all developers who integrate Alexa into their apps to use “Alexa” as a trigger word and create a consistent user experience across platforms. As quoted from a recent Wired article:

“That’s why Amazon is developing guidelines for third party developers. It already requires everyone to use the wake word ‘Alexa.’ It also encourages simple, explicit language in their commands.”

Developing a seamless user experience is, of course, a great idea. However, it comes at the cost of our natural linguistic experience. Beyond a few specific purposes, we simply don’t use people’s names very often. Think about it. How often do you really use the names of people around you?

Here’s a daily scenario. You’re sitting on the couch, watching a movie.

“Chuck, can you hand me the remote? Chuck, I can’t find anything to watch. Chuck, what do you suggest we watch? Chuck, can you grab me something while you’re in the kitchen? Chuck, is there any ice cream left?”


In reality, the above conversation plays out more like this:

“Remote? Nothing on. Me, too? Ice cream?”

We use context, expectations, routine, and even intonation when engaging with people. Names? Not so much.

So, the problem is that speech interfaces like Alexa are actually encouraging us to change our fundamental conversational habits. This trend comes in a long line of platforms that have tried to change the way we use language. Just think about the ridiculous queries that you type into Google.

This battle, I predict, will not be won by machines.

Why? We are language experts. We love talking. Some people think that language is what separates humans from other animals. So, while we will certainly make some concessions to make a new app work, we are unlikely to change the way we speak to accommodate it, at least not for long.

In other words, after becoming professionals at not using people’s names when we talk to them, we are unlikely to decide that we like using names to talk to machines. It’s unnatural. It’s clunky. It’s a bad user experience.

Moreover, this fact directly contradicts the goals of Amazon’s user experience team. From the same Wired article:

“‘Our core goal is to make Alexa’s interactions with a customer seamless and easy,’ says Brian Kralyevich, vice president of Amazon’s user experience design for digital products. ‘A customer shouldn’t have to learn a new language or style of speaking in order to interact with her. They should be able to speak naturally, as they would to a human, and she should be able to answer.’”

If this is the case, and Amazon does not want people to have to adopt a new style of speaking, then it has got to drop the name-calling as a trigger.

A Possible Solution

A suggestion: Drop the trigger word ‘Alexa.’ From a marketing perspective, the name is brilliant. Everyone knows who (sorry, what) Alexa is. However, from a functional language perspective, using a name as a trigger word is a terrible idea: it creates both the Alexa Elephant in the Room Problem and the Name Problem.

So, what should Amazon do instead?

People should choose their own trigger words. This solution is not unlike creating an avatar when you start a video game, and it makes intuitive sense.

First, if people are already treating Alexa like a person in the home, choosing her trigger word would give them some affection for that person. They would decide how to activate it.

Second, this strategy would tap into people’s limitless linguistic creativity. For example, people could make trigger words like safe words. Pumpernickel. Wobblegones. This would at least solve the first problem.

Others might opt for discourse markers, like ‘dude’ or ‘yo.’ These might not solve both problems, but they’d be more natural.

Most importantly, I think this approach would essentially crowdsource the problem. People would naturally arrive at a solution that works best for them, and maybe even for other people. After all, that’s how language works. It’s creative. It’s adaptive. It’s fun.
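To make the idea concrete, here is a minimal Python sketch of a user-chosen trigger word. It is an illustration, not Amazon’s implementation: real wake-word detection runs on audio, whereas this sketch assumes the speech has already been transcribed to text. The function name make_wake_detector is hypothetical, and the rule that the trigger only counts at the very start of an utterance is my own assumption, chosen so that merely mentioning the word mid-sentence doesn’t wake the device.

```python
import re


def make_wake_detector(trigger: str):
    """Build a detector for a user-chosen trigger word (hypothetical helper).

    Assumption: utterances arrive as already-transcribed, single-line text.
    """
    # The trigger must open the utterance; anything after it is the command.
    pattern = re.compile(
        rf"^\s*{re.escape(trigger)}\b[\s,:!.]*(.*)$",
        re.IGNORECASE,
    )

    def detect(utterance: str):
        """Return (addressed, command) for one utterance."""
        match = pattern.match(utterance)
        if match is None:
            return False, ""
        return True, match.group(1).strip()

    return detect


detect = make_wake_detector("pumpernickel")
addressed, command = detect("Pumpernickel, order some chocolate")
ignored, _ = detect("How do you like Alexa?")
```

With the trigger set to “pumpernickel,” the first utterance is treated as a command to order some chocolate, while the question about Alexa is ignored, so the family can talk about her again.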

It seems pretty clear that we are about to be surrounded by speech interfaces. Let’s start this conversation about how to integrate them into our basic linguistic habits, not how to adapt our habits to them.

With the Rise of Bots, Linguists are Critical to User Experience


A robot lied to me.

And like the Terminator, it was relentless and had no remorse.

I am presently at my parents-in-law’s house, and they have an older printer that they want to connect to their iPad. They also asked me for help with their VCR—you didn’t misread that, it’s an actual VCR—but one step at a time.

I googled options, and a particularly helpful site popped up, which will remain nameless. Since millennials lose interest in websites that don’t immediately provide assistance, a chat window quickly popped up in the right-hand corner of the window.

A pleasant picture of a helpful man. 6,113 customers helped. An impressive background helping people in my exact situation.

Okay, I thought. Finding an answer to this problem is going to be a pain. So I’ll bite.

Him: Welcome! What’s going on with your Apple device?

Me: I need to hook up my iPad to a Pixma MP6000R.

Him: How old is your system?

Me: The iPad is 2014 and I’m not sure about the printer.

Him: Could you estimate how old the system is?

Wait. Something’s wrong here. And how is this guy typing so fast?

Me: 2012 maybe. Not sure.

Him: How long have you been having an issue with the system?

Okay. We’re done here.

The Problem

Was I naive to think that I was chatting with a real person? Probably. Is it wrong to use bots in this circumstance? Could we improve the user’s experience? Well, let’s talk about it.

First, it’s okay to use a bot. But don’t tell me it’s a real person. That’s lying. You just violated my trust as a user, and I’m no longer going to use your service. Ever.

Lean into it. Own it. Tell me it’s a robot. Have it bloop and bleep. Give it an animation like IBM’s Watson. I’m not afraid of robots. Okay. I’m a little afraid of robots.

Watson isn’t trying to convince me it’s human.

Second, let’s talk some linguistics, shall we?

Conversation, as it turns out, is a big deal. There are a lot—A LOT—of unspoken rules that we learn about conversations in our native language. Granted, a chat is not a spoken conversation. But chats, too, come with an implicit set of conventions, largely based on speech patterns, that native speakers simply take for granted.

I don’t mean to go all Turing-testy here (trademark: Turing-testy), but let’s say (to appease me) that we show that the user is communicating with a bot. In order to make the exchange user-friendly, there are some basic principles that bots need to take into account:

Pauses. There’s a huge literature on the importance of pauses in conversation. This applies to chatting, as well. If you shoot language at me faster than a reasonable person can type, it feels fundamentally wrong. Use pauses to your advantage—don’t assume you need to get the message to me as quickly as possible.

Anaphora. Once humans mention a noun the first time in a conversation, they tend to stop mentioning that noun. In the conversation above, a human wouldn’t repeat the word ‘system.’ A human would say ‘it’ or introduce a new noun. This is a difficult problem in Natural Language Processing, but it’s an extremely important one.
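Both principles can be roughed out in a few lines of Python. This is a toy sketch under loud assumptions, not how any real chat service works: the function names, the typing speed of 7 characters per second, the 4-second cap, and the one-word noun list are all hypothetical, and swapping “the system” for “it” with a pattern match is nowhere near real anaphora resolution.

```python
import re

# Hypothetical list of nouns the bot tends to repeat.
TRACKED_NOUNS = ("system",)


def typing_delay(message: str, chars_per_second: float = 7.0,
                 max_seconds: float = 4.0) -> float:
    """Seconds to wait before sending, as if a person had typed the message."""
    return min(len(message) / chars_per_second, max_seconds)


def soften_repetition(message: str, already_mentioned: set) -> str:
    """Replace a noun the bot has already used with 'it'.

    The set already_mentioned is updated in place when a noun first appears.
    """
    for noun in TRACKED_NOUNS:
        phrase = re.compile(rf"\b(?:the|your)\s+{noun}\b", re.IGNORECASE)
        if noun in already_mentioned:
            message = phrase.sub("it", message)
        elif phrase.search(message):
            already_mentioned.add(noun)
    return message


mentioned = set()
first = soften_repetition("How old is your system?", mentioned)
second = soften_repetition("Could you estimate how old the system is?", mentioned)
third = soften_repetition(
    "How long have you been having an issue with the system?", mentioned
)
pause = typing_delay(first)
```

Run against the bot’s lines from the chat above, the first question goes out unchanged, and the next two come out as “Could you estimate how old it is?” and “How long have you been having an issue with it?” A real bot would wait out typing_delay before sending each one.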

Taking pauses and anaphora into account would go a long way toward improving the experience, and I might even use this service again. However, between lying to me and putting me through a very uncomfortable interaction, we’re done here.

The Takeaway

User Experience is more than visual design. It’s about humans’ interactions with everyday products. Since bots are becoming an ever more important part of that experience (maybe even influencing our votes), computer scientists and designers need to start looking to linguists to help resolve the problems that are cropping up along the way.

After all, we are creatures of conversation. Let’s not forget that.