Speech Interfaces like Alexa are trying to change the way we use language. They won’t succeed.

Alexa, are you listening?

She is.

She’s listening so closely, in fact, that you can’t talk about her.

I was over at a good buddy’s house, and he has an Amazon Echo. I had the gall to ask: “How do you like Alex—“

“Shhh!” he interrupted me, darting his finger to his mouth.

“What?”

“You can’t say her name. The other day, she mistakenly ordered me some chocolate because we were talking about her.”

“We just call her ‘the robot,’” his wife added. “I’m uncomfortable with robots getting more advanced than her. In fact, I don’t like assigning her a gender. It.”

There’s a lot to unpack here. First and foremost, at least in my friend’s house, Alexa is a presence. She’s a family member, and she’s the family member that nobody wants to talk about.

Let me phrase this another way. Alexa has changed the language that my friends use at home. She’s a new toy, and they can’t say her name in their own house. This is a linguistic user experience problem.

The problem stems from the trigger word: Alexa.

It’s a problem on two levels. First, it prevents people from being able to talk about Alexa. Second, it’s extremely unnatural from a linguistic perspective.

The Alexa Elephant in the Room Problem

Users want to talk about their toys. Amazon wants users to talk about their toy. Amazon has given her a name, an identity. But this same identity is actually a pain point for users. Very few people like tiptoeing around a subject in their own home, let alone a presence, especially if it’s something they’re excited to talk about.

In essence, then, Amazon is presently shooting themselves in the foot.

Is this problem presently hurting Amazon sales? Sure doesn’t sound like it. But as customers become more aware of the power of Alexa (as well as other speech interfaces), their expectations will become more sophisticated. In other words, they will be less and less willing to avoid talking about something in their own home.

This segues nicely into the second problem.

The Name Problem

On the surface, using the name “Alexa” as a trigger word makes sense. After all, when I want someone’s attention, I use their name.

Consequently, designers on the Amazon UX team are doubling down on this idea, encouraging all developers who integrate Alexa into their apps to use “Alexa” as a trigger word and create a consistent user experience across platforms. As quoted from a recent Wired article:

“That’s why Amazon is developing guidelines for third party developers. It already requires everyone to use the wake word ‘Alexa.’ It also encourages simple, explicit language in their commands.”

Developing a seamless user experience is, of course, a great idea. However, it comes at the cost of our natural linguistic experience. Beyond a few specific purposes, we simply don’t use people’s names very often. Think about it. How often do you really use the names of people around you?

Here’s a daily scenario. You’re sitting on the couch, watching a movie.

“Chuck, can you hand me the remote? Chuck, I can’t find anything to watch. Chuck, what do you suggest we watch? Chuck, can you grab me something while you’re in the kitchen? Chuck, is there any ice cream left?”

Nope.

In reality, the above conversation plays out more like this:

“Remote? Nothing on. Me, too? Ice cream?”

We use context, expectations, routine, and even intonation when engaging with people. Names? Not so much.

So, the problem is that speech interfaces like Alexa are actually encouraging us to change our fundamental conversational habits. This trend follows a long line of platforms that have tried to change the way we use language. Just think about the ridiculous queries that you type into Google.

This battle, I predict, will not be won by machines.

Why? We are language experts. We love talking. Some people think that language is what separates humans from other animals. So, while we will certainly make some concessions to make a new app work, we are unlikely to change the way we speak to accommodate it, at least not for long.

In other words, after becoming professionals at not using people’s names when we talk to them, we are unlikely to decide that we like using names to talk to machines. It’s unnatural. It’s clunky. It’s a bad user experience.

Moreover, this fact directly contradicts the goals of Amazon’s user experience team. From the same Wired article:

“‘Our core goal is to make Alexa’s interactions with a customer seamless and easy,’ says Brian Kralyevich, vice president of Amazon’s user experience design for digital products. ‘A customer shouldn’t have to learn a new language or style of speaking in order to interact with her. They should be able to speak naturally, as they would to a human, and she should be able to answer.’”

If this is the case, and Amazon does not want people to have to adopt a new style of speaking, they’ve got to drop the name-calling as a trigger word.

A Possible Solution

A suggestion: Drop the trigger word ‘Alexa.’ From a marketing perspective, it’s brilliant. Everyone knows who—sorry—what Alexa is. However, from a functional language perspective, using a name as a trigger word is a terrible idea, creating both the Alexa Elephant in the Room problem and the Name Problem.

So, what should Amazon do instead?

People should choose their own trigger words. This solution is not dissimilar from creating an avatar when you start a video game, and it makes intuitive sense.

First, if people are already treating Alexa like a person in the home, choosing how to activate her gives them some affection for that person.

Second, this strategy would tap into people’s limitless linguistic creativity. For example, people could make trigger words like safe words. Pumpernickel. Wobblegones. This would at least solve the first problem.

Others might opt for discourse markers, like ‘dude’ or ‘yo.’ These might not solve both problems, but they’d be more natural.

Most importantly, I think this approach would essentially crowdsource the problem. People would naturally arrive at a solution that works best for them, and maybe even one that works best for other people. After all, that’s how language works. It’s creative. It’s adaptive. It’s fun.
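For the more technically inclined, here is a rough sketch of what a user-chosen trigger word could look like in code. Everything in it is my own assumption, not anything Amazon has published: the function name `extract_command` is made up, and it works on a text transcript, whereas a real device would be listening for the wake word in audio. It also assumes the trigger word only counts at the start of an utterance, which is one simple way to let people mention the word mid-sentence without waking the device.

```python
import re
from typing import Optional


def extract_command(transcript: str, wake_word: str) -> Optional[str]:
    """Return the command that follows a user-chosen wake word, or None.

    Hypothetical helper for illustration only: it inspects a text
    transcript, while real devices detect wake words in the audio itself.
    """
    # Accept the wake word only at the very start of the utterance,
    # ignoring case, then skip any punctuation or spaces before the command.
    # The \b keeps "yo" from matching the start of "your".
    pattern = r"^\s*" + re.escape(wake_word) + r"\b[\s,.!:]*(.*)$"
    match = re.match(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    command = match.group(1).strip()
    # A bare wake word with nothing after it is not a command.
    return command or None


# The household picks its own trigger word (an assumed setting).
household_wake_word = "pumpernickel"

print(extract_command("Pumpernickel, order some chocolate", household_wake_word))
print(extract_command("How do you like Alexa?", household_wake_word))
```

With a safe-word style trigger, the first line is treated as a command and the second is just conversation. A discourse marker like ‘dude’ would still fire whenever a sentence happens to start with it, which is why it might not solve both problems.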

It seems pretty clear that we are about to be surrounded by speech interfaces. Let’s start this conversation about how to integrate them into our basic linguistic habits, not how to adapt our habits to them.
