Voice interaction design – Designing human conversations

Voice interaction and user interfaces have become extremely popular. The number of devices with a voice component has been growing rapidly over the past few years. A recent estimate puts the number of voice-driven devices shipped this year at approximately 25 million, four times the number delivered last year.

It’s evident that speech user interfaces are here to stay and they are fast becoming mainstream, especially with the growing number of smart home and car devices such as thermostats, lights, kettles and many others.

Advancements in voice recognition and smart technology have taken voice interaction to another level and made it genuinely useful. This means it is now a viable platform for UX designers to start to explore and exploit.

The three layers of voice interaction design

There are three layers that interact together to enable voice driven communications and interfaces. They are:

  • the voice app, which in this case can be an Amazon Alexa Skill or a Google Assistant Action
  • the artificial intelligence (AI) such as Alexa, Siri or Microsoft’s Cortana
  • the device – such as an Amazon Echo or Google Home

A voice driven device is constantly in listening mode, waiting to be prompted with a wake word before jumping into action.

After activation the audio input from the user is sent to the artificial intelligence platform.

The AI platform, in combination with speech recognition and a language-processing component, deciphers the intention and sends a message to the supporting application.

The application processes the command and responds with text or visual content. The text response is converted to speech and played through the device.
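The app's part of that round trip can be sketched in a few lines. This is a hypothetical illustration, not any platform's real SDK: by the time the app is called, the AI platform has already handled wake-word detection, speech recognition and language processing, so the app only receives a structured intent and returns the text to be spoken. The intent names and replies are invented.

```python
def handle_intent(intent: dict) -> str:
    """Process a deciphered intent and return the text to be spoken back."""
    name = intent.get("name")
    if name == "GetGreeting":
        return "Hello! How can I help?"
    if name == "GetTemperature":
        # A real app would read this value from the thermostat's API.
        return "The current temperature is 21 degrees."
    # Anything we have not designed for falls through to a polite fallback.
    return "Sorry, I can't help with that yet."

# Simulated message from the AI platform after deciphering the user's audio.
platform_message = {"name": "GetTemperature"}
spoken_response = handle_intent(platform_message)
```

The string returned here is what the platform's text-to-speech stage would play through the device.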

Understanding the voice interaction

[Image: Amazon Echo Dot]

The first step involves asking yourself what value you want to provide to users through creating a voice interface. You need to consider the reason why people use voice apps and in what context they find them useful.

Voice interfaces can be easier to use in certain contexts such as at home whilst you have dirty hands from cooking or while in the car where your attention is rightly focussed on other things.

The aim here is to identify user needs that are met better, or more safely, through voice interaction than through a traditional graphical user interface.

This part of the design process for voice interfaces is not massively different from a usual discovery phase in any UX project. You need to understand what your users are trying to achieve, what they need along the way to achieve it and what channel or solution might help them achieve it best.

Personality and tone of voice

[Image: Apple Siri]

When designing the visual elements for mobile and web interfaces we have many elements at our disposal to help us in showing personality. These can include colour palettes, typography, imagery, style guides and so on. However, when it comes to a voice interface we have far fewer choices.

Personality has to come from tone, voice and the content of the communication. Depending on which voice platform you are designing for, you may be constrained, since you do not always have a choice in how the voice content is delivered or sounds.

For example, if designing for the Amazon Echo* or Echo Dot* then we will be constrained to using Alexa’s voice. The same goes for Google Home and its overly upbeat character, or Apple’s new HomePod with the sometimes sarcastic Siri.

To work around this we need to invest design time in tone and wording to convey the personality that we need. Luckily, much of the groundwork has likely already been done.

There are usually brand and tone-of-voice guidelines written for text content that we can leverage for our voice user interface. This will help to maintain consistency across multiple channels.

Designing the conversation

The third step in designing a voice user interface is designing the conversation flow.

First, let’s review what we are trying to replicate in the digital world:

(a) talk between two or more people in which thoughts, feelings, and ideas are expressed, questions are asked and answered, or news and information is exchanged
– Cambridge Dictionary

Until voice interfaces develop further, they are better suited to the latter part of that definition: questions are asked and answered, or news and information is exchanged.

It’s important to understand that your voice interaction UX can only support the capabilities that you have defined. But users are not limited as to what they may ask in terms of the content, format and phrasing.

To get around this we need to properly explore the conversation flow so that we can direct the voice interaction to help guide the user towards what the interface can do.

For each of the identified capabilities, you will need to develop and map out conversational dialogues which are clear and could take several different paths. You can use standard user journey techniques as part of this, understanding the key decision points and divergent paths a user may take to reach their goal.

The conversation flow

[Image: Human conversation]

So how do you come up with a list of questions a user might ask?

Start by putting up the end goal or need. We are going to take the example of a voice interface for a thermostat. Let’s look at the user need of “I want to be warmer”.

Grab a bunch of post-its with your team and spend 5 minutes writing out different ways you could convey that intent. Some examples might be:

  • “I am cold”
  • “I want to turn the heating up”
  • “Heating up”
  • “Make it warmer”
  • “It’s too cold in here”

All of these are different potential ways the user might want to interact with your voice user interface. Next, work out how your interface would respond to each of those.


For example: “I want to turn the heating up” might be met with “What would you like to set the temperature to?”. Here we are narrowing and focussing the conversation from a vague statement of need, into something actionable which we can deal with.
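The post-it exercise above can be made concrete with a small sketch: many phrasings map onto one intent, and the response narrows the conversation. Real platforms match phrases statistically rather than by exact lookup; the dictionary here (and the intent name `IncreaseHeating`) is purely illustrative.

```python
# Illustrative mapping of the brainstormed phrasings onto a single intent.
UTTERANCES = {
    "i am cold": "IncreaseHeating",
    "i want to turn the heating up": "IncreaseHeating",
    "heating up": "IncreaseHeating",
    "make it warmer": "IncreaseHeating",
    "it's too cold in here": "IncreaseHeating",
}

def respond(utterance: str) -> str:
    """Match an utterance to an intent and answer with a narrowing question."""
    intent = UTTERANCES.get(utterance.strip().lower())
    if intent == "IncreaseHeating":
        # Narrow the vague statement of need into something actionable.
        return "What would you like to set the temperature to?"
    return "Sorry, I didn't catch that."
```

Each new phrasing the team brainstorms simply becomes another entry pointing at the same intent.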

The content should start with what is referred to as the happy path – a conversation flow that the voice interface can complete with minimal or no errors.

This should be followed by the flows where the user does not give all the information required to complete the action, or where the system does not completely understand the responses.
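The split between the happy path and the incomplete-information path can be sketched as simple slot filling: if the user names a temperature we act on it; if not, we prompt for just the missing detail. The phrasing pattern and replies are assumptions for illustration.

```python
import re

def set_temperature(utterance: str) -> str:
    """Happy path if a temperature was given; otherwise ask for the missing slot."""
    match = re.search(r"(\d+)\s*degrees?", utterance.lower())
    if match:
        # Happy path: all required information was provided in one go.
        return f"Okay, setting the temperature to {match.group(1)} degrees."
    # Missing information: ask only for the piece we still need.
    return "Sure. What temperature would you like?"
```

Asking for the specific missing piece, rather than restarting the whole exchange, keeps the dialogue feeling like a conversation.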

Where the interface cannot deal with the user’s question or input, it is important to capture this so that you can start to learn more about how users want to interact with your interface.

If lots of people are asking the same thing that you haven’t yet designed an interaction for it’s a good candidate to start to build into your voice interaction design in future.
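One lightweight way to capture those unhandled requests is to count them in a fallback handler, then review the most frequent ones as candidates for future interactions. This sketch keeps the counts in memory; a real app would persist them somewhere.

```python
from collections import Counter

# Tally of utterances the interface could not handle.
unhandled = Counter()

def fallback(utterance: str) -> str:
    """Record an unrecognised request and reply politely."""
    unhandled[utterance.strip().lower()] += 1
    return "Sorry, I can't do that yet."

def top_candidates(n: int = 3):
    """The unhandled utterances most worth designing for next."""
    return unhandled.most_common(n)
```

Reviewing `top_candidates()` periodically turns real user behaviour into a prioritised backlog for the voice interaction design.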

It’s important to read the conversation aloud with real users so that you can check that it flows naturally, and to get more input on how people want to communicate with your interface.

As you determine those dialogues bear in mind the tone and voice characteristics.

These tips should help you get started with designing voice interfaces. This is a fast-developing field and techniques will evolve over time as new technologies such as machine learning come into play. As always, test regularly with your users; this is an interaction like any other and you should design for needs first.

*We make a small commission if you purchase via these links