A Multimodal approach is the answer to voice.

By Gary Jobe

Voice services have grown massively over the last few years, with a quicker adoption curve than even mobile experienced years earlier. The growth of voice is based not only on the adoption of smart speakers, such as Amazon’s Alexa, but also on voice being embedded in many devices, from cars to homes and cities. Human interaction with voice services has driven the need for clearly defined models to handle user interactions. This is where conversation design comes into play.

Conversation design is the principle of designing experiences without the traditional means of a keyboard, mouse or graphical user interface. So how do we create experiences using voice? Conversations need to flow, be human-like, respond in the way users expect and, ultimately, be free of errors. Human-to-human interactions don’t have errors, so neither should people’s voice interactions with a system.

The term conversation is normally understood as the spoken word; however, it has a wider scope than that and should be seen as multimodal. Multimodal refers to the flow of conversation, the underlying logic of a system, multichannel services and the human interaction experience.

Voice alone works as a short-term approach but won’t be sustainable in the longer term. A multimodal experience is the ideal product to aim for, allowing a robust, user-focused experience that has many functions and is as useful as possible.

So an ideal multimodal approach could start with a voice conversation on Alexa but hand off to a visual interface for, say, a mapping service, or deliver a document via email based on a user request. The true power of multimodal experiences lies in embracing the wider ecosystem.
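As a rough illustration of that hand-off, here is a minimal sketch in Python. The intent names, channel names and payload strings are all hypothetical and invented for this example; they are not part of any voice platform’s API. The sketch only shows the routing decision, not the delivery itself.

```python
from dataclasses import dataclass

# Hypothetical channel names, chosen for this illustration only.
VOICE, SCREEN, EMAIL = "voice", "screen", "email"


@dataclass
class Response:
    channel: str  # where the content is delivered
    spoken: str   # what the voice assistant says out loud
    payload: str  # content handed to the chosen channel


def route(intent: str) -> Response:
    """Pick a delivery channel for a (hypothetical) user intent."""
    if intent == "find_store":
        # A map is easier to read than to listen to, so hand off to a screen.
        return Response(SCREEN, "I've put a map on your screen.", "map:nearest-store")
    if intent == "send_brochure":
        # Documents are delivered by email rather than read aloud.
        return Response(EMAIL, "I've emailed that document to you.", "doc:brochure")
    # Everything else stays in the voice conversation.
    return Response(VOICE, "Here's what I found.", "")
```

The conversation always starts and ends in voice; only the content that suits another interface is handed off to it.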

We are now developing multimodal experiences for clients. I’d love to say we started a few years back with a fully defined model but, as with all new processes, it took a few failures and pivots in approach to get to this stage in our journey.

How we got to conversation design

We started building voice services a few years back, before smart speakers had even hit the UK market, sourcing the first Alexa device from the US. We used it in its simplest form for client prototypes and pitches. Most executions triggered responses on keywords, as opposed to training any machine learning model. They were very singular in their responses and, at the time, were siloed from wider ecosystems.
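To give a feel for how limited keyword triggering is, here is a minimal sketch in Python. It is not the code we used at the time; the keywords and replies are invented for illustration.

```python
from typing import Optional

# Invented keywords and replies, for illustration only.
KEYWORD_RESPONSES = {
    "opening": "We're open nine to five, Monday to Friday.",
    "price": "Prices start from ten pounds.",
}


def respond(utterance: str) -> Optional[str]:
    """Return the reply for the first keyword found, or None if nothing matches."""
    words = utterance.lower().split()
    for keyword, reply in KEYWORD_RESPONSES.items():
        if keyword in words:
            return reply
    return None
```

Here `respond("What are your opening hours?")` finds a reply, but `respond("How much does it cost?")` returns `None` because the word "price" never appears. The same question phrased differently simply falls through, which is why these early executions felt so singular.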

However, what we have seen since those early days is a shift away from what we’d call bottom-up approaches, with technology as the first step, towards more of a top-down approach. We now centre our approach on the user experience and the user need, integrating with our clients’ wider ecosystems. In some instances we have started by developing content hubs, from which we can easily abstract content for a voice service and from which the client’s wider ecosystem can also benefit, be that customer services, web or email marketing. Having this wider access to the ecosystem enables us to deliver truly purpose-driven multimodal solutions.

Our conversation design approach allows us to map out what users can do in the space, whilst understanding the user needs and the technical constraints of the platform. As we move deeper into a project using this methodology, we can guide and develop the conversation, defining the flow and the logic to build a robust user experience. This approach can, and in most cases does, draw on other aspects of the client’s wider ecosystem, making the solution the best it can be. The conversation design model encompasses copywriting, user interaction, audio design and development skill sets to get to the desired solution.

Analytics and measurement also become vital. The beauty of any conversational model is that we have a real-time user feedback loop. We can see how users are interacting with the service, identify their needs and use this to redefine the product, making it better and more useful moving forward.
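One simple way to picture that feedback loop is to count which requests the service handled and which it missed. The sketch below is in Python and assumes a hypothetical log of (utterance, matched intent) pairs, with `None` where nothing matched; real analytics tooling will look different.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical log format: (what the user said, the intent it matched or None).
LogEntry = Tuple[str, Optional[str]]


def summarise(log: List[LogEntry]) -> Tuple[Counter, Counter]:
    """Count matched intents and unmatched utterances in an interaction log."""
    matched = Counter(intent for _, intent in log if intent is not None)
    unmatched = Counter(text.lower() for text, intent in log if intent is None)
    return matched, unmatched
```

The most common entries in the unmatched counter are the needs users are expressing that the service does not yet meet, which makes them natural candidates for the next iteration.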

The future

Conversation design has helped us develop and design better products for our clients. As voice services mature, simple interactions become less useful and gimmicky experiences serve no useful purpose. Rich, engaging services that have purpose and fit with a company’s wider ecosystem are now the norm. True multimodal experiences, handing off content to the right interface at the right time, will only enhance product development.

As the voice platforms provided by Google, Microsoft and Amazon mature, with additional features added frequently, the user experience becomes more vital. The market will mature, with synthetic voice offerings, voice SEO and embedded voice becoming more common across smart homes and cities.


Conversation design is about teaching computers to be more human-like in their conversation and conventions. This is easier said than done, but following these principles can help:

Do what humans do: Conversations with systems should feel natural, frictionless and above all intuitive.

Adapt to the technical constraints: We want to replicate human-to-human behaviour, but computer systems aren’t human, so errors will occur. We need to work around this and other constraints, such as invocation wake words, e.g. “OK Google” or “Alexa”*.

Make the most of the technical strengths: Computer systems can be more powerful than humans, as they never tire of questions. They can also easily find and share information, and can exceed expectations.

As we embrace voice devices the relationship between humans and bots will develop, and for that to be successful, they need to learn how to communicate with each other. That’s where the Conversation Design model will show its true value and purpose.

*These may well be phased out over the next 12 months. If so, it will make SEO more critical.

Gary Jobe
Head of Technology and Innovation, ekino.