Trying to get your voice heard? Well, you are not alone.
We’ve all had our moments with voice assistants, some freaky and some downright silly. ‘Importance of being heard’ has a whole new meaning nowadays! As if being ignored by our governments, families and friends was not enough, we are now treated the same even by our voice assistants! So hurtful. Psychologists might soon need to conduct sessions like:
‘Heal Your Soul - Listen and be heard by Siri’
‘Mend the communication gap with Alexa’
‘When everything is not OK with Google’
It’s a love-hate relationship with our dear voice assistants: we can’t live with them, and we can’t live without them. But even with all our frustrations, the entertainment quotient that voice assistants (VAs) bring to our lives is undeniable.
We are surrounded by VAs, be it Google Assistant, Siri or Alexa. In fact, if we start counting all the devices used by our family and friends, there are probably more VAs around us than humans. Everyone wants to own them, talk to them and feel like they are part of some futuristic setting. But how often do we really use them, and for what? Honestly, I rarely do. ‘Voice Match’ (the perpetually listening mode) on my phone is always off since it drains the battery. It also becomes annoying when activated by anything even remotely matching ‘Hey Google’. I prefer to just tap the mic in case I need Google to hear me. In fact, I mostly end up executing the task myself, rather than nurturing a relationship with The Voice. Even the Echo speaker in our office gets used only to play music. It usually doesn’t even understand the songs we want to hear, so we control the music ourselves via Bluetooth. Alexa might start feeling left out soon enough.
Never mind my world, the numbers show that more and more people are collecting voice assistants in one form or another! There is a supernova-like boom in the VA market. There are so many of them nowadays, with more on the way. To get an overall picture, we have divided the VAs as follows:
The boundaries of usage are blurred and really depend on the need of individuals, often leading to a collection of VAs for the home and on the move. Let’s look at some numbers to get a sense of the global rankings and their standing in the Indian market.
And then there are the smart speakers! While Amazon’s Alexa occupies the largest piece of the pie in the US market, the story is quite different for worldwide rankings. Google leads the smart speaker market worldwide and is consistently growing.
At its I/O 2018 developer conference, Google said that Google Assistant has more than 500 million active users globally! As for India, Google Assistant usage has grown three-fold since the start of this year. In December 2017, Google disclosed that 28% of search queries in India are made via voice, and Hindi voice search queries have grown at a rate of 400% year-on-year. According to a separate estimate by Statista, 39% of the Indian population will have a digital voice assistant by the end of 2018.
The future looks bright for this industry, but here are some key questions that we should be asking:
Voice assistants in India are making progress, but they still have a long way to go. India has 22 scheduled languages, spoken in a multitude of accents and dialects, yet 90% of the current digital voice assistants communicate only in English. This alienates the majority, who either don’t know English or don’t speak it in an accent that the present set of voice assistants understand. As a result, people get frustrated and eventually give up (a situation similar to the one in the video below).
Another aspect that impacts the comfort level is the location of use. As per a Creative Strategies study conducted in the US, people prefer using VAs mostly in their cars or homes – more of a personal space.
There are numerous benefits of VAs, such as helping us navigate, controlling smart home devices or reciting recipes. However, the limited use of VAs in public spaces leads us to speculate about the discomforts that people have:
Safety & Security concerns: No one wants strangers to overhear their conversations. Talking to a VA in a public space can make everyone privy to one’s personal appointments, the places one wants to go, one’s interests and so on. This could lead to serious safety and security concerns if the wrong people are listening.
Source of embarrassment: Getting tasks accomplished by VAs is enjoyable, but struggling to get a VA to carry out a task, or even to simply understand what one is saying, can often become a source of embarrassment. Most people avoid getting into a verbal battle with their VAs in a public space. This massively reduces the scope of interaction, especially when one is out or even at the workplace.
The biggest concerns around Artificial Intelligence (AI) are privacy and security. In the case of voice-activated VAs, which are always “listening” in order to be triggered by a wake word like ‘Hey Google’ or ‘Alexa’, people worry that their conversations are being monitored. Although companies clarify that recording starts only when the wake word is spoken, there is apprehension about how much privacy smart speakers really allow.
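To make the “always listening vs. always recording” distinction concrete, here is a minimal sketch of wake-word gating. The class, names and thresholds are hypothetical simplifications (real assistants run on-device neural keyword spotters on raw audio, not text); the point is only that a short rolling buffer is kept locally and nothing is sent anywhere until the wake word is detected.

```python
# Hypothetical sketch: a rolling buffer that "hears" everything locally
# but only flags recording once the wake word appears.
from collections import deque

WAKE_WORD = "hey google"
BUFFER_SECONDS = 2  # only a short rolling window is kept in memory


class WakeWordGate:
    def __init__(self, frames_per_second=10):
        # Old frames fall off the left end of the deque and are discarded,
        # never uploaded.
        self.buffer = deque(maxlen=BUFFER_SECONDS * frames_per_second)
        self.recording = False

    def on_frame(self, frame_text):
        """Feed one (locally transcribed) audio frame; return recording state."""
        self.buffer.append(frame_text)
        if not self.recording and WAKE_WORD in " ".join(self.buffer):
            self.recording = True  # only from this point would audio leave the device
        return self.recording


gate = WakeWordGate()
print(gate.on_frame("hey"))     # still waiting
print(gate.on_frame("google"))  # wake word detected, recording begins
```

In this toy model, everything before the wake word stays in a bounded local buffer and simply expires, which is essentially the assurance the companies offer.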
In these times of hacks, data breaches and cyber-attacks, it is easy to imagine the dangers. People are aware that granting complete access to any AI-based service comes with its own set of risks. For example, VAs have permission to contact people, make purchases and control home devices – all activated just by the user’s voice. Researchers at China’s Baidu have now confirmed that they have created a system that lets an AI mimic someone’s voice after analyzing less than a minute of their speech, which leaves us with the possibility that our voice (our only control) could be synthesized to issue instructions on our behalf.
Companies recognize this fear and are working towards solutions that help alleviate these concerns, like letting users control their recording history and voice purchasing, or making devices more self-sufficient so as to limit data exchange over the internet. However, the dilemma is that the greater the restrictions, the lesser the functionality of the VA. It’s a difficult trade-off for users to make.
Another concern with AI-based voice assistants is their impact on child development. It is so easy for a child to use a wake word and ask a VA anything, or even command it to carry out a task. In early 2018, Amazon released a Kids Edition of the Echo Dot, designed to help children use its Alexa smart speaker. This edition includes safety features, special content and the requirement to say “please” with requests. But imagine having Alexa always at their beck and call: will this actually help children learn manners, or will it lead to social withdrawal and a sense of entitlement? The long-term effects that AI will have on children are unexplored territory; they could be both positive and negative, but children are clearly growing fond of it. In fact, as the New York Post reported, ‘Alexa’ was a baby’s first word in the UK, spoken even before ‘Mum’ or ‘Dad’.
VAs without visual interfaces are discriminatory in the sense of being inaccessible to the speech-impaired population. Voice interfaces in general also have limited functionality – whether it is the inability to view photos and videos, look at products or go through a long list. We would agree that even today, a picture is worth a thousand words. Our brains understand complex information better when we see it visually. As per studies, sight contributes the most to processing stimuli (83% sight, 11% hearing, 3% smell, 2% touch and 1% taste). We understand each other not just with words but also with nonverbal cues like body language, gestures and micro-expressions.
Thus, voice alone cannot provide the same experience as one coupled with a visual interface. A visual display is what enhances the appeal and experience of an Amazon Echo Show or Google Home Hub over plain smart speakers.
Voice assistants are evolving; they continue to learn with every use, and this opens up a host of possibilities. Here are some predictions for the future of voice assistants:
For the purpose of this study, to ensure each VA had a visual interface support, we focused on the following voice assistants:
Detailed UX Scorecard
This UX Scorecard ranks each VA across 9 variables and each variable is scored out of 5 points.
VA’s language, accent, tone, speed of speaking, choice of words, emoticons and intonations.
The ability to decode the user’s speech into words.
The ability to understand what the user is talking about, especially with respect to the user’s environment (culture, location etc.)
The quality of VA’s responses – whether voice only or accompanied by visual cues and interactive possibilities.
The ability to accomplish the tasks requested by the user.
The ability to explain what is done with the data and recordings to help ease the privacy concerns and allay the AI freak factor.
The ability of the VA to explain its skills and capabilities to the user.
The ability of the VA to handle misrecognized speech or commands.
Possibility of multi-modal interactions with the VA – visual cues, option to type, suggestive buttons.
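The nine variables above can be tallied into a simple total. The sketch below shows one way such a scorecard could be aggregated; the variable keys and the 3/5 scores are illustrative placeholders, not our actual results.

```python
# Hypothetical scorecard tally: nine variables, each scored out of 5,
# summed into a total out of 45. Variable names are shorthand labels
# for the criteria described in the article.
VARIABLES = [
    "persona", "speech_recognition", "context_understanding",
    "response_quality", "task_completion", "privacy_transparency",
    "discoverability", "error_handling", "multimodality",
]
MAX_PER_VARIABLE = 5


def total_score(scores: dict) -> int:
    """Sum the nine per-variable scores (each 0-5) into a total out of 45."""
    assert set(scores) == set(VARIABLES), "every variable must be scored"
    assert all(0 <= s <= MAX_PER_VARIABLE for s in scores.values())
    return sum(scores.values())


example = {v: 3 for v in VARIABLES}  # placeholder: 3/5 on everything
print(total_score(example), "out of", len(VARIABLES) * MAX_PER_VARIABLE)
```

A real scorecard might also weight the variables differently (e.g. task completion mattering more than persona), but an unweighted sum keeps the comparison easy to read.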
The interaction with Google Assistant [GA] seems more natural, especially with the use of emoticons, which are absent in interactions with Siri or Alexa. The responses from Google Assistant seem closer to what a person would say, consider the following examples:
Question: Will it rain in Delhi the day after tomorrow?
Answers:
GA: No. It won’t rain on Saturday in Delhi. It’ll be foggy, with a high of 29 and a low of 14.
Alexa: No rain is expected in Delhi the day after tomorrow.
Siri: Checking the weather for Delhi; there is no rain in the forecast for New Delhi on Saturday.
Question: What do you think of [Siri/ Alexa/ Google Assistant]?
Answers:
GA: It'd be nice if my home was as tall as Alexa, I'm not complaining though I like how cosy this is. / She seems funny.
Alexa: I’m partial to all AIs / I like all AIs
Siri: I think the acquisition of information and intelligence by human beings through virtual assistance is a very good thing. / I offer no resistance to helpful assistants.
You get a sense of GA’s “personality” – there is a clear answer or an opinion. Alexa and Siri, by contrast, give plain-Jane answers, which do little to help users connect.
What makes human language so complex is understanding the context in which the speech is made. For example, ‘Do you have a mouse?’ could mean the animal or the device, depending on the surroundings or the conversation topic. The participants of the conversation would easily understand this, but this is the trickiest part for voice assistants.
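The ‘mouse’ ambiguity above is a classic word-sense disambiguation problem. The toy sketch below illustrates the idea with a hand-built cue-word table; real assistants use statistical language models, but the principle of scoring each sense against the surrounding context is the same. All names and cue lists here are made up for illustration.

```python
# Toy word-sense disambiguation: pick the sense whose cue words
# overlap most with the words around the ambiguous term.
SENSES = {
    "mouse": {
        "animal": {"cat", "trap", "cheese", "tail"},
        "device": {"computer", "usb", "click", "cursor"},
    },
}


def disambiguate(word, context_words):
    """Return the sense of `word` best supported by the context."""
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & set(context_words)))


print(disambiguate("mouse", ["my", "computer", "needs", "a", "click"]))
print(disambiguate("mouse", ["the", "cat", "caught", "a"]))
```

With no overlapping cues at all, a model like this can only guess, which is exactly the situation a VA is in when a query arrives without conversational history.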
For the questions asked in this research, Google Assistant performed exceptionally well in decoding the context. It understood and incorporated local and cultural references. To elaborate:
Besides just understanding local contexts, Google Assistant carried out simple tasks efficiently, decoding even complicated speech to correctly interpret the requirement.
User: Could you remind me to pick up my laundry?
GA: When do you want to be reminded?
User: 5pm, day after tomorrow
GA: Sorry, when or where do you want to be reminded?
GA: [saves reminder as 5pm tomorrow] Do you want to save this?
User: No, I want to be reminded day after tomorrow.
GA: [changes date] Ok, do you want to save this?
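The reminder exchange above is an example of what dialogue designers call slot filling: the assistant keeps prompting until the “what” and “when” slots are filled, then asks for confirmation. The sketch below models that loop; it is an illustrative toy, not Google’s implementation, and all the function and slot names are our own.

```python
# Toy slot-filling loop modelling the reminder dialogue: prompt until
# both slots ("what" and "when") are filled, then confirm.
def reminder_dialogue(turns):
    """Run scripted user turns through a tiny slot-filling loop.

    Each turn is a dict that may supply the "what" and/or "when" slot.
    Returns the assistant's reply after each turn.
    """
    slots = {"what": None, "when": None}
    replies = []
    for turn in turns:
        for slot in slots:
            if turn.get(slot):
                slots[slot] = turn[slot]  # later turns can revise a slot
        if slots["what"] is None:
            replies.append("What do you want to be reminded about?")
        elif slots["when"] is None:
            replies.append("When do you want to be reminded?")
        else:
            replies.append(f"Reminder: {slots['what']} at {slots['when']}. Save it?")
    return replies


script = [
    {"what": "pick up laundry"},          # user gives only the task
    {"when": "5pm, day after tomorrow"},  # user supplies the time
]
for reply in reminder_dialogue(script):
    print(reply)
```

Note how the loop lets a later turn overwrite an earlier slot value, which is what allowed the user above to correct the date from “tomorrow” to “day after tomorrow” without restarting.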
The Google Assistant app allows three ways of interacting with the system at all times:
This flexibility helps improve the user experience and accessibility of Google Assistant. The user is not limited to only voice interaction.