Apps That ‘Speak’: Mobile Voice Technology To Replace Typing?

By Hussain Fakhruddin | November 11, 2016

On average, fewer than 1 in 3 smartphone users worldwide were familiar with mobile digital assistants (like Google Now and Siri) in 2013. Fast forward two years and a bit, and that figure is nudging up towards the 70% mark. The overall adoption of voice technology on mobile phones is on an upswing, having jumped by a remarkable 700%+ since 2010. Rather predictably, voice-enabled apps are rapidly growing in popularity as well. In today’s discussion, we will look at how voice technology is molding the way in which average users interact with mobile applications:

  1. The speed factor

    Typing on a mobile phone – no matter how good the touchscreen keyboard is – takes time. The keyboard also hogs a section of the screen real estate. Speech-based software offers a way out of both problems. A recent research experiment found that voice-based input was around 3 times quicker than typing on a keyboard (the experiment involved entering 100-odd common English phrases). Even for Mandarin Chinese, voice input was found to be nearly 2.7 times faster than traditional typing. What’s more, voice input was also found to be more accurate than the keyboard (20% more accurate for English; a whopping 63% more accurate for Mandarin Chinese). Clearly, for faster and more accurate text entry, mobile voice technology is the way to go.

Note: The speech recognition software from Baidu was used for this experiment.

  2. Excellent for apps that require user inputs

    Apart from being quicker, speech recognition technology saves time for users in yet another way. Over the next couple of years or so, people will make a definite move towards using voice technology for interacting with applications that require regular user inputs (fitness trackers, diet apps, etc.). Instead of having to manually type the details of each meal taken and every activity/workout session, a person will be able to simply ‘tell’ his/her smartphone to record the information within the concerned application. Time constraints will cease to be an excuse for irregular updating and usage of such apps.

  3. Size matters

    Yes, Apple has upped the smartphone screen standards with iPhone 6, and Android has its own slew of ‘large-screen’ phones (from various manufacturers). However, the fact remains that the keyboard is small, the clickable items are smaller, and it can be a real pain to type something out quickly and correctly. The issue becomes more complicated when a user multitasks – say, tries to compose an email, write a text message, and search for something on the mobile web simultaneously. With voice-enabled applications, the need to use keyboards will go down, and multitasking on a smartphone (one of the biggest features of the latest handsets) will become that much easier.

Note: The technology is not quite there yet when it comes to making mobile multitasking easier for users with disabilities. In the foreseeable future, though, there might be breakthroughs in this regard.

  4. Collaboration with other technologies

    Voice-enabled apps – or mobile voice technology in general – cannot survive on their own. For the technology to be of any practical use, due emphasis has to be placed on implementing efficient Natural Language Understanding/Processing (NLU/NLP) – a key cog in making voice recognition on smartphones more advanced. Apart from speech recognition software, the spotlight will also be on devices with accurate and user-friendly text-to-speech (TTS, i.e., speech synthesis) and speech-to-text (STT) services. With these, smooth and mutually understandable two-way communication (between the average Joe and his smartphone) will be facilitated.

  5. Typing on the go made easy

    This is one of the biggest advantages of voice-enabled chat apps and social media applications. Instead of engaging your hands to type on a chat app (or to post a FB update), voice-enabled technology will allow you to just ‘converse’ with your phone for the same purposes. In note-taking apps, dictating entire paragraphs is already possible, and people have already started writing full-blown articles and blog posts (on mobiles as well as on PCs) with hands-free speech recognition technology. Admittedly, writing with voice technology does lead to a fair number of typos – but expect such rough edges to be ironed out soon enough.

Note: The recent improvements in Siri have made iPhones much better, more intelligent ‘listeners’ to users. If anything, the built-in speech recognition tools in Android phones are even more user-friendly.

  6. Models for speech technology deployment for apps

    There are two different models for speech deployment in mobile applications, and both are expected to be heavily used by app developers in the forthcoming quarters. First up, there is the ‘embedded voice tech model’, where the complete voice recognition process happens within the smartphone (i.e., locally). The other is the ‘cloud-driven voice tech model’. To use this technique, apps need strong internet connectivity at all times, since the speech-to-text transcription (and the reverse) happens in the cloud. The second is the more common deployment model at present, but embedded technologies are also on the fast track of growth.
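
The choice between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Model`, `DeviceState` and `choose_model` are hypothetical names (not part of any platform SDK), and the rule of preferring the cloud model whenever the device is online is an assumption based on the cloud model being the more common one at present.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    EMBEDDED = "embedded"  # recognition runs entirely on the device
    CLOUD = "cloud"        # audio is sent to a server for transcription


@dataclass
class DeviceState:
    online: bool            # is a usable data/wifi connection available?
    has_local_engine: bool  # is an on-device recognizer installed?


def choose_model(state: DeviceState) -> Model:
    """Pick a deployment model (hypothetical policy, for illustration only)."""
    if state.online:
        # The cloud model needs connectivity, so use it only when online.
        return Model.CLOUD
    if state.has_local_engine:
        # No connection: fall back to on-device recognition.
        return Model.EMBEDDED
    raise RuntimeError("no speech recognizer available")
```

An app following this sketch would keep working offline as long as an embedded engine is present, at the cost of shipping that engine on the device.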

  7. Voice typing to become more contextual

    More contextual voice typing should, hopefully, minimize the chances of punctuation errors, typos and other such irritating factors. Mobile digital assistants like Siri and Google Now already provide a high degree of context awareness, making it easy for users to ‘speak’ to their phones without worrying too much about spelling mistakes and grammatical errors. For instance, even if the digital assistant hears something like ‘Text Ralphet 6 pm’ when you are setting a reminder to ‘Text Ralph at 6 pm’, it will be ‘intelligent’ enough to understand that only the latter command makes sense, and will make the correction accordingly. People searching for the ‘next Knicks game’ can follow up with a query like ‘What’s their points tally?’. Voice technology is smart enough to ‘know’ that here, ‘their’ refers to the ‘Knicks’. A contextual voice interface will add an extra layer of user-friendliness to mobile apps.

Note: Understanding and distinguishing between homonyms (say, ‘pole’ as in North Pole, and ‘pole’ as in ‘pole vault’) remains a challenge for voice-enabled apps. Once again, as the technology becomes more sophisticated, this will gradually cease to be an issue.
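
The ‘Text Ralphet 6 pm’ correction described above can be approximated with plain fuzzy string matching. The sketch below is not how Siri or Google Now work internally; it simply assumes the app keeps a list of phrases that make sense in the current context (the hypothetical `known_phrases`) and uses `difflib.get_close_matches` from the Python standard library to snap a misheard transcript to the closest one. The function name and the 0.8 cutoff are illustrative choices.

```python
import difflib
from typing import List


def correct_transcript(heard: str, known_phrases: List[str], cutoff: float = 0.8) -> str:
    """Return the closest known phrase, or the transcript unchanged if none is close."""
    matches = difflib.get_close_matches(heard, known_phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else heard


# Hypothetical context: phrases the assistant considers plausible right now.
phrases = ["Text Ralph at 6 pm", "Call Mom at noon"]
corrected = correct_transcript("Text Ralphet 6 pm", phrases)
```

Anything that does not resemble a known phrase is passed through untouched, so the correction step never invents a command the user did not say.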

  8. Adoption in wearable technology

    A missing link in the usage of speech recognition technology has been its absence (to be fair, inadequate presence) on smart wearables. Going forward, as the Apple Watch, Pebble smartwatches and Samsung Gear (maybe Google Glass too, with a strong comeback) become more commonplace, there will be a high demand for custom voice-enabled applications for these devices. A smartwatch, for instance, has very little screen space – and having to actually tap on it can seriously mess up the overall user experience (UX). With voice technology, people will be able to interact with their wearable devices at any time, without obscuring their display screens in any way.

Note: The onus will be on developers to come up with smartwatch apps that blend gesture commands with voice commands.

  9. The security issue

    There are many apps that store sensitive personal information (e.g., any personal finance application or mobile budget tracker). These need an additional level of security, and voice biometrics – an integral feature of voice technology on mobile phones – delivers just that. For enterprise apps too, the use of voice biometrics is on a definite upward trend. The percentage of apps that store user data in the cloud (with the help of a backend-as-a-service (BaaS)) is increasing swiftly, and biometrics is an effective tool for guarding against unauthorized access.

  10. Not a one-shot game

    Many software and mobile app developers make the mistake of trying to incorporate speech recognition in their applications in one go. Instead, they should start by identifying the core features of a new app (and the likely behaviour-flow of users on it), and voice-enable that portion of the application first. The rest of the app can be voice-enabled later, in an iterative manner (say, in a later update). In a fairly large app, if everything is voice-based from the outset, a large section of target users might become confused.

Note: For most apps, speech recognition currently works optimally only when the supported vocabulary is somewhat limited. This constraint should also go away with time, as mobile voice technology makes more progress.
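
A small, fixed vocabulary that grows with each release could look something like the following Python sketch. Everything here is hypothetical: the diet-tracker handlers, the `VOICE_COMMANDS` table and `handle_utterance` are made-up names for illustration, and the sketch assumes the speech recognizer has already turned the audio into text.

```python
from typing import Callable, Dict, Optional


# Hypothetical core features of a diet/fitness app, voice-enabled first.
def log_meal() -> str:
    return "meal logged"


def log_workout() -> str:
    return "workout logged"


# The supported vocabulary is deliberately small; later updates can add entries.
VOICE_COMMANDS: Dict[str, Callable[[], str]] = {
    "log meal": log_meal,
    "log workout": log_workout,
}


def handle_utterance(text: str) -> Optional[str]:
    """Run the handler for a recognized phrase; return None for anything else."""
    handler = VOICE_COMMANDS.get(text.strip().lower())
    return handler() if handler else None
```

Returning `None` for unsupported phrases lets the app fall back to its regular touch interface for features that have not been voice-enabled yet.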

  11. Voice technology and IoT

    Over the next five years or so, voice technology and the Internet of Things (IoT) will become increasingly intertwined. The experience of using seamless voice-enabled mobile apps will give rise to expectations of a similar voice-driven interface inside automobiles. In ‘smart cars’, most functions – right from changing music tracks and the cabin temperature, to opening and closing doors and windows – will be carried out through voice commands to the dashboard. Car APIs will also make apps for smart cars more usable than ever. Voice technology will totally revolutionize the concept of what a ‘connected home’ can do too, in the next few years. Forget switching lights and thermostats on and off with voice; you might be able to order food or a warm bubble bath, simply by giving out voice instructions. Now that will be something, right?

  12. Cloud-based mobile voice to become real-time

    In order to ensure that implementation of voice technology does not put additional pressure on device resources (battery drain, for instance), many developers prefer the cloud deployment model. In essence, this means that voice commands to an app are transferred to the backend server for the speech-to-text conversion. The responses are generated and displayed to users after that. The entire process takes some time (typically a matter of seconds), and that introduces a lag between a voice command and a response. Network latency is a big factor in determining the extent of this lag, while the quality of the phone mic and the strength of data/wifi connectivity also make a difference. In the future, typing and performing other regular tasks on cloud-based voice-enabled apps will become a more ‘real-time’ process. Lags will continue to grow shorter, as network capabilities become more powerful.
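
To get a feel for where the lag comes from, it can be broken into three rough parts: uploading the audio, the network round trip, and the server-side processing. The Python sketch below is a back-of-the-envelope model only; the function name, its parameters and the sample numbers are all assumptions made for illustration, not measurements.

```python
def estimated_lag_seconds(
    audio_bytes: int,
    uplink_bytes_per_s: float,
    round_trip_s: float,
    server_processing_s: float,
) -> float:
    """Rough lag estimate: upload time + network round trip + server processing."""
    upload_s = audio_bytes / uplink_bytes_per_s
    return upload_s + round_trip_s + server_processing_s


# Illustrative (made-up) numbers: a 64 KB clip over a 128 KB/s uplink,
# a 0.2 s round trip and 0.3 s of server-side transcription.
lag = estimated_lag_seconds(64_000, 128_000, 0.2, 0.3)
```

With these sample values the estimate comes to roughly one second, half of which is spent just uploading the audio, which is why faster networks shrink the lag so directly.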

  13. Artificial intelligence on the rise

    Voice-enabled smartphone applications will typically rely on advanced ‘artificial intelligence’ to deliver optimal results to users. With pre-designed algorithm sets, futuristic applications will take machine learning to an altogether higher level. Apart from ensuring greater accuracy during voice-typing and greater reliability of speech recognition, A.I. will also improve two-way conversational interaction with apps, by brushing up the ‘contextual thinking’ capabilities of mobile software. For practically every element in the Internet of Things, voice technology adoption is growing – and the proliferation of artificial intelligence has a lot to do with that.

Note: Virtual and augmented reality are also likely to become core elements in select voice-enabled mobile apps. Already, new-age smart devices like music players and TVs, lights and smart helmets are available in the market – and most of them will be controllable by voice within the next 3-4 years.

  14. The big players

    Voice technology has come a long way since the days of Nuance’s ‘Dragon NaturallySpeaking’ (DNS). There have been major improvements in Siri – the smart digital assistant for iPhones – this year (Apple introduced the ‘Hey Siri’ feature in iOS 9). Siri already supports 20-odd languages, and the number is expected to grow in the future. Android’s ‘Google Now’ is, if anything, a more high-utility voice-based mobile digital assistant, with its ‘OK Google’ feature (it also supports Google Maps now) being regularly used by people across the world. On Windows Phone, there is the ‘Microsoft Cortana’ tool. Each of the 3 digital assistants is becoming more sophisticated over time, making it easier for voice-supported apps to expand their functionalities with them.

The integration of voice technology in mobile apps is closely related to the presence (and performance) of high-end sensors on mobile devices. While speech recognition is indeed a relatively ‘new’ technology, a fairly large cross-section of older users is using it for typing and other forms of app interaction. Right from launching an app to navigating through it, mobile voice technology makes everything hands-free and, hence, more convenient.

By 2017, voice recognition will be a buzzing $133 billion industry, with a CAGR in excess of 20%. Use in mobile devices and software applications has propelled the growth of this technology in a big way. In the coming years, speech recognition has all the potential to completely change the way we interact with our phones.

Maybe, just maybe, typing on keyboards is on its way out?


Hussain Fakhruddin

Hussain Fakhruddin is the founder/CEO of Teknowledge mobile apps company. He heads a large team of app developers, and has overseen the creation of nearly 600 applications. Apart from app development, his interests include reading, traveling and online blogging.