Voice control is not the future so many people imagine — the Amazon Echo-like one where we no longer need buttons or apps because we can speak to our devices whenever we need them, the one where technology will understand our subtleties, our preferences, our desires, and take action based upon our casual voice commands. It’s a bit more complex than that in reality. Here’s what we need to think about.
Mark Zuckerberg, the founder of Facebook, recently put together his own home AI system he called “Jarvis”. In his post reflecting on a year of building it, he noted something that so many people tend to miss when talking about voice recognition tech — it is not always appropriate to use voice commands to control your tech.
“One thing that surprised me about my communication with Jarvis is that when I have the choice of either speaking or texting, I text much more than I would have expected. This is for a number of reasons, but mostly it feels less disturbing to people around me.” — Mark Zuckerberg (emphasis mine)
“If I’m doing something that relates to them, like playing music for all of us, then speaking feels fine, but most of the time text feels more appropriate. Similarly, when Jarvis communicates with me, I’d much rather receive that over text message than voice. That’s because voice can be disruptive and text gives you more control of when you want to look at it. Even when I speak to Jarvis, if I’m using my phone, I often prefer it to text or display its response.”
I’ve had a similar experience with my Amazon Echo and smart light in my bedroom. When all of my family are sound asleep and I want to turn on my light, saying “Alexa, turn on my light” has a few drawbacks: I have to speak aloud into a silent room, Alexa responds aloud in turn, and one mistimed command could wake the whole house.
So in the end — late at night, I get out my phone and press the button on the LIFX Android widget that’s on my phone. The ability to still use my phone as a backup way to provide commands is hugely important. There is definitely huge potential for voice commands in our Internet of Things, home automation and artificial intelligence systems of the future, but we need to ensure we develop these systems with alternatives to voice commands in place as well.
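Those non-voice fallbacks don’t have to be an official widget, either. LIFX, for example, exposes a public HTTP API, so a silent button or script can flip the light without a word spoken. Here’s a minimal sketch in Python using only the standard library — the token is a placeholder you’d generate from your LIFX account, and the `label:Bedroom` selector assumes a light named “Bedroom”:

```python
import json
import urllib.request

LIFX_API = "https://api.lifx.com/v1/lights"
LIFX_TOKEN = "YOUR_LIFX_API_TOKEN"  # placeholder: create a token in your LIFX account settings

def build_state_request(selector: str, power: str) -> urllib.request.Request:
    """Build the PUT request that sets a light's power state ("on" or "off")."""
    body = json.dumps({"power": power}).encode()
    return urllib.request.Request(
        f"{LIFX_API}/{selector}/state",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {LIFX_TOKEN}",
            "Content-Type": "application/json",
        },
    )

def set_light(selector: str, power: str) -> None:
    """Send the request — e.g. set_light("label:Bedroom", "on")."""
    urllib.request.urlopen(build_state_request(selector, power))
```

A single tap on a home-screen shortcut wired to something like this is completely silent — no wake word, no spoken confirmation.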
If Mark’s Jarvis-assisted home piques your interest, here’s a video:
Mark mentions texting his AI as an alternative way to issue it commands, as it allows you to have a back-and-forth conversation while keeping things quiet and personal. He also makes an important point — people aren’t moving towards making more phone calls or communicating over voice more frequently… they’re turning to texting and online messaging much more regularly. So getting them to switch habits back to speaking at their devices might not be the way people want to work:
“This preference for text communication over voice communication fits a pattern we’re seeing with Messenger and WhatsApp overall, where the volume of text messaging around the world is growing much faster than the volume of voice communication. This suggests that future AI products cannot be solely focused on voice and will need a private messaging interface as well.” — Mark Zuckerberg
I’ve got a bit of an issue with the idea of my AI being solely controlled by voice and chatbot — is typing out “Alexa, turn on my light” any more convenient than just flicking a physical switch? I think we will need three control aspects in our systems going into the future:

1. Voice commands
2. Text/chatbot-style messaging
3. Visual buttons and gestures
The visual buttons and gestures are important for those common, quick actions — like turning on a light really late at night. The buttons could be provided in any number of ways: a widget on an Android phone, a button in a mobile app, or even a button floating in the air in a pair of augmented reality glasses. It could also be a button on a touchscreen panel with shortcuts to all sorts of functionality for the whole house.
If our AI can see us, or interpret our movements, then gestures could also be an alternative. If I could shush my Echo by putting my finger to my lips, or turn on my light with a particular wave of my hand, that could solve my late night smart light problem. Either way, we need to have some form of physical interaction still in play.
This is a whole area I’ll be exploring in detail in a later article (Update: 30th Dec 2016, I’ve now got a whole piece asking “Is the Amazon Echo always listening?” which covers this and more in detail!), but there is still a huge (and very vocal) part of society that feels very unsettled by the concept of having microphones listening to them all the time, even if the devices only submit your voice data to the cloud after a wake word like “Alexa” or “Ok Google” is heard. News headlines show the public’s concern, including “Amazon Echo may be listening in on your conversations in your Vegas hotel room“, “Amazon Echo ‘Always Listening’ Feature Worries Security Experts“, “The FBI Can Neither Confirm Nor Deny Wiretapping Your Amazon Echo” and “Goodbye privacy, hello ‘Alexa’: Amazon Echo, the home robot who hears it all“. You can’t wait to have one in your home, right?
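The wake word behaviour those headlines gloss over can be pictured with a toy sketch: audio is processed locally and discarded until the wake word is matched, and only what follows gets sent to the cloud. This is purely an illustration of the idea, not Amazon’s or Google’s actual implementation — real devices match audio patterns on-device, not transcribed text:

```python
WAKE_WORD = "alexa"

def frames_sent_to_cloud(transcribed_frames):
    """Toy model of wake-word gating: frames are thrown away locally
    until the wake word is heard; only what follows is uploaded.
    (Illustrative only -- real devices match raw audio, not text.)"""
    sent = []
    awake = False
    for frame in transcribed_frames:
        if awake:
            sent.append(frame)
            continue
        if WAKE_WORD in frame.lower():
            awake = True  # wake word triggers streaming of what follows
    return sent

# Everything said before "Alexa" stays on the device:
# frames_sent_to_cloud(["private chat", "Alexa", "what's the time?"])
```

The worry, of course, is exactly the part this sketch takes on faith: that the gating really happens on the device and really discards everything before the wake word.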
Having a microphone in your home that is always listening for a wake word is understandably worrying if you don’t trust the likes of Amazon, Google, Microsoft, Apple, governments, law enforcement and others to be responsible and ethical with how they run these systems. Legally, according to Joel Reidenberg, director of the Center on Law and Information Policy at Fordham Law School in New York City, there’s a danger in having this sort of tech in your home too:
“These devices are microphones already installed in people’s homes, transmitting data to third parties. So reasonable privacy doesn’t exist. Under the Fourth Amendment, if you have installed a device that’s listening and is transmitting to a third party, then you’ve waived your privacy rights under the Electronic Communications Privacy Act.” — Joel Reidenberg
With all of those concerns, it seems unreasonable to assume that everyone in society will eventually want to use voice commands as their go-to way of working with tech. That’s not to say people aren’t interested: a report from Consumer Intelligence Research Partners estimates that Amazon has sold “5.1 million of the smart speakers in the U.S. since it debuted two years ago”, and the Echo completely sold out this Christmas season. That’s plenty of homes ready and eager for voice commands! However, it can’t be the sole method of human-computer interaction.
Leading on from the privacy concerns, our AI control systems and speech recognition need to get good enough that they can run on our own devices and networks. Not only that, but we need to have control over that data, where it is stored and how it is used. The cloud is a huge limitation for my Amazon Echo, as my Internet connection can be a bit iffy. There are times of the day when my Echo just can’t answer me, despite the fact that my questions were simple enough that they should be answerable without an Internet connection. While it’s a tough ask right now, as AI and speech recognition are complex and require a lot of computational power, eventually we need to get to a local, personal assistant which is truly personal. I’d love a personal AI whose memories are solely within it and whose ability to understand what I say is also completely contained within it. That’d be a great place to be! (Maybe I’ll have a way to back it up at times though… just in case my robot runs into trouble some day?)
One of the issues I have with Alexa right now is that I don’t always want it to speak back to me. Waking up the whole family is just one case, but there are others: if the room is really noisy and I somehow manage to ask it a question, what happens when I can’t hear its response? Having a way for it to message me would be great in these situations! I think this issue will be short-lived, as headphones like Apple’s AirPods, Hear One smart earbuds, Vinci headphones, Vi biosensing earphones and many other smart headphones will soon bring the concept of portable AI voice feedback to the masses! These headphones allow AI to talk to you privately, wherever you are. I love that concept. The harder part is responding! Even the voice control on Vi headphones, which is targeted at fitness use, would be awkward to use in a gym. I personally wouldn’t want to hear everyone’s personal requests for their heart rate or a song change made over voice in a gym. That’d drive me insane. Voice control just doesn’t work here.
Vinci, pictured above, also has a screen on the side which you can swipe and interact with, so that could be a good way to respond to your AI at times when it’s not appropriate to talk.
Right now, there’s very little to stop someone entering my home and saying “Alexa, what’s on my calendar for today?” or any number of other statements that could give them access to services I’ve linked to my Echo. Google seems a little better (only a little!), as it tends to recognise only my voice the majority of the time. The voice control of the future needs to be certain it knows whom it’s talking to when responding to potentially sensitive queries like “can you open my front door?” or “what’s my bank balance?”. Things could get very messy if everyone in an office has their own AI they’re controlling via voice — we don’t want everyone’s Echo going off at once with every question, right? This is one point I think we’ll manage to achieve quite soon, either by setting unique wake words or by training voice control systems to recognise only certain people.
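One common way to tackle the “who is talking?” problem is speaker verification: each user enrols a voiceprint (an embedding produced by a speaker recognition model), and an incoming command is only accepted when its voiceprint is close enough to an enrolled profile. A toy sketch of that comparison step, with plain lists of numbers standing in for real model embeddings and a made-up threshold:

```python
import math

def cosine(a, b):
    """Cosine similarity between two (non-zero) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_enrolled_speaker(voiceprint, enrolled_profiles, threshold=0.85):
    """Accept a command only if the speaker's voiceprint is close enough
    to at least one enrolled profile. The vectors and the 0.85 threshold
    are illustrative stand-ins for what a real model would produce."""
    return any(cosine(voiceprint, p) >= threshold for p in enrolled_profiles)
```

A sensitive request like “open my front door” would then be gated behind `is_enrolled_speaker` before the assistant acts, while harmless queries could stay open to anyone in the room.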
While I think we’ve got a long way to go with voice recognition and voice control, I do think there is huge potential in the technology — we just need to design our systems in a way that is feasible and works in all situations. Let’s avoid going all in and taking away every other interface for interaction. Let’s be cautious with security and privacy. I’m already so used to saying “Alexa, what’s the time?” that I miss being able to do it when I’m not home!