Is the Amazon Echo always listening? How does it work? Does it store its recordings? If so, can you delete them? Does it really only send audio to Amazon after hearing the wake word? I decided it was time to do some research and give a thorough answer to those questions and more.
This piece is a tad long and will be updated with more info as time goes on.
The Echo has seven microphones it uses to keep a virtual ear out for when you say a particular wake word (by default, it is “Alexa” but you can change it to “Echo” or “Amazon” too). Amazon explain how they handle your voice data when you ask a question like so:
“Amazon Echo and Echo Dot use on-device keyword spotting to detect the wake word. When these devices detect the wake word, they stream audio to the Cloud, including a fraction of a second of audio before the wake word.” — Amazon
Basically, the Amazon Echo device itself doesn’t understand you. It only understands a few wake words which it recognises and then begins recording you to send your question to Amazon. Amazon then will work out what you asked and provide Alexa with a response over the web.
It’s a great question — how in the world can they send a fraction of a second of audio without always recording and storing audio from you? Kieren McCarthy from The Register luckily covers this point. It turns out, they’ve got a really small buffer of stored audio — a fraction of a second. They don’t seem to store it after that point, so from what I understand, there shouldn’t be a huge history of recordings stored on the Echo itself.
“The machine constantly stores a fraction of a second of audio in a buffer and when it hears “Alexa” relays everything from that buffer until there is a pause in your speech of roughly half a second to Amazon’s servers. The servers then make sense of what you just said and respond accordingly.” — Kieren McCarthy
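The buffering described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Amazon's implementation: the 20 ms frame length, the function name and the two detector callbacks are all hypothetical, and a real wake word spans many frames rather than one. The half-second figures come from the quotes above. The point it shows is that a fixed-size buffer keeps forgetting old audio, so only the last fraction of a second is available when the wake word fires.

```python
from collections import deque
from typing import Callable, Iterable, List

FRAME_MS = 20          # assumed frame length, not a figure from Amazon
PREROLL_MS = 500       # the "fraction of a second" kept before the wake word
END_SILENCE_MS = 500   # "roughly half a second" of pause ends the request


def stream_after_wake_word(
    frames: Iterable[str],
    is_wake_word: Callable[[str], bool],
    is_silence: Callable[[str], bool],
) -> List[List[str]]:
    """Return, for each request, the frames that would leave the device."""
    # A deque with maxlen drops its oldest frame whenever a new one arrives.
    preroll = deque(maxlen=PREROLL_MS // FRAME_MS)
    silence_needed = END_SILENCE_MS // FRAME_MS
    uploads: List[List[str]] = []
    current = None   # None means idle: nothing is being sent anywhere
    silent_run = 0

    for frame in frames:
        if current is None:
            preroll.append(frame)
            if is_wake_word(frame):
                # Wake word heard: the upload starts with the buffered frames.
                current = list(preroll)
                preroll.clear()
                silent_run = 0
        else:
            current.append(frame)
            silent_run = silent_run + 1 if is_silence(frame) else 0
            if silent_run >= silence_needed:
                # A long enough pause closes the request and returns to idle.
                uploads.append(current)
                current = None

    if current is not None:
        uploads.append(current)
    return uploads
```

Feeding it a long run of background chatter, then a wake word frame and a short question, produces a single upload that contains only the last 24 chatter frames before the wake word. Anything said after the closing pause stays in the local buffer and is overwritten.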
How much potential logging and voice data could they be storing? About 250MB of data can be temporarily stored on the device, but it doesn’t sound like the buffer uses all of this space, so I’d look at it as a worst case scenario for locally stored data:
“The good news for Echo owners is that there’s very little data on the device itself. The hardware includes only 4GB of storage, and that’s mostly taken up by device firmware. The ephemeral data in the device’s 250MB of RAM is more tempting for forensic experts, but the data is quickly wiped by restarting the device.” — The Verge
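To put that 250MB worst case into perspective, here is a quick back-of-the-envelope calculation. The audio format is purely an assumption (uncompressed 16 kHz, 16-bit, single channel, which is common for speech recognition); none of the sources say what the Echo actually uses internally.

```python
SAMPLE_RATE_HZ = 16_000   # assumption: a common rate for speech recognition
BYTES_PER_SAMPLE = 2      # assumption: 16-bit samples
CHANNELS = 1              # assumption: one mixed-down channel


def seconds_of_audio(num_bytes: int) -> float:
    """How many seconds of uncompressed audio fit in num_bytes."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return num_bytes / bytes_per_second


ram_bytes = 250 * 1024 * 1024
worst_case = seconds_of_audio(ram_bytes)
print(worst_case)                          # 8192.0 seconds
print(f"{worst_case / 60:.0f} minutes")    # 137 minutes

# Going the other way: a half-second buffer needs very little memory.
half_second_bytes = int(0.5 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
print(half_second_bytes)                   # 16000 bytes
```

So even if, implausibly, every byte of RAM held audio, that would be a little over two hours under these assumptions, while the half-second buffer described earlier would need only about 16KB.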
Jonathan Rajewski has looked at a range of IoT devices and found that some store less data than others. According to a statement by the Verge on his research, he also looked at the Google Home app, which “stores almost no local data at all”.
Even still, getting to that data is tough, even if it is there:
“the Echo has no data ports and storage can only be reached by physically removing the components or connecting directly to a pinout on the circuit board. The result is delicate enough that most analysts steer clear of the Echo unit entirely.” — The Verge
The Amazon Echo shows you when it is recording and sending audio to Amazon’s cloud by lighting up the ring on top of the device. Amazon explains that “the light ring around the top of your Amazon Echo turns blue, to indicate that Amazon Echo is streaming audio to the Cloud”.
Are Amazon bugging your house with secret listening devices? The consensus is no, they aren’t, and I think a lot of tech people have checked on their own — but not many have published their findings! I decided to run my own tests to answer one question — is Amazon always listening and sending conversations to the cloud even when I’m not asking it questions?
No. It doesn’t look like it is.
Wireshark, an open source and rather popular network packet analyser, wouldn’t quite work for me as much of my network was over Wi-Fi and running Wireshark would only show me traffic going through my laptop. That wasn’t showing me much! So instead, I set up a Raspberry Pi 2 as a router using OpenWRT and connected my Echo to it over Wi-Fi. Basically, I isolated the Echo on its own network of sorts*. The main advantage of using this form of router — I could see graphs of network traffic and stats on what devices are doing what. My home router sadly couldn’t do this on its own (disappointing right?).
Looking at the realtime logs from OpenWRT, before and after asking Alexa the question “Alexa, how are you?”, there is a clear spike of data when I ask the question and it drops back down to a pretty calm state after that. There’s another small spike a bit later, but I don’t think it’s big enough to be more voice data and was more likely to be other background tasks on my network:
I also ran bmon, another network analyser, to get a bit more detail. Initially, the network graph looks very similar:
Digging further into bmon, I could see where the traffic was going. There appears to be one particular Amazon IP address that the voice data was being sent to. Before asking Alexa a question, that address is mostly silent. A little data does go to Amazon AWS, but only 804 bytes, so it’s not likely to be voice data or anything related to your conversations.
After asking Alexa a question, that traffic spikes for that particular IP address and we have a clear chunk of data sent to Amazon:
There was also one study entitled “A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic” which found similar results. They watched the Amazon Echo’s network traffic and could see traffic spikes around when they asked it a question, but relative quiet before and after that:
There appears to be a little bit of data that gets sent over every so often, but it does not look like any conversations are being streamed when the Echo’s ring is not blue. So for now, it appears Amazon is doing what it says it is doing — sending voice data to the cloud only once it hears the wake word.
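The before/after comparison above can also be done with a few lines of code instead of eyeballing graphs. The following is a minimal sketch, not the tooling used in the test: the function name, the threshold values and the per-second byte counts are all made up for illustration (only the 804-byte figure echoes the bmon reading above). It treats the median as the quiet baseline and flags any second that is far above it.

```python
from statistics import median
from typing import List


def find_spikes(
    bytes_per_second: List[int],
    factor: float = 20.0,   # hypothetical: how far above baseline counts as a spike
    floor: int = 5_000,     # hypothetical: ignore anything smaller than this
) -> List[int]:
    """Return the indices of seconds whose traffic stands out from the baseline."""
    baseline = median(bytes_per_second)
    threshold = max(floor, factor * baseline)
    return [i for i, count in enumerate(bytes_per_second) if count > threshold]


# Invented sample: mostly idle, small background chatter, one burst in the middle.
samples = [0, 804, 0, 120, 0, 0, 48_000, 61_000, 9_500, 0, 804, 0]
print(find_spikes(samples))   # [6, 7, 8]
```

The small 804-byte blips stay below the threshold, while the burst stands out clearly, which is the same pattern the graphs showed.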
Alex Vanderpot, a developer who has been looking into Echo’s software, appears to have had more luck with Wireshark and came to the same conclusion:
“I don’t personally believe the theories that Amazon is always listening (Wireshark strongly implies that they’re not)” – Alex Vanderpot
However, that doesn’t resolve all privacy concerns from the public. There’s more that people are worried about. Firstly, what if you don’t want the Echo to listen to you at a certain point in time?
Yep. Rather easily actually! There’s a mute button on the Echo that’ll let you do just that:
“you can turn Amazon Echo or Echo Dot’s microphone off by pushing the microphone on/off button on the top of your device. When the microphone on/off button turns red, the microphone is off.” — Amazon
Or, if you’re especially worried, you can unplug the Echo when not using it. Both unplugging it and turning off the microphone do take away a large part of the Echo’s convenience though.
WIRED has some good advice for those not eager to have always-listening devices:
“If you’re really freaked out by the concept of something always listening to you in your home, your best bet is a push-button voice assistant. Things like the Amazon Tap, the Alexa remote for Fire TV, or your phone with its “always listening” mode turned off.” — WIRED
Amazon stores all the voice recordings on its servers, so if you have been recorded saying something you’d rather people didn’t hear, it’s currently in the cloud and waiting to be found. In fact, those recordings are also being used to help improve Alexa’s voice recognition capabilities. WIRED says that “several times a day, Amazon uses the entire stack of Alexa queries to educate its A.I. about dialects and casual speech.”
There isn’t a way to prevent voice recordings from being stored, but you can view and delete them.
You can see what voice data is stored in the app. I go to alexa.amazon.com to use their web app. Sadly, if you are looking to delete them, you can only delete one at a time in the Alexa app (want to delete everything? Scroll further down!).
To see your voice recordings, go to “Settings” and click “History”:
Then on the history settings page, you’ll be able to see all the recordings listed along with what Alexa thought you said. Let’s find one you’d like to remove. Choose the one you’d like to delete (you can click it to choose to listen to it also):
On the page for the individual recording, you can play the track to hear what was recorded, tell Amazon whether or not it got it right (I couldn’t resist and clicked “No” before taking the screenshot…) and delete it. To delete it, click the “Delete voice recordings” option at the bottom.
You can in fact delete all recordings in a different part of Amazon’s settings. To do so, go to amazon.com/myx, click the “Your Devices” tab and select your Amazon Echo. Then click “Manage voice recordings”:
A pop-up will appear — click the very clear “Delete” button to delete it all. It could potentially reduce how well Alexa can understand you though:
Security wise, the voice recordings sent to Amazon and the responses that come back are encrypted, so it would be difficult for any hackers listening in on your network to hear what you’re saying. That’s a positive for sure. However, because the Echo stores recordings of your questions in the cloud, hackers could very well gain access to them if they can get into your Amazon account. That’s the same deal as hackers being able to get into your email or other online services; it isn’t unique to the Echo. To be certain that your recordings won’t be heard by hackers, it’d be wise to check the following:
Overall though, if you know you’ve asked Alexa something that could be awkward if hackers found it, I’d recommend deleting it instead. Better safe than sorry. In this day and age, every service is likely to be breached eventually, so be cautious about what you allow Alexa to remember.
Just this month (December 2016), police in Bentonville, Arkansas want to know if an Echo belonging to James Andrew Bates overheard anything in relation to a murder case in his home. Bates is set to go to trial for first-degree murder for the death of Victor Collins in his home next year. They’ve already used another IoT device for some of their evidence:
“Police say Bates had several other smart home devices, including a water meter. That piece of tech shows that 140 gallons of water were used between 1AM and 3AM the night Collins was found dead in Bates’ hot tub. Investigators allege the water was used to wash away evidence of what happened off of the patio.” — Engadget
All of this raises a new question. While the Echo isn’t always recording, there’s a (rather small) chance that it recorded something that could help work out what happened at his home that night. Is this something the police should have access to, or is it a breach of user privacy?
According to Business Insider, “The report says Amazon refused to hand over the voice-data on two separate occasions, although it did share Bates’ account information and purchase history. The police were able to take some information out of the device, but it’s unclear what that included.” I’m not sure what information might have been stored on the device itself, but if police had his account information, surely they could retrieve his Echo’s recordings in the Settings screen?
Hackaday’s Mike Szczys makes a good point on this: “It’s not surprising that Amazon would be served a warrant for this data. You would expect phone records (although not recordings of the calls) to be reviewed in any murder case.” He points out that even though the Echo only records when you say the wake word, police are looking for any requests made to Alexa that might lead to clues:
“In this case, police aren’t just looking for a recording of someone saying “Alexa, help I’m being attacked by…” but for any question to Alexa that would put the suspect at the scene of the crime at a specific time.” — Mike Szczys
If you are concerned that governments, hackers and others may get into your recordings of your questions put to Alexa — you’ll want to make regular use of the deletion functions mentioned above. Or not use a voice activated service at all.
You can watch a home network and see when people are using their Echo, but you cannot see what is being said or transmitted as that is all encrypted. There is still plenty of potential for it to be misused, even if you can’t tell what’s in those transmissions. In the study entitled “A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic” from earlier, they point out that,
“SSL traffic spikes clearly indicate when user interactions occurred […] simply learning the times of day when customers interact with a particular device could have unwanted advertising implications.” — A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic
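To see why timing alone is sensitive, here is a small sketch of what someone watching the network could do without decrypting anything. It is only an illustration under my own assumptions: the function names, the 5,000-byte cut-off, the 10-second gap and the sample data are all hypothetical, and this is not the method used in the study. It groups bursts of heavy encrypted traffic into interactions and then counts them by hour of day.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# (time of the one-second sample, encrypted bytes seen in that second)
Sample = Tuple[datetime, int]


def interaction_times(
    samples: Iterable[Sample],
    min_bytes: int = 5_000,                  # hypothetical cut-off for "a request"
    gap: timedelta = timedelta(seconds=10),  # hypothetical pause between requests
) -> List[datetime]:
    """Return the start time of each burst of heavy traffic."""
    starts: List[datetime] = []
    last_busy = None
    for when, size in sorted(samples):
        if size < min_bytes:
            continue  # small background traffic is ignored
        if last_busy is None or when - last_busy > gap:
            starts.append(when)  # a new burst begins here
        last_busy = when
    return starts


def usage_by_hour(starts: Iterable[datetime]) -> Counter:
    """Count how many interactions started in each hour of the day."""
    return Counter(t.hour for t in starts)


# Invented capture: one request in the morning, two in the evening.
observed = [
    (datetime(2016, 12, 20, 7, 30, 0), 40_000),
    (datetime(2016, 12, 20, 7, 30, 1), 52_000),
    (datetime(2016, 12, 20, 7, 45, 0), 804),
    (datetime(2016, 12, 20, 19, 5, 0), 61_000),
    (datetime(2016, 12, 20, 19, 5, 30), 45_000),
]
profile = usage_by_hour(interaction_times(observed))
print(sorted(profile.items()))   # [(7, 1), (19, 2)]
```

Collected over a few weeks, a profile like this would show when a household wakes up, gets home and goes to bed, all without reading a single word of what was asked.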
This one is hard to overcome. However, you can set the Echo to make a sound when it starts to record, which should help you notice that it has begun listening and sending audio to Amazon. To do so, go to your Amazon Echo app or the web app (all screenshots are from the web app: alexa.amazon.com). In the settings page, choose your Echo (mine is just called “Patrick’s Echo”) in the first option:
Then on the next page, click “Sounds & Notifications”:
Finally, down the bottom there are two switches. If they’re both switched to the left and greyed out, then they are off! Turn those on for “Start of Request” and “End of Request” and your Echo will make a sound when it starts to record you and when it’s done:
Just this month, it was announced that the Wynn Las Vegas hotel is adding an Amazon Echo to every one of its 4,748 rooms. This raises new privacy questions — how do you notify every guest that there is an Echo in their room? Should the Echo be muted by default for each guest?
My biggest question — whose account are the Echo devices set up under? Will the hotel delete each guest’s voice history once they leave? That’s a LOT of rooms to clean up voice data for every day!
It’d be great to see a policy from Amazon on how public Amazon Echo devices like these should be handled. Will they have an automated set of functionality that culls voice data? When the guests are changing so frequently, will stored voice data for each room really be necessary? Is this a valid reason for us to have the option to turn off the storage of any voice data?
We know it isn’t recording us at all times by default. It appears pretty reasonable and only sends to the cloud when the wake word is heard. However, one question I don’t have the answer for yet is — can Amazon turn an Echo into a wiretapping device? They can theoretically update an Echo over the Internet to change its functionality, so if a government or other agency desires to, can they turn it into a wiretapping device? The FBI apparently can “neither confirm nor deny” this, which worries me a little:
“Back in March, I filed a Freedom of Information request with the FBI asking if the agency had ever wiretapped an Amazon Echo. This week I got a response: ‘We can neither confirm nor deny…'” — Matt Novak from Gizmodo
Amazon have released some of the source code for their Amazon Echo devices for people to check out here — https://www.amazon.com/gp/help/customer/display.html?nodeId=201626480. It isn’t the whole source code (I’m pretty sure they’ve left a lot of proprietary code out of that) but could be fascinating to those interested! It won’t likely answer a lot of the privacy and security questions on our minds, however Alex Vanderpot is partaking in some deep analysis of the parts of the software that aren’t publicly available. It’ll be interesting to see where his research leads.
At the moment, it seems like Amazon are being relatively trustworthy with how they handle the Amazon Echo’s data and I’ll still be having an Echo in my room readily assisting me in all sorts of queries and connected home functionality. However, it would be good to see an option to prevent any voice data from being stored in the cloud, even if it reduces the accuracy of the device.
As with any connected device in the home, there are huge questions about whether the potential for invasion of privacy outweighs the benefit and convenience of having the device at all. In my case, as a technology enthusiast and developer, I’ll likely have many such devices! However, I’m not sure what the mainstream public’s view on it will be. For now, they seem to be pretty keen on the device, with estimates that Amazon has sold “5.1 million Echo devices in the U.S. since it debuted two years ago”, and they were completely sold out this holiday season!
In the end, I think a lot of this will be largely about education. The mainstream public will need to understand how to turn off the microphone and how to delete voice data if there is anything they’d like to remain private. The rest of it relies entirely on trust — do you trust Amazon? If not, I wouldn’t recommend putting an Echo in your house. If so, enjoy it. Alexa can be an incredibly valuable addition to many smart homes. I’m pretty happy with mine.
This is a living document which I’ll be gladly updating with more info as it comes in from various sources. Do you have any info on how it all works? Any additional insight or thoughts you think need to be covered? Let me know in the comments, over Twitter (I’m @thatpatrickguy and my DMs are open to all) or get in touch via email.
*A note about my Echo test: It was still a part of my wider Wi-Fi network, so other packets and device stuff did still come through. I couldn’t withdraw internet access for the rest of my family just for this test 😉