A recent study about smart speakers revealed that they are, in a word, crap.

The study, entitled, WHEN SPEAKERS ARE ALL EARS: Understanding when smart speakers mistakenly record conversations, revealed that these smart gizmos are very often activated by words and phrases that have little to do with anything.

Of course, that means they are recording and sending audio clips when they really shouldn’t be.

Not to worry, though – we are reassured by the researchers that it’s no big deal!

1984

Just a few months ago, we learned all about how smart devices are often sending private data and even video footage of private moments across the world:

Amazon’s AI-based home security system is sending footage of users’ private moments to dozens of algorithm trainers halfway around the world, according to former employees – not unlike its Alexa “smart” speakers.

Amazon’s Cloud Cam home security device regularly sends video clips to employees in Romania and India, who help “train” its AI algorithms, according to five current and former employees who spoke to Bloomberg. The workers review the clips in order to help the system distinguish pet from threat – benign movement from malignant intruder. The only problem? Cloud Cam users have no idea they’re being watched by human eyes.

And that’s the part we know about.

The other part we often don’t think of is how all this data and metadata is certainly not flushed down the toilet by the likes of Google and Amazon.

Instead, it’s kept – usually forever – in a giant treasure trove of personal information that can be scanned, filtered, correlated, and processed in just about any way you can think of for a whole variety of reasons.

It would be naive to think that a company like Google would NOT use or share all that data in any way that they – or the NSA – see fit. This is the same Google which started as a DARPA-funded PhD thesis and turned into a global giant with the motto, “Don’t be evil.”

Well, I think we can call “Oops!” on that one.

The Study

Back to our study. In summary:

The main goals of our research are to detect if, how, when, and why smart speakers are unexpectedly recording audio from their environment (we call this activation). We are also interested in whether there are trends based on certain non-wake words, type of conversation, location, and other factors.

One of the ways the researchers at Northeastern University and Imperial College London investigated this “unexpected recording” is by playing clips of TV shows:

Instead, we came up with a much simpler approach: we turn to popular TV shows containing reasonably large amounts of dialogue. Namely, our experiments use 125 hours of Netflix content from a variety of themes/genres, and we repeat the tests multiple times to understand which non-wake words consistently lead to activations and voice recording.
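
To make “consistently” a bit more concrete, here is a minimal sketch of how repeated playbacks could be tallied. The log format, the `consistency` function, and the sample runs are all made up for illustration; the study does not publish its analysis in this form.

```python
from collections import Counter

def consistency(runs):
    """Return the fraction of runs in which each phrase caused an activation.

    `runs` is a list of sets; each set holds the phrases that triggered the
    device during one playback of the same content (hypothetical log format).
    """
    counts = Counter(phrase for run in runs for phrase in run)
    return {phrase: n / len(runs) for phrase, n in counts.items()}

# Made-up data: four playbacks of the same clip, NOT results from the study.
runs = [
    {"I can spare", "What kind of"},
    {"I can spare"},
    {"I can spare", "Okay, but not"},
    {"What kind of"},
]
rates = consistency(runs)
# rates["I can spare"] is 0.75: it fired in three of the four playbacks.
```

A phrase with a rate near 1.0 would be a repeatable false trigger, while a low rate points to the kind of hit-and-miss activation the researchers describe.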

And now for the results…

Their Exciting Conclusion

You’re gonna love this!

First, the initial results:

Are these devices constantly recording our conversations? In short, we found no evidence to support this.
How frequently do devices activate? The average rate of activations per device is between 1.5 and 19 times per day (24 hours) during our experiments. HomePod and Cortana devices activate the most, followed by Echo Dot series 2, Google Home Mini, and Echo Dot series 3.
How consistently do they activate during a conversation? The majority of activations do not occur consistently.

Well, that’s not so bad, right?

But wait, there’s more!

Are activations long enough to record sensitive audio from the environment? Yes, we have found several cases of long activations. Echo Dot 2nd Generation and Invoke devices have the longest activations (20-43 seconds). For the Homepod and the majority of Echo devices, more than half of the activations last 6 seconds or more.

And now the good stuff. Audio that bears little or no resemblance to “activation words” caused all the devices to light up.

For the Google Home Mini (“Hey Google”, “OK Google”), activations commonly occurred when the dialogue included phrases such as:

“A-P girl”
“Okay, and what”
“I can work”
“What kind of”
“Okay, but not”
“I can spare”
“I don’t like the cold”

Right, because “Hey Google” sounds EXACTLY like “What kind of” and “I don’t like the cold”… Nothing fishy going on there!

For the Apple HomePod (“Hey Siri”), all it took was a word rhyming with “Hi” or “Hey”, followed by something that starts with S plus a vowel, or a word that includes a syllable that rhymes with “ri” in “Siri”. Examples:

“He clearly”
“They very”
“Hey sorry”
“Okay, Yeah”
“And seriously”
“Hi Mrs”
“Faith’s funeral”
“Historians”
“I see”
“I’m sorry”
“They say”

What in the name of… “Hey Siri” is the same as “Faith’s Funeral” and “He clearly”?!

For Amazon Echo and related devices (“Alexa”, “Echo”, “Computer”, “Amazon”):

words that contain “k” and sound similar to “Alexa,” such as:
- “exclamation”
- “kevin’s car”
- “congresswoman”
words containing a vowel plus “k” or “g” sounds such as:
- “pickle”
- “that cool”
- “back to”
- “a ghost”
words containing “co” or “go” followed by a nasal sound, such as:
- “cotton”
- “got my GED”
- “cash transfers”
words containing combinations of “I’m” / “my” or “az”:
- “I’m saying”
- “my pants on”
- “I was on”
- “he wasn’t”

OH FOR CRYING OUT LOUD!!! How on Earth does “congresswoman” sound anything like “Alexa”? Also, it seems that “Amazon” = “my pants on”, and “Echo” is the same thing as “pickle” and “a ghost”!

I’m speechless…

Although I have to give credit to dear Jeff Bezos for making sure that his home spy device had no fewer than FOUR activation phrases. That’s one sure way to hoover up as much audio as possible! 👍

Finally, not to be outdone by anyone in the Stupid Department, the Invoke (powered by Cortana) was activated by words starting with “co”, such as:

“Colorado”
“consider”
“coming up”

😱😱😱 No words… Just… no words.

My Exciting Conclusion

Now, I understand that the tech involved here is fairly new. It’s still evolving.

There’s no magic inside these smart speakers and other gizmos with the “digital assistants”. They’re relatively cheap because they’re essentially an inexpensive smartphone stuffed in a little box. Most of the actual hard work happens in The Almighty Cloud.

That means that yes, they must listen to everything because the computational smarts to do Star Trek-like voice recognition simply aren’t there yet.

So, by definition, they’re going to be listening to a lot, all the time. They have to.

It’s also naive to think that just because the box doesn’t light up like a Christmas tree, it’s not listening. Of course it is! It may not be sending data all the time, but it must listen to hear your speech whenever you happen to open your mouth. Devices can also store data for a time and send it off later. That’s exactly what Google smartphones do.
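
Here is a rough sketch of that “always listening, sometimes sending” idea. It is not any vendor’s actual design: the `gate_audio` function, the `is_wake` detector, and the pre-roll buffer are assumptions for illustration, and the “audio frames” are plain strings.

```python
from collections import deque

def gate_audio(frames, is_wake, pre_roll=2, record_len=2):
    """Yield the chunks of audio that would leave the device.

    frames     -- iterable of audio frames (plain strings here)
    is_wake    -- hypothetical local detector: frame -> bool
    pre_roll   -- how many earlier frames the rolling buffer keeps (assumed)
    record_len -- how many frames are captured after a detection (assumed)
    """
    buffer = deque(maxlen=pre_roll)  # every frame is heard, only a few are kept
    recording = None                 # frames of the activation in progress
    remaining = 0
    for frame in frames:
        if recording is None and is_wake(frame):
            # Any detection, right or wrong, starts a recording.
            recording = list(buffer)
            remaining = record_len + 1
        if recording is not None:
            recording.append(frame)
            remaining -= 1
            if remaining == 0:
                yield recording
                recording = None
        buffer.append(frame)
    if recording is not None:
        yield recording              # the stream ended mid-activation

# A false trigger on "pickle" ships the frames around it.
frames = ["tv", "chatter", "pickle", "secret", "plans", "quiet", "more"]
sent = list(gate_audio(frames, lambda f: f == "pickle"))
# sent == [["tv", "chatter", "pickle", "secret", "plans"]]
```

The detector runs on every single frame, so the microphone is never really “off”. The only thing a wake word changes is whether the audio gets kept and shipped.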

It is then trivial to use such gizmos as listening devices, just as former CIA chief David Petraeus infamously noted way back in 2012. As we learned from Snowden and other whistleblowers, the likes of the CIA and NSA never spy on you, dear American! They just get the Brits to do the heavy lifting for them. Shh!

So, do yourself a favor and just don’t buy or use any of these gizmos. If you do, don’t complain when you finally admit to yourself that indeed, “It has gone too far!”

“It” went too far decades ago…

2 Comments

Rozmund on 31 March 2020 at 14:35

Scottie: your the handle that opens the door that shows us the way out of the “Labyrinth” of mind and body control…hope they leave our Souls alone.what next..Love your emails..never stop.
Rafa on 31 March 2020 at 15:59

Naive is to think that the speakers activate when you see the little light. They are always on come on…