Smart-Speaker Makers May Be Recording Users To Improve Natural Language Processing At Expense Of Privacy
Natural Language Processing is a hot topic these days because it allows machine learning algorithms to analyze large amounts of data in order to improve voice functionality and responses to voice commands over time. Recognizing and interpreting natural language is no easy task for bots. There are a huge number of ways to ask a question, for instance, as well as a large number of different accents, voices and other variables in human speech.
This is why there are currently hurdles to overcome in this field related to speech recognition, natural language recognition and natural language generation. The latter refers to computers being able to respond to commands with a voice that is understandable and relevant to the end user's query. It can also mean reading text out loud, summarizing reports aloud or reading out ideas. This, however, often requires natural language understanding.
Consumer-based voice bot systems that offer speech recognition and voiced responses today come in the form of three main consumer products: Apple’s Siri, Amazon’s Alexa and Google Assistant (Microsoft recently repositioned Cortana as a complementary service to the others). Each of these three companies — Amazon, Google and Apple — offers its own smart speakers in addition to other products users can talk to and receive voice responses from. These products are powered by machine learning algorithms and constantly gather user data in order to improve their underlying technologies.
Gathering user voice data, including the various ways users phrase their questions, is one way to improve these services and natural language processing on voice bots as a whole. I recently came across an article that described some of the ways these companies are doing this.
According to an article published by BuzzFeed News, originating from a Bloomberg report, Amazon has a dedicated team of employees who listen in on at least parts of the conversations Echo users are having. The Echo records portions of the speech it hears and sends them back to Amazon for further analysis. What makes this controversial is that, unlike Apple and Google, Amazon does not do this anonymously: it knows the customer, and the recordings sent to Amazon are tied to the person they came from.
“Seven people, described as having worked in Amazon’s voice review program, told Bloomberg that they sometimes listen to as many as 1,000 recordings per shift, and that the recordings are associated with the customer’s first name, their device’s serial number, and an account number,” according to the article.
Other employees further clarified to BuzzFeed News that only a small number of recordings are annotated. This still raises the question of whether these users’ privacy is being violated, because who knows exactly what is recorded and when? It could be someone’s private moment, I imagine, in the bedroom, for instance, that employees at Amazon’s headquarters are listening in on. Who is really to know?
The positive spin on this is that crime can be deterred or stopped when an Echo is present and records an incident taking place. Furthermore, suspects can be identified and apprehended more easily if their name or other identifying information was said during a crime that occurred near an Echo. In fact, a number of police investigations have either used, or attempted to use, Echo recordings as evidence against suspects.
Late last year, for instance, it was reported that a New Hampshire judge had ordered Alexa recordings to be released in a double murder case that occurred within an Echo’s presence.
Like many technologies, what can be used for good can also be used for bad. Cybercriminals may also be able to tap into speakers and voice technologies. Although most, or all, of the recordings should be kept on company servers rather than on the speakers themselves, being able to spy on users’ voices in real time or talk through such devices could, I imagine, have negative consequences for end users. Imagine a company meeting where sensitive information is shared, with Amazon employees listening in on it or, in an even worse scenario, actual cybercriminals.
The other thing to keep in mind is that smart speakers are often connected to smart homes, where household appliances and various security systems are controlled by voice. Thus, a hacker can potentially compromise the whole perimeter of a residence or business by tricking the speaker into accepting their voice as the owner’s. A Techworld report mentioned these and other concerns related to voice assistants and smart speakers, particularly ones synced to smart homes, that can arise if the right safeguards are not in place.
“Hackers need only a short audio sample to synthesize or replay a human voice convincingly enough to trick people and security systems,” according to Techworld. “Another danger is that companies could use people’s voice to personalise advertising.”
A good sign is that, according to Amazon’s Device Support FAQ page, the company does not record and analyze all speech that occurs within earshot of its Alexa-compatible devices or its Echo speakers. In fact, only certain words trigger Alexa to potentially start recording. Here is how the FAQ describes it:
Is Alexa recording all my conversations?
No. By default, Echo devices are designed to detect only your chosen wake word (Alexa, Amazon, Computer, or Echo). The device detects the wake word by identifying acoustic patterns that match the wake word. No audio is stored or sent to the cloud unless the device detects the wake word (or Alexa is activated by pressing a button). With Alexa Guard, you can also configure supported Echo devices to detect specific sounds, such as the sound of smoke alarms, carbon monoxide alarms, and glass breaking.
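To make the gating the FAQ describes concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in, not Amazon's actual implementation: real devices match the wake word acoustically on-device, whereas this toy version compares text labels. The point it illustrates is simply that audio frames heard before the wake word (or a button press) are dropped rather than stored or transmitted.

```python
# Toy illustration of wake-word gating: frames are discarded until a
# wake word (or a button press) opens the recording window.
# NOTE: the text-based matching below is a simplification; real devices
# detect acoustic patterns, not transcripts.

WAKE_WORDS = {"alexa", "amazon", "computer", "echo"}

def gate_audio(frames, button_pressed=False):
    """Return only the audio frames that would be sent onward.

    `frames` is a list of (transcript_guess, audio_bytes) tuples.
    """
    sent = []
    recording = button_pressed
    for transcript_guess, audio in frames:
        if not recording and transcript_guess.lower() in WAKE_WORDS:
            recording = True  # wake word opens the recording window
        if recording:
            sent.append(audio)
        # frames heard before the wake word are never kept
    return sent

frames = [("hello", b"a1"), ("alexa", b"a2"), ("play music", b"a3")]
print(gate_audio(frames))  # only audio from the wake word onward
```

Under this sketch, the speech before "alexa" is dropped entirely, while the wake word and everything after it would be captured, which matches the FAQ's claim that no audio is stored or sent unless the wake word is detected.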
Additionally, the FAQ explains that a visual indicator appears when recordings are sent to the cloud — although I am not sure whether this is the same thing as Amazon analyzing speech on its own servers, or which cloud it is referring to (users’ cloud accounts, or the cloud as a whole, including any data storage on remote servers?). Either way, voice recognition and smart speakers are here to stay and may speed up the overall development of natural language processing, albeit at the expense of privacy.