There has been a lot of discussion as of late that advances in AI (artificial intelligence) will translate into better voice assistants. There was even a documentary that went so far as to show how AI technology is influencing our lives and how the data behind it is driving even bigger trends in digital transformation.

We’ve all heard and seen predictions of massive growth, with the AI market expected to soar well beyond $400 billion by 2025. One of the key factors driving growth in this market is advances in voice recognition, which will also help drive the market for voice assistants.

All month long I have been reporting on how the Apple Siri, Amazon Alexa, Google Assistant, and Cortana virtual assistants have been navigating their way into our world in an effort to make our lives more convenient.

The makers of these voice assistants are finding even more ways to cash in by creating more products, which means even more revenue. It’s interesting to note that while these devices have yet to fully satisfy us in what we say to them, or how we say it, if sales are any measure of success, these devices are performing exceptionally.

Perhaps we need to understand how the technology actually works. What is the tech behind voice assistants? The first step is “speech to text.” Speech-to-text software breaks the speaker’s speech down into tiny units called phonemes. The software uses the order, combination, and context of these phonemes to figure out which words the speaker is saying. It must account for the fact that a lot of words sound very similar.

It must also account for any noise going on in the background that might muddy the speaker’s words. And, of course, it must account for the fact that everyone pronounces the same words a little bit differently.
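
To make that a little more concrete, here is a tiny, purely illustrative Python sketch of the idea: a lexicon maps phoneme sequences to words, and context breaks the tie when two words sound the same. The phoneme spellings, the lexicon, and the context scores are all invented for this example; real recognizers rely on acoustic and language models trained on enormous amounts of speech.

```python
# Illustrative only: a toy lexicon mapping words to phoneme sequences.
# Real recognizers use trained acoustic models and far larger lexicons.
TOY_LEXICON = {
    "weather": ["W", "EH", "DH", "ER"],
    "whether": ["W", "EH", "DH", "ER"],   # sounds identical -- context must decide
    "traffic": ["T", "R", "AE", "F", "IH", "K"],
}

# A toy "language model": how plausible a word is after the previous word.
# Real systems learn these probabilities from huge text corpora.
CONTEXT_SCORES = {
    ("the", "weather"): 0.9,
    ("the", "whether"): 0.1,
}

def match_word(phonemes, previous_word):
    """Pick the lexicon word whose phonemes match, breaking ties with context."""
    candidates = [w for w, p in TOY_LEXICON.items() if p == phonemes]
    if not candidates:
        return None
    # Words like "weather"/"whether" match the same phonemes, so we fall back
    # on the surrounding context to choose between them.
    return max(candidates, key=lambda w: CONTEXT_SCORES.get((previous_word, w), 0.0))

print(match_word(["W", "EH", "DH", "ER"], previous_word="the"))  # -> "weather"
```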

The next step is “text to intent,” which goes a step further and figures out what the speaker actually means. Natural language processing can feel a bit like magic, and it is pretty seamless most of the time. Consider for a moment IBM’s tech, DeepQA, which is a great example of this.
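
Before digging into DeepQA, here is a rough, hand-rolled picture of what “text to intent” means, assuming a handful of made-up intents and trigger phrases. A real assistant uses trained natural-language models rather than keyword rules, but the shape of the problem is the same: take the transcribed words and decide what the user is actually asking for.

```python
import re

# Hypothetical intents and trigger patterns, purely for illustration.
INTENT_PATTERNS = {
    "get_weather":  [r"\bweather\b", r"\bforecast\b", r"\brain\b"],
    "get_traffic":  [r"\btraffic\b", r"\bcommute\b"],
    "set_reminder": [r"\bremind me\b", r"\bset (a )?reminder\b"],
}

def text_to_intent(utterance: str) -> str:
    """Map a transcribed utterance to the best-matching intent name."""
    text = utterance.lower()
    scores = {
        intent: sum(bool(re.search(pattern, text)) for pattern in patterns)
        for intent, patterns in INTENT_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(text_to_intent("What's the weather like today?"))   # -> get_weather
print(text_to_intent("Remind me to call mom at noon"))    # -> set_reminder
```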

DeepQA figures out what someone is really asking, then starts coming up with potential responses to the question and creates a thread for each possibility. IBM says each thread uses hundreds of algorithms to gauge how likely each possible answer is to be relevant, and then it produces a ranked list of answers.
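
As a drastically simplified illustration of that generate-and-rank idea, here is a sketch with two made-up scoring functions. This is not IBM’s method, just the general pattern: propose candidate answers, score each one against several pieces of evidence, and sort the results.

```python
# Drastically simplified: DeepQA combines hundreds of scoring algorithms,
# while this sketch uses two invented stand-ins to show the ranking pattern.

def keyword_overlap(question: str, answer: str) -> float:
    """Fraction of the answer's words that also appear in the question."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    return len(q & a) / max(len(a), 1)

def length_penalty(question: str, answer: str) -> float:
    """Mild preference for concise answers."""
    return 1.0 / (1 + len(answer.split()))

SCORERS = [keyword_overlap, length_penalty]

def rank_answers(question, candidates):
    """Score every candidate answer with every scorer and return a ranked list."""
    scored = [
        (sum(scorer(question, c) for scorer in SCORERS), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]

candidates = ["Springfield", "The capital of Illinois is Springfield", "Chicago"]
print(rank_answers("What is the capital of Illinois?", candidates))
```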

The final step, according to USC, is “intent to action.” This is the step that directs the voice assistant to actually complete the request or command. As we’ve already discussed, the commands most people currently give voice assistants are relatively simple, like asking what the weather or traffic is like, or asking for information that can be found with a simple web search.
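
In code, you can picture “intent to action” as a dispatch table: a recognized intent name, plus any details pulled from the utterance, is routed to the handler that does the actual work. The intents and handlers below are hypothetical placeholders; a real assistant would call out to weather services, calendars, shopping systems, or smart-home hubs.

```python
# Hypothetical handlers for illustration only.
def handle_weather(slots):
    return f"Looking up the forecast for {slots.get('city', 'your area')}..."

def handle_traffic(slots):
    return "Checking traffic on your usual route..."

# The dispatch table: intent name -> action handler.
ACTIONS = {
    "get_weather": handle_weather,
    "get_traffic": handle_traffic,
}

def intent_to_action(intent: str, slots: dict) -> str:
    """Route a recognized intent (plus extracted slots) to its handler."""
    handler = ACTIONS.get(intent)
    return handler(slots) if handler else "Sorry, I can't help with that yet."

print(intent_to_action("get_weather", {"city": "Chicago"}))
```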

Someday, these tasks will become more complex. We will rely on voice assistants to set appointments for us, to buy items on our behalf, and to control our homes. So what will it take to get there? Which step in the process I just laid out for you is the weak link? Is it the technology’s ability to understand language and interpret the meaning? Is it the ability to formulate responses or answers, or is it the ability to take action on user requests? And finally, how will advances in AI make a difference in the realm of voice assistants?

We are beginning to see voice-assistant technology become ubiquitous, but with this come all sorts of issues. For instance, rather than using home cameras merely for surveillance, more people are finding they are becoming voyeurs, watching their nanny, gardener, pool guy, and the list goes on.

But the real goal behind all this technology, at the end of the day, is the AI. Amazon hopes it can use the voice recordings to improve its algorithms. The company says it’s “not listening” to our discussions, but it does want to dominate how Alexa responds to all of us. The way Amazon seeps into our worlds is really through these devices, every day, through how we speak to them and how it learns what each of us says to a device day after day.

For instance, the “wake word” is responsible for firing up the device when the right command is given. The real debate heard around the world is who is listening to us as we give that command to these voice assistants, and what happens to the data?

We can’t forget that once the light goes on, the device is awake and it’s recording. These devices have become such a staple in our lives that we forget they are there, they wake up, and they are recording, and we don’t even recognize it.
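
Here is a minimal sketch of that wake-and-record loop, with the microphone and transcription steps stubbed out as placeholders (real devices run a small, always-on keyword-spotting model locally). It only illustrates the gating idea: nothing is treated as a command until the wake word is heard, and everything said after that is captured.

```python
WAKE_WORD = "alexa"  # illustrative; each assistant has its own wake word(s)

def next_transcribed_chunk() -> str:
    """Placeholder for the on-device audio pipeline; returns a short transcript."""
    # Stand-in for an always-on keyword-spotting model listening to the mic.
    return input("(mic) ")

def listen_loop():
    while True:
        chunk = next_transcribed_chunk()
        if WAKE_WORD in chunk.lower():
            print("[light on] recording command...")
            command = next_transcribed_chunk()  # everything said now is captured
            print(f"[sent for processing]: {command!r}")
            # In practice this recording may be stored -- which is exactly where
            # the "who is listening, and what happens to the data" debate begins.

if __name__ == "__main__":
    listen_loop()
```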

The other problem is how frustrated we can all get with these devices, what we end up saying to them, and the fact that this gets recorded as well. Patience is a virtue, and sometimes that is lost as well.

Perhaps this is indicative of a bigger problem. If you can’t get Amazon Alexa or Google Assistant to do what you want without getting frustrated, at what point can we move these devices into the enterprise?

With advances in AI technologies, voice assistants are going to get better at every step in this process. We might all be concerned about Big Brother once again, but it all comes down to the data. The large device companies making the AI algorithms are going to have to reveal what is happening to the data they are recording. It’s all about transparency, whether it’s regulated or not, and once we know, just think how fast, and how clearly, we will be talking to our voice assistants.

