
So, what did I learn? In a nutshell, I learned how ambient, multi-modal computing powered by artificial intelligence will bring a seismic shift to the way we live. I also picked up a whole new vocabulary.
Ambient computing links the IoT (Internet of Things) with an ecosystem of devices and artificial intelligence to support humans as they go about their everyday lives. To work effectively, ambient computing responds to the “utterances” of humans to determine their “intent”. “Intents” are the things users are trying to accomplish, such as checking the time or the weather, turning on the lights, or ordering dog food.
Smart devices are not constantly listening for commands. Only once they hear their “wake word” do they listen for “utterances” in order to determine the user’s “intent”. The wake word varies by device: Alexa responds to her name, Google Assistant wakes with “Hey, Google”, and Samsung’s Bixby activates with “Hi, Bixby”.
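To make the wake-word idea concrete, here is a minimal Python sketch of the gate, assuming a hypothetical stream of already-transcribed phrases; the phrase list and helper name are my own and are not tied to any vendor's SDK.

```python
# A minimal sketch of a wake-word gate. The phrase list, helper name, and
# transcribed-text input are assumptions for illustration, not a real SDK.
WAKE_WORDS = {"alexa", "hey google", "hi bixby"}

def utterances(transcribed_phrases):
    """Yield only the phrases spoken immediately after a wake word."""
    stream = iter(transcribed_phrases)
    for phrase in stream:
        if phrase.lower().strip() in WAKE_WORDS:
            utterance = next(stream, None)  # the request that follows the wake word
            if utterance:
                yield utterance

# Only "turn on the lights" reaches intent resolution; the small talk is ignored.
for u in utterances(["what a day", "Alexa", "turn on the lights"]):
    print(u)
```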
The “utterance” is the phrase, instruction, or question that a human gives to their smart device. Since people phrase things differently, the AI running the device must be able to extract and interpret many versions of an “utterance”. For example, one person walking into their home may say “lights on” to their smart device, another may say “turn on the lights”, and a third may say “lights please”. To be effective, the AI must be able to understand the many utterances that express a single “intent”.
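Here is a toy Python sketch of that idea, with several phrasings resolving to one intent. Real assistants use trained natural-language models rather than a lookup table, and the intent names below are purely illustrative.

```python
# A toy intent resolver: several differently worded utterances map to the
# same intent. Names are invented for illustration.
SAMPLE_UTTERANCES = {
    "TurnOnLightsIntent": {"lights on", "turn on the lights", "lights please"},
    "GetWeatherIntent": {"what's the weather", "is it going to rain today"},
}

def resolve_intent(utterance):
    normalized = utterance.lower().strip()
    for intent, samples in SAMPLE_UTTERANCES.items():
        if normalized in samples:
            return intent
    return "FallbackIntent"  # the assistant would ask the user to rephrase

print(resolve_intent("Lights please"))       # TurnOnLightsIntent
print(resolve_intent("Turn on the lights"))  # TurnOnLightsIntent
```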
No one writes programs or apps for voice technology; developers write skills, actions, and capsules. While there are Alexa, Google Assistant, and Bixby apps, they should not be confused with the code and content found in skills, actions, and capsules. The key concept here is that each skill, action, or capsule packages content together with code. There are over 90,000 Alexa skills and 4,200 Google Assistant actions, and Samsung is actively encouraging developers to create capsules for Bixby. On Alexa, there are skills covering everything from business and finance to weather. As an example, you can listen to daily reports from popular newspapers, play Jeopardy!, or learn how to meditate using Alexa skills.
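To illustrate the content-plus-code idea, here is a generic Python sketch of a skill: a handler (the code) routes a resolved intent to the response it should speak (the content). This is not any vendor's real SDK, and the intent names and responses are invented.

```python
# A generic sketch of a skill: code routes an intent to spoken content.
# Intent names and responses are made up for illustration.
def handle_request(intent):
    handlers = {
        "DailyReportIntent": "Here is today's briefing from your newspaper.",
        "MeditationIntent": "Let's begin with a slow, deep breath.",
    }
    speech = handlers.get(intent, "Sorry, I don't know that one yet.")
    return {"speech": speech}  # the spoken response sent back to the device

print(handle_request("MeditationIntent")["speech"])
```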
Developing skills, actions, and capsules requires that developers design for a lake and not a river. Voice technology is conversational; you cannot force a user down a specific, predefined path. The conversation can flow anywhere, just like water in a lake.
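As a rough Python sketch of the lake idea (intent names invented for illustration), a "lake" design handles whatever intent arrives next on its own, instead of marching the user through a fixed sequence of prompts the way a phone menu would.

```python
# A sketch of designing for a lake: any intent can arrive at any turn,
# with no fixed menu to walk through. Intent names are invented.
RESPONSES = {
    "LightsOnIntent": "Turning on the lights.",
    "SetThermostatIntent": "Setting the thermostat to 21 degrees.",
    "GetWeatherIntent": "It's sunny and 18 degrees outside.",
}

def handle_turn(intent):
    return RESPONSES.get(intent, "Sorry, I didn't catch that.")

# Users can ask in any order and skip whatever they don't need.
for intent in ["GetWeatherIntent", "LightsOnIntent"]:
    print(handle_turn(intent))
```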
Conversational design is about context, and it is not like IVR (Interactive Voice Response). The best designs narrow the context appropriately. For example, if someone’s “intent” is to get the due date on their bill, that is not the right moment to suggest other products they could purchase.
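A small, hypothetical Python sketch of that scoping principle: the reply to a due-date intent contains the due date and nothing else. The account data and intent name are made up.

```python
# A sketch of narrowing the context: the reply answers only the resolved
# intent. Account data and intent name are placeholders.
ACCOUNT = {"due_date": "the 28th", "balance": "$42.10"}

def respond(intent):
    if intent == "GetDueDateIntent":
        # Keep the reply scoped to the question; tacking on a product
        # suggestion here would be out of context.
        return f"Your bill is due on {ACCOUNT['due_date']}."
    return "Sorry, I can't help with that yet."

print(respond("GetDueDateIntent"))
```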
Designers should expect humans to be multi-modal, meaning they will jump from one mode, such as voice, to another, such as a screen. A good example is a user asking about cockatoos. If they simply ask what type of bird a cockatoo is, a voice response is enough. If they ask what a cockatoo looks like, that context calls for imagery, and it is best to show the user a picture.
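Below is a small, hypothetical Python sketch of that decision: answer by voice alone when speech is enough, and attach an image only when the intent calls for visuals and the device has a screen. The field names and image URL are placeholders I made up.

```python
# A sketch of a multi-modal response decision: voice alone when speech is
# enough, voice plus an image when the intent calls for visuals and the
# device has a screen. Field names and the image URL are placeholders.
def build_response(intent, has_screen):
    if intent == "WhatKindOfBirdIntent":
        return {"speech": "A cockatoo is a type of parrot."}
    if intent == "WhatDoesItLookLikeIntent" and has_screen:
        return {
            "speech": "Here's what a cockatoo looks like.",
            "image_url": "https://example.com/cockatoo.jpg",  # placeholder
        }
    return {"speech": "A cockatoo is a large, crested parrot."}

print(build_response("WhatDoesItLookLikeIntent", has_screen=True))
```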
The exhibit hall was another treasure trove of companies offering voice innovations. There was a section dedicated exclusively to start-ups, and I saw many promising technologies for the voice ecosystem.
I took so many notes that I could go on and on, but I think I have covered the main points of what I learned at VoiceSummit.ai. The event was both educational and fun, with an impressive roster of speakers. I will definitely attend next year’s event.
If you are interested in learning about voice technology, I highly recommend attending this conference. You can learn more about it at https://www.voicesummit.ai/