Mike Elgan Says Voice Assistants Will Deliver Augmented Humanity

Mike Elgan’s Twitter profile describes him as “The world’s only lovable tech journalist.” That may be a low bar, but we will go with it. Mike spent most of the 1990s as editor of Windows Magazine. A dozen years ago he founded Elgan Media, and today he is a prolific blogger, tweeter, and podcaster. He also writes for Computerworld and Fast Company. More important, Mike has a unique perspective on voice computing, virtual assistants and artificial intelligence, and on how these technologies will have an even greater impact on the world than mobile and the web.

Tell me about your background.


Mike Elgan: I started as a newspaper journalist out of college. After a few years I fell in love with newspapers and computers. It started when I was transitioning a newspaper, South Coast Community newspapers, to a digital edition.

Before that, we were using a Mac and a printer to print out the articles and then cutting them out with razor blades, fixing them to a board and photographing them. I was transforming that system into one that was all digital. The electronic layout went straight to the printer. It made me love computers and computing as a subject matter more than local water politics and baseball that we were reporting on most frequently.

That made me realize I could become a technology journalist. That interest led me to Windows Magazine, where for several years in the 1990s I was the managing editor.

How did that interest in technology translate into your interest today in voice assistants?

Elgan: I’ve always been obsessed with voice command. In the last five years, I’ve been obsessed with virtual assistants.

I’m a writer so my first thought was that I could use Dragon by Nuance to write faster. I imagined I could speak and it would transcribe it. I tried a lot of different things. None of them worked for writing. I discovered that the writing process was not conducive to speaking. Writing is a lot slower than talking and the quality of the writing was lower if I started with speech.

I recall later having an epiphany because I was following the progress of speech recognition. At some point, I became obsessed with the idea of a large-screen desktop, something akin to the Microsoft Surface Studio but much bigger. Every time I wrote about it and predicted it, everyone said they could never live without their mouse and keyboard. They don’t understand that so much of our computing will be done with voice that the idea of being so reliant on a keyboard or mouse overlooks voice.

Voice is the key to radically democratizing computing. It will radically change our assumptions about hardware. When you are talking to a virtual agent that understands what you are saying then that changes absolutely everything.

How do you think the changes brought about by voice computing compare to other technology trends such as the advent of mobile or the web?

Elgan: I think they are more different than similar. The AI part and the voice part are separable. Imagine you have this massive supercomputer data center that does artificial intelligence. I learned this at Microsoft Research in the 1990s. One of the fundamental problems of AI is that humans understand things that computers do not. If I say, “I saw the bird with a telescope,” the AI doesn’t know if I had the telescope or the bird did. If you are going to use AI to simulate human intelligence there is no end to the variation. They knew even then that for AI to do anything, the AI actually has to know about the world. It has to have knowledge.

What we are getting at is AI that understands the world. That is in this data center out there in the cloud. Part of what it knows about the world it knows about me specifically. A way to access that AI is through a voice virtual assistant. The computer does voice analysis to figure out what I said and feeds it into the AI in text and it tells me the answer and it happens in a second. That is an interface that has incredible implications for computers.
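The round trip Elgan describes (voice in, text to the AI, answer back) can be pictured with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `Assistant` class, its method names, and the two small dictionaries that play the role of what the cloud AI knows about the world and about the user. Real speech recognition and real AI are far more involved; here the "audio" is simply text encoded as bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Assistant:
    # Hypothetical stand-ins for what the cloud AI knows about the world
    # and about this particular user.
    world_facts: dict = field(default_factory=dict)
    user_facts: dict = field(default_factory=dict)

    def transcribe(self, audio: bytes) -> str:
        """Stand-in for speech recognition: the 'audio' is just UTF-8 text."""
        return audio.decode("utf-8").strip().lower()

    def answer(self, text: str) -> str:
        """Stand-in for the AI: look the question up, personal facts first."""
        for knowledge in (self.user_facts, self.world_facts):
            if text in knowledge:
                return knowledge[text]
        return "Sorry, I don't know that yet."

    def handle(self, audio: bytes) -> str:
        """Full round trip: voice in, text to the AI, answer out."""
        return self.answer(self.transcribe(audio))


assistant = Assistant(
    world_facts={"what is the capital of france?": "Paris."},
    user_facts={"when is my next appointment?": "Tomorrow at 2:30."},
)
print(assistant.handle(b"When is my next appointment?"))
```

The point of the sketch is the separation Elgan draws: `transcribe` is the voice part, `answer` is the AI part, and either could be swapped out without touching the other.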

I figure there is not enough emphasis on one thing: the inevitability of virtual assistant software integrated into glasses. One of the companies doing this is Vue Glasses. Theirs look like ordinary glasses, except in the back they are thick with the battery on one side and the electronics on the other side. They use bone conduction to hear and recognize voice. I believe it is a 100% certainty that glasses that enable you to speak with a virtual assistant will be common in 3-5 years. In every moment in your day you will be able to talk to your virtual assistant from the time you wake up. There is nothing embarrassing about it.

The other one is a high school kid who is making something called Kai [Mike interviewed the founder, Dylan Rose, on his Food and Technology podcast]. He is using the Houndify platform. Kai clips on your glasses and uses a similar approach.

You mentioned that voice is going to democratize computing. How?

Elgan: Once we get used to relying on virtual assistants and they have agency and understanding, it will change things in so many different ways. One is democratization of technology. Even technology Luddites can already use voice technology. They know how to talk. Someone can say, “Please make an appointment at my doctor sometime tomorrow afternoon.” It will come back in seconds and say, “I made an appointment tomorrow at 2:30.” At 1:00 pm the next day it will say, “You need to leave for the doctor in 15 minutes because traffic is heavy.” This is something everyone can use.

Imagine a world where anyone can have 24-hour access to an assistant. It’s not about computing anymore. It is about augmented humanity.

The other big implication of this is that people are concerned that robots will take over our jobs. I don’t think so. Virtual assistants still give us direct access to the same AI. Therefore, a person with AI will be better than AI alone in almost every case. AI is centuries away from thinking and acting like a human. The killer combination is a human being with AI access.

So you don’t think AI will impact jobs or displace workers. What about the Japanese insurer that replaced a bunch of claims adjusters with IBM Watson?

Elgan: People could be much better at insurance. The insurance business is fraught with errors and ignorance. No one has all of the facts. You could try to save money by taking a 400-person workforce and cutting it down to 25 with AI, or you could be much better at it by giving the 400-person company AI. There was a time when we saw costs come down. Now you see all of these startups coming out and charging for products. Everyone is charging for stuff because it makes products higher quality. There is a Darwinian angle here. I suspect using far fewer people won’t always be the best path to take. You want people to be much better at their jobs.

On the other hand, a lot of journalism has been replaced by AI already. There are two types: financial reporting and sports. In financial reporting, there is a need to have information out there in words. What were their profits and losses? What did they say? Sports as well. Who played? What was the score? You can churn that out. It’s formulaic. Pretty much everything that can be done by a computer should be. It’s a terrible idea to try and save jobs at the expense of technology that does it equally well or better. You go to Google News and there is no scarcity of journalism out there. Things should qualitatively get better.

Is voice really the key here or is it the AI?

Elgan: Voice is an interface. Think about a smartphone. It has a touch interface that has certain characteristics. Voice is a different interface. The application and compute power behind it are different. What is transformative is access to AI and virtual assistants. The voice assistant’s job is to understand what you want, contextualize it in the world and your world personally, and return with the right response that is a human-like response.

There are two parts to this. There is the voice part, the technology that listens to the sounds and figures out the words. And there is the virtual assistant, an artificial human that understands and contextualizes and responds in a human-like way with variation in tone and humor. Everything behind the scenes, all of those applications, is not part of the revolution. The same software goes to the same servers and same databases as with other computing interfaces.

The revolution is not the voice part and not the computing part. It is the voice assistant part. That is the really exciting part.

Do you believe voice assistants will change people’s lives more than mobile or web?

Elgan: Absolutely. The ability to take advantage of an iPhone is dependent on the skill and knowledge of the device user. I know people who spend time studying how to do something amazing with a smartphone. The average person is a 3 out of 10. A smartphone can do a million things, but the average user is doing five things. If an AI-based voice assistant can do a million things then the average user can use a million things. It turns every user into a power user because with the right kind of agency a voice assistant can make decisions. You don’t have to study and learn how to use the technology. It is thinking about these things for you. That kind of pre-emptive agency turns every user into a power user.

Let me give you a practical example. War Dogs is about a guy who figured out a system for winning government contracts. He figured out that by finding certain types of low-level acquisition requests, he could game the system and make a fortune. He had a unique insight that enabled him to game the system because of the new rules. He put in enormous work, going through a mind-numbing chore to look for things he had learned how to look for. He made a very good living because he developed a unique perspective, had unique knowledge and went to unique effort. The way virtual assistant technology should function is that you start looking at military stuff, and it should notice you are trying to get into the arms dealing business and figure out that system. It should go to that effort for you automatically.

There are a number of emerging models for accessing voice assistants such as in-ear, on person or in the immediate environment. Which do you think will be most common?

Elgan: I think the best places for virtual assistant appliances are glasses and lightbulbs. Lightbulbs are everywhere. I would love to see every lightbulb in the future have connectivity to home Wi-Fi and have a microphone and a speaker. AI should recognize our voice so it knows who we are when we speak to it. The other place is glasses or something that goes over the ear. It’s a social norm [to wear glasses] and provides a place for hardware. Anything in our ears is good and everything in our living space will also be good. Kids won’t care where the microphone or speaker are. They will just assume they are there.

What is your favorite voice assistant application today and why?

Elgan: I’m an iPhone user who tends to have his hands full all the time so I use Alexa when I’m in the kitchen and Siri when I’m not. I use them, but don’t love them. They’re obtuse. I love x.ai’s Amy, which has one job and does it well.