Siri Gets Smarter at Recognizing Names of Local Businesses and Attractions
Researchers at Apple have implemented an update that makes Siri better at recognizing the proper names of local points of interest (POI) and businesses. A post last week in Apple Machine Learning Journal begins by noting that speech recognition performance has improved remarkably in recent years, but that the progress has been asymmetric. Automated speech recognition (ASR) for common speech has improved a great deal.
However, speech that includes less common words, such as the proper names of businesses and locations as opposed to standard nouns and verbs, has continued to be a challenge for voice assistants. The Siri Speech Recognition Team wrote:
“The accuracy of automatic speech recognition (ASR) systems has improved phenomenally over recent years, due to the widespread adoption of deep learning techniques. Performance improvements have, however, mainly been made in the recognition of general speech…Generally, virtual assistants correctly recognize and understand the names of high-profile businesses and chain stores like Starbucks, but have a harder time recognizing the names of the millions of smaller, local POIs that users ask about. In ASR, there’s a known performance bottleneck when it comes to accurately recognizing named entities, like small local businesses, in the long tail of a frequency distribution.”
Addressing the “Long Tail” Problem
The “long tail of a frequency distribution” refers to the far end of a distribution ranked by how often each word occurs: the head holds the words used most commonly, while the tail holds the many words that appear only rarely. “Long-tail” terms are difficult for machine learning systems because there is less training data, and sometimes no data, and therefore less material to learn from. A voice assistant can’t learn if it isn’t taught. Apple researchers comment, “To understand this challenge, think of the variety of business names in your neighborhood alone.” If you train a language model and your local florist or town founder’s name isn’t in the data, then the speech recognition engine isn’t likely to correctly understand your request when you speak. Or, if there are common phrases that sound similar to what you say, then the system is likely to default to those.
Geographic Language Models
To address this problem, the Siri team designed geographic language models (Geo-LMs) based on various regions in the United States. The U.S. Census Bureau identifies 169 combined statistical areas (CSAs) that represent communities with strong ties. Each CSA serves as a stand-in for a person’s local area. So, the Apple team built 169 Geo-LMs to cover these local communities, which account for 80% of the population, and then one more Geo-LM for everywhere else. When a Siri user speaks, a Master Language Model (LM) first interprets the speech and then determines whether some words should be passed off to a Slot LM, in this case a Geo-LM, for further natural language processing. If the slot is a Geo-LM, then the system takes the user’s coordinates into account.
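The region lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Apple's implementation: the `Region` class, the `select_geo_lm` function, the two bounding boxes, and the model names are all hypothetical, and real CSA boundaries are far more detailed than rectangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical region, approximated by a latitude/longitude box."""
    geo_lm_name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Assumption: two made-up boxes stand in for the 169 CSA-based regions.
REGIONS = [
    Region("geo_lm_boston", 41.5, 43.0, -72.0, -70.5),
    Region("geo_lm_sf_bay", 36.9, 38.9, -123.1, -121.2),
]

# The extra Geo-LM that covers everywhere outside the listed regions.
FALLBACK_GEO_LM = "geo_lm_everywhere_else"


def select_geo_lm(lat: float, lon: float) -> str:
    """Return the name of the Geo-LM whose region contains the user."""
    for region in REGIONS:
        if region.contains(lat, lon):
            return region.geo_lm_name
    return FALLBACK_GEO_LM


# A user standing near Harvard University (roughly 42.37 N, 71.12 W).
chosen = select_geo_lm(42.37, -71.12)
```

With these made-up boxes, a user near Harvard is routed to the Boston model, while a user far from any listed region falls through to the catch-all model.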
Siri determines whether to pass the user off to a Geo-LM based on a number of factors, including the type of speech detected and the device. For example, the researchers assume that “users are more likely to search for nearby local POIs with mobile devices than with Macs,” and the system therefore uses geolocation information from mobile devices to improve POI recognition. The assumption will not hold for every query, but it is reasonable to expect that an approach of this kind will improve Siri performance. The flow diagram below depicts how this works in practice.
In the diagram’s example, the Master LM determines that directions to a point of interest (POI) are being requested. That recognition initiates a hand-off to a slot with a Geo-LM, where the uncommon speech is more likely to be properly matched to the user’s intent: finding a restaurant near Harvard University.
The researchers’ data showed that using Geo-LMs on POI names that are unique to a location led to a reduction of more than 40% in speech recognition word error rate compared to using the Master LM alone.
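For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The sketch below computes it and a relative reduction, on the assumption that the reported figure is a relative one. The function names and the example transcripts (including the business name) are invented for illustration and are not from Apple's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which the improved error rate undercuts the baseline."""
    return (baseline - improved) / baseline


# Invented example: a made-up cafe name that a general model splits in two.
reference = "directions to quillfeather cafe"
baseline_wer = word_error_rate(reference, "directions to quill feather cafe")
improved_wer = word_error_rate(reference, "directions to quillfeather calf")
reduction = relative_reduction(baseline_wer, improved_wer)
```

In this toy case the baseline makes two word errors out of four reference words and the improved output makes one, so the relative reduction is one half. The article's figure comes from Apple's own test sets, not from an example like this.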
There has been a lot of discussion about how Siri needs to improve. Much of that criticism has focused on what Siri can do. However, it is clear even to casual users of Alexa and Google Assistant that Siri had also fallen behind in ASR. The Siri performance improvement for local POI recognition is definitely a step in the right direction. It may also be why Siri’s performance rose so significantly compared to Google Assistant for local queries in a recent test by Loup Ventures. Siri trailed Google Assistant by only 1% and beat out both Alexa and Cortana by 24%.
Now we know at least one reason why Siri is doing better. Technical changes in how Siri operates are having a positive impact on performance. Given the recent shake-up in Siri leadership, I wouldn’t be surprised if we see more progress along these lines over the next year. You can read the entire article here.