Microsoft recently reached a new milestone in speech recognition: its system achieved a 5.1% word error rate. In other words, the technology recognizes words in a conversation as well as professional human transcribers do. Reaching human parity has been a research goal for the last 25 years.
This achievement comes only one year after the Redmond giant passed another milestone: researchers built a speech recognition system with a word error rate of 5.9%. A few months later, Microsoft’s Artificial Intelligence and Research department improved on those results further, lowering the word error rate again.
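For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch (the function name and sample sentences are illustrative, not from Microsoft's system):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as a Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") over six reference words:
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 0.1666...
```

A 5.1% WER means roughly one word in twenty is wrong, which is comparable to the error rate measured for professional human transcribers on the same conversational test data.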
Better neural net-based acoustic and language models
Researchers achieved these results by improving the neural net-based acoustic and language models already in use. More specifically, they introduced an additional convolutional neural network combined with a bidirectional long short-term memory (BLSTM) model for improved acoustic modeling. The new speech recognition system also combines predictions from multiple acoustic models at both the frame and word levels.
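The frame-level combination mentioned above can be sketched in a few lines: each acoustic model emits a per-frame probability distribution over sound classes, and the ensemble simply takes a weighted average of those distributions. This is a generic sketch of model combination, not Microsoft's actual implementation, and all names below are illustrative:

```python
def combine_frame_posteriors(model_outputs, weights=None):
    """Frame-level system combination: weighted average of per-frame
    posterior distributions produced by several acoustic models.
    model_outputs: list of [frames][classes] matrices, one per model."""
    n_models = len(model_outputs)
    weights = weights or [1.0 / n_models] * n_models
    n_frames = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    combined = []
    for t in range(n_frames):
        frame = [sum(w * m[t][c] for w, m in zip(weights, model_outputs))
                 for c in range(n_classes)]
        combined.append(frame)
    return combined

# Two toy models disagreeing on a single frame over three classes:
cnn_out   = [[0.7, 0.2, 0.1]]
blstm_out = [[0.4, 0.5, 0.1]]
print(combine_frame_posteriors([cnn_out, blstm_out]))
```

Averaging the outputs of diverse models tends to cancel their individual mistakes, which is why ensembles of CNN and BLSTM acoustic models score better than either model alone.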
Moreover, the team strengthened the recognizer’s language model by using the entire history of a dialog session to predict what is likely to come next, effectively allowing the model to adapt to the topic and local context of a conversation.
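The idea of adapting a language model to the conversation so far is well known as a "cache" language model: interpolate a static model with word frequencies drawn from the dialog history, so topical words get boosted. A toy sketch under that assumption (function names, probabilities, and the interpolation weight are all illustrative):

```python
from collections import Counter

def cache_lm_prob(word, base_probs, history, lam=0.2):
    """Toy cache language model: interpolate a static unigram model with
    word frequencies from the dialog history so far, so words tied to the
    current topic become more likely. lam is the cache weight."""
    cache = Counter(history)
    total = sum(cache.values())
    cache_prob = cache[word] / total if total else 0.0
    # Unseen words fall back to a small floor probability.
    return (1 - lam) * base_probs.get(word, 1e-6) + lam * cache_prob

base = {"weather": 0.001, "the": 0.05}
history = "how is the weather today will the weather improve".split()
# "weather" is rare in the base model but frequent in this dialog,
# so the adapted probability is much higher than 0.001:
print(cache_lm_prob("weather", base, history))
```

Microsoft's system conditions on dialog history with neural models rather than a simple cache, but the effect is the same: words consistent with the conversation's topic become easier to recognize.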
Microsoft is planning to incorporate this new technology into its products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services.
Better human-machine interaction
Of course, this achievement opens the door to new challenges. Speech recognition systems still struggle in noisy environments, with distant microphones, and with accented speech.
Researchers still have a long way to go in teaching computers to understand the meaning and intent behind words; that understanding is the next major frontier for speech recognition software and artificial intelligence.
Once researchers reach that milestone, we can start dreaming of creating truly human-like robots. Until then, our robots will only have a human-like physical appearance; they won’t be able to interact with us the way other humans do.