AI speech recognition software reaches level of parity with human transcribers

AI research

Microsoft recently reached a new milestone in speech recognition: its system achieved a word error rate of just 5.1%. In other words, this new technology recognizes words in a conversation as accurately as professional human transcribers. Reaching human parity has been a research goal for the last 25 years.

This achievement comes only a year after the Redmond giant passed another milestone: its researchers built a speech recognition system with a word error rate of 5.9%. A few months later, Microsoft's Artificial Intelligence and Research division managed to improve on those results, reducing the word error rate even further.

Better neural net-based acoustic and language models

Researchers achieved these results by improving the neural net-based acoustic and language models already in use. More specifically, they introduced an additional convolutional neural network combined with a bidirectional long short-term memory (BLSTM) model for improved acoustic modeling. At the same time, the new speech recognition system combines predictions from multiple acoustic models at both the frame and word levels.

Moreover, the team strengthened the recognizer's language model by using the entire history of a dialog session to predict what is likely to come next, effectively allowing the model to adapt to the topic and local context of a conversation.
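To make the idea of combining predictions from multiple acoustic models more concrete, here is a minimal illustrative sketch of frame-level fusion: per-frame class posteriors from two hypothetical models are merged with a weighted log-linear average. This is one simple fusion scheme, not Microsoft's exact method; all names and numbers below are made up for illustration.

```python
import math

def combine_frame_posteriors(model_outputs, weights=None):
    """Fuse per-frame class posteriors from several acoustic models.

    model_outputs: one entry per model; each entry is a list of frames,
    and each frame is a probability distribution over classes.
    Fusion here is a weighted log-linear average (a geometric mean),
    renormalized so each frame is again a valid distribution.
    """
    n_models = len(model_outputs)
    weights = weights or [1.0 / n_models] * n_models
    combined = []
    for t in range(len(model_outputs[0])):
        n_classes = len(model_outputs[0][t])
        # weighted average in the log domain (small epsilon avoids log(0))
        log_avg = [
            sum(w * math.log(model_outputs[m][t][c] + 1e-12)
                for m, w in enumerate(weights))
            for c in range(n_classes)
        ]
        # renormalize back to a probability distribution
        z = sum(math.exp(v) for v in log_avg)
        combined.append([math.exp(v) / z for v in log_avg])
    return combined

# Two toy models scoring a single frame over three classes
m1 = [[0.7, 0.2, 0.1]]
m2 = [[0.5, 0.4, 0.1]]
fused = combine_frame_posteriors([m1, m2])
best = max(range(3), key=lambda c: fused[0][c])
```

Word-level combination works analogously, but over competing word hypotheses from each system rather than per-frame classes.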

Microsoft is planning to incorporate this new technology into its products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services.

Better human-machine interaction

Of course, this achievement opens the door to new challenges. Speech recognition systems still struggle in noisy environments, with distant microphones, and with accented speech.

Researchers still have a long way to go in teaching computers to understand word meaning and intent. Indeed, understanding meaning and intent is the next major frontier for speech recognition software and artificial intelligence.

Once researchers reach that milestone, we will be able to dream of creating human-like robots. Until then, our robots will have only a human-like physical appearance; they won't be able to interact with us the way other humans do.

