|Deep Speech 2 architecture|
The paper is "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin" by Baidu Research's Silicon Valley AI Lab (Andrew Ng et al.), recently posted to arXiv.
Google, with Geoffrey Hinton, Andrew Ng, and others, got much of the ball rolling with outstanding ImageNet results that are now human-competitive. Yann LeCun, who evolved Fukushima's 1980 Neocognitron into a convolutional neural net that learns, has a team at Facebook beating human-level performance on facial recognition. So automatic speech recognition (ASR) reaching human-level performance is not unexpected, but it is still a major achievement with enormous ramifications for the way we will work and interact with the world.
From the introduction:
"The Deep Speech 2 ASR pipeline approaches or exceeds the accuracy of Amazon Mechanical Turk human workers on several benchmarks, works in multiple languages with little modification, and is deployable in a production setting. It thus represents a significant step towards a single ASR system that addresses the entire range of speech recognition contexts handled by humans."
|A nice grab from the paper showing human competitive performance.|
"End-to-end deep learning presents the exciting opportunity to improve speech recognition systems continually with increases in data and computation. Indeed, our results show that, compared to the previous incarnation, Deep Speech has significantly closed the gap in transcription performance with human workers by leveraging more data and larger models. Further, since the approach is highly generic, we’ve shown that it can quickly be applied to new languages. Creating high-performing recognizers for two very different languages, English and Mandarin, required essentially no expert knowledge of the languages.
We believe these techniques will continue to scale, and thus conclude that the vision of a single speech system that outperforms humans in most scenarios is imminently achievable."
Much still to be done, but that is just work. Exciting times.