IBM strives for super-human speech recognition

We recently caught up with Dr. David Nahamoo, IBM’s speech technology guru, to hear about what he calls “super-human speech recognition.” No, he’s not talking about Spidey or Superman, but rather a project meant to substantially improve the quality, dialog management, and usability of speech technology by the end of the decade, for dictation, call centers, cars, and a broad set of other applications with embedded computing power. One of his goals is to surpass human accuracy at real-time transcription of speech such as a lecture, phone conversation, or broadcast, and he would like to do that for 50 languages with the same computer.

Before fully speech-enabled applications become ubiquitous, Nahamoo says, the technology must cross a simplicity threshold that opens it up to more developers. The speech recognition community converged on VoiceXML about five years ago, abandoning proprietary interfaces in the process. Nahamoo feels the next step is for providers like IBM to encapsulate design principles and behaviors in templates. Sound familiar? Just as client/server and web development got their own visual tools (think Visual Basic and Dreamweaver, respectively), we’d say that speech needs its own GUI-based development environment.