How Math is Improving Speech Recognition Technologies

Mathematical simulation reshapes how we make sense of speech technologies

Automatic speech recognition is an enormously successful application of statistical pattern recognition. Every day, millions of people use applications based on this technology to carry out tasks that are most naturally accomplished by speaking to machines. However, even the most successful of these applications remain rather limited in scope because speech recognition, although useful, can be maddeningly unreliable. Dr. Steven Wegmann, of the International Computer Science Institute (ICSI), hopes to understand, in a deep and quantitative way, why the methodology used in nearly all speech recognizers is so brittle. His research will have broad impacts on society. Pervasive and accurate automatic speech recognition has the potential to transform society in many positive ways: providing better access to information for those who find it difficult or even impossible to interact with computers using keyboards, such as the elderly, the physically disabled, the vision impaired, or the hearing impaired; improving everyday technologies that rely on speech recognition, like Siri; and improving literacy.

As a mathematician, Dr. Wegmann brings a unique perspective to this problem: he uses simulation and novel sampling processes to generate pseudo test data that mimic true data, which he then uses to measure recognition performance. The results of this research are startling enough that they should provoke future studies and a reexamination of where the statistical models used in speech recognition need to improve in order to make recognition more robust. Trained as a pure mathematician, Dr. Wegmann applies complex theoretical tools to real-world problems. Beyond his research, he is committed to educating future scientists. As the leader of the Speech Group at ICSI, he guides students and academics in revolutionary work in the subfields of diagnostic research, speaker recognition, and spoken keyword search. Through the combination of his research and his work within the academic community, Dr. Wegmann hopes one day to use statistical models to make the field of speech recognition more accessible to communities outside of academia.

Current research includes:

  • Diagnostic Research: Dr. Wegmann is looking at why applying deep neural nets to the speech recognition problem has been so successful. He and his team hope to gain insights that will further assist the development of speech recognition technologies.

  • Spoken Password Detection: Spoken password detection technologies allow a speaker to submit a password verbally while the computer system determines whether the speaker should be allowed access, based on its assessment of the password and the speaker's identity. Dr. Wegmann is building a statistical framework that connects recent advances in big data analysis to automatic speaker recognition; he and his team hope this framework will improve spoken password detection.

  • Language Models: Dr. Wegmann hopes to understand human performance in relation to the statistical language model problem. He and his team hope to use crowdsourcing and big data analytics to "break the logjam in language model research."

Bio

Dr. Steven Wegmann has worked at industrial research laboratories on problems in speech processing since 1994, holding positions at Dragon Systems, Lernout & Hauspie, VoiceSignal Technologies, Nuance Communications, and Cisco Systems. He has been a staff researcher at ICSI since 2010 and began leading the Speech Group in 2013. His current research interests are in the areas of automatic speech recognition, diagnostic analysis, speaker recognition, and low-resource spoken term detection. Earlier in his career, he was a mathematician who specialized in algebraic topology. He obtained his doctorate in mathematics at the University of Warwick while he was a Marshall Scholar.

Dr. Wegmann started his career in academia as an algebraic topologist. He was attracted to this field, and to pure mathematics in general, by the beauty and difficulty of the problems he worked on. After about eleven years as an academic, he discovered that he needed to work on more practical research problems. Almost by accident, he was connected with Dragon Systems, a small startup that was developing products based on automatic speech recognition. From 1994, when he began work at Dragon Systems, until 2012, Dr. Wegmann worked for a variety of businesses in the corporate world, including VoiceSignal and Nuance Communications. In 2012, he left the corporate world to join the International Computer Science Institute (ICSI), where he became the leader of the Speech Group in 2013.

In his free time, Dr. Wegmann is an avid cyclist. He has enjoyed commuting to work on his road bike, even during the harsh winters when he lived in Boston. He also loves to cook and holds the title of "household cook" in his family.

Website: https://www.icsi.berkeley.edu/icsi/people/swegmann 

In the News

Invited and Keynote speakers - 4th day

Automatic Speech Recognition and Understanding Workshop

Awards

ASRU Best Paper Award, 2013

for "The Tao of ATWV..."

Tibbetts Award, 1998

for SBIR "Semi-Automated Speech Transcription Systems at Dragon Systems"

Marshall Scholar, 1980-1983

Phi Beta Kappa, 1979