HOW WE TALK TO COMPUTERS
THE SCIENCE OF SPEECH RECOGNITION

Ever dream of the day when you could control your house, car, and robotic butler with the sound of your voice? What was once strictly in the realm of science fiction is creeping ever closer to becoming reality. Automatic speech recognition (ASR) is the technology computers use to understand, transcribe, and carry out functions from spoken commands. When you tell your iPhone to "Call Martha," you're using ASR.

ASR & YOU
A FEW PLACES YOU CAN FIND ASR IN EVERYDAY LIFE (AND OUT OF IT)

GLOBAL POSITIONING SYSTEMS (GPS)

MILITARY AIRCRAFT CONTROL

MEDICAL & LEGAL TRANSCRIPTION
Typically, switching to an ASR system cuts 30% of costs in medical transcription.

PHONE ANSWERING SERVICES
Annoying as they may be, automated phone services are SR at its most effective, usually coming close to 100% accuracy even for untrained users.

SMART PHONES
Android's highly acclaimed voice search is a great example of SR in mobile devices.

DIGITAL DICTATION
Dictation software has seemed to improve greatly in the last ten years, though major companies don't advertise their accuracy rates.

Nuance provides ASR services for 65% of Fortune 100 companies and is the industry leader in medical and legal transcription, as well as personal ASR.

HOW ASR WORKS
WHEN YOU TALK TO A COMPUTER, THE SOFTWARE BREAKS IT DOWN INTO SIGNALS TO UNDERSTAND IT.

STEP 1
YOU SPEAK, COMPUTER LISTENS.
The computer measures the wave, normalizes speed, and removes noise.

MULTIPLE SPEAKERS & BACKGROUND NOISE
Computers are not very good at separating voices from other voices or background noise. A Microsoft Vista demonstration in 2006 infamously went awry when the speech recognition software misunderstood the presenter.
You speak: "DEAR MOM COMMA... FIX AUNT... DELETE THAT, DELETE, SELECT ALL"
Computer listens: "DEAR AUNT, LET'S SET SO DOUBLE THE KILLER DELETE SELECT ALL"

STEP 2
THE COMPUTER DIVIDES THIS SIGNAL INTO PHONEMES, THE SOUNDS THAT MAKE UP WORDS.
REGARDING PHONEMES
They can be as short as a few hundredths or thousandths of a second. There are roughly 40 phonemes in English. Taa, spoken in Botswana, has 112 phonemes, more than any other language.

[Figure: the signal splits and is now phonemes.]

STEP 3
THE COMPUTER USES STATISTICAL PROBABILITY TO DETERMINE THE WORD BASED ON DIGITAL CONTEXT.

Candidate words: GOOD, GOOEY, GOON, GOD, GET, GREAT
- THE COMPUTER FINDS WORDS THAT HAVE MATCHING FIRST PHONEMES: GOOD, GOOEY, GOON
- IT LOOKS AT THE SECOND PHONEMES TO SEE IF THEY ALSO MATCH.
- USING THE LAST PHONEME, IT FINDS THE WORD: GOOD

MEANING AND CONTEXT
Computers cannot yet reliably determine the meanings of words - they can only know how words usually fit together. Homonyms - words like "air" and "heir" which sound the same but mean different things - are notoriously difficult for computers to differentiate.
Example: "EXCUSE ME WHILE I KISS THE SKY" can be heard as "EXCUSE ME WHILE I KISS THIS GUY".

STEP 4
THIS SYSTEM RELIES ON THE DATABASE OF WORDS (OR VOCABULARY) OF THE COMPUTER. FOR COMPARISON:
AVERAGE PERSON: 10,000 WORDS
MIKE TYSON: 20,000 WORDS
SPEECH PROGRAM: 60,000 WORDS
ENGLISH LANGUAGE: 600,000 WORDS
GOOGLE LANGUAGE CORPUS: 13,600,000 WORDS

COMPUTERS HAVE COME A LONG WAY IN SPEECH RECOGNITION BUT THEY'RE STILL NOT QUITE UP TO SPEED.
[Figure: accuracy timeline. Time labels shown: 1993, 1995, 1999-2001, ALWAYS. Accuracy values shown: 10%, 48%, 81%, 96%. Annotation: NO FURTHER IMPROVEMENT.]

THE FUTURE OF ASR
GOOGLE'S GOT PLANS FOR YOU
ASR systems get better the more data they have, because there is more to base their statistical models on. The reason Google's ASR systems work so well is that they store every search term ever typed or spoken into Google, and determine probability based on commonality of searches.
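As a rough illustration of the two ideas above (narrowing candidate words one phoneme at a time, then letting how common a word is break ties), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the phoneme symbols, and the usage counts are invented, and real ASR systems use far richer statistical models than this.

```python
# Toy lexicon (hypothetical): each word maps to an invented phoneme sequence.
LEXICON = {
    "good": ("G", "UH", "D"),
    "gooey": ("G", "UW", "IY"),
    "goon": ("G", "UW", "N"),
    "god": ("G", "AA", "D"),
    "air": ("EH", "R"),
    "heir": ("EH", "R"),
}

# Hypothetical usage counts standing in for "commonality of searches".
COUNTS = {"good": 5000, "god": 1200, "air": 900, "heir": 40}


def narrow_by_phonemes(heard, lexicon):
    """Keep only words whose phonemes match the heard sequence, position by position."""
    candidates = list(lexicon)
    for i, phoneme in enumerate(heard):
        candidates = [
            word for word in candidates
            if len(lexicon[word]) > i and lexicon[word][i] == phoneme
        ]
    # Discard words that are longer than what was heard.
    return [word for word in candidates if len(lexicon[word]) == len(heard)]


def pick_most_common(candidates, counts):
    """Among equally good matches, choose the word with the highest count."""
    if not candidates:
        return None
    return max(candidates, key=lambda word: counts.get(word, 0))


if __name__ == "__main__":
    matches = narrow_by_phonemes(("EH", "R"), LEXICON)
    print(matches, "->", pick_most_common(matches, COUNTS))
```

With sound-alike words such as "air" and "heir", phoneme matching alone cannot decide between them, so the sketch falls back on the counts. This is also why such a system can confidently pick the wrong word when the rarer one was intended.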
GOOGLE VOICE SEARCH
Searches spoken terms

GOOGLE TRANSLATE
Translates into 50 languages

[email protected]
Voice controlled appliances

LIE-DETECTING ATM
Sberbank, the biggest retail bank in Russia, has developed an ATM that uses speech recognition for lie-detection by testing the customer's responses to questions against a database of interrogation recordings in which people were lying.

WATSON
Jeopardy's all-star is composed of 90 IBM servers working with 200 million pages of content across 4TB of storage. Watson analyzes keywords and phrases in each question, then calculates the probability of various answers, just like in ASR. The only difference is that Watson's questions are delivered by text message instead of voice.

CREATED BY: MEDICALTRANSCRIPTION.NET

THANK YOU
content/uploads/37468_Nuance_Exec_leave_behind_PDF.pdf
ecognition&st=cse
%20recognition&st=cse&pagewanted3D1
2003.pdf

THIS IS A GREATLY SIMPLIFIED EXPLANATION OF HOW SPEECH RECOGNITION TECHNOLOGY WORKS. FOR MORE INFORMATION, PLEASE CONSULT OUR SOURCES.

Speech recognition on consumer electronics.


