
Talking with Computers.

A number of science fiction movies, from the famous 2001: A Space Odyssey (1968; the talking HAL computer) to Star Wars (1977; the speaking robots R2-D2 and C-3PO) and I, Robot (2004; the VIKI computer and the Sonny android), may have accustomed you to speech, voice recognition and intelligent responses by computers and androids. Now you get this on your laptops, tablets and phones. Here is a short timeline of how it evolved in the real world.

1950s and 1960s. Speech recognition focuses on understanding single spoken digits, later expanded to recognition of a few vowels and consonants. In 1952 Bell Laboratories’ “Audrey” system recognized digits spoken by a single voice. In 1962 IBM’s “Shoebox” machine could understand around 16 words spoken in English. Labs in the US, Japan, Western Europe and the Soviet Union worked on the recognition of vowels and consonants.

1970s. Recognition of around 1,000 words is achieved. In the US the Defense Department’s DARPA invested heavily in its Speech Understanding Research (SUR) program, and Carnegie Mellon’s “Harpy” speech-understanding system managed to understand around 1,000 words. Part of the success was due to improvements in search technology. The first speech recognition companies were founded (e.g. Threshold Technology). In the early 70s, among various natural language processing systems, W. Woods’ LUNAR system answered questions about rocks brought back from the moon by the Apollo missions. The microprocessor revolution brought experimentation with voice control and spoken communication with computers within easy reach of many computer enthusiasts, as witnessed by articles in Byte magazine (June 1978).

Articles on speech recognition in the June 1978 issue of Byte magazine.

1980s. The use of the hidden Markov model (HMM), with its predictive capacities, reduces errors; word-by-word dictation is achieved. New approaches based on the work of the Russian mathematician Andrei Markov appeared: new algorithms, in particular the predictive hidden Markov model, improved recognition. An example was the Sphinx program from Carnegie Mellon (improved over the years). Speech recognition started appearing in commercial applications (e.g. the 1985 Kurzweil text-to-speech program) and even toys (Worlds of Wonder’s Julie doll of 1987 could be trained to respond to a child’s voice). These programs only took discrete, word-by-word dictation.
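How does a hidden Markov model recognize speech? Roughly, it treats the audio as a sequence of observed acoustic symbols emitted by hidden states (phoneme-like units it cannot observe directly), and the recognizer searches for the hidden-state sequence most likely to have produced what was heard. The short Python sketch below illustrates the idea with the classic Viterbi algorithm; the two states, three symbols and all the probabilities are invented for this toy example and do not correspond to any historical system.

```python
import numpy as np

# Toy HMM: two hidden phoneme-like states and three observable
# acoustic symbols. All names and probabilities are invented for
# illustration; real recognizers use far larger models trained
# on recorded speech.
states = ["S0", "S1"]
start_p = np.array([0.6, 0.4])              # P(initial state)
trans_p = np.array([[0.7, 0.3],             # P(next state | current state)
                    [0.4, 0.6]])
emit_p = np.array([[0.5, 0.4, 0.1],         # P(symbol | state)
                   [0.1, 0.3, 0.6]])

def viterbi(observations):
    """Return the most likely hidden-state path for a symbol sequence."""
    n, m = len(observations), len(states)
    prob = np.zeros((n, m))                  # best probability ending in each state
    back = np.zeros((n, m), dtype=int)       # backpointers for path recovery
    prob[0] = start_p * emit_p[:, observations[0]]
    for t in range(1, n):
        for s in range(m):
            scores = prob[t - 1] * trans_p[:, s] * emit_p[s, observations[t]]
            back[t, s] = np.argmax(scores)
            prob[t, s] = scores[back[t, s]]
    # Trace the best path backwards from the most probable final state.
    path = [int(np.argmax(prob[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 2]))   # e.g. ['S0', 'S0', 'S1']
```

The 1980s recognizers worked the same way in principle, but with thousands of states trained on recorded speech rather than a handful of hand-picked numbers.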

1990s. Continuous speech recognition is successful; commercial speech recognition programs appear. In 1990 the first consumer speech recognition product, Dragon Dictate, appears, followed a few years later by the improved Dragon NaturallySpeaking, which recognized (slow) continuous speech after some training. In 1996 IBM launched MedSpeak, with continuous speech recognition. The same year, BellSouth in the US introduced the first voice portal, VAL, a dial-in interactive voice recognition system that was supposed to provide information based on what callers said over the phone.

Dragon Dictate, versions 2 to 5.

2000s. Speech recognition is incorporated into Microsoft and Google products; the NSA uses speech recognition. Unsurprisingly, the US National Security Agency began using speech recognition to isolate keywords in recorded speech. In 2007 Microsoft incorporated speech recognition into Windows Vista, and Google introduced GOOG-411, a telephone-based directory service. In 2008 Google launched an application for Apple’s iPhone that used voice recognition to query Google’s search engine as well as look up contacts on the phone.

2010s. Intelligent voice recognition digital assistants appear: Siri, S-Voice, Cortana. In 2011 Apple announced Siri, a digital personal assistant with the ability to understand the meaning of what is said and act accordingly. It was followed by the appearance of the somewhat similar S-Voice (Samsung, 2012) and Cortana (Microsoft, 2014).