In our previous article we discussed the use of audio narration for our online courses. One potential voice source is text-to-speech, or TTS. We want to report what we learned about its viability from our view as internal eLearning developers.

The use of live narrators for eLearning can be a lengthy and resource-intensive process, both for initial production and for subsequent revisions. TTS, if viable, could make our production more efficient. Of course, the trade-off is voice quality. After spot-checking the TTS market for over a year, we recently took a more in-depth look.

Can TTS really sound human enough to be practical? On one hand, we learned that some learners, our own employees included, can accommodate TTS as long as they don't have to strain to understand it. One source found that after several minutes, learners viewed it as listening to someone with an accent. On the other hand, there are the elements of cost, suitability of voices, and ease of use. Posts on an ASTD eLearning discussion group were unanimously against using TTS. Other articles were generally in favor of it under certain circumstances. We think some of the disparity stems from the wide variance in quality, not only between TTS engine manufacturers but even between different voices that use the same TTS engine.

We evaluated TTS engines and voices from the following TTS engine manufacturers:

In addition to these companies, all of which specialize in TTS services, we evaluated the voices that come with Adobe Captivate. Typical options include male and female personalities along with accents such as American, British, and Australian. All voices were judged using the same passage from a script in one of our eLearning courses.

Voice quality ranged from highly robotic to amazingly human-like. Besides one voice's diction sounding quite different from another's, we also found that a single voice could vary depending on the passage.

We found that the same package was priced quite differently depending on whether it was being licensed for individual use, for internal distribution on an intranet, or for commercial use. TTS manufacturers seem to use one of two general business models. In the first, text is entered on the host website and read by the selected voice. The user adjusts pronunciation and inflection until the sound is satisfactory. The user then downloads the finished product as an audio file. (See note under Ease of Use.) Most manufacturers who use this model base their fee on the number of finished minutes of audio. In our sample, fees for this kind of service ran between $7.50 and $11.00 per finished minute.

The other model is based on licensed downloads of the engine and voices. Fees for this kind of license varied from $2,500 per year for the engine and three voices to a one-time fee of $1,100 for the engine and two voices. Either way, additional voices are available for an additional fee.

Ease of use

In our small eLearning shop, no one specializes in a particular skill or tool. Thus, if TTS is going to work for us, it must be very intuitive to tweak a voice's inflection and punctuation. We found that several TTS products do not have a graphical user interface. Rather, some of them use an SDK (Software Development Kit) and are intended for use by developers only.

Note: We were able to adjust some pronunciation by changing spelling and punctuation in a trial-and-error fashion.

Technical support

Finally, a critical factor in anyone's use of TTS is technical support. Judging by the responsiveness to our inquiries, the quality of technical support could range widely. Thus, we urge anyone considering TTS to check this carefully.

We believe the quality, price, and ease of use are reaching a point where text-to-speech is becoming a viable alternative to recording human voices for certain narration. After evaluating a variety of sources and voices, we feel the ones that ship with Adobe Captivate are acceptable for short passages. Some others are getting close to human-sounding. In our sample, which is not comprehensive, we found the following products to be viable based on quality of voices, price, and ease of use:

Virtual Speaker and Acapela Box by the Acapela Group

However, because we found there can be noticeable variation between voices using the same engine, and even within the same voice from one passage to the next, we urge anyone considering TTS to evaluate the product thoroughly, across a wide sample of phrases.

“Text-to-Speech vs Human Narration for eLearning.” eLearning Technology, Tony Karrer, September 14, 2010, downloaded from
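
As a rough illustration of how the two pricing models reported in this article compare, the sketch below estimates how many finished minutes of audio it would take for a license fee to equal the cost of the per-minute service. It uses only the fee ranges quoted in our sample. The function and variable names are hypothetical, and the calculation assumes no fees for additional voices and ignores differences between individual, intranet, and commercial licensing.

```python
def break_even_minutes(license_fee, per_minute_fee):
    """Finished minutes at which a license costs the same as per-minute service."""
    return license_fee / per_minute_fee


# Fee figures quoted in the article's sample (US dollars).
PER_MINUTE_FEES = (7.50, 11.00)  # hosted service, per finished minute
LICENSES = {
    "annual license, engine + 3 voices": 2500.00,
    "one-time license, engine + 2 voices": 1100.00,
}

for label, fee in LICENSES.items():
    # The highest per-minute fee gives the earliest break-even point.
    low = break_even_minutes(fee, max(PER_MINUTE_FEES))
    high = break_even_minutes(fee, min(PER_MINUTE_FEES))
    print(f"{label}: break-even between {low:.0f} and {high:.0f} finished minutes")
```

With these figures, the one-time $1,100 license breaks even at roughly 100 to 147 finished minutes, and the $2,500 annual license at roughly 227 to 333 finished minutes per year. A shop producing less audio than that may find the per-minute service cheaper, all else being equal.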