The Use of Audio in eLearning

One of the great strengths of eLearning is that it allows learners to work at their own speed. Skillful readers understand and remember text by optimizing their reading speed. They skip, reflect, repeat and pause to digest. This process is central to building new mental models and relating new knowledge to existing knowledge. Even with two fingers on the pause and back buttons, this is much more difficult to control with audio.

So why use audio in eLearning at all?

This blog post looks at the pros and cons of audio, where it helps and where it hinders, and what clients and designers need to bear in mind when designing audio components.

When Is Audio Essential?

There are times when audio is essential to learning. Let’s take a few obvious examples:

Language learners need to hear how a native speaker pronounces the words and stresses the phrases.
Call center staff learning telephone skills need to hear how the conversation develops.
Doctors learning about the various types of heart murmurs need to hear them.
Musicians need to hear the music they are studying, and the list goes on.

Can Text and Audio Work Together?

The first crucial point is that even if audio is all words (not music or other sounds) it’s not just text read aloud. Look at it from the learner’s point of view. Audio comes in through the ears, text is read through the eyes, and the two channels are processed differently. Text and audio can work together, but we need to be aware of this fundamental difference between the two media so that they complement each other instead of competing with each other.

An interesting question arises time and time again around audio in eLearning. Which of these is it better to have?

Identical text and audio
Audio only
Text only

Identical text and audio
The research demonstrates that learning from text alone, or from audio alone, is more effective than learning from text and identical audio simultaneously (Clark and Mayer 2003, confirmed by Moreno 2007). The argument is that text and audio together will overload the learner’s sight and sound channels, and that, if they are identical, one medium or the other is redundant.

People read text on the page at a rate of about 300 words per minute. Audio is spoken at about half that speed—150 words per minute—and scripted audio can be even slower. There is, therefore, a mismatch between the two processing speeds. Anyone who has watched a presenter laboriously speaking aloud the bullet points from a PowerPoint presentation will be able to relate.
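To put rough numbers on that mismatch, here is a minimal sketch in Python. It assumes the approximate rates quoted above (300 words per minute for reading, 150 for narration); the function names and the 120-word paragraph are hypothetical, chosen only for illustration, and real rates vary by learner and material.

```python
# Rough estimate of how long a passage takes to read silently
# versus to hear narrated. The rates are the approximate figures
# quoted in the text, not measured values.
READING_WPM = 300    # silent reading, words per minute
NARRATION_WPM = 150  # spoken audio, words per minute


def seconds_needed(word_count: int, words_per_minute: float) -> float:
    """Return the seconds needed to get through word_count words at the given rate."""
    return word_count / words_per_minute * 60


def pacing_gap(word_count: int) -> float:
    """Return how many seconds longer narration takes than silent reading."""
    return seconds_needed(word_count, NARRATION_WPM) - seconds_needed(word_count, READING_WPM)


if __name__ == "__main__":
    words = 120  # hypothetical length of one on-screen paragraph
    print(f"Reading:   {seconds_needed(words, READING_WPM):.0f} s")
    print(f"Narration: {seconds_needed(words, NARRATION_WPM):.0f} s")
    print(f"Gap:       {pacing_gap(words):.0f} s")
```

On these assumptions, a learner who reads along finishes a 120-word paragraph in about half the time the narrator needs, and then has to wait or read ahead of the audio.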

Audio only
An earlier study showed that a group presented with learning through audio alone performed 64 percent better on assessment than a group that learnt from text and audio together (Kalyuga, Chandler and Sweller 1999).

And what about audio together with graphics? Clark and Mayer go on to argue that audio combined with graphics or animation can improve learning because they use separate cognitive channels, while text and graphics both use the visual channel. This is an argument for using sound and pictures without much screen text. When animation and video are presented at speed, or when the learner is looking at a complex graph which requires a lot of explanation, audio comes into its own.

Reasons for not having audio
Another advantage of text over audio is that it stays where it is. Audio is transient. It puts a much greater load on the learner’s working memory. This is one reason for keeping the text on screen, even if there is also an audio version.

Instructions should always be text, not audio. They need to stay on screen until the learner has attempted the task.

Audio as Assistive Technology

Although audio is obviously useful for visually impaired users, it makes sense to deliver this through screen reader software. Screen reader audio is generated from the on-screen text without additional labor. The advantage is that experienced users can play it at astonishingly high speeds, while recorded audio locks them into the narrator’s natural speed.

Audio can also help people with dyslexia and people, both children and adults, who have low levels of literacy. The same is true for users for whom English is a second language. The readability statistics in Word provide some guidance on how demanding the on-screen text is for these audiences.

The Importance of Audio Quality

Research at Stanford by Nass and Reeves shows that users are more sensitive to the quality of audio than they are to that of video. This may sound surprising, but people are quite unforgiving when it comes to tinny audio with variable sound levels. Learners expect consistently high quality at a consistent volume. Philips, the electronics manufacturer, also found that people perceived video picture quality to be better when higher quality sound accompanied the video.

Audio quality, unlike video quality, seems to affect the user’s attention, memory and opinion about what is heard. It would seem, therefore, that audio fidelity is much more important than video fidelity. Nass and Reeves conclude that, compared with video, ‘for designers of multimedia, audio is a good place to invest. It appears to deliver more psychological bang for the buck’.

The Uses of Audio in eLearning

Despite all these caveats, audio is undoubtedly powerful and it can do several things in eLearning, including narration, voice lip-synched to animation or video, sound effects and music. In learning, narration is by far the most important of these.

Narration
The problem with extensive audio narration in eLearning is that it sounds like written text. That’s because it is written text. Sometimes the writer has no idea it is going to be voiced, so what’s written comes out a little like someone reading a report. Designers must know that they are writing for voiced narration at the start of the process.

Narration needs a good voiceover artist who can deliver the correct intonation at the right pace. Good voiceover artists are in demand for their voices and their ability to do this quickly in the studio. A badly cast voice is worse than no voice at all.

Spoken word
Language courses usually have to get pronunciation across. They can even involve the recording of the learner’s voice in the course, for comparison with a native speaker—and even for acoustic analysis.

Where skills involve the comprehension of the spoken word, such as with call center training or public speaking, spoken word audio might well be a component of the learning.

Learning can also benefit in effectiveness from the voices of experts—such as academics, consultants or senior members of management. It has been shown that learners actually learn more if they feel that the learning is being delivered by someone whose expertise and authority they respect.

Feedback
Another strong feature of the narrative flow of learning programs is feedback in the form of formative or summative assessment. Audio feedback can be quite powerful psychologically. It can be used to reinforce positive feedback, even when there is no core narration in the presentation material.

Be careful when presenting negative experiences, however. Negative events grab attention and wake up the processing system. In fact, experiences that come after negative events are better remembered, so put a negative event first, or up front in a program, to wake learners up.

Sound effects
In general, sound effects have the role of enhancing visuals. This is fine in games and entertainment, but within eLearning, sound effects such as beeps accompanying input have been found to distract learners and have now disappeared.

Moreno and Mayer (2000) compared an instructive, narrated animation on hydraulics, with and without environmental sound effects. Those who saw the version without sound effects scored higher than those who heard the sound version. When background music was added, the results for that group were even worse.

There are, of course, acceptable contexts for sound effects, such as quizzes, specially designed games and educational programs for children.

Music
Music’s primary function is mood. It can set the emotional tone for a program or piece within a program. Fun tunes in early educational programs can stimulate young learners. Music may be fine as a method of arousal at the start of an eLearning program, but background music is counterproductive. Research points towards not having background music in eLearning programs, as it results in sensory overload, reducing the effectiveness of the learning. Matt Lobel (2008) highlights how taste in music is subjective and can be intrusive to the learning experience.

Moreno and Mayer (2000) put this to the test with an instructive, narrated animation on hydraulics. The animation was shown with and without background music. Those who saw the version without music scored 20-67 percent higher than those who heard the music version.

Recommendations for Use of Audio

So there is no simple answer to the question of whether text and/or audio should be used in eLearning programs. It depends on the audience. However, given constraints in budget, production skills and need, it would seem that text is still a basic and powerful medium that can be used in many learning contexts. Audio will continue to play an important role though.

Audio can be a necessary component in learning when sounds themselves are the object of the learning, e.g. recognizing and analyzing stethoscope sounds, taking inbound calls in a call center, and listening to music or submarine sonar. However, audio is more often considered when the audience is seen as needing a supplement or alternative to text due to visual impairment, low levels of literacy, dyslexia or English being a second language, or simply where the target audience is not receptive to text.

On the issue of using text and audio together, it is better to use narrated audio for animated sequences, but text only for sequences under the user’s control. If you do use both, give the user the choice of switching the audio off.

Narration and feedback are the two most common uses of audio in eLearning. Audio here adds the human touch and can be used to reinforce correct answers, while negative feedback is kept as text.

Music and sound effects may be suitable in educational material for children, and sound effects can be useful in entertainment styles of presentation such as quizzes or games. Otherwise, avoid background music, sound effects, and sounds that accompany right or wrong feedback.

Audio also demands high quality. It is unacceptable to produce cheap, tinny audio at variable volume. This needs good process and reasonable production values. It can also be expensive to update and localize.

If you enjoyed this post, you might also want to check out our ebook ‘How to Create Digital Learning That Works’.
