What I Got Wrong: Misunderstanding the Testing Effect
Written by: Arnold Glass
After reading a passage for the first time, answering questions about the passage produces better memory for it than reading the passage repeatedly. This is called the testing effect.
Experimental psychologists who study learning have known of the testing effect since at least 1917, when it was reported by Arthur Gates. Recently, several cognitive psychologists have demonstrated that through the use of personal response systems (clickers) to ask questions throughout a lesson, the testing effect can be integrated into classroom instruction as an effective instructional methodology (Glass & Sinha, 2013, "Multiple-choice questioning is an efficient instructional methodology that may be widely implemented in academic courses to improve exam performance," Current Directions in Psychological Science, 22, 471–477). They suggest that the long-delayed implementation of the testing effect as an instructional technology required the development of personal response systems in the twenty-first century. This is not quite true.
While personal response systems provided a new opportunity for integrating questioning with instruction, this was not the first opportunity to do so. Beginning in the 1920s, first Sidney Pressey and later B. F. Skinner invented special-purpose learning machine technology to integrate questioning with instruction. This technology was never widely adopted.
One may question whether the learning machine technology was too cumbersome and expensive for the benefit it delivered. However, the final version of the learning machine technology was neither cumbersome nor expensive, nor was it a machine. It was a programmed instruction textbook. In a programmed instruction text, fill-in-the-blank questions were integrated into the text. For example, a sentence later in a paragraph might contain a blank space, indicated by an underline, which referred to a fact stated earlier in the paragraph.
The student's task was to recall and write in the missing word and then immediately check it against a list of answers in the margin of the page. The answers were kept covered by a bookmark controlled by the student and revealed only one at a time, as each question was answered. In the first psychology course I ever took, in 1967, programmed instruction texts were used. However, that was their last gasp. A few years later they disappeared and have never been seen again.
If integrated questioning produces better memory for a passage, why did programmed instruction texts fail to gain an audience?
In retrospect, it may have been because programmed instruction texts did not appear to produce better memory to their target audience of students and instructors. A few years later I was assisting a creative scientist by the name of Roy Lachman, who decided to investigate the effectiveness of programmed instruction texts. He took a chapter from a programmed instruction text and created a new version in which all the blanks were filled in, so that what was previously a fill-in-the-blank question was now a declarative sentence with a key word underlined.
Half the students in the experiment received the original form of the chapter, which they had to respond to in the usual way: generating and then checking each answer.
The other half of the students simply read the chapter once. Immediately afterwards, the students received a follow-up exam on the chapter and were also asked how much they enjoyed the study experience. There was no difference in exam performance, and merely reading the passage was rated as a little more enjoyable because generating and checking each answer in the programmed text was perceived as tedious. The study was never published. After the fact, these results did not surprise me.
My own experience with programmed texts as a student had not left me with the impression that I learned any faster or any more, and the constraint of inserting fill-in-the-blank questions made the writing style very mechanical. Without clear evidence that such boring texts produced better learning, there was no reason for instructors to continue using them.
However, there was a fatal flaw in our experiment on programmed instruction: we tested only immediate retention.
In 2006, Henry Roediger III and Jeffrey Karpicke ("Test-enhanced learning: Taking memory tests improves long-term retention," Psychological Science, 17, 249–255) compared the effects of repeated study versus question-answering on tests given five minutes, two days, and one week later. Repeated study produced better performance five minutes later, but question-answering produced better performance two days and one week later.
It is only when testing is delayed that question-answering produces better performance. So the effect of question-answering is not on initial learning but on long-term retention. Of course, since the purpose of education is long-term retention, this makes question-answering a valuable instructional methodology.
However, this was no more intuitively evident to their subjects than it had been to me or my subjects over 30 years earlier. It was repeated studying that increased the students' confidence in their ability to remember the material.
This tale illustrates two points. The first point is the difference between a competent experimenter and a clever experimenter. A competent experimenter is one who is able to design an experiment to answer the question he is asking. A clever experimenter is one who, usually on the basis of a wide knowledge of previous relevant results, asks the important question in the first place.
In this case, the important question was the relationship between retention interval and the effects of repeated study and question-answering. Roediger and Karpicke were clever enough to ask it; Roy and I were not. The second point is that in evaluating an instructional methodology, the details of its implementation and testing are important, and the devil is certainly in them.