Where Operant Conditioning Went Wrong
Written by: J. E. R. Staddon
J. E. R. Staddon discusses adaptive behavior and learning, using pigeons as an example.
Operant conditioning is B. F. Skinner’s name for instrumental learning: learning by consequences. Not a new idea, of course. Humanity has always known how to teach children and animals by means of reward and punishment. What gave Skinner’s label the edge was his invention of a brilliant method of studying this kind of learning in individual organisms. The Skinner box and the cumulative recorder were an unbeatable duo.
Three things have prevented the study of operant conditioning from developing as it might have: a limitation of the method, an over-valuing of order and a distrust of theory.
The method. The cumulative record was a fantastic breakthrough in one respect: it allowed the behavior of a single animal to be studied in real time. Until Skinner, the data of animal psychology consisted largely of group averages – how many animals in group X or Y turned left vs. right in a maze, for example. And not only were individual animals lost in the group, so were the actual times – how long did the rat in the maze take to decide, how fast did it run? What did it explore before deciding?
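What a cumulative record preserves, and a group average discards, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of Skinner’s recorder: the function names and the peck times are hypothetical. The record is the running count of responses over time, and its slope over any window is the local response rate.

```python
from bisect import bisect_right


def cumulative_record(response_times, sample_times):
    """Return the running response count at each sample time."""
    ordered = sorted(response_times)
    # bisect_right counts the responses that occurred at or before time t.
    return [bisect_right(ordered, t) for t in sample_times]


def local_rate(response_times, start, end):
    """Responses per second in the window (start, end]: the record's slope."""
    ordered = sorted(response_times)
    count = bisect_right(ordered, end) - bisect_right(ordered, start)
    return count / (end - start)


# Hypothetical peck times in seconds, invented for illustration only.
pecks = [0.5, 1.0, 1.4, 6.0, 6.2, 6.5, 6.9]

record = cumulative_record(pecks, [0, 2, 4, 6, 8])
print(record)                    # [0, 3, 3, 4, 7]
print(local_rate(pecks, 2, 6))   # 0.25, a pause
print(local_rate(pecks, 6, 8))   # 1.5, a burst
```

The overall rate for these seven pecks is a little under one per second, a figure that hides both the pause and the burst that the record shows.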
But the Skinner-box setup is also limited – to a single response and to changes in its rate of occurrence. Operant conditioning involves selection from a repertoire of activities: the trial bit of trial-and-error. The Skinner-box method encourages the study of just one or two already-learned responses. Of the repertoire, that set of possible responses emitted for “other reasons” – of all those possible modes of behavior lurking below threshold but available to be selected – of those covert responses, so essential to instrumental learning, there is no sign.
Too much order? The second problem is an unexamined respect for what might be called “order at any price”. Fred Skinner frequently quoted Pavlov: “control your conditions and you will see order.” But he never said just why “order” in and of itself is desirable.
The easiest way to get order, to reduce variation, is of course to take an average. Skinnerian experiments involve single animals, so the method discourages averaging across animals. But why not average all those pecks? Averaging responses was further encouraged by Skinner’s emphasis on probability of response as the proper dependent variable for psychology.
Another way to reduce variability is negative feedback. A thermostatically controlled HVAC system reduces the variation in house temperature. Any kind of negative feedback will reduce variation in the controlled variable. Operant conditioning, almost by definition, involves feedback. The most-studied operant choice procedure – the concurrent variable-interval schedule – involves negative feedback, because the more time is spent on one choice, the higher the payoff probability for switching to the other.
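The feedback in the concurrent procedure can be made concrete with a small Python sketch. It rests on an assumption that is not in the text above: that each schedule sets up its next reward after an exponentially distributed interval (the idealized "random-interval" case), with its timer running whether or not the animal is attending to that key. Real variable-interval schedules usually draw from a finite list of intervals, so the numbers are illustrative only. The function name and parameters are hypothetical.

```python
import math


def payoff_probability(time_away, mean_interval):
    """Chance that a reward is waiting on a key left unattended for time_away.

    Assumes exponentially distributed set-up intervals with the given mean
    (an idealization, labeled as such in the text above).
    """
    if time_away < 0 or mean_interval <= 0:
        raise ValueError("time_away must be >= 0 and mean_interval > 0")
    return 1.0 - math.exp(-time_away / mean_interval)


# The longer the animal stays on the other key, the better the odds of switching.
for seconds_away in (0, 5, 15, 30, 60):
    p = payoff_probability(seconds_away, mean_interval=30.0)
    print(f"{seconds_away:>3} s away -> payoff probability {p:.3f}")
```

Under this assumption the payoff for switching rises steadily with time spent elsewhere, which pushes the animal back toward the neglected key and so damps variation in how its time is allocated.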
As technology advanced, these two things converged: the desire for order, enabled by averaging and negative feedback, and Skinner’s idea that response probability is an appropriate – the appropriate – dependent variable. Variable-interval (VI) schedules, either singly or in two-choice situations, became a kind of meter. Response rate on VI is steady – no waits, pauses or sudden spikes. It seemed to offer a simple and direct way to measure response probability. From response rate as response probability to the theoretical idea of rate as somehow equivalent to response strength was but a short step.
Theory. Response strength is a theoretical construct. It goes well beyond response rate or indeed any other directly measurable quantity. Unfortunately, most people think they know what they mean by “strength”. The Skinnerian tradition made it difficult to see that more is needed.
A landmark 1961 study by George Reynolds illustrates the problem (although George never saw it in this way). Here is a simplified version: Imagine two experimental conditions and two identical pigeons. Each condition runs for several daily sessions. In Condition A, pigeon A pecks a red key for food reward delivered on a VI 30-s schedule. In Condition B, pigeon B pecks a green key for food reward delivered on a VI 15-s schedule. Because both food rates are relatively high, after lengthy exposure to the procedure, the pigeons will be pecking at a high rate in both cases: response rates – hence ‘strengths’ – will be roughly the same. Now change the procedure for both pigeons. Instead of a single schedule, two schedules alternate, for a minute or so each, across a one-hour experimental session. The added, second schedule is the same for both pigeons: VI 15 s, signaled by a yellow key (alternating two signaled schedules in this way is called a multiple schedule). Thus, pigeon A is on a mult VI 30 VI 15 (red and yellow stimuli) and pigeon B on a mult VI 15 VI 15 (green and yellow stimuli). In summary, the two experimental conditions are (stimulus colors in parentheses: R = red, G = green, Y = yellow):
Experiment A: VI 30 (R), then mult VI 30 (R) VI 15 (Y)
Experiment B: VI 15 (G), then mult VI 15 (G) VI 15 (Y)
Now look at the second condition for each pigeon. Unsurprisingly, B’s response rate in green will not change. All that has changed for him is the key color – from green all the time to green and yellow alternating, both with the same payoff. But A’s response rate in red, the VI 30 stimulus, will be much depressed, and response rate in yellow for A will be considerably higher than B’s yellow response rate, even though the VI 15-s schedule is the same in both. The effect on responding in the yellow stimulus by pigeon A, an increase in response rate when a given schedule is alternated with a leaner one, is called positive behavioral contrast, and the rate decrease in the leaner schedule for pigeon A is negative contrast.
The obvious conclusion is that response rate alone is inadequate as a description of the ‘strength’ of an operant response. The steady rate maintained by VI schedules is misleading. It looks like a simple measure of strength. Because of Skinner’s emphasis on order, because the averaged-response and feedback-rich variable-interval schedule seemed to provide it and because it was easy to equate response probability with response rate, the idea that rate measures strength took root. Yet even in the 1950s, it was well known that response rate can itself be manipulated – by so-called differential-reinforcement-of-low-rate (DRL) schedules, for example.
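How a DRL schedule manipulates rate can be sketched in Python. On the usual description of the procedure, a response earns food only if at least a minimum time has passed since the previous response. The function name, the parameter names and the peck times below are hypothetical, and timing the first response from session start at 0 s is an assumption made for this sketch (conventions vary).

```python
def drl_reinforced(response_times, min_irt, session_start=0.0):
    """Return the response times that earn food on a DRL schedule.

    A response is reinforced only when the time since the previous response
    (or since session_start, by assumption) is at least min_irt seconds.
    Every response, reinforced or not, resets the timer.
    """
    reinforced = []
    previous = session_start
    for t in sorted(response_times):
        if t - previous >= min_irt:
            reinforced.append(t)
        previous = t
    return reinforced


# Hypothetical peck times in seconds, invented for illustration only.
fast_pecking = [1, 2, 3, 4, 5]
slow_pecking = [12, 24, 36]

print(drl_reinforced(fast_pecking, min_irt=10))   # []
print(drl_reinforced(slow_pecking, min_irt=10))   # [12, 24, 36]
```

Here rapid pecking earns nothing and slow, spaced pecking earns every reward, so a low response rate on this schedule cannot sensibly be read as a weak response.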
Bottom line: response rate does not equal response strength; hence our emphasis on rate may be a mistake. If the strength idea is to survive the demise of rate as its best measure, something more is needed: a theory about the factors that control an operant response. But because Skinner had successfully proclaimed that theories of learning are not necessary, real theory was not forthcoming for many years.