Instrumental Learning
The consequence of responding is the most important element in instrumental learning. In this type of learning, the behavior is instrumental in producing a change in the environment, and that environmental change in turn affects the probability of the behavior that produced it.
Thorndike on Outcomes
Questions about the nature of instrumental processes began with Edward Thorndike. According to Thorndike's casual observations in his "cat in the puzzle box" experiments, inappropriate emotional responses aimed at escaping gave way to solution-oriented responses that produced reward. Thorndike theorized that it is the connection between a stimulus and a response that determines behavioral probability. The mechanisms for habit growth are contained in the law of effect. This principle states that it is the effect of a response on the environment that determines whether or not the stimulus-response connection will form. The law of effect corresponds to positive reinforcement and punishment.
Skinner and Operant Conditioning
B.F. Skinner's version of instrumental conditioning, called operant conditioning, involves environmental control of responses. In his experiments, in which a rat is placed in a Skinner box, a light is viewed as a discriminative stimulus because it cues the particular conditions under which a response will be reinforced. Reinforcement controls responding by selectively strengthening behaviors that act on the environment to produce change. The key is that the outcome of responding determines the future probability of the behavior. Conditioning is most effective when reinforcers are delivered immediately after the response.
Shaping
Through the process of shaping, experimenters who train animals can reduce the time an animal needs to learn a task. Shaping is achieved by reinforcing successive approximations to a goal behavior in a step-by-step fashion.
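The step-by-step logic of shaping can be illustrated with a small sketch. Everything here is hypothetical: the function name `shape`, the idea of scoring each response with a single number, and the fixed step size are assumptions made for illustration, not a description of any actual training protocol.

```python
def shape(responses, goal, step):
    """Reinforce successive approximations to a goal behavior.

    responses: numeric scores for each response (a hypothetical measure,
               e.g. how close the animal comes to the target behavior).
    goal:      the score that counts as the full goal behavior.
    step:      how much the criterion is raised after each reinforced response.

    Returns a list of (response, criterion_in_force, reinforced) tuples.
    """
    criterion = min(step, goal)  # start with an easy approximation
    log = []
    for r in responses:
        reinforced = r >= criterion
        log.append((r, criterion, reinforced))
        # After a success, demand a closer approximation next time.
        if reinforced and criterion < goal:
            criterion = min(criterion + step, goal)
    return log


for entry in shape([1, 2, 1, 4, 5, 6], goal=6, step=2):
    print(entry)
```

Note that a response of 5 goes unreinforced late in the sequence even though a response of 2 was reinforced early on: the criterion moves with the animal's progress.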
Basic Training Behaviors
In positive reinforcement, also known as reward training, the emission of an operant response is followed by stimuli called positive reinforcers that make the actions that produce them more probable.
When a response results in the production of an aversive stimulus, the conditioning procedure is called punishment. The effect of punishment is to suppress responses that have led to it.
Omission training is one alternative to the use of punishment. In this procedure, a positive reinforcer is given as long as an unwanted behavior does not occur. When the unwanted behavior occurs, it results in the omission of the next scheduled reward.
Negative reinforcement is a training procedure wherein operant behaviors terminate or postpone the delivery of aversive stimuli. Response probability increases.
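The four training procedures above differ along two dimensions: whether the stimulus is appetitive or aversive, and whether the response produces it or prevents it. The lookup table below is one way to summarize that. The labels "appetitive", "aversive", "produces", and "prevents" are shorthand chosen for this sketch ("prevents" covers omitting, terminating, or postponing the stimulus), not terms taken from the text.

```python
# Keys: (kind of stimulus, what the response does to it)
# Values: (name of procedure, effect on response probability)
PROCEDURES = {
    ("appetitive", "produces"): ("positive reinforcement", "increases"),
    ("aversive", "produces"): ("punishment", "decreases"),
    ("appetitive", "prevents"): ("omission training", "decreases"),
    ("aversive", "prevents"): ("negative reinforcement", "increases"),
}


def classify(stimulus, response_effect):
    """Return (procedure name, direction of change in responding)."""
    return PROCEDURES[(stimulus, response_effect)]


for key in PROCEDURES:
    print(key, "->", classify(*key))
```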
Stimulus Control
Generalization and discrimination are also of concern in operant conditioning, but they are usually dealt with as special cases of stimulus control. This concept presumes that only certain environmental events become defining features in conditioning.
Secondary Reinforcement
Because reinforcers such as food and water are natural and have biological relevance, they are called primary reinforcers. They tend to be few in number, and they generally operate uniformly across species. When an event acquires reinforcing properties because of an association with a primary reinforcer, the event is labeled a secondary reinforcer.
Generalized Reinforcers
Generalized reinforcers are secondary reinforcers that are associated with a wide variety of other reinforcers such as food, clothing, shelter, and luxury items. A major advantage is that motivation to respond is almost guaranteed.
Chaining
Chaining occurs when a subject performs several different behaviors in sequence in order to obtain a reward. One example is a rat that has to perform a series of tasks before a reward is available. Chains can be very complex.
Schedules of Reinforcement
The schedule of reinforcement is probably the most heavily researched topic in all of operant conditioning. The significance of schedule variables comes from the fact that unique rates and patterns of responding are produced by selected schedule conditions. The two major types of reinforcement schedules are continuous reinforcement and intermittent reinforcement. Continuous schedules reinforce every occurrence of an operant behavior that satisfies an accepted criterion. There are four basic types of intermittent schedules.
In a fixed-ratio (FR) schedule, behavior is rewarded after a fixed number of responses have been made. An example of an FR schedule is an industrial worker whose wages are adjusted according to unit output.
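A minimal sketch of the fixed-ratio rule, assuming responses are simply counted from the start of the session. The function name `fixed_ratio` and its parameters are hypothetical.

```python
def fixed_ratio(n_responses, ratio):
    """Return the 1-based positions of the responses that earn a reward
    when every `ratio`-th response is reinforced."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


# Under this sketch, a ratio of 3 rewards the 3rd, 6th, and 9th responses.
print(fixed_ratio(10, 3))
```

Setting `ratio=1` reduces this to continuous reinforcement, where every response is rewarded.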
In a fixed-interval schedule, the subject is rewarded for the first response that occurs after a specified period of time has elapsed. One example of a fixed-interval schedule is waiting for a fruit to ripen on a tree.
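The fixed-interval rule can be sketched in the same way. This version assumes that the interval is timed from the previous reinforced response (or from the start of the session), which is one common convention but is not specified in the text. The name `fixed_interval` and the time units are hypothetical.

```python
def fixed_interval(response_times, interval):
    """Return the times of the responses that are rewarded.

    A reward becomes available once `interval` time units have elapsed;
    the first response at or after that moment collects it, and the
    timer then restarts from that response.
    """
    reinforced = []
    available_at = interval
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval  # restart the timer
    return reinforced


print(fixed_interval([1, 5, 9, 10, 12, 25], interval=10))
```

Responses made before the interval has elapsed (here at times 1, 5, 9, and 12) earn nothing, however many there are.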
Under variable-ratio (VR) schedule conditions, the number of responses required for reinforcement changes from one reinforcer to the next, so the subject cannot predict which response will be rewarded. A familiar example of a VR schedule is provided by slot machines at gambling casinos.
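A sketch of a variable-ratio rule follows. The way the requirement is chosen here (drawn uniformly from 1 to `2 * mean_ratio - 1`, so that it averages `mean_ratio`) is an assumption made for illustration; real schedules may use other distributions. The function and parameter names are hypothetical.

```python
import random


def variable_ratio(n_responses, mean_ratio, rng=None):
    """Return the 1-based positions of rewarded responses when the
    number of responses required changes after every reward."""
    if rng is None:
        rng = random.Random()
    reinforced = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for i in range(1, n_responses + 1):
        count += 1
        if count >= required:
            reinforced.append(i)
            count = 0
            # Draw a fresh requirement for the next reward.
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


# Passing a seeded generator makes a run repeatable.
print(variable_ratio(30, mean_ratio=5, rng=random.Random(0)))
```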
As one of the most widely used schedules in basic operant conditioning research, variable-interval (VI) schedules reward the first response that occurs after one interval of time has elapsed, then after a different interval, and then after still another, until the end of training has been reached.
Generally, ratio schedules occasion higher response rates than interval schedules. This pattern still holds when the subject is placed on operant extinction. All intermittent schedules lead to greater resistance to extinction than a continuous reinforcement schedule. This phenomenon is often called the partial reinforcement effect.
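The variable-interval rule described above can be sketched as follows. The interval lengths are supplied by the caller as a hypothetical list, and each interval is assumed to be timed from the previous rewarded response; neither detail comes from the text.

```python
def variable_interval(response_times, intervals):
    """Return the times of rewarded responses.

    intervals: successive waiting periods (hypothetical values). The
    first response after each period has elapsed is rewarded, and the
    next period then begins. Training ends when the list runs out.
    """
    reinforced = []
    remaining = iter(intervals)
    first = next(remaining, None)
    available_at = first
    for t in sorted(response_times):
        if available_at is None:
            break  # no intervals left: end of training
        if t >= available_at:
            reinforced.append(t)
            nxt = next(remaining, None)
            available_at = None if nxt is None else t + nxt
    return reinforced


print(variable_interval([2, 4, 7, 8, 15, 16, 30], intervals=[3, 10, 5]))
```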
The Premack Principle
David Premack has stressed that reinforcers may be better seen as responses. Premack claimed that reinforcers are relative, being effective with some responses and not others. This idea is referred to as the Premack principle. Implicit in this statement is that reinforcers can be moved up and down a hierarchy by environmental manipulation. Almost any behavior can be made highly probable or improbable if the external conditions are structured properly.
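One way to make the idea of relative reinforcers concrete is the usual statement of the Premack principle: access to a more probable behavior can reinforce a less probable one. The sketch below assumes that rule; the function name, the activity names, and the baseline probabilities are all hypothetical values chosen for illustration.

```python
def can_reinforce(baseline, reinforcer, target):
    """True if access to `reinforcer` is expected to strengthen `target`,
    i.e. if `reinforcer` is the more probable behavior at baseline."""
    return baseline[reinforcer] > baseline[target]


# Hypothetical baseline probabilities for a rat with free access to each activity.
baseline = {"eating": 0.6, "wheel_running": 0.3, "lever_pressing": 0.1}

print(can_reinforce(baseline, "eating", "wheel_running"))   # eating ranks higher
print(can_reinforce(baseline, "wheel_running", "eating"))   # running ranks lower

# Changing the conditions can reorder the hierarchy, for example if
# wheel running becomes the more probable activity.
changed = dict(baseline, wheel_running=0.8)
print(can_reinforce(changed, "wheel_running", "eating"))
```

The same activity acts as a reinforcer in one comparison and not in another, which is the sense in which reinforcers are relative.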
