
Week 6 – Operant conditioning: The basics

Part 1: What is operant conditioning?

Operant conditioning is a form of learning whereby behaviour is strengthened or weakened by its consequences. By learning the relationship between a behaviour and its consequences, the organism can operate on the environment to produce or avoid those consequences (hence the name operant conditioning). Because the behaviour is instrumental in producing the events that follow it, the same process is also called instrumental conditioning. Behaviour followed by desirable consequences (reward) is strengthened, and behaviour followed by aversive consequences (punishment) is weakened.

Operant versus classical conditioning

Classical conditioning modifies reflex behaviours, e.g. salivation, nausea, eye blinks, anxiety; the organism learns a contingent relationship between the UCS and the CS. Operant conditioning modifies voluntary behaviours; there must be a contingent relationship between the behaviour and the reinforcer. The two can act together: for example, school refusal can be influenced both by classically conditioned fear and by operant conditioning of avoidance behaviours.

How do we study operant conditioning?

Discrete trial procedure:

- Each trial has a start and an end, with one response per trial; reinforcement usually occurs at the end.

Free operant procedure:

- The response can be made at any time, and reinforcement is scheduled according to the responses. A minimal simulation of both procedures is sketched below.
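To make the contrast concrete, here is a minimal Python sketch of the two procedures. It is not from the lecture: the function names, response probabilities, and the simple FR5 rule in the free operant session are illustrative assumptions.

```python
import random

def discrete_trial_session(n_trials=10, p_correct=0.5):
    """Discrete trial procedure: each trial has a start and an end,
    permits one response, and reinforcement comes at the trial's end."""
    reinforcers = 0
    for _ in range(n_trials):
        correct = random.random() < p_correct    # stand-in for the single response
        if correct:
            reinforcers += 1                     # reinforcer delivered at end of trial
    return reinforcers

def free_operant_session(seconds=60, p_respond=0.3):
    """Free operant procedure: the response can be made at any moment, and
    a schedule (here an assumed FR5) decides which responses are reinforced."""
    responses = reinforcers = 0
    for _ in range(seconds):                     # one time step per second
        if random.random() < p_respond:          # the animal may respond at any time
            responses += 1
            if responses % 5 == 0:               # every 5th response is reinforced
                reinforcers += 1
    return responses, reinforcers

print(discrete_trial_session(), free_operant_session())
```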

Part 2: Reinforcers

Skinner’s empirical definition of a reinforcer

Skinner objected to the use of the term reward and proposed the term reinforcer instead. A reinforcer is defined as an object or event that follows a response and increases the rate of that response. What acts as a reinforcer is idiosyncratic (unique to the individual).

Primary reinforcers

Primary reinforcers are innately effective. They are stimuli that satisfy a biological need, and they do not need to be established through learning experiences in order to be effective (they can work even if we have never encountered them before). Examples of primary reinforcers are food and water, sensory stimulation, and drugs that produce a high or reduce pain. The effectiveness of primary reinforcers is influenced by motivational state.

Secondary reinforcers

Secondary (conditioned) reinforcers acquire their ability to reinforce through association with a primary reinforcer. Money is an example: it reinforces because it gives us access to primary reinforcers that satisfy biological needs. Secondary reinforcers are often used in dog training: the sound of a clicker is first paired with a food treat, and then the clicker alone is used to reinforce behaviour.

Intrinsic and extrinsic reinforcers

Extrinsic reinforcement comes from the external environment, e.g. being praised or winning an award. Intrinsic reinforcement comes from inside the person, e.g. feelings of achievement or finding the activity pleasurable. Extrinsic reinforcement can undermine intrinsic reinforcement in certain circumstances: if the justification for doing a task becomes the extrinsic reinforcer, the intrinsic reinforcement is undermined. You are unlikely to do something for fun if you usually get paid for doing it.

Token economy

A token economy uses tokens to reinforce desired behaviours. The tokens can be exchanged for access to desired items or activities, e.g. television time. For tokens to be effective, they must be distributed consistently, delivered immediately after the target behaviour, and be neither too easy nor too hard to earn.

Shaping

Shaping uses successive approximations to train a desired behaviour: responses that come progressively closer to the target are reinforced until the complex task is mastered. Training can be very laborious, but it can have remarkable results. A toy model of the shaping loop is sketched below.
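As a rough illustration of successive approximations, here is a toy Python sketch. It is an assumption-laden model, not anything from the lecture: responses meeting the current criterion are reinforced, reinforced values pull the learner’s typical behaviour toward them, and the criterion is raised only once the learner meets it reliably. All numbers are arbitrary.

```python
import random

def shape(target=50.0, step=4.0):
    """Toy shaping loop: reinforce responses that meet the current criterion,
    then demand a closer approximation of the target once the learner
    reliably meets it. Every parameter here is an illustrative assumption."""
    criterion, typical = 10.0, 8.0        # current demand; learner's typical response
    for _stage in range(500):             # safety cap on training stages
        if criterion >= target and typical >= target:
            break                         # target behaviour mastered
        successes = 0
        for _ in range(100):                               # one block of trials
            response = random.gauss(typical, 4.0)          # behaviour is variable
            if response >= criterion:                      # close enough: reinforce it
                typical += 0.2 * (response - typical)      # behaviour drifts toward reinforced values
                successes += 1
        if successes >= 10:               # criterion reliably met: raise it
            criterion = min(criterion + step, target)
    return typical

print(f"trained response magnitude: {shape():.1f} (target 50.0)")
```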

Part 3: Edward Thorndike

To study animal intelligence systematically, Thorndike presented animals with problems to solve. He gave them the same problem again and again to see whether performance improved over time. In other words, he studied animal learning by examining how experience affected animal behaviour.

Thorndike’s puzzle box

Cats appeared to hit upon the response that allowed them to escape the box by accident. However, the response that produced escape was repeated, while responses that did not gradually decreased in frequency.

Thorndike’s law of effect

Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation.

Thorndike’s stamping-in

In Thorndike’s view, the effect of a reward is to strengthen the association between a stimulus and a response: the reward stamps in the association. That is, responses to stimuli that are followed by a satisfier tend to be stamped in.

Part 4: B. F. Skinner

Skinner built on the traditions of Pavlov, Watson, and Thorndike in systematically studying behaviour. He was a radical behaviourist: he thought that thoughts, emotions, and other internal mental activity should be excluded from psychological analysis and theorising.

Skinner box

The Skinner box automates delivery of the reinforcer and recording of the response, which makes it possible to observe how subtle differences in the delivery of reinforcers influence behaviour. It allows us to examine how the consequences that follow a behaviour influence the future probability of that behaviour. Skinner discovered that how the reinforcer is scheduled to occur (the number of responses the animal must make before reinforcement is delivered, or the time after the previous reinforcer before a subsequent reinforcer becomes available) can have profound effects on the pattern of behaviour.

The simplest such schedule is the continuous reinforcement schedule, where every response produces reinforcement. Continuous reinforcement is most effective during the early stages of acquiring a behaviour. After acquisition, reinforcement is typically delivered according to an intermittent reinforcement schedule, where responses are reinforced only occasionally.

Schedules of reinforcement – Fixed ratio

- A fixed number of responses must occur before the reinforcer is delivered. FR10 denotes a fixed ratio schedule on which 10 responses must be made to obtain one reinforcer.

- Responding is moderate and steady, because the more you respond, the more reinforcers you obtain. Coffee loyalty cards are an everyday example.
- If a stimulus is delivered after a set number of responses, it is a fixed ratio schedule. For example, a pigeon might be given a food reward after every tenth peck of a button.

Schedules of reinforcement – Variable ratio

- The reinforcer occurs after a variable number of responses. VR6 denotes a variable ratio schedule on which, on average, 6 responses must be made to obtain one reinforcer, though any particular reinforcer could come after a different number of responses.
- Responding is high and steady, because you never know when you are going to be reinforced.
- If the number of responses required to receive a stimulus varies, you have a variable ratio schedule. One-armed bandits are the classic example: the slot machine is the best illustration of this schedule.

Schedules of reinforcement – Fixed interval

- The reinforcer becomes available after a fixed period of time. FI 1 min denotes a fixed interval schedule on which the reinforcer becomes available 1 minute after the last reinforcer.
- Responding on a fixed interval schedule shows a scallop pattern: little responding just after a reinforcer is delivered, and a lot of responding just before the next one becomes available. Working on assessments is an example.
- If a stimulus is given after a fixed amount of time, regardless of the number of responses, you have a fixed interval schedule.

Schedules of reinforcement – Variable interval

- The reinforcer becomes available after a variable period of time. VI 1 min denotes a variable interval schedule on which the reinforcer becomes available, on average, 1 minute after the last reinforcer, but could become available after 10 s, 30 s, 110 s, and so on.
- Responding on a variable interval schedule is low and steady: you never know when the reinforcer will become available, and its delivery is not sensitive to the number of responses. Fishing is an example.
- If a stimulus is given after a variable amount of time, you have a variable interval schedule. A simulation of all four schedules is sketched below.
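The four schedules can be summarised as simple rules for deciding which responses are reinforced. The following Python sketch simulates each over a five-minute session; the class names and the one-response-per-second session are illustrative assumptions, not anything from the lecture.

```python
import random

class FixedRatio:
    """FR n: every n-th response is reinforced."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, t):                 # t unused; kept for a uniform interface
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR n: reinforced after a variable number of responses averaging n."""
    def __init__(self, n):
        self.n, self.count = n, 0
        self.required = random.randint(1, 2 * n - 1)   # mean ~ n
    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = random.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI t: the first response at least t seconds after the last
    reinforcer is reinforced."""
    def __init__(self, interval):
        self.interval, self.last = interval, 0.0
    def respond(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False

class VariableInterval:
    """VI t: like FI, but the wait varies around a mean of t seconds."""
    def __init__(self, mean):
        self.mean, self.last = mean, 0.0
        self.wait = random.uniform(0, 2 * mean)        # mean ~ t
    def respond(self, t):
        if t - self.last >= self.wait:
            self.last = t
            self.wait = random.uniform(0, 2 * self.mean)
            return True
        return False

# One response per second for 300 seconds; count reinforcers per schedule.
for schedule in (FixedRatio(10), VariableRatio(6), FixedInterval(60), VariableInterval(60)):
    earned = sum(schedule.respond(float(t)) for t in range(300))
    print(type(schedule).__name__, earned)
```

Running it shows the key contrast from the notes: the ratio schedules pay off in proportion to how much the animal responds, while the interval schedules deliver roughly the same number of reinforcers no matter how fast it responds.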

Skinner’s operant–respondent distinction

Behaviour can be controlled both by antecedent events (respondent behaviour) and by its consequences (operant behaviour). Which environmental events control the behaviour, antecedents or consequences, determines which kind it is.

Extinction

Extinction refers to the gradual weakening and disappearance of a response when it is no longer reinforced. Extinction after continuous reinforcement is fast; extinction after intermittent reinforcement, particularly a variable ratio schedule, is slow.

