Examples in Using Operant Conditioning
Applying the Tactics
In EADT: Volume 4, we covered Classical and Operant Conditioning as the two main tactical spheres in which we work in the "battle" with our dogs (and/or their behavior).
I had thought to cover them more, but then decided to just show some ways I used the tactics, especially in the Operant Conditioning sphere, as that's where I "live" most of the time.
Here, I'll illustrate, hopefully well, the applications I've used with Wally to help him develop. Everything here is something I've actually used in working with Wally. So this won't be as much about theory, but hopefully will tie in to what was mentioned in Volume 4.
Remember your abbreviations for the four results of Operant Conditioning! I'll be using them throughout this hub, again for space saving as well as familiarity. Here's a reminder:
- (+R) - Positive Reinforcement
- (-R) - Negative Reinforcement
- (+P) - Positive Punishment
- (-P) - Negative Punishment
And another one you'll see:
- (c/t) - Click and Treat. It doesn't literally have to be a "click" and a "treat", though. It could be saying "yes" and giving a toy. It could be "good boy" and a game of tug. Deliver your "charged" word/phrase/sound, and then deliver the reward. But since clickers and treats have become almost the "default" in the +R world, this shorthand is often used, so I would like you to be familiar with it.
Luring was mentioned in an earlier Volume. It is a way to guide the dog into the behavior you want without any physical contact. This is a significant difference from modeling, which is, literally, a hands-on approach.
The lack of physical touching frees up the process to be used for a multitude of behaviors. The first behavior I taught with this was nose targeting. Now this may seem like a stretch, but it was luring in my mind.
What I did was hold my hand out to Wally, and when he sniffed, I clicked and treated (c/t). I took my hand away, and then presented it again. When he sniffed, c/t. I wanted to make sure his nose was actually touching my hand. He couldn't just air sniff it from a foot away and get a reward (trust me, he tried that - the little con-artist!). No, he had to put his cold, wet nose on my hand.
So what did I do if he didn't? Well, I used one of two approaches. In the beginning, I would use modeling, gently bringing his head to my hand so that his nose touched it, then c/t. The +R would help encourage the behavior, even if I had to do it for him a couple of times (and only a couple - I didn't want the cue to be me touching the back of his head!). Once he got the behavior down, I named it (put the command to it) just before the nose hit my hand. Then c/t. Once it was learned, if at any time he didn't touch my hand (including the "phantom touch" where his nose LOOKS like it touched, but really was a centimeter away), then he got nothing. No reward. No praise. Nothing but a nice dose of -P for his non-efforts.
I also used luring to teach the down position. I would hold a treat in my fist and let him sniff it. Then, I put my hand between his front paws and slid it back. This often got him to "fold" himself backwards into the down position. At the time he hit the floor, he got a click and I opened my fist for him to get the treat.
As with the nose targeting, I had to watch for other attempts to get the reward. I had to make sure the down was the only way he got the reward. A few mistakes would be fine, but too many and I'd confuse the behavior.
Once he was going down with ease, I'd say "lie down" just as he was maybe two-thirds of the way down, and when he hit, he got a c/t. If he did anything else - nothing at all.
- (+R) - the c/t for any correctly completed behavior.
- (-P) - the lack of c/t for any incorrect attempts.
Shaping can be hard on a dog. The whole concept of acting without direct prompting can be difficult for them to grasp. It can also be a test of a trainer's patience as the dog tries to figure out what the goal is and the trainer can only watch for steps in the right direction and reward them.
However, shaping uses a lot of +R/-P as a pair, which is great because the dog is getting feedback in both directions on how close he's coming to the desired result, and is getting rewarded along the way for each hint of progress. This can motivate him to keep trying and working in hopes of more rewards.
This is one place where those jackpots I mentioned in an earlier hub come into play. You can use them to bring a bonus helping of +R to really show the dog he was successful. Dogs DO understand this: if an action usually gets a dog one treat and then suddenly an action gets him five, he's going to try to remember what got him more food and offer it again! The best way to deliver a jackpot is a piece at a time, but in relatively rapid succession. You want the dog to clearly see he got five (or whatever number of) rewards for the action. A bunch of treats in the hand might just seem like one big gulp of treats. Too slow, however, and the "wow factor" gets lost. Don't click again while giving the jackpot. That would send a different message to him.
The hardest part of shaping is getting started. The basic idea is that the dog should just throw behaviors at you/an object and you reward him when he performs the action you're looking for, or a step in that direction. However, in the beginning, you might need to get the dog into just doing stuff without being told. This was clearly the hardest part for Wally. He would visibly go into panic mode quickly if the +R wasn't forthcoming. (This was my first sign that -P would be really powerful on him.)
Trying to shape a new behavior with that mentality would be a failure, compounding the difficulty in getting him into it. So I started with already learned behaviors. Sitting was the first one. I would bring him into the room and then just stand there, treats and clicker at the ready. He looked at me like "so....what are we going to do here?", but I gave no feedback. I may as well have been a statue until he offered me a sit.
When he did sit (either as an offering or just because he was tired of standing and doing nothing), he got a c/t. The click actually scared him a little, but he took the treat just fine. I had him stand up, and I waited again. He sat again. Another c/t. I did this about five more times, moving around the room to get him to follow me and then offer a sit. He was starting to get the hang of it.
- (+R) - the clicks and rewards for offering the desired behavior (or any behavior in the beginning).
- (-P) - withholding reward in an effort to induce greater effort/new behaviors.
Later on in the shaping process, you want to induce a "frustration" or "WHY THE HECK IS THE CLICK NOT COMING!" response out of him. There are a couple of reasons for this:
- It shows he's "in the zone" and is expecting a reward for an offered behavior. He's playing the shaping game and is eager for another "score".
- The frustration often leads to a new behavior as the dog tries the earlier behavior with extra "oomph". BE READY TO REWARD IT, so the dog understands the extra effort can win the day. You're walking a fine line here between inducing effort and triggering extinction, where the behavior fades away because it no longer pays off!
An example of this:
Once Wally learned how to use his paws, he started to paw at me like a cat in order to get the c/t when shaping. After a while, if I held out on him and made him paw (more like scratch, but anyway) me more than a few times, he would paw me even harder! Sometimes he'd start whining a little, or pawing faster and more frantically. That's the reaction I'm talking about.
But then I'd hold out on that. What did he do this time? He put both his paws on me. So another c/t. I held out on all of that, and then he stood up on his hind legs and SLAPPED his paws on my leg. He got a click and a jackpot for that.
The jackpot made him stay there. When no more treats were forthcoming, he started barking. He NEVER barked while standing up like that before. He got another click and a jackpot for that effort!
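The "hold out for a bit more" pattern in the paw example can be sketched in a few lines of Python. This is only a rough illustration, not a training protocol: the `closeness` scores (0.0 to 1.0) are hypothetical numbers standing in for the trainer's own judgement of how near each offered behavior is to the goal, and the rule "only click what is at least as good as the best thing clicked so far" is a simplifying assumption.

```python
def shaping_session(offers):
    """Decide click (+R) or withhold (-P) for each offered behavior.

    offers: list of (behavior_name, closeness) pairs, where closeness is a
    hypothetical 0.0-1.0 score for how near the behavior is to the goal.
    """
    criterion = 0.0  # at the start, almost any offered behavior earns a click
    log = []
    for name, closeness in offers:
        if closeness >= criterion:
            log.append((name, "+R"))   # click and treat
            criterion = closeness      # from now on, hold out for at least this much
        else:
            log.append((name, "-P"))   # no click, no treat
    return log


# Hypothetical session loosely based on the paw example above.
offers = [
    ("looks at trainer", 0.1),
    ("one paw on leg", 0.3),
    ("looks at trainer", 0.1),
    ("both paws on leg", 0.6),
    ("one paw on leg", 0.3),
    ("stands up and slaps paws", 0.9),
]
for name, feedback in shaping_session(offers):
    print(f"{feedback}  {name}")
```

In a real session the bar would also drop back down when the dog stalls, which is the fine line between inducing effort and triggering extinction; the sketch leaves that out to stay short.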
Admittedly, this is a method I'm using less of. However, it's one I did use and had some success with. As such, I don't throw it away, in case it might come in handy sometime.
As mentioned earlier, Modeling is about physically manipulating the dog to "show him how" to do something. Then when he completes it, you deliver the c/t.
One of the hardest tricks for Wally to learn was to shake. Shake often means that the dog will extend his paw and hold it there so you can take it and "shake hands" with him. Two things Wally did not do - pick things up and use his paws. Modeling helped with the paw situation.
I had him sit in front of me. I had my clicker and treats ready. I then tickled the back of his paw. This ended up being the key. The other modeling approach was to take and hold his paw in my hand, lift it up, and say "shake" then c/t. That...did not work. At all.
Tickling the back of his paw - that got him to move his paw a little. I kept at it. Soon, he picked his paw up off the ground. That was it! C/T for him! Now he's like, okay...what just happened here. He puts the paw back down, and I do it again. Tickle, tickle, tickle! He's like grrr there it is again! And he lifted his paw up. Another c/t. I could see the wheels spinning, and what happened next is a lesson in always, ALWAYS considering your body movements, because your dog is always watching your body.
I moved my hand down - and he started lifting his paw before my hand got there. Jackpot time for Wally! I did it again, and sure enough, the paw came out. I quickly put my hand under the paw when he was about to move it down, and clicked and treated. I then added my cue "Paw" just before I moved my hand down. He lifted his paw up, and got a c/t.
Actually, there were a few things at work here other than just straight modeling. At first it was modeling, albeit in an odd way (the tickling). After that it became a little shaping (what to offer when the hand goes towards his paw), and at the same time, it gave a built-in hand signal that still works. If my hand is near his paw, he'll put his paw in my hand. There may have been a little -R going on here too with the tickling. It wasn't anything harsh, but it evidently made him uncomfortable, and lifting his paw removed the sensation. A behavior that gets stronger because it ends something unpleasant is, by definition, -R. The behavior also got him a reward (+R), so it prompted him to repeat it, even without the tickling.
- (-R) - The tickling was uncomfortable to him, and prompted him to lift his paw to make it stop.
- (+R) - The click occurring while the paw was lifting up told him that he "did the right thing" and got a reward. Now, paw lifting is a behavior he would repeat.
Sidenote: Operant Tactics Mixing
The four aspects of operant conditioning do not exist just to be used individually. They can be mixed together within an exercise or activity and are generally considered more effective when they are used this way.
There are a couple of often-used pairings.
- +R/-P : This pairing combines a charged word/sound and a reward with the withholding of that reward. This gives the dog a yes/no sort of component to learning, where he can see that some things gain him rewards and others cause him to lose out.
- -R/+P : This pairing is oriented around having the dog avoid unpleasantness in order to learn what to do. The reward in this case is more the ending of a disliked event, or acting to avoid the onset of a disliked event.
These are commonly seen and used and will probably be the ones you run into at the heart of a lot of advice and training discussions.
Many times trainers will be grouped based on which positive they use. "Positive Trainers" emphasize +R. "Correction Trainers" emphasize +P.
However, these labels create myths and as such, they are worthless.
There's really no such thing as 100% +R training. The instant a reward is withheld, that's -P. Period. There's no 100% +P method either. That would be applying an aversive to everything the dog did.
Rewarding everything a dog does wouldn't teach him anything. Punishing everything he does wouldn't either. As such, neither label truly describes either subset of trainer.
The attempt to classify trainers has led to another category called the "balanced" trainer. What this means is that the trainer will tend to favor +R/-P or -R/+P, but also mix in another approach that "conflicts" with the pairing. For example, a "Positive Trainer" that also uses aversives if the withholding of reward doesn't get the point across. That could be shown as +R/-P/+P. Likewise, a "Correction Trainer" that gives a reward once the dog is back into compliance could be said to be a -R/+P/+R trainer.
Of course, you can mix these any way possible and with any notation "rules" you want. I list them based on what the trainer favors first, then what other approaches are taken if needed. I would write myself as being +R/-P/-R.
The fact these can be mixed and matched to quite a few combinations makes all these labels largely meaningless to me. I'd rather ask (or observe, or read) how the trainer works with his/her dog(s) and see how things fit together as a whole and which approaches he/she uses. Once you know what the four quadrants are and mean, you can do your own analysis based on what the individual trainer does.
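For readers who like to see the bookkeeping spelled out, here is a minimal Python sketch of that kind of do-it-yourself analysis. It relies on the standard definitions (something is added or taken away, and the behavior becomes more or less likely). The function names and the sample observations are made up for illustration, and ranking the quadrants by how often they show up is just one assumed way of producing the "favored first" notation.

```python
from collections import Counter


def classify(stimulus_added, behavior_increases):
    """Name the quadrant: was something added or removed, and did the
    behavior become more or less likely as a result?"""
    if behavior_increases:
        return "+R" if stimulus_added else "-R"
    return "+P" if stimulus_added else "-P"


def trainer_notation(observations):
    """Join the quadrants seen, most frequently used first.

    observations: list of (stimulus_added, behavior_increases) pairs.
    Ties keep the order in which the quadrants were first observed.
    """
    counts = Counter(classify(added, increases) for added, increases in observations)
    ranked = sorted(counts, key=lambda quadrant: -counts[quadrant])
    return "/".join(ranked)


# Hypothetical notes from watching a few sessions.
observed = [
    (True, True), (True, True), (True, True),  # click and treat for a correct behavior
    (False, False), (False, False),            # reward withheld after a wrong attempt
    (False, True),                             # tickling stops when the paw lifts
]
print(trainer_notation(observed))  # +R/-P/-R
```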
That's All for the Tactics, I Guess
Seems like there's not much else to go with, unless someone can think of things I missed or need to re-visit.
In the meantime, the next Volume will cover a different topic. See you then.
Some Links That Interested Me On This Journey
- Shaping Explained - Part 1 of Training Your Dog to Turn on a Light Switch
Part one of a video series where a dog is being shaped to hit a light switch.
- Dog Star Daily - Lure/Reward Training
Dr. Dunbar’s guide for lure/reward dog training
- How Shaping Develops Learning | Karen Pryor Clickertraining
Free shaping sessions give us a window into the thought process of learners. We can see how learners make choices toward solutions, and we can see when they have run out of ideas.