Why Detection Dogs Should Never Gamble | Tactical Police K9 Training

When it comes to training detection dogs for law enforcement, K9 trainers use a wide variety of methods, techniques, and theories.

Thanks in part to the variety of training and different beliefs, there is a saying you may have heard before – “the only thing that two dog trainers can agree on is what the third is doing wrong”. While this is true in the dog training world, it is crucial that trainers base their opinions on the science of how dogs actually learn. This is particularly important when training K9s for law enforcement work.

Unfortunately, a clear understanding of the science behind how dogs learn is not present in many of the techniques and theories used in the K9 world.

Variable or fixed schedules for rewards?

One of the most common debates that many K9 trainers and handlers cannot agree on is whether a detection K9 should be paid every time they indicate on target odor.

For example, many trainers teach that a dog should not be paid upon every indication, in the hope of applying a variable ratio schedule of reinforcement. A variable ratio schedule means that a behavior will be reinforced after an unpredictable number of responses. A good example of this is a slot machine. Some gamblers will sit for hours at a slot machine, hoping for a big win, because at random, unpredictable times they will have a small win. This causes the gambler to keep responding to the slot machine, pulling the lever in the belief that at some point there will be a payout as long as they continue to perform the behavior.

This mentality is applied to detection in the hope that, by not paying the dog at every alert, the dog’s drive to alert will increase because they don’t know when (or even if) they will be rewarded. However, here is the problem when that approach is applied to detection dogs – a variable ratio schedule is dependent on positive reinforcement to keep the behavior consistent. To understand how this is a problem, a handler or trainer needs to have a clear understanding of operant conditioning.

What is operant conditioning?

Operant conditioning is a framework for modifying behavior through its consequences: reinforcement (rewards) and punishment. Understanding and using both appropriately is how we are able to train dogs to perform behaviors that benefit us.

Whether it is training a puppy not to jump, or teaching a patrol dog to out, and regardless of the method applied, operant conditioning is how these behaviors will be accomplished. Operant conditioning can be split into two options. The first is reinforcement which will increase the likelihood a behavior will happen again, while the other is punishment that will decrease the likelihood a behavior will happen again.

Reinforcement and punishment are each split into positive and negative options. Positive means something is added, and negative means something is taken away. Combining these options gives trainers the four quadrants of operant conditioning used to alter a dog’s behavior.

Positive reinforcement is when a behavior is increased by giving the dog something pleasant.
Negative reinforcement is when a behavior will be increased by removing something unpleasant once the dog completes a command.
Positive punishment means adding something unpleasant to a dog to decrease the likelihood a behavior will happen again.
Negative punishment means removing something pleasant from a dog to decrease the chances of a behavior happening again.

Properly identifying which quadrant a training plan falls into will help identify whether a behavior is likely to increase or decrease over time.
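As a rough illustration of the four quadrants above, the classification can be written as a small lookup. This is only a sketch: the function name `quadrant` and its two yes/no parameters are hypothetical, chosen here to mirror the definitions in the list.

```python
def quadrant(something_added: bool, behavior_increases: bool) -> str:
    """Name the operant conditioning quadrant for a training consequence.

    something_added:    True if something is given to the dog,
                        False if something is taken away.
    behavior_increases: True if the behavior becomes more likely,
                        False if it becomes less likely.
    """
    sign = "positive" if something_added else "negative"
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {kind}"


# Reward toy delivered after an alert, alerting becomes more likely.
print(quadrant(something_added=True, behavior_increases=True))

# Dog pulled away from odor without payment, alerting becomes less likely.
print(quadrant(something_added=False, behavior_increases=False))
```

Asking the two questions separately (was something added or removed, and did the behavior go up or down) is usually enough to place a training plan in one quadrant.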

Reinforcement schedules

Positive reinforcement is how detection dogs are trained to find target odors. A well-trained dog will hunt and consistently alert to odor, because they have been trained that alerting to odor will bring something pleasant (like the reward toy). Regardless of the method of training, this is the understanding a detection dog will have through many repetitions utilizing positive reinforcement. So once the dog fully understands that locating odor will lead to the reinforcement, the trainer will start utilizing reinforcement schedules to progress the dog further.

Reinforcement schedules are how K9 handlers can predict how their positive reinforcement will affect the dog’s eagerness to respond to a conditioned behavior. Schedules can be fixed or variable, and they can be based either on the number of responses (ratio) or on elapsed time (interval). A fixed schedule means that the reinforcement will come after a set number of responses or a set length of time. A variable schedule means that a behavior will be reinforced after an unpredictable number of responses or an unpredictable amount of time.

There are four types of reinforcement schedules that are recognized.

A fixed ratio schedule of reinforcement means that reinforcement comes after a predictable number of responses. This typically produces a high response rate with a brief pause after each reward.
A variable ratio schedule of reinforcement means that reinforcement comes after an unpredictable number of responses (the slot machine).
A fixed interval schedule means reinforcement will come after a fixed amount of time; this will cause an increase in productivity right before reinforcement should come.
Finally, a variable interval schedule of reinforcement means the reinforcement will come at varied times, resulting in a very steady rate of activity.

The reinforcement schedule best suited in a particular circumstance will vary, depending on what a trainer is trying to accomplish.
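To make the difference between the four schedules concrete, here is a minimal simulation sketch. It assumes a simplified model: ratio schedules count responses, and interval schedules reward the first response made after the waiting time has passed. The function names, the uniform random ranges, and the example numbers are all illustrative assumptions, not measurements from real training.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reward every `ratio`-th response."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, mean_ratio, rng):
    """Reward after an unpredictable number of responses (the slot machine).

    The required count is drawn from 1..(2 * mean_ratio - 1), an assumed
    distribution whose average is `mean_ratio`.
    """
    rewards = []
    needed = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count >= needed:
            rewards.append(True)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewards.append(False)
    return rewards


def fixed_interval(response_times, interval):
    """Reward the first response made once `interval` has elapsed."""
    rewards = []
    available_at = interval
    for t in response_times:
        if t >= available_at:
            rewards.append(True)
            available_at = t + interval
        else:
            rewards.append(False)
    return rewards


def variable_interval(response_times, mean_interval, rng):
    """Reward the first response made after an unpredictable wait."""
    rewards = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t >= available_at:
            rewards.append(True)
            available_at = t + rng.uniform(0, 2 * mean_interval)
        else:
            rewards.append(False)
    return rewards


rng = random.Random(0)  # seeded so a run can be repeated
print(fixed_ratio(6, 3))
print(variable_ratio(10, 3, rng))
print(fixed_interval([1, 5, 6, 12, 20], 5))
print(variable_interval([1, 5, 6, 12, 20], 5, rng))
```

Note that on both ratio schedules with a ratio above 1, some responses go unrewarded by design, while the interval schedules only control how long the wait is before the next reward becomes available.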

Why is the ‘slot machine’ theory flawed?

The slot machine theory is flawed at this stage of training (or once training is completed). To truly be a variable ratio schedule of reinforcement, the reinforcement must arrive after an unpredictable number of indications, which means some indications go unpaid. The theory is that the dog doesn’t know when they will be paid for an indication on odor, so their productivity while working will increase.

However, just like the gambler sitting at the slot machine, there is a problem here. If the dog is paid as soon as it indicates, it is receiving something pleasant that increases the likelihood the behavior will happen again, which is positive reinforcement. If the dog is not paid after indicating on an odor, what quadrant of operant conditioning does that utilize? After the dog indicates, the handler will need to either pull the dog away from odor, or remove the odor from the dog to continue the search for more hides.

From the dog’s perspective, they have lost something pleasant by being removed from what they were searching for. Removing something pleasant falls outside the realm of positive reinforcement and into negative punishment. In other words, by not paying a dog and removing the opportunity for payment, we are decreasing the likelihood the behavior will happen again rather than increasing it, because punishment decreases behavior. So rather than setting up a situation where the dog cannot predict when it will be paid, as a variable ratio schedule intends, the dog is instead punished for the exact behavior that we have spent months creating. A dog that gets paid every time they indicate can still experience the different reinforcement schedules, but the handler will not create conflict or confusion in the dog by moving from reinforcement to punishment.

A better option for keeping dogs consistent is a variable interval schedule of reinforcement. This means that the dog knows it will be paid, but the time at which it will be paid is not predictable. A trainer can use this schedule without moving away from reinforcement by varying the amount of time the dog needs to search before finding odor, and by varying hides and blank searches. The result is a steady, continuous desire to perform the behavior: in this case, to search for, and indicate on, an odor.
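The idea of paying every indication while varying how long the dog must search can be sketched as a simple session planner. This is a hypothetical illustration only: the function `plan_session`, the 25% blank rate, and the 60-second average search time are assumptions made up for the example, not recommended training values.

```python
import random


def plan_session(n_searches, mean_seconds_to_hide, blank_rate, rng):
    """Build a list of searches with varied search times and some blanks.

    Every search that contains a hide is paid on indication; only the
    time the dog must work before reaching odor is unpredictable.
    """
    plan = []
    for _ in range(n_searches):
        if rng.random() < blank_rate:
            # Blank search: no odor present, so no indication is expected.
            plan.append({"hide": False, "seconds_to_hide": None, "pay": False})
        else:
            # Assumed spread: between half and one and a half times the mean.
            seconds = round(rng.uniform(0.5 * mean_seconds_to_hide,
                                        1.5 * mean_seconds_to_hide))
            plan.append({"hide": True, "seconds_to_hide": seconds, "pay": True})
    return plan


rng = random.Random(0)  # seeded so the same plan can be regenerated
for search in plan_session(5, 60, 0.25, rng):
    print(search)
```

In this sketch there is no entry where a hide is present and payment is withheld, which is the point of the approach: the unpredictability sits in the search time, not in whether a correct indication gets paid.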

Rather than attempting to create a gambling dog, we can create a steady, predictable teammate who understands the job and what the payment will be. Utilizing this schedule of reinforcement avoids the jump from reinforcement to punishment, so the dog keeps a clear understanding of which behavior the handler wants and how to reach its reinforcement.

In conclusion

For a handler to know the best course of action when working a detection dog, it is incredibly important to know the science behind what the dog is actually learning versus what a trainer is trying to teach.

Dogs do not understand the concept of sometimes getting what they want. They only understand punishment and reinforcement. If a trainer does not know how to identify the two, it is easy to blur the lines. Rather than creating a clear understanding in the dog of how to reach the reinforcement they desire, the trainer can create conflict in the dog’s mind.

If a training plan is not creating a very clear path for the dog to reach what it wants, and instead is creating confusion, the only gambler is the trainer. And the trainer is gambling on what a dog is actually learning. A detection dog should never gamble, and neither should their handler.