How I calculated Molly's video poker loss rebates

Bob Dancer recently wrote two articles about a loss rebate promotion at Molly's, a chain of Las Vegas bars. I helped with the calculations. You can read the articles themselves here (highly recommended if you are interested in how a gambling guru thinks about an advantageous situation):

Basically, with any loss rebate program, there are two main stopping points: one is how much you should lose before you stop gambling and ask the casino for your rebate. The other is how much you should win before you stop gambling and hit the road with your winnings (and forget the rebate). These two numbers depend on the exact game you are playing and the exact formula the casino uses to pay your rebate.

Sometimes, a loss rebate promotion is so good that the upper stopping point becomes +∞. That means you are supposed to play as much as you can until you either hit the lower stopping point and get the rebate, or you get tired and leave the casino (or they tell you they don't want your action anymore). The promotion at Molly's is not such a promotion.

James Grosjean, in his tome Beyond Counting: Exhibit CAA, details the exact algorithm for finding these numbers for the game of roulette. For those who have access to the book, it starts on page 124. It goes into a lot of detail, way beyond what I can even try to summarize here, so I'll focus on explaining just the basics of the algorithm, adapted for video poker.

First, you need to define a rebate function as a function of wealth, which means how much ahead or behind you are in the loss rebate session. If you have lost $10, your wealth would be -10. If you are ahead $100, your wealth would be +100.

In Molly's promotion, if you lose $20 or more, they give you $20 in free play. You can play that free play on 7/5 Bonus Poker, which has the highest return in the bar. If you play 7/5 Bonus Poker perfectly, its return would be 98.0147%. So the loss rebate function would be something like this:

def rebate_function(wealth):
    if wealth <= -20:
        return wealth + 20 * 0.980147
    return wealth

Then you set upper and lower bounds that you estimate the stopping points fall between. For example, you may assume that the lower stopping point is no lower than -20, since there's no extra benefit to losing more than $20. For the upper bound, we can use two royal flushes, around +16,000. There's a very good chance the upper stopping point is less than $16,000, so we can start with that. (It's OK if those estimates are wrong. If the algorithm finds that you should continue even after you are ahead $16,000, you can increase the upper bound and run the algorithm again.)

Your goal is to figure out the value function, which is the value of the loss rebate promotion based on your wealth (how much you are ahead or behind). We are trying to calculate the value function for every possible amount of wealth in the range bound by the boundaries we estimated in the previous paragraph, [-20, 16000] in the case of Molly's promotion. We don't need to calculate it for every single number in that range, since some of those numbers can't happen while playing. For example, if you play the $2 denomination at $10 a hand, you will never be $7 ahead or $5 behind. You'd only need to calculate it for multiples of 10, namely -20, -10, 0, 10, 20, 30, ..., 16000. Depending on your programming language, the value function can be held as a dictionary or a hashmap, or even a simple array or list.

You initialize the value function with the loss rebate function. So in the first iteration, value[-20] = -20 + 20 × 0.980147, value[-10] = -10, value[0] = 0, value[10] = 10, etc.
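In Python (the language I used), the grid and the initialization can be sketched like this. The bounds, bet size, and rebate formula are the ones from above; the constant names are my own:

```python
# Wealth grid in multiples of the $10 bet, spanning the estimated
# bounds, initialized with the rebate function from earlier.
BET = 10
LOWER, UPPER = -20, 16000

def rebate_function(wealth):
    if wealth <= -20:
        return wealth + 20 * 0.980147  # $20 free play at 98.0147% return
    return wealth

value = {w: rebate_function(w) for w in range(LOWER, UPPER + BET, BET)}
```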

The rest of the algorithm is basically a loop. In every iteration, you try to make the value function more accurate. You build a new value function, determining its values from the previous one. The new value function at each wealth point is the larger of two quantities: the previous value function at that point (representing what happens if you stop betting), and a weighted average of the previous value function over the results of one video poker hand (representing what happens if you make a bet).

Assuming you are playing 7/5 Bonus Poker perfectly at $10 a hand, there's a 54.49% chance of hitting nothing and losing $10, a 21.53% chance of hitting a pair of Jacks or better for a push, a 12.93% chance of hitting two pair and winning $10, a 7.44% chance of hitting three-of-a-kind and winning $20, etc., up until a 0.0025% chance of hitting a royal flush and winning $7990. (The exact probabilities can be determined using Wizard of Odds Video Poker Strategy Calculator.) In this case, the weighted average I talked about in the previous paragraph for a specific amount of wealth would be 0.5449... × value[wealth - 10] + 0.2153... × value[wealth] + 0.1293... × value[wealth + 10] + 0.0744... × value[wealth + 20] + ... + 0.000025... × value[wealth + 7990]. (In the specific case of Molly's, there was also an extra ~0.1% earned through points. You should add that to the weighted average too.)
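Putting the pieces together, here is a minimal self-contained sketch of the loop. The outcome distribution is a toy stand-in for the real 7/5 Bonus Poker pay table: the first five entries are the probabilities quoted above, and the leftover probability mass is lumped into a single +$30 bucket so everything sums to 1 (the ~0.1% from points is omitted). The variable names and boundary handling are mine:

```python
# Value iteration for the loss rebate, with a TOY outcome distribution.
BET = 10
LOWER, UPPER = -20, 16000

def rebate_function(wealth):
    if wealth <= -20:
        return wealth + 20 * 0.980147  # $20 free play at 98.0147% return
    return wealth

outcomes = [            # (probability, net result of one $10 hand)
    (0.5449, -10),      # nothing
    (0.2153, 0),        # jacks or better (push)
    (0.1293, +10),      # two pair
    (0.0744, +20),      # three of a kind
    (0.000025, +7990),  # royal flush
]
outcomes.append((1.0 - sum(p for p, _ in outcomes), +30))  # everything else (toy)

grid = range(LOWER, UPPER + BET, BET)
value = {w: rebate_function(w) for w in grid}  # initialization

def clamp(w):
    # results past the boundaries are pinned to the boundary values
    return max(LOWER, min(UPPER, w))

for _ in range(2000):
    new_value = {}
    for w in grid:
        if w <= LOWER:
            new_value[w] = value[w]  # at -20 you stop and take the rebate
            continue
        # expected value of playing one more hand, under the current estimate
        ev_bet = sum(p * value[clamp(w + net)] for p, net in outcomes)
        new_value[w] = max(value[w], ev_bet)  # stop vs. keep betting
    converged = max(abs(new_value[w] - value[w]) for w in grid) < 1e-6
    value = new_value
    if converged:
        break
```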

You continue iterating until the value function doesn't change by more than a tiny amount from one iteration to the next, which tells you it has converged very close to its actual mathematical values. Then you stop. (There's a chance the values never converge. That usually means the upper stopping point is +∞.)

At the end of the algorithm, once you are confident you have done enough iterations to converge to a good enough estimate of the value function, you can determine the stopping points and the expected value of the promotion. In the specific case of Molly's, the lower stopping point is clearly at -$20. To find the upper stopping point, you need to find the first number where the value function becomes the identity function. For example, you may find that value[150] = 150.19 but value[160] = 160. This means that if you are $150 ahead, there is still 19 cents of expected value in continuing to play, but if you are ahead $160, there is no more positive expectation. So your stopping point is when you are $160 ahead. The expected value of the whole promotion can be found at value[0]: how much the promotion is worth when you are $0 ahead, which is the same as when you start playing.
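Reading off the results can be sketched like this, using a hypothetical converged value function. The 150.19 and 160 are the illustrative numbers from the paragraph above; every other number here is made up:

```python
# Hypothetical converged value function near the upper stopping point.
value = {-20: -0.40, 0: 1.37, 140: 140.42, 150: 150.19, 160: 160.0, 170: 170.0}

# Upper stopping point: the first positive wealth where the value
# function becomes the identity (no expected value left in playing).
upper_stop = min(w for w in sorted(value) if w > 0 and abs(value[w] - w) < 1e-6)

# The promotion's overall expected value is the value at zero wealth.
promo_ev = value[0]
```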

I also calculated the stopping points for when double-ups are allowed. These needed an extra layer of complexity, but the basic idea is that you assume you are playing a different game. For example, if you only allow yourself to double-up certain wins once, you can assume that those wins happen only half of the time, but pay you double what they paid before. It's the same algorithm, with just the probabilities and the win amounts changed. It works the same way when you allow yourself to double-up twice (quarter the chance of hitting the specific pays, quadruple the actual payout). You can basically play with the various double-up strategies until you find which one has the highest expected value. That's how I determined that doubling-up until hitting a W2G has a very high return if you are OK with receiving W2Gs. But it's not possible to calculate the absolute best strategy for this promotion unless you know the exact limit for doubling up on the machines.
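The pay table transformation can be sketched like this, working in total returns on a $10 bet (0 meaning the bet is lost). The pay table and the apply_double_up helper are illustrative stand-ins, not the real 7/5 Bonus Poker numbers:

```python
# Transform a pay table for the "double certain wins n times" game:
# a doubled win keeps 1/2**n of its probability at 2**n times the
# return, and the rest of its probability mass becomes a busted
# double-up that returns nothing. Toy numbers, not real Bonus Poker.
def apply_double_up(outcomes, doubled_returns, times=1):
    transformed, busted = [], 0.0
    for p, ret in outcomes:
        if ret in doubled_returns:
            transformed.append((p / 2**times, ret * 2**times))
            busted += p * (1 - 1 / 2**times)
        else:
            transformed.append((p, ret))
    transformed.append((busted, 0))  # lost double-ups return nothing
    return transformed

# (probability, total return on a $10 bet) -- illustrative only
outcomes = [(0.5449, 0), (0.2153, 10), (0.1293, 20), (0.1105, 30)]
doubled = apply_double_up(outcomes, doubled_returns={20}, times=1)
```

Note that this transformation leaves the raw expected value of a hand unchanged, since the double-up itself is a fair even-money gamble; its benefit only shows up through the stopping-point analysis, by making it easier to reach the rebate threshold or the upper stopping point in fewer hands.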

I wrote the code in Python, but it may be possible to do it in a spreadsheet like Excel. I'm not enough of an Excel expert to know how hard that would be.

Also note that the gambler's exact tax situation affects their strategy with any choice of game. Grosjean briefly mentions this in his book. Taxes behave like loss rebates, although in the opposite direction. It's as if the IRS has a loss rebate on you: the United States Treasury shares a percentage of your win, but it usually doesn't participate in your loss. For a detailed example of how taxes affect video poker wins, see the first post in this blog, titled "Taxes Changed Everything".