Theory behind the environment
TL;DR (or, I had enough of math for a lifetime)
The theory behind the environment is quite simple: placing a bet means choosing a possible outcome and a stake size. If the outcome occurs, the stake is multiplied by the winning odds; the profit is that payout minus the initial bet, and a losing bet costs the whole stake.
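As a minimal sketch of that arithmetic for a single bet (the function name `net_profit` and the numbers are illustrative, not part of the library):

```python
def net_profit(stake: float, decimal_odds: float, won: bool) -> float:
    """Net profit of a single bet: the payout (if any) minus the stake."""
    payout = stake * decimal_odds if won else 0.0
    return payout - stake


print(net_profit(10.0, 2.5, won=True))   # 15.0
print(net_profit(10.0, 2.5, won=False))  # -10.0
```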
The scenic route
After this (very) brief explanation, let’s get down to business:
Definitions
Let \(g\) be a game with \(N\in\mathbb{N}\) distinct possible outcomes, and let \(p_i\) be the probability of outcome \(g_i\), so that \(\sum\limits_{i=1}^{N} p_i = 1\). Let \(G\) be the set of games, so that \(g_j\) is the \(j^{th}\) game.
Let \(o\) be the decimal odds for a game \(g\), so that \(o_i\) is the decimal odds for outcome \(g_i\). Let \(O\) be the set of odds, so that \(o_j\) is the odds for game \(g_j\) in \(G\).
Let \(r\) be the result of game \(g\), so that \(r_i\) is 1 if the outcome was \(i\) and 0 if not (in other words, \(r\) is the indicator vector of the outcome of game \(g\)). Let \(R\) be the set of results, so that \(r_j\) is the result for game \(g_j\) in \(G\).
Let \(b\) be the bet for game \(g\), so that \(b_i\) is the amount of money to place on outcome \(g_i\). Let \(B\) be the set of bets, so that \(b_j\) are the bets for game \(g_j\) in \(G\), and let \(BANK \in \mathbb{R}^+\) be the limit of the bet, so that the sum of the money placed cannot exceed \(BANK\).
Let \(M_O\) be a matrix of size \(|G| \times N\) holding the numerical values of the set \(O\). In a similar manner we’ll define \(M_R\) for the numerical values of \(R\) and \(M_B\) for the numerical values of \(B\).
Let \(W\) be the winnings matrix, defined as the Hadamard (element-wise) product of the three matrices defined above:
\[W = M_O \circ M_R \circ M_B\]
So, the winnings on the set of games \(G\) are the grand sum (the sum of all elements) of \(W\).
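To make the definitions concrete, here is a small NumPy sketch with two hypothetical three-outcome games. All numbers are made up for illustration, and the variable names simply mirror the symbols above:

```python
import numpy as np

# Two hypothetical games with three outcomes each (illustrative values only).
M_O = np.array([[2.0, 3.5, 4.0],
                [1.5, 4.0, 6.0]])   # decimal odds, one row per game
M_R = np.array([[1, 0, 0],
                [0, 0, 1]])         # results: a single 1 per row marks the outcome
M_B = np.array([[10.0, 0.0, 5.0],
                [20.0, 0.0, 0.0]])  # money placed on each outcome

W = M_O * M_R * M_B          # `*` on arrays is the element-wise (Hadamard) product
winnings = W.sum()           # grand sum of W
total_staked = M_B.sum()     # total amount of money placed

print(W)             # only the entry for the winning, backed outcome is non-zero
print(winnings)      # 20.0
print(total_staked)  # 35.0
```

Only the first game's bet on its first outcome pays out (2.0 × 10), so the grand sum is 20 while 35 was staked in total.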
The Environments
All the implemented environments are subclasses of an OpenAI Gym environment, with an observation space equal to the odds (\(O\)) and an action space equal to the bets (\(B\)). A step in an environment consists of getting the current subset of odds and placing a bet on them; the reward is the grand sum of \(W\) minus the total amount of the bet placed. An episode ends when there are no more games or when the \(BANK\) is depleted.
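The step logic described above can be sketched roughly as follows. This is not the library's implementation: `ToyBettingEnv` is a hypothetical stand-in that does not subclass `gym.Env`, it handles one game per step, and it assumes that the reward is added to the bank after every step and that invalid bets are rejected with an error. It only mirrors the classic `reset`/`step` shape with a four-value return.

```python
import numpy as np


class ToyBettingEnv:
    """Hypothetical, simplified stand-in for the environments described above."""

    def __init__(self, odds, results, bank=10.0):
        self.odds = np.asarray(odds, dtype=float)        # M_O, one row per game
        self.results = np.asarray(results, dtype=float)  # M_R, indicator rows
        self.starting_bank = float(bank)
        self.reset()

    def reset(self):
        self.bank = self.starting_bank
        self.current = 0
        return self.odds[self.current]  # observation: odds of the first game

    def step(self, bet):
        bet = np.asarray(bet, dtype=float)
        # Assumption: a bet that is negative or exceeds the bank is rejected.
        if (bet < 0).any() or bet.sum() > self.bank:
            raise ValueError("bet must be non-negative and within the bank")
        # Grand sum of the Hadamard product for the current game ...
        payout = (self.odds[self.current] * self.results[self.current] * bet).sum()
        # ... minus the total amount placed.
        reward = float(payout - bet.sum())
        self.bank += reward  # assumption: the bank tracks cumulative profit
        self.current += 1
        done = bool(self.current >= len(self.odds) or self.bank <= 0)
        obs = None if done else self.odds[self.current]
        return obs, reward, done, {"bank": self.bank}


env = ToyBettingEnv(odds=[[2.0, 3.5, 4.0], [1.5, 4.0, 6.0]],
                    results=[[1, 0, 0], [0, 0, 1]],
                    bank=10.0)
obs = env.reset()
obs, reward, done, info = env.step([5.0, 0.0, 0.0])       # winning bet: 2.0 * 5 - 5
obs2, reward2, done2, info2 = env.step([15.0, 0.0, 0.0])  # losing bet of the whole bank
```

In this run the first step returns a reward of 5 and the episode continues; the second step loses the entire bank, so the episode ends both because the bank is depleted and because no games remain.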