Theory behind the environment

TL;DR (or, I had enough of math for a lifetime)

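The theory behind the environment is quite simple: placing a bet means choosing a possible outcome and a stake size. If that outcome occurs, the stake is multiplied by its winning odds; the profit is this payout minus everything that was staked, including the stakes on losing outcomes. For example, a stake of 10 at decimal odds of 2.5 pays out 25 on a win, for a profit of 15, and loses the 10 otherwise.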

The scenic route

After this (very) brief explanation, let’s get down to business:

Definitions

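Let \(g\) be a game with \(N\in\mathbb{N}\) distinct possible outcomes, indexed by \(I_N = \{1, \dots, N\}\), and let \(p_i\) be the probability of outcome \(g_i\), so that \(\sum\limits_{i \in I_N} p_i = 1\). Let \(G\) be the set of games, so that \(g_j\) is the \(j^{th}\) game. Throughout, the index \(i\) refers to an outcome and the index \(j\) to a game.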

Let \(o\) be the decimal odds for a game \(g\), so that \(o_i\) is the decimal odds for outcome \(g_i\). Let \(O\) be the set of odds, so that \(o_j\) is the odds for game \(g_j\) in \(G\).

\[o \triangleq \{o_i| i \in I_N , 1 \leq o_i \}\]

Let \(r\) be the result of game \(g\), so that \(r_i\) is 1 if the outcome was \(i\) and 0 if not (in other words, \(r\) is the indicator vector of the outcome of game \(g\)). Let \(R\) be the set of results, so that \(r_j\) is the result for game \(g_j\) in \(G\).

\[\begin{split}r_i \triangleq \begin{cases} 1 & \text{if outcome i happened,}\\ 0 & \text{otherwise.} \end{cases}\end{split}\]
\[r \triangleq \{r_i | i \in I_N\}\]

Let \(b\) be the bet for game \(g\), so that \(b_i\) is the amount of money to place on outcome \(g_i\). Let \(B\) be the set of bets, so that \(b_j\) is the bet for game \(g_j\) in \(G\), and let \(BANK \in \mathbb{R}^+\) be the limit of the bet, so that the sum of the money placed cannot exceed \(BANK\).

\[b \triangleq \left\{b_i | b_i \in \mathbb{R}, \sum\limits_{i \in I_N} b_i \leq BANK\right\}\]

Let \(M_O\) be a matrix of size \(|G| \times N\) holding the numerical values of the set \(O\), one row per game and one column per outcome. In the same manner, we define \(M_R\) for the numerical values of \(R\) and \(M_B\) for the numerical values of \(B\).

Let \(W\) be the winnings matrix, defined as the Hadamard (element-wise) product of the three matrices above:

\[W \triangleq M_B \circ M_R \circ M_O\]

So, the winnings on the set of games \(G\) are the grand sum (the sum of all elements) of \(W\).
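As an illustration of the definitions above, the following minimal NumPy sketch computes \(W\) and its grand sum for two games with three outcomes each. The numbers and the variable names (`M_O`, `M_R`, `M_B`, `winnings`, `profit`) are made up for the example and are not taken from the package.

```python
import numpy as np

# Hypothetical data: 2 games (rows), 3 outcomes each (columns).
M_O = np.array([[2.0, 3.5, 4.0],
                [1.5, 4.0, 6.0]])   # decimal odds
M_R = np.array([[1, 0, 0],
                [0, 0, 1]])          # results: one 1 per row, 0 elsewhere
M_B = np.array([[10.0, 0.0, 5.0],
                [0.0, 0.0, 2.0]])   # stakes placed on each outcome

# NumPy's * operator is element-wise, i.e. the Hadamard product.
W = M_B * M_R * M_O

winnings = W.sum()               # grand sum of W
profit = winnings - M_B.sum()    # winnings minus everything that was staked

print(winnings, profit)
```

Only the stakes on outcomes that actually happened survive the multiplication by `M_R`; here that is 10 at odds 2.0 and 2 at odds 6.0.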

The Environments

All the environments implemented are subclasses of an OpenAI Gym environment, with an observation space equal to the odds (\(O\)) and an action space equal to the bets (\(B\)). A step in an environment consists of getting the current subset of odds and placing a bet on them; the reward is the grand sum of \(W\) minus the total amount of the bet placed. An episode ends when there are no more games or when the \(BANK\) is depleted.
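The step logic described above might look roughly like the following dependency-free sketch. It is not the package's actual class: the name `BettingEnvSketch`, the one-game-per-step loop, the rule that the reward is added to the bank after each step, and returning `None` as the final observation are all assumptions made for illustration. It only mirrors the Gym-style `(observation, reward, done, info)` return value.

```python
import numpy as np

class BettingEnvSketch:
    """Hypothetical sketch of the step logic; not the package's real environment."""

    def __init__(self, odds, results, bank=10.0):
        self.odds = np.asarray(odds, dtype=float)        # M_O, shape (|G|, N)
        self.results = np.asarray(results, dtype=float)  # M_R, same shape
        self.bank = float(bank)                          # BANK
        self.current = 0                                 # index of the next game

    def step(self, bet):
        bet = np.asarray(bet, dtype=float)
        if bet.sum() > self.bank:
            raise ValueError("total stake exceeds BANK")
        # Row of W for the current game, then its sum.
        winnings = (bet * self.results[self.current] * self.odds[self.current]).sum()
        reward = winnings - bet.sum()
        self.bank += reward          # assumption: the bank tracks cumulative profit
        self.current += 1
        # Episode ends when games run out or the bank is depleted.
        done = self.current >= len(self.odds) or self.bank <= 0
        observation = None if done else self.odds[self.current]
        return observation, reward, done, {}
```

For instance, with a bank of 10, staking 5 on a winning outcome at odds 2.0 yields a reward of 5 and a bank of 15; staking all 15 on a losing outcome in the next game yields a reward of -15 and ends the episode.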