Jan Vrbik

We present a technique for computing the probability that a specific pattern of successes and failures is generated randomly before another such pattern, thus winning the corresponding game. The program we build for this purpose finds the mean and standard deviation of the number of trials needed to complete one round of such a game. It can be used to maximize the probability of winning a game by choosing the best possible pattern, and also by adjusting the probability of a success. Finally, we verify our theoretical results by a Monte Carlo simulation.

Generating a Single Pattern

We consider an experiment (such as flipping a coin or rolling a die) in which each trial results in either a success (obtaining a head, a six, etc.) or a failure , with probability and , respectively. The trials are independently repeated until a specific pattern (such as ) is generated for the first time.

Starting from Scratch

Following [1], we introduce to be the probability that the pattern has been completed for the first time in the trial. Similarly, is the probability of completing the pattern in the trial, but this may now be its occurrence, where is any positive integer (we also set and ). Note that the probabilities must add up to , whereas the probabilities have an infinite sum, since . In the definition of , it is important to stipulate that consecutive occurrences of the pattern are not allowed to overlap—once the pattern is generated, none of its parts can be used to help build its next occurrence. Thus, for example, the sequence contains only one completion of , not three.

To find the probabilities , we assume that trials of the experiment have just been completed. The probability of the last five of these trials resulting in (it is easier to explain the procedure using an example) can be expanded using the formula of total probability as follows:


Here, the left-hand side is the simple probability of generating ; the right-hand side partitions the sample space according to where (during the last five trials) the corresponding pattern has been actually completed (in terms of our example, this could have happened at the last trial, but also either two or four trials earlier).

To make sure not to miss any such possibility, one should slide the pattern past itself, obtaining a term for each perfect match, thus:

Multiplying (1) by and summing over , from to (the equation is incorrect when ), one obtains

where is the generating function of the sequence. We had to subtract from to account for the missing term ().

From [1], we know that the probability generating function of the sequence is given by



The numerator of is obtained from the right-hand side of (1) after replacing by , while the denominator equals the left-hand side of (1) multiplied by (where is the total length of the pattern).

The computation of is done by the following program.

The first line converts the argument from a string, say , to a binary list , sets n to be the corresponding length, and initializes h to 0. The second line slides the list past itself by symbols (where i ranges from 1 to n) and, whenever a full match results, adds the product of the corresponding powers of p and q to h. The last line converts the resulting h to .

The program uses conv to convert a string to a binary list and poly to convert a binary list to the probability of generating that string, further multiplied by .

Here is an example. The result of returns the result quoted in [2].

Based on the probability generating function , we can easily find the corresponding mean of the number of trials required to generate the pattern for the first time (see [3]) by

and the variance by

In the case of , one gets and for , and and for .

Head-Start Distributions

In the next section, we will also need the conditional version of the distribution of the number of trials to obtain a specific pattern of length , given that the first of its symbols have already been generated (where ). Let be the corresponding conditional probability, namely that exactly extra trials will be needed to complete the pattern (see [4]). Based on what happens in the next trial (after the first symbols are already there), and using the formula of total probability, one can derive the following set of equations for (again, using the pattern as our example):


The only nontrivial task to set these up is to establish the subscripts of both on the right-hand side of each equation. This is done as follows: to the existing symbols we append first an extra (to find the first subscript) and then an extra (to find the second one). Then, for each of the two strings thus created, we must figure out how many of its symbols (at most) can be used to build the whole pattern. Clearly, when appending the “correct” symbol, the answer is , for the “incorrect” one, it can be anywhere from to . The following table may help:

Note that is the old and that (the full pattern has already been completed—no extra trials are needed).

Multiplying each equation in (3) by and summing over from to , we obtain the following set of linear equations for the corresponding probability generating functions :


since (as and for ). After solving these, one can verify (as a way of checking) that agrees with the old .

Again, the whole task can be delegated to the following program.

The first line is self-explanatory (compare it with the first line of PGF). The second and third lines build the corresponding set of n equations for the unknown —now denoted —and the last line solves them.

The critical ingredient is the following program for establishing how deeply one string can penetrate another, while fully matching the overlapping symbols.

The first line initializes n to its largest potential value (the length of the shorter string). The second line matches the last n elements of the first string to the first n elements of the second string. If a perfect match is found, n is returned; otherwise, n is reduced by 1 and the process is repeated until a perfect match is achieved (when no perfect match ever results, n returns the value of ). Thus, for example, the following command returns the value of .

This illustrates the use of Gset.

It tells us that, with the head start, generating the pattern takes 32 extra flips, on average.

This is the number of flips needed on average for .

Competing Patterns

Suppose that each of two players selects a specific pattern and bets that, in a series of independent Bernoulli trials, the pattern appears sooner than the pattern chosen by his opponent, as investigated in [5]. We want to compute the probability of winning this game for each of the two players, the game’s expected duration in terms of the number of trials, and the corresponding standard deviation.

To achieve this, we assume that trials of the experiment have just been completed, and define the following:

  • to be the probability that the first of the two patterns has been completed at the trial for the first time without being preceded by the second pattern (and thus winning the game); we take to be equal to 0
  • , similarly, to be the probability of the second pattern winning the game at the trial

We also need the , the probability of the first pattern being completed for the first time at the trial—ignoring the second pattern (this is the old ) and (vice versa), and the following modification of these.

Let be the conditional probability that the first pattern requires an extra trials to be generated for the first time, given that the second pattern has just occurred (this time, the first pattern is allowed to “get help” from any portion of the second pattern). It is obvious that these probabilities equal one of the of the previous section (constructed for the first pattern), where is determined by sliding the second pattern past the first one until the longest perfect match is found. Thus, for example, when the first pattern is and the second one is , is equal to 3; we can get the answer by typing .

Similarly (in the same vice versa manner) we define .

Probability of Winning

Partitioning the sample space (of trials) according to the trial at which the second pattern first occurred (including the possibility that it has not occurred yet—the last event of this partition), the formula of total probability yields

The left-hand side is the simple probability of the first pattern having been completed for the first time at trial . On the right-hand side, this probability is broken down according to the first occurrence of the second pattern at trial , , , , . When the second pattern is completed at trial , it wins, explaining the factor. This factor is multiplied by the conditional probability of completing the first pattern for the first time after the occurrence of the second pattern, in exactly trials, that is, . The last term accounts for the remaining possibility of the second pattern not having occurred yet and thus letting the first pattern win, which has probability .

Multiplying the previous equation by and summing over from to results in

To understand why, recall that

The vice versa argument similarly yields

These two linear equations for and can be easily solved.

The probability that the first pattern wins the game is clearly given by . Unfortunately, substituting into the right-hand sides of


results in , so we use l’Hôpital’s rule to find the answer,

where denotes the mean corresponding to , that is, .


One can verify that

When the two patterns are “incompatible” (no matching overlaps, for example, in a run of successes played against a run of failures), the two formulas reduce to

since all capped quantities are now equal to their uncapped counterparts (getting “help” from the other pattern is the same as starting from scratch).

Example: playing the pattern against .

We first get, by the technique of the previous section,

The probability of winning over is thus

When , this yields a rather surprising value of 62.5%.

More easily, the same result can be found with the help of the following program.

Selecting the Optimal Pattern

The program game can also help us find the pattern that maximizes our chances of beating an opponent, assuming that we know the pattern that the opponent chose, and given that both patterns must be of the same length. The easiest way of doing this is to go over all possibilities and see which one results in the highest probability of winning.

In this manner, we find that when , the best chance of beating is to select (that way, our probability of winning is 65.38%), and similarly to beat we should select (yielding a chance of winning). It appears that we should always drop the last symbol of the opponent’s choice, and attach the opposite symbol in front. But this is only a conjecture which needs to be investigated more systematically (and is certainly not true for other values of ).

Similarly, when adding the condition that the two patterns must contain the same number of symbols of each type (for example, and ), we may like to know which value of gives the first pattern the highest chance of beating the second one (and vice versa). This can be done by the following two commands.

Have fun exploring some of the other possibilities!

Game’s Duration


is the probability generating function of the number of trials required to finish the game.

To find the corresponding expected value , we need to differentiate


twice, substituting at the end. This yields the following result.

This implies that (divide both the numerator and denominator by ):


For the two patterns of the previous example, this equals

which evaluates to 22.75 when .

The same result can be obtained by modifying the last line of game.

In the “incompatible” case (no possible overlap between the two patterns), (7) simplifies to

which is half of the harmonic mean of and .

Similarly, by differentiating (6) three times, we can find ; this can be easily converted to the following variance of the number of trials:


where and are the respective variances of the and distributions .

Using the previous example with , this variance is 336.1875 as , , , , , ; the corresponding standard deviation is thus 18.34.

Again, the same computation can be achieved much more easily by modifying the last line of game. This then returns both the expected value and standard deviation of the game’s duration.

In the “incompatible” case, (8) reduces to a much simpler form,

Monte Carlo Simulation

The correctness of all these formulas can be easily confirmed by the following program, whose four arguments are the two strings, the value of , and the number of rounds of this game to be randomly generated.

The program returns the resulting proportion of wins for the first string, the average length of the game, and the corresponding standard deviation.

This example is in good agreement with our theoretical results.


[1] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed., New York: John Wiley & Sons, 1968.
[2] G. Blom and D. Thorburn, “How Many Random Digits Are Required Until Given Sequences Are Obtained?,” Journal of Applied Probability, 19(3), 1982 pp. 518-531. www.jstor.org/pss/3213511.
[3] S. R. Li, “A Martingale Approach to the Study of Occurrence of Sequence Patterns in Repeated Experiments,” Annals of Probability, 8(6), 1980 pp. 1171-1176.
[4] L. J. Guibas and A. M. Odlyzko, “String Overlaps, Pattern Matching, and Nontransitive Games,” Journal of Combinatorial Theory A, 30(2), 1981 pp. 183-208. www.sciencedirect.com/science/article/pii/0097316581900054.
[5] E. Pegg Jr. “Coin Flips” from the Wolfram Demonstrations Project—A Wolfram Web Resource. www.demonstrations.wolfram.com/CoinFlips.
J. Vrbik, “Betting Two Patterns against Each Other,” The Mathematica Journal, 2011. dx.doi.org/doi:10.3888/tmj.13-15.

About the Author

Jan Vrbik
Department of Mathematics, Brock University
500 Glenridge Ave., St. Catharines,
Ontario, Canada, L2S 3A1