Greg Markowsky

Two problems involving random walks on graphs are studied. First, the starting point from which a random walk on ℤ^2 is most likely to hit a given point before another given point is determined. Second, the slowest mixing initial distribution under a random walk on a given finite graph is found.

Introduction

This article describes my investigation into several basic problems regarding random walks on graphs. On several occasions, I asked myself questions which my intuition failed to answer. I guessed at an answer, and spent some time in a fruitless attempt at proving that it was correct. Out of frustration I turned to computer simulations, only to discover that my guesses were faulty. Once I had the correct answer, I was able to supply the proofs. As every mathematician knows, it is much easier to solve a problem when you know the right answer ahead of time. This presentation is deliberately informal, as it represents the record of an actual investigation that took place, rather than a crafted paper. In fact, the notebook that I used to run my experiments has become the paper, with explanatory text added and unnecessary debris removed.

Probability That a Random Walk Hits One Point before Another

A graph G is a set of vertices V and a set of pairs of vertices E, known as edges. We will assume throughout that G is an undirected, simple graph. That is, no edges are of the form (v, v), the edge (v, w) is the same as the edge (w, v), and the set of edges contains no repeated elements. The degree of a vertex v of a graph is the number of edges containing v. A random walk on G is the random movement of a particle from vertex to vertex, where at each step a particle at any vertex moves to an adjacent vertex, with the probability of moving to each adjacent vertex given by the reciprocal of the degree of the vertex. This movement is memoryless, in the sense that each step is independent of earlier steps.

One of the most interesting and fundamental questions in this field is the question of recurrence and transience in the integer lattice ℤ^d of dimension d. The basic and well-known result is that a random walk on ℤ^d is recurrent (returns infinitely often to any point) if and only if d ≤ 2. See [1] for a beautiful proof of this involving resistance in electric circuits, as well as a more elementary proof. I have recently asked myself a related question. Given two fixed points a, b ∈ ℤ^2 and a variable point x, let us find p(x), the probability that a random walk starting at x hits a before hitting b. In one dimension this question is quite easy, but in two dimensions, as far as I have been able to ascertain, no closed form for the solution is known. To circumvent the difficulty of this question, however, we can ask a simpler one. Namely, given a and b, let us determine the point x that maximizes p(x). I spent some time working in vain on this problem, before I realized that it would help to know what the answer is before attempting a proof. I wrote the following program, built around two functions, q and f.

The function q is used to generate the correct range of points, a diamond-shaped region of radius n, and place the points in the right order. Here is an example.
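Here is a minimal sketch of what q might look like, assuming it simply lists the lattice points at taxicab distance n from the origin; the implementation details are a reconstruction, not the original code.

(* the points of the diamond of radius n: taxicab distance n from the origin *)
q[n_] := DeleteDuplicates[Join[
   Table[{i, n - Abs[i]}, {i, -n, n}],  (* upper half of the diamond *)
   Table[{i, Abs[i] - n}, {i, -n, n}]   (* lower half of the diamond *)
  ]]

q[2]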

This nests the diamonds to create a square lattice.
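One plausible reconstruction of that step, collecting the diamonds of radii 0 through 3 into a single list of lattice points:

(* all points within taxicab distance 3 of the origin, ordered ring by ring *)
pts = Flatten[Table[q[i], {i, 0, 3}], 1]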

To understand the function f one must understand a bit of the mathematics of random walks. Due to the memoryless property of random walks, p is a harmonic function. That is, p(x) is the average of its surrounding values,

p(x) = (1/4) ∑_(y~x) p(y),

for all x in ℤ^2\{a, b}, where the sum is over all y adjacent to x. In fact, it is easy to see that p is the unique function harmonic on ℤ^2\{a, b} such that p(a) = 1, p(b) = 0, and p(x) → 1/2 as |x| → ∞ (see [1]). The program f begins by setting fun(a) = 1, fun(b) = 0, and fun(x) = 1/2 for all other x. In each iteration through the While loop, the value of the function at each point is set to the average of the neighboring values. In this way the function fun becomes more and more nearly harmonic, and as a result closer to the values of our desired function p. This process is described in [1], and just to check that f is working correctly I tested it on an example that appears in [1].
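Here is a minimal sketch of such a relaxation scheme, assuming the region is the diamond of radius n produced by q. The original f iterates a While loop; for simplicity this sketch performs a fixed number of sweeps, and it holds points outside the region at the limiting value 1/2. The argument names are illustrative.

(* relaxation: repeatedly replace each interior value by the average of its four neighbors *)
f[a_, b_, n_, iters_] := Module[{pts, fun, new},
  pts = Flatten[Table[q[i], {i, 0, n}], 1];
  fun = Association[Thread[pts -> 0.5]];   (* initial guess of 1/2 everywhere *)
  AssociateTo[fun, {a -> 1., b -> 0.}];    (* boundary values *)
  Do[
   new = fun;
   Do[
    If[pt =!= a && pt =!= b,
     AssociateTo[new,
      pt -> Mean[Lookup[fun,
         Key /@ {pt + {1, 0}, pt - {1, 0}, pt + {0, 1}, pt - {0, 1}},
         0.5]]]],
    {pt, pts}];
   fun = new,
   {iters}];
  fun]

For instance, f[{0, 0}, {2, 1}, 15, 200][{1, 0}] approximates p at (1, 0) when a = (0, 0) and b = (2, 1); the arguments are chosen purely for illustration.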

This exactly matches the values given in [1]. Confident that my function works well, I tested it on an example.

And here we see the value of numerical experimentation. Lacking what I now know to be the proper intuition, I was foolishly confident that the maximum would occur at the point directly opposite b with respect to a, since this seems to be the point most “separated” from b by a. However, the simulation above clearly shows that this is not the case, and that in fact two other points, both adjacent to a, give the maximal values. After playing with several other examples, which readers are encouraged to do on their own, I realized that the maximal point is always one adjacent to a. Once that is realized, it is clear that it must be the point or points “farthest” from b, in some sense. One can define farthest in terms of the Euclidean metric, but it is actually more natural to define it in the way given in the statement of Theorem 1.

Theorem 1

The function p(x) is maximized at one of the points adjacent to a. In order to determine the correct point, draw two lines L_1 and L_2 of slopes 1 and -1 through a. This creates four half-planes. The point b lies in two of these half-planes, unless it lies on one of L_1 or L_2, in which case b lies in only one half-plane. In case b lies in two half-planes, p is maximized at the unique point adjacent to a that does not share a half-plane with b. In case b lies in only one half-plane, p is maximized at the two points adjacent to a that do not lie in the half-plane containing b.

Proof

Here is a sample situation.

The statement of the theorem is that the maximum occurs at the point adjacent to a lying on the far side of both lines from b, since that is the point adjacent to a that does not lie in a half-plane with b. First let us prove that the maximum occurs at a point adjacent to a. Let N be the set of four points adjacent to a. For any x not in N ∪ {a}, a walk starting at x must pass through N before it can reach a; for w ∈ N, let h_w(x) be the probability that such a walk reaches N before b and that w is the first point of N it visits. Then, for any x not in N ∪ {a},

p(x) = ∑_(w∈N) h_w(x) p(w).

Thus, p(x) is seen to be a weighted average over N of the quantities p(w) with weights h_w(x). Since

∑_(w∈N) h_w(x) ≤ 1,

it follows that

p(x) ≤ max_(w∈N) p(w),

so that the maximum is obtained on N. Now, to determine at which point it is obtained, let us first define a reflection in this context. The reflection of a point x in the line L is the unique point x̄ such that the segment from x to x̄ is perpendicular to L with its midpoint on L. So, for instance, the reflection of a point (s, t) over the line of slope 1 through the origin is the point (t, s). If S = (s_0, s_1, …, s_m) is a sequence of points in ℤ^2 such that s_(i+1) is adjacent to s_i for all i, then we call S a path. The reflection of a path S in L is the path S̄ = (s̄_0, s̄_1, …, s̄_m), where each s̄_i is the reflection of s_i. So, in the graphic below, the blue path is a reflection in L_1 of the green path.

Now, p(w) can be expressed as a sum over paths. Let A_n(w) denote the set of paths of length n that begin at w and arrive at a at step n without previously visiting a or b. Since at each step the walk chooses among four neighbors with equal probability, each particular path of length n occurs with probability 4^(-n), and

p(w) = ∑_(n≥0) |A_n(w)| 4^(-n).

Suppose that b lies in one of the half-planes formed by the line L_1 and that w is a point adjacent to a that also lies in this half-plane; let w̄ denote the reflection of w in L_1, which is also adjacent to a. Let S = (s_0, s_1, …, s_m) be any path starting at w that hits a before b. This path must hit L_1 at some point, possibly at a itself. Let j be the smallest value such that s_j lies on L_1. Then we set S′ = (s̄_0, …, s̄_(j-1), s_j, …, s_m). That is, S′ is the reflection of S up until S and S′ both hit L_1, from which point S and S′ coincide. S′ is then a path, beginning at w̄, that must hit a before b (note that for i < j, s̄_i lies in the half-plane not containing b). The map S ⟶ S′ is an injective map from the set of paths starting from w that hit a before b to the set of paths starting from w̄ that hit a before b. Thus, the number of distinct paths starting at w of length n that hit a before b is less than or equal to the number of distinct paths starting at w̄ of length n that hit a before b. It is in fact easy to see that this inequality is strict for some n, since there are paths starting at w̄ that hit a before b, but whose reflections hit b before a. It follows that p(w) < p(w̄). Thus, the maximum occurs at the point or points adjacent to a that do not share a half-plane with b, and the theorem follows.
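As an additional check on Theorem 1 (this simulation is my addition, not part of the proof above), one can estimate p at the four neighbors of a directly. Since the walk is recurrent but can wander for a very long time, the walks here are truncated at a fixed number of steps, which slightly undercounts hits for all four neighbors but does not affect the comparison between them.

(* estimate the probability of hitting a before b from each neighbor of a *)
a = {0, 0}; b = {2, 1};  (* sample points: b lies strictly inside the eastern quadrant formed by the two lines through a *)
hitsAFirst[x0_, maxSteps_] := Module[{x = x0, k = 0},
  While[x =!= a && x =!= b && k < maxSteps,
   x += RandomChoice[{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}]; k++];
  x === a]
estimate[x0_] := N[Count[Table[hitsAFirst[x0, 10000], {2000}], True]/2000]
estimate /@ {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}  (* Theorem 1 predicts the largest value at {-1, 0} *)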

Slowest Mixing Distribution

Suppose now that G is a finite graph. The question of hitting probabilities of points can be handled in a similar way to the above argument, where we still must look for harmonic functions on the graph. However, there is a related question that is a bit different, but quite interesting. Suppose, rather than a sole walker, we have a large number of random walkers, all moving at the same time. The object of interest will be the fraction of walkers that are at each vertex at any given time.

Suppose G has n vertices, numbered from 1 to n. A distribution vector is a vector in ℝ^n with nonnegative components summing to 1, representing a distribution of random walkers on the graph. Form an n×n matrix A whose elements satisfy A_ij = 0 unless vertices i and j are adjacent, in which case A_ij = 1. This is known as the adjacency matrix. We will suppose below that the graph is regular (every vertex has the same degree) and not bipartite (bipartite means that the vertices of the graph can be divided into two sets V_1 and V_2, such that every vertex in V_1 is adjacent only to vertices in V_2, and vice versa). Let M denote the probability matrix obtained by dividing all elements in the adjacency matrix by the degree of each vertex. If at time 0 the walkers have distribution w, then Mw is the distribution at time 1, followed by M^2 w at time 2, and so on. The distribution vector u = (1/n, …, 1/n) gives the stationary distribution, the distribution that is unchanged (since each row of M sums to 1) when the random walk undergoes a step.

The key question is, when we start with an arbitrary distribution, what happens? Do the walkers somehow approach the stationary distribution, or do they stay disordered forever? It is a consequence of the Perron-Frobenius theorem, together with the symmetry of M, that all eigenvalues of M are real and lie in the interval [-1, 1], and that the corresponding eigenvectors v_1, …, v_n form an orthogonal basis of ℝ^n. The largest eigenvalue λ_1 is simple and equal to 1, with the stationary distribution u as an eigenvector. Any function w on the vertices of the graph can be expressed uniquely as w = a_1 v_1 + ⋯ + a_n v_n, and then Mw = a_1 λ_1 v_1 + ⋯ + a_n λ_n v_n, M^2 w = a_1 λ_1^2 v_1 + ⋯ + a_n λ_n^2 v_n, and so on. If w is a distribution vector, then a_1 v_1 = u, since the basis of eigenvectors is orthogonal and the sum of the components of any vector orthogonal to u must be 0. We know that |λ_i| ≤ 1 for all i. Let us assume that none of λ_2, …, λ_n are equal to -1 (in case one of them is, it can be shown that the graph is bipartite). In that case, an arbitrary distribution w will converge geometrically to the stationary distribution, since M^m w = u + ∑_(i=2)^n a_i λ_i^m v_i → u as m → ∞. Thus, we see that the largest magnitude of the eigenvalues other than λ_1 determines the slowest possible rate of convergence of any distribution to the stationary one. This is all standard and well known in this field (see [2]).

I was interested in using Mathematica to understand these theorems better, so I decided to try to determine the “slowest mixing” distribution, given any graph. Again, my intuition failed me, as I imagined that the slowest mixing distribution was likely to be complicated and difficult to obtain. The slowest mixing distribution will be the distribution vector w for which |a_2| is maximized, where λ_2 is chosen to be the largest eigenvalue (in magnitude) other than λ_1. If we define a_2 by a_2 = (w·v_2)/|v_2|^2, then we seek the maximal value of |w·v_2| over all w lying in the set of distribution vectors. Let us use GraphData to find interesting examples with which to work.

We only want regular graphs, and let us consider a number of vertices small enough to compute with, but still large enough to be interesting; nine seems reasonable. Let us also take only graphs of degree four.
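The query is easy to reconstruct; assuming GraphData's "Quartic" class denotes the 4-regular graphs, it might read:

names = GraphData["Quartic", 9]  (* the 4-regular graphs on nine vertices *)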

We get rid of the bipartite and disconnected graphs.
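A sketch of the filtering step, using the "Connected" and "Bipartite" properties of GraphData:

good = Select[names, GraphData[#, "Connected"] && ! GraphData[#, "Bipartite"] &]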

Here are the graphs.

Here are the spectra, the sets of eigenvalues of the adjacency matrices. They all have degree four as their largest eigenvalue, as they should, and all other eigenvalues lie in (-4, 4).
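They can be computed directly from the adjacency matrices, for instance:

spectra = Sort[Eigenvalues[N[Normal[AdjacencyMatrix[GraphData[#]]]]]] & /@ good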

Just as a check, let us make sure that all spectra sum to 0. This is because the sum of the eigenvalues of a matrix is equal to the trace, and the trace of the adjacency matrix is 0.
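For instance, with the spectra computed above:

Chop[Total /@ spectra]  (* each entry should be 0 *)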

The sum of the squares of the eigenvalues gives twice the number of edges.
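Checking this identity against EdgeCount:

Total[#^2] & /@ spectra
2 EdgeCount[GraphData[#]] & /@ good  (* the two lists should agree *)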

The sum of the cubes of the eigenvalues gives six times the number of triangles in the graph.
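And, dividing by six, the triangle counts themselves:

Round[Total[#^3]/6] & /@ spectra  (* the number of triangles in each graph *)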

See [3] for a proof of the preceding facts. We see that the fourth graph in our list has many triangles, while the next to last has hardly any. It may be guessed that this property has something to do with how rapidly a distribution gets mixed on a graph, so let us closely consider these two graphs.

Here is what they look like.

Recall that we are trying to find the distribution that converges most slowly to the uniform one. The probability matrix M is defined to be the adjacency matrix divided by the degree, in this case four. In the context of random walks on regular graphs, the probability matrix is more relevant than the adjacency matrix, as it gives the transition probabilities to the neighboring states at each step of the walk. Here are the all-important eigenvalues and eigenvectors for the probability matrix of the graph with many triangles.
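A reconstruction of the computation, writing tri for the triangle-rich graph (the variable name, and the position good[[4]] taken from the remark above, are assumptions):

tri = GraphData[good[[4]]];            (* the fourth graph: many triangles *)
m = Normal[AdjacencyMatrix[tri]]/4.;   (* probability matrix: adjacency matrix divided by the degree *)
{vals, vecs} = Eigensystem[m]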

We define v_1, …, v_9 to be the eigenvectors.

Now, if the theory is correct, this should be an orthogonal basis of ℝ^9, so that for any vector w we should have

w = ∑_(i=1)^9 ((w·v_i)/|v_i|^2) v_i.

This is a standard result from linear algebra, and holds in the much more general context of Hilbert spaces (see [4]). Let us verify it with a test vector.
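For example, with a random test vector (the particular vector is unimportant):

w = RandomReal[{0, 1}, 9];
Chop[w - Sum[((w.vecs[[i]])/(vecs[[i]].vecs[[i]])) vecs[[i]], {i, 9}]]  (* should be a list of zeros *)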

In order to find the slowest mixing distribution, then, we need to maximize the coefficient of v_2 in the representation of any distribution vector in this basis. A distribution vector is a vector w with nonnegative components such that w_1 + ⋯ + w_9 = 1. Thus, we are led to ask Mathematica the following.
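A sketch of that request, assuming v_2 = vecs[[2]] (for a numerical matrix, Eigensystem sorts the eigenvalues by decreasing absolute value, so the second entry is the relevant one):

v2 = vecs[[2]];
vars = Array[x, 9];
NMaximize[{Abs[vars.v2], Total[vars] == 1 && And @@ Thread[vars >= 0]}, vars]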

Interestingly enough, the slowest mixing distribution occurs when we concentrate the entire mass of walkers on vertex 8. But which is vertex 8? Here is the graph with the vertices labeled.

There does not seem to be anything too special about vertex 8, and considering that the components of v_2 at vertices 1, 2, and 9 have the same maximal absolute value as the component at vertex 8, we see that vertices 1, 2, and 9 would have fared equally well.

Let us now see how the other graph works.
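A parallel reconstruction, with notri as the name used in the text (its position in the list, next to last, is taken from the earlier remark about triangle counts):

notri = GraphData[good[[-2]]];
m2 = Normal[AdjacencyMatrix[notri]]/4.;
{vals2, vecs2} = Eigensystem[m2];
NMaximize[{Abs[vars.vecs2[[2]]], Total[vars] == 1 && And @@ Thread[vars >= 0]}, vars]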

Here, the components of v_2 at vertices 6, 7, and 9 again share the maximal absolute value, so we see that vertices 6, 7, and 9 would have worked equally well. Here is the labeled graph.

Having worked through all of this, it is now clear which is the slowest mixing distribution. One can simply take the maximal component (in absolute value) of v_2 and place all of the mass on that vertex. One could also distribute the mass over several vertices whose components are of maximal size and the same sign, for instance over vertices 8 and 9 in the notri graph. We are led once again to the correct theorem.

Theorem 2

Given the conditions as above, the slowest mixing distribution on the vertices of a graph is the one obtained by putting a unit mass on one of the vertices that corresponds to a maximal component (in absolute value) of the eigenvector associated with the second largest eigenvalue (in absolute value) of the probability matrix of the graph. This distribution is not necessarily unique, as there may be more than one maximal component, and an equally slow distribution can be obtained by distributing the unit mass in any way over a set of maximal components of the same sign.
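As a quick numerical sanity check (my addition, reusing m from the triangle-rich example above), one can compare how far a point mass at vertex 8 remains from the uniform distribution after a number of steps against a point mass at an arbitrary other vertex:

u = ConstantArray[1/9., 9];
dist[v_, s_] := Norm[MatrixPower[m, s, UnitVector[9, v]] - u]  (* distance to uniform after s steps from a point mass at vertex v *)
{dist[8, 20], dist[3, 20]}  (* the first value should be the larger *)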

Acknowledgments

This work was supported by the Priority Research Centers Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (Grant #2009-0094070). I would like to thank Professor Sangu Lee of Sungkyunkwan University for the invitation that led to the writing of this paper.

References

[1] P. Doyle and L. Snell, Random Walks and Electric Networks, Washington, D.C.: Mathematical Association of America, 1984.
[2] J. Doob, Stochastic Processes, New York: Wiley, 1953.
[3] D. West, Introduction to Graph Theory, Upper Saddle River, NJ: Prentice Hall, 2001.
[4] G. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., New York: Wiley, 1999.
G. Markowsky, “Two Basic Results Concerning Random Walks on Graphs,” The Mathematica Journal, 2011. dx.doi.org/doi:10.3888/tmj.13-16.

About the Author

Greg Markowsky is currently a lecturer in mathematics at Monash University. He does research on probability theory, complex analysis, and geometry. He also worked for two years at Wolfram Research on the Wolfram|Alpha project.

Greg Markowsky
School of Mathematical Sciences
Monash University
Building 28
Wellington Road
Clayton
Victoria 3800, Australia

gmarkowsky@gmail.com