### Equivalences

One type of question keeps reappearing in Usenet discussions and Mathematica user groups. It comes in many forms and recognizing that it is a question about equivalences is already half the answer. Here is one formulation:

Given a set (or list), partition it into its equivalence classes. Each element belongs to exactly one class and two elements in a class are equivalent. Instead of the classes themselves we may be interested simply in the indices of the elements. From the indices we can easily find the elements, but not vice versa. The equivalence relation is given as a function of two arguments that returns `True` if the two arguments should be considered equivalent, and `False` otherwise.

This program by Xah Lee appeared on the newsgroup comp.soft-sys.math.mathematica in March 1998 [1]. (In another discussion, the same function was named `MatchingPositions`.)

The idea is to pick the first element, find all remaining elements that are equivalent, then remove them and continue with the remaining elements until no more remain. Instead of working with the elements we use their indices and access the element with index with `li[[`i`]]`. The result is first built up in a recursive structure (a stack), then converted to a list in one step, using `Flatten`. This method has complexity , where is the length of the input list.

The default predicate generates the trivial equivalence class where each element is only equivalent to itself. The resulting index set allows us to quickly locate where the identical elements appear in the original list.

Once we have the indices, we can easily get the elements themselves.

For the trivial equivalence `SameQ` the result of `EquivalenceClasses[l]` is a partitioning of l into lists of identical elements.

For an example, let us group all built-in Mathematica symbols by their length. First we obtain a list of all symbol names in the `System`` context.

Here we measure the time it takes to split the list into lists of symbols with the same length. Using pure functions (of two arguments) is a convenient way to express arbitrary equivalences.

There are this many different lengths of built-in symbols.

One of the equivalence classes is shown here. Note that it is not necessarily the one with the shortest symbols. The ordering of the classes is not defined.

We can speed up this computation significantly by first sorting the symbols by their lengths and then splitting the result at those places where the lengths of adjacent elements differ. The predicate for `Split` is actually the reverse: the list is split where the predicate is `False`, so we use instead of .

This method has a complexity of , the complexity of sorting. The longer the input the larger the difference in run time between the two methods. Intuitively, the reason that this trick works is that the equivalence relation alone gives no information beyond a simple yes/no, and a negative answer does not help us at all. Essentially we have to compare every element with every other one. An ordering relation gives more information; consequently, sorting can be done much faster.

Because of the sorting, the first equivalence class is the one with the shortest symbols.

Note how we found the ordering predicate `StringLength[#1]<=StringLength[#2]&` from the equivalence relation `StringLength[#1]==StringLength[#2]&` by replacing the equality by <=. The key is to find an ordering that is compatible with the equivalence. Mathematica's default ordering predicate `OrderedQ[{#1,#2}]&` would not have worked. It is not compatible with the desired equivalence, because it does not guarantee that strings of the same length end up adjacent to each other; rather, it sorts strings lexicographically.

This column was motivated also by an analysis of a deliberate slowdown introduced in the `Union` function of Mathematica 4. The union of a list of elements can be found by first finding the equivalence classes, then taking just one of the equivalent elements (called a transversal).

To demonstrate the effect, let us generate a long list of random integers with many repetitions and find a transversal, or simply the union of the list elements.

The built-in `Union` sorts the data, then takes one of each run of adjacent identical elements, which is quite fast. If every integer in the given range appears at least once, the length of the result will be 10,000.

If we provide our own equivalence relation as an option, Mathematica switches to a slower algorithm. Version 3 used sorting in all cases, and it was then recognized that the command failed with user-supplied equivalences that were not compatible with canonical ordering. The equivalence relation `SameQ` gives the same result as the built-in default, but the computation is much slower.

The mathematical discipline that makes these notions precise is universal algebra [2], so let us have a look at the theory behind equivalences and the orders that generate them.

Converted by Mathematica

[Article Index] [Next Page]