The Mathematica Journal
Feature Articles
New Products
New Publications
News Bulletins
Write Us
About the Journal
Staff and Contributors
Back Issues
Download this Issue


One type of question keeps reappearing in Usenet discussions and Mathematica user groups. It comes in many forms and recognizing that it is a question about equivalences is already half the answer. Here is one formulation:

Given a set (or list), partition it into its equivalence classes. Each element belongs to exactly one class and two elements in a class are equivalent. Instead of the classes themselves we may be interested simply in the indices of the elements. From the indices we can easily find the elements, but not vice versa. The equivalence relation is given as a function of two arguments that returns True if the two arguments should be considered equivalent, and False otherwise.

This program by Xah Lee appeared on the newsgroup comp.soft-sys.math.mathematica in March 1998 [1]. (In another discussion, the same function was named MatchingPositions.)


The idea is to pick the first element, find all remaining elements that are equivalent, then remove them and continue with the remaining elements until no more remain. Instead of working with the elements we use their indices and access the element with index [Graphics:../Images/index_gr_2.gif] with li[[i]]. The result is first built up in a recursive structure (a stack), then converted to a list in one step, using Flatten. This method has complexity [Graphics:../Images/index_gr_3.gif], where [Graphics:../Images/index_gr_4.gif] is the length of the input list.

The default predicate generates the trivial equivalence class where each element is only equivalent to itself. The resulting index set allows us to quickly locate where the identical elements appear in the original list.

Once we have the indices, we can easily get the elements themselves.


For the trivial equivalence SameQ the result of EquivalenceClasses[l] is a partitioning of l into lists of identical elements.

For an example, let us group all built-in Mathematica symbols by their length. First we obtain a list of all symbol names in the System` context.


Here we measure the time it takes to split the list into lists of symbols with the same length. Using pure functions (of two arguments) is a convenient way to express arbitrary equivalences.


There are this many different lengths of built-in symbols.


One of the equivalence classes is shown here. Note that it is not necessarily the one with the shortest symbols. The ordering of the classes is not defined.


We can speed up this computation significantly by first sorting the symbols by their lengths and then splitting the result at those places where the lengths of adjacent elements differ. The predicate for Split is actually the reverse: the list is split where the predicate is False, so we use [Graphics:../Images/index_gr_14.gif] instead of [Graphics:../Images/index_gr_15.gif].


This method has a complexity of [Graphics:../Images/index_gr_18.gif], the complexity of sorting. The longer the input the larger the difference in run time between the two methods. Intuitively, the reason that this trick works is that the equivalence relation alone gives no information beyond a simple yes/no, and a negative answer does not help us at all. Essentially we have to compare every element with every other one. An ordering relation gives more information; consequently, sorting can be done much faster.

Because of the sorting, the first equivalence class is the one with the shortest symbols.


Note how we found the ordering predicate StringLength[#1]<=StringLength[#2]& from the equivalence relation StringLength[#1]==StringLength[#2]& by replacing the equality by <=. The key is to find an ordering that is compatible with the equivalence. Mathematica's default ordering predicate OrderedQ[{#1,#2}]& would not have worked. It is not compatible with the desired equivalence, because it does not guarantee that strings of the same length end up adjacent to each other; rather, it sorts strings lexicographically.

This column was motivated also by an analysis of a deliberate slowdown introduced in the Union function of Mathematica 4. The union of a list of elements can be found by first finding the equivalence classes, then taking just one of the equivalent elements (called a transversal).


To demonstrate the effect, let us generate a long list of random integers with many repetitions and find a transversal, or simply the union of the list elements.


The built-in Union sorts the data, then takes one of each run of adjacent identical elements, which is quite fast. If every integer in the given range appears at least once, the length of the result will be 10,000.


If we provide our own equivalence relation as an option, Mathematica switches to a slower [Graphics:../Images/index_gr_25.gif] algorithm. Version 3 used sorting in all cases, and it was then recognized that the command failed with user-supplied equivalences that were not compatible with canonical ordering. The equivalence relation SameQ gives the same result as the built-in default, but the computation is much slower.


The mathematical discipline that makes these notions precise is universal algebra [2], so let us have a look at the theory behind equivalences and the orders that generate them.

Converted by Mathematica      

[Article Index] [Next Page]