EquivalencesOne type of question keeps reappearing in Usenet discussions and Mathematica user groups. It comes in many forms and recognizing that it is a question about equivalences is already half the answer. Here is one formulation:
Given a set (or list), partition it into its equivalence classes. Each element belongs to exactly one class and two elements in a class are equivalent. Instead of the classes themselves we may be interested simply in the indices of the elements. From the indices we can easily find the elements, but not vice versa. The equivalence relation is given as a function of two arguments that returns This program by Xah Lee appeared on the newsgroup comp.soft-sys.math.mathematica in March 1998 [1]. (In another discussion, the same function was named
The idea is to pick the first element, find all remaining elements that are equivalent, then remove them and continue with the remaining elements until no more remain. Instead of working with the elements we use their indices and access the element with index with The default predicate generates the trivial equivalence class where each element is only equivalent to itself. The resulting index set allows us to quickly locate where the identical elements appear in the original list. Once we have the indices, we can easily get the elements themselves.
For the trivial equivalence
For an example, let us group all built-in Mathematica symbols by their length. First we obtain a list of all symbol names in the
Here we measure the time it takes to split the list into lists of symbols with the same length. Using pure functions (of two arguments) is a convenient way to express arbitrary equivalences.
There are this many different lengths of built-in symbols.
One of the equivalence classes is shown here. Note that it is not necessarily the one with the shortest symbols. The ordering of the classes is not defined.
We can speed up this computation significantly by first sorting the symbols by their lengths and then splitting the result at those places where the lengths of adjacent elements differ. The predicate for
This method has a complexity of , the complexity of sorting. The longer the input the larger the difference in run time between the two methods. Intuitively, the reason that this trick works is that the equivalence relation alone gives no information beyond a simple yes/no, and a negative answer does not help us at all. Essentially we have to compare every element with every other one. An ordering relation gives more information; consequently, sorting can be done much faster. Because of the sorting, the first equivalence class is the one with the shortest symbols.
Note how we found the ordering predicate
This column was motivated also by an analysis of a deliberate slowdown introduced in the
To demonstrate the effect, let us generate a long list of random integers with many repetitions and find a transversal, or simply the union of the list elements.
The built-in
If we provide our own equivalence relation as an option, Mathematica switches to a slower algorithm. Version 3 used sorting in all cases, and it was then recognized that the command failed with user-supplied equivalences that were not compatible with canonical ordering. The equivalence relation
The mathematical discipline that makes these notions precise is universal algebra [2], so let us have a look at the theory behind equivalences and the orders that generate them. Converted by Mathematica |