Example: LU Decomposition

Here is the code for actually doing an LU decomposition (factorization) from the package `LinearAlgebra`GaussianElimination`. Since this functionality exists in the kernel, the real purpose of this package is pedagogical, but I think the original author has gone too far with the intent to stick to standard C-looking code so that people familiar only with procedural code can understand it. The result is, in my opinion, a prime example of how you should not write code in Mathematica. Let us see why, and how we can improve it.

Here are two test examples so that we can compare timings of different versions.

This computes the ratio of the timing of the package function to Mathematica's internal algorithm.

Ouch! On my computer, the machine case is over 1000 times slower and the bignum example is over 5 times slower. Before screaming an exasperated "Mathematica is just too slow", let us try to improve the code, keeping in mind packed arrays for the machine number case.

The first thing I see is that this code is using `For` to handle the loops. As I showed above, `For` requires much more evaluation than do some alternatives. In the inner loops of this command, this will be particularly noticeable, so changing from `For` to `Do` is worthwhile.

The results are identical, but there is a clear speed improvement.

The next change will help make the code more readable. At this point, it will not seem to make much of a speed difference, but it does help. The idea is to replace the lines of code finding the correct pivot to use with a function.

This finds which of the possible pivots (in column c, rows c and above) is largest in absolute value, and swaps with the one in row c.

This, in turn, uses a simple function.

And putting it into the main function we get the following.

As before, results are identical.

Still going in the right direction, but no big savings. The place to look for big savings is, of course, the innermost loop. The next change replaces the innermost loop by giving Mathematica commands that operate on all the elements of a list.

A big speed improvement, but we no longer have identical results.

The reason the results are different is that the order of operations will be slightly different with the new code. Floating arithmetic is unfortunately subject to small differences in the order of operations. As long as the differences are small, they are not of significance. In this example, the differences are quite small.

With bignums, we also got a substantial speed improvement.

So replacing the innermost loop was spectacularly successful. There is still a set of nested loops which can be worked on though.

The results are the same as above with a decent speed improvement.

It is worth noting that even though the machine number case is still substantially slower than Mathematica's internal code, the bignum case is nearly as fast.

Now is the point at which we have to use Mathematica's compiler to get additional speed improvements.

A `CompiledFunction` object cannot return two separate types of data, which is what we need since the pivots should be integers. To work around that, you can compile just the loop part of the command, leaving the generation and evaluation of the pivots outside the compiled function interpreter.

The last argument to `Compile` tells it what type to expect from the objects which require evaluation external to the `CompiledFunction` interpreter.

This makes an LU factorization command which returns both LU and pivots.

It is important that the scoping of `Block` is used instead of `Module` here. When the `CompiledFunction` uses an external evaluation, it literally uses `pivot`, so a localized version would not be recognized.

The results are slightly different again, but the new command is definitely faster. Realistically, this is about as close as you will be able to get to Mathematica's internal code, which is highly optimized.

Converted by Mathematica      May 1, 2000

[Article Index] [Prev Page][Next Page]