The Mathematica Journal
Feature Articles
New Products
New Publications
News Bulletins
Write Us
About the Journal
Staff and Contributors
Back Issues
Download this Issue

Discrete Grayscale Image Manipulations

While you can manipulate color images, for purposes of illustration it's much easier to work with grayscale images. They can be represented as simple rectangular matrices of numbers, which can be manipulated as mathematical objects. (Color images must be represented as matrices of red, green, and blue triplets, which are more complicated to deal with.) So, we'll start our manipulations by extracting the image data from the expression, scaling it so the values run between 0 and 1, and adding up a weighted average of the red, green, and blue components of each pixel (the weights are based on the sensitivity of the eye to each color).


We can use the Dimensions function to see how many rows and columns are in our image.


Let's look at one line of this data set (row number 119 is about half way down the image).


Values near 0 are dark, and values near 1 are light. As you can see from just one line, there are a lot of numbers in this image (47600 to be exact).

We can see the graylevel image using a Show command.



Now that we have the data set as a simple two-dimensional matrix of numbers, it's easy to do all sorts of things to it. For example, if we subtract each element in it from 1, we will get an effect like a photographic negative (remember the data set is scaled to have values between 0 and 1). Note that we are making use of the fact that in Mathematica you can perform mathematical operations on nested lists and the operations will automatically be applied to each element. So in this example, we are in fact subtracting each element of the matrix from 1, individually.



It's interesting to use Mathematica to understand what's going on inside programs like PhotoShop. Understanding the mathematics behind visual manipulations can help you appreciate them and use them better.

Let's start with convolution. Many common image processing operations, like blurring, sharpening, or edge highlighting, can be described mathematically as convolutions.

To do a convolution, you start with a kernel, which is usually a fairly small matrix whose elements add up to a total of 1 (they don't have to, but this usually works best, and we'll see why in a minute).

Let's use the example of the matrix


as our kernel.

Think of the original image as a matrix of numbers, like this small subsection taken around the area of the left eye.


The nine numbers in the top left corner form a [Graphics:../Images/index_gr_18.gif] submatrix, which we can multiply (element by element) with the kernel, then add up the resulting numbers to give a single result.


This result will become one pixel in the output image. To get the next pixel, we slide over by 1 and multiply/add the kernel with the next submatrix (the one starting at the first row, second column in the image):


Repeating this process of multiply/adding the kernel with every possible [Graphics:../Images/index_gr_23.gif] submatrix in the original image, we can build up a new matrix in which each number is the result of a single operation like the two above. Here is what we get.


Notice that the first two numbers in the result are the same two we did individually. Also notice that the result is smaller by two rows and columns than the original. This is because in order to construct a [Graphics:../Images/index_gr_26.gif] submatrix around the outermost set of elements, you would have to invent numbers for elements that fall outside the original matrix. (There are optional arguments to ListConvolve that allow you to do just that, and get back a result the same size as your original, but that is beyond the scope of this article.)

Let's see what this looks like, applied to the whole original image.



What you're doing is averaging each group of nine numbers. That is, you're adding them all up and dividing by the count (9), except that you did the division first by having each element of the kernel be [Graphics:../Images/index_gr_29.gif]. So, the resulting image is a sort of [Graphics:../Images/index_gr_30.gif] moving average of the original.

That's why having all the elements in the kernel add up to 1 is useful: It means that the overall scaling (which translates into brightness) of the output image is the same as that of the original. You can blur the image even more by simply increasing the size of the kernel. This results in the moving average being taken over a larger area. For example, we can use the Table command to construct a [Graphics:../Images/index_gr_31.gif] matrix where each element is [Graphics:../Images/index_gr_32.gif].



Blurrier! Let's try a slightly more interesting kernel.



In this case, the elements add up to 0, not 1. If the pixel values in the original image are all fairly similar in a given area, the multiply/add operation will add up to 0 (black). So, it's only where the pixel values are changing that the resulting image has brighter pixels. This gives you a simple sort of edge detection.

If you try this you'll see these images coming out in a couple of seconds. How is it possible to do so many matrix operations in a fraction of a second? After all, our computer must be performing 47600 matrix multiply/add operations, one for each pixel, and each matrix operation involves dozens of multiplications and additions. That's a lot of multiplying and adding...

When you start to talk about what Mathematica is actually doing, you often find that it's nothing like what you expect. Let's do a little test:


We took a matrix with 250,000 elements and convolved it with a matrix with 10,000 elements. If we count each multiplication and each addition as one operation, then it should take 20,000 operations to calculate each output pixel (10,000 multiplications and 10,000 additions). The total number of operations thus would be:


(Note that the output image is only 400 by 400 pixels, as discussed above.) Divide the number of operations by the time it took, and we get:


The computer this test was done on operates at a speed of 266 MHz. If this is what happened, how good is our computer at packing operations into clock cycles?


Wait a minute, it appears to be doing more than five floating point operations per clock cycle. How can that be? Is this a vector processing supercomputer with six or more parallel floating point units? Well, no. It can't even do one whole floating point operation per cycle; and if it could, it wouldn't be able to get the data in and out of memory fast enough to keep up the pace. There must be something else going on.

In fact, Mathematica is using a very clever FFT algorithm. By taking the discrete Fourier transform of both the data and the kernel, messing with them, and then taking the inverse transform, it is possible to get the answer using many fewer operations. This is a good example of how a better algorithm can be far more valuable than a better implementation of a naive algorithm. Say someone had spent a year writing the most efficient possible assembly language version of the multiply/add algorithm: It would still be probably 10 to 15 times slower than even a sloppy implementation of the FFT algorithm.

So, does PhotoShop use an FFT algorithm when it applies filters that use convolution? Probably not. And, truth be told, it probably wouldn't benefit that much anyway. The FFT algorithm's benefits are mainly visible when the kernel is very large, and PhotoShop's kernels are typically small. But it's something they might want to think about.

Converted by Mathematica      April 21, 2000

[Article Index] [Prev Page][Next Page]