Volume 9, Issue 3
WaveX: Extracting Wavelets from Seismic Data
Here are a few of the many design issues encountered in creating the tool suite that may be of particular interest to programmers.
One feature that an experienced Mathematica user will immediately notice is the use of a handle to the data (a string name) instead of the more familiar functional manipulation of lists of data. There are several reasons for this unusual design choice, the most important being that seismic data sets can be very large. If load routines returned the data lists directly and the user forgot to suppress the output, the machine would likely hang for a significant time while the front end tried to format the output. For WaveX manipulations, this might be tens of thousands of data points; for other manipulations of seismic data, it could easily be hundreds of thousands of data points at the low end. (Data sets of hundreds of gigabytes are not uncommon, though these are not read into Mathematica all at once.) The handles in this tool suite are meant to shield the user from this effect.
The handles also provide a convenient means of grouping associated data, forming a sort of ad hoc database. For example, a number of functions place intermediate results in a symbol of the form name[handle]=data, which can then be retrieved for later analysis. Over time, utilities have been developed for this tool suite to streamline the process and to let the user determine what data is available after each operation. The data can then easily be extracted for input into more traditional Mathematica functions, though of course the user must then take care to suppress large output.
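The handle idea is not specific to Mathematica, and a small sketch may make it concrete. The Python below is a minimal illustration, not WaveX code: `HandleStore`, `put`, `get`, and `available` are hypothetical names, and a dictionary keyed by (quantity, handle) pairs stands in for definitions of the form name[handle]=data. Storing a result returns only the short handle, so echoing it never displays the data itself.

```python
from typing import Any


class HandleStore:
    """Hypothetical ad hoc store mirroring definitions like name[handle] = data."""

    def __init__(self) -> None:
        # Keys are (quantity, handle) pairs, e.g. ("traces", "line42").
        self._data: dict[tuple[str, str], Any] = {}

    def put(self, quantity: str, handle: str, value: Any) -> str:
        """Store a result and return only the handle, never the (large) data."""
        self._data[(quantity, handle)] = value
        return handle

    def get(self, quantity: str, handle: str) -> Any:
        """Extract data explicitly, when the user really wants it."""
        return self._data[(quantity, handle)]

    def available(self, handle: str) -> list[str]:
        """List which quantities have been stored for a given handle."""
        return sorted(q for (q, h) in self._data if h == handle)


store = HandleStore()
h = store.put("traces", "line42", [0.0] * 100_000)  # h is just the string "line42"
store.put("wavelet", "line42", [0.1, 0.5, 0.1])
print(h, store.available(h))
```

The `available` method plays the role of the utilities that tell the user what data exists for a handle after a sequence of operations.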
For a parallel elsewhere in Mathematica, consider handles to streams (e.g., InputStream) or to notebooks (e.g., NotebookObject). In those cases the handle refers to data held by an external program, but the basic principle is similar.
There are a number of file formats for seismic data and well logs, but two are the most common. The SEG-Y format for seismic data is a binary format that typically stores floating-point values in a rarely used IBM floating-point format, with a textual header section in EBCDIC character encoding and record headers in a binary integer format. The LAS format for log data, on the other hand, is an ASCII text format that stores the data in simple columns, preceded by a series of header sections describing the log (for example, well and curve information). Each format presents its own challenges for implementing import and export routines.
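As an illustration of the EBCDIC textual header, the sketch below decodes the 3200-byte header at the start of a SEG-Y file into its 40 lines of 80 characters. It assumes the `cp037` code page (Python's EBCDIC US/Canada codec); a file written with a different EBCDIC variant would need a different codec. `decode_textual_header` is a hypothetical helper, not part of WaveX.

```python
TEXT_HEADER_BYTES = 3200  # SEG-Y textual header: 40 card images of 80 columns
CARD_WIDTH = 80


def decode_textual_header(raw: bytes, codec: str = "cp037") -> list[str]:
    """Decode an EBCDIC textual header and split it into 80-column lines."""
    text = raw[:TEXT_HEADER_BYTES].decode(codec)  # assumption: cp037 code page
    return [text[i:i + CARD_WIDTH].rstrip() for i in range(0, len(text), CARD_WIDTH)]


# Build a synthetic header: every card reads "C 1 CLIENT", padded with spaces.
raw = ("C 1 CLIENT".ljust(CARD_WIDTH) * 40).encode("cp037")
lines = decode_textual_header(raw)
print(len(lines), lines[0])
```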
One immediate design question was whether this functionality should tie into the standard functions Import and Export. In the end, implementation time constraints drove the decision, and we took the simpler route of creating separate functions for importing and exporting these data types.
Another issue in reading the binary data was efficiency. Because seismic data sets are so large, care was taken first to provide mechanisms for extracting only specified portions of the data, and stream manipulation via StreamPosition proved quite valuable here. Unfortunately, we are still constrained by a 2-gigabyte limit in Mathematica's file-handling capabilities, but we are hopeful that the increasing availability of 64-bit computing will soon let us handle larger files.
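Extracting only part of a file comes down to computing byte offsets and repositioning the stream, the counterpart of setting a stream position in Mathematica. This minimal Python sketch assumes the standard SEG-Y layout (a 3200-byte textual header, a 400-byte binary header, and a 240-byte header before each trace), a fixed number of 4-byte samples per trace, and no extended textual headers. The samples-per-trace count is read from bytes 3221-3222 of the binary header. `samples_per_trace` and `read_trace_bytes` are hypothetical helpers.

```python
import io
import struct

TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240
SAMPLE_BYTES = 4  # assumption: 4-byte samples (e.g. IBM or IEEE floats)


def samples_per_trace(stream) -> int:
    """Read the samples-per-trace field from the SEG-Y binary file header."""
    stream.seek(3220)  # 0-based offset of bytes 3221-3222
    (ns,) = struct.unpack(">h", stream.read(2))  # big-endian 2-byte integer
    return ns


def read_trace_bytes(stream, index: int, ns: int) -> bytes:
    """Seek straight to trace `index` (0-based) and read only its samples."""
    trace_len = TRACE_HEADER_BYTES + SAMPLE_BYTES * ns
    offset = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + index * trace_len
    stream.seek(offset + TRACE_HEADER_BYTES)  # skip this trace's own header
    return stream.read(SAMPLE_BYTES * ns)


# Synthetic file: headers plus two 3-sample traces filled with bytes 0 and 1.
header = bytearray(TEXT_HEADER_BYTES + BINARY_HEADER_BYTES)
header[3220:3222] = struct.pack(">h", 3)
traces = b"".join(bytes(TRACE_HEADER_BYTES) + bytes([k]) * 12 for k in range(2))
f = io.BytesIO(bytes(header) + traces)
ns = samples_per_trace(f)
second = read_trace_bytes(f, 1, ns)
```

Only the requested trace is read; the rest of the file is never loaded, which is what keeps partial extraction cheap for very large files.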
Because the trace data was stored in a nonstandard format, we could not draw directly on the Experimental`BinaryImport function or on the Utilities`BinaryFiles` standard package. Instead, a list of bytes was read in via ReadList[stream, Byte] and then transformed by internal conversion routines. The functions initially partitioned the list into groups of four bytes, each of which was then transformed into the corresponding floating-point number. Using a small compiled function (generated with Compile) for this transformation gave some speed improvement. However, we found that passing the flat list of input bytes to a compiled function that processed a whole trace at once produced a substantial performance improvement. The lesson is that compiled functions are most useful when the entire processing loop can be moved inside them, although with the current capabilities of Compile this comes at some cost in modularity.
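For reference, an IBM System/360 single-precision word packs a sign bit s, a 7-bit base-16 exponent e biased by 64, and a 24-bit fraction f, so its value is (-1)^s * 0.f * 16^(e-64). The Python sketch below is not the WaveX Compile code. It converts one 32-bit word, then decodes a whole trace from a flat byte string with a single `struct.unpack` call instead of first partitioning the bytes into groups of four. In pure Python the per-sample loop remains; a vectorized array library would be the closer analogue of moving the whole loop inside a compiled function.

```python
import struct


def ibm32_to_float(word: int) -> float:
    """Convert one 32-bit IBM single-precision word to a Python float."""
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                    # base-16 exponent, bias 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)   # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


def ibm_trace_to_floats(raw: bytes) -> list[float]:
    """Decode a whole trace of big-endian IBM floats from a flat byte string."""
    n = len(raw) // 4
    words = struct.unpack(f">{n}I", raw[: 4 * n])  # one call for the whole trace
    return [ibm32_to_float(w) for w in words]


trace = ibm_trace_to_floats(bytes.fromhex("41100000C276A000"))
```

For example, the words 0x41100000 and 0xC276A000 decode to 1.0 and -118.625.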
The SEG-Y and LAS formats were developed by industry organizations, but not all implementers follow the format guidelines precisely, so the routines had to be written to account for variations found in practice. This was particularly common with the LAS format, where we encountered a number of files that did not conform to the specification. The code therefore had to be modular enough to be adapted easily to these variations, and rule-based programming proved quite valuable in this context.
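Rule-based handling of format variations can be imitated with an ordered table of pattern/action rules, so that supporting a new variant means adding a rule rather than rewriting the parser. The Python below is a hypothetical sketch, not the WaveX LAS reader. It assumes the common LAS 2.0 layout (sections introduced by `~`, header lines of the form `MNEM.UNIT value : description`, and a `~A` data section); the acceptance of commas between data columns is only an illustrative example of a tolerated variation.

```python
import re


def _ignore(m, st):
    pass


def _section(m, st):
    # Only the first letter identifies the section: ~V, ~Version, ~A, ~ASCII, ...
    st["section"] = m.group(1).upper()


def _header(m, st):
    mnemonic, unit, value, desc = m.groups()
    st["headers"].setdefault(st["section"], {})[mnemonic.upper()] = (unit, value, desc)


def _data(m, st):
    # Illustrative variation: accept commas as well as whitespace between columns.
    st["data"].append([float(x) for x in re.split(r"[,\s]+", m.group(0).strip())])


# Rules are tried in order; the first whose section filter and pattern match wins.
RULES = [
    (None, re.compile(r"^\s*(?:#.*)?$"), _ignore),  # blank lines and comments
    (None, re.compile(r"^~\s*(\w)"), _section),
    ("A", re.compile(r"^\s*[-+.\d].*$"), _data),
    (None, re.compile(r"^\s*([^.\s]+)\s*\.(\S*)\s+([^:]*?)\s*:\s*(.*)$"), _header),
]


def parse_las(text: str) -> dict:
    st = {"section": None, "headers": {}, "data": []}
    for line in text.splitlines():
        for section, pattern, action in RULES:
            if section is not None and section != st["section"]:
                continue
            m = pattern.match(line)
            if m:
                action(m, st)
                break  # lines matching no rule are silently skipped
    return st


sample = """~Version
VERS.   2.0 : CWLS LOG ASCII STANDARD
~Well
STRT.M   1670.0 : START DEPTH
# a comment
~ASCII
1670.0  123.45
1670.5, 124.00
"""
result = parse_las(sample)
```

Because each variant lives in its own rule, a nonconforming file can often be handled by inserting one more entry into `RULES` ahead of the general ones.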
© 2005 Wolfram Media, Inc. All rights reserved.