Volume 9, Issue 4

Articles
In and Out
Trott's Corner
New Products
New Publications
Calendar
News Bulletins
New Resources
Classifieds

Editorial Policy
Staff and Contributors
Submissions
Subscriptions
Back Issues
Contact Information

T R O T T ' S C O R N E R

# The General Search Process

After parsing the objects selected in the pull-down menus (whether function, constant, number, or operation; that are either to occur or not), we form a Mathematica expression representing the search. To find the identities that match, we start with a list of all identities in the form of a list of strings of the identity numbers. Then for each object, a call is made to the prepared hash tables. The identity numbers returned are either joined or complemented with the current list (according to whether OR or AND was specified in the search). Such a look-up and reduction step is done consecutively for all the specified search criteria, resulting in a list of identity numbers that match them all. Because the look-up operations and list manipulations are fast in Mathematica, even complicated searches typically take a fraction of a second of CPU time, including all preprocessing and postprocessing (with the exception of searches for mathematically matching hypergeometric functions).

If present in the search, Mathematica patterns are treated differently. First, they are put into canonical form (carefully avoiding any evaluation). Then, as much as possible without actually calling a function like MatchQ, the mathematical meaning is inferred from a structural pattern (for instance, the structural pattern _ArcTan encodes the two functions ArcTan[z] and ArcTan[x, y]) and used in a hash table lookup. More complicated patterns that contain pattern tests and conditions are wrapped with HoldPattern and matched literally against held versions of all identities.

If specified, filter options to return only those identities that contain basic arithmetic operations and either only elementary functions or only integer functions are applied to the result. To decide which functions are present in an identity, pregenerated hash tables are used, too.

Finally, the matching identities are sorted by complexity using leaf count, byte count, and the number of functions and variables; this is remotely similar to the default measure used for "simplicity" in Simplify. We use a linear combination of byte count and leaf count to compensate for any large atomic expressions such as large integers. While on average we have 1 byte count leaf counts, many identities substantially deviate from this average. The following graphic compares leaf counts and byte counts for all identities from the Wolfram Functions site.

The complexities are precalculated for all identities, allowing for fast sorting.

A further possibility for sorting is to penalize extra functions. Suppose you make a search for all identities that contain the functions Cos, Sin, and Tan. Then identities that contain only these three functions (and basic arithmetic functions) would be returned first.