Volume 9, Issue 2
Tricks of the Trade
Symbolic Analysis of DNA Sequences
Jose Manuel Gutiérrez
DNA chains can be viewed as symbolic sequences over four possible nucleotides: adenine (A), cytosine (C), thymine (T), and guanine (G). These sequences encode the amino acid chains that form proteins. DNA sequences are available at the
Given a symbolic sequence
The following commands implement an efficient version of the chaos-game algorithm for an iterated function system (IFS) consisting of four transformations. We map each symbol,
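As an illustration of the chaos-game idea for an IFS with four transformations, here is a minimal Python sketch. It is not the article's Mathematica code. The assignment of nucleotides to the corners of the unit square, the contraction factor of 1/2, the starting point, and the names `CORNERS` and `chaos_game` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner assignment; the article's own mapping is not shown in the text.
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def chaos_game(sequence: str, start: Point = (0.5, 0.5)) -> List[Point]:
    """Return one point per recognized symbol.

    Each symbol selects one of four contractions: move halfway from the
    current point toward that symbol's corner (assumed ratio 1/2).
    """
    x, y = start
    points: List[Point] = []
    for symbol in sequence.upper():
        corner = CORNERS.get(symbol)
        if corner is None:
            continue  # skip anything that is not A, C, G, or T
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in chaos_game("ACGT"):
        print(point)
```

Plotting the returned points for a long sequence gives the kind of pattern discussed in this column. Under these assumptions, each point's position depends mostly on the last few symbols read, so frequent short subsequences show up as dense regions.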
Symbolic sequences can be extremely long, so it is important to load and process such data efficiently. Moreover, if the data is available over the internet, there is no need to store it locally. Rolf Mertig supplied code for opening and reading from a URL stream using J/Link.
The following code reads in DNA sequences as a string of characters, separately printing out descriptive information.
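The idea of separating descriptive information from the sequence itself can be sketched in Python as follows. This is not the J/Link code mentioned above. It assumes the data arrives in FASTA-style text (a description line starting with `>` followed by lines of sequence), which the article does not confirm. The function name `read_fasta` and the sample record are hypothetical.

```python
import io
from typing import Iterable, Tuple


def read_fasta(lines: Iterable[str]) -> Tuple[str, str]:
    """Split the first FASTA-style record into (description, sequence).

    Assumes a '>' description line followed by sequence lines.
    Reading stops if a second '>' line is found.
    """
    description = ""
    chunks = []
    seen_header = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seen_header:
                break  # a second record begins; stop here
            seen_header = True
            description = line[1:].strip()
        else:
            chunks.append(line.upper())
    return description, "".join(chunks)


if __name__ == "__main__":
    # Made-up record used only to exercise the function.
    sample = io.StringIO(">example_record hypothetical description\nacgt\nTTGA\n")
    info, dna = read_fasta(sample)
    print(info)
    print(len(dna), "nucleotides")
```

Because `read_fasta` accepts any iterable of text lines, the same function should work on an open file or on a network response wrapped in `io.TextIOWrapper`, so the data need not be stored locally.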
There are two types of sequence identification numbers: GI numbers (a series of digits assigned consecutively by NCBI to each sequence it processes) and version numbers (an accession number followed by a dot and a version number). Either format can be used.
Now, we use the previous command to import the complete mitochondrial genomes of mouse and man.
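The two identifier formats described above can be told apart with a simple pattern check. The Python sketch below is only an illustration. The regular expression for accession numbers (a letter followed by letters, digits, or underscores) is a simplifying assumption, the function name `classify_sequence_id` is hypothetical, and the identifiers in the example are made up.

```python
import re

# A GI number is a series of digits.
GI_PATTERN = re.compile(r"\d+")
# Assumed shape of accession.version: a simplified accession, a dot, a version number.
VERSION_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*\.\d+")


def classify_sequence_id(identifier: str) -> str:
    """Return 'gi', 'version', or 'unknown' for a sequence identifier."""
    text = identifier.strip()
    if GI_PATTERN.fullmatch(text):
        return "gi"
    if VERSION_PATTERN.fullmatch(text):
        return "version"
    return "unknown"


if __name__ == "__main__":
    # Made-up identifiers, not real database entries.
    for candidate in ["1234567", "AB123456.1", "AB123456"]:
        print(candidate, classify_sequence_id(candidate))
```

A check like this could run before any request is sent, so that a malformed identifier is caught locally.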
Both genomes produce similar patterns, as expected for two species that are close in the phylogenetic tree (mitochondrial eukaryotes, Vertebrata). Perhaps it is, after all, hard to answer the question, "Are you a man or a mouse?"
Copyright © 2004 Wolfram Media, Inc. All rights reserved.