The Mathematica Journal
Volume 9, Issue 1


XML and Mathematica
Pavi Sandhu

Working with NotebookML

Introduction

NotebookML is an XML format for describing Mathematica notebooks. It involves a mapping of a notebook expression to a similar XML tree structure. The names of elements and attributes in NotebookML are chosen to match the names of the corresponding parts of the original notebook expression. Here is an example of a simple notebook expression.

Here is the same expression in NotebookML.

Note the direct correspondence between the parts of the notebook expression and their XML counterparts. The conversion is done on the FullForm of the notebook expression. For example, while a list can be denoted by {} in a notebook, the underlying representation of the list is still List[]. Hence, in NotebookML, a list would be represented by a List element of the form <List>…</List>.

NotebookML is useful for exporting complete notebooks in XML format. However, you can also export individual cells, mathematical formulas, or other types of content in a notebook as XML, using what is called ExpressionML. This is a subset of NotebookML that enables you to save arbitrary Mathematica expressions in XML format. Here is the FullForm for a mathematical formula.

Here is the ExpressionML representation of the same formula. You can generate the NotebookML or ExpressionML for any type of notebook expression, using Export or ExportString.

NotebookML and ExpressionML are both 100% well-formed, standards-compliant XML. Hence, they make it easy to integrate Mathematica notebooks or parts of notebooks into any XML framework or workflow. The namespace for NotebookML and ExpressionML is specified by the URL www.wolfram.com/XML. Also, the DTD for NotebookML and ExpressionML is located at the URL www.wolfram.com/XML/DTD/notebookml1.dtd.

Syntax of NotebookML

Strings, Numbers, and Symbols

The String, Number, and Symbol elements are the only NotebookML elements that can directly contain character data. All other NotebookML elements are either empty elements or can only contain other elements. For example, here is a simple cell in Text style.

Here is the corresponding NotebookML.

Note that each string in the cell expression is represented by a String element in NotebookML. One benefit of using a String element to describe string data is that it provides a clearer indication to XML-processing applications that the whitespace inside the string data is significant.
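The mapping described so far can be modeled outside Mathematica with a short Python sketch. It assumes a hypothetical encoding in which an expression is a nested tuple `(head, arg1, arg2, ...)` and symbols are wrapped in a made-up `Sym` class. Only the element names `List`, `String`, `Number`, and `Symbol` come from the text; the rest is an illustration, not the actual NotebookML schema or Mathematica's own converter.

```python
import xml.etree.ElementTree as ET

class Sym(str):
    """Hypothetical marker type for a symbol name such as x."""

def to_xml(expr):
    """Map a nested (head, arg1, arg2, ...) tuple to an XML element tree."""
    if isinstance(expr, tuple):
        head, *args = expr
        node = ET.Element(head)                 # the head becomes the element name
        node.extend([to_xml(arg) for arg in args])
        return node
    # Leaves are the only nodes that carry character data.
    if isinstance(expr, Sym):
        tag = "Symbol"
    elif isinstance(expr, str):
        tag = "String"
    elif isinstance(expr, (int, float)):
        tag = "Number"
    else:
        raise TypeError(f"unsupported leaf: {expr!r}")
    leaf = ET.Element(tag)
    leaf.text = str(expr)
    return leaf

# FullForm-style List["two  spaces", 42, x] in the assumed tuple encoding.
expr = ("List", "two  spaces", 42, Sym("x"))
xml_text = ET.tostring(to_xml(expr), encoding="unicode")
print(xml_text)
```

Because the string sits inside its own `String` element, the two spaces between the words survive serialization unchanged.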

In addition to strings, the contents of a cell can contain numbers or symbols. These structures are represented in NotebookML using the elements Number and Symbol. For example, the Mathematica expression

has the following NotebookML representation.

Symbol elements lack any special structure. As an example, consider the following Mathematica code fragment.

The corresponding NotebookML is shown below. The Function element is used to enclose the names of all built-in functions.

In notebooks, two-dimensional structures such as fractions, subscripts, superscripts, and so on, are represented by box expressions. Here is the expression for a cell containing both text and a box expression.

Here is the NotebookML representation of the above cell. Note the use of the TextData, BoxData, and SuperscriptBox elements in NotebookML to represent the corresponding parts of the cell expression.

Options

In Mathematica, various properties of notebooks, cells, and the contents of a cell are described by options. A notebook expression with options specified has the following underlying structure.

Each option has the general form name -> value, where the left-hand side specifies the option name and the right-hand side specifies the value of that option. In NotebookML, an option is represented using the form shown below.

For example, this notebook expression will include a ruler in the toolbar at the top of the displayed window.

Here is the corresponding NotebookML.

Multiple options are grouped together in an Options element. Although this is not necessary, it is useful (for ease of programming and efficiency reasons) in the context of many XML applications like XSLT. For example, the cell expression:

has the following representation in NotebookML.

Importing NotebookML

NotebookML files can be imported through the kernel using Import. The first argument of Import is the filename of the file you are importing. If the file carries the .xml or .nbml extension and is in NotebookML format, then Import will automatically convert the file into a notebook expression. Here is an example.

You can also specify the import format explicitly, by using "SymbolicXML" or "NotebookML" as the second argument to Import. This is useful when the file you are importing does not have a .xml or .nbml extension to indicate the nature of its contents.

The import process is much faster, especially for large files, if you specify "NotebookML" as the import format. However, you should only use this form if you are sure that the file is a NotebookML document.

You can also import the file as SymbolicXML using the following command.

Processing NotebookML

Converting between NotebookML and SymbolicXML

Suppose you have written a technical paper in Mathematica and now want to submit it to a journal that only accepts submissions in a specific XML format, say DocBook. You therefore need to transform your notebook into DocBook format. One way to do this is to use XSLT to transform the NotebookML output from Mathematica into DocBook.

However, you can easily perform the same transformation completely within Mathematica by using SymbolicXML as an intermediary. Mathematica’s pattern matching and symbolic processing can make the task quite easy. A schematic of the process might look like the following.

Here is a simple example of the type of manipulations you can perform. This command defines a rule for replacing tab characters ('\t') with four nonbreaking spaces. This is something one might want to do before presenting work on the web.

This applies the rule to a notebook called nb and exports the result as an XML file.
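The same kind of manipulation can be sketched in Python on a parsed XML tree. This is only an analogy to the Mathematica rule described above: the `Cell` wrapper and the sample text are assumptions, and the function name `replace_tabs` is hypothetical.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # nonbreaking space

def replace_tabs(root):
    """Replace each tab inside String elements with four nonbreaking spaces."""
    for node in root.iter("String"):
        if node.text:
            node.text = node.text.replace("\t", NBSP * 4)
    return root

# A tiny stand-in document; the tab sits between "a" and "b".
doc = ET.fromstring("<Cell><String>a\tb</String><String>Text</String></Cell>")
replace_tabs(doc)
strings = [node.text for node in doc.iter("String")]
first = strings[0]
```

Restricting the replacement to `String` elements leaves element names and structure untouched, which mirrors the idea of rewriting only string data.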

Preprocess

Preprocessing allows any function to be applied to the notebook before the XML conversion functions are applied to it. For example, one might want to ensure that all closed cell groups are open before proceeding with the conversion. This can be done with the following command.

One could then preprocess the notebook by using openCellGroups in the following way.

Postprocess

Postprocessing gives you a handle on the SymbolicXML object that represents the NotebookML before it is exported. This gives you total freedom to manipulate the SymbolicXML in any fashion you choose. However, it is your responsibility to output legitimate SymbolicXML. Mathematica will try to automatically correct minor errors, but if there is a more serious error, the export process will fail.

Here, we use a postprocessing rule to replace characters.

You could then postprocess the symbolic NotebookML by using changeChars in the following way.

Of course, you can use both pre- and postprocessing together.
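The order of operations can be modeled in Python as a small pipeline: the preprocess hook sees the notebook, the postprocess hook sees the XML tree. Everything here is a toy under stated assumptions: the notebook is a list of dictionaries, the `CellGroup` element and its `open` attribute are invented for the illustration, and the names `open_cell_groups` and `change_chars` merely echo openCellGroups and changeChars from the text.

```python
import xml.etree.ElementTree as ET

def export_with_hooks(notebook, convert, preprocess=None, postprocess=None):
    """Run optional hooks around a conversion step and return XML text."""
    if preprocess is not None:
        notebook = preprocess(notebook)      # acts on the notebook model
    tree = convert(notebook)
    if postprocess is not None:
        tree = postprocess(tree)             # acts on the XML tree
    return ET.tostring(tree, encoding="unicode")

def open_cell_groups(notebook):
    """Toy preprocess hook: return a copy with every cell group marked open."""
    return [{**group, "open": True} for group in notebook]

def convert(notebook):
    """Toy converter: one CellGroup element per group, one String per cell."""
    root = ET.Element("Notebook")
    for group in notebook:
        node = ET.SubElement(root, "CellGroup", open=str(group["open"]))
        for text in group["cells"]:
            ET.SubElement(node, "String").text = text
    return root

def change_chars(tree):
    """Toy postprocess hook: replace tabs in String elements with a space."""
    for node in tree.iter("String"):
        node.text = (node.text or "").replace("\t", " ")
    return tree

nb = [{"open": False, "cells": ["a\tb"]}]
result = export_with_hooks(nb, convert, open_cell_groups, change_chars)
print(result)
```

Keeping the two hooks separate makes it clear which representation each one is allowed to touch, and either hook can be omitted.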

Exporting NotebookML

Introduction

You can export notebooks as NotebookML using the Export function. Just as when exporting a notebook as HTML, when you export a notebook as NotebookML, graphics and boxes are automatically saved as GIF images.

The first argument for Export specifies the file to export to, and the second argument specifies the data to be exported. If you specify the file extension of the exported document as ".xml", then Export will automatically generate NotebookML. For example, the following will export the current notebook as a file named "anothertest.xml" in NotebookML.

If you export a Cell with the file extension .xml, Export will automatically generate ExpressionML. The following example will export the cell as ExpressionML.

If the filename for the exported data does not carry the .xml extension, then you must specify "XML" or "NotebookML" as the third argument for Export. The data for export remains the second argument. Here is an example.

You can also export data using ExportString. This will return NotebookML for a notebook expression or ExpressionML for a cell expression without assigning the output a particular filename. The first argument for ExportString is the data to be exported, and the second argument is the desired export format. In this example, the notebook expression is exported as NotebookML.

Conversion Options for Export

You can use ConversionOptions to control the behavior of Export and ExportString. There are four conversion options available for exporting NotebookML.

  • "Annotations" includes or excludes XML declarations, DOCTYPE declarations, and style advisories in any desired combination.
  • "BoxFormats" specifies the export formats for various typeset box objects in the document.
  • "GraphicsFormats" specifies the export formats for graphics objects in the document.
  • "Stylesheets" associates a style sheet (CSS or XSLT) with the document for presentation on the web.

Annotations

The Annotations option takes a list whose elements can be any combination of the following values: "DocumentHeader", "DOCTYPEDeclaration", "XMLDeclaration", and "StyleAdvisories".

"DocumentHeader" determines if any header information should be added at the beginning of the NotebookML file. The "DOCTYPEDeclaration" and "XMLDeclaration" settings only take effect if "DocumentHeader" is specified as one of the annotations.

If "DOCTYPEDeclaration" is included in the list of annotations, then a DOCTYPE declaration is included in the header of the exported document. The default setting for "Annotations" includes "DOCTYPEDeclaration", as shown in the following example.

If you exclude "DOCTYPEDeclaration" from the list of Annotations values, the DOCTYPE declaration is omitted from the header of the exported document. Excluding the DOCTYPE declaration can sometimes shorten the time taken to read the resulting XML document into an application, because the parser does not have to reference a DTD.

If "XMLDeclaration" is included in the list of annotation values, an XML declaration is also included in the header of the exported document. The default setting for "Annotations" includes "XMLDeclaration", as shown in the following example.

If "XMLDeclaration" is excluded from Annotations, it is also omitted from the header of the exported document. Excluding the XML declaration is useful if a user wishes to create a NotebookML fragment for insertion in another XML document.
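The interplay of the three header-related annotation values can be summarized with a Python sketch. It is a model of the rules stated above, not Mathematica's exporter: the root element name `Notebook`, the `http://` scheme on the DTD address, and the exact text of the two header lines are assumptions.

```python
import xml.etree.ElementTree as ET

# DTD location taken from the article; the http:// scheme is assumed.
DTD_URL = "http://www.wolfram.com/XML/DTD/notebookml1.dtd"

DEFAULT_ANNOTATIONS = ("DocumentHeader", "XMLDeclaration", "DOCTYPEDeclaration")

def serialize(root, annotations=DEFAULT_ANNOTATIONS):
    """Serialize root, adding header lines only when the annotations ask for them."""
    parts = []
    # The two declarations take effect only if "DocumentHeader" is present.
    if "DocumentHeader" in annotations:
        if "XMLDeclaration" in annotations:
            parts.append('<?xml version="1.0"?>')
        if "DOCTYPEDeclaration" in annotations:
            parts.append(f'<!DOCTYPE {root.tag} SYSTEM "{DTD_URL}">')
    parts.append(ET.tostring(root, encoding="unicode"))
    return "\n".join(parts)

root = ET.fromstring("<Notebook><String>hi</String></Notebook>")
full = serialize(root)
fragment = serialize(root, annotations=())
no_header = serialize(root, annotations=("XMLDeclaration",))
```

Here `fragment` contains only the element itself, which is the form you would want when pasting the result into another XML document.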

If "StyleAdvisories" is included in the list of annotations, class attributes are returned for Cells and StyleBoxes that have styles associated with them. The default setting for "Annotations" includes "StyleAdvisories", as shown in the following example.

If you exclude "StyleAdvisories" from Annotations, the class attributes associated with Cell are omitted.

BoxFormats

BoxFormats exports box data as NotebookML, GIF, or MathML. This option can be set to one or more of the following values: Automatic, "GIF", or "MathML". The default setting is "BoxFormats"->Automatic.

With "BoxFormats"->Automatic, the boxes will be losslessly exported as NotebookML. This is the only box format setting that is guaranteed to be lossless.

With "BoxFormats"->"GIF", the box data will be represented as a GIF file. The exported NotebookML document will reference the exported GIF file by using an XHTML tag such as <img src="file.gif">.

With "BoxFormats"->"MathML", the box data will be exported as MathML, which will be embedded inside the NotebookML.

If BoxFormats has more than one value, then the BoxData parent element will have multiple child elements associated with it, one element per format.

For example, given some BoxData expression:

and the following values for BoxFormats

a NotebookML expression of the following form is generated.

GraphicsFormats

The "GraphicsFormats" option converts notebook graphics into another format for export. This is often useful since some external applications do not support Mathematica notebook graphics. This option can be set to one or more of the following values: Automatic, "Bitmap", "GIF", "Metafile", "PICT", "PostScript", or "QuickTime".

Automatic is the default setting for this option. It is necessary to set "GraphicsFormats"->Automatic to ensure lossless import and export of a notebook.

For the "Bitmap", "Metafile", "PICT", "PostScript", and "QuickTime" values, the original notebook graphic will be converted to the specified graphic type and then exported.

For the "GIF" setting, an external GIF file is created from the original notebook graphic. An <xhtml:img src> element is inserted in the exported file. Here is an example.

If "GraphicsFormats" has more than one value, then one child element is created for each specified format inside the GraphicsFormats parent element.

Stylesheets

You can associate your NotebookML documents with a style sheet (CSS or XSLT) by using the "Stylesheets" option. This option takes a list of rules that represent the pseudo-attributes in the xml-stylesheet processing instruction. For more information about the various pseudo-attributes, see www.w3.org/TR/xml-stylesheet.
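To make the correspondence between rules and pseudo-attributes concrete, here is a Python sketch that builds an xml-stylesheet processing instruction from name/value pairs. The function name `stylesheet_pi` and the file name `notebook.css` are hypothetical, and the sketch does not reproduce how Mathematica itself generates the instruction.

```python
from xml.sax.saxutils import quoteattr

def stylesheet_pi(rules):
    """Build one xml-stylesheet processing instruction from name/value pairs."""
    for name, value in rules.items():
        if not isinstance(value, str):
            raise TypeError(f"value for {name!r} must be a string")
    # quoteattr adds the surrounding quotes and escapes &, <, and >.
    attrs = " ".join(f"{name}={quoteattr(value)}" for name, value in rules.items())
    return f"<?xml-stylesheet {attrs}?>"

pi = stylesheet_pi({"type": "text/css", "href": "notebook.css"})
print(pi)

# Several style sheets: one processing instruction per set of rules.
pis = [stylesheet_pi(rules) for rules in (
    {"type": "text/css", "href": "screen.css", "media": "screen"},
    {"type": "text/css", "href": "print.css", "media": "print"},
)]
```

Each dictionary plays the role of one set of rules, so a list of dictionaries yields one processing instruction per style sheet.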

The following is a list of rules for "Stylesheets".

Each of these rules must have a string as its value. For example, the following option

would result in the following XML statement.

Here is an example that uses this conversion option.

"Stylesheets" can also take a list of rules, each of which may take a list of values as a sublist. Each sublist corresponds to one xml-stylesheet processing instruction.

Using a CSS style sheet, NotebookML can be displayed in many current-generation browsers. For example, Internet Explorer 5 or later and Netscape 6 or later have built-in support for CSS. The style sheet can even mimic the behavior of Mathematica’s environments (like Working, Printout, Presentation, etc.). To save a notebook as NotebookML with a CSS style sheet, use the Export function with the "Stylesheets" conversion option pointing to the relevant style sheet. Here is an example.

If you save a notebook as NotebookML and use a CSS style sheet that contains definitions for the various elements, the resulting file can be rendered in a web browser. The advantage of this approach (instead of simply converting the notebook to HTML) is that you only need to create a single document, which can be viewed either in web browsers or in Mathematica. A non-Mathematica user can view the document in any web browser. But a Mathematica user will be able to open and edit the document as a notebook, evaluate the input, manipulate the graphics, and so on.



     
Copyright © Wolfram Media, Inc. All rights reserved.