Paper presented at the 14th Eurographics Conference, Imperial College, March 26-28, 1996

The Molecular Object Toolkit: A New Generation of VRML Visualisation tools for use in Electronic Journals.

Omer Casher and Henry S. Rzepa

Department of Chemistry, Imperial College of Science Technology and Medicine, London, SW7 2AY. E-mail: o.casher@ic.ac.uk and rzepa@ic.ac.uk

We describe here the thinking behind our development of what we have termed a Molecular Objects Toolkit (MOT), a collection of VRML (Virtual Reality Modelling Language) authoring tools designed to accept as input popular molecular file formats and to produce as output VRML files. These tools are being integrated into MOzART 1.0, a molecular VRML editor, to allow for user-created VRML files. They are also being integrated into server-side cgi-bin programs to allow for dynamically created VRML files from molecular data residing either in the Web server or in an external database server. Our objective is to produce a complete MOT library for teaching and research purposes, and to integrate it into the HyperWave server to allow for structured maintenance of the electronic documents and molecular data. The implications for how electronic chemistry journals, virtual libraries and electronic conferences might use these technologies are discussed.

Virtual Reality Modelling Language and Electronic Publishing in Chemistry.

Almost all of the world's scientific and technical literature is still published in primary form on paper, as indeed exemplified by these conference proceedings. In the molecular sciences in particular, this can be a very restricting medium. To cope, the subject has evolved an arcane and often obscure typeset symbolism which can easily lead to isolationism and lack of integration with other subject areas. Although experiments in electronic publishing in chemistry go back a surprising thirty years, the advent of the World-Wide Web (WWW) system in the last five years has introduced an unparalleled opportunity to "re-invent" the scholarly journal and the means by which advances in the subject are communicated.[2] HTML was introduced around 1990 as a mark-up language associated with the WWW, and provides support for text based information transmission and display. In 1992, features were added to HTML to add support for images, and via a protocol known as MIME (Multipurpose Internet Mail Extensions) for other media types such as audio or video. Hitherto conspicuous by its absence was any support for three dimensional model media types. Whilst the molecular sciences now had a broad framework for mapping conventional printed journals to electronic form, in essence little infrastructure existed to advance the expression of the subject in this new medium.

Virtual Reality Modelling Language (VRML)[3] is a relatively recent innovation on the World-Wide Web for expressing complex three dimensional information on the Internet, and one which we believe holds much potential for molecular sciences. It is most simply expressed as a three dimensional extension to the two dimensional ASCII character set. In the latter, a single byte of information suffices to encode the quite complex shape of a letter, numeral or other character in the standard ASCII set. A local program (word processor, editor, World-Wide Web client) serves to convert this byte of information into the displayed shape. Using symbolic representations of some very specific 2D objects (the ASCII characters) is a particularly concise way of transmitting information. Encoding the actual shapes of the characters as a bit-mapped image would result in far larger files. In VRML 1.0, a set of three dimensional objects, such as spheres or cylinders, can be allocated a size, texture or colour, and position, and represented in a 3D space using a visualisation program. In the same way that a text file is a highly compact method of transmitting information where the task of screen rendering is performed locally, so VRML is a very efficient method for transmitting visually complex 3D information. A VRML file has the potential for being far more compact than a bit mapped 2D image or even a bit-mapped 3D animation file in MPEG format.
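The compactness argument can be made concrete with a small sketch. The Python below is not part of the toolkit described in this paper; it simply writes three positioned, coloured spheres as VRML 1.0 text and compares the size of that text with a single uncompressed 640x480 24-bit frame. The coordinates, radii, colours and function names are illustrative assumptions.

```python
# Sketch: a water-like model as VRML 1.0 text versus an uncompressed bitmap.
# Coordinates, radii and colours below are illustrative values only.

ATOMS = [
    # (label, x, y, z, radius, (r, g, b))
    ("O", 0.000, 0.000, 0.000, 0.60, (1.0, 0.0, 0.0)),
    ("H", 0.757, 0.586, 0.000, 0.35, (1.0, 1.0, 1.0)),
    ("H", -0.757, 0.586, 0.000, 0.35, (1.0, 1.0, 1.0)),
]


def sphere_node(x, y, z, radius, colour):
    """Return one VRML 1.0 Separator holding a coloured, positioned sphere."""
    r, g, b = colour
    return (
        "Separator {\n"
        f"  Material {{ diffuseColor {r} {g} {b} }}\n"
        f"  Translation {{ translation {x} {y} {z} }}\n"
        f"  Sphere {{ radius {radius} }}\n"
        "}\n"
    )


def vrml_scene(atoms):
    """Assemble a complete VRML 1.0 file from a list of atom tuples."""
    body = "".join(sphere_node(x, y, z, rad, col) for _, x, y, z, rad, col in atoms)
    return "#VRML V1.0 ascii\n" + body


scene = vrml_scene(ATOMS)
vrml_bytes = len(scene.encode("ascii"))
bitmap_bytes = 640 * 480 * 3  # one uncompressed 24-bit frame

print(scene)
print(f"VRML text: {vrml_bytes} bytes; raw 640x480 bitmap: {bitmap_bytes} bytes")
```

The comparison is deliberately crude: the bitmap size is fixed by the image resolution, whereas the VRML text grows with the number of objects described.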

The adoption of VRML has been particularly prominent in those subject areas which are dominated by three dimensional concepts, such as the molecular sciences. However, whereas there exist a wide variety of tools for manipulating the fundamental ASCII character set (e.g. ranging from simple text editors to sophisticated page setting programs), far fewer tools exist for creating molecular VRML models. In part at least, this is because such tools require a reasonable knowledge of the Open Inventor[4] file format, of which the VRML 1.0 specification is a self-consistent subset. The Open Inventor Toolkit itself is a C++ class library for 3D modelling. Having been ported to the Windows operating system and to all the major Unix platforms, Open Inventor makes a logical starting point for our VRML authoring development.

The EyeChem Module Suite

Our initial implementation of a VRML toolkit was based around our EyeChem suite of modules[5] that run within the IRIS Explorer visualisation system. Explorer's 3D rendering is based on Open Inventor and several EyeChem modules implement it. Extending EyeChem to produce VRML encoded representations therefore required little modification.

EyeChem was initially developed to visualise molecular models. These range from ball and stick and molecular surface representations to quantum mechanical calculations of molecular systems. Modules can be added whenever needed and interfaces for appropriate visualisations can be rapidly assembled. Recent additions to the EyeChem suite include programs to automatically generate VRML files to represent the 3D scatter plot data from a structural database query[6].

Any toolkit also has to include a facility for generating hyperlinks within the 3D objects described, in an analogous manner to how HTML (hypertext mark-up language) introduces the concept of hyperlinks within a collection of ASCII characters. These different representations of molecular properties were integrated together with the aid of additional EyeChem modules to generate VRML files containing any necessary hyperlinks between the various rendered properties.
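In VRML 1.0 a hyperlink is attached to geometry by enclosing it in a WWWAnchor node, whose name field carries the target URL. The following Python is a minimal sketch of that wrapping step, not the EyeChem module itself; the function name, the example URL and the description text are hypothetical.

```python
# Sketch: attach a hyperlink to a piece of VRML 1.0 geometry.
# The URL and description used below are placeholders, not real resources.


def anchored(node_text, url, description):
    """Wrap VRML 1.0 node text in a WWWAnchor so that selecting it follows url."""
    indented = "".join("  " + line + "\n" for line in node_text.splitlines())
    return (
        "WWWAnchor {\n"
        f'  name "{url}"\n'
        f'  description "{description}"\n'
        f"{indented}"
        "}\n"
    )


sphere = "Separator {\n  Material { diffuseColor 1 1 0 }\n  Sphere { radius 0.9 }\n}"
linked = anchored(sphere, "http://www.example.org/sulfur.html", "Sulfur atom")

print("#VRML V1.0 ascii\n" + linked)
```

Any group of nodes can be wrapped in the same way, so a single atom, a functional group or an entire surface can each carry its own link.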

Taken together, these various tools have allowed us to construct a three dimensional equivalent to the two dimensional hyperglossary we have previously described[7], in which hyperlinks serve to establish connections between various molecular data expressed as three dimensional rendered objects on a computer screen. We used this concept to complement a television program by illustrating in a popular manner how the properties of the molecule dimethyl sulfate relate to the half-life of this species in the bloodstream.[8] In this, we believe we have now achieved a genuine advance in molecular visualisation over the more conventional medium of print, and one which integrates well with other communication media such as television.

The Limitations of VRML 1.0 for Molecules

Our work has exposed a number of limitations in the VRML 1.0 specification. For example, the file format is often ill-suited for our needs. In a 3D ball and stick molecular model, each sphere and cylinder needs to be explicitly defined and cannot be grouped into sets. Where one has molecules with tens of thousands of atoms and bonds, this becomes very unwieldy. Moreover, no VRML node exists that is appropriate for a protein ribbon representation, which molecular biologists use to represent higher order structures in very large molecules. We have circumvented this problem by defining our own nodes in a VRML 1.0 extension. The ribbon in the DNA example that is viewed in a VRML client such as Webspace is represented by an Open Inventor NURBS node. The main drawback of this approach is that most of the existing generation of VRML clients do not understand NURBS and therefore cannot view the ribbons.
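To illustrate why explicit definition becomes unwieldy, consider what a single bond requires. A VRML 1.0 Cylinder is centred on the origin with its axis along y, so each bond needs its own translation to the bond midpoint, its own rotation taking the y axis onto the bond direction, and its own height. The Python below is a sketch of that calculation under these assumptions; the function names are ours for illustration and the end points are arbitrary.

```python
import math


def bond_transform(a, b):
    """Return (midpoint, axis, angle, length) placing a y-aligned cylinder on bond a-b."""
    d = [b[i] - a[i] for i in range(3)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        raise ValueError("bond end points coincide")
    ux, uy, uz = (c / length for c in d)
    midpoint = tuple((a[i] + b[i]) / 2.0 for i in range(3))
    # Rotation axis is y x u = (uz, 0, -ux); the angle is that between y and u.
    axis_len = math.hypot(uz, ux)
    if axis_len < 1e-12:
        # Bond is parallel or antiparallel to y: any perpendicular axis will do.
        axis = (1.0, 0.0, 0.0)
        angle = 0.0 if uy > 0 else math.pi
    else:
        axis = (uz / axis_len, 0.0, -ux / axis_len)
        angle = math.acos(max(-1.0, min(1.0, uy)))
    return midpoint, axis, angle, length


def bond_node(a, b, radius=0.1):
    """Return VRML 1.0 text for one bond drawn as an explicitly placed cylinder."""
    (mx, my, mz), (ax, ay, az), angle, length = bond_transform(a, b)
    return (
        "Separator {\n"
        f"  Translation {{ translation {mx:.4f} {my:.4f} {mz:.4f} }}\n"
        f"  Rotation {{ rotation {ax:.4f} {ay:.4f} {az:.4f} {angle:.4f} }}\n"
        f"  Cylinder {{ radius {radius} height {length:.4f} }}\n"
        "}\n"
    )


print(bond_node((0.0, 0.0, 0.0), (0.757, 0.586, 0.0)))
```

One such block of text is needed per bond, and a comparable one per atom, so the file length grows in direct proportion to the size of the molecule.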

Another limitation is that although compact for small molecules, VRML file sizes can be prohibitively large for larger molecules, especially when additional molecular geometries such as surfaces are required. Partially in response to the very large VRML files that we showed could be generated by molecular models, gzip compression was introduced in order to reduce file sizes, with the decompression having to be performed at the client end. Clearly however, this does not represent a scaleable solution to the problem of representing complex molecular data, and new solutions need to be found.
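The effect of gzip compression on such files can be tried with Python's standard gzip module. The sketch below builds a synthetic VRML 1.0 file of 10,000 identical spheres on a regular line, which is an assumption made purely for illustration; real molecular coordinates are less regular and would be expected to compress less well.

```python
import gzip


def many_spheres(n):
    """Return VRML 1.0 text containing n spheres spaced evenly along the x axis."""
    lines = ["#VRML V1.0 ascii"]
    for i in range(n):
        lines.append(
            "Separator { "
            f"Translation {{ translation {i * 1.5:.3f} 0 0 }} "
            "Sphere { radius 0.5 } }"
        )
    return "\n".join(lines) + "\n"


raw = many_spheres(10000).encode("ascii")
packed = gzip.compress(raw)

# The client must reverse the step before it can parse the scene.
restored = gzip.decompress(packed)

print(f"uncompressed: {len(raw)} bytes, gzipped: {len(packed)} bytes")
print("round trip intact:", restored == raw)
```

Compression shortens the transfer but not the scene itself: after decompression the client still has to parse and render every node.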

The Molecular Object Toolkit (MOT) for VRML Authoring.

Molecular Inventor, under development at SGI by Mark Benzel, is an Open Inventor node class specific for molecules. It includes nodes that hold atom and bond data, and nodes to display molecules and atomic surfaces. The MOT we are developing is a suite of Molecular Inventor-based programs that transcribe molecular data of interest into VRML. In a broad sense, Molecular Objects (MOs) are EyeChem-like modules that run as stand-alone programs without any Explorer graphical interface. MOT file readers load molecular data, whilst other MOTs can generate geometric representations such as surfaces or ribbons. The MOT VRML writer will create optimised VRML files of the geometric representations. We are currently implementing MOTs as stand-alone programs that will run in Web servers using the CGI interface. The advantage here is that molecular data can reside in the Web server in whatever format it was created. Only when it is accessed will a VRML file of it be created dynamically. As VRML is still evolving, this will remove the need to manually prepare a new VRML file if and when its format is modified. Moreover, scientists need not know anything about VRML to publish their molecular data on the Web in this form.
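The idea of on-the-fly conversion can be sketched in a few lines. The Python below is not the MOT implementation, which is built on Molecular Inventor; it is a stand-in that reads the simple XYZ coordinate format (atom count, comment line, then one "element x y z" line per atom) and returns a CGI-style response with the x-world/x-vrml content type used for VRML 1.0. The display radii, colours, sample data and function names are all assumptions.

```python
import sys

# Illustrative display radii and colours; not taken from any standard table.
STYLE = {
    "H": (0.30, (1.0, 1.0, 1.0)),
    "C": (0.50, (0.3, 0.3, 0.3)),
    "O": (0.55, (1.0, 0.0, 0.0)),
}
DEFAULT_STYLE = (0.50, (0.5, 0.5, 0.5))

SAMPLE_XYZ = """3
water (illustrative coordinates)
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
"""


def parse_xyz(text):
    """Parse XYZ-format text into a list of (element, x, y, z) tuples."""
    lines = text.strip().splitlines()
    count = int(lines[0].split()[0])
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError("atom count does not match the header line")
    return atoms


def atoms_to_vrml(atoms):
    """Write each atom as a coloured sphere in a VRML 1.0 scene."""
    parts = ["#VRML V1.0 ascii\n"]
    for element, x, y, z in atoms:
        radius, (r, g, b) = STYLE.get(element, DEFAULT_STYLE)
        parts.append(
            "Separator {\n"
            f"  Material {{ diffuseColor {r} {g} {b} }}\n"
            f"  Translation {{ translation {x} {y} {z} }}\n"
            f"  Sphere {{ radius {radius} }}\n"
            "}\n"
        )
    return "".join(parts)


def cgi_response(xyz_text):
    """Return the header and body a CGI program would write to standard output."""
    return "Content-Type: x-world/x-vrml\n\n" + atoms_to_vrml(parse_xyz(xyz_text))


if __name__ == "__main__":
    sys.stdout.write(cgi_response(SAMPLE_XYZ))
```

Because the stored file is never altered, a change in the VRML format would only require the writer function to be updated, which is the maintenance advantage described above.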

MOzART 1.0: An "Open" Molecular VRML Editor

We are developing MOzART as an extensible stand-alone VRML authoring environment based on Molecular Inventor. It implements the MOTs to input the various molecular file formats and display a 3D model of the molecule and its associated properties. Various components of the 3D model can be selected and hyperlinks for these can be entered. Using the MOT VRML Writer, the 3D model can then be saved as VRML files.

Although MOzART will have most of the capabilities of EyeChem, without the Explorer overhead its performance will be far superior. One of the prime advantages of EyeChem, however, is its extensibility through the addition of modules. As MOzART will be an open system, extending the environment through the addition of programs will also be possible.

The Moving Worlds VRML 2.0 Proposal

The direction that VRML is heading in its second incarnation (V2.0) is hotly contested as several major vendors, including SGI, Apple, Sun and Microsoft, have submitted VRML 2.0 proposals. The proposed evolution that is fully compatible with the developments we have outlined above is the Moving Worlds Specification[9] by SGI in collaboration with Sony and WorldMaker. Although it contains the existing VRML 1.0 nodes, Moving Worlds does have several powerful features making it appropriate for our needs. Object behaviour is described by script nodes. Script nodes can contain Java[10] applets or any script type that the browser can interpret. Extensions to the specification are accomplished by node prototyping. An alternate representation node has a pointer to a more complex representation if the browser cannot handle the extension nodes. Examples using Moving Worlds specification are under preparation.[11]
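Node prototyping addresses the repetition inherent in molecular scenes: a node type is declared once and then instanced with only the values that differ. The Python sketch below emits such a file. The PROTO text follows the syntax of the finalised VRML 2.0 specification as we understand it, which may differ in detail from the February 1996 Moving Worlds draft; the Atom prototype, its field names and the coordinates are hypothetical.

```python
# Sketch: declare a hypothetical "Atom" prototype once, then instance it per atom.
# Syntax follows finalised VRML 2.0; the draft proposal may differ in detail.

ATOM_PROTO = """PROTO Atom [
  field SFVec3f position 0 0 0
  field SFFloat radius 0.5
  field SFColor colour 1 1 1
] {
  Transform {
    translation IS position
    children [
      Shape {
        appearance Appearance { material Material { diffuseColor IS colour } }
        geometry Sphere { radius IS radius }
      }
    ]
  }
}
"""

ATOMS = [
    # (x, y, z, radius, (r, g, b)); illustrative values only
    (0.000, 0.000, 0.000, 0.60, (1.0, 0.0, 0.0)),
    (0.757, 0.586, 0.000, 0.35, (1.0, 1.0, 1.0)),
    (-0.757, 0.586, 0.000, 0.35, (1.0, 1.0, 1.0)),
]


def atom_instance(x, y, z, radius, colour):
    """Return one instance of the Atom prototype on a single line."""
    r, g, b = colour
    return f"Atom {{ position {x} {y} {z} radius {radius} colour {r} {g} {b} }}\n"


def prototyped_scene(atoms):
    """Return a VRML 2.0 file: header, one prototype, then one line per atom."""
    instances = "".join(atom_instance(*atom) for atom in atoms)
    return "#VRML V2.0 utf8\n" + ATOM_PROTO + instances


scene = prototyped_scene(ATOMS)
print(scene)
```

Each additional atom then costs one short line rather than a full block of material, transform and geometry nodes.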

The HyperWave (Hyper-G) Server

We have described above a scenario for developing electronic libraries and publishing using a three dimensional metaphor particularly appropriate for molecular sciences. Creating a complex and extensively cross-hyperlinked document collection has other implications that must be considered. For example, as the document collection increases in size, the indexing of collections and maintenance of hyperlinks within text documents and VRML files can become a major problem. First generation World-Wide Web servers had few intrinsic tools to automate these processes. As part of the molecular VRML project, we have been running a HyperWave (previously known as Hyper-G)[12] server since November 1995 in a pilot project to create an on-line electronic journal. HyperWave has numerous advantages over conventional WWW servers that make it particularly well-suited for electronic publishing. Of particular significance is its ability to communicate with all existing Hyper-G servers world-wide. If documents are moved or deleted, the changes are propagated throughout the server and to all other servers.
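The benefit of maintaining links centrally rather than inside each document can be shown with a toy model. The Python below is only a conceptual sketch and makes no claim about how HyperWave is implemented; the class, its methods and the document names are hypothetical. Links are held in a separate store, indexed in both directions, so that moving or deleting a document updates every link that refers to it.

```python
from collections import defaultdict


class LinkStore:
    """Toy link database holding links outside the documents themselves."""

    def __init__(self):
        self._out = defaultdict(set)  # source document -> its link targets
        self._in = defaultdict(set)   # target document -> documents linking to it

    def add_link(self, source, target):
        self._out[source].add(target)
        self._in[target].add(source)

    def targets(self, source):
        return sorted(self._out.get(source, ()))

    def move(self, old, new):
        """Rename a document; every link into or out of it follows."""
        if old == new:
            return
        for src in self._in.pop(old, set()):
            self._out[src].discard(old)
            self._out[src].add(new)
            self._in[new].add(src)
        for tgt in self._out.pop(old, set()):
            self._in[tgt].discard(old)
            self._in[tgt].add(new)
            self._out[new].add(tgt)

    def delete(self, doc):
        """Remove a document and return the sources whose links to it were dropped."""
        sources = self._in.pop(doc, set())
        for src in sources:
            self._out[src].discard(doc)
        for tgt in self._out.pop(doc, set()):
            self._in[tgt].discard(doc)
        return sorted(sources)


store = LinkStore()
store.add_link("paper.html", "dms.wrl")
store.add_link("glossary.html", "dms.wrl")
store.move("dms.wrl", "models/dms.wrl")
print(store.targets("paper.html"))
```

With links embedded in the documents, the same move would require every referring HTML and VRML file to be found and edited by hand.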

We are implementing the Molecular Object VRML Authoring Toolkit as HyperWave cgi-bin programs. By running the MOs in HyperWave, we will be able to take advantage of the ability to add gateways in HyperWave to remote databases using the stateful protocols that HyperWave can support. This in turn would, in principle at least, allow access to the vast storehouse of information available in existing molecular databases. We are investigating mechanisms whereby HyperWave can retrieve a file from a molecular database, which in turn can be converted on-the-fly to VRML for local viewing whenever required.

Conclusions

Much of the technology described in this paper relates to how the visually complex subject of chemistry can be integrated into new mechanisms for exchanging information such as electronic journals, virtual libraries or conferencing mechanisms. We have discussed here the development of a new generation of visualisation tools appropriate for electronic publishing in chemistry. An enormous amount of work still needs to be done, but a vision of how we might be operating in the future is already emerging.

Acknowledgements: We thank in particular Peter Murray-Rust, Christopher Page and Christopher Leach for many helpful discussions and contributions to the work described here.

References

[1] This paper is on-line as http://www.ch.ic.ac.uk/rzepa/eg/

[2] H. S. Rzepa, B. J. Whitaker and M. J. Winter, Chemical Communications, 1994, 1907; O. Casher, G. Chandramohan, M. Hargreaves, C. Leach, P. Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Transactions 2, 1995, 7; H. S. Rzepa, "The Future of Electronic Journals in Chemistry", Trends in Analytical Chemistry, 1995, 14, 464; B. J. Whitaker and H. S. Rzepa, "Chemical Publishing on the Internet", Conference on Chemical Information, Nimes, France, October, 1995; D. James, B. J. Whitaker, C. Hildyard, H. S. Rzepa, O. Casher, J. M. Goodman, D. Riddick, P. Murray-Rust, "The Case for Content Integrity in Electronic Chemistry Journals: The CLIC Project", New Review of Information Networking, 1996, in press.

[3] G. Bell, A. Parisi, M. Pesce, "The Virtual Reality Modeling Language", November 1994. See http://www.eit.com/vrml/vrmlspec.html.

[4] J. Wernecke, "The Inventor Mentor: Programming Object-Oriented 3D Graphics with Open Inventor(TM), Release 2", Reading, Massachusetts: Addison-Wesley Publishing Company, 1994.

[5] O. Casher, H. S. Rzepa and S. Green, "EyeChem 1.0: A Modular Chemistry Toolkit for Collaborative Molecular Visualisation", J. Mol. Graphics, 1994, 12, 226. See http://www.ch.ic.ac.uk/jmg/CRG.html; O. Casher and H. S. Rzepa, "Chemical Collaboratories using World-Wide Web Servers and EyeChem Based Viewers", J. Mol. Graphics, 1995, 13, 268.

[6] O. Casher, C. S. Page and H. S. Rzepa, paper presented at the 2nd Electronic Computational Chemistry Conference (ECCC2), November 1995; see http://www.ch.ic.ac.uk/eccc2/ This paper is also due to be published in Theochem, 1996.

[7] C. Leach, P. Murray-Rust and H. S. Rzepa, "Electronic Conference on Trends in Organic Chemistry", (Eds H. S. Rzepa, J. M. Goodman and C. Leach), June, 1995.

[8] O. Casher, H. S. Rzepa and D. A. Widdowson, supplemental material to the television program Equinox, transmitted on Channel 4 (UK) on November 29, 1995; see http://www.ch.ic.ac.uk/equinox/

[9] C. Marrin et al., "Moving Worlds Specification", February 1996. See http://webspace.sgi.com/moving-worlds/spec/spec.main.html

[10] "Java(TM): Programming for the Internet", Sun Microsystems, Inc., 1995. See http://java.sun.com/ For chemical examples of Java, see http://www.ch.ic.ac.uk/java/

[11] O. Casher, H. S. Rzepa, "Molecular Moving Worlds", 1996. See http://www.ch.ic.ac.uk/VRML/mmw.html

[12] See K. Schmaranz, "Hyper-G and Electronic Publishing", in "Hyper-G. The Next Generation Web Solution", H. Maurer (Ed), Addison-Wesley, 1996.