GenomePixelizer SVG-fied


Abstract


This paper presents the benefits of using XML, XSLT, and SVG technologies, combined with JavaScript scripting, in the redevelopment of a genome visualization tool. The goal is to give users a simpler interface, greater interactivity, and more efficient genetic data analysis.

GenomePixelizer is a genome visualization tool that I co-developed in 2002. It was written using the Tcl/Tk toolkit and was designed to help visualize the relationships between duplicated genes in genome(s) and to study the relationships between members of gene clusters [1].

GenomePixelizer proved itself useful [2, 3, 4] in detecting duplication events in genomes, tracking the "footprints" of evolution, and displaying genetic maps and other aspects of comparative genetics [1].

GenomePixelizer is not an intuitive tool to use. It provides a lot of functionality; however, it has a complicated user interface and requires special data pre-processing: it takes three input files (a file containing setup information, a file containing pre-processed genome information, and a distance matrix file). GenomePixelizer also requires the download and setup of the Tcl/Tk package. Large datasets need to be subdivided into smaller datasets and re-run through GenomePixelizer in order to see more detail.

The featured tool, GenomePixelizer SVG-fied, is lightweight, dynamic and interactive. It takes one XML file containing the setup information, the genome information and the distance matrix, and uses XSLT style sheets and SVG to plot genes over chromosomes and to identify duplicated genes. Furthermore, users can click on a specific gene and land on the NCBI entry for that gene. Since the graphics are scalable, users can also zoom in on a region of interest. In the near future, users will be able to rearrange chromosomes by dragging them and to move clusters of genes around for further analysis.


Table of Contents

Biological Background
Homology
Example: Pairwise Sequence Alignment
Overview of Homology Finding Algorithms
Introduction
GenomePixelizer – the prototype
Features of GenomePixelizer
GenomePixelizer SVG-fied vs. GenomePixelizer
Implementation
Data Sources
Visualization
Future Work
Bibliography

A quick Google search for genome visualization tools using the keywords "Genome Visualization" produced the following hits: the Argo Genome Browser, the Broad Institute's tool for "visualizing and manually annotating whole genomes" [5]; Circos, a tool that visualizes "intra- and inter-chromosomal relationships within one or more genomes, or between any two or more sets of objects with a corresponding distance scale" [6]; Alfresco, a visualization tool that "allows effective comparative genome sequence analysis"; and GenomePixelizer, a tool that visualizes the relationships between duplicated genes in genome(s) [1].

The first three tools, Argo, Circos and Alfresco, come as standalone programs or Web Start applications and offer a wide range of functionality for effective genome analysis. GenomePixelizer comes only as a standalone program. It provides a lot of functionality and allows for effective interactivity; however, it has quite a complicated interface. It also requires special data pre-processing: it takes three input files (a file containing setup information, a file containing pre-processed genome information, and a distance matrix file) and requires the download and setup of the Tcl/Tk package. Large datasets need to be subdivided into smaller datasets and re-run through GenomePixelizer in order to see more detail.

While all four of the above-mentioned tools have their benefits and strengths, they all have static interfaces: all interactivity is implemented by means of links and pop-ups. The user cannot drag clusters of genes away from chromosomes or quickly rearrange them by clicking and dragging.

The proposed tool, GenomePixelizer SVG-fied, is lightweight, dynamic and interactive. It takes one XML file containing the setup information, the genome information and the distance matrix, and uses XSLT style sheets and SVG to plot genes over chromosomes and to identify duplicated genes. Furthermore, users can click on a specific gene and land on the NCBI entry for that gene. Since the graphics are scalable, users can also zoom in on a region of interest. In the future, the tool will support dragging: users will be able to drag clusters of genes out of a chromosome in order to take a close-up look.
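The single-file pipeline described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's standard library in place of an XSLT style sheet, and every element name, attribute name, and the `threshold` setting in the sample XML are hypothetical rather than the tool's actual schema. The NCBI search URL used as the link target is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: every element and attribute name here is an assumption
# made for illustration, not the tool's actual schema.
SAMPLE_XML = """
<genome>
  <setup width="800" threshold="0.7"/>
  <chromosome id="chr1" length="30.0">
    <gene id="geneA" pos="7.5"/>
  </chromosome>
  <chromosome id="chr2" length="20.0">
    <gene id="geneB" pos="15.0"/>
  </chromosome>
  <distances>
    <pair a="geneA" b="geneB" identity="0.85"/>
  </distances>
</genome>
"""

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Assumed link target: an NCBI Gene search on the gene identifier.
NCBI_SEARCH = "https://www.ncbi.nlm.nih.gov/gene/?term="


def svg_tag(name):
    """Return an element name qualified with the SVG namespace."""
    return f"{{{SVG_NS}}}{name}"


def genome_xml_to_svg(xml_text, margin=50, row_height=80):
    """Turn the hypothetical genome XML into an SVG document string."""
    root = ET.fromstring(xml_text)
    setup = root.find("setup")
    width = int(setup.get("width"))
    threshold = float(setup.get("threshold"))
    chromosomes = root.findall("chromosome")

    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    svg = ET.Element(svg_tag("svg"), {
        "width": str(width + 2 * margin),
        "height": str(2 * margin + row_height * max(len(chromosomes) - 1, 0)),
    })

    gene_xy = {}
    for row, chrom in enumerate(chromosomes):
        y = margin + row * row_height
        length = float(chrom.get("length"))
        # Each chromosome is drawn as a horizontal bar.
        ET.SubElement(svg, svg_tag("line"), {
            "x1": str(margin), "y1": str(y),
            "x2": str(margin + width), "y2": str(y),
            "stroke": "black", "stroke-width": "4",
        })
        for gene in chrom.findall("gene"):
            x = margin + round(float(gene.get("pos")) / length * width)
            gene_xy[gene.get("id")] = (x, y)
            # Wrapping the marker in <a> makes the gene clickable.
            link = ET.SubElement(svg, svg_tag("a"), {
                f"{{{XLINK_NS}}}href": NCBI_SEARCH + gene.get("id"),
            })
            ET.SubElement(link, svg_tag("rect"), {
                "x": str(x - 3), "y": str(y - 8),
                "width": "6", "height": "16", "fill": "red",
            })

    # Connect gene pairs whose similarity reaches the threshold.
    for pair in root.findall("distances/pair"):
        if float(pair.get("identity")) >= threshold:
            (x1, y1), (x2, y2) = gene_xy[pair.get("a")], gene_xy[pair.get("b")]
            ET.SubElement(svg, svg_tag("line"), {
                "x1": str(x1), "y1": str(y1), "x2": str(x2), "y2": str(y2),
                "stroke": "blue",
            })
    return ET.tostring(svg, encoding="unicode")


print(genome_xml_to_svg(SAMPLE_XML))
```

Because the output is plain SVG, zooming comes from the viewer itself, and each gene marker is a hyperlink without any image-map bookkeeping.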

The prototype tool, GenomePixelizer, was designed to help in “visualizing the relationships between duplicated genes in genome(s) and to follow relationships between members of gene clusters” [10].

GenomePixelizer is a visualization tool that “generates custom images of genomes out of the given set of genes. Each element on the picture has a physical address defined by coordinates (pixels), hence the name ‘GenomePixelizer’” [10].
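The idea of a pixel address can be illustrated with a short sketch. The function below is hypothetical and is not taken from the GenomePixelizer source: it assumes a chromosome drawn as a horizontal bar, a linear scale, and gene positions and chromosome lengths given in megabases.

```python
def gene_to_pixel(position_mb, chrom_length_mb, chrom_x, chrom_width_px):
    """Map a gene's position along a chromosome to a horizontal pixel coordinate.

    All parameter names and the linear layout are illustrative assumptions.
    """
    if chrom_length_mb <= 0:
        raise ValueError("chromosome length must be positive")
    if not 0 <= position_mb <= chrom_length_mb:
        raise ValueError("gene position lies outside the chromosome")
    fraction = position_mb / chrom_length_mb
    return chrom_x + round(fraction * chrom_width_px)


# A gene at 7.5 Mb on a 30 Mb chromosome drawn 800 px wide, starting at x = 50.
print(gene_to_pixel(7.5, 30.0, 50, 800))  # prints 250
```

The same mapping would apply to genetic maps if both arguments were given in centiMorgans instead of megabases.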

GenomePixelizer was specifically developed for the analysis of the “evolution of NBS-LRR encoding genes in Arabidopsis relative to other genome duplication events” [10].

The following list gives the features and highlights of GenomePixelizer as described on the GenomePixelizer website (http://atgc.org/GenomePixelizer/):

  1. Written in Tcl/Tk and works on any computer platform (Unix/Linux, Windows, Mac) that supports the Tcl/Tk toolkit.

  2. GenomePixelizer does not need to be compiled; it works like Perl or Python scripts, using the Tcl/Tk language interpreter which can be downloaded for free at www.scriptics.com or tcl.activestate.com.

  3. GenomePixelizer allows the display of desired features through the whole genome simultaneously. Generated images should fit into the user's computer monitor without scrolling. For larger genomes, it is possible to generate bigger images with a built-in scroll bar.

  4. A simple and flexible input file can be set up, edited and modified using any spreadsheet editor (e.g. MS Excel or StarOffice). The researcher can easily manipulate the set of genes of interest, add new sets, change or remove old ones and re-run the program on the fly.

  5. Zoom in functionality, cluster viewing, minimal modification in the input file and some simple re-calculations allow the viewing of regions of high gene density in greater detail.

  6. Regions with high gene density can be drawn using automatic or manual correction. Manual correction may produce nicer images; however, with a large set of genes it takes time.

  7. GenomePixelizer allows the viewing of relationships between different sets of genes based on a distance matrix file.

  8. The source of sequences is not restricted to a single organism and it is possible to view relationships between different genomes.

  9. GenomePixelizer can be used to generate images of genetic maps with a given set of genetic markers. Instead of megabases, the size of chromosomes should be indicated in centiMorgans.

  10. Generated images can be captured by any screenshot program and incorporated into Web pages. You can also save the generated image as a PostScript file.

  11. GenomePixelizer can generate HTML ImageMap tags. This feature can be used to create "clickable" images for Web pages or online presentations.

  12. The source code is freely available and minimal code modifications can add new features to the program. [10]
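
The image-map feature (item 11) follows directly from the pixel addressing: once each gene has a bounding box in pixels, a clickable region can be emitted for it. The sketch below is an assumption-laden illustration rather than GenomePixelizer's own output format. The function name, the tuple layout of `gene_boxes`, and the link URL are all hypothetical.

```python
from html import escape


def build_image_map(map_name, gene_boxes):
    """Build HTML <map> markup from gene bounding boxes.

    gene_boxes is an iterable of (gene_id, x1, y1, x2, y2, url) tuples;
    this layout is an assumption made for the example.
    """
    lines = [f'<map name="{escape(map_name, quote=True)}">']
    for gene_id, x1, y1, x2, y2, url in gene_boxes:
        # One rectangular clickable region per gene.
        lines.append(
            f'  <area shape="rect" coords="{x1},{y1},{x2},{y2}" '
            f'href="{escape(url, quote=True)}" '
            f'alt="{escape(gene_id, quote=True)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


# Hypothetical gene box and link target.
boxes = [("geneA", 247, 42, 253, 58,
          "https://www.ncbi.nlm.nih.gov/gene/?term=geneA")]
print(build_image_map("genome", boxes))
```

The resulting markup would be attached to a screenshot of the generated image through the `usemap` attribute of an `<img>` element, so the regions stay valid only as long as the image is not rescaled.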