Genome Vectorizer

Writing proceedings for the SVG Open 2010 conference

Keywords: bioinformatics, visualization

Elena Kochetkova
Graduate Student

San Jose State University
Department of Computer Science

San Jose
1 Washington Street
USA

Biography

Elena Kochetkova is a graduate student majoring in Computer Science at San Jose State University. She received her B.S. degree from the University of California, Davis, in 2004. Her primary areas of interest are data mining and information visualization for bioinformatics research. While pursuing her undergraduate degree at UC Davis, Elena worked in Dr. Michelmore's lab, doing bioinformatics programming for research on tomato and lettuce genetics and breeding, as well as research on the classical and molecular genetics of disease resistance in Arabidopsis, lettuce, and tomato. Two of her major programs are GenomePixelizer, a tool that helps visualize the relationships between duplicated genes in one or more genomes and between the members of gene clusters, and GenBank Parser, a script that translates the region of DNA sequence specified in the CDS part of each gene into a protein sequence. In 2002 Elena received the John Moran Award for her contribution to research in plant pathology.

Naidele Katrumane Manjunath
Graduate Student

San Jose State University
Department of Computer Science

Biography

Naidele Katrumane Manjunath received her B.E. degree in Computer Science from Visvesvaraya Technological University, India, in 2008. She is currently a graduate student majoring in Computer Science at San Jose State University. Her fields of interest are wireless networks and network security.


Abstract


Genome studies require information about every gene in a chromosome. Visualizing how a gene is related to other genes in the same chromosome, or to genes in other chromosomes, makes these relationships easier to understand. Available genome visualization tools mainly provide a view of the overall genome [Websites 2,3,4], and some tools offer features such as zooming into a particular chromosome and viewing a gene in detail [Kozik02].

In this paper we present a new capability of 'Genome Vectorizer', formerly known as 'Genome Pixelizer SVG-fied': a gene can be dragged out of a chromosome for more detailed analysis. This feature extends 'Genome Pixelizer SVG-fied', a genome visualization tool developed by Elena Kochetkova [Kochetkova09].

'Genome Vectorizer' is a redesigned version of GenomePixelizer, a tool written in Tcl/Tk in 2002 that allows the detection of duplication events in genomes. It capitalizes on the benefits of XML, XSLT, and SVG technologies combined with JavaScript scripting "to provide users with simpler interface maximized interactivity; as well as, improved efficiency of genetic data analysis" [Kochetkova09].

The gene dragging capability gives users much more interactivity, since users often focus on a particular gene rather than on the entire chromosome. Once the common genes between chromosomes are matched, the user can drag the gene of interest out along with all the details pertaining to it.


Table of Contents


1. Introduction
2. Motivation
3. Biological Background
4. Previous Work
5. Implementation
6. Functionality
7. Future Work
8. Summary
Bibliography

1. Introduction

Genome Vectorizer is a genome visualization tool developed by Elena Kochetkova. It demonstrates the benefits of using XML, XSLT, and SVG technologies combined with JavaScript scripting in redeveloping a genome visualization tool: a simpler interface, maximized interactivity, and improved efficiency of genetic data analysis.

This paper presents an extended version of Genome Vectorizer. The previous version was interactive but did not allow the user to drag a particular gene and its components. The extended tool adds this option, making Genome Vectorizer more convenient to use. It relies on the scalability of SVG and on script-driven dragging, in which JavaScript event handlers update an element's transform attribute as the mouse moves.

The extended version uses the same input files as GenomePixelizer SVG-fied. An XML file holds the details of the chromosomes and genes, including a matrix, called the distance matrix, that specifies the distance between each pair of genes. XSL stylesheets, JavaScript, and jQuery provide the interactive features of the tool.
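The paper does not reproduce the XML schema, so the following JavaScript sketch assumes a hypothetical in-memory form of the distance matrix once the XML has been read: an ordered list of gene IDs and a square array of values, taken here to be percent identities as displayed by the 'get matrix' window. The function `makeDistanceMatrix` and its method `identityBetween` are illustrative names, not part of Genome Vectorizer.

```javascript
// Hypothetical in-memory form of the distance matrix read from the XML file.
// values[i][j] is assumed to hold the percent identity between geneIds[i] and geneIds[j].
function makeDistanceMatrix(geneIds, values) {
  // Map each gene ID to its row/column index so lookups do not scan the list.
  const index = new Map(geneIds.map((id, i) => [id, i]));
  return {
    identityBetween(geneA, geneB) {
      const i = index.get(geneA);
      const j = index.get(geneB);
      if (i === undefined || j === undefined) {
        throw new Error(`Unknown gene: ${i === undefined ? geneA : geneB}`);
      }
      return values[i][j];
    },
  };
}

// Usage with made-up gene names and values:
const matrix = makeDistanceMatrix(
  ["geneA", "geneB", "geneC"],
  [
    [100, 82, 40],
    [82, 100, 55],
    [40, 55, 100],
  ]
);
console.log(matrix.identityBetween("geneA", "geneB")); // 82
```

A `Map` from gene ID to index keeps each lookup constant-time, which matters when the matrix covers many genes.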

2. Motivation

Bioinformatics applies technology to the study of complex biological problems. Much bioinformatics research is under way, especially on the analysis of genome data, and it produces very large databases of genes that need to be analyzed. Because bioinformatics is a highly visual science, there is great demand for tools that visualize processes involving the genome. Visualizing the information helps overcome the difficulty of understanding the resulting data, since experimental results become clearer when viewed graphically than as raw values.

3. Biological Background

A gene is a unit of heredity consisting of a sequence of DNA (deoxyribonucleic acid) nucleotides, represented by the letters A, C, G, and T. Two genes are said to be similar if, when aligned (placed side by side), their sequences share matching stretches of these letters.

Homology is similarity in the characteristics of organisms due to shared ancestry. Homology helps shed light on many questions, such as tracing the evolution of organisms, understanding the structure of chromosomes, and detecting duplicated segments within and between chromosomes. Homology is inferred from evidence of similarity: organisms have to be sufficiently similar to be considered homologous. "Percent identity" measures the similarity of two or more aligned sequences as the percentage of positions at which they are identical. Sequences with high percent identity may be homologous; that is, homology is inferred once a certain similarity threshold is crossed.
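To make the definition concrete, here is a small JavaScript sketch that computes percent identity for two sequences that have already been aligned (equal length, with `-` marking a gap). It assumes one common convention, identical non-gap columns divided by the alignment length; other tools divide by the length of the shorter sequence or exclude gap columns, so values can differ. The function names and the threshold parameter are hypothetical.

```javascript
// Percent identity of two aligned sequences of equal length; "-" marks a gap.
// Assumed convention: identical non-gap columns divided by alignment length.
function percentIdentity(alignedA, alignedB) {
  if (alignedA.length !== alignedB.length || alignedA.length === 0) {
    throw new Error("Aligned sequences must be non-empty and of equal length");
  }
  let identical = 0;
  for (let k = 0; k < alignedA.length; k++) {
    const a = alignedA[k].toUpperCase();
    const b = alignedB[k].toUpperCase();
    if (a === b && a !== "-") identical++;
  }
  return (100 * identical) / alignedA.length;
}

// Homology is only inferred once identity reaches a chosen threshold.
function suggestsHomology(alignedA, alignedB, thresholdPercent) {
  return percentIdentity(alignedA, alignedB) >= thresholdPercent;
}

console.log(percentIdentity("ACGTACGT", "ACGTTCGA")); // 75
```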

Gene colinearity and gene synteny are two of the many approaches used to detect homology in chromosomes. An important technique in tracing homology is gene comparison. The comparison is performed by aligning chromosomes, and aligning genes involves pairwise alignment of their nucleotides [Kochetkova09].
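The paper does not state which alignment algorithm produces the percent identities. Purely as an illustration of pairwise nucleotide alignment, the sketch below computes the optimal global alignment score with the Needleman-Wunsch dynamic-programming algorithm, using an assumed linear scoring scheme (match +1, mismatch -1, gap -1).

```javascript
// Optimal global alignment score (Needleman-Wunsch) with linear gap penalties.
// Scoring values are illustrative defaults, not those used by Genome Vectorizer.
function globalAlignmentScore(s, t, match = 1, mismatch = -1, gap = -1) {
  // prev holds row i-1 of the dynamic-programming table; row 0 is all gaps.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j * gap);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i * gap]; // column 0: s[0..i) aligned against gaps only
    for (let j = 1; j <= t.length; j++) {
      const diag = prev[j - 1] + (s[i - 1] === t[j - 1] ? match : mismatch);
      const up = prev[j] + gap;       // s[i-1] aligned to a gap
      const left = curr[j - 1] + gap; // t[j-1] aligned to a gap
      curr.push(Math.max(diag, up, left));
    }
    prev = curr;
  }
  return prev[t.length];
}

console.log(globalAlignmentScore("ACGT", "AGT")); // 2
```

Keeping only two rows of the table limits memory to O(n) while time stays O(mn); recovering the aligned strings themselves would require keeping the full table for traceback.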

4. Previous Work

The idea of a genome visualization tool is not new. Many available tools provide highly interactive user interfaces but lack user friendliness: the concentration of connecting lines between chromosomes makes the display difficult to analyze. Each of these tools has its own strengths and weaknesses.

The properties of SVG make 'Genome Vectorizer' unique among these tools. It provides a dragging capability, as shown in Figure 2. The previous version let the user drag an entire chromosome out to a separate location on the screen for observation, but because genes are usually clustered very closely inside a chromosome, the user could not easily see which genes are connected. The extended version lets each gene be extracted individually: when the user drags a gene out, its connected genes and the lines connecting to them move along with it.
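The following JavaScript sketch shows one way to decide what moves with a dragged gene, assuming a hypothetical list of links in which each connecting line records the two genes it joins. As in the description above, only the genes directly connected to the dragged gene, and the lines to them, are collected; `dragGroup` and the link format are illustrative assumptions.

```javascript
// Hypothetical link list: one entry per line drawn between two similar genes.
// Returns the genes and links that should move together when geneId is dragged.
function dragGroup(links, geneId) {
  const genes = new Set([geneId]); // Set keeps each gene once, in insertion order
  const movingLinks = [];
  for (const link of links) {
    if (link.from === geneId || link.to === geneId) {
      movingLinks.push(link);
      genes.add(link.from === geneId ? link.to : link.from);
    }
  }
  return { genes: [...genes], links: movingLinks };
}

const links = [
  { id: "l1", from: "g1", to: "g7" },
  { id: "l2", from: "g1", to: "g9" },
  { id: "l3", from: "g4", to: "g7" },
];
console.log(dragGroup(links, "g1").genes); // [ 'g1', 'g7', 'g9' ]
```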

5. Implementation

The key part of the implementation is separating a gene cluster from its chromosome and allowing it to be dragged out.

	
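	// Mouse-move handler used while a gene is being dragged.
	// DragTarget, GrabPoint, TrueCoords and DragLines are globals maintained by
	// the grab/drop handlers (not shown); GetTrueCoords2 (not shown) stores the
	// pointer position, in SVG user coordinates, in TrueCoords.
	function Drag(evt)
	{
		GetTrueCoords2(evt);
		if (DragTarget)
		{
			// Pointer position minus the offset recorded when the gene was grabbed
			var newX = TrueCoords.x - GrabPoint.x;
			var newY = TrueCoords.y - GrabPoint.y;
			var shift = 'translate(' + newX + ',' + newY + ')';
			DragTarget.setAttributeNS(null, 'transform', shift);
			// DragLines[i][0] is a connecting line; giving it the same translation
			// keeps it attached to the moved gene. (The original loop also read
			// x1/y1 from the gene DragLines[i][1] into undeclared globals that
			// were never used; that dead code is removed.)
			for (var i = 0; i < DragLines.length; i++)
			{
				DragLines[i][0].setAttributeNS(null, 'transform', shift);
			}
		}
	}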
	
	

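The function Drag, shown above, handles the drag event: it computes the new position from the current pointer coordinates and the point at which the gene was grabbed, then translates the dragged gene and each of its connecting lines to that position.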
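`GetTrueCoords2` is not shown in the paper. Assuming it follows the common SVG drag-and-drop pattern that undoes the root element's `currentScale` and `currentTranslate`, the coordinate arithmetic in `Drag` can be isolated into pure functions, as in this hedged sketch; `toTrueCoords` and `dragTransform` are hypothetical names.

```javascript
// Assumed behaviour of GetTrueCoords2: convert client (screen) coordinates to
// root SVG user coordinates by undoing the viewer's pan (currentTranslate)
// and zoom (currentScale).
function toTrueCoords(clientX, clientY, currentScale, currentTranslate) {
  return {
    x: (clientX - currentTranslate.x) / currentScale,
    y: (clientY - currentTranslate.y) / currentScale,
  };
}

// Translation for the dragged element: pointer position minus the offset
// recorded when the drag started (GrabPoint in Drag).
function dragTransform(trueCoords, grabPoint) {
  const newX = trueCoords.x - grabPoint.x;
  const newY = trueCoords.y - grabPoint.y;
  return "translate(" + newX + "," + newY + ")";
}

const p = toTrueCoords(300, 200, 2, { x: 100, y: 0 });
console.log(p);                                   // { x: 100, y: 100 }
console.log(dragTransform(p, { x: 40, y: 25 })); // translate(60,75)
```

In browsers, applying `getScreenCTM().inverse()` to the pointer position with `matrixTransform` is a more general mapping into an element's user space, since it also accounts for nested transforms.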

6. Functionality

Figure 1 shows the basic look and feel of Genome Vectorizer. The tool takes input from the user, who can enter the required percent identity, that is, the percentage of identical positions between two aligned gene sequences. The 'get matrix' button opens a window showing the matrix of the genes of the chromosomes; its grid format lets the user read the exact percent identity between any pair of genes. Once the user enters a value and clicks the 'Re-draw' button, the tool displays the common genes whose percent identity is equal to or greater than the specified value, both between chromosomes and within a chromosome. Lines connect the common genes, and arcs show similarity within a single chromosome.


Figure 1: The genome vectorizer
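The 'Re-draw' step can be sketched in JavaScript as a filter followed by a split. The sketch assumes hypothetical inputs: gene records that name their chromosome, and gene pairs with their percent identity taken from the matrix. Pairs at or above the threshold are kept; those on different chromosomes become connecting lines, and, on the assumption that arcs mark similarity within one chromosome, pairs on the same chromosome become arcs. `selectPairs` is an illustrative name.

```javascript
// genes: [{ id, chromosome }]; pairs: [[geneA, geneB, percentIdentity], ...]
// Returns the pairs to draw as lines (between chromosomes) and arcs (within one).
function selectPairs(genes, pairs, threshold) {
  const chromosomeOf = new Map(genes.map((g) => [g.id, g.chromosome]));
  const lines = [];
  const arcs = [];
  for (const [a, b, identity] of pairs) {
    if (identity < threshold) continue; // keep "the specified value and more"
    if (chromosomeOf.get(a) === chromosomeOf.get(b)) {
      arcs.push({ a, b, identity });
    } else {
      lines.push({ a, b, identity });
    }
  }
  return { lines, arcs };
}

const genes = [
  { id: "g1", chromosome: "chr1" },
  { id: "g2", chromosome: "chr1" },
  { id: "g3", chromosome: "chr2" },
];
const pairs = [["g1", "g2", 91], ["g1", "g3", 78], ["g2", "g3", 60]];
const result = selectPairs(genes, pairs, 75);
console.log(result.lines.length, result.arcs.length); // 1 1
```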

The extended version adds gene dragging, as shown in Figure 2. Each gene shown in a chromosome can be dragged out; when the user drags a gene, all the genes connected to it are dragged along with it. The user can place it anywhere in the available white space and zoom into the gene.


Figure 2: Dragging capability provided by genome vectorizer
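The paper does not say how zooming is implemented; it may simply rely on the SVG viewer's built-in zoom. One possible approach, shown here as a hedged sketch, is to set the root element's `viewBox` to the gene's bounding box plus a margin; `viewBoxFor` is a hypothetical helper.

```javascript
// Hypothetical zoom helper: a viewBox string that frames a bounding box
// ({ x, y, width, height } in root user coordinates) with a margin on every side.
function viewBoxFor(bbox, margin) {
  const x = bbox.x - margin;
  const y = bbox.y - margin;
  const width = bbox.width + 2 * margin;
  const height = bbox.height + 2 * margin;
  return [x, y, width, height].join(" ");
}

console.log(viewBoxFor({ x: 120, y: 40, width: 30, height: 10 }, 5)); // 115 35 40 20
```

In a browser the box would come from the gene element's `getBBox()`. Because `getBBox()` ignores the element's own `transform`, a gene that has been moved by dragging needs its translation added to `x` and `y` before the result is set as the root's `viewBox`.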

The rapidly growing fields of bioinformatics and genome study require integrating data from various sources, such as online databases and experimental results. The tool therefore lets the user connect to the National Center for Biotechnology Information (NCBI) database: clicking a gene's link opens the page for that gene on the NCBI site.
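The URL format the tool uses for NCBI links is not given in the paper. As an assumption, the sketch below builds a search link into NCBI's Gene database, which accepts a query in its `term` parameter; `ncbiGeneUrl` is a hypothetical helper, and a tool that stores numeric NCBI Gene IDs could instead link directly to `https://www.ncbi.nlm.nih.gov/gene/` followed by the ID.

```javascript
// Hypothetical link builder: search NCBI Gene for a gene name or locus tag.
// encodeURIComponent protects names that contain spaces or reserved characters.
function ncbiGeneUrl(geneName) {
  return "https://www.ncbi.nlm.nih.gov/gene/?term=" + encodeURIComponent(geneName);
}

console.log(ncbiGeneUrl("At1g01010"));
// https://www.ncbi.nlm.nih.gov/gene/?term=At1g01010
```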

7. Future Work

The data required by the tool is currently collected and stored in an XML file manually, which is tedious for the user. To overcome this problem, a parser could be written that extracts chromosome details from available databases into an XML file. A broad range of additional interactivity could also be added; for example, the user could be allowed to change the chromosome dynamically.
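A minimal JavaScript sketch of the proposed parser follows. It assumes a hypothetical tab-separated export with one gene per line (gene ID, chromosome, position) and writes a hypothetical XML layout; neither the input format nor the element names come from Genome Vectorizer, whose actual schema would have to be matched.

```javascript
// Escape the characters that would break XML attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical input: "geneId<TAB>chromosome<TAB>position" per line.
// Output: genes grouped under their chromosome, in first-seen order.
function tsvToXml(tsv) {
  const byChromosome = new Map();
  for (const line of tsv.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const [geneId, chromosome, position] = line.split("\t");
    if (!byChromosome.has(chromosome)) byChromosome.set(chromosome, []);
    byChromosome.get(chromosome).push({ geneId, position: Number(position) });
  }
  const out = ['<?xml version="1.0" encoding="UTF-8"?>', "<genome>"];
  for (const [chromosome, genes] of byChromosome) {
    out.push(`  <chromosome name="${escapeXml(chromosome)}">`);
    for (const g of genes) {
      out.push(`    <gene id="${escapeXml(g.geneId)}" position="${g.position}"/>`);
    }
    out.push("  </chromosome>");
  }
  out.push("</genome>");
  return out.join("\n");
}

console.log(tsvToXml("g1\tchr1\t1200\ng2\tchr1\t5400\ng3\tchr2\t800"));
```

Escaping `&`, `<`, `>` and `"` keeps gene names with special characters from producing malformed XML.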

8. Summary

Most visualization tools for genomes represent the data in a very complex way and involve many complicated files. Using SVG makes this much simpler: the tool gives users more interactivity with less complexity. SVG being an open standard is an added advantage.

Bibliography

Articles

1. "GenomePixelizer SVG-fied". Elena Kochetkova. Student. San Jose State University. 2009. 12.

2. "GenomePixelizer - a visualization program for comparative genomics within and between species". A. Kozik, E. Kochetkova, and R. Michelmore. 2002. 12.

3. "A New Horizon for SVG: Bioinformatics". Benjamin G.S. Horsman. Student/Researcher. Simon Fraser University. 1-1-2003. 8.

4. "SVG Bioinformatics Collaboration Tool". Dr. William Rutherford. Research Faculty. Group for Advanced Information Technology, Vancouver. 1-1-2003. 10.

5. "BioViz: Genome Viewer. Development of an SVG GUI for the visualization of genome data". Christopher T. Lewis, Steve Karcz, Andrew Sharpe, and Isobel A.P. Parkin. 07-Nov-2003. 10.

6. "Bioinformatics visualization and integration with open standards: The Bluejay genomic browser". Andrei L. Turinsky, Andrew C. Ah-Seng, Paul M. K. Gordon, Julie N. Stromer, Morgan L. Taschuk, Emily W. Xu, and Christoph W. Sensen. Nov-18-2004. 20. Copyright 2004 Bioinformation Systems e.V.

Websites

1. GD::SVG - Seamlessly enable SVG output from GD scripts. Todd W. Harris. Copyright 2003 Todd Harris and the Cold Spring Harbor Laboratory. http://toddot.net/projects/GD-SVG/index.shtml

2. Argo Genome Browser. Copyright 22 May 2009 BROAD Institute. http://www.broad.mit.edu/annotation/argo/

3. Circos. Martin Krzywinski. Copyright 2004-2010 Martin Krzywinski. http://mkweb.bcgsc.ca/circos/

4. Alfresco. Copyright 27 May 2008 Sanger Institute. http://www.sanger.ac.uk/Software/Alfresco/
