Multivariate Parallel Coordinates Using SVG with Unemployment Data and Other Data.

Aaron Lemoine

Aaron Lemoine is currently a student at San Jose State University who is interested in developing software applications.


Abstract


Overview

SVGMG (SVG Multivariate Graph) is an application for visualizing MPC data in SVG. SVGMG accepts an uploaded MPC XML dataset that conforms to the MPC DTD developed for this application. The application uses SVG for display and interaction, together with modern SVG and JavaScript interactivity frameworks.

Background

MPC (Multivariate Parallel Coordinates) is a technique for visualizing and reasoning about multivariate data. It can be considered related to Visual Data Mining, since it relies on visual perception to find information and can take part in the data mining process. MPC can be used to visualize data in order to make predictions, see trends, etc.

For example, one application area for MPC within the Visual Data Mining process is Product Marketing. An application that visualizes MPC may help product marketers predict trends in the marketing of a new product. MPC may also support information gathering on continually growing datasets, which is important for finding patterns and making predictions. Product Marketing would benefit especially, as new data is gathered with each advertising cycle.

What is MPC

MPC is a form of parallel coordinates designed for multivariate analysis. It displays many entities over several attributes on a coordinate system for visual analysis. Each entity is represented by a polyline, with each point on the polyline corresponding to an attribute's position and the entity's value for that attribute.

MPC can be applied to any numerical dataset, including unemployment data, weather data, car data, retail sales data, etc. Using MPC, we can easily detect patterns and correlations between the attributes of a numerical dataset.

Architecture

SVG, jQuery, the jQuery SVG extension, HTML, JavaScript, and XML are combined to create SVGMG, an interactive MPC application. SVGMG is an HTML page with an SVG component that draws the parallel coordinates and allows user interaction. SVGMG also uses HTML widgets for other interactivity, since SVG lacks common widgets. JavaScript, along with jQuery and its extension, handles the interactive events raised by the SVG and HTML widgets. SVGMG can use any MPC dataset by uploading and validating an XML file.

Interactive SVGMG

Several problems may arise with MPC, which SVGMG's interactivity is meant to solve. For large datasets, polylines may blend together, so SVGMG lets the user change the color of a single polyline or a group of polylines to help data stand out. Since two attributes can only be compared when they are next to each other, SVGMG can rearrange attributes to allow comparisons between different attributes. It may also be useful to change the value range of an MPC attribute, so SVGMG supports rescaling attribute value ranges.


Table of Contents

Introduction
Need for Interactivity
Prior Work
My Tool
Implementation
Recoloring
Rearranging
Rescaling
Data Format
Future Work
Summary
Bibliography

Introduction

In our current society, databases continually grow as people use the internet, make purchases, etc. It is important to analyze these growing databases to discover patterns, which serves fields such as fraud detection, marketing, and prediction. Data mining is a process that refines databases into smaller datasets, which are then passed to an algorithm to find useful patterns. Visual data mining is another approach, which combines data mining with visualization so that patterns can be found through human visual perception. Examining figure 2, we can see several patterns by visual perception. For example, looking at the fifth and sixth attributes, we can visually perceive a growth pattern in unemployment percentage.

This paper focuses on the latter step of visual data mining, creating a visualization from the processed data, and in particular on Multivariate Parallel Coordinates (MPC), a form of parallel coordinates designed for multivariate analysis. MPC displays many entities, represented by polylines, over several attributes (see figure 2). For each point on a polyline, the X value gives the attribute's location while the Y value gives the entity's value for that attribute.
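To make the X/Y mapping concrete, the following is a minimal sketch of how one entity's values could be turned into an SVG polyline points string. It is an illustration only, not SVGMG's actual code: the function name toPoints, the ranges array of {min, max} objects, and the axisGap and height layout parameters are all assumptions introduced here.

```javascript
// Hypothetical helper (not from SVGMG): map one entity's attribute values
// to an SVG polyline "points" string such as "0,50 50,100 100,50".
// ranges[i] is the assumed {min, max} value range shown on axis i
// (max must be greater than min); axisGap and height are in pixels.
function toPoints(values, ranges, axisGap, height) {
  var parts = [];
  for (var i = 0; i < values.length; i++) {
    var r = ranges[i];
    // Normalize the value: 0 at the range minimum, 1 at the maximum.
    var t = (values[i] - r.min) / (r.max - r.min);
    var x = i * axisGap;          // X encodes which attribute this is
    var y = height - t * height;  // SVG's Y axis grows downward
    parts.push(x + "," + y);
  }
  return parts.join(" ");
}
```

Under these assumptions, rescaling an attribute amounts to changing its entry in ranges and recomputing the string: a value of 5 sits at Y = 50 on a 0 to 10 axis of height 100, and moves to Y = 75 when the same axis is rescaled to 0 to 20.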

Need for Interactivity

MPC should be interactive, because several problems arise with static coordinates. With large datasets, polylines can be difficult to distinguish as they may mix together, so interactivity such as coloring lines is useful for solving this problem. Attributes that are next to each other can be compared while those that are apart cannot, so interactivity that moves attribute positions is useful for comparison. It is also useful to rescale attribute value ranges to gain new insight into the data.

Prior Work

Other work, such as Edsall's [2], covers the interactivity needs mentioned above well, but uses technology other than SVG. My program adapts MPC techniques from others' work and applies them to SVG, with the advantage that it may help advance SVG technologies. Parallel coordinates are applicable in several areas, so starting with a simple SVG application may be useful in leading to more advanced SVG applications with parallel coordinates.

My Tool

SVG, jQuery, the jQuery SVG extension, HTML, JavaScript, and XML are combined to create SVG Multivariate Graph (SVGMG), an interactive MPC tool. The SVG DOM, jQuery, and JavaScript parts all run in the browser, while the data is read from a separate XML file.

Figure 1. SVGMG architecture

SVGMG is designed to handle recoloring polylines, rearranging attributes, and rescaling attributes, which are accomplished by a combination of SVG events and HTML widgets. SVGMG can use any MPC dataset by uploading and validating an XML file.

Implementation

Consider the following MPC drawn by SVGMG.

The MPC is a mess of polylines that are not easily distinguishable. One way to fix this problem is to use click events on the polylines to recolor them.

Recoloring the polyline alone isn't enough in this case, because SVG uses a painter's model for drawing its objects. Newer objects are drawn on top of older objects, so recoloring an older polyline may leave it mostly hidden under newer polylines. A solution to this problem is to remove the old polyline and append it again as the newest polyline.

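function linColor(evt)
{
	// sColor is the currently selected color box (set by selectColor).
	var setColor = sColor.getAttribute("fill");
	var line = evt.target;
	var pNode = line.parentNode;
	// Remove the polyline and re-append it so that it is painted last (on top).
	pNode.removeChild(line);
	line.setAttribute("stroke", setColor);
	pNode.appendChild(line);
}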

Thus the desired results are produced.

It is now possible to see a newly recolored polyline.

As for selecting the colors, several color boxes are displayed. Clicking on a colored box changes the sColor variable:

function selectColor(evt)
{
	// Restore the thin black outline on the previously selected box.
	sColor.setAttribute("stroke", "black");
	sColor.setAttribute("stroke-width", "1");
	// Remember the clicked box and highlight it with a thick yellow outline.
	sColor = evt.target;
	sColor.setAttribute("stroke", "yellow");
	sColor.setAttribute("stroke-width", "5");
}

The sColor variable is then used by the polyline coloring methods to get the desired color.

Recoloring polylines by clicking on them works well if you are interested in only one polyline, but recoloring a whole group of polylines by clicking is tedious. A solution is to use HTML widgets in which the user enters a comparative condition, based on a single attribute and its Y position, to recolor a polyline group. Since each polyline stores its points as a string, the next challenge is finding the point that the condition applies to. The points string uses space and comma characters to separate the points and their coordinates, and these separators can be used to locate the point of interest.

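// Skip ahead to the space that precedes the point at position "spot".
while(c < spot)
{
	startIndex = points.indexOf(" ", startIndex+1);
	c++;
}
// The comma of that point separates its X value from its Y value.
startIndex = points.indexOf(",", startIndex+1);

This code finds the index of the comma immediately before the Y value of the point of interest, so the Y value itself starts at startIndex+1. The end of the Y value is found with the following code. If the point of interest is the last one and the string has no trailing space, indexOf returns -1 and the end of the string has to be used instead.

// The next space marks the end of the Y value.
endIndex = points.indexOf(" ", startIndex+1);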

With the starting and ending indexes of the point of interest, the substring holding the polyline's Y value is compared with the widget input, and the polyline is redrawn if it matches the condition. This search and comparison is repeated for every polyline.
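The point search and condition check described above can be sketched as a pair of self-contained functions. This is a hedged illustration rather than SVGMG's code: the names yValueAt and matchesCondition, the zero-based spot index, and the set of operators are assumptions, and the points string is assumed to use single spaces between points and a comma between X and Y.

```javascript
// Hypothetical helper: return the Y value (as a number) of the point at
// zero-based position `spot` in a points string like "10,40 60,85 110,20".
function yValueAt(points, spot) {
  var startIndex = -1; // so the first search begins at index 0
  for (var c = 0; c < spot; c++) {
    startIndex = points.indexOf(" ", startIndex + 1);
    if (startIndex === -1) return NaN; // fewer points than expected
  }
  // Find the comma separating X from Y in the point of interest.
  startIndex = points.indexOf(",", startIndex + 1);
  if (startIndex === -1) return NaN;
  // The Y value ends at the next space, or at the end of the string.
  var endIndex = points.indexOf(" ", startIndex + 1);
  if (endIndex === -1) endIndex = points.length;
  return parseFloat(points.substring(startIndex + 1, endIndex));
}

// Hypothetical condition check for a widget input such as ("<", 50).
function matchesCondition(points, spot, op, threshold) {
  var y = yValueAt(points, spot);
  switch (op) {
    case "<": return y < threshold;
    case ">": return y > threshold;
    case "=": return y === threshold;
    default: return false; // unknown operator
  }
}
```

For the string "10,40 60,85 110,20", yValueAt returns 40, 85, and 20 for spot values 0, 1, and 2. Note that these are SVG Y positions, which grow downward, so a smaller Y is drawn higher on the axis.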

Unlike removing and re-adding a single polyline on a click, removing and re-adding a polyline group needs a different approach. The DOM is updated as polylines are removed, which makes it difficult for the JavaScript to determine which objects still need to be checked and redrawn. Initially, I removed each polyline when there was a match, stored the removed item in a DocumentFragment, and restarted the incrementing counter to get around the DOM updating problem. Later I found and adopted a more efficient method: simply using a decrementing counter to check all of the polyline items removes the need to reset the counter. Now we get an MPC which looks like this.
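As a side note on the decrementing counter described above, the loop could look roughly like the sketch below. It is not taken from SVGMG: the function name recolorGroup, the matches callback, and the assumption that the container's childNodes hold only polyline elements are all introduced here for illustration.

```javascript
// Hypothetical sketch: recolor every polyline in `group` for which
// matches(polyline) is true and move it to the end so it is painted on top.
// Assumes group.childNodes contains only polyline elements.
function recolorGroup(group, matches, color) {
  var lines = group.childNodes; // a live list in a real DOM
  var moved = 0;
  // Count downward: moving a node to the end only shifts nodes that
  // have already been visited, so no counter reset is needed.
  for (var i = lines.length - 1; i >= 0; i--) {
    var line = lines[i];
    if (!matches(line)) continue;
    group.removeChild(line);
    line.setAttribute("stroke", color);
    group.appendChild(line);
    moved++;
  }
  return moved;
}
```

Because nodes at indexes below the current one are never disturbed by the move, each polyline is visited exactly once, which is what makes the decrementing form simpler than restarting an incrementing counter.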

SVGMG allows two attributes to be rearranged. First click on one attribute,

then click on the other attribute

and now the two are swapped, allowing different attributes to be compared with each other.

Code used for this process is similar to group recoloring, as it also uses the same method for finding both points that need to be swapped for each polyline. After finding the two points, the points are swapped and each polyline is updated.
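One way to read the swap described above is that the Y values of the two attributes are exchanged while each X position stays with its axis, since exchanging whole X,Y pairs would change the order in which the polyline is drawn. The sketch below follows that reading and is an assumption, not SVGMG's code. For brevity it uses split and join instead of the indexOf search, and the name swapAttributes is hypothetical.

```javascript
// Hypothetical sketch: swap the Y values of the points at zero-based
// positions a and b in a points string, leaving the X values in place.
// Assumes single spaces between points and a comma between X and Y.
function swapAttributes(points, a, b) {
  var pairs = points.split(" ").map(function (p) { return p.split(","); });
  var tmp = pairs[a][1];
  pairs[a][1] = pairs[b][1];
  pairs[b][1] = tmp;
  return pairs.map(function (p) { return p.join(","); }).join(" ");
}
```

Applied to "0,50 50,100 100,20" with positions 0 and 2, this yields "0,20 50,100 100,50". The same call would be repeated for every polyline, after which each polyline's points attribute is updated.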