Abstract
Context free grammars are usually described in Backus Naur Form (BNF) and are a common part of computer programming language specifications. Visual representations of CFG productions are sometimes used in documentation to give a high-level overview of the different parts of a language that is easy to understand, yet comprehensive, in that it describes the full power of the language. There is a standard flow graph notation for visualizing grammar productions. This flow graph notation is very similar to a finite state machine graph with one start and one final state. Examples of the use of this type of visualization include the Oracle SQL language documentation [19] and the ANTLRworks IDE for the ANTLR compiler compiler language [5]. The World Wide Web consortium (W3C) uses a standard notation for the BNF grammar included in all of its language specifications documents.
This paper describes an application for CFG visualization SVG. Grammar productions are parsed using ANTLR and an SVG graph is generated using Java and the Velocity template language. The grammar visualization tool is used to present an augmented view of W3C specifications. This augmented view displays a W3C specification side-by-side with a visual representation of each grammar production of the specification's language. This application works within a web browser and uses Javascript to bring SVG to life by adding interactivity and animations. The user can move between grammar productions by clicking on the production references that appear in the current flow graph, or by selecting an item from a full list of grammar productions. The W3C specification document is kept in sync with the currently displayed graph, so that the user can read the textual description and related documentation from the specficiation document while looking at the production graph.
Context free grammars are used to describe formal languages in computer science. The most common format for Context Free Grammar representation is the Backus-Naur Form or BNF for short. A BNF consists of a set of production rules - each rule describes a single part of the language. Productions usually follow a hierarchical structure with a single root production at the top, which captures the entire language. A hypothetical BNF description of a natural language might consist of production rules for paragraphs, sentences and various phrases. A more practical example of a BNF production for a street address might look like this:
address ::= street_name number city state zip_code
In this example the pφroduction's rule name is address, and the part on the right side of the ::= symbol is the production's value, which in turn, in this case is a sequence of other rules, which are defined separately. Each of those rules are non-terminals. A production can refer to other non-terminals and terminals. Terminals represent a character, a sequence of characters or a simple character pattern, which can be represented as a regular expression.
Grammar productions are an essential part of the description of any programming language and are usually included in programming language specifications. Productions can be also represented in a more visual way in the form of a diagram, and such diagrams are often included in programming language documentation. The online docuemntation for the Oracle database product for example includes such visual diagrams for each element of the Oracle SQL programming language.
Grammar production graphs are a very useful documentation aid as demonstrated in the above example. They represent and intuitive, easy-to-understand view of what is otherwise very dry, terse, technical material.
The XML Language Specification, issued by the World Wide Web Consortium (W3C) represents a complete description of the XML language. A core part of the specification is the BNF grammar definition of the XMl language. The entire specifications document is structured around the various productions of the BNF grammar, with productions separated into groups, which appear in sections alongside detailed descriptions of the meaning of each element represented by each production rule. While the rules are listed in their BNF form, the graphical form of productions is not included. This paper describes a project, whose aim is to provide an enhanced view of the XML (and other) language specification documents from the W3C. This enhanced view inlcudes a navigable graphical representation of the BNF productions and appears alongisde the original specifications document. The graphs of all the rules from an individual section of the document are displayed alongisde that section of the original document and the user is able to navigate through the specification by clicking on references to other production rules, and observing a synced-up view of the main specifications document and the current production graph.
Grammar visualization application screenshot.
Figure 2. Grammar Visualization Application Screenshot
The screenshot above shows a full view of the application window, which includes the annotated version of the W3C XML Language Specification displayed in a browser window. The graphical represnetation of the rules from the current section of the specifications document appears in the top part of the window with the original specifications document below it. The top section of the window includes a drop-down box, which can be used to navigate between the sections of document and display a particular section of the specification alongisde its productions' graphs.
The production graphs are visualized using SVG - an open web standard for vector graphics on the web. SVG is an excellent choice for a web application such as this, due to its good integration with HTML and JavaScript. SVG code can be directly embedded inside HTML and its structure is fully exposed for programmatic manipulation using JavaScript - the standard language for web programming inside the browser.
In order to produce a graphical representation of a BNF production, the frst step was to obtain a structured representation of a production. The format of BNF productions as used in the XML Language Specification is fully described in Section 6 of the XML Language Specification document itself. Using this information as reference a BNF grammar was written for XML symbol productions using ANTLR. ANTLR is a Java compiler compiler tool, which is used for generating language parsers based on a BNF language description. The parsers are generated in the Java language and ANTLR itself is written completely in Java. ANTLR also provides an IDE called ANTLRWorks, for developing programs in the ANTLR language. ANTLRWorks greatly simplifies and speeds up the development of ANTLR grammar, by providing a syntax-hightlighted editor with Ctrl-Click style navigation between rules, a step-by-step parser debugger and a visual representation of the tokenized input and the output Abstract Syntax Tree (AST). AST is the structured representation of the input to a grammar, which in this case is a production rule. This structured tree can be used to generate the SVG graphical representation of the production. An example of an AST as displayed in ANTLRWorks is included in the screenshot below:
Once the grammar for XML production rules was developed, the next step was to use the AST produced by parsing a rule using ANTLR to produce the appropriate SVG image. The production visualization format is standard and well-established with numerous example implementations such as in the Oracle documentation example shown above. The ANTLRWorks IDE itself includes a feature to display the graphical representation of grammar production rules. The ANTLRWorks format of visual representation was used as a reference point in the development of the current application. All of our visualization primitives were based entirely on the ANTLRWorks grammar visualization model.
Grammar productions consist of only a handful of basic operations, each of which requires a distinct form of visual representation. All grammar productions are built using combiantions of these basic operators. As a first step a set of templates, one for each basic operator, were produced. These templates consisted of static SVG code and are shown and described here.
This is the core of the root of each production - it includes the production name and the right arrow which indicates the the end of the production.
The most basic building block, a sequence of sub-expressions is represented as a concatenated list of boxes. There are no connecting arrows here - all the necessary connectors are drawn as part of the leaf unit expressions (see below).
The sub-expression (represented as a dotted outline box) is optional and can be skipped (can occur zero or one times) in the input.
The sub-expression (represented as a dotted outline box) can occur one or more times in the input.
The sub-expression (represented as a dotted outline box) can occur zero or more times in the input.
Either one of the subexpression can appear and will be matched. All the sub-expressions are mutually exclusive. This template is designed to be able to seamlessly handle a choice with a single option. This special case occurs in almost every production due to the way the AST is constructed by ANTLR.
This is the first of the two leaf unit expressions. This tempalte represents a reference to another production. The given production (by its name) is matched here.
Once the basic building units of a grammar production graph were designed, a simple tree walker was implemented using the Velocity Template Language (VTL). VTL is a template language processor implemented in Java. It was originally designed with the primary goal of web application development, but is also available as a standalone library for use in any Java application. Velocity is a simple but powerful scripting language, which is very useful for the purpose of SVG generation because it eliminates the need to embed SVG code inside Java, which would be a cumbersome effort and result in hard to understand and unmaintainable code. Instead, VTL allows code to be embedded directly inside the SVG files described above, thus bringing the static SVG files to life in a way. Besides one velocity template for each of the above SVG templates, only one additional velocity template is required, called eval_node. This template acts as a dispatcher of sorts, delegating the processing of each node of the AST to the correct template based on the AST node's type. Two sample SVG production gpraphs from the XML Specification is included below:
The next step in the implementation of the application was the extraction of all of the grammar production rules from the XML Specification web page. Upon examination it was discovered that all of the rules of the XML Specification follow an format with a common structure and appear inside HTML tables with a common structure. This alllowed an easy implementation of the extraction process by the use of a small set of simple regular expressions. In addtion to the grammar productions themselves, grouping information was also extracted and the expressions were stored inside an in-mmeroy list of javabeans, including all the necessary information (HTML anchor names) to locate each production rule inside the HTML document at runtime by JavaScript code running inside the web browser.
The final step of the project was the implementation of an integrated XML Specifications viewer which would present the user with a synchronized view of the SVG production graphs and the source XML Specification document from the W3C. This augmented view of the online specification is served from a standard Jav Enterprise Web Server where our application is deployed as a WAR file. The web application consists of an index page, which sets the layout of the augmented XML specification view and includes both the production graph in the top pane and the XML Specifications document loaded directly from the W3C web site. The individual production graphs are loaded dynamically inside the browser using AJAX technology without refreshing the web page inside the browser.
ANTLRWorks was used for the development and debugging of the ANTLR grammar. For the rest of the application, the Eclipse IDE was used, with the Veloeclipse plugin for editing Velocity Template files and the Java EE tools for the development of the web application servlet, HTML and JavaScript content. The application was deployed and tested on version 6 of the Apache Tomcat Web Server. The Eclipse Java EE tools provide an easy-to-use environment which greatly simplifies the development of Java EE applications, with one-click packaging and deployment of an application to the web container with full debugging support.