Abstract
The introduction of large-scale clinical testing at the University of Muenster created demand for copious evaluations of the quality of multiple-choice questions (MCQs). In this context a software module for the automation of this reporting procedure was developed using Scalable Vector Graphics (SVG).
Key steps of the Java-based implementation include retrieving the relevant data, computing statistical parameters, and arranging both with the MCQ contents. Vector graphics are created during the subsequent XSL transformation and are integrated into print-oriented XSL-FO using XML namespaces. This monolithic XML source is then converted to PDF using the FOP library from the Apache XML Graphics Project.
This approach required some initial development effort, but supported a maintainable separation of program logic and output layout. The inclusion of vector graphics made it possible to produce integrated reports that combine high-quality visualizations with small file sizes.
In 2004 the Medical Faculty at the University of Muenster introduced a modular reorganization of several clinical courses changing the focus of teaching from subject (e.g. radiology) to topic (e.g. heart and vessels). Additionally, testing was centralized using multiple-choice questions (MCQs). In this context a database was developed to support the processes of student registration, content editing, preparation of individual test sheets, and score calculation. The demand for uniformly computed evaluations of the quality of large numbers of test items led to the design of a software module for the automation of this reporting procedure.
What causes shingles?
a) The bacterium Bordetella pertussis
b) Morbillivirus
c) Rubella virus
d) Varicella zoster virus
e) Human herpesvirus type six (HHV-6)
Figure 1. Example of a one-best-answer test item. This question format is typically used in medical education.
All inferences about the quality of a single item rest on the assumption that a candidate's overall score (i.e. the fraction of correct answers) reflects that candidate's individual 'competence'. An item is considered to be of high quality when the scores for that single question agree with the aggregate student abilities: students who are more 'competent' overall are then also more likely to answer the item correctly. This presumed correlation is evaluated both statistically and graphically. Deviations from this pattern can point to problems such as ambiguous phrasing.
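One common way to put a number on this agreement is the point-biserial correlation between the binary result on one item and each candidate's score on the remaining items. The text does not state which statistic the module actually computes, so the following Python sketch is only an illustration; the function name and the sample data are hypothetical.

```python
from statistics import mean, pstdev

def point_biserial(item_correct, rest_scores):
    """Correlation between a 0/1 item result and the score on all other items."""
    if len(item_correct) != len(rest_scores):
        raise ValueError("both sequences must have one entry per candidate")
    sd = pstdev(rest_scores)          # population standard deviation of the scores
    p = mean(item_correct)            # item difficulty: fraction answering correctly
    if sd == 0 or p in (0, 1):
        return 0.0                    # correlation undefined; treat as no discrimination
    right = [s for c, s in zip(item_correct, rest_scores) if c]
    wrong = [s for c, s in zip(item_correct, rest_scores) if not c]
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5

# Hypothetical data: five candidates, 1 = item answered correctly
print(point_biserial([1, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.2]))
```

A value near 1 means the item separates stronger from weaker candidates well; a value near zero or below would flag the item for review.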
Key steps of the Java-based implementation include retrieving the relevant data with SQL queries, computing statistical parameters, and arranging these values together with the MCQ contents in an intermediary XML stream with a simple format. Vector graphics are created during the subsequent XSL transformation and are integrated into print-oriented XSL-FO using XML namespaces. The resulting standards-compliant stream comprises a synopsis of the MCQ contents with the statistical parameters and graphics. In the final step this monolithic XML source is converted into a PDF file for printing or electronic transmission via email or HTTP.
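The namespace mechanics of embedding SVG in XSL-FO can be sketched in a few lines. In the module this step is performed by the XSL transformation; the Python sketch below only mirrors the resulting structure. The two namespace URIs are the standard ones for XSL-FO and SVG, while the function name and the prefix choice are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("fo", FO_NS)
ET.register_namespace("svg", SVG_NS)

def bar_as_foreign_object(width, fill="#9999CC"):
    """Wrap a one-bar SVG diagram in fo:instream-foreign-object (hypothetical helper)."""
    ifo = ET.Element(f"{{{FO_NS}}}instream-foreign-object")
    svg = ET.SubElement(ifo, f"{{{SVG_NS}}}svg", width="68", height="11.5")
    group = ET.SubElement(svg, f"{{{SVG_NS}}}g", style=f"fill:{fill}; stroke:#000000")
    ET.SubElement(group, f"{{{SVG_NS}}}rect", {
        "x": "1", "y": "1", "height": "10",
        "width": str(width), "stroke-width": "0.5",
    })
    return ifo

print(ET.tostring(bar_as_foreign_object(21), encoding="unicode"))
```

ElementTree serializes the SVG elements with an explicit `svg:` prefix, whereas the fragments shown in the tables rely on a namespace declared elsewhere in the stylesheet; both forms are equivalent to a namespace-aware processor.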
Different stylesheets were built in order to produce a variety of output formats (single-source publishing). Other targets of the XSL transformation include CSV (comma-separated values) with tabular data for further manual processing, and scatterplots aggregating the parameters of several items (discriminatory power versus difficulty level).
The output files contain, for each possible answer, bar charts and Tukey box plots illustrating the sizes and 'competence' distributions of the relevant subpopulations (i.e. the candidate groups that either made the correct choice or were misled by one of the distractors). The visual representation of differences in 'competence' among these groups is reinforced with color gradients. This color encoding, in conjunction with a consistent graphical layout, is intended to make conclusions about item quality quickly accessible. The following tables contain sample diagrams and corresponding source code fragments.
The creation of XHTML integrating SVG with JavaScript (see Table 6) initially had little impact due to the low availability of SVG-enabled browsers among our users.
<fo:instream-foreign-object>
<svg width="68" height="11.5">
<g style="fill:#9999CC; stroke:#000000">
<rect height="10" stroke-width="0.5" x="1" y="1" width="21"/>
</g>
</svg>
</fo:instream-foreign-object>
<!-- ... -->
<fo:instream-foreign-object>
<svg width="68" height="11.5">
<g style="fill:#666699; stroke:#000000">
<rect height="10" stroke-width="0.5" x="1" y="1" width="3.599"/>
</g>
</svg>
</fo:instream-foreign-object>
Table 1. Source code fragment and screenshot of a diagram representing the distribution of answers of 110 students. The solution 'A)' is highlighted with a brighter color. A large number of students apparently opted for 'D)', so this distractor was possibly not phrased as clearly false.
<!-- sample chart for distractor 'E)' -->
<svg width="80" height="12">
<defs>
<linearGradient xml:id="55647">
<stop style="stop-color: rgb(0,160,0)" offset="0%"/>
<stop style="stop-color: rgb(160,160,160)" offset="100%"/>
</linearGradient>
</defs>
<g style="stroke:#000000; fill: url(#55647)">
<rect style="fill:black" height="12" width="0.5" stroke-width="0.5" y="0" x="39.5"/>
<rect height="4.75" stroke-width="0.5" width="17.12" x="22.38" y="3.375"/>
</g>
</svg>
Table 2. Source code fragment and screenshot of a diagram illustrating the distributions of 'relative competences' of the candidate groups misled by the four wrong answers. For instance, the students who believed 'E)' to be correct collectively made 1993 correct choices out of 3401 on the other items (58.6%). This proportion is lower than in the candidate group choosing the correct answer 'A)' (67.16%), indicating that the distractor 'E)' can likely be seen as a 'relatively incompetent' option. This difference in presumed 'competence' determines the width of the adjacent green bar (the number of candidates defines its height). Conversely, the students opting for 'D)' turned out to be otherwise relatively 'competent', suggesting that 'D)' could reflect a rather informed conviction. This can be a sign of a possible error in this test item and warrants scrutinizing its contents.
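The numbers in this caption can be cross-checked against the fragment. The Python sketch below recomputes the proportion for group 'E)' and shows that the rect width of 17.12 and its x offset of 22.38 are consistent with a scale of 2 user units per percentage point, drawn leftwards from the reference line at x = 39.5. That scale is inferred from the sample values and is not stated in the text; the function name and the handling of more 'competent' groups are assumptions.

```python
REFERENCE_X = 39.5        # position of the black reference line in the fragment
UNITS_PER_POINT = 2.0     # assumption: inferred from the sample values, not documented

def competence_bar(correct_other, total_other, reference_pct):
    """Return (x, width) of the bar for one distractor group (hypothetical helper)."""
    group_pct = 100.0 * correct_other / total_other
    diff = reference_pct - group_pct          # positive: group is 'less competent'
    width = abs(diff) * UNITS_PER_POINT
    # Assumption: less competent groups extend to the left of the reference line,
    # all others start at the line and extend to the right.
    x = REFERENCE_X - width if diff > 0 else REFERENCE_X
    return round(x, 2), round(width, 2)

print(competence_bar(1993, 3401, 67.16))   # (22.38, 17.12)
```

The difference of 67.16 - 58.60 = 8.56 percentage points, doubled, gives the width of 17.12 found in the fragment.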
<!-- sample boxplot for candidate distribution 'A)' -->
<svg width="320" height="14">
<g style="stroke:#000000; fill:#999999">
<!-- ... -->
<!-- maverick: -->
<circle r="1.5" cy="7" cx="49.599"/>
<!-- whiskers: -->
<line y2="10" y1="4" x2="76.759" x1="76.75999"/>
<line y2="10" y1="4" x2="277.56" x1="277.56"/>
<line y2="7" y1="7" x2="277.56" x1="76.75999"/>
<!-- quartiles: -->
<rect height="10" width="67.040" y="2" x="163.56"/>
<!-- median: -->
<line y2="13" y1="1" x2="192.64" x1="192.64"/>
<!-- mean: -->
<circle style="stroke:#000000; fill:#000000" r="1.5" cy="7" cx="190.32"/>
</g>
</svg>
Table 3. Source code fragment and screenshot of Tukey box plots representing the 'competence' distributions in the five candidate groups, showing that the distractor 'E)' was apparently chosen by relatively 'incompetent' students.
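The values behind such a box plot can be derived as in the following Python sketch. It assumes the conventional Tukey rule (whiskers reach the most extreme values within 1.5 times the interquartile range, anything beyond is drawn as an outlier) and a linear mapping onto the 320-unit-wide drawing; neither the quartile method nor the scaling used by the module is given in the text, and all names are hypothetical.

```python
from statistics import mean, quantiles

def tukey_summary(values):
    """Quartiles, 1.5*IQR whiskers, mean and outliers of one candidate group."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    inside = [v for v in data if q1 - fence <= v <= q3 + fence]
    return {
        "whisker_low": inside[0], "q1": q1, "median": median,
        "q3": q3, "whisker_high": inside[-1], "mean": mean(data),
        "outliers": [v for v in data if v < q1 - fence or v > q3 + fence],
    }

def to_svg_x(fraction, width=320):
    """Map a fraction in [0, 1] to a horizontal coordinate (assumed linear scale)."""
    return round(fraction * width, 2)

# Hypothetical 'competence' fractions of seven candidates
summary = tukey_summary([0.10, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75])
print(summary["outliers"], to_svg_x(summary["median"]))
```

Each entry of the summary corresponds to one element of the fragment: the outliers to the small circles, the whiskers and median to `line` elements, the quartiles to the `rect`, and the mean to the filled circle.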
<path style="stroke:black;stroke-width:0.02;fill:url(#rocGrad13611)" d="M1 0 L1.0 0.0142 L1.0 0.0428 L0.975 0.0428 L0.926 0.328 ..."/>
Table 4. Source code fragment and screenshot of a diagram based on the principle of receiver operating characteristics (ROC). For every possible threshold 'competence' the fraction of correct choices in both candidate groups (either 'relatively competent' or 'relatively incompetent') provides the two coordinates of a continuous ROC curve of a test item. In the case of a positive item discrimination these curves show typical 'bulges', which are highlighted using color gradients. The area under the curve (AUC) is interpreted as an indicator of item quality.
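A minimal Python sketch of an ROC curve and its AUC for one item follows. It uses the textbook construction, which may differ in detail from the one described in the caption: the candidates are split by whether they answered the item correctly, and the 'competence' threshold is swept from high to low. Function names and sample data are hypothetical.

```python
def roc_points(competence, correct):
    """ROC curve of one item, sweeping a competence threshold from high to low.

    x = fraction of wrong-answer candidates at or above the threshold,
    y = fraction of right-answer candidates at or above the threshold.
    """
    pos = sum(1 for c in correct if c)
    neg = len(correct) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need candidates in both groups")
    points, tp, fp = [(0.0, 0.0)], 0, 0
    ranked = sorted(zip(competence, correct), key=lambda pair: -pair[0])
    for i, (score, is_correct) in enumerate(ranked):
        if is_correct:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last candidate sharing this score (ties)
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0])
print(auc(pts))
```

An AUC of 0.5 corresponds to an item that does not discriminate at all, while values approaching 1 produce the pronounced 'bulge' mentioned above.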
<g style="fill:blue;fill-opacity:0.32;stroke:black;stroke-width:.8px;stroke-opacity:0.55;">
<circle xml:id="datapoint_109" cx="45" cy="47.5" r="3.2"/>
<circle xml:id="datapoint_110" cx="19" cy="31.5" r="3.2"/>
<circle xml:id="datapoint_111" cx="24" cy="30.5" r="3.2"/>
<!-- ... -->
</g>
Table 5. Source code fragment and screenshot of a scatterplot aggregating two statistical parameters from a set of test items. The representation as transparent 'bubbles' gives a rough impression of the local density of data points. At the same time this diagram makes it easy to spot outliers.
Table 6. Production overview and screenshot of alternatively generated XHTML with integrated SVG and JavaScript. JavaScript makes parts of the scatterplot responsive and mutable, so that the user can navigate the different test items in this report.
The software module relies on the FOP library from the Apache XML Graphics Project, which is invoked to render the PDF files. The combined use of this print formatter, XSL transformations, and SVG has sped up the extension of the available functionality. At the same time this approach supported a maintainable separation of program logic and output layout. The inclusion of vector graphics allowed the automatic production of integrated files with small file sizes, facilitating network-based distribution of the reports without compromising image quality.
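The module calls FOP as a library from Java. For experimenting with the same chain (XML source, XSL stylesheet, PDF output) outside the application, FOP's command-line launcher accepts the options `-xml`, `-xsl` and `-pdf`. The Python sketch below merely assembles and runs such a call; the file names are hypothetical, and it assumes that a `fop` launcher script is available on the PATH.

```python
import shutil
import subprocess

def fop_command(xml_path, xsl_path, pdf_path, executable="fop"):
    """Build the FOP command line: XML input, XSLT stylesheet, PDF output."""
    return [executable, "-xml", xml_path, "-xsl", xsl_path, "-pdf", pdf_path]

def render_pdf(xml_path, xsl_path, pdf_path):
    """Run FOP if its launcher script is on PATH; raise otherwise."""
    if shutil.which("fop") is None:
        raise FileNotFoundError("the 'fop' launcher was not found on PATH")
    subprocess.run(fop_command(xml_path, xsl_path, pdf_path), check=True)

# Hypothetical file names for the intermediary XML and the report stylesheet
print(" ".join(fop_command("items.xml", "report.xsl", "report.pdf")))
```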