
Using Lex for CGI Scripts

The first issue to address when writing CGI scripts is how to pre-process the input. The data submitted from a form arrives in an encoding that is not naturally amenable to writing regular expressions. For example, space characters are replaced by the plus sign (`+'). Many of the special characters, including plus (`+'), the angle brackets (`<', `>') and the ampersand (`&'), are replaced by a percent sign (`%') followed by two hexadecimal digits. While nothing in this encoding would give Lex any technical difficulty, it is uncomfortable for the human author of the regular expressions.

To deal with this, we pre-process the characters on their way into the scanner Lex generates. We do so by redefining a macro that reads characters and places them into an array which is scanned by the state machine created by Lex. The new macro is shown below.

#define YY_INPUT(buf,result,max_size) \
   { int digit ; \
     int c = getc( yyin ) ; \
     result = c == EOF ? 0 : 1 ; \
     if( c == '%' ) { \
        /* first hex digit of a %XX sequence */ \
        c = getc( yyin ) ; \
        digit = c >= 'A' ? (( c & 0xdf ) \
              - 'A' ) + 10 \
           : c - '0' ; \
        digit *= 16 ; \
        /* second hex digit */ \
        c = getc( yyin ) ; \
        digit += c >= 'A' ? (( c & 0xdf ) \
              - 'A' ) + 10 \
           : c - '0' ; \
        buf[0] = (char) digit ; \
     } \
     else if( c == '+' ) \
        buf[0] = ' ' ; \
     else \
        buf[0] = (char) c ; \
   }
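
The decoding step can also be tried outside the scanner. The following is a minimal sketch in plain C, not the code used in the tutorial: the names hex_value and url_decode are ours, and unlike the macro it checks that both characters after a `%' really are hexadecimal digits, copying a malformed `%' through unchanged.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the URL-encoded string src into dst, writing at most
 * dstsize - 1 characters plus a terminating NUL.
 * Returns the number of characters written (excluding the NUL). */
size_t url_decode(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (dstsize == 0)
        return 0;
    while (*src != '\0' && n + 1 < dstsize) {
        int hi, lo;
        if (*src == '%'
            && (hi = hex_value((unsigned char) src[1])) >= 0
            && (lo = hex_value((unsigned char) src[2])) >= 0) {
            dst[n++] = (char) (hi * 16 + lo);   /* %XX -> one character */
            src += 3;
        } else if (*src == '+') {
            dst[n++] = ' ';                     /* plus stands for a space */
            src++;
        } else {
            dst[n++] = *src++;                  /* everything else is literal */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Given the input if%28+a+%3D%3D+0+%29, this writes the text if( a == 0 ) into dst, which is the form the regular expressions are written against.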

The basic motivation behind using Lex for the CGI scripts is that we can express both the right answers, and in some cases some wrong answers, with regular expressions. The code fragment:

WS	[ \t]*
ARG1	{WS}target{WS}
ARG2	{WS}names{WS}\[{WS}n{WS}\]{WS}
ARGS	{ARG1},{ARG2}|{ARG2},{ARG1}
COND1	{WS}strcmp{WS}\({ARGS}\){WS}=={WS}0{WS}
COND2	{WS}0{WS}=={WS}strcmp{WS}\({ARGS}\){WS}
COND3	{WS}!{WS}strcmp{WS}\({ARGS}\){WS}
COND	{COND1}|{COND2}|{COND3}
%%
answer1={WS}if{WS}\({COND}\){WS}	{
      send_file( "ans5-7.right" ) ;
      return( 0 ) ;
}
[^\r]*			{
      send_file( "ans5-7.wrong" ) ;
      return( 0 ) ;
}
shows the Lex code that looks for the expected answer of:
if( strcmp( target, names[n] ) == 0 )
and most likely correct variations. The strategy here is to break the expected answer down into its components and define regular expressions for all valid versions of each component. For example, here we have an if statement with some condition inside the parentheses. We'll allow any amount of whitespace around the if and around the parentheses. The condition itself might be expressed either as strcmp( ... ) == 0 or as 0 == strcmp( ... ). It might also be !strcmp( ... ) (though the students haven't seen that particular idiom and would be unlikely to use it). So we allow the condition to be either COND1, COND2 or COND3, where each describes one of the options. A similar organization is used for the arguments of the function strcmp(), which ARGS accepts in either order.
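
The same decomposition can be tried outside Lex. The following is a minimal sketch, assuming a POSIX system that provides <regex.h>; it is not the tutorial's code. The function name is_right_answer and the CALL macro are ours, and the pattern is applied to the already decoded answer text without the answer1= prefix. Each component is a string macro, and the C compiler pastes adjacent string literals together in the same way that Lex expands its definitions.

```c
#include <regex.h>
#include <stddef.h>

/* Components of the expected answer, mirroring the Lex definitions. */
#define WS     "[ \t]*"
#define ARG1   WS "target" WS
#define ARG2   WS "names" WS "\\[" WS "n" WS "]" WS
#define ARGS   "(" ARG1 "," ARG2 "|" ARG2 "," ARG1 ")"
#define CALL   "strcmp" WS "\\(" ARGS "\\)"
#define COND1  WS CALL WS "==" WS "0" WS
#define COND2  WS "0" WS "==" WS CALL WS
#define COND3  WS "!" WS CALL WS
#define COND   "(" COND1 "|" COND2 "|" COND3 ")"
#define ANSWER "^" WS "if" WS "\\(" COND "\\)" WS "$"

/* Returns 1 if text is one of the accepted variations, 0 otherwise. */
int is_right_answer(const char *text)
{
    regex_t re;
    int ok;

    /* POSIX extended syntax; we only need a yes/no result. */
    if (regcomp(&re, ANSWER, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With these definitions, is_right_answer accepts if( strcmp( target, names[n] ) == 0 ) as well as if(0==strcmp(names[n],target)), and rejects an answer that omits the comparison with 0.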

If we recognize the string we're looking for (in any of its acceptable variations), we'll copy the file ans5-7.right out to the client. This file is an HTML file that contains a message that the answer was right and a link to the next part of the tutorial. If any other string was seen, we send the file ans5-7.wrong back to the client.
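
The function send_file() itself is not shown here. As a rough illustration only, a helper of this kind might look like the sketch below. Everything in it is an assumption on our part: the name send_file_to, the extra FILE * parameter (added so the sketch can be exercised without a web server), and the choice to emit the Content-type header inside the helper.

```c
#include <stdio.h>

/* Hypothetical stand-in for send_file(): copy the HTML file at path
 * to out, preceded by the CGI header line and the blank line that
 * ends the header. Returns 0 on success, or -1 if the file cannot
 * be opened, in which case nothing is written. */
int send_file_to(const char *path, FILE *out)
{
    FILE *in = fopen(path, "r");
    int c;

    if (in == NULL)
        return -1;
    fputs("Content-type: text/html\n\n", out);
    while ((c = getc(in)) != EOF)
        putc(c, out);
    fclose(in);
    return 0;
}
```

In a CGI program the second argument would simply be stdout, since the server passes whatever the script writes there back to the client.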

For a one-of-many multiple choice question, the Lex code is much simpler. Here we just identify which of a few fixed strings is being sent and then respond with the appropriate file. This is illustrated in this example.

"answer5-5=A"\&?	{
   send_file( "ans5-5.wronga" ) ; return( 0 ) ;
}
"answer5-5=B"\&?	{
   send_file( "ans5-5.wrongb" ) ; return( 0 ) ;
}
"answer5-5=C"\&?	{
   send_file( "ans5-5.wrongc" ) ; return( 0 ) ;
}
"answer5-5=D"\&?	{
   send_file( "ans5-5.wrongd" ) ; return( 0 ) ;
}
"answer5-5=E"\&?	{
   send_file( "ans5-5.right" ) ; return( 0 ) ;
}

The many-of-many multiple choice question is more interesting. For this question, there are many combinations of right and wrong selections, and we would like to send a response appropriate to each one. If we created an HTML file for each possibility, large portions would be repeated across many of the files. Instead, we generate the response on the fly by choosing to include or exclude each of several responses, each of which addresses what's wrong with one of the selections. If we have five possible answers, each of which is either selected or not, there are 2^5 = 32 possible response pages that can be generated. Lex code that does this processing is exemplified in Figure 5. (This code fragment has been abbreviated, showing only three of the five response components, in order to make it fit on the page.)
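
The include-or-exclude idea can be sketched in C independently of the scanner. The following is our own illustration and not the code of Figure 5: the function name write_response, the bit-mask representation of the selections, and the remark texts are all hypothetical.

```c
#include <stdio.h>

#define NCHOICES 5

/* Hypothetical remarks, one per choice; the tutorial's real response
 * text is not reproduced here. */
static const char *remark[NCHOICES] = {
    "Choice A was handled incorrectly.",
    "Choice B was handled incorrectly.",
    "Choice C was handled incorrectly.",
    "Choice D was handled incorrectly.",
    "Choice E was handled incorrectly."
};

/* Bit i of selected is set if the student checked choice i; bit i of
 * correct is set if choice i should be checked. One remark is written
 * for every choice where the two disagree, so five choices yield
 * 2^5 = 32 distinct pages from only five stored components.
 * Returns the number of remarks written. */
int write_response(unsigned selected, unsigned correct, FILE *out)
{
    unsigned wrong = (selected ^ correct) & ((1u << NCHOICES) - 1u);
    int i, count = 0;

    for (i = 0; i < NCHOICES; i++) {
        if (wrong & (1u << i)) {
            fprintf(out, "<p>%s</p>\n", remark[i]);
            count++;
        }
    }
    if (count == 0)
        fputs("<p>All selections are correct.</p>\n", out);
    return count;
}
```

For example, if choices A and C are correct (mask 0x05) and the student checks A, B and C (mask 0x07), only the remark for B is included in the page.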



Figure 5:  Example Lex Code for Scanning a Many-of-Many Multiple Choice Question


Brian L. Stuart
Mon Aug 12 16:20:19 CDT 1996