The Dragon Parser Generator

2 Getting started
Back to TOC

To get into the flow of dragon, we start with a small sample grammar description which can be feeded directly into dragon. The grammar given below can be created using any simple text editor.

@ Sample Dragon description 
@ 
HEADER 
IGNORETOKEN STRINGVAL
SEPIGNORE '\t' 
SEPIGNORE ' '
END

TOKENSET 
a  :  a
b  :  b
c  :  c    
END 

PRODUCTIONSET
Start	:	S a  ; A1
S	:	C STRINGVAL C ; A2
C	:	b C ; A5
C	:	c ; A3
END

As we look to the grammar description more closely, the file consists of three sections.

At the beginning, there is a header section beginning with a HEADER-Token and ending with the first END-Token. Any lines above the HEADER-Token line are ignored and can be used for comments.

Inside the header section, some general rules are declared for the parsing procedure. In our example, two whitespace separator tokens are declared (Space and Tab). This declaration tells dragon scanner to treat these tokens just as token separators and don't return them to the dragon parser. In general, you can use any kind of characters for token separators, but in most cases, whitespace characters like SPACE and TAB seem to be appropriate.

The next section beginning with the keyword TOKENSET describes the set of lexical elements the dragon scanner should detect. Each line inside this section contains a regular expression describing the token detection rule and a corresponding token, which is returned by the scanner on detection. In the sample above, we declare three tokens a, b and c which are returned on the corresponding regular expressions as defined above. Note, that the notation of the regular expressions is very similar to BNF ( Backus Naur Form) and is described more detailed later on.

At last, the language grammar is described in the section opened with the keyword PRODUCTIONSET. Each line consists of the production name, a first separator ( ':') , the production defintion, a second separator ( ';') and the semantic action. The second separator and the semantic action are optional and can be omitted. As we look to the production definition, we see a sequence of known token symbols and other known production names building the production.

2.1 Java code generation

Based on this grammar defintion, we will generate now the parser code using the dragon generator tool. For this, cut the code written above and paste it in a file called G1.def. From a Unix shell, invoke now dragon with

game > dragon -t java -p G1 
	Reading Header ...
	Reading Terminals ...
	Reading Productions ...
	Checking Productions ...
	Analysing ...
	Starting LALR analysis ...
	Reached LR(0) elements: 
	1...3...4...5...
	Lookahead calculation ...
	X...X...
	Closure creation ...
	.....
	Creating syntax table ...
	Reached LR(1) elements: 
	0...5
	Generating Code ...

This produces the file G1.java with a class description for the parser class G1. For a complete parser application, this class must be derived and some methods have to implemented.

class G1Main extends G1
{
	String input = new String (c c a);
	int i=0;

	G1Main()
	{

	}
	public char nextChar() 
	{  
		if ( i < input.length() )
		{
			char c = input.charAt(i);
			i++;
			return c;
		}
		else
			return 0;
	};

	public void backChar() {}
	public static void main(String[] args)  
	{
		G1Main g1 = new G1Main();
		try {	
			g1.parse();

		} catch ( Exception e ) {
			e.printStackTrace();
		}
	}
}

2.2 C++ code generation

To get C++ parser code, you have to ommit the -j option during for the dragon generation call.

$ dragon -t c++ -p G1
Reading Header ...
Reading Terminals ...
Reading Productions ...
Checking Productions ...
Analysing ...
Reached LR(1) elements: 
1...5...10...11...
Generating Code ..

This produces the following files

G1.h
G1.cc

To complete the code for the compiler, you still have to add a main routine, scanner feed routines and some semantic actions. This can be done with the following program

//
// g1main.cc
// ---------
// Testmodul for dragon
//
// by Bjoern Lemke, (C)opyright 2003 by Bjoern Lemke

#include Chain.h
#include G1.h
#include Exception.h

class G1Main  : public G1 {
public:
    G1Main(char *pC) : G1()  {  _pC = pC; _i = 0;  }
    char nextChar() {
        if (_pC[_i])
        {
            _i++;
            return (_pC[_i-1]);
        }
        return 0;
    }
    void backChar() {  _i--;  return;  }
    void A1() { cout << A1 is called << endl; }
    void A2() { cout << A2 is called << endl; }
private:
    int _i;
    char *_pC;  
};

main(int argc, char** argv)
{
    G1Main g(argv[1]);

    try {
      if (g.parse())
        cout << Parse OK << endl;
      else
        cout << Parse Failed << endl;
    }
    catch (Exception e)
    {
        cout << e.asString() << endl;
    }
}

Now you can compile and assemble the modules to an executable

$ g++ -I../base/ -c g1main.cc
$ g++ -I ../base /-c G1.cc
$ g++ -o g1 g1main.o G1.o ../base/libBase.a

Please note : The libBase library is available as opensource on www.lemke-it.com

2.3 GoLang code generation

bigmac > dragon -t golang -p G1 
Reading Header ...
Reading Terminals ...
Reading Productions ...
Checking Productions ...
Analysing ...
Starting LALR analysis ...
Calculating LR(0) elements ... 
Lookahead calculation ...
Closure creation ...
Creating syntax table ...

This produces the file G1.go with a parser implementation. For the semantic action we implement the main module checkParser.go

package main

import "fmt"
import "os"
import "G1"

type MyAction struct {
   feed string
   i int
   tl []string
   isReserved bool
   reservedToken G1.Token
}
 
func ( ma *MyAction ) A1() {
        fmt.Printf("Action A1 called ...\n")
	fmt.Printf("TL Size = %d\n", len(ma.tl))
}
 
func ( ma *MyAction ) A2() {
        fmt.Printf("Action A2 called ...\n")
}

func ( ma *MyAction ) A3() {
        fmt.Printf("Action A3 called ...\n")
}

func ( ma *MyAction ) A5( ) {
        fmt.Printf("Action A5 called ...\n")

	fmt.Printf("TL Size = %d\n", len(ma.tl))
	for i, t := range ma.tl {
	   fmt.Printf("At pos %d, Token=%s\n", i, t)
	}
}

func ( ma *MyAction ) SetTokenList( tl []string ) {
        ma.tl = tl
}

func ( ma *MyAction ) NextChar() byte {
 
   if ma.i < len(ma.feed) {
      fmt.Printf("Next Char = %c\n", ma.feed[ma.i])

      c := ma.feed[ma.i]
      ma.i++

      return c 
   }
   // returning zero indicates end of scanner stream
   return 0
}
 
func ( ma *MyAction ) BackChar() {
     fmt.Println("BackChar called")
     ma.i--
}

func ( ma *MyAction ) IsReserved() bool {

     // for a simple scanner, this method can return false
     return false
}

func ( ma *MyAction ) GetAndResetReserved() G1.Token {

     // since IsReserved never returns true, this method is never called
     // so we can return any token value
     return G1.Token_ENDTOKEN
}

func main() {
   if len(os.Args) < 2 {
      fmt.Printf("Argument missing\n")
   } else {
 
      fmt.Printf("Testing parser with %s %d\n", os.Args[1], len(os.Args))
 
      ma :=  MyAction { i: 0, feed: os.Args[1], isReserved: false }
      p := G7.NewParser( &ma ) // { feed: os.Args[1] } ) )
      if p.Parse() {
         fmt.Println("Parse OK")
      } else {
         fmt.Printf("Parse Failed : %s\n", p.Error)
      }
   }
}