Friday, October 19, 2012

Using the ANTLR C++ Grammar for C Target

From the ANTLR site:
ANTLR is a language tool that provides a framework for constructing recognisers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
Unfortunately, the grammars publicly listed at the site don't normally come with an easy-to-follow set of instructions. This post aims to provide such instructions for the C++ grammar for the C target.

First, you will need to install the ANTLR C runtime library. Only then can you use the ANTLR C++ grammar files.

The ANTLR C Runtime Library

An alternate set of instructions can be found at the ANTLR site.
  1. Download the library (v3.4 at the time of this post):
    wget http://www.antlr.org/download/C/libantlr3c-3.4.tar.gz
  2. Untar and change into the extracted directory:
    tar -zxvf libantlr3c-3.4.tar.gz
    cd libantlr3c-3.4
  3. Configure:
    ./configure --enable-64bit
    (Note that you need the --enable-64bit flag if your machine is 64-bit; otherwise you'll run into problems later on.)
  4. Compile:
    make
  5. Fix any compilation errors. For example, if you encounter the error "/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory", a quick Google search leads to a Stack Overflow question that suggests installing glibc-devel.i386 via yum if your machine is running CentOS 5.8.
  6. Repeat steps 3-5 until there are no compilation errors
  7. Install:
    sudo make install
  8. Export the library path:
    export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
    
    

The ANTLR C++ Grammar

  1. Download the ANTLR Java library. You will need this to process the .g grammar file:
    wget http://www.antlr.org/download/antlr-3.4-complete.jar
  2. Download the grammar file:
    wget http://www.antlr.org/grammar/1295920686207/antlr3.2_cpp_parser4.1.0.zip
  3. Extract files:
    unzip antlr3.2_cpp_parser4.1.0.zip
  4. Process the .g grammar file:
    • Run ANTLR on the grammar:
      java -jar /path/to/antlr-3.4-complete.jar CPP_grammar_.g
    • Rename the generated C source files so that they are compiled as C++:
      mv CPP_grammar_Lexer.c CPP_grammar_Lexer.cpp
      mv CPP_grammar_Parser.c CPP_grammar_Parser.cpp
  5. Edit CPP_grammar_Parser.cpp and comment out line 29639. It contains an extra ');' that prevents the file from compiling.
  6. Edit cpp_full_prog.cpp at line 239 and replace
    input = antlr3AsciiFileStreamNew(fName);
    with:
    input = antlr3FileStreamNew(fName, 4);
    The integer 4 is the value of the constant ANTLR3_ENC_8BIT, defined in antlr3defs.h (part of the C runtime library installed in the first section).
  7. Compile and link:
    g++ -o cpp_full_prog cpp_full_prog.cpp CPP_grammar_Lexer.cpp CPP_grammar_Parser.cpp Helper/*.cpp -I/usr/local/include/ -I./antlr_include/ -L/usr/local/lib/ -lantlr3c
For more information on how to extract information from the lexer, tokens, or parser, refer to the C runtime library API.

Essentially, the ANTLR C runtime library works by defining a series of structs, each of which holds function pointers that are assigned at run time. This way, each struct can be given the functions that should be used for that struct. The downside is slightly odd-looking function calls like "currToken->getType(currToken)", where the struct has to be passed explicitly as the first argument.
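
To make the pattern concrete, here is a minimal sketch of a struct that carries its own function pointers. It does not use the ANTLR runtime at all: DemoToken, demoTokenNew, and the release member are hypothetical names invented for this illustration, and the real runtime structs have many more members. The malloc result is cast so that the snippet also compiles as C++, matching the renamed .cpp files above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token type for illustration only; NOT the runtime's real struct. */
typedef struct DemoToken DemoToken;

struct DemoToken {
    int type;                                   /* token type code */
    const char *text;                           /* token text (not owned by the token) */
    int (*getType)(DemoToken *self);            /* "method" slots, filled in by the constructor */
    const char *(*getText)(DemoToken *self);
    void (*release)(DemoToken *self);
};

/* The implementations are ordinary functions that take the struct explicitly. */
static int demoGetType(DemoToken *self) { return self->type; }
static const char *demoGetText(DemoToken *self) { return self->text; }
static void demoRelease(DemoToken *self) { free(self); }

/* Constructor: allocates the struct and wires up its function pointers. */
DemoToken *demoTokenNew(int type, const char *text)
{
    DemoToken *tok;

    assert(text != NULL);
    tok = (DemoToken *)malloc(sizeof *tok);
    if (tok == NULL) {
        return NULL;
    }
    tok->type = type;
    tok->text = text;
    tok->getType = demoGetType;
    tok->getText = demoGetText;
    tok->release = demoRelease;
    return tok;
}
```

With this in place, a caller creates a token with demoTokenNew(7, "int"), reads it with tok->getType(tok), and disposes of it with tok->release(tok). The repeated "tok" is the price of emulating methods in C: the function pointer selects the implementation, but nothing passes the struct implicitly.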
