Bison Internals

GNU/Bison is a LALR parser generator written in C. It can generate C and C++ code that implements LALR and GLR parsers. This project is an effort to document internal data structures of Bison.

Monday, May 15, 2006

Begin main.c

Here is a brief outline of the entry point to bison - the file src/main.c

1. Initialize uniqstrs table by calling uniqstrs_new(); Apparently bison maintains a table of unique strings.
Source: uniqstrs.h, uniqstrs.c

2. Parse the command line arguments by calling getargs(argc,argv). This function uses GNU getopt to parse the command line and sets all the options (requested by user) globally. It also takes up the responsibility to display help info when invoked with --help option and the version info when invoked with -V option.
Source: getargs.h, getargs.c

3. Initialize a "Muscle table" by calling muscle_init() - looks like this is something related to M4 processing at the backend.
source: muscle_tab.h, muscle_tab.c

4. All action begins now. A call to reader() will read the input grammar specification and make an internal representation of the grammar.
source: reader.h, reader.c

5. A call to reduce_grammar() will get rid of useless Non-terminals and productions.
source: reduce.h, reduce.c

6. Record "other info" about grammar:
derives_compute(): umm, I think it is computing the derieved non-terminals for each pair of non-terminal symbols - this is useful for computing gotos as explained in sec 4.7 (under Efficient construction of LALR parsing tables) in the "Dragon book" by Aho, Ullman and Sethi;)
source: derives.h, derives.c
nullable_compute(): Calculate the set of non-terminals that can expand to the null string. This is also expalined in the same section in the book.
source: nullable.h, nullable.c

7. Generate parser states (the LR(0) items) by calling generate_states(). A state is represented by a very elobarate data structure found in state.h. This fucntion will move the dot over various symbols in the grammar productions and find all the LR0 items (which actually represent states of an NFA that recognizes the viable prefixes of the grammar). I guess only the "Kernel" items are generated.
source: LR0.h, LR0.c

8. The generated LR0 items form a NFA; we have to convert it into a DFA by subset construction (or sets of items) - This is done by calling lalr(); The computation of lookahead tokens is also done here (see algorithm 4.12 of dragon book) and then my guess is alogrithm 4.13 follows (Efficient computation of Kernels of LALR(1) collection of sets of items)
source: lalr.h, lalr.c

9. Now any Shift/reduce or reduce/reduce conflicts can be solved using precedence info supplied by grammar-writer. This is done by calling conflicts_solve() and conflicts_print() [to report conflicts to user]
source: conflicts.h, conflicts.c

10. The parser tables are now generated from the DFA created above. A call to tables_generate() will achieve this.
source: tables.h, tables.c

11. Here there is also a call to grammar_rules_never_reduced_report() which will report rules that will never be reduced because of conflicts
source: gram.h, gram.c

12. compute_output_file_names() - computes o/p file names. If your input file is parser.y then the ouput will be parser.tab.c
source: files.h, files.c

13. Here there are a couple of decisions - If you asked for a detailed report (--verbose) then a call to print_results() will print out the results to parser.output file.
If you also asked for a graph (VCG) a call to print_graph() will achieve that.
sources: print.h, print.c, print_graph.h, print_graph.c

14. Now it is time output the actual parser.tab.c file; the computed tables are used to output the parser file. A function call to output() will do the job.
source: output.h, output.c

15. The next task is to free up all the memory allocated by bison. So various _free functions do this job here.
source: each free fucntion is found in the corresponding c file; e.g., lalr_free() is found in lalr.c

That is a moderately simplified description of main.c. I skipped little parts where it checks for errors from various steps;

Based on these steps, bison can be split into the front-end, core and back-end; The steps 1-4 comprise the front-end, 5-11 the core and the rest is the backend; The backend is also resposible for generating a C++ parser instead of a C parser if requested by the user.

Whoever said opensource software is not organized well is 100% wrong! that is a really well organized structure; dont you think!

0 Comments:

Post a Comment

<< Home