bisonc++

Version: 332443 (ubuntu - 24/10/10)

Section: 1 (User commands)

NAME

bisonc++ - Generate a C++ parser class and parsing function

SYNOPSIS

bisonc++ [OPTIONS] grammar-file

DESCRIPTION

The program bisonc++ is based on previous work on bison by Alain Coetmeur (coetmeur@icdc.fr), who created in the early '90s a C++ class encapsulating the yyparse function as generated by the GNU-bison parser generator.

Initial versions of bisonc++ (up to version 0.92) wrapped Alain's program in a program offering a more modern user interface, removing all old-style (C) %define directives from bison++'s input specification file (see below for an in-depth discussion of the differences between bison++ and bisonc++). Starting with version 0.98, bisonc++ is compiled from a complete rebuild of the parser generator, closely following the description in Aho, Sethi and Ullman's Dragon Book. Moreover, starting with version 0.98 bisonc++ is a C++ program, rather than a C program generating C++ code.

Bisonc++ expands the concepts initially implemented in bison and bison++, offering a cleaner setup of the generated parser class. The parser class is derived from a base-class, mainly containing the parser's token- and type-definitions as well as several member functions which should not be (re)defined by the programmer.

Most of these base-class members could also have been defined directly in the parser class, but were placed in the parser's base class instead. This design results in a very lean parser class, declaring only members that are actually defined by the programmer or that must be defined by bisonc++ itself (e.g., the member function parse as well as those support functions requiring access to facilities that are only available in the parser class itself, rather than in the parser's base class).

Moreover, this design does not require the use of virtual members: the members which are not involved in the actual parsing process may always be (re)implemented directly by the programmer. Thus there is no need to apply or define virtual member functions.

In fact, there are only two public members in the parser class generated by bisonc++: setDebug (see below) and parse. Remaining members are private, and those that can be redefined by the programmer using bisonc++ usually receive initial, very simple default in-line implementations. The (partial) exception to this rule is the member function lex, producing the next lexical token. For lex either a standardized interface or a mere declaration is offered (requiring the programmer to provide a tailor-made implementation for lex).

To enforce a primitive namespace, bison used a well-known naming-convention: all its public symbols started with yy or YY. Bison++ followed bison in this respect, even though a class by itself offers enough protection of its identifiers. The present author feels that these yy and YY conventions are outdated, and consequently bisonc++ does not generate any symbols starting with yy or YY, neither in the parser (base) class nor in the parser function. Instead, following a suggestion by Lakos (2001), all data members have names starting with d_, and all static data members have names starting with s_. This convention was not introduced to enforce identifier protection, but to clarify the storage type of variables. Other (local) symbols lack specific prefixes. Furthermore, bisonc++ allows its users to define the parser class in a namespace of their own choice.
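
The class layout and naming conventions described above can be pictured with a small hand-written sketch. It is not the code bisonc++ emits: the names ParserBase, Demo, d_debug, s_nParsers and count, as well as the token values, are assumptions chosen for illustration. Only parse, setDebug, lex, error and the d_ and s_ prefixes come from the text.

```cpp
#include <iostream>

namespace Demo                      // hypothetical namespace
{
    // Stand-in for the generated base class: tokens and shared data.
    class ParserBase
    {
        public:
            enum Tokens__ { IDENT = 257, NUMBER };  // illustrative values

        protected:
            bool d_debug = true;        // d_ prefix: non-static data member
            static int s_nParsers;      // s_ prefix: static data member
    };

    int ParserBase::s_nParsers = 0;

    // Lean derived class; none of its members needs to be virtual.
    class Parser: public ParserBase
    {
        public:
            Parser() { ++s_nParsers; }

            void setDebug(bool mode) { d_debug = mode; }

            int parse()                 // placeholder, not table driven
            {
                if (lex() == 0)         // 0: end of input
                    return 0;
                error("Syntax error");
                return 1;
            }

            static int count() { return s_nParsers; }   // hypothetical

        private:
            int lex() { return 0; }     // simple default: no tokens at all
            void error(char const *msg) { std::cerr << msg << '\n'; }
    };
}
```

Because lex and error are ordinary members of the derived class, replacing their bodies requires neither virtual functions nor changes to the base class.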

Bisonc++ should be used as follows:

o
As usual, a grammar must be defined. Using bisonc++ this is no different, and the reader is referred to bison's documentation for details about specifying and decorating grammars.
o
The number and function of the various %define declarations as used by bison++, however, is greatly modified. Actually, all %define declarations are replaced by their (former) first arguments. Furthermore, `macro-style' declarations are no longer supported or required. Finally, all directives use lower-case characters only and do not contain underscore characters (but sometimes hyphens). E.g., %define DEBUG is now declared as %debug; %define LSP_NEEDED is now declared as %lsp-needed (note the hyphen).
o
As noted, no `macro style' %define declarations are required anymore. Instead, the normal practice of defining class members in source files and declaring them in a class header file can be adhered to using bisonc++. Basically, bisonc++ concentrates on its main tasks: the definition of an initial parser class and the implementation of its parsing function int parse, leaving all other parts of the parser class' definition to the programmer.
o
Having specified the grammar and (usually) some directives bisonc++ is able to generate files defining the parser class and the implementation of the member function parse and its support functions. See the next section for details about the various files that may be written by bisonc++.
o
All members (except for the member parse and its support functions) must be implemented by the programmer. Of course, additional member functions should also be declared in the parser class' header. At the very least the member int lex() must be implemented (although a standardized implementation can also be generated by bisonc++). The member lex is called by parse (and its support functions) to obtain the next available token. The member function void error(char const *msg) may also be re-implemented by the programmer, but a basic in-line implementation is provided by default. The member function error is called when parse detects (syntactic) errors.
o
The parser can now be used in a program. A very simple example would be:
 
     int main()
     {
         Parser parser;
         return parser.parse();
     }
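
To make the interplay between parse, lex and error concrete, the following self-contained sketch uses a hand-written stand-in for the generated class. Its parse merely checks that parentheses are balanced, whereas a generated parse is table driven. The data members d_input and d_pos, the constructor argument and the helper runParser are assumptions for illustration; only the member names parse, lex and error come from the text.

```cpp
#include <iostream>
#include <string>

// Hand-written stand-in for a generated parser class (illustration only).
class Parser
{
    std::string d_input;            // hypothetical token source
    std::size_t d_pos = 0;

    public:
        explicit Parser(std::string const &input) : d_input(input) {}

        // Returns 0 on success, 1 when a syntactic error was detected.
        int parse()
        {
            int depth = 0;
            for (int token = lex(); token != 0; token = lex())
            {
                if (token == '(')
                    ++depth;
                else if (token != ')' || --depth < 0)
                {
                    error("Syntax error");
                    return 1;
                }
            }
            if (depth != 0)
            {
                error("Syntax error");
                return 1;
            }
            return 0;
        }

    private:
        // Produces the next token; 0 signals the end of the input.
        int lex()
        {
            return d_pos < d_input.size() ? d_input[d_pos++] : 0;
        }

        // Called by parse when a syntactic error is detected.
        void error(char const *msg)
        {
            std::cerr << msg << '\n';
        }
};

// Mirrors the body of the main function shown above.
int runParser(std::string const &input)
{
    Parser parser(input);
    return parser.parse();
}
```

With this stand-in, runParser("(())") yields 0, while runParser("(()") reports a syntax error and yields 1.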
         
 

GENERATED FILES

Bisonc++ may create the following files:

o
A file containing the implementation of the member function parse and its support functions. The member parse is a public member that can be called to parse a token-sequence according to a specified LALR(1) type grammar. The implementations of these members are by default written to the file parse.cc. There should be no need for the programmer to alter the contents of this file, as its contents change whenever the grammar is modified. Hence it is rewritten by default. The option --no-parse-member may be specified to prevent this file from being (re)written. In normal circumstances, however, this option should be avoided.
o
A file containing an initial setup of the parser class, containing the declaration of the public member parse and of its (private) support members. The members error and print receive default in-line implementations which may be altered by the programmer. The member lex may receive a standard in-line implementation (see below), or it will merely be declared, in which case the programmer must provide an implementation. Furthermore, new members may be added to the parser class as well. By default this file is only created if it does not yet exist, using the filename <parser-class>.h (where <parser-class> is the name of the defined parser class). The option --force-class-header may be used to (re)write this file, even if it already exists.
o
A file containing the parser class' base class. This base class should not be modified by the programmer. It contains types defined by bisonc++, as well as several (protected) data members and member functions, which should not be redefined by the programmer. All symbolic parser terminal tokens are defined in this class, which thus escalates these definitions to a separate class (cf. Lakos (2001)), preventing circular dependencies between the lexical scanner and the parser. Circular dependencies occur in situations where the parser needs access to the lexical scanner class to define a lexical scanner as one of its data members, whereas the lexical scanner, in turn, needs access to the parser class to know about the grammar's symbolic terminal tokens. Escalation is a way out of such circular dependencies. By default this file is (re)written any time bisonc++ is called, using the filename <parser-class>base.h. The option --no-baseclass-header may be specified to prevent the base class header file from being (re)written. In normal circumstances, however, this option should be avoided.
o
A file containing an implementation header. An implementation header may be included by source files implementing the various member functions of a class. The implementation header first includes its associated class header file, followed by any directives (formerly defined in the %{header ... %} section of the bison++ parser specification file) that are required for the proper compilation of these member functions. The implementation header is included by the file defining parse. By default the implementation header is created if not yet existing, receiving the filename <parser-class>.ih. The option --force-implementation-header may be used to (re)write this file, even if already existing.
o
A verbose description of the generated parser. This file is comparable to the verbose output file originally generated by bison++. It is generated when the option --verbose or -V is provided. When generated, it will use the filename <grammar>.output, where <grammar> is the name of the file containing the grammar definition.
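
The escalation of token definitions into the base class can be sketched in a single translation unit. The three classes below stand in for the contents of three separate headers; the class name ParserBase, the token values, the scanner's constructor and the helper nextToken are assumptions for illustration and are not taken from generated code.

```cpp
#include <string>

// --- would live in the base class header -------------------------------
class ParserBase                    // hypothetical name for the base class
{
    public:
        enum Tokens__ { IDENT = 257, NUMBER };   // illustrative values
};

// --- would live in the scanner's header: needs only the base class -----
class Scanner
{
    std::string d_text;
    std::size_t d_pos = 0;

    public:
        explicit Scanner(std::string const &text = "") : d_text(text) {}

        int yylex()                 // next token, 0 at the end of input
        {
            if (d_pos == d_text.size())
                return 0;
            char ch = d_text[d_pos++];
            return ch >= '0' && ch <= '9' ? ParserBase::NUMBER
                                          : ParserBase::IDENT;
        }
};

// --- would live in the parser class header: needs base and scanner -----
class Parser: public ParserBase
{
    Scanner d_scanner;              // composed scanner object

    public:
        explicit Parser(std::string const &text) : d_scanner(text) {}

        int nextToken() { return d_scanner.yylex(); }   // hypothetical
};
```

The scanner depends on the base class only, and the parser depends on both, so no header needs to include the parser class header before the parser class itself is defined.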

OPTIONS

If available, single letter options are listed between parentheses following their associated long-option variants. Single letter options require arguments if their associated long options require arguments as well. Options affecting the class header or implementation header file are ignored if these files already exist.
o
--analyze-only (-A)
Only analyze the grammar. No files are (re)written. This option can be used to test the grammatical correctness of modifications `in situ', without overwriting previously generated files. If the grammar contains syntactic errors only syntax analysis is performed.
o
--baseclass-preinclude=header (-H)
Use header as the pathname to the file preincluded in the parser's base-class header. This option is useful in situations where the base class header file refers to types which might not yet be known. E.g., with %union a std::string * field might be used. Since the class std::string might not yet be known to the compiler once it processes the base class header file we need a way to inform the compiler about these classes and types. The suggested procedure is to use a pre-include header file declaring the required types. By default header is surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by pointed brackets #include <header> is included. In the latter case, quotes might be required to escape interpretation by the shell (e.g., using -H '<header>').
o
--baseclass-header=header (-b)
Use header as the pathname of the file containing the parser's base class. This class defines, e.g., the parser's symbolic tokens. Defaults to the name of the parser class plus the suffix base.h. It is generated, unless otherwise indicated (see --no-baseclass-header and --dont-rewrite-baseclass-header below).
o
--baseclass-skeleton=skeleton (-B)
Use skeleton as the pathname of the file containing the skeleton of the parser's base class. Its filename defaults to bisonc++base.h.
o
--class-header=header (-c)
Use header as the pathname of the file containing the parser class. Defaults to the name of the parser class plus the suffix .h.
o
--class-skeleton=skeleton (-C)
Use skeleton as the pathname of the file containing the skeleton of the parser class. Its filename defaults to bisonc++.h. The environment variable BISON_SIMPLE_H is not inspected anymore.
o
--construction
This option may be specified to write details about the construction of the parsing tables to the standard output stream. This information is primarily useful for developers, and augments the information written to the verbose grammar output file, produced by the --verbose option.
o
--debug
Provide parse and its support functions with debugging code, showing the actual parsing process on the standard output stream. When included, the debugging output is active by default, but its activity may be controlled using the setDebug(bool on-off) member. Note that no #ifdef DEBUG macros are used anymore. By rerunning bisonc++ without the --debug option an equivalent parser is generated not containing the debugging code.
o
--error-verbose
When a syntactic error is reported, the generated parse function will dump the parser's state stack to the standard output stream. The stack dump shows on separate lines a stack index followed by the state stored at the indicated stack element. The first stack element is the stack's top element.
o
--filenames=filename (-f)
Specify a filename to use for all files produced by bisonc++. Specific options overriding particular filenames are also available (which then, in turn, override the name specified by this option).
o
--force-class-header
By default the generated class header is not overwritten once it has been created. This option can be used to force the (re)writing of the file containing the parser's class.
o
--force-implementation-header
By default the generated implementation header is not overwritten once it has been created. This option can be used to force the (re)writing of the implementation header file.
o
--help (-h)
Write basic usage information to the standard output stream and terminate.
o
--implementation-header=header (-i)
Use header as the pathname of the file containing the implementation header. Defaults to the name of the generated parser class plus the suffix .ih. The implementation header should contain all directives and declarations only used by the implementations of the parser's member functions. It is the only header file that is included by the source file containing parse's implementation. User-defined implementations of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the parser class in one header file.
o
--implementation-skeleton=skeleton (-I)
Use skeleton as the pathname of the file containing the skeleton of the implementation header. Its filename defaults to bisonc++.ih.
o
--include-only
All grammar files are concatenated to the standard output stream in their order of processing. Following this, bisonc++ terminates.
o
--insert-stype
This option is only effective if the debug option (or %debug directive) has also been specified. When insert-stype has been specified the parsing function's debug output will also show selected semantic values. It should only be specified if objects or variables of the semantic value type STYPE__ can be inserted into ostreams.
o
--lines (-l)
Put #line preprocessor directives in the file containing the parser's parse function. By including this option the compiler and debuggers will associate errors with lines in your grammar specification file, rather than with the source file containing the parse function itself.
o
--max-inclusion-depth=value
Set the maximum number of nested grammar files. Defaults to 10.
o
--namespace=namespace (-n)
Define the parser base class, the parser class and the parser implementations in the namespace namespace. By default no namespace is defined. If this option is used the implementation header will contain a commented out using namespace declaration for the requested namespace.
o
--no-baseclass-header
Do not write the file containing the parser class' base class, even if that file doesn't yet exist. By default the file containing the parser's base class is (re)written each time bisonc++ is called. Note that this option should normally be avoided, as the base class defines the symbolic terminal tokens that are returned by the lexical scanner. By suppressing the construction of this file any modification in these terminal tokens will not be communicated to the lexical scanner.
o
--no-lines
Do not put #line preprocessor directives in the file containing the parser's parse function. This option is primarily useful in combination with the %lines directive, to suppress that directive. It also overrules option --lines, though.
o
--no-parse-member
Do not write the file containing the parser's predefined parser member functions, even if that file doesn't yet exist. By default the file containing the parser's parse member function is (re)written each time bisonc++ is called. Note that this option should normally be avoided, as this file contains parsing tables which are altered whenever the grammar definition is modified.
o
--parsefun-skeleton=skeleton (-P)
Use skeleton as the pathname of the file containing the parsing member function's skeleton. Its filename defaults to bisonc++.cc. The environment variable BISON_SIMPLE is not inspected anymore.
o
--parsefun-source=source (-p)
Define source as the name of the source file containing the parser member function parse. Defaults to parse.cc.
o
--print=matched-text-function
The print option provides an implementation of the Parser class's print function displaying the current token value and the text matched by the lexical scanner as received by the generated parse function. The option value matched-text-function must be set to a function call expression returning the matched text. E.g.,
 
     --print="d_scanner.YYText()"
                 
 
o
--required-tokens=number
Following a syntactic error, require at least number successfully processed tokens before another syntactic error can be reported. By default number is zero.
o
--scanner=header (-s)
Use header as the pathname to the file defining a class Scanner, offering a member int yylex() producing the next token from the input stream to be analyzed by the parser generated by bisonc++. When this option is used the parser's member int lex() is predefined as
 
     int lex()
     {
         return d_scanner.yylex();
     }
                 
 
and an object Scanner d_scanner is composed into the parser. The d_scanner object is constructed using its default constructor. If another constructor is required, the parser class may be provided with an appropriate (overloaded) parser constructor after having constructed the default parser class header file using bisonc++. By default header is surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by pointed brackets #include <header> is included. In the latter case, quotes might be required to escape interpretation by the shell (e.g., using -s '<header>').
o
--scanner-debug
Show the scanner's matched rules and returned tokens. This displays the rules and tokens matched and returned by bisonc++'s scanner, not the tokens received by the generated parser. If that is what you want use the --print option.
o
--scanner-token-function=function-call
The scanner function returning the next token, called from the generated parser's lex function. A complete function call expression should be provided (including a scanner object, if used). This option overrules the d_scanner.yylex() call used by default when the %scanner directive is specified. Example:
 
     --scanner-token-function "d_scanner.lex()"
                 
 
o
--show-filenames
Write the names of the generated files to the standard error stream.
o
--skeleton-directory=directory
Specifies the directory containing the skeleton files to use. This option can be overridden by the specific skeleton-specifying options (-B, -C, -I, and -P).
o
--thread-safe
No static data are modified, making bisonc++ thread-safe.
o
--usage
Write basic usage information to the standard output stream and terminate.
o
--verbose (-V)
Write a file containing verbose descriptions of the parser states and what is done for each type of look-ahead token in that state. This file also describes all conflicts detected in the grammar, both those resolved by operator precedence and those that remain unresolved. By default it will not be created, but if requested it will receive the filename <parse>.output, where <parse> is the filename (without the .cc extension) of the file containing parse's implementation.
o
--version (-v)
Display bisonc++'s version number and terminate.
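
The quoting rule shared by the --baseclass-preinclude and --scanner options can be restated as a small function. The helper below is hypothetical and is not part of bisonc++; it merely reproduces the rule given in the option descriptions, where a header argument is wrapped in double quotes unless it is already surrounded by pointed brackets.

```cpp
#include <string>

// Builds the #include line for a header argument given to -H or -s.
// Hypothetical helper: it restates the quoting rule from the text.
std::string includeLine(std::string const &header)
{
    bool angled = header.size() >= 2 &&
                  header.front() == '<' && header.back() == '>';

    return angled ? "#include " + header
                  : "#include \"" + header + '"';
}
```

Under this reading, the argument scanner.h results in #include "scanner.h", and the (shell-quoted) argument '<string>' results in #include <string>.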

DIRECTIVES

The following directives can be used in the initial section of the grammar specification file. When command-line options for directives exist, they overrule the corresponding directives given in the grammar specification file. Directives affecting the class header or implementation header file are ignored if these files already exist.

o
%baseclass-header header
Defines the pathname of the file containing the parser's base class. This directive is overridden by the --baseclass-header or -b command-line options.
o
%baseclass-preinclude header
Use header as the pathname to the file pre-included in the parser's base-class header. See the description of the --baseclass-preinclude option for details about this option. Like the convention adopted for this argument, header will (by default) be surrounded by double quotes. However, when the argument is surrounded by pointed brackets #include <header> is included.
o
%class-header header
Defines the pathname of the file containing the parser class. This directive is overridden by the --class-header or -c command-line options.
o
%class-name parser-class-name
Declares the name of this parser. This directive replaces the %name declaration previously used by bison++. It defines the name of the C++ class that is generated. Contrary to bison++'s %name declaration, %class-name may appear anywhere in the first section of the grammar specification file. However, it may be defined only once. If no %class-name is specified the default class name Parser is used.
o
%debug
Provide parse and its support functions with debugging code, showing the actual parsing process on the standard output stream. When included, the debugging output is active by default, but its activity may be controlled using the setDebug(bool on-off) member. Note that no #ifdef DEBUG macros are used anymore. By rerunning bisonc++ without the %debug directive (and without the --debug option) an equivalent parser is generated not containing the debugging code.
o
%error-verbose
When a syntactic error is reported, the generated parse function will dump the parser's state stack to the standard output stream. The stack dump shows on separate lines a stack index followed by the state stored at the indicated stack element. The first stack element is the stack's top element.
o
%expect number
If defined the parser will not report encountered shift/reduce and reduce/reduce conflicts if the number of detected conflicts equals the number following %expect. Conflicts are mentioned in the .output file, and the number of encountered conflicts is shown on the standard output if the actual number of conflicts deviates from number.
o
%filenames header
Defines the generic name of all generated files, unless overridden by specific names. This directive is overridden by the --filenames or -f command-line options.
o
%implementation-header header
Defines the pathname of the file containing the implementation header. This directive is overridden by the --implementation-header or -i command-line options.
o
%include path
This directive may be used to read part of the grammar specification file from the file specified at path. Unless path is an absolute file-path, path is searched relative to the location of bisonc++'s grammar specification file. This directive can be used to split long grammar specification files in shorter, meaningful units.
o
%left terminal ...
Defines the names of symbolic terminal tokens that should be treated as left-associative. I.e., in case of a shift/reduce conflict, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive will have the lowest precedence, the last used the highest. See also %token below.
o
%lines
Put #line preprocessor directives in the file containing the parser's parse function. It acts identically to the -l command line option, and is suppressed by the --no-lines option.
o
%locationstruct struct-definition
Defines the organization of the location-struct data type LTYPE__. This struct should be specified analogously to the way the parser's stacktype is defined using %union (see below). The location struct is named LTYPE__. If neither %locationstruct nor %ltype is specified, the default struct shown under %lsp-needed is used.
o
%lsp-needed
Defining this causes bisonc++ to include code into the generated parser using the standard location stack. The token-location type defaults to the following struct, defined in the parser's base class when this directive is specified:
 
     struct LTYPE__
     {
         int timestamp;
         int first_line;
         int first_column;
         int last_line;
         int last_column;
         char *text;
     };
            
 
o
%ltype typename
Specifies a user-defined token location type. If %ltype is used, typename should be the name of an alternate (predefined) type (e.g., size_t). It should not be used if a %locationstruct specification is defined (see above). Within the parser class, this type is available as the type `LTYPE__'. All text on the line following %ltype is used for the typename specification. It should therefore not contain comments or any other characters that are not part of the actual type definition.
o
%namespace namespace
Define the parser class in the namespace namespace. By default no namespace is defined. If this directive is used the implementation header will contain a commented out using namespace declaration for the requested namespace. This directive is overridden by the --namespace command-line option.
o
%negative-dollar-indices
Do not generate warnings when zero- or negative dollar-indices are used in the grammar's action blocks. Zero or negative dollar-indices are commonly used to implement inherited attributes, and should normally be avoided. When used, they can be specified like $-1 or $<type>-1, where type is a %union field-name.
o
%nonassoc terminal ...
Defines the names of symbolic terminal tokens that should be treated as non-associative. I.e., in case of a shift/reduce conflict, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive will have the lowest precedence, the last used the highest. See also %token below.
o
%parsefun-source source
Defines the pathname of the file containing the parser member parse. This directive is overridden by the --parsefun-source or -p command-line options.
o
%prec token
Overrules the defined precedence of an operator for a particular grammatical rule. A well-known example is the construction
 
     expression:
         '-' expression %prec UMINUS
         {
             ...
         }
                 
 
Here, the default priority and precedence of the `-' token as the subtraction operator is overruled by the precedence and priority of the UMINUS token, which is commonly defined as
 
     %right UMINUS
                 
 
(see below) following, e.g., the '*' and '/' operators.
o
%print matched-text-function
The print directive provides an implementation of the Parser class's print function displaying the current token value and the text matched by the lexical scanner as received by the generated parse function. The argument matched-text-function must define a complete function call expression returning the text matched by the lexical scanner. E.g.,
 
     %print d_scanner.YYText()
                 
 
If the function call expression contains white space matched-text-function should be surrounded by double quotes.
o
%required-tokens number
Following a syntactic error, require at least number successfully processed tokens before another syntactic error can be reported. By default number is zero.
o
%right terminal ...
Defines the names of symbolic terminal tokens that should be treated as right-associative. I.e., in case of a shift/reduce conflict, a shift is preferred over a reduction. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive will have the lowest precedence, the last used the highest. See also %token below.
o
%scanner header
Use header as the pathname to the file pre-included in the parser's class header. See the description of the --scanner option for details about this option. Similar to the convention adopted for this argument, header will (by default) be surrounded by double quotes. However, when the argument is surrounded by pointed brackets #include <header> is included. Note that using this directive implies the definition of a composed Scanner d_scanner data member into the generated parser, as well as a predefined int lex() member, returning d_scanner.yylex(). If this is inappropriate, a user defined implementation of int lex() must be provided.
o
%scanner-token-function function-name
The scanner function returning the next token, called from the generated parser's lex function. A complete function call expression should be provided (including a scanner object, if used). This directive overrules the d_scanner.yylex() call used by default when the %scanner directive is specified. Example:
 
     %scanner-token-function d_scanner.lex()
                 
 
If the function call contains white space, function-name should be surrounded by double quotes.
o
%start non-terminal
The non-terminal non-terminal should be used as the grammar's start-symbol. If omitted, the first grammatic rule is used as the grammar's starting rule. All syntactically correct sentences must be derivable from this starting rule.
o
%stype typename
The type of the semantic value of tokens. The specification typename should be the name of an unstructured type (e.g., size_t). By default it is int. See YYSTYPE in bison. It should not be used if a %union specification is defined. Within the parser class, this type is available as the type `STYPE__'. All text on the line following %stype is used for the typename specification. It should therefore not contain comments or any other characters that are not part of the actual type definition.
o
%token terminal ...
Defines the names of symbolic terminal tokens. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive will have the lowest precedence, the last used the highest.
NOTE: Symbolic tokens are defined as enum-values in the parser's base class. The names of symbolic tokens may not be equal to the names of the members and types defined by bisonc++ itself (see the next sections). This requirement is not currently enforced by bisonc++, but unexpected compilation errors may result if this requirement is violated.
o
%type <type> non-terminal ...
In combination with %union: associate the semantical value of a non-terminal symbol with a union field defined by the %union directive.
o
%union union-definition
Acts identically to the bison and bison++ declaration: as with bison, a union is generated for the semantic type. The union type is named STYPE__. If no %union is declared, a simple stack-type may be defined using the %stype directive. If no %stype directive is used, the default stacktype (int) is used.
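
The difference between the %left and %right directives can be checked by hand on an expression such as 8 - 3 - 2. The sketch below is not generated parser code, and the function names evalLeft and evalRight are hypothetical. It only evaluates a chain of subtractions the way each associativity would group it; both functions assume a non-empty operand list.

```cpp
#include <cstddef>
#include <vector>

// Groups a - b - c as (a - b) - c, the grouping obtained when a
// reduction is preferred over a shift (left-associative).
int evalLeft(std::vector<int> const &operands)
{
    int result = operands.front();
    for (std::size_t idx = 1; idx != operands.size(); ++idx)
        result -= operands[idx];
    return result;
}

// Groups a - b - c as a - (b - c), the grouping obtained when a
// shift is preferred over a reduction (right-associative).
int evalRight(std::vector<int> const &operands)
{
    int result = operands.back();
    for (std::size_t idx = operands.size() - 1; idx-- != 0; )
        result = operands[idx] - result;
    return result;
}
```

For the operands 8, 3 and 2 the left-associative grouping yields (8 - 3) - 2 = 3, whereas the right-associative grouping yields 8 - (3 - 2) = 7.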

PUBLIC MEMBERS AND -TYPES

The following public members can be used by users of the parser classes generated by bisonc++ (`Parser Class':: prefixes are silently implied):

o
LTYPE__:
The parser's location type (user-definable). Available only when either %lsp-needed, %ltype or %locationstruct has been declared.
o
STYPE__:
The parser's stack-type (user-definable), defaults to int.
o
Tokens__:
The enumeration type of all the symbolic tokens defined in the grammar file (i.e., bisonc++'s input file). The scanner should be prepared to return these symbolic tokens. Note that, since the symbolic tokens are defined in the parser's class and not in the scanner's class, the lexical scanner must prefix the parser's class name to the symbolic token names when they are returned. E.g., return Parser::IDENT should be used rather than return IDENT.
o
int parse():
The parser's parsing member function. It returns 0 when parsing has completed successfully, 1 if errors were encountered while parsing the input.
o
void setDebug(bool mode):
This member can be used to activate or deactivate the debug-code compiled into the parsing function. It is always available, but it has no effect unless the %debug directive or --debug option was specified. When debugging code has been compiled into the parsing function, it is active by default; debug-code is suppressed by calling setDebug(false).
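As a rough illustration of the Tokens__ convention described above, the following sketch shows a hand-written scanner function that returns either a plain character or a symbolic token prefixed by the parser's class name. The Parser class shown here is a stand-in reduced to its token enumeration (values as in the EXAMPLE section), and scanToken with its parameters is a hypothetical helper; a real scanner would normally be generated by flex++(1).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Stand-in for the generated parser class: only the token enumeration
// is reproduced here.
class Parser
{
    public:
        enum Tokens__
        {
            NUMBER = 257,
            EOLN,
            UNARY,
        };
};

// Hypothetical scanner function: returns the next token found in `input'
// starting at `pos', and advances `pos' beyond the matched text.
// A return value of 0 indicates the end of the input.
int scanToken(std::string const &input, std::size_t &pos)
{
    // skip blanks and tabs
    while (pos < input.size() && (input[pos] == ' ' || input[pos] == '\t'))
        ++pos;

    if (pos == input.size())
        return 0;                           // end of input

    unsigned char ch = input[pos];

    if (ch == '\n')
    {
        ++pos;
        return Parser::EOLN;                // symbolic token: class prefix
    }

    if (std::isdigit(ch))
    {
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])))
            ++pos;
        return Parser::NUMBER;              // symbolic token: class prefix
    }

    ++pos;
    return ch;                              // plain character token
}
```

Returning the unqualified names (NUMBER, EOLN) would not compile in this sketch, since the enumeration values are only visible through the class scope.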

PROTECTED ENUMS AND -TYPES

The following enumerations and types can be used by members of parser classes generated by bisonc++. They are actually protected members inherited from the parser's base class.

o
Base::ErrorRecovery__:
This enumeration defines two values:
 
     DEFAULT_RECOVERY_MODE__,
     UNEXPECTED_TOKEN__
         
 
The DEFAULT_RECOVERY_MODE__ terminates the parsing process. The non-default recovery procedure is available once an error token is used in a production rule. When the parsing process throws UNEXPECTED_TOKEN__ the recovery procedure is started (i.e., it is started whenever a syntactic error is encountered or ERROR() is called).
The recovery procedure consists of (1) looking for the first state on the state-stack having an error-production, followed by (2) handling all state transitions that are possible without retrieving a terminal token. Then, in the state requiring a terminal token and starting with the initial unexpected token (3) all subsequent terminal tokens are ignored until a token is retrieved which is a continuation token in that state.
If the error recovery procedure fails (i.e., if no acceptable token is ever encountered) error recovery falls back to the default recovery mode (i.e., the parsing process is terminated).
o
Base::Return__:
This enumeration defines two values:
 
     PARSE_ACCEPT = 0,
     PARSE_ABORT = 1
         
 
(which are of course the parse function's return values).
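The recovery procedure described for ErrorRecovery__ can be approximated by the following sketch. It is a much simplified model and not the code generated by bisonc++: the struct Recovery, its continuations table and the member recover are hypothetical names, states and tokens are plain int values, and step (2) (state transitions not requiring a token) is left out.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: `continuations' maps each state having an error
// production to the set of tokens that may continue the parse there.
struct Recovery
{
    std::map<int, std::set<int>> continuations;

    // Returns true if recovery succeeds. On success, `stack' has been
    // unwound to the recovering state and `pos' indexes the continuation
    // token. On failure the default recovery mode (termination) applies.
    bool recover(std::vector<int> &stack,
                 std::vector<int> const &tokens, std::size_t &pos) const
    {
        // (1) find the first state on the stack having an error production
        while (!stack.empty() && continuations.count(stack.back()) == 0)
            stack.pop_back();

        if (stack.empty())
            return false;               // no error production: terminate

        // (2) token-free state transitions are omitted in this sketch

        // (3) starting at the unexpected token, skip tokens until a
        //     continuation token of the recovering state is found
        std::set<int> const &accept = continuations.at(stack.back());
        while (pos < tokens.size() && accept.count(tokens[pos]) == 0)
            ++pos;

        return pos < tokens.size();     // false: no acceptable token seen
    }
};
```

In the calculator of the EXAMPLE section the rule `error EOLN' plays this role: after a syntactic error all tokens up to the next newline are skipped.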

PRIVATE MEMBER FUNCTIONS

The following members can be used by members of parser classes generated by bisonc++. When prefixed by Base:: they are actually protected members inherited from the parser's base class. Members for which the phrase ``Used internally'' is used should not be called by user-defined code.

o
Base::ParserBase():
The default base-class constructor. Used internally.
o
void Base::ABORT() const throw(Return__):
This member can be called from any member function (called from any of the parser's action blocks) to indicate a failure while parsing thus terminating the parsing function with an error value 1. Note that this offers a marked extension and improvement of the macro YYABORT defined by bison++ in that YYABORT could not be called from outside of the parsing member function.
o
void Base::ACCEPT() const throw(Return__):
This member can be called from any member function (called from any of the parser's action blocks) to indicate successful parsing and thus terminating the parsing function. Note that this offers a marked extension and improvement of the macro YYACCEPT defined by bison++ in that YYACCEPT could not be called from outside of the parsing member function.
o
void Base::clearin():
This member replaces bison(++)'s macro yyclearin and causes bisonc++ to request another token from its lex() member, even if the current token has not yet been processed. It is a useful member when the parser should be reset to its initial state, e.g., between successive calls of parse. In this situation the scanner will probably be reloaded with new information too (in the context of a flex-generated scanner by, e.g., calling the scanner's yyrestart member).
o
bool Base::debug() const:
This member returns the current value of the debug variable.
o
void Base::ERROR() const throw(ErrorRecovery__):
This member can be called from any member function (called from any of the parser's action blocks) to generate an error, and thus initiate the parser's error recovery code. Note that this offers a marked extension and improvement of the macro YYERROR defined by bison++ in that YYERROR could not be called from outside of the parsing member function.
o
void error(char const *msg):
This member may be redefined in the parser class. Its default (inline) implementation is to write a simple message to the standard error stream. It is called when a syntactic error is encountered.
o
void errorRecovery__():
Used internally.
o
void Base::errorVerbose__():
Used internally.
o
void executeAction():
Used internally.
o
int lex():
This member may be pre-implemented using the scanner option or directive (see above) or it must be implemented by the programmer. It interfaces to the lexical scanner, and should return the next token produced by the lexical scanner, either as a plain character or as one of the symbolic tokens defined in the Parser::Tokens__ enumeration. Zero or negative token values are interpreted as `end of input'.
o
int lookup():
Used internally. See also the section BUGS below.
o
void nextToken():
Used internally. See also the section BUGS below.
o
void Base::pop__():
Used internally.
o
void Base::popToken__():
Used internally.
o
void print():
This member can be redefined in the parser class to print information about the parser's state. It is called by the parser immediately after retrieving a token from lex. As it is a member function it has access to all the parser's members, in particular d_token__, the current token value, and d_loc__, the current token location information (if %lsp-needed, %ltype or %locationstruct has been specified). See also the option --print.
o
void Base::push__():
Used internally.
o
void Base::pushToken__():
Used internally.
o
void Base::reduce__():
Used internally.
o
void Base::symbol__():
Used internally.
o
void Base::top__():
Used internally.
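The reason why ABORT and ACCEPT can be called from any member reached from an action block is that they throw a Return__ value, as shown in the generated base class in the EXAMPLE section. The sketch below models only that mechanism. MiniParser, its parse parameter and checkDivisor are hypothetical names, and the members are made public here merely to keep the example short.

```cpp
#include <functional>

// Simplified stand-in for a generated parser: only the termination
// mechanism is modeled, not the parsing algorithm itself.
class MiniParser
{
    public:
        enum Return__
        {
            PARSE_ACCEPT__ = 0,
            PARSE_ABORT__  = 1
        };

        void ABORT() const
        {
            throw PARSE_ABORT__;
        }

        void ACCEPT() const
        {
            throw PARSE_ACCEPT__;
        }

        // Hypothetical parse(): runs `action' (standing in for the
        // grammar's action blocks) and converts a thrown Return__ value
        // into the function's return value.
        int parse(std::function<void (MiniParser const &)> const &action)
        {
            try
            {
                action(*this);
            }
            catch (Return__ value)
            {
                return value;
            }
            return PARSE_ACCEPT__;
        }
};

// A helper called from an action: it may end the parse even though it
// is not part of parse() itself.
inline void checkDivisor(MiniParser const &parser, int divisor)
{
    if (divisor == 0)
        parser.ABORT();
}
```

Because the value travels as an exception, no macro tied to the body of the parsing function is needed, which is the improvement over YYABORT and YYACCEPT mentioned above.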

PROTECTED DATA MEMBERS

The following data members can be used by members of parser classes generated by bisonc++. All data members are actually protected members inherited from the parser's base class.

o
size_t d_acceptedTokens__:
Counts the number of accepted tokens since the start of the parse() function or since the last detected syntactic error. It is initialized to d_requiredTokens__ to allow an early error to be detected as well.
o
bool d_debug__:
When the debug option has been specified, this variable (true by default) determines whether debug information is actually displayed.
o
LTYPE__ d_loc__:
The location type value associated with a terminal token. It can be used by, e.g., lexical scanners to pass location information of a matched token to the parser in parallel with a returned token. It is available only when %lsp-needed, %ltype or %locationstruct has been defined.
Lexical scanners may be offered the facility to assign a value to this variable in parallel with a returned token. In order to allow a scanner access to d_loc__, d_loc__'s address should be passed to the scanner. This can be realized, for example, by defining a member void setLoc(LTYPE__ *) in the lexical scanner, which is then called from the parser's constructor as follows:

             d_scanner.setLoc(&d_loc__);
        
 
Subsequently, the lexical scanner may assign a value to the parser's d_loc__ variable through the pointer to d_loc__ stored inside the lexical scanner.
o
LTYPE__ *d_lsp__:
The location stack pointer. Do not modify.
o
size_t d_nErrors__:
The number of errors counted by parse. It is initialized by the parser's base class initializer, and is updated while parse executes. When parse has returned it contains the total number of errors counted by parse. Errors are not counted if suppressed (i.e., if d_acceptedTokens__ is less than d_requiredTokens__).
o
size_t d_nextToken__:
A pending token. Do not modify.
o
size_t d_requiredTokens__:
Defines the minimum number of accepted tokens that the parse function must have processed before a syntactic error can be generated.
o
int d_state__:
The current parsing state. Do not modify.
o
int d_token__:
The current token. Do not modify.
o
STYPE__ d_val__:
The semantic value of a returned token or non-terminal symbol. For non-terminal symbols it is assigned a value through the action block's symbol $$. Lexical scanners may be offered the facility to assign a semantic value to this variable in parallel with a returned token. In order to allow a scanner access to d_val__, d_val__'s address should be passed to the scanner. This can be realized, for example, by passing d_val__'s address to the lexical scanner's constructor. Subsequently, the lexical scanner may assign a value to the parser's d_val__ variable through the pointer to d_val__ stored in a data member of the lexical scanner. Note that in some cases this approach must be used to make the correct semantic value available to the parser. In particular, when a grammar state defines multiple reductions, depending on the next token, the reduction's action only takes place following the retrieval of the next token, thus losing the initially matched token text.
o
STYPE__ *d_vsp__:
The semantic value stack pointer. Do not modify.
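The approach described for d_val__ (passing its address to the lexical scanner's constructor) might look as follows. This is a minimal sketch under several assumptions: STYPE__ is the default int, d_val__ is declared in the parser class itself rather than in its base class, and the members d_semval, matchNumber, scanner and value are hypothetical names introduced for this illustration only.

```cpp
#include <cstdlib>
#include <string>

typedef int STYPE__;            // assumed semantic value type (the default)

// Hypothetical scanner that stores the address of the parser's
// semantic value and fills it in when a token is matched.
class Scanner
{
    STYPE__ *d_semval;          // points to the parser's d_val__

    public:
        explicit Scanner(STYPE__ *semval)
        :
            d_semval(semval)
        {}

        // Stand-in for matching a NUMBER token: converts the matched
        // text and stores the result where the parser can find it.
        void matchNumber(std::string const &text)
        {
            *d_semval = std::atoi(text.c_str());
        }
};

// Hypothetical parser fragment: d_val__ is declared before d_scanner,
// so its address is valid when the scanner is constructed.
class Parser
{
    STYPE__ d_val__;
    Scanner d_scanner;

    public:
        Parser()
        :
            d_val__(0),
            d_scanner(&d_val__)
        {}

        Scanner &scanner()
        {
            return d_scanner;
        }

        STYPE__ value() const
        {
            return d_val__;
        }
};
```

With this setup the semantic value is stored at the moment the token is matched, so it does not depend on the scanner's text buffer still holding that token when the action is eventually executed.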

TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE

In the file defining the parse function the following types and variables are defined in the anonymous namespace. These are mentioned here for the sake of completeness, and are not normally accessible to other parts of the parser.

o
char const author[]:
Defining the name and e-mail address of Bisonc++'s author.
o
ReservedTokens:
This enumeration defines some token values used internally by the parsing functions. They are:
 
     PARSE_ACCEPT   =  0,
     _UNDETERMINED_ = -2,
     _EOF_          = -1,
     _error_        = 256,
        
 
These tokens are used by the parser to determine whether another token should be requested from the lexical scanner, and to handle error-conditions.
o
StateType:
This enumeration defines several more values used internally by the parsing functions. They are:
 
         NORMAL,
         ERR_ITEM,
         REQ_TOKEN,
         ERR_REQ,    // ERR_ITEM | REQ_TOKEN
         DEF_RED,    // state having default reduction
         ERR_DEF,    // ERR_ITEM | DEF_RED
         REQ_DEF,    // REQ_TOKEN | DEF_RED
         ERR_REQ_DEF // ERR_ITEM | REQ_TOKEN | DEF_RED
        
 
These values are used by the parser to define the types of the various states of the analyzed grammar.
o
PI__ (Production Info):
The type defines a struct containing information about the production rules that were used by a grammar.
Its first field contains the identification number of a production's defining non-terminal;
its second field defines the number of elements of a production.
o
SR__ (Shift-Reduce Info):
This struct provides the shift/reduce information for the various grammatic states. SR__ values are collected in arrays, one array per grammatic state. These arrays, named s_<nr>, where <nr> is a state number, are defined in the anonymous namespace as well. The SR__ elements consist of two unions, defining fields that are applicable to, respectively, the first, intermediate and the last array elements.
The first element of each array consists of (1st field) a StateType and (2nd field) the index of the last array element; intermediate elements consist of (1st field) a symbol value and (2nd field) (if negative) the production rule number reducing to the indicated symbol value or (if positive) the next state when the symbol given in the 1st field is the current token; the last element of each array consists of (1st field) a placeholder for the current token and (2nd field) the (negative) rule number to reduce to by default or the (positive) number of an error-state to go to when an erroneous token has been retrieved. If the 2nd field is zero, no error or default action has been defined for the state, and error-recovery is attempted.
o
STACK_EXPANSION:
An enumeration value specifying the number of additional elements that are added to the state- and semantic value stacks when full.
o
PI__ (Production Info):
This struct provides information about production rules. It has two fields: d_nonTerm is the identification number of the production's non-terminal, d_size represents the number of elements of the production rule.
o
static PI__ s_productionInfo:
Used internally by the parsing function.
o
static SR__ s_<nr>[]:
Here, <nr> is a numerical value representing a state number. Used internally by the parsing function.
o
static SR__ *s_state[]:
Used internally by the parsing function.
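The layout of the s_<nr> arrays suggests a sentinel search: the current token is stored in the placeholder of the last element, so a linear search always terminates, and a match at the last element selects the default action. The sketch below shows one plausible reading of that description. It is not bisonc++'s generated code: Entry replaces SR__'s unions by two plain int fields, and lookupAction and s_demo are hypothetical names.

```cpp
#include <cstddef>

// Simplified stand-in for SR__.
struct Entry
{
    int first;      // StateType, symbol value, or token placeholder
    int second;     // last index, action, or default action
};

// Hypothetical sample state with two explicit actions and a default.
Entry s_demo[] =
{
    {0, 3},         // first element: state type, index of last element
    {'+', 7},       // on '+': continue at state 7
    {'*', -4},      // on '*': reduce by rule 4
    {0, -2},        // last element: placeholder, reduce by rule 2 by default
};

// Returns the action for `token' in `state'. Negative: reduce by rule
// -value; positive: next state; zero: no action has been defined, in
// which case error recovery would be attempted.
int lookupAction(Entry *state, int token)
{
    std::size_t last = static_cast<std::size_t>(state[0].second);

    state[last].first = token;          // sentinel: the search always ends

    std::size_t idx = 1;
    while (state[idx].first != token)
        ++idx;

    return state[idx].second;           // idx == last: default action
}
```

For example, lookupAction(s_demo, '+') yields 7, while a token not listed in the array (e.g., ')') yields the default action -2.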

RESTRICTIONS ON TOKEN NAMES

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

o
Identifiers ending in two underscores;
o
Any of the following identifiers: ABORT, ACCEPT, ERROR, clearin, debug, or setDebug.

OBSOLETE SYMBOLS

All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore as they can simply be declared in the class header file and defined elsewhere.

EXAMPLE

Using a fairly worn-out example, we'll construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.

First an associated grammar is constructed. When a syntactic error is encountered all tokens are skipped until the next newline and a simple message is printed using the default error function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatic production rule is recognized. The grammar itself is rather standard and straightforward, but note the first part of the specification file, containing various other directives, among which the %scanner directive, resulting in a composed d_scanner object as well as an implementation of the member function int lex. In this example, a common Scanner class construction strategy was used: the class Scanner was derived from the class yyFlexLexer generated by flex++(1). The actual process of constructing a class using flex++(1) is beyond the scope of this man-page, but flex++(1)'s specification file is mentioned below, to further complete the example. Here is bisonc++'s input file:

 %filenames parser
 %scanner ../scanner/scanner.h
 
                                 // lowest precedence
 %token  NUMBER                  // integral numbers
         EOLN                    // newline
 
 %left   '+' '-' 
 %left   '*' '/' 
 %right  UNARY
                                 // highest precedence 
 
 %%
 
 expressions:
     expressions 
     evaluate
 |
     prompt
 ;
 
 evaluate:
     alternative
     prompt
 ;
 
 prompt:
     {
         prompt();
     }
 ;
 
 alternative:
     expression
     EOLN
     {
         cout << $1 << endl;
     }
 |
     'q'
     done
 |
     EOLN
 |
     error
     EOLN
 ;
 
 done:
     {
         cout << "Done.\n";
         ACCEPT();
     }
 ;
 
 expression:
     expression 
     '+'
     expression
     {
         $$ = $1 + $3;
     }
 |
     expression 
     '-'
     expression
     {
         $$ = $1 - $3;
     }
 |
     expression 
     '*'
     expression
     {
         $$ = $1 * $3;
     }
 |
     expression 
     '/'
     expression
     {
         $$ = $1 / $3;
     }
 |
     '-'             
     expression      %prec UNARY
     {
         $$ = -$2;
     }
 |
     '+'             
     expression      %prec UNARY
     {
         $$ = $2;
     }
 |
     '('
     expression
     ')'
     {
         $$ = $2;
     }
 |
     NUMBER
     {
         $$ = atoi(d_scanner.YYText());
     }
 ;
 
 

Next, bisonc++ processes this file. In the process, bisonc++ generates the following files from its skeletons:

o
The parser's base class, which is not modified by the programmer at all:
 #ifndef ParserBase_h_included
 #define ParserBase_h_included
 
 #include <vector>
 #include <iostream>
 
 
 namespace // anonymous
 {
     struct PI__;
 }
 
 
 class ParserBase
 {
     public:
 // $insert tokens
 
     // Symbolic tokens:
     enum Tokens__
     {
         NUMBER = 257,
         EOLN,
         UNARY,
     };
 
 // $insert STYPE
 typedef int STYPE__;
 
     private:
         int d_stackIdx__;
         std::vector<size_t>   d_stateStack__;
         std::vector<STYPE__>  d_valueStack__;
 
     protected:
         enum Return__
         {
             PARSE_ACCEPT__ = 0,   // values used as parse()'s return values
             PARSE_ABORT__  = 1
         };
         enum ErrorRecovery__
         {
             DEFAULT_RECOVERY_MODE__,
             UNEXPECTED_TOKEN__,
         };
         bool        d_debug__;
         size_t      d_nErrors__;
         size_t      d_requiredTokens__;
         size_t      d_acceptedTokens__;
         int         d_token__;
         int         d_nextToken__;
         size_t      d_state__;
         STYPE__    *d_vsp__;
         STYPE__     d_val__;
 
         ParserBase();
 
         void ABORT() const;
         void ACCEPT() const;
         void ERROR() const;
         void checkEOF__() const;
         void clearin();
         bool debug() const;
         void pop__(size_t count = 1);
         void push__(size_t nextState);
         void popToken__();
         void pushToken__(int token);
         void reduce__(PI__ const &productionInfo);
         void errorVerbose__();
         size_t top__() const;
 
     public:
         void setDebug(bool mode);
 }; 
 
 inline bool ParserBase::debug() const
 {
     return d_debug__;
 }
 
 inline void ParserBase::setDebug(bool mode)
 {
     d_debug__ = mode;
 }
 
 inline void ParserBase::ABORT() const
 {
     throw PARSE_ABORT__;
 }
 
 inline void ParserBase::ACCEPT() const
 {
     throw PARSE_ACCEPT__;
 }
 
 inline void ParserBase::ERROR() const
 {
     throw UNEXPECTED_TOKEN__;
 }
 
 
 // As a convenience, when including ParserBase.h its symbols are available as
 // symbols in the class Parser, too.
 #define Parser ParserBase
 
 
 #endif
 
 
 
 
o
The parser class parser.h itself. In the grammar specification various member functions are used (e.g., done and prompt). These functions are so small that they can very well be implemented inline. Note that done calls ACCEPT to terminate further parsing. ACCEPT and related members (e.g., ABORT) can be called from any member called by parse. As a consequence, action blocks could contain mere function calls, rather than several statements, thus minimizing the need to rerun bisonc++ when an action is modified.
Once bisonc++ had created parser.h it was augmented with the required additional members, resulting in the following final version:
 #ifndef Parser_h_included
 #define Parser_h_included
 
 // $insert baseclass
 #include "parserbase.h"
 // $insert scanner.h
 #include "../scanner/scanner.h"
 
 
 #undef Parser
 class Parser: public ParserBase
 {
     // $insert scannerobject
     Scanner d_scanner;
         
     public:
         int parse();
 
     private:
         void error(char const *msg);    // called on (syntax) errors
         int lex();                      // returns the next token from the
                                         // lexical scanner. 
         void print();                   // use, e.g., d_token, d_loc
 
         void prompt();
         void done();
 
     // support functions for parse():
         void executeAction(int ruleNr);
         void errorRecovery();
         int lookup(bool recovery);
         void nextToken();
 };
 
 inline void Parser::error(char const *msg)
 {
     std::cerr << msg << '\n';
 }
 
 // $insert lex
 inline int Parser::lex()
 {
     return d_scanner.yylex();
 }
 
 inline void Parser::print()      // use d_token, d_loc
 {}
 
 inline void Parser::prompt()
 {
     std::cout << "? " << std::flush;
 }
 
 inline void Parser::done()
 {
     std::cout << "Done\n";
     ACCEPT();
 }
 
 #endif
 
 
o
To complete the example, the following lexical scanner specification was used:
 %{
     #define _SKIP_YYFLEXLEXER_
     #include "scanner.ih"
 
     #include "../parser/parser.h"
 %}
 
 %option yyclass="Scanner" outfile="yylex.cc"
 %option c++ 8bit warn noyywrap yylineno
 
 %%
 
 [ \t]+                          // skip white space
 
 \n                              return Parser::EOLN;
 
 [0-9]+                          return Parser::NUMBER;
 
 .                               return yytext[0];
 
 
 %%
 
 
 
 
o
Since no member functions other than parse were defined in separate source files, only parse includes parser.ih. Since cout is used in the grammar's actions, a using namespace std or comparable statement is required. This was done in parser.ih. Here is the implementation header declaring the standard namespace:
 // include this file in the sources of the class Calculator, 
 // and add any includes etc. that are only needed for 
 // the compilation of these sources.
 
 // include the file defining the parser class:
 #include "parser.h"
 
 // UN-comment if you don't want to prefix std:: 
 // for every symbol defined in the std. namespace:
 
 using namespace std;
 
 

The implementation of the parsing member function parse is basically irrelevant, since it should not be modified by the programmer. It was written to the file parse.cc.

o
Finally, here is the program offering our simple calculator:
 #include "parser/parser.h"
 
 int main()
 {
     Parser calculator;
     return calculator.parse();
 }
 
 

USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS

Note that even when the lexical scanner includes parserbase.h (defining the parser's base class) rather than parser.h (defining the parser class itself), it may simply return tokens of the class Parser (e.g., Parser::NUMBER rather than ParserBase::NUMBER). Using a simple #define - #undef pair, generated by bisonc++ at, respectively, the end of the base class header file and just before the definition of the parser class itself, the lexical scanner may assume that all symbols defined in the parser's base class are actually defined in the parser class itself. It should be noted that this feature can only be used to access the base class's enums and types. The actual parser class is not available by the time the lexical scanner is defined, thus avoiding circular class dependencies.
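The #define - #undef pair can be illustrated in a single translation unit. In the sketch below the three parts stand for the contents of parserbase.h, the scanner's code and parser.h, respectively. The classes are reduced to the bare minimum, and classify is a hypothetical scanner helper that is not part of any generated file.

```cpp
// --- what parserbase.h provides (reduced to the token enumeration) ---
class ParserBase
{
    public:
        enum Tokens__
        {
            NUMBER = 257,
            EOLN,
        };
};
#define Parser ParserBase       // generated at the end of parserbase.h

// --- scanner code: only the base class header has been seen so far ---
inline int classify(char ch)    // hypothetical scanner helper
{
    if (ch == '\n')
        return Parser::EOLN;    // expands to ParserBase::EOLN
    if (ch >= '0' && ch <= '9')
        return Parser::NUMBER;  // expands to ParserBase::NUMBER
    return ch;                  // plain character token
}

// --- what parser.h provides ---
#undef Parser                   // generated just before the class itself
class Parser: public ParserBase
{
};
```

The scanner code compiles although the class Parser has not yet been defined at that point, which is why only enumerations and types of the base class can be reached in this way.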

FILES

o
bisonc++base.h: skeleton of the parser's base class;
o
bisonc++.h: skeleton of the parser class;
o
bisonc++.ih: skeleton of the implementation header;
o
bisonc++.cc: skeleton of the member parse.

SEE ALSO

bison(1), bison++(1), bison.info (using texinfo), flex++(1)

Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.

BUGS

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

o
Identifiers ending in two underscores;
o
Any of the following identifiers: ABORT, ACCEPT, ERROR, clearin, debug, error, or setDebug.

When re-using files generated by bisonc++ before version 2.0.0, minor hand-modification might be necessary. The identifiers in the following list (defined in the parser's base class) now have two underscores affixed to them: LTYPE, STYPE and Tokens. When using classes derived from the generated parser class, the following identifiers are available in such derived classes: DEFAULT_RECOVERY_MODE, ErrorRecovery, Return, UNEXPECTED_TOKEN, d_debug, d_loc, d_lsp, d_nErrors, d_nextToken, d_state, d_token, d_val, and d_vsp. When used in derived classes, they too need two underscores affixed to them.

The member function void lookup (< 1.00) was replaced by int lookup. When regenerating parsers created by early versions of bisonc++ (versions before version 1.00), lookup's prototype should be corrected by hand, since bisonc++ will not by itself rewrite the parser class's header file.

The Semantic parser, mentioned in bison++(1), is not implemented in bisonc++(1). According to bison++(1) the semantic parser was not available in bison++ either. It is possible that the Pure parser is now available via the --thread-safe option.

ABOUT bisonc++

Bisonc++ was based on bison++, originally developed by Alain Coetmeur (coetmeur@icdc.fr), R&D department (RDT), Informatique-CDC, France, who based his work on bison, GNU version 1.21.

Bisonc++ version 0.98 and beyond is a complete rewrite of an LALR-1 parser generator, closely following the construction process as described in Aho, Sethi and Ullman's (1986) book Compilers (i.e., the Dragon book). It uses the same grammar specification as bison and bison++, and it uses practically the same options and directives as bisonc++ versions earlier than 0.98. Variables, declarations and macros that are obsolete were removed. Since bisonc++ is a completely new program, it will most likely contain bugs. Please report bugs to the author:

AUTHOR

Frank B. Brokken (f.b.brokken@rug.nl).