Gogll uses the same input grammar for GLL and LR(1). GLL is the default output - to generate
an LR(1) parser the option -pager
or -knuth
must be used. The Knuth option is
provided only for reference. In general the Pager machine should be used as it
produces the smallest LR(1) tables.
Note: LR(1) grammars should be left recursive.
The following gogll usage options are relevant to generating LR(1) parsers. Get the
full help by running gogll -h
.
use: gogll [-a][-v] [-o <out dir>] [-go] [-rust] [-pager] [-knuth] [-resolve_conflicts] <source file>
to generate a lexer and parser.
<source file>: Mandatory. Name of the source file to be processed.
If the file extension is ".md" the bnf is extracted from markdown code
segments enclosed in triple backticks.
-a: Optional. Regenerate all files.
WARNING: This may destroy user editing in the LR(1) AST.
Default: false
-v: Optional. Produce verbose output, including first and follow sets,
LR(1) sets and lexer FSA sets.
-o <out dir>: Optional. The directory to which code will be generated.
Default: the same directory as <source file>.
-go: Optional. Generate Go code.
Default: true, but false if -rust is selected
-rust: Optional. Generate Rust code.
Default: false
-knuth: Optional. Generate a Knuth LR(1) parser
Default false
-pager: Optional. Generate a Pager PGM LR(1) parser.
Default false
-resolve_conflicts: Optional. Automatically resolve LR(1) conflicts.
Default: false. Only used when generating LR(1) parsers.
Gogll generates the following modules when LR(1) is used:
ast
: A module containing the action routines called by the LR(1) parser when reducing a production. Gogll generates the skeleton of the action routines, which must be implemented by the user. See Implementing the AST below.lexer
: A linear time FSA lexer. This should not be edited by the user.parser
: The files containing a table-driven LR(1) parser. These should not be edited by the user.token
: The token definitions used by the lexer and parser. This should not be edited by the user.
This section uses examples/lr1/g1/g1.md as example. It has the following grammar:
package "g1"
E1 : E1 "+" T1 | T1 ;
T1 : "a" ;
The following extract from the makefile shows how to generate
Go and Rust lexers and parsers from g1.md
:
go: $(GODIR)/go.mod
gogll -pager -o $(GODIR) g1.md
The command above generates a lexer and LR(1) parser (Pager) with matching lexer in Go.
rust: $(RUSTDIR)/Cargo.toml
gogll -rust -pager -o $(RUSTDIR) g1.md
The command above generates a lexer and LR(1) parser (Pager) with matching lexer in rust.
Every action routine returns a tuple: (interface{}, error)
. A non-zero error
is returned if the parameters of the action routine contain a semantic error.
Otherwise the routine constructs an AST node, which it returns with a null error
.
The parameters of an action routing match the symbols of its production. A non-terminal symbol will have been returned by another action routine.
A terminal symbol with be a token from the lexer.
If an action routine returns a non-null error
the parse will be terminated with
the error.
Goggl generates the following AST skeleton to be completed by the user. It contains an action routing called by the parser for every production alternate of the grammar.
// Generated by GoGLL.
package ast
import(
"fmt"
)
// G0 : E1 ;
func G00(p0 interface{})(interface{}, error){
fmt.Println("ast.G00 is unimplemented")
return nil, nil
}
func G00
is called when the parser accepts the start production of the grammar, E1
.
p0 will have the type returned by the action routine: func E10
. G0
is an
extra start production generated by gogll only for LR(1) parsers.
The value returned by func G00
will be returned by the parser when the parse is
complete.
// E1 : E1 + T1 ;
func E10(p0, p1, p2 interface{})(interface{}, error){
fmt.Println("ast.E10 is unimplemented")
return nil, nil
}
func E10
is the action routine called by the parser when the production
E1 : E1 + T1
has been parsed. It has three parameters:
p0
will have the type returned by func E10
or func E11
.
p1
will have type *token.Token
.
p2
will have the type returned by func T1
.
// E1 : T1 ;
func E11(p0 interface{})(interface{}, error){
fmt.Println("ast.E11 is unimplemented")
return nil, nil
}
func E11
is called by the parser when production E1 : T1
has been parsed.
p0 will have the type returned by func T10
.
// T1 : a ;
func T10(p0 interface{})(interface{}, error){
fmt.Println("ast.T10 is unimplemented")
return nil, nil
}
func T10
is called when the production T1 : a
.
p0
will have the type *token.Token
The following is the completed AST:
// Generated by GoGLL. Edited by user
package ast
import (
"g1/token"
)
// G0 : E1 ;
func G00(p0 interface{}) ([]string, error) {
return p0.([]string), nil
}
func G00
adds nothing the value returned by the alternates of
E1
, which is []string
.
// E1 : E1 + T1 ;
func E10(p0, p1, p2 interface{}) ([]string, error) {
e := append(
p0.([]string),
p1.(*token.Token).LiteralString(),
p2.(string))
return e, nil
}
func E10
appends the literal string of p1
(*token.Token
) and the string value
of p2
(returned by func T10
) to the string value of p0
(returned by func E10
or func E11
).
// E1 : T1 ;
func E11(p0 interface{}) ([]string, error) {
return []string{p0.(string)}, nil
}
func E11
constructs an []string
from the value of p0
// T1 : a ;
func T10(p0 interface{}) (string, error) {
return p0.(*token.Token).LiteralString(), nil
}
func T10
returns the literal string value of p0
(*token.Token
).
The generated Rust AST has the same structure as the Go. For g1.md
, please see: