Syntax Analysis 2

Bottom-Up Parsing

Bottom-up parsing handles a larger class of grammars than top-down parsing and is just as efficient. It uses LR grammars: Left-to-right scan of the input, Rightmost derivation (built in reverse).

It relies on shift-reduce parsers.

example:

E -> T | T + E
T -> int | int * T | (E)

// reductions: current string | production applied to it
int * int + int | T -> int
int * T + int   | T -> int * T
T + int         | T -> int
T + T           | E -> T
T + E           | E -> T + E
E               |
// read in reverse (bottom to top), the steps form a rightmost derivation

Shift-Reduce Parsing

E -> T | T + E
T -> int | int * T | (E)

Stack	Input	Applied Production
	`int * int + int`	shift
`int`	`* int + int`	shift
`int *`	`int + int`	shift
`int * int`	`+ int`	reduce `T -> int`
`int * T`	`+ int`	reduce `T -> int * T`
`T`	`+ int`	shift
`T +`	`int`	shift
`T + int`		reduce `T -> int`
`T + T`		reduce `E -> T`
`T + E`		reduce `E -> T + E`
`E`		accept
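
The trace above can be replayed mechanically. The following is a minimal sketch, not a real parser: the list of actions is supplied by hand (choosing them automatically is the action selection problem), and the names `PRODUCTIONS`, `ACTIONS` and `replay` are illustrative. It only checks that each reduction finds its right-hand side on top of the stack.

```python
# Grammar from the example above, keyed by the label used in the table.
PRODUCTIONS = {
    "T -> int": ("T", ["int"]),
    "T -> int * T": ("T", ["int", "*", "T"]),
    "E -> T": ("E", ["T"]),
    "E -> T + E": ("E", ["T", "+", "E"]),
}

# Hand-written action sequence, copied from the table.
ACTIONS = [
    "shift", "shift", "shift", "T -> int", "T -> int * T",
    "shift", "shift", "T -> int", "E -> T", "E -> T + E",
]


def replay(tokens, actions):
    """Apply a fixed list of actions and return the (stack, input, action) trace."""
    stack, rest, trace = [], list(tokens), []
    for action in actions:
        trace.append((" ".join(stack), " ".join(rest), action))
        if action == "shift":
            stack.append(rest.pop(0))      # move the next token onto the stack
        else:
            lhs, rhs = PRODUCTIONS[action]
            # The right-hand side must be on top of the stack.
            assert stack[-len(rhs):] == rhs, f"cannot apply {action}"
            del stack[-len(rhs):]          # pop the right-hand side
            stack.append(lhs)              # push the left-hand side
    trace.append((" ".join(stack), " ".join(rest), "accept"))
    return trace


trace = replay("int * int + int".split(), ACTIONS)
for stack, rest, action in trace:
    print(f"{stack:<12} | {rest:<16} | {action}")
```

Swapping two actions in `ACTIONS` makes the assertion fail, which shows that the order of shifts and reductions is not arbitrary.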

Action Selection Problem

Which action should we take: when to shift, when to reduce, and which production to apply?

LR-Style Grammars

LR(k): left-to-right scanning, rightmost derivation, and k symbols of lookahead

LR(0) Grammar

An LR(0) grammar is one whose parser can determine every action without any lookahead: using only the symbols on the stack, there are no shift-reduce or reduce-reduce conflicts.
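
A minimal sketch of what such a conflict looks like, assuming a parser state is given as a list of dotted productions. The representation `(lhs, rhs, dot)` and the function name `lr0_conflicts` are illustrative, and the state is written by hand rather than computed.

```python
TERMINALS = {"int", "+", "*", "(", ")"}


def lr0_conflicts(state):
    """Return the kinds of LR(0) conflict in a state (a list of items).

    An item is (lhs, rhs, dot): the production lhs -> rhs with the dot
    placed before rhs[dot].
    """
    # Dot at the end: the parser could reduce.
    complete = [i for i in state if i[2] == len(i[1])]
    # Dot before a terminal: the parser could shift.
    shifting = [i for i in state
                if i[2] < len(i[1]) and i[1][i[2]] in TERMINALS]
    kinds = []
    if complete and shifting:
        kinds.append("shift-reduce")
    if len(complete) > 1:
        kinds.append("reduce-reduce")
    return kinds


# State reached after a T on the stack: E -> T .  and  E -> T . + E
after_T = [("E", ("T",), 1), ("E", ("T", "+", "E"), 1)]
print(lr0_conflicts(after_T))
```

For the expression grammar used earlier, this state mixes a possible reduction with a possible shift, so without lookahead the parser cannot decide between them.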

Shift-reduce parsing can be represented with an NFA whose states are productions with a separator `.` on the RHS. For example, the production `T -> (E)` has four states: `T -> .(E)`, `T -> (.E)`, `T -> (E.)` and `T -> (E).` Symbols before the `.` are already on the stack; the symbol right after it is the one expected next. An additional dummy production `S' -> S$` provides the start and end states.
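
The states of one production can be enumerated by sliding the dot across its right-hand side. A small sketch (the helper name `lr0_items` is illustrative):

```python
def lr0_items(lhs, rhs):
    """Return every dotted version of the production lhs -> rhs."""
    items = []
    for dot in range(len(rhs) + 1):          # one position per gap, ends included
        before, after = rhs[:dot], rhs[dot:]
        items.append(f"{lhs} -> {' '.join(before + ['.'] + after)}")
    return items


items = lr0_items("T", ["(", "E", ")"])
for item in items:
    print(item)
```

A production with n symbols on the right-hand side always yields n + 1 states.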

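There are two types of transitions between the states:

shift transition: on symbol Y, from `X -> \alpha . Y \beta` to `X -> \alpha Y . \beta` (the dot moves past Y)
\epsilon transition: from `X -> \alpha . Y \beta` to `Y -> . \gamma` for every production `Y -> \gamma` of the nonterminal Y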

Example:

S -> (L) | id
L -> S | L,S

Augmented with the dummy start production:

S' -> S$
S -> (L) | id
L -> S | L,S

It can be represented as an NFA:

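import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

# Render page index 22 of the lecture slides, which shows the NFA.
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)
img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

# Display the rendered page without axes.
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()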
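
The same NFA can also be built programmatically. This is a sketch under the usual textbook construction (a shift transition moves the dot past a symbol; an \epsilon transition goes from a state expecting a nonterminal to the initial state of each of its productions). The names `GRAMMAR` and `build_nfa` and the tuple encoding of states are illustrative.

```python
# Augmented example grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S'": [["S", "$"]],
    "S": [["(", "L", ")"], ["id"]],
    "L": [["S"], ["L", ",", "S"]],
}


def build_nfa(grammar):
    """Return (states, transitions) of the item NFA.

    A state is (lhs, rhs, dot); a transition is (source, label, target),
    where the label None stands for epsilon.
    """
    states = [(lhs, tuple(rhs), dot)
              for lhs, alternatives in grammar.items()
              for rhs in alternatives
              for dot in range(len(rhs) + 1)]
    transitions = []
    for lhs, rhs, dot in states:
        if dot == len(rhs):
            continue                       # dot at the end: nothing left to expect
        symbol = rhs[dot]
        # Shift transition: consume `symbol` and move the dot past it.
        transitions.append(((lhs, rhs, dot), symbol, (lhs, rhs, dot + 1)))
        # Epsilon transitions: expecting a nonterminal means expecting
        # the start of any of its productions.
        for alternative in grammar.get(symbol, []):
            transitions.append(((lhs, rhs, dot), None, (symbol, tuple(alternative), 0)))
    return states, transitions


states, transitions = build_nfa(GRAMMAR)
epsilon = [t for t in transitions if t[1] is None]
print(len(states), "states,", len(transitions) - len(epsilon), "shift and",
      len(epsilon), "epsilon transitions")
```

Applying the subset construction to this NFA would give the DFA that drives the parser; that step is left out to keep the sketch short.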

SLR(1) Parsing
LR(1) Grammar

import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

# Render page index 47 of the lecture slides.
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)
img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

# Display the rendered page without axes.
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()

LR(1) Parsing Table

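Construction is the same as for the LR(0) parsing table except for reductions: a state containing the item `[A -> \alpha ., a]` reduces by `A -> \alpha` only when the next input symbol is the lookahead `a`.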
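
A sketch of how the reduce entries of one table row differ, assuming the usual textbook construction: LR(0) writes the reduction under every terminal, while LR(1) writes it only under the lookahead stored with the completed item. The function names and the hand-written items are illustrative.

```python
TERMINALS = ["+", "*", "int", "(", ")", "$"]


def lr0_reduce_row(production):
    """LR(0): a completed item reduces regardless of the next token."""
    return {terminal: f"reduce {production}" for terminal in TERMINALS}


def lr1_reduce_row(completed_items):
    """LR(1): each completed item (production, lookahead) fills one column."""
    row = {}
    for production, lookahead in completed_items:
        entry = f"reduce {production}"
        # Two different reductions under one lookahead would be a conflict.
        assert row.get(lookahead, entry) == entry, "reduce-reduce conflict"
        row[lookahead] = entry
    return row


# Hand-written completed items of one state: [E -> T ., $] and [E -> T ., )]
lr1_row = lr1_reduce_row([("E -> T", "$"), ("E -> T", ")")])
lr0_row = lr0_reduce_row("E -> T")
print(sorted(lr1_row))
print(sorted(lr0_row))
```

Because the LR(1) row leaves the `+` column free, the same state can still shift on `+` for the item `E -> T . + E` without a conflict.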

LALR(1) Grammar

An LALR(1) parser generally has the same number of states as an SLR parser, far fewer than LR(1). For a language like Pascal, SLR requires several hundred states, whereas LR(1) requires several thousand.

Ambiguous Grammar

Ambiguity mainly comes from:

Precedence
- Productions at higher levels of the grammar contain the operators with lower precedence (and vice versa).
- We can insert non-terminals to enforce precedence.
Associativity
- We place the recursion according to the associativity: left recursion for left-associative operators, right recursion for right-associative ones.

A classic example of an ambiguous construct is if-then-else (the dangling else).
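
The precedence and associativity points can be observed on the grammar `E -> T | T + E`, `T -> int | int * T | (E)` used earlier. The sketch below is a small recursive-descent reader written only to make the grouping visible (it is not a bottom-up parser, and the function names are illustrative): `*` sits one level below `+`, so it binds tighter, and the right recursion makes both operators group to the right.

```python
def parse_E(tokens, pos):
    """E -> T | T + E. Returns (parenthesized text, next position)."""
    left, pos = parse_T(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_E(tokens, pos + 1)    # right recursion on E
        return f"({left} + {right})", pos
    return left, pos


def parse_T(tokens, pos):
    """T -> int | int * T | (E)."""
    if tokens[pos] == "(":
        inner, pos = parse_E(tokens, pos + 1)
        assert tokens[pos] == ")", "expected ')'"
        return inner, pos + 1
    assert tokens[pos].isdigit(), f"expected int, got {tokens[pos]!r}"
    left, pos = tokens[pos], pos + 1
    if pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_T(tokens, pos + 1)    # right recursion on T
        return f"({left} * {right})", pos
    return left, pos


def group(text):
    """Show how a space-separated expression is grouped by the grammar."""
    tokens = text.split()
    tree, pos = parse_E(tokens, 0)
    assert pos == len(tokens), "trailing input"
    return tree


print(group("1 * 2 + 3"))
print(group("1 + 2 * 3"))
print(group("1 + 2 + 3"))
```

Making `+` left-associative would require the left-recursive production `E -> E + T` instead, which a bottom-up parser handles directly.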

Automatic Disambiguation

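We can declare operator precedence (and associativity) so that the parser resolves the shift-reduce conflicts of an ambiguous grammar, which lets us use the ambiguous grammar directly.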
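
A sketch of the resolution rule, in the style of yacc/bison precedence declarations: compare the precedence of the operator in the rule that could be reduced with the precedence of the lookahead token, and fall back on associativity when they are equal. The table `PRECEDENCE` and the function `resolve` are illustrative, written for an assumed ambiguous grammar `E -> E + E | E * E | int`.

```python
# Hypothetical declarations: higher level binds tighter.
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
}


def resolve(rule_operator, lookahead):
    """Decide a shift-reduce conflict between a rule and a lookahead operator."""
    rule_level, _ = PRECEDENCE[rule_operator]
    look_level, look_assoc = PRECEDENCE[lookahead]
    if rule_level > look_level:
        return "reduce"            # the rule on the stack binds tighter
    if rule_level < look_level:
        return "shift"             # the incoming operator binds tighter
    # Same level: left-associative operators reduce, right-associative ones shift.
    return "reduce" if look_assoc == "left" else "shift"


print(resolve("*", "+"))   # stack holds E * E, next token is +
print(resolve("+", "*"))   # stack holds E + E, next token is *
print(resolve("+", "+"))   # stack holds E + E, next token is +
```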

AST

AST Construction LL

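// Parses S -> E S' and returns the AST node for S.
expr parse_S() {
    switch (token) {
        case num:
        case '(': {
            // Both tokens can start an E, so parse E followed by S'.
            expr child1 = parse_E();
            expr child2 = parse_Sp();
            return new S(child1, child2);
        }
        default:
            // No production of S starts with this token.
            parseError();
    }
}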
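
A runnable Python sketch of the same idea. The notes only show `parse_S`, so the rest of the grammar is an assumption made for illustration: `S -> E S'`, `S' -> + S | \epsilon`, `E -> num | ( S )`. The classes `Num`, `S` and `Parser` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Num:
    value: int


@dataclass
class S:
    first: object            # node returned by parse_E
    rest: Optional["S"]      # node returned by parse_Sp, None for the empty S'


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_S(self):
        token = self.peek()
        if isinstance(token, int) or token == "(":   # tokens that can start S
            child1 = self.parse_E()
            child2 = self.parse_Sp()
            return S(child1, child2)
        raise SyntaxError(f"unexpected token {token!r}")

    def parse_Sp(self):
        if self.peek() == "+":                       # assumed: S' -> + S
            self.advance()
            return self.parse_S()
        return None                                  # assumed: S' -> epsilon

    def parse_E(self):
        token = self.advance()
        if isinstance(token, int):                   # assumed: E -> num
            return Num(token)
        if token == "(":                             # assumed: E -> ( S )
            inner = self.parse_S()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {token!r}")


tree = Parser([1, "+", "(", 2, "+", 3, ")"]).parse_S()
print(tree)
```

Each parsing function returns the node for the nonterminal it recognizes, so the tree is assembled as the recursive calls return.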
