Syntax Analysis 2
Bottom-Up Parsing
Bottom-up parsing is more general than top-down parsing and just as efficient. It uses LR grammars: Left-to-right scanning and Rightmost derivation (constructed in reverse).
It relies on shift-reduce parsers.
example:
E -> T | T + E
T -> int | int * T | (E)
// reduce
int * int + int | T -> int
int * T + int | T -> int * T
T + int | T -> int
T + T | E -> T
T + E | E -> T + E
E |
// read in reverse (bottom to top), the steps form a rightmost derivation
Shift-Reduce Parsing
E -> T | T + E
T -> int | int * T | (E)
| Stack | Input | Action |
|---|---|---|
| | int * int + int | shift |
| int | * int + int | shift |
| int * | int + int | shift |
| int * int | + int | reduce T -> int |
| int * T | + int | reduce T -> int * T |
| T | + int | shift |
| T + | int | shift |
| T + int | | reduce T -> int |
| T + T | | reduce E -> T |
| T + E | | reduce E -> T + E |
| E | | |
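The trace above can be replayed mechanically. The following is a minimal sketch, not a full parser: the action list is given by hand rather than chosen by the parser, and the names `replay`, `TOKENS`, `ACTIONS`, and `TRACE` are made up for this illustration.

```python
def replay(tokens, actions):
    """Replay shift/reduce actions and return every (stack, input) configuration."""
    stack, rest = [], list(tokens)
    trace = [(tuple(stack), tuple(rest))]
    for action in actions:
        if action == "shift":
            # Shift: move the next input token onto the stack.
            stack.append(rest.pop(0))
        else:
            # Reduce: the RHS must sit on top of the stack; replace it with the LHS.
            lhs, rhs = action
            start = len(stack) - len(rhs)
            assert stack[start:] == list(rhs), "handle is not on top of the stack"
            del stack[start:]
            stack.append(lhs)
        trace.append((tuple(stack), tuple(rest)))
    return trace


TOKENS = "int * int + int".split()
# Hand-written action sequence copied from the table (an assumption of this sketch).
ACTIONS = [
    "shift", "shift", "shift",
    ("T", ("int",)),
    ("T", ("int", "*", "T")),
    "shift", "shift",
    ("T", ("int",)),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
]
TRACE = replay(TOKENS, ACTIONS)

for stack, rest in TRACE:
    print(" ".join(stack).ljust(12), "|", " ".join(rest))
```

The sketch only checks that each reduction is applied to the top of the stack. Deciding which action to take at each step is exactly the action selection problem.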
Action Selection Problem
Which action should we take: when should we shift, when should we reduce, and which production should we apply?
LR-Style Grammars
- LR(k): left-to-right scanning, rightmost derivation, and k symbols of lookahead
- LR(0) Grammar
An LR(0) grammar is one whose parser can choose every action without any lookahead: using only the symbols on the stack, there are no shift-reduce or reduce-reduce conflicts.
Shift-reduce parsing can be represented by an NFA whose states are productions with a separator '.' placed somewhere on the RHS.
For example, the production T -> (E) has four states: T -> .(E), T -> (.E), T -> (E.), T -> (E).
Symbols before the '.' are already on the stack; the symbol right after it is the one expected next.
An additional dummy production S' -> S$ provides the start and end states.
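As a small illustration of the dotted productions described above, the sketch below enumerates the states of a single production. The tuple representation and the helper names `items` and `show` are assumptions made for this example.

```python
def items(lhs, rhs):
    """Return one (lhs, before_dot, after_dot) item per possible dot position."""
    return [(lhs, tuple(rhs[:i]), tuple(rhs[i:])) for i in range(len(rhs) + 1)]


def show(item):
    """Format an item with '.' as the separator between stack part and expected part."""
    lhs, before, after = item
    return f"{lhs} -> " + " ".join(before + (".",) + after)


ITEMS = items("T", ["(", "E", ")"])
for item in ITEMS:
    print(show(item))
```

A production with n symbols on its RHS always yields n + 1 states, one for each dot position.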
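There are two types of transitions between the states:
- shift transition: on reading a symbol X, move from A -> α.Xβ to A -> αX.β
- ε-transition: when a non-terminal X is expected, move from A -> α.Xβ to X -> .γ for each production X -> γ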
example:
S -> (L) | id
L -> S | L,S
augmented with the dummy production:
S' -> S$
S -> (L) | id
L -> S | L,S
It can be represented as an NFA:
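import matplotlib.pyplot as plt
import pymupdf
from PIL import Image
# Render page index 22 of the lecture slides (the NFA figure) as an image.
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)
img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
# Show the rendered page without axes.
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()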
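The NFA for the augmented grammar can also be built programmatically. The sketch below is one possible encoding, assuming an item is stored as a `(lhs, rhs, dot)` triple. `GRAMMAR` and `nfa_transitions` are hypothetical names, and transitions on non-terminals are recorded alongside the terminal shifts as ordinary labelled edges.

```python
# Augmented grammar from the notes: S' -> S$ ; S -> (L) | id ; L -> S | L,S
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}


def nfa_transitions(grammar):
    """Return (labelled_edges, epsilon_edges) over items (lhs, rhs, dot)."""
    labelled, epsilon = [], []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            for dot in range(len(rhs)):
                item = (lhs, rhs, dot)
                symbol = rhs[dot]
                # Labelled edge: consume `symbol` and move the dot one position right.
                labelled.append((item, symbol, (lhs, rhs, dot + 1)))
                # Epsilon edges: if a non-terminal is expected, jump to the
                # initial item of each of its productions.
                for expansion in grammar.get(symbol, []):
                    epsilon.append((item, (symbol, expansion, 0)))
    return labelled, epsilon


LABELLED, EPSILON = nfa_transitions(GRAMMAR)
print(len(LABELLED), "labelled edges,", len(EPSILON), "epsilon edges")
```

Applying the subset construction to this NFA would give the DFA that drives the LR(0) parsing table. That step is left out to keep the sketch short.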
- SLR(1) Parsing
- LR(1) Grammar
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image
# Render page index 47 of the lecture slides as an image.
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)
img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
# Show the rendered page without axes.
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
LR(1) Parsing Table
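Construction is the same as for the LR(0) parsing table, except for reductions: a completed item is reduced only when the next input symbol matches the item's lookahead symbol.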
- LALR(1) Grammar
LALR(1) generally has the same number of states as SLR(1), which is far fewer than LR(1). For the Pascal language, for example, SLR requires several hundred states, while LR(1) requires several thousand.
Ambiguous Grammar
Ambiguity mainly comes from:
- Precedence
  - Productions at higher levels of the grammar hold the operators with lower priority (and vice versa).
  - We can insert non-terminals to enforce precedence.
- Associativity
  - We should decide where to place the recursion depending on the associativity (left recursion for left-associative operators, right recursion for right-associative ones).
for example: if-then-else
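To see how the layering of non-terminals and the placement of recursion shape the parse tree, here is a minimal sketch that parses with the grammar used earlier in these notes (E -> T | T + E, T -> int | int * T | (E)). It is written as a small recursive-descent parser purely for illustration. The nested-tuple tree format and the function names are assumptions of this example.

```python
def parse(tokens):
    """Parse a token list and return a nested-tuple tree such as ('+', 1, 2)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r} at position {pos}")
        pos += 1
        return tok

    def parse_E():
        # Top level of the grammar: '+' has the lowest precedence.
        left = parse_T()
        if peek() == "+":
            take("+")
            # The recursion is on the right, so '+' groups to the right.
            return ("+", left, parse_E())
        return left

    def parse_T():
        # Lower level of the grammar: '*' binds tighter than '+'.
        if peek() == "(":
            take("(")
            inner = parse_E()
            take(")")
            return inner
        tok = take()
        if not tok.isdigit():
            raise SyntaxError(f"expected an integer, got {tok!r}")
        left = int(tok)
        if peek() == "*":
            take("*")
            return ("*", left, parse_T())
        return left

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse("1 + 2 * 3".split()))
print(parse("1 + 2 + 3".split()))
```

Because `*` lives one level below `+`, it ends up deeper in the tree. Because both rules recurse on the right, both operators come out right-associative with this particular grammar.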
Automatic Disambiguation
We can declare operator precedence so that an ambiguous grammar can be used without shift-reduce conflicts.
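The rule commonly used by yacc-style parser generators can be sketched as follows: compare the precedence of the operator in the production being reduced with that of the lookahead operator, and fall back on associativity when they are equal. The `PRECEDENCE` table and the `resolve` function are hypothetical, and the `^` operator is added only to show a right-associative case.

```python
# Hypothetical precedence table: a higher level binds tighter.
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def resolve(rule_operator, lookahead):
    """Pick 'shift' or 'reduce' for a shift-reduce conflict between two operators."""
    rule_level, _ = PRECEDENCE[rule_operator]
    look_level, look_assoc = PRECEDENCE[lookahead]
    if rule_level > look_level:
        return "reduce"  # the operator on the stack binds tighter
    if rule_level < look_level:
        return "shift"   # the incoming operator binds tighter
    # Same level: left-associative operators reduce, right-associative ones shift.
    return "reduce" if look_assoc == "left" else "shift"


print(resolve("*", "+"))  # stack holds E * E, next token is +
print(resolve("+", "*"))  # stack holds E + E, next token is *
```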
AST
AST Construction LL
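expr* parse_S() {
    switch (token) {
    case num:
    case '(': {
        // Both lookahead tokens select the same production
        // (presumably S -> E S'), so parse its two children in order.
        expr* child1 = parse_E();
        expr* child2 = parse_Sp();
        // Build the AST node for S from the two sub-trees.
        return new S(child1, child2);
    }
    default:
        // Any other token cannot start an S.
        parseError();
        return nullptr;
    }
}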