# Syntax Analysis 2

## Bottom-Up Parsing

Bottom-up parsing is more general than (deterministic) top-down parsing and just as efficient. It uses LR grammars: **L**eft-to-right scanning and **R**ightmost derivation (traced in reverse). It relies on **shift-reduce parsers**.

Example:

```c
E -> T | T + E
T -> int | int * T | (E)
```

```c
// reduction steps  | production applied
int * int + int     | T -> int
int * T + int       | T -> int * T
T + int             | T -> int
T + T               | E -> T
T + E               | E -> T + E
E                   |
// read from bottom to top, these steps form a rightmost derivation
```

### Shift-Reduce Parsing

```c
E -> T | T + E
T -> int | int * T | (E)
```

| Stack       | Input             | Action                |
| ----------- | ----------------- | --------------------- |
|             | `int * int + int` | shift                 |
| `int`       | `* int + int`     | shift                 |
| `int *`     | `int + int`       | shift                 |
| `int * int` | `+ int`           | reduce `T -> int`     |
| `int * T`   | `+ int`           | reduce `T -> int * T` |
| `T`         | `+ int`           | shift                 |
| `T +`       | `int`             | shift                 |
| `T + int`   |                   | reduce `T -> int`     |
| `T + T`     |                   | reduce `E -> T`       |
| `T + E`     |                   | reduce `E -> T + E`   |
| `E`         |                   | accept                |

**Action Selection Problem**

Which action should the parser take? When should it shift, when should it reduce, and which production should it apply?

#### LR-Style Grammars

* LR(k): left-to-right scanning, rightmost derivation, and k symbols of lookahead.
* LR(0) Grammar

An LR(0) grammar is one whose parser can determine every action without any lookahead: using **only the symbols on the stack**, there are no shift-reduce or reduce-reduce conflicts.

We can represent shift-reduce parsing with an **NFA** whose states are *items*: productions with a separator `.` somewhere on the right-hand side. For example, the production `T -> (E)` has four items: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, and `T -> (E).`. Symbols before the `.` are already on the stack; the symbol right after it is the one expected next.

We also add a dummy production `S' -> S$` that provides the start and end states.
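The items of a production can be enumerated mechanically: a production with n right-hand-side symbols has n + 1 items, one per dot position. A minimal sketch, assuming a hypothetical encoding of productions as `(lhs, rhs)` tuples; `GRAMMAR`, `lr0_items`, `show`, and `next_symbol` are illustrative names, and the augmented production is written `S' -> E $` because `E` is the start symbol of the expression grammar above.

```python
from typing import List, Optional, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (lhs, rhs symbols)
Item = Tuple[str, Tuple[str, ...], int]   # (lhs, rhs symbols, dot position)

# Hypothetical encoding of the expression grammar, augmented with S' -> E $
GRAMMAR: List[Production] = [
    ("S'", ("E", "$")),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
    ("T", ("int",)),
    ("T", ("int", "*", "T")),
    ("T", ("(", "E", ")")),
]


def lr0_items(grammar: List[Production]) -> List[Item]:
    """A production with n RHS symbols yields n + 1 items, one per dot position."""
    return [(lhs, rhs, dot) for lhs, rhs in grammar for dot in range(len(rhs) + 1)]


def show(item: Item) -> str:
    """Render an item, e.g. ("T", ("(", "E", ")"), 1) -> "T -> ( . E )"."""
    lhs, rhs, dot = item
    return f"{lhs} -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


def next_symbol(item: Item) -> Optional[str]:
    """Symbol right after the dot (the one expected next), or None when the
    dot is at the end, i.e. the whole RHS is on the stack and can be reduced."""
    lhs, rhs, dot = item
    return rhs[dot] if dot < len(rhs) else None


if __name__ == "__main__":
    for item in lr0_items([("T", ("(", "E", ")"))]):
        print(show(item))
    # T -> . ( E )
    # T -> ( . E )
    # T -> ( E . )
    # T -> ( E ) .
```

Symbols are separated by spaces so that multi-character terminals such as `int` stay readable; `T -> ( . E )` is the item written `T -> (.E)` in the text. An item for which `next_symbol` returns `None` is one where the parser may reduce.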
There are two types of transitions between the states:

- shift transitions: on a grammar symbol `X`, from `A -> α.Xβ` to `A -> αX.β`
- $\epsilon$ transitions: from `A -> α.Xβ` to every item `X -> .γ` of the nonterminal `X`

Example:

```
S -> (L) | id
L -> S | L,S
```

Augmented with the dummy start production:

```
S' -> S$
S -> (L) | id
L -> S | L,S
```

It can be represented as an NFA:

```python {cmd matplotlib hide}
# Render page 23 of ../pdf/L4.pdf (the NFA figure) inline.
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```

* SLR(1) Parsing
* LR(1) Grammar

```python {cmd matplotlib hide}
# Render page 48 of ../pdf/L4.pdf inline.
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```

The LR(1) parsing table is constructed the same way as the LR(0) table, except for reductions: for an item `[A -> α., a]`, the reduce action `A -> α` is entered only in the column of its lookahead `a`, not in every column.

* LALR(1) Grammar

LALR(1) generally has the same number of states as SLR, far fewer than LR(1). For Pascal, SLR requires several hundred states, while LR(1) requires several thousand.

#### Ambiguous Grammar

Ambiguity mainly comes from:

* Precedence
  * Productions at higher levels of the grammar carry operators with lower priority (and vice versa).
  * We can insert non-terminals to enforce precedence.
* Associativity
  * We should decide where to place the recursion (left or right) according to the operator's associativity.

A classic example of an ambiguous construct is `if-then-else` (the dangling-else problem).

**Automatic Disambiguation**

We can declare operator precedence so that an ambiguous grammar can be used without shift-reduce conflicts.

## AST

### AST Construction LL

```c
// Recursive-descent parser for S -> E S' (Sp stands for S'):
// each parse_X() returns the AST node for the subtree it recognized.
expr parse_S() {
  switch (token) {        // token: the current lookahead token
    case num:
    case '(': {           // braces scope the declarations below
      expr child1 = parse_E();
      expr child2 = parse_Sp();
      return new S(child1, child2);
    }
    default:
      parseError();       // no production of S starts with this token; assumed not to return
  }
}
```

### AST Construction LR
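In an LR parser, a node is built at each reduction, instead of on each procedure return as in the LL version. A minimal sketch, assuming yacc-style semantic actions: every stack entry carries a value, and reducing by a production pops the values of its right-hand side and pushes the value computed for its left-hand side. The names (`build_ast`, `SEMANTIC`, the production constants) and the tuple encoding of AST nodes are hypothetical, and the action sequence is hard-coded from the `int * int + int` trace in the shift-reduce table rather than read from an LR parsing table.

```python
from typing import Callable, Dict, List, Tuple, Union

Production = Tuple[str, Tuple[str, ...]]
Action = Union[str, Production]  # "shift", or the production to reduce by

# Productions of E -> T | T + E,  T -> int | int * T | (E)
E_T = ("E", ("T",))
E_ADD = ("E", ("T", "+", "E"))
T_INT = ("T", ("int",))
T_MUL = ("T", ("int", "*", "T"))
T_PAR = ("T", ("(", "E", ")"))

# Hypothetical semantic actions: map the RHS values to the LHS value.
# AST nodes are plain tuples (operator, left, right); numbers are ints.
SEMANTIC: Dict[Production, Callable[[list], object]] = {
    E_T: lambda v: v[0],                      # E -> T: pass the subtree up
    E_ADD: lambda v: ("+", v[0], v[2]),       # E -> T + E: addition node
    T_INT: lambda v: int(v[0]),               # T -> int: numeric leaf
    T_MUL: lambda v: ("*", int(v[0]), v[2]),  # T -> int * T: product node
    T_PAR: lambda v: v[1],                    # T -> (E): parentheses vanish
}


def build_ast(tokens: List[Tuple[str, str]], actions: List[Action]):
    """Replay shift/reduce actions over a stack of (symbol, value) pairs.

    shift pushes the next (token kind, lexeme) pair; reducing by
    A -> X1..Xn pops the top n entries, checks that they spell X1..Xn,
    and pushes (A, semantic action applied to their values).
    Assumes no epsilon productions, so every RHS has n >= 1 symbols.
    """
    stack: List[Tuple[str, object]] = []
    pos = 0
    for action in actions:
        if action == "shift":
            stack.append(tokens[pos])
            pos += 1
        else:
            lhs, rhs = action
            handle = stack[-len(rhs):]
            if [sym for sym, _ in handle] != list(rhs):
                raise ValueError(f"cannot reduce by {lhs} -> {' '.join(rhs)}")
            del stack[-len(rhs):]
            stack.append((lhs, SEMANTIC[action]([val for _, val in handle])))
    if pos != len(tokens) or len(stack) != 1:
        raise ValueError("input was not reduced to a single symbol")
    return stack[0][1]


if __name__ == "__main__":
    # "2 * 3 + 4" as (kind, lexeme) tokens; the actions are copied from
    # the int * int + int trace, not computed from a parsing table.
    tokens = [("int", "2"), ("*", "*"), ("int", "3"), ("+", "+"), ("int", "4")]
    actions = ["shift", "shift", "shift", T_INT, T_MUL,
               "shift", "shift", T_INT, E_T, E_ADD]
    print(build_ast(tokens, actions))  # ('+', ('*', 2, 3), 4)
```

Because `E -> T + E` is right-recursive, the same driver turns `2 + 3 + 4` into `('+', 2, ('+', 3, 4))`: where the recursion is placed decides the associativity of the resulting tree.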