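# Syntax Analysis 2

## Bottom-Up Parsing

Bottom-up parsing is more general than top-down parsing (it handles a larger class of grammars, including left-recursive ones) and is just as efficient.

It uses LR grammars: the input is scanned **L**eft to right, and a **R**ightmost derivation is produced in reverse.

Bottom-up parsers are usually implemented as **shift-reduce parsers**.

Example:
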
```c
E -> T | T + E
T -> int | int * T | (E)
```

```c
// reductions: sentential form | production applied
int * int + int  | T -> int
int * T + int    | T -> int * T
T + int          | T -> int
T + T            | E -> T
T + E            | E -> T + E
E

// read from bottom to top, the steps form a rightmost derivation
```
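Writing the reductions above in reverse order gives the rightmost derivation below; at every step, the rightmost non-terminal is the one that gets expanded:

$$
\begin{aligned}
E &\Rightarrow T + E \qquad (E \to T + E)\\
  &\Rightarrow T + T \qquad (E \to T)\\
  &\Rightarrow T + \texttt{int} \qquad (T \to \texttt{int})\\
  &\Rightarrow \texttt{int} * T + \texttt{int} \qquad (T \to \texttt{int} * T)\\
  &\Rightarrow \texttt{int} * \texttt{int} + \texttt{int} \qquad (T \to \texttt{int})
\end{aligned}
$$
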

### Shift-Reduce Parsing

```c
E -> T | T + E
T -> int | int * T | (E)
```

| Stack       | Input             | Action                |
| ----------- | ----------------- | --------------------- |
|             | `int * int + int` | shift                 |
| `int`       | `* int + int`     | shift                 |
| `int *`     | `int + int`       | shift                 |
| `int * int` | `+ int`           | reduce `T -> int`     |
| `int * T`   | `+ int`           | reduce `T -> int * T` |
| `T`         | `+ int`           | shift                 |
| `T +`       | `int`             | shift                 |
| `T + int`   |                   | reduce `T -> int`     |
| `T + T`     |                   | reduce `E -> T`       |
| `T + E`     |                   | reduce `E -> T + E`   |
| `E`         |                   | accept                |

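To make the table concrete, here is a minimal Python sketch that replays a given action sequence and records one (stack, input, action) row per step. The actions are supplied by hand, copied from the table; choosing them automatically is exactly the action selection problem. The names `replay`, `PRODUCTIONS`, and `SHIFT` are our own, for illustration only.

```python
# Grammar from above: E -> T | T + E ;  T -> int | int * T | ( E )
PRODUCTIONS = [
    ("E", ["T"]),
    ("E", ["T", "+", "E"]),
    ("T", ["int"]),
    ("T", ["int", "*", "T"]),
    ("T", ["(", "E", ")"]),
]
SHIFT = None  # any other action is an index into PRODUCTIONS


def replay(tokens, actions):
    """Apply a fixed sequence of shift/reduce actions.

    Returns the final stack and a trace of (stack, remaining input, action) rows.
    """
    stack, pos, trace = [], 0, []
    for act in actions:
        remaining = " ".join(tokens[pos:])
        if act is SHIFT:
            trace.append((" ".join(stack), remaining, "shift"))
            stack.append(tokens[pos])  # move the next token onto the stack
            pos += 1
            continue
        lhs, rhs = PRODUCTIONS[act]
        trace.append((" ".join(stack), remaining, f"reduce {lhs} -> {' '.join(rhs)}"))
        n = len(rhs)
        # the right-hand side (the handle) must be on top of the stack
        if stack[len(stack) - n:] != rhs:
            raise ValueError(f"handle {rhs} is not on top of {stack}")
        del stack[len(stack) - n:]  # pop the handle ...
        stack.append(lhs)           # ... and push the non-terminal
    trace.append((" ".join(stack), " ".join(tokens[pos:]), ""))
    return stack, trace


tokens = "int * int + int".split()
# Actions copied from the table: 3 shifts, T -> int, T -> int * T, 2 shifts, ...
actions = [SHIFT, SHIFT, SHIFT, 2, 3, SHIFT, SHIFT, 2, 0, 1]
stack, trace = replay(tokens, actions)
for row in trace:
    print(" | ".join(row))
```

A reduce checks that the production's right-hand side is on top of the stack before replacing it with the left-hand side, so a wrong action sequence raises `ValueError` instead of silently continuing.
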
**Action Selection Problem**

Which action should the parser take at each step? When should it shift, when should it reduce, and by which production?

#### LR-Style Grammars

* LR(k): left-to-right scanning, rightmost derivation, and k symbols of lookahead.

* LR(0) Grammar

LR(0) grammars are those for which the parser can choose every action without any lookahead: using **only the symbols on the stack**, there are no shift-reduce or reduce-reduce conflicts.

Shift-reduce parsing can be represented by an **NFA** whose states are *items*: productions with a separator '`.`' somewhere on the right-hand side (RHS).

For example, the production `T -> (E)` has four items: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, and `T -> (E).`.

Symbols before the `.` are already on the stack; the symbol right after the `.` is the one the parser expects next.

An additional dummy production `S' -> S$` (where `$` marks the end of input) is added to provide a unique start state and an end (accepting) state.

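A production with n right-hand-side symbols yields n + 1 items, one per dot position. A small Python sketch that enumerates them in the notation used here (the helper `lr0_items` is hypothetical, not from the notes):

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs -> rhs: one per dot position 0..len(rhs)."""
    return [f"{lhs} -> {''.join(rhs[:i])}.{''.join(rhs[i:])}" for i in range(len(rhs) + 1)]


print(lr0_items("T", ["(", "E", ")"]))
# ['T -> .(E)', 'T -> (.E)', 'T -> (E.)', 'T -> (E).']
print(lr0_items("S'", ["S", "$"]))
# ["S' -> .S$", "S' -> S.$", "S' -> S$."]
```
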
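There are two types of transitions between the states:

- shift transition: from `A -> α.Xβ` to `A -> αX.β` on the symbol `X`
- $\epsilon$ transition: from `A -> α.Bβ` to `B -> .γ`, for every production `B -> γ` of the non-terminal `B`
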
Example:

```
S -> (L) | id
L -> S | L,S
```

With the dummy start production added:

```
S' -> S$
S -> (L) | id
L -> S | L,S
```

Its items can be represented as an NFA:

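```python {cmd matplotlib hide}
# Render one page of the lecture slides (0-based index 22, i.e. page 23) as a figure.
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)  # rasterize the page at 360 DPI
# The pixmap has no alpha channel by default, so its samples are plain RGB bytes.
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```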
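The NFA for the augmented example grammar can be built mechanically from the two transition rules. A minimal Python sketch, assuming items are represented as (production index, dot position) pairs; `lr0_nfa` and `show` are illustrative names, not part of the notes:

```python
# Augmented example grammar as (lhs, rhs) pairs; "$" marks the end of input.
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("id",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]


def lr0_nfa(grammar):
    """Return (items, edges) of the LR(0) item NFA.

    An item is a (production index, dot position) pair. An edge is
    (source, label, target): label is a grammar symbol for a shift
    transition, or None for an epsilon transition.
    """
    nonterminals = {lhs for lhs, _ in grammar}
    items = [(p, d) for p, (_, rhs) in enumerate(grammar) for d in range(len(rhs) + 1)]
    edges = []
    for p, d in items:
        rhs = grammar[p][1]
        if d == len(rhs):
            continue  # complete item: no outgoing transitions
        x = rhs[d]  # the symbol right after the dot
        edges.append(((p, d), x, (p, d + 1)))  # shift: A -> α.Xβ on X to A -> αX.β
        if x in nonterminals:
            for q, (lhs, _) in enumerate(grammar):
                if lhs == x:
                    edges.append(((p, d), None, (q, 0)))  # epsilon: A -> α.Bβ to B -> .γ
    return items, edges


def show(item, grammar=GRAMMAR):
    """Render an item in the notation of the notes, e.g. S -> (.L)."""
    p, d = item
    lhs, rhs = grammar[p]
    return f"{lhs} -> {''.join(rhs[:d])}.{''.join(rhs[d:])}"


nfa_items, edges = lr0_nfa(GRAMMAR)
for src, label, dst in edges:
    print(f"{show(src):12} --{label or 'eps'}--> {show(dst)}")
```

For this grammar the sketch produces 15 items, 10 shift transitions, and 10 $\epsilon$ transitions.
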

* SLR(1) Parsing

* LR(1) Grammar

```python {cmd matplotlib hide}
# Render one page of the lecture slides (0-based index 47, i.e. page 48) as a figure.
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)  # rasterize the page at 360 DPI
# The pixmap has no alpha channel by default, so its samples are plain RGB bytes.
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```

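**LR(1) Parsing Table**

Construction is the same as for the LR(0) parsing table, except for reductions: for a complete LR(1) item `[A -> α., a]`, the parser reduces by `A -> α` only when the next input token is the lookahead `a`.
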
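A minimal Python sketch of how the reduce entries of one ACTION-table row differ between the two constructions. The state is written by hand for illustration: it is the state reached after reading `a c` in the grammar `S -> a A d | b B d | a B e | b A e`, `A -> c`, `B -> c`, and it contains the complete items `[A -> c., d]` and `[B -> c., e]`. `reduce_entries` and `conflicts` are hypothetical helpers; passing `use_lookahead=False` gives the LR(0) rule, which reduces on every terminal. Shift entries are left out.

```python
from collections import defaultdict

# An LR(1) item is (lhs, rhs, dot, lookahead); rhs is a tuple of symbols.
TERMINALS = ["a", "b", "c", "d", "e", "$"]
STATE = [("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")]


def reduce_entries(state, terminals, use_lookahead=True):
    """Map each terminal to the set of productions this row would reduce by."""
    row = defaultdict(set)
    for lhs, rhs, dot, lookahead in state:
        if dot == len(rhs):  # complete item [A -> alpha., a]
            columns = [lookahead] if use_lookahead else terminals
            for t in columns:
                row[t].add((lhs, rhs))
    return dict(row)


def conflicts(row):
    """Terminals whose cell holds more than one reduction (reduce-reduce conflicts)."""
    return sorted(t for t, prods in row.items() if len(prods) > 1)


lr1_row = reduce_entries(STATE, TERMINALS)
lr0_row = reduce_entries(STATE, TERMINALS, use_lookahead=False)
print("LR(1):", lr1_row, "conflicts:", conflicts(lr1_row))
print("LR(0) conflicts:", conflicts(lr0_row))
```

With lookaheads, the two reductions land in different columns; without them, every column gets both, which is a reduce-reduce conflict.
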
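* LALR(1) Grammar

An LALR(1) parser generally has the same number of states as an SLR parser (far fewer than LR(1)): it merges LR(1) states that have the same core, i.e. the same items ignoring lookaheads.

For example, for a language like Pascal, SLR tables have several hundred states, while LR(1) tables have several thousand.
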
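The merging step can be sketched in Python as follows (helper names are hypothetical). The hand-written states `after_ac` and `after_bc` are the ones reached after `a c` and after `b c` in the grammar `S -> a A d | b B d | a B e | b A e`, `A -> c`, `B -> c`; they share the core `{A -> c., B -> c.}`, so LALR(1) merges them. The third state, `[S -> a A . d, $]`, has a different core and stays separate.

```python
from collections import defaultdict

# An LR(1) item is (lhs, rhs, dot, lookahead); a state is a set of items.
after_ac = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
after_bc = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}
after_aA = {("S", ("a", "A", "d"), 2, "$")}


def core(state):
    """The core of an LR(1) state: its LR(0) items, with lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """LALR(1) merge: union the items (hence the lookaheads) of states with equal cores."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= state
    return list(merged.values())


lalr_states = merge_by_core([after_ac, after_bc, after_aA])
for s in lalr_states:
    print(sorted(s))
```

After merging, both `A -> c` and `B -> c` are complete with lookaheads `d` and `e`, so the merged state has reduce-reduce conflicts that neither LR(1) state had. Merging can introduce reduce-reduce conflicts (though never new shift-reduce conflicts), which is why LALR(1) is slightly weaker than LR(1).
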
#### Ambiguous Grammar

Ambiguity mainly comes from:

* Precedence
  * Productions at higher levels of the grammar should have operators with lower priority (and vice versa).
  * We can insert non-terminals to enforce precedence.
* Associativity
  * We should place the recursion according to the associativity: left recursion for left-associative operators, right recursion for right-associative ones.

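The layering rule and the placement of recursion can be seen in a small evaluator. This is a recursive-descent (top-down) sketch, so left-recursive rules such as `E -> E + T` are rewritten as loops that fold to the left; an LR parser could use the left-recursive grammar directly. The grammar in the docstring and the right-associative `^` operator are our own additions for illustration.

```python
import re


def evaluate(text):
    """Evaluate integer expressions using a stratified grammar
    (operators on higher levels have lower precedence):

        E -> E + T | E - T | T      left recursion: left-associative
        T -> T * F | F
        F -> P ^ F | P              right recursion: right-associative
        P -> int | ( E )
    """
    tokens = re.findall(r"\d+|[-+*^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():  # E -> E (+|-) T, written as a loop that folds to the left
        value = term()
        while peek() in ("+", "-"):
            op = take()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    def term():  # T -> T * F, also folded to the left
        value = factor()
        while peek() == "*":
            take()
            value *= factor()
        return value

    def factor():  # F -> P ^ F: recursion on the right groups to the right
        base = primary()
        if peek() == "^":
            take()
            return base ** factor()
        return base

    def primary():  # P -> int | ( E )
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(tok)

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("1 - 2 - 3"), evaluate("2 + 3 * 4"), evaluate("2 ^ 3 ^ 2"))
```

`1 - 2 - 3` evaluates to `-4` because `-` groups to the left, `2 + 3 * 4` to `14` because `*` sits on a lower (higher-priority) level, and `2 ^ 3 ^ 2` to `512` because `^` groups to the right.
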
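A classic example of an ambiguous grammar is `if-then-else`, where a dangling `else` can attach to either of two nested `if`s.

**Automatic Disambiguation**

We can declare precedence (and associativity) rules so that ambiguous grammars can be used without shift-reduce conflicts: the parser generator uses these rules to choose between shifting and reducing.
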
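Parser generators such as yacc and Bison do this with precedence declarations (`%left`, `%right`): when a state has a shift-reduce conflict, the precedence of the rule (by default, that of its last terminal) is compared with the precedence of the lookahead token. A minimal Python sketch of that decision, with a hypothetical precedence table and `%nonassoc` left out:

```python
# Hypothetical precedence declarations, lowest first (like yacc %left/%right lines).
PRECEDENCE = [("left", ["+", "-"]), ("left", ["*"]), ("right", ["^"])]
LEVEL = {tok: (i, assoc) for i, (assoc, toks) in enumerate(PRECEDENCE) for tok in toks}


def resolve(rule_op, lookahead):
    """Resolve a shift-reduce conflict between reducing a rule whose operator
    is rule_op (e.g. E -> E + E) and shifting the lookahead token."""
    rule_level, assoc = LEVEL[rule_op]
    tok_level, _ = LEVEL[lookahead]
    if tok_level > rule_level:
        return "shift"   # the lookahead binds tighter
    if tok_level < rule_level:
        return "reduce"  # the rule's operator binds tighter
    # same level: associativity decides
    return "reduce" if assoc == "left" else "shift"


print(resolve("+", "*"), resolve("*", "+"), resolve("+", "+"), resolve("^", "^"))
```

For `E -> E + E` with lookahead `*`, the sketch shifts, so `*` binds tighter than `+`; with lookahead `+` it reduces, which makes `+` left-associative.
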
## AST

### AST Construction LL

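```cpp
// Recursive-descent function for S that builds an AST node.
// Assumes `token` holds the current lookahead, `expr` is a pointer type for
// AST nodes, and parse_E, parse_Sp, and parseError are defined elsewhere.
expr parse_S() {
    switch (token) {
    case num:
    case '(': {  // the tokens that can begin an S
        // braces scope the locals so the jump to `default` does not cross their initialization
        expr child1 = parse_E();
        expr child2 = parse_Sp();
        return new S(child1, child2);  // allocate the S node with both children
    }
    default:
        parseError();  // no production of S starts with this token
        return nullptr;
    }
}
```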
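A self-contained Python version of the same idea, assuming the grammar `S -> E S'`, `S' -> ε | + S`, `E -> num | ( S )` (consistent with the `parse_S` above, but not given in this section). Rather than keeping a node for every non-terminal, it returns an abstract `Add`/`Num` tree; the class and method names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]


class Parser:
    """Recursive-descent parser returning an AST, assuming the grammar
        S  -> E S'
        S' -> (empty) | + S
        E  -> num | ( S )
    """

    def __init__(self, text: str):
        self.tokens = re.findall(r"\d+|[+()]", text) + ["$"]  # "$" marks end of input
        self.pos = 0

    @property
    def token(self) -> str:
        return self.tokens[self.pos]

    def expect(self, tok: str) -> None:
        if self.token != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.token!r}")
        self.pos += 1

    def parse_S(self) -> Expr:
        if self.token.isdigit() or self.token == "(":  # FIRST(S) = {num, '('}
            child1 = self.parse_E()
            child2 = self.parse_Sp()
            # S' -> empty contributes nothing; otherwise build a + node
            return child1 if child2 is None else Add(child1, child2)
        raise SyntaxError(f"unexpected token {self.token!r}")

    def parse_Sp(self) -> Optional[Expr]:
        if self.token == "+":  # S' -> + S
            self.expect("+")
            return self.parse_S()
        if self.token in (")", "$"):  # S' -> empty, on FOLLOW(S') = {')', '$'}
            return None
        raise SyntaxError(f"unexpected token {self.token!r}")

    def parse_E(self) -> Expr:
        if self.token.isdigit():  # E -> num
            value = int(self.token)
            self.pos += 1
            return Num(value)
        self.expect("(")  # E -> ( S )
        inner = self.parse_S()
        self.expect(")")
        return inner


def parse(text: str) -> Expr:
    p = Parser(text)
    tree = p.parse_S()
    p.expect("$")
    return tree


print(parse("(1 + 2) + 3"))
```

Because the grammar is right-recursive, `1 + 2 + 3` is built as `Add(Num(1), Add(Num(2), Num(3)))`.
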

### AST Construction LR
