diff --git a/.gitattributes b/.gitattributes
deleted file mode 100644
index 66271d0..0000000
--- a/.gitattributes
+++ /dev/null
@@ -1,3 +0,0 @@
-[attr]lfs-file filter=lfs diff=lfs merge=lfs -text
-
-*.pdf lfs-file
\ No newline at end of file
diff --git a/notes/3.md b/notes/3.md
index b9a43a0..bb27d84 100644
--- a/notes/3.md
+++ b/notes/3.md
@@ -34,8 +34,22 @@ One of the purposes of the compiler is error handling.
 
 ## Parsing
 
-### Top-down Parsing
+* Top-down Parsing
 
 **Recursive Descent Parsing**
 
-### Predictive Parsing
\ No newline at end of file
+Works by using backtracking.
+
+* Predictive Parsing
+
+
+Parsing Table: no need to backtrack.
+
+
+
+### Parser Implementation
+
+
+
+
+
diff --git a/notes/4.md b/notes/4.md
new file mode 100644
index 0000000..016b237
--- /dev/null
+++ b/notes/4.md
@@ -0,0 +1,104 @@
+# Syntax Analysis 2
+
+## Bottom-Up Parsing
+
+Bottom-up parsing is more general than top-down parsing and just as efficient.
+It uses LR grammars: left-to-right scan and rightmost derivation (in reverse).
+
+It relies on **shift-reduce parsers**.
+
+Example:
+
+```c
+E -> T | T + E
+T -> int | int * T | (E)
+```
+
+```c
+// reduce
+int * int + int | T -> int
+int * T + int   | T -> int * T
+T + int         | T -> int
+T + T           | E -> T
+T + E           | E -> T + E
+E               |
+// read bottom-up, the reductions trace a rightmost derivation
+```
+
+### Shift-Reduce Parsing
+
+```c
+E -> T | T + E
+T -> int | int * T | (E)
+```
+
+| Stack       | Input             | Action                |
+| ----------- | ----------------- | --------------------- |
+|             | `int * int + int` | shift                 |
+| `int`       | `* int + int`     | shift                 |
+| `int *`     | `int + int`       | shift                 |
+| `int * int` | `+ int`           | reduce `T -> int`     |
+| `int * T`   | `+ int`           | reduce `T -> int * T` |
+| `T`         | `+ int`           | shift                 |
+| `T +`       | `int`             | shift                 |
+| `T + int`   |                   | reduce `T -> int`     |
+| `T + T`     |                   | reduce `E -> T`       |
+| `T + E`     |                   | reduce `E -> T + E`   |
+| `E`         |                   | accept                |
+
+
+**Action Selection Problem**
+Which action should we take at each step?
+When should we shift, when should we reduce, and which production should we apply?
+
+#### LR-Style Grammars
+
+* LR(k): left-to-right scanning, rightmost derivation, and k symbols of lookahead
+
+* LR(0) Grammar
+
+LR(0) grammars are those whose parser actions can be determined without any lookahead: there are no reduce-reduce or shift-reduce conflicts when using **only the symbols on the stack**.
+
+Shift-reduce parsing can be represented by an **NFA** whose states are productions with a separator '`.`' on the RHS.
+For example, the production `T -> (E)` has four states: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, `T -> (E).`.
+Symbols before the `.` are already on the stack; the symbols after it are what the parser expects next.
+A dummy production `S' -> S$` is added to provide the start and end states.
+
+There are two types of transitions between the states:
+- shift transition: on symbol `X`, move from `A -> α.Xβ` to `A -> αX.β`
+- $\epsilon$ transition: from `A -> α.Bβ` to `B -> .γ` for each production `B -> γ`
+
+Example:
+
+```
+S -> (L) | id
+L -> S | L,S
+```
+
+```
+S' -> S$
+S -> (L) | id
+L -> S | L,S
+```
+
+It can be represented as an NFA:
+
+```python {cmd matplotlib}
+import matplotlib.pyplot as plt
+import pymupdf
+from PIL import Image
+doc = pymupdf.open("../pdf/L4.pdf")
+pix = doc[22].get_pixmap(dpi=500)  # render the slide that shows the NFA
+img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
+
+plt.imshow(img)
+plt.axis('off')
+plt.tight_layout()
+plt.show()
+```
+
+
+
+
+
+* LR(1) Grammar
\ No newline at end of file
diff --git a/pdf/L4.pdf b/pdf/L4.pdf
new file mode 100644
index 0000000..d628d6a
Binary files /dev/null and b/pdf/L4.pdf differ