From 2a7cb64862d2645259d794d9846a1764eeda493c Mon Sep 17 00:00:00 2001
From: yenru0
Date: Mon, 27 Oct 2025 17:19:12 +0900
Subject: [PATCH] update notes in midterm2

---
 notes/3.md |  82 ++++++++++++++++++++++-
 notes/4.md | 191 +++++++++++++----------------------------------------
 2 files changed, 125 insertions(+), 148 deletions(-)

diff --git a/notes/3.md b/notes/3.md
index 255e510..b4f0938 100644
--- a/notes/3.md
+++ b/notes/3.md
@@ -210,7 +210,7 @@ As there can be conflicts: For a given state(stack + input) there can be multipl
 ### LR Grammars
 
-* LR(k): left-toright scanning, right most derivation and $k$ symbol lookahead
+* LR(k): left-to-right scanning, rightmost derivation, and $k$ symbols of lookahead
 
 * LR(0) Grammar
 
 LR(0) indicates grammars that can determine actions without any lookahead. There are **no reduce-reduce and shift-reduce conflicts**, because it should be determined by stacks.
@@ -243,10 +243,86 @@ And table consists of four different actions:
 * `reduce x -> a`: pop a from the stack and push <`x`, `goto[curr_state, x]`>
 * accept(`S' -> S$.`) / Error
 
-DFA states are converted to index of each rows.
+Also, DFA states are converted to the row indices of the table.
+
+But there is a limitation when multiple actions compete for the same parsing-table entry; such conflicts must be resolved with **lookahead**.
 
 ### SLR(1) Parsing
 
 A simple extension of LR(0).
-For each reduction `X -> b`, look at the next symbol `c` and then apply reduction only if `c` is in `Follow(X)`.
+For each reduction `X -> b`, look at the next symbol `c` (the lookahead) and apply the reduction **only if `c` is in `Follow(X)`**.
+
+### LR(1) Parsing
+
+LR(1) uses the lookahead more precisely. It uses a more detailed item of the form `X -> a.b,c`, which means:
+1. `a` is already matched on top of the stack
+2. we next expect to see `b` followed by `c`
+
+Also, `X -> a.b,{x1,x2,...,xn}` is shorthand for:
+* the item `X -> a.b,i` for every `i` in `{x1,...,xn}`
+
+We extend the $\epsilon$-closure and `goto` operations accordingly.
+
+LR(1) closure computation:
+* start with `Closure(S) = S`
+* foreach item `[X -> a.Bb,c]` in `S`
+  * add `{B -> .y,First(bc)}`
+* initialize the start state with `[S' -> .S,$]`
+
+LR(1) `goto`:
+given an item `[X -> a.Bb,c]` in state `I`, `Goto/Shift(I, B) = Closure([X -> aB.b,c])`
+
+The LR(1) parsing table is built the same way as the LR(0) table except for **reductions**: reduce by `X -> a` only when the next input symbol is in the item's lookahead set.
+
+### LALR(1) Parsing
+
+LR(1) has too many states; LALR(1) parsing addresses this.
+
+LALR(1) stands for **LookAhead LR**.
+Construct the LR(1) DFA and merge any two LR(1) states whose items have the same productions but different lookaheads. This reduces the number of parser table entries, but it is theoretically less powerful than LR(1).
+
+LALR(1) generally has about the same number of states as SLR(1), far fewer than LR(1).
+But we will not dive into the details of LALR(1).
+
+### LL/LR Grammars
+
+1. LL Parsing Tables
+   * Table[NT, T] = production to apply
+   * Computed using First and Follow sets
+2. LR Parsing Tables
+   * Table[LR State, Term] = shift/reduce/error/accept
+   * Table[LR State, NT] = goto/error
+   * Computed using closure and goto operations on LR states
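+
+To make the table-driven mechanics concrete, here is a minimal sketch of the LR driver loop. The token representation, the `Action`/`Production` encodings, and the table types are illustrative assumptions, not the lecture's reference implementation.
+
+```cpp
+#include <map>
+#include <stack>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical encodings: states are ints, grammar symbols are strings.
+enum class Kind { Shift, Reduce, Accept, Error };
+struct Action { Kind kind; int target; };              // Shift: next state, Reduce: production index
+struct Production { std::string lhs; int rhsLength; }; // e.g. E -> T + E has rhsLength 3
+
+// Table[state, terminal] = action, Goto[state, nonterminal] = next state.
+using ActionTable = std::map<std::pair<int, std::string>, Action>;
+using GotoTable = std::map<std::pair<int, std::string>, int>;
+
+bool lrParse(const std::vector<std::string>& input,    // token stream ending with "$"
+             const ActionTable& action, const GotoTable& go,
+             const std::vector<Production>& prods) {
+  std::stack<int> states;
+  states.push(0);                                      // start state
+  std::size_t pos = 0;
+  while (true) {
+    auto it = action.find({states.top(), input[pos]});
+    if (it == action.end()) return false;              // empty entry = error
+    switch (it->second.kind) {
+      case Kind::Shift:                                // consume the terminal, push its state
+        states.push(it->second.target);
+        ++pos;
+        break;
+      case Kind::Reduce: {                             // pop |rhs| states, then follow GOTO on the lhs
+        const Production& p = prods[it->second.target];
+        for (int i = 0; i < p.rhsLength; ++i) states.pop();
+        states.push(go.at({states.top(), p.lhs}));
+        break;
+      }
+      case Kind::Accept: return true;
+      case Kind::Error:  return false;
+    }
+  }
+}
+```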
+
+## Automatic Disambiguation
+
+It is highly complex to propose unambiguous grammars that encode precedence and associativity. By defining precedence, we can use ambiguous grammars without shift-reduce conflicts: define precedence between the terminal on top of the stack and the terminal in the input.
+
+## AST Data Structure
+
+LL/LR parsing implicitly builds the AST.
+* LL parsing: AST represented by the productions
+* LR parsing: AST represented by the reductions
+
+### AST Construction in LL
+
+```cpp
+expr parse_S() {
+  switch (token) {
+    case num:
+    case '(': {
+      expr child1 = parse_E();
+      expr child2 = parse_S_();
+      return new S(child1, child2);
+    }
+    default: ParseError();
+  }
+}
+```
+
+### AST Construction in LR
+
+Construction mechanism:
+* Store parts of the tree on the stack
+* foreach nonterminal `X` on the stack, store the sub-tree for `X`
+* After a reduce operation for a production `X -> a`, create an AST node for `X` whose children are the sub-trees popped for `a` (see the sketch below)
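+
+To make the reduce-time construction concrete, here is a minimal sketch that keeps a stack of sub-trees in parallel with the parser's state stack; the `Node` shape and the `AstBuilder` interface are illustrative assumptions rather than a reference implementation.
+
+```cpp
+#include <cstddef>
+#include <memory>
+#include <string>
+#include <vector>
+
+// Illustrative AST node: a label plus the children gathered at reduce time.
+struct Node {
+  std::string label;
+  std::vector<std::unique_ptr<Node>> children;
+};
+
+// Kept in parallel with the parser's state stack:
+// every shift pushes a leaf, every reduce pops |rhs| sub-trees and pushes a new node.
+struct AstBuilder {
+  std::vector<std::unique_ptr<Node>> stack;
+
+  void onShift(const std::string& terminal) {
+    auto leaf = std::make_unique<Node>();
+    leaf->label = terminal;
+    stack.push_back(std::move(leaf));
+  }
+
+  // Reduce by X -> a with |a| == rhsLength: the popped sub-trees become the children of X.
+  void onReduce(const std::string& lhs, std::size_t rhsLength) {
+    auto node = std::make_unique<Node>();
+    node->label = lhs;
+    node->children.resize(rhsLength);
+    for (std::size_t i = 0; i < rhsLength; ++i) {      // pop in reverse to keep source order
+      node->children[rhsLength - 1 - i] = std::move(stack.back());
+      stack.pop_back();
+    }
+    stack.push_back(std::move(node));
+  }
+};
+```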
+
diff --git a/notes/4.md b/notes/4.md
index 7dfd737..60be980 100644
--- a/notes/4.md
+++ b/notes/4.md
@@ -1,169 +1,70 @@
-# Syntax Anlysis 2
+# Semantic Analysis
 
-## Bottom-Up Parsing
+Even after passing lexical and syntax analysis, there can still be errors concerning the correct usage of variables, objects, and functions.
+**Semantic Analysis** ensures that the program satisfies a set of rules regarding the usage of programming constructs.
 
-Bottom-up Parsing is more efficient than Top-down parsing.
-it uses LR grammars: Left-recursive and right-most derivation.
+There are two main categories of semantic analysis:
+* Scopes
+* Types
 
-Rely on **Shift-reduce Parsers**.
+## Scope
 
-example:
+A lexical scope is a textual region in the program.
+The scope of an identifier is the lexical scope that its declaration refers to.
 
-```c
-E -> T | T + E
-T -> int | int * T | (E)
-```
+### Symbol Tables
 
-```c
-// reduce
-int * int + int | T -> int
-int * T + int | T -> int * T
-T + int | T -> int
-T + T | E -> T
-T + E | E -> T + E
-E |
-// view as reversed, it seems right-most derivation
-```
+Semantic checks refer to properties of identifiers in the program; they need an environment that stores identifier information: the **symbol table**.
 
-### Shift-Reudce Parsing
+In a symbol table, each entry contains the name of an identifier and additional information.
 
-```c
-E -> T | T + E
-T -> int | int * T | (E)
-```
+### Implementing Symbol Tables
 
-| Stack | Input | Applied Production |
-| ----------- | ----------------- | --------------------- |
-| | `int * int + int` | shift |
-| `int` | `* int + int` | shift |
-| `int *` | `int + int` | shift |
-| `int * int` | `+ int` | reduce `T -> int` |
-| `int * T` | `+ int` | reduce `T -> int * T` |
-| `T` | `+ int` | shift |
-| `T +` | `int` | shift |
-| `T + int` | | reduce `T -> int` |
-| `T + T` | | reduce `E -> T` |
-| `T + E` | | reduce `E -> T + E` |
-| `E` | | |
+Five operations:
+* `Insert Scope`
+* `Exit Scope`
+* `Find Symbol(x)`
+* `Add Symbol(x)`
+* `Check Scope(x)`
+
+We can build the symbol tables during parsing or after constructing the AST. Either way, the symbol tables should be generated before semantic analysis.
 
-**Action Selection Problem**
-Which action should we take?
-when shift, when reduce, which production applied?
+### Function Declaration and Usage Types
 
-#### LR-Style Grammars
+* Declare functions before usage (`cminus`)
+* Can use functions before declaration (`py`)
+* Separate body declaration (`C`)
 
-* LR(k): left-toright scanning, right most derivation and k symbol lookahead
+### Scope Analysis
 
-* LR(0) Grammar
+* Generate the symbol table and do scope analysis simultaneously
 
-LR(0) indicates grammars that can determine actions without any lookahead: there are no reduce-reduce and shift-reduce conflicts when using **only the symbols in the stack**.
+## Types
 
-represent shift-reduce parsing using an **NFA**, whose states are production with separator '`.`' on RHS.
-for eample, a production `T -> (E)` has four states: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, `T -> (E.)`.
+* Type Checking: a set of rules which ensures the type consistency of the different constructs in the program.
+* Type inference: fill in missing type information.
 
-before `.` means already in stack, next item means expecting item.
-plus an additional dummy production `S' -> S$` for a start and end state.
+### Type Checking
 
-there are two types of transitions between the stats
-- shift transition
-- $\epsilon$ transition:
+Semantic checking to enforce the type safety of the program.
 
-example:
+Languages fall into three categories:
+* Statically typed
+* Dynamically typed
+* Untyped
 
-```
-S -> (L) | id
-L -> LS | L,S
-```
+Static type checking does not require additional type-checking instructions at runtime; it guarantees at compile time that executions are type-safe. But modern languages require both static and dynamic type checking (e.g., unions, void pointers).
 
-```
-S' -> S$
-S -> (L) | id
-L -> S | L,S
-```
+A **type** is a description of a set of values and a set of allowed operations on those values.
 
-It can be represented as a NFA:
+A **type expression** is a description of the possible types in the program.
+A **type system** defines types for language constructs, such as nodes in the AST.
 
-```python {cmd matplotlib hide}
-import sys
-import pymupdf
-from PIL import Image
-doc = pymupdf.open("../pdf/L4.pdf")
-pix = doc[22].get_pixmap(dpi=360)
-img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+Languages usually have basic types, a.k.a. primitive types. These are used to build type expressions.
 
-plt.imshow(img)
-plt.axis('off')
-plt.tight_layout()
-plt.show()
-```
-
-* SLR(1) Parsing
-
-* LR(1) Grammar
-
-
-```python {cmd matplotlib hide}
-import sys
-import pymupdf
-from PIL import Image
-doc = pymupdf.open("../pdf/L4.pdf")
-pix = doc[47].get_pixmap(dpi=360)
-img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
-
-plt.imshow(img)
-plt.axis('off')
-plt.tight_layout()
-plt.show()
-```
-
-LR(1) Parsing Table
-
-is same as LR(0) parsing table construction except for reductions:
-
-
-* LALR(1) Grammar
-
-LALR(1) generally has the same number of states as SLR (much less than LR(1))
-for Pascal language, SLR requires several hundred states, LR(1) requires several thousand states.
-
-#### Ambiguous Grammar
-
-Ambiguity is mainly from
-
-* Precedence
-  * The production at higher levels will have operators with lower priorities (and vice versa).
-  * we can insert non-terminals to enforce precendence.
-* Associativity
-  * we should determine where to place recursion depending on the associativity
-
-for example: `if-then-else`
-
-**Automatic Disambiguation**
-
-We can define precedence to use ambiguous grammars w/o shift-reduce conflicts.
-
-## AST
-
-### AST Construction LL
-
-```c
-expr parse_S() {
-  switch(token) {
-    case num:
-    case '(':
-      expr child1 = parse_E();
-      expr child2 = parse_Sp();
-      return new S(child1, child2);
-    default:
-      parseError();
-
-  }
-}
-
-```
-
-### AST Construction LR
+### Type Comparison Implementation
+
+There are two options:
+1. Implement a method `Equals(T1, T2)`.
+It must compare the type trees of `T1` and `T2`. For OOP languages, sub-typing also needs to be handled.
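+
+As a minimal sketch of option 1, assuming type expressions are represented as simple trees: the `Type` structure below and the exact `Equals` signature are illustrative assumptions, not the course's actual implementation.
+
+```cpp
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Illustrative type expression tree: a constructor name (e.g. "int", "array", "->")
+// plus its parameter types, e.g. array(int) or (int, bool) -> int.
+struct Type {
+  std::string ctor;
+  std::vector<const Type*> params;
+};
+
+// Structural equality: two type trees are equal if they have the same constructor
+// and pairwise-equal parameters. An OOP language would additionally need an
+// IsSubtype check rather than plain equality.
+bool Equals(const Type* t1, const Type* t2) {
+  if (t1 == t2) return true;                        // same node (or both null)
+  if (t1 == nullptr || t2 == nullptr) return false;
+  if (t1->ctor != t2->ctor) return false;
+  if (t1->params.size() != t2->params.size()) return false;
+  for (std::size_t i = 0; i < t1->params.size(); ++i)
+    if (!Equals(t1->params[i], t2->params[i])) return false;
+  return true;
+}
+```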