update notes in midterm2

2025-10-27 17:19:12 +09:00
parent 13b27cc21e
commit 2a7cb64862
2 changed files with 125 additions and 148 deletions


@@ -210,7 +210,7 @@ As there can be conflicts: For a given state(stack + input) there can be multipl
### LR Grammars
* LR(k): left-to-right scanning, rightmost derivation and $k$ symbol lookahead
* LR(0) Grammar
LR(0) indicates grammars that can determine actions without any lookahead.
There are **no reduce-reduce and shift-reduce conflicts**, because every action is determined by the stack contents alone.
@@ -243,10 +243,86 @@ And table consists of four different actions:
* `reduce x -> a`: pop `a` from the stack and push <`x`, `goto[curr_state, x]`>
* accept(`S' -> S$.`) / Error
Also, DFA states are converted to the row indices of the table.
But there is a limitation when multiple options could fill the same parsing-table entry; this is resolved with **lookahead**.
### SLR(1) Parsing
A simple extension of LR(0).
For each reduction `X -> b`, look at the next symbol `c` (the lookahead) and apply the reduction **only if `c` is in `Follow(X)`**.
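The `Follow(X)` check can be sketched in a few lines of Python (the toy grammar and its hand-computed Follow sets below are illustrative assumptions, not from the notes):

```python
# SLR(1) reduce decision (toy sketch, not a full parser).
# Assumed grammar: E -> T + E | T ; T -> int, with '$' as the end marker.
# Hand-computed Follow sets for that grammar:
FOLLOW = {
    "E": {"$", ")"},
    "T": {"+", "$", ")"},
}

def should_reduce(lhs, lookahead):
    """Apply a reduction X -> b only if the lookahead is in Follow(X)."""
    return lookahead in FOLLOW[lhs]

# With 'T' on top of the stack and '+' as lookahead, T -> int may be
# reduced, but reducing E -> T + E on '+' is rejected:
print(should_reduce("T", "+"))  # True
print(should_reduce("E", "+"))  # False
```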
### LR(1) Parsing
LR(1) uses lookahead more precisely. It uses a more complex item like `X -> a.b,c`, which means:
1. `a` is already matched at the top of the stack
2. next we expect to see `b` followed by `c`
Also, `X -> a.b,{x1,x2,...,xn}` is shorthand for:
* forall i in `{x1,...,xn}`, `X -> a.b,i`
We extend the $\epsilon$-closure and `goto` operations.
LR(1) closure identification:
* start with `Closure(S) = S`
* foreach item `[X -> a.Bb,c]` in `S`
    * add `{B -> .y,First(bc)}`
* Initialize the state with `[S' -> .S,$]`
LR(1) `goto`:
Given an item in the state I: `[X -> a.Bb,c]`, `Goto/Shift(I, B) = Closure([X -> aB.b,c])`
The LR(1) parsing table is the same as LR(0) except for **reductions**.
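The closure and goto operations can be sketched in Python, using the `S -> (L) | id`, `L -> S | L,S` example grammar; the item encoding and the no-$\epsilon$-production assumption (which keeps `First` simple) are mine:

```python
# LR(1) items are (lhs, rhs, dot, lookahead). The grammar has no epsilon
# productions, so First of a sequence is determined by its first symbol.
GRAMMAR = {
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}
NONTERMS = set(GRAMMAR)

def first(seq, seen=frozenset()):
    """First set of a symbol sequence (no epsilon productions assumed)."""
    sym = seq[0]
    if sym not in NONTERMS:
        return {sym}
    if sym in seen:                      # guard against left recursion
        return set()
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first(rhs, seen | {sym})
    return out

def closure(items):
    """For each [X -> a.Bb, c], add [B -> .y, d] for every d in First(bc)."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (x, rhs, dot, la) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for d in first(rhs[dot + 1:] + (la,)):
                    for prod in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], prod, 0, d)
                        if item not in items:
                            items.add(item)
                            changed = True
    return items

def goto(items, sym):
    """Shift the dot over `sym`, then take the closure."""
    return closure({(x, rhs, dot + 1, la) for (x, rhs, dot, la) in items
                    if dot < len(rhs) and rhs[dot] == sym})

# Initial state from the dummy production S' -> .S, $
state0 = closure({("S'", ("S",), 0, "$")})
print(len(state0))  # 3
```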
### LALR(1) Parsing
LR(1) has too many states; LALR(1) addresses this.
LALR stands for **LookAhead LR**.
Construct the LR(1) DFA and merge any two LR(1) states whose items have the same production rules but different lookaheads. This reduces the number of parser-table entries, but is theoretically less powerful than LR(1).
LALR(1) generally has the same number of states as SLR(1), which is much less than LR(1).
But we will not dive into the details of LALR(1).
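Still, the core-merging idea itself is small enough to sketch (the item encoding is an assumption; a real generator would also merge the transition tables):

```python
# LALR(1) merge step: LR(1) states whose items share the same core
# (production + dot position) are unioned, combining their lookaheads.
def core(state):
    return frozenset((x, rhs, dot) for (x, rhs, dot, _la) in state)

def merge_states(states):
    """Group LR(1) states by core and union their item sets."""
    by_core = {}
    for st in states:
        by_core.setdefault(core(st), set()).update(st)
    return [frozenset(s) for s in by_core.values()]

# Two states that differ only in lookahead collapse into one LALR state:
s1 = frozenset({("X", ("a",), 1, "$")})
s2 = frozenset({("X", ("a",), 1, ")")})
print(len(merge_states([s1, s2])))  # 1
```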
### LL/LR Grammars
1. LL Parsing Tables
* Table[NT, T] = Production to apply
* Compute using First, Follow.
2. LR Parsing Tables
* Table[LR State, Term] = shift/reduce/error/accept
* Table[LR State, NT] = goto/err
* Computed using closure and goto operations on LR states
## Automatic Disambiguation
It is highly complex to write unambiguous grammars that encode precedence and associativity. Instead, we can keep an ambiguous grammar and avoid shift-reduce conflicts by defining precedence between the terminal on the stack and the terminal on the input.
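The stack-vs-input precedence comparison can be sketched as follows (the precedence levels and the `left_assoc` flag are assumptions for illustration):

```python
# Resolve a shift-reduce conflict by comparing the precedence of the
# operator on the stack with the operator in the input
# (higher level = binds tighter).
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def resolve(stack_op, input_op, left_assoc=True):
    """Return 'reduce' or 'shift' for a shift-reduce conflict."""
    if PREC[stack_op] > PREC[input_op]:
        return "reduce"                 # stack operator binds tighter
    if PREC[stack_op] < PREC[input_op]:
        return "shift"                  # input operator binds tighter
    return "reduce" if left_assoc else "shift"  # equal: use associativity

print(resolve("*", "+"))  # reduce: a * b + c parses as (a * b) + c
print(resolve("+", "*"))  # shift:  a + b * c parses as a + (b * c)
```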
## AST Data Structure
LL/LR parsing implicitly builds the AST.
* LL parsing: AST represented by the productions
* LR parsing: AST represented by the reductions
### AST Construction in LL
```cpp
// Recursive-descent (LL) construction: each parse function returns the
// sub-tree for its nonterminal; here S -> E S'.
expr parse_S() {
  switch (token) {
  case num:
  case '(':                        // First(E) = { num, '(' }
    expr child1 = parse_E();       // sub-tree for E
    expr child2 = parse_S_();      // sub-tree for S'
    return new S(child1, child2);
  default: ParseError();
  }
}
```
### AST Construction in LR
Construction mechanism:
* Store parts of the tree on the stack
* foreach nonterminal `X` on the stack, store the sub-tree for `X`
* After reduce operation for a production `X -> a`, create an AST node for `X`
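The mechanism above can be sketched by driving the reduce sequence by hand for `int + int` under the grammar `E -> T + E | T`, `T -> int` (no real parser table is consulted; this only shows the stack discipline):

```python
# LR-style AST construction: the stack holds sub-trees, and each reduce
# pops the RHS sub-trees and pushes one new node for the LHS.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
    def __repr__(self):
        kids = ", ".join(map(repr, self.children))
        return f"{self.label}({kids})" if self.children else self.label

stack = []
stack.append(Node("int"))                 # shift int
stack.append(Node("T", [stack.pop()]))    # reduce T -> int
stack.append(Node("+"))                   # shift +
stack.append(Node("int"))                 # shift int
stack.append(Node("T", [stack.pop()]))    # reduce T -> int
stack.append(Node("E", [stack.pop()]))    # reduce E -> T
# reduce E -> T + E: pop three sub-trees, push one E node
e, plus, t = stack.pop(), stack.pop(), stack.pop()
stack.append(Node("E", [t, plus, e]))
print(stack[0])  # E(T(int), +, E(T(int)))
```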


@@ -1,169 +1,70 @@
# Semantic Analysis
Even after passing the lexical and syntax analysis, there can still be errors in the correct usage of variables, objects, and functions.
**Semantic Analysis** ensures that the program satisfies a set of rules regarding the usage of programming constructs.
There are two main categories of semantic analysis:
* Scopes
* Types
## Scope
Lexical scope is a textual region in the program.
The scope of an identifier is the lexical scope its declaration refers to.
### Symbol Tables
Semantic checks refer to properties of identifiers in the program; they need an environment that stores identifier information: the **symbol table**.
In a symbol table, each entry contains the name of an identifier and additional info.
### Implementing Symbol Tables
Five operations:
* `Insert Scope`
* `Exit Scope`
* `Find Symbol(x)`
* `Add Symbol(x)`
* `Check Scope(x)`
We can build the symbol tables during parsing or after constructing the AST. The symbol tables should be generated before semantic analysis.
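The five symbol-table operations can be sketched as a stack of dictionaries (the method names and the stored info are assumptions; real tables also record types, offsets, etc.):

```python
# Scoped symbol table: a stack of dicts, one per open lexical scope.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                 # global scope

    def insert_scope(self):
        self.scopes.append({})             # enter a new lexical scope

    def exit_scope(self):
        self.scopes.pop()                  # leave the innermost scope

    def add_symbol(self, name, info=None):
        self.scopes[-1][name] = info       # declare in the current scope

    def find_symbol(self, name):
        # search innermost to outermost scope
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def check_scope(self, name):
        # is the name declared in the current (innermost) scope?
        return name in self.scopes[-1]

st = SymbolTable()
st.add_symbol("x", "int")
st.insert_scope()
st.add_symbol("x", "float")            # shadows the outer x
print(st.find_symbol("x"))             # float
st.exit_scope()
print(st.find_symbol("x"))             # int
```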
### Function Declaration and Usage Types
* Declare functions before usage (`cminus`)
* Can use functions before declaration (`py`)
* Separate body declaration (`C`)
### Scope Analysis
* Generate the symbol table and do scope analysis simultaneously
## Types
* Type Checking: a set of rules which ensures the type consistency of different constructs in the program.
* Type Inference: fill in missing type info.
### Type Checking
Semantic checking to enforce the type safety of the program.
Languages fall into three categories:
* Statically typed
* Dynamically typed
* Untyped
Static type checking does not require additional type-checking instructions at runtime; it guarantees that executions are safe at compile time. But modern languages require both static and dynamic type checking (unions, void pointers).
A **type** is a description of a set of values and a set of allowed operations on those values.
A **type expression** is a description of the possible types in the program.
A **type system** defines types for language constructs, like nodes in the AST.
Languages usually have basic types, aka primitive types; type expressions are built from these.
### Type Comparison Implementation
There are two options:
1. Implement a method `Equals(T1, T2)`.
   It must compare the type trees of T1 and T2. For OOP languages, sub-types are also needed.
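Option 1 can be sketched structurally (the tuple encoding of type trees is an assumption; subtyping for OOP languages is not handled here):

```python
# Structural type equality: compare the two type trees node by node.
# Type expressions are modeled as nested tuples, e.g.
# ("array", ("int",)) or ("func", ("int",), ("int",)).
def type_equals(t1, t2):
    """True iff the two type trees are structurally identical."""
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        return (len(t1) == len(t2)
                and all(type_equals(a, b) for a, b in zip(t1, t2)))
    return t1 == t2

print(type_equals(("array", ("int",)), ("array", ("int",))))   # True
print(type_equals(("array", ("int",)), ("array", ("float",)))) # False
```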