update notes in midterm2

2025-10-27 17:19:12 +09:00
parent 13b27cc21e
commit 2a7cb64862
2 changed files with 125 additions and 148 deletions


@@ -1,169 +1,70 @@
# Syntax Analysis 2
# Semantic Analysis
## Bottom-Up Parsing
Even after passing lexical and syntax analysis, errors can remain: incorrect usage of variables, objects, and functions.
**Semantic Analysis** ensures that the program satisfies a set of rules regarding the usage of programming constructs.
Bottom-up parsing is more efficient than top-down parsing.
It uses LR grammars: Left-to-right scanning with a Rightmost derivation (in reverse).
There are two main categories of semantic analysis:
* Scopes
* Types
It relies on **shift-reduce parsers**.
## Scope
Example:
A lexical scope is a textual region in the program.
The scope of an identifier is the lexical scope that its declaration refers to.
```c
E -> T | T + E
T -> int | int * T | (E)
```
### Symbol Tables
```c
// Reduction trace for "int * int + int": each line shows the
// sentential form and the production used to reach the next line.
int * int + int   | T -> int
int * T + int     | T -> int * T
T + int           | T -> int
T + T             | E -> T
T + E             | E -> T + E
E                 |
// Read bottom-to-top: the reversed reductions form a rightmost derivation.
```
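Reading the reductions from bottom to top gives the rightmost derivation explicitly:
```
E => T + E => T + T => T + int => int * T + int => int * int + int
```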
Semantic checks refer to properties of identifiers in the program; they need an environment that stores identifier information: the **symbol table**.
### Shift-Reduce Parsing
In a symbol table, each entry contains the name of an identifier plus additional information.
```c
E -> T | T + E
T -> int | int * T | (E)
```
### Implementing Symbol Tables
| Stack | Input | Applied Production |
| ----------- | ----------------- | --------------------- |
| | `int * int + int` | shift |
| `int` | `* int + int` | shift |
| `int *` | `int + int` | shift |
| `int * int` | `+ int` | reduce `T -> int` |
| `int * T` | `+ int` | reduce `T -> int * T` |
| `T` | `+ int` | shift |
| `T +` | `int` | shift |
| `T + int` | | reduce `T -> int` |
| `T + T` | | reduce `E -> T` |
| `T + E` | | reduce `E -> T + E` |
| `E` | | |
Five operations (a code sketch follows below):
* `Insert Scope`
* `Exit Scope`
* `Find Symbol(x)`
* `Add Symbol(x)`
* `Check Scope(x)`
We can build symbol tables during parsing or after constructing the AST; either way, they must be generated before semantic analysis.
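A minimal sketch of these five operations in C, using a stack of scopes (all names here are illustrative; production compilers typically use a hash table per scope):
```c
#include <stdlib.h>
#include <string.h>

// One entry: the identifier's name plus additional info (omitted here).
typedef struct Symbol {
    char          *name;
    struct Symbol *next;    // next symbol in the same scope
} Symbol;

// Scopes form a stack: the innermost scope is at the head.
typedef struct Scope {
    Symbol       *symbols;
    struct Scope *parent;
} Scope;

static Scope *current = NULL;

void enter_scope(void) {                  // Insert Scope
    Scope *s = malloc(sizeof *s);
    s->symbols = NULL;
    s->parent  = current;
    current    = s;
}

void exit_scope(void) {                   // Exit Scope
    current = current->parent;            // (entries leaked for brevity)
}

void add_symbol(const char *x) {          // Add Symbol(x)
    Symbol *sym = malloc(sizeof *sym);
    sym->name = strdup(x);                // strdup: POSIX / C23
    sym->next = current->symbols;
    current->symbols = sym;
}

int check_scope(const char *x) {          // Check Scope(x): innermost scope only
    for (Symbol *s = current->symbols; s; s = s->next)
        if (strcmp(s->name, x) == 0) return 1;
    return 0;
}

Symbol *find_symbol(const char *x) {      // Find Symbol(x): search inner to outer
    for (Scope *sc = current; sc; sc = sc->parent)
        for (Symbol *s = sc->symbols; s; s = s->next)
            if (strcmp(s->name, x) == 0) return s;
    return NULL;
}
```
`Check Scope(x)` is what catches duplicate declarations in the same scope, while `Find Symbol(x)` resolves uses against all enclosing scopes.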
**Action Selection Problem**
Which action should we take at each step?
When do we shift, when do we reduce, and which production do we apply?
### Function Declaration and Usage Types
#### LR-Style Grammars
* Functions must be declared before usage (`cminus`)
* Functions can be used before declaration (`py`)
* Declaration is separate from the body definition (`C`)
* LR(k): Left-to-right scanning, Rightmost derivation, with k symbols of lookahead
### Scope Analysis
* LR(0) Grammar
* Generate the symbol table and perform scope analysis simultaneously
LR(0) denotes grammars whose actions can be determined without any lookahead: there are no reduce-reduce or shift-reduce conflicts when using **only the symbols on the stack**.
## Types
We can represent shift-reduce parsing using an **NFA**, whose states are productions with a separator '`.`' in the RHS (items).
For example, the production `T -> (E)` has four states: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, `T -> (E).`.
Symbols before the `.` are already on the stack; the symbol right after the `.` is the one we expect next.
We also add a dummy production `S' -> S$` to provide start and end states.
* Type checking: a set of rules that ensures the type consistency of different constructs in the program.
* Type inference: filling in missing type information.
There are two types of transitions between the states:
- shift transitions
- $\epsilon$ transitions
### Type Checking
Example:
Semantic checking to enforce the type safety of the program.
```
S -> (L) | id
L -> S | L,S
```
There are three kinds of typing disciplines:
* Statically typed (e.g., C, Java)
* Dynamically typed (e.g., Python, Scheme)
* Untyped (e.g., assembly)
```
S' -> S$
S -> (L) | id
L -> S | L,S
```
Static type checking does not require additional type-checking instructions at runtime: it guarantees at compile time that executions are type-safe. But modern languages require both static and dynamic type checking (e.g., unions, void pointers).
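A minimal C sketch of why such constructs need runtime help: a `void` pointer cast compiles even though nothing verifies the pointee's actual type, so any safety check must happen at run time:
```c
#include <stdio.h>

int main(void) {
    int n = 3;
    // char *s = n;          // rejected statically: incompatible types

    void *p = &n;            // erases the static type of &n
    double *d = (double *)p; // compiles, but the pointee is not a double;
    (void)d;                 // only a runtime check could catch this misuse
    printf("%d\n", n);
    return 0;
}
```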
The grammar above can be represented as an NFA:
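A sketch of some of its states and transitions (the full automaton has one state per item):
```
S' -> .S$    --S-->  S' -> S.$      shift transition on S
S' -> .S$    --ε-->  S -> .(L)      ε: about to expand an S
S' -> .S$    --ε-->  S -> .id
S -> .(L)    --(-->  S -> (.L)      shift transition on '('
S -> (.L)    --ε-->  L -> .S        ε: about to expand an L
S -> (.L)    --ε-->  L -> .L,S
```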
A **type** is a description of a set of values and a set of allowed operations on those values.
A **type expression** is a description of the possible types in the program.
A **type system** defines types for language constructs, such as nodes in the AST.
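For example, a typical typing rule for addition, in standard inference-rule notation (illustrative, not taken from the slides):

$$\frac{\vdash e_1 : \text{int} \qquad \vdash e_2 : \text{int}}{\vdash e_1 + e_2 : \text{int}}$$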
```python {cmd matplotlib hide}
import pymupdf
from PIL import Image
import matplotlib.pyplot as plt  # needed for the plt calls below

# Render slide 23 (index 22) of the lecture PDF as an inline image.
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```
Languages usually have basic types, aka primitive types; type expressions are built from these basic types.
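For example, type expressions built from basic types might look like (illustrative notation):
```
int                       // basic type
array(2, array(3, int))   // int[2][3]
int × int -> int          // function from two ints to an int
```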
* SLR(1) Parsing
* LR(1) Grammar
```python {cmd matplotlib hide}
import pymupdf
from PIL import Image
import matplotlib.pyplot as plt  # needed for the plt calls below

# Render slide 48 (index 47) of the lecture PDF as an inline image.
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```
**LR(1) Parsing Table**
The construction is the same as for the LR(0) parsing table, except for reductions: a reduce action is added only under the item's lookahead symbols.
* LALR(1) Grammar
LALR(1) generally has the same number of states as SLR(1), which is far fewer than LR(1).
For the Pascal language, SLR requires several hundred states, while LR(1) requires several thousand.
#### Ambiguous Grammar
Ambiguity mainly comes from:
* Precedence
  * Productions at higher levels of the grammar carry operators with lower priority (and vice versa).
  * We can insert non-terminals to enforce precedence.
* Associativity
  * We should decide where to place the recursion depending on the associativity (left recursion for left-associative operators, right recursion for right-associative ones).
Another classic case is the dangling-`else` ambiguity in `if-then-else`.
**Automatic Disambiguation**
We can declare operator precedence and associativity so that ambiguous grammars can be used without shift-reduce conflicts.
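For instance, Bison-style precedence/associativity declarations (a sketch; the lecture may use a different tool):
```
%left '+' '-'      /* lower precedence, left-associative      */
%left '*' '/'      /* higher precedence (declared later wins) */
%right '^'         /* right-associative                       */
```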
## AST
### AST Construction LL
```c
// LL-style AST construction (lecture-style pseudocode): each parse
// function returns the subtree for the non-terminal it parses.
expr parse_S() {
    switch (token) {
        case num:           // both num and '(' can start an E
        case '(': {
            expr child1 = parse_E();    // S -> E S'
            expr child2 = parse_Sp();
            return new S(child1, child2);
        }
        default:
            parseError();
    }
}
```
### AST Construction LR
In LR parsing the AST is built bottom-up: each reduce action pops the subtrees of the RHS symbols and pushes a new node for the LHS.
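A minimal sketch of one reduce action building an AST node (the `Node` type and explicit value stack are illustrative assumptions, not the lecture's code):
```c
#include <stdlib.h>

// Illustrative AST node and value stack; parser generators emit this for you.
typedef struct Node { int tag; struct Node *l, *r; } Node;

static Node *nodes[256];   // value stack, parallel to the state stack
static int   top = 0;

// Applied when the parser reduces by E -> T + E.
static void reduce_E_plus(void) {
    Node *rhs = nodes[--top];          // subtree for E
    Node *lhs = nodes[--top];          // subtree for T ('+' carries no node)
    Node *n   = malloc(sizeof *n);
    n->tag = '+'; n->l = lhs; n->r = rhs;
    nodes[top++] = n;                  // push the new E subtree
}
```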
### Type Comparison Implementation
There are two options:
1. Implement a method `Equals(T1, T2)`.
   It must compare the type trees of `T1` and `T2`; OOP languages also need to handle subtypes.
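A minimal sketch of option 1, structural comparison of type trees (the kind names are illustrative assumptions):
```c
typedef enum { T_INT, T_ARRAY, T_FUNC } Kind;

typedef struct Type {
    Kind kind;
    struct Type *t1, *t2;   // children: ARRAY uses t1; FUNC maps t1 -> t2
} Type;

// Equals(T1, T2): recursively compare the two type trees.
// (OOP languages additionally need a subtype check, not shown.)
int equals(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    switch (a->kind) {
        case T_INT:   return 1;
        case T_ARRAY: return equals(a->t1, b->t1);
        case T_FUNC:  return equals(a->t1, b->t1) && equals(a->t2, b->t2);
    }
    return 0;
}
```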