update notes in midterm2
82
notes/3.md
@@ -210,7 +210,7 @@ As there can be conflicts: For a given state(stack + input) there can be multipl
|
|||||||
|
|
||||||
### LR Grammars
|
### LR Grammars
|
||||||
|
|
||||||
* LR(k): left-toright scanning, right most derivation and $k$ symbol lookahead
|
* LR(k): left-to-right scanning, right most derivation and $k$ symbol lookahead
|
||||||
* LR(0) Grammar
|
* LR(0) Grammar
|
||||||
LR(0) indicates grammars that can determine actions without any lookahead.
|
LR(0) indicates grammars that can determine actions without any lookahead.
|
||||||
There are **no reduce-reduce and shift-reduce conflicts**, because the action must be determined by the stack alone.
|
There are **no reduce-reduce and shift-reduce conflicts**, because the action must be determined by the stack alone.
|
||||||
@@ -243,10 +243,86 @@ And table consists of four different actions:
|
|||||||
* `reduce x -> a`: pop a from the stack and push <`x`, `goto[curr_state, x]`>
|
* `reduce x -> a`: pop a from the stack and push <`x`, `goto[curr_state, x]`>
|
||||||
* accept(`S' -> S$.`) / Error
|
* accept(`S' -> S$.`) / Error
|
||||||
|
|
||||||
DFA states are converted to index of each rows.
|
Also, DFA states are converted to the row indices of the table.
|
||||||
|
|
||||||
|
But there is a limitation: when there are multiple options for filling a parsing table entry, the conflict must be resolved with **lookahead**.
|
||||||
|
|
||||||
### SLR(1) Parsing
|
### SLR(1) Parsing
|
||||||
|
|
||||||
A simple extension of LR(0).
|
A simple extension of LR(0).
|
||||||
|
|
||||||
For each reduction `X -> b`, look at the next symbol `c` and then apply reduction only if `c` is in `Follow(X)`.
|
For each reduction `X -> b`, look at the next symbol `c` (the lookahead) and apply the reduction **only if `c` is in `Follow(X)`**.
|
||||||
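A minimal sketch of the SLR(1) rule above, assuming the grammar `E -> T | T + E`, `T -> int | int * T | (E)` with hand-computed Follow sets. The function name `slr_action` and its parameters are hypothetical and only serve the illustration.

```python
# Follow sets computed by hand for the assumed grammar
#   E -> T | T + E
#   T -> int | int * T | (E)
FOLLOW = {
    "E": {"$", ")"},
    "T": {"+", "$", ")"},
}

def slr_action(shift_terminals, reduce_lhs, lookahead):
    """Choose an action for a state with shift items and one complete item.

    shift_terminals: terminals that appear right after the dot in the state
    reduce_lhs: left-hand side X of the complete item `X -> b.` (or None)
    """
    can_shift = lookahead in shift_terminals
    can_reduce = reduce_lhs is not None and lookahead in FOLLOW[reduce_lhs]
    if can_shift and can_reduce:
        return "conflict"  # the grammar is not SLR(1) in this state
    if can_reduce:
        return "reduce"
    if can_shift:
        return "shift"
    return "error"

# The state holding `E -> T.` and `E -> T.+E` conflicts in LR(0),
# but Follow(E) does not contain `+`, so SLR(1) can decide.
print(slr_action({"+"}, "E", "+"))
print(slr_action({"+"}, "E", "$"))
```

With these Follow sets the state shifts on `+` and reduces on `$` or `)`, so the LR(0) shift-reduce conflict disappears.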
|
|
||||||
|
### LR(1) Parsing
|
||||||
|
|
||||||
|
LR(1) uses lookahead more precisely. It uses a richer item of the form `X -> a.b,c`, which means:
|
||||||
|
1. `a` is already matched at top of the stack
|
||||||
|
2. next expect to see `b` followed by `c`
|
||||||
|
Also `X -> a.b,{x1,x2,...,xn}` indicates:
|
||||||
|
* forall i in `{x1,...,xn}`, `X -> a.b,i`
|
||||||
|
|
||||||
|
We extend the $\epsilon$-closure and `goto` operation.
|
||||||
|
|
||||||
|
LR(1) closure identification:
|
||||||
|
* start with `Closure(S) = S`
|
||||||
|
* foreach item: `[X -> a.Bb,c]` in `S`
|
||||||
|
* add `{B -> .y,First(bc)}`
|
||||||
|
* Initialize the state with `[S' -> .S,$]`
|
||||||
|
|
||||||
|
LR(1) `goto`:
|
||||||
|
Given every item of the form `[X -> a.Bb,c]` in the state `I`, `Goto/Shift(I, B)` is the closure of the moved items: `Closure([X -> aB.b,c])`
|
||||||
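The closure and `goto` rules above can be sketched as follows. This is an illustration under stated assumptions: the grammar (`S' -> S`, `S -> C C`, `C -> c C | d`) is not from the notes, it has no $\epsilon$-productions, and its First sets are computed by hand. Items are represented as tuples `(lhs, rhs, dot, lookahead)`, which is a choice made only for this sketch.

```python
# Assumed example grammar (no epsilon productions).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
# First sets of the nonterminals, computed by hand.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}

def first_of(symbols, lookahead):
    """First(bc) for a symbol string b followed by lookahead c."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return FIRST.get(head, {head})  # a terminal is its own First set

def closure(items):
    """For each [X -> a.Bb,c], add [B -> .y,t] for every t in First(bc)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            b = rhs[dot]
            for look in first_of(rhs[dot + 1:], la):
                for prod in GRAMMAR[b]:
                    item = (b, prod, 0, look)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` in every matching item, then close."""
    moved = {
        (lhs, rhs, dot + 1, la)
        for (lhs, rhs, dot, la) in state
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)

start = closure({("S'", ("S",), 0, "$")})
after_c = goto(start, "C")
print(len(start), len(after_c))
```

The initial state contains `[S' -> .S,$]`, `[S -> .CC,$]`, and the two `C` productions under both lookaheads `c` and `d`. After `goto` on `C`, the `C` productions are added again, this time with lookahead `$`.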
|
|
||||||
|
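The LR(1) parsing table is the same as the LR(0) table except for **reductions**: a reduction by `X -> a` is entered only under the lookahead `c` of a complete item `[X -> a.,c]`.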
|
||||||
|
|
||||||
|
### LALR(1) Parsing
|
||||||
|
|
||||||
|
LR(1) has too many states; LALR(1) parsing addresses this.
|
||||||
|
|
||||||
|
LALR(1) stands for **LookAhead LR**.
|
||||||
|
Construct the LR(1) DFA, then merge any two LR(1) states whose items have the same production rules and dot positions but different lookaheads. This reduces the number of parser table entries, but LALR(1) is theoretically less powerful than LR(1).
|
||||||
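The merging step can be sketched roughly as below, assuming LR(1) items are tuples `(lhs, rhs, dot, lookahead)`. The function name `merge_lalr` and the three sample states are hypothetical, and the sketch ignores the remapping of transitions between merged states.

```python
from collections import defaultdict

def merge_lalr(states):
    """Merge LR(1) states that agree once the lookaheads are dropped."""
    by_core = defaultdict(set)
    for state in states:
        # The core of a state is its set of items without lookaheads.
        core = frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in state)
        by_core[core] |= set(state)
    return [frozenset(items) for items in by_core.values()]

# Three assumed LR(1) states; the first two share the core `C -> d.`
states = [
    frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")}),
    frozenset({("C", ("d",), 1, "$")}),
    frozenset({("S", ("C", "C"), 2, "$")}),
]
merged = merge_lalr(states)
print(len(merged))
```

The merged state keeps the union of the lookaheads, which is where LALR(1) can introduce reduce-reduce conflicts that the original LR(1) automaton did not have.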
|
|
||||||
|
LALR(1) generally has the same number of states as SLR(1), which is far fewer than LR(1).
|
||||||
|
But we will not dive into the details of LALR(1).
|
||||||
|
|
||||||
|
### LL/LR Grammars
|
||||||
|
|
||||||
|
1. LL Parsing Tables
|
||||||
|
* Table[NT, T] = Production to apply
|
||||||
|
* Compute using First, Follow.
|
||||||
|
2. LR Parsing Tables
|
||||||
|
* Table[LR State, Term] = shift/reduce/error/accept
|
||||||
|
* Table[LR State, NT] = goto/err
|
||||||
|
* Computing using closure and goto operations on LR states
|
||||||
|
|
||||||
|
## Automatic Disambiguation
|
||||||
|
|
||||||
|
Writing an unambiguous grammar by hand is complex, because precedence and associativity must be encoded in the productions. Instead, we can keep the ambiguous grammar and declare precedence: each shift-reduce conflict is resolved by comparing the precedence of the terminal on the stack with that of the terminal on the input.
|
||||||
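A minimal sketch of this resolution rule, assuming two left-associative operators `+` and `*` where `*` binds tighter. The table names and the function `resolve` are hypothetical. The tie-breaking rule for equal precedence (reduce for left-associative operators) follows the convention used by yacc-style generators.

```python
# Assumed precedence levels: a higher number binds tighter.
PRECEDENCE = {"+": 1, "*": 2}
LEFT_ASSOC = {"+", "*"}

def resolve(stack_op, input_op):
    """Resolve a shift-reduce conflict between two operator terminals."""
    if PRECEDENCE[stack_op] > PRECEDENCE[input_op]:
        return "reduce"  # finish the tighter-binding operator first
    if PRECEDENCE[stack_op] < PRECEDENCE[input_op]:
        return "shift"   # the incoming operator binds tighter
    # Equal precedence: associativity decides.
    return "reduce" if stack_op in LEFT_ASSOC else "shift"

print(resolve("*", "+"))  # stack: E * E, input: +
print(resolve("+", "*"))  # stack: E + E, input: *
print(resolve("+", "+"))  # stack: E + E, input: +
```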
|
|
||||||
|
|
||||||
|
## AST Data Structure
|
||||||
|
|
||||||
|
LL/LR parsing implicitly builds the AST.
|
||||||
|
* LL parsing: AST represented by the productions
|
||||||
|
* LR parsing: AST represented by the reduction
|
||||||
|
|
||||||
|
### AST Construction in LL
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
expr parse_S() {
|
||||||
|
switch (token) {
|
||||||
|
case num:
|
||||||
|
case '(':
|
||||||
|
expr child1 = parse_E();
|
||||||
|
expr child2 = parse_S_();
|
||||||
|
return new S(child1, child2);
|
||||||
|
default: ParseError();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
### AST Construction in LR
|
||||||
|
|
||||||
|
Construction mechanism:
|
||||||
|
* Store parts of the tree on the stack
|
||||||
|
* foreach nonterminal `X` on the stack, store the sub-tree for `X`
|
||||||
|
* After reduce operation for a production `X -> a`, create an AST node for `X`
|
||||||
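The mechanism above can be sketched as follows, assuming the grammar `E -> T | T + E`, `T -> int` and a hand-written sequence of shift and reduce actions for the input `int + int`. The `Node` class and the helper names are hypothetical; a real parser would drive these calls from its parsing table.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def shift(stack, token):
    """Push a leaf for the shifted terminal."""
    stack.append(Node(token))

def reduce(stack, lhs, rhs_len):
    """Pop the sub-trees of the right-hand side and push a node for lhs."""
    split = len(stack) - rhs_len
    children = stack[split:]
    del stack[split:]
    stack.append(Node(lhs, children))

stack = []
shift(stack, "int")
reduce(stack, "T", 1)   # T -> int
shift(stack, "+")
shift(stack, "int")
reduce(stack, "T", 1)   # T -> int
reduce(stack, "E", 1)   # E -> T
reduce(stack, "E", 3)   # E -> T + E
root = stack[0]
print(root.label, [child.label for child in root.children])
```

Each stack slot carries the sub-tree for its symbol, so after the final reduction the single remaining entry is the root of the tree.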
|
|
||||||
|
|||||||
191
notes/4.md
@@ -1,169 +1,70 @@
|
|||||||
# Syntax Anlysis 2
|
# Semantic Analysis
|
||||||
|
|
||||||
## Bottom-Up Parsing
|
Even after a program passes lexical and syntax analysis, it can still contain errors in the usage of variables, objects, and functions.
|
||||||
|
**Semantic Analysis** ensures that the program satisfies a set of rules regarding the usage of programming constructs.
|
||||||
|
|
||||||
Bottom-up Parsing is more efficient than Top-down parsing.
|
There are two main categories of semantic analysis:
|
||||||
it uses LR grammars: Left-recursive and right-most derivation.
|
* Scopes
|
||||||
|
* Types
|
||||||
|
|
||||||
Rely on **Shift-reduce Parsers**.
|
## Scope
|
||||||
|
|
||||||
example:
|
A lexical scope is a textual region of the program.
|
||||||
|
The scope of an identifier is the lexical scope to which its declaration applies.
|
||||||
|
|
||||||
```c
|
### Symbol Tables
|
||||||
E -> T | T + E
|
|
||||||
T -> int | int * T | (E)
|
|
||||||
```
|
|
||||||
|
|
||||||
```c
|
Semantic checks refer to properties of identifiers in the program; they need an environment to store identifier info: the **symbol table**.
|
||||||
// reduce
|
|
||||||
int * int + int | T -> int
|
|
||||||
int * T + int | T -> int * T
|
|
||||||
T + int | T -> int
|
|
||||||
T + T | E -> T
|
|
||||||
T + E | E -> T + E
|
|
||||||
E |
|
|
||||||
// view as reversed, it seems right-most derivation
|
|
||||||
```
|
|
||||||
|
|
||||||
### Shift-Reudce Parsing
|
In a symbol table, each entry contains the name of an identifier and additional info.
|
||||||
|
|
||||||
```c
|
### Implementing Symbol Tables
|
||||||
E -> T | T + E
|
|
||||||
T -> int | int * T | (E)
|
|
||||||
```
|
|
||||||
|
|
||||||
| Stack | Input | Applied Production |
|
Five operations:
|
||||||
| ----------- | ----------------- | --------------------- |
|
* `Insert Scope`
|
||||||
| | `int * int + int` | shift |
|
* `Exit Scope`
|
||||||
| `int` | `* int + int` | shift |
|
* `Find Symbol(x)`
|
||||||
| `int *` | `int + int` | shift |
|
* `Add Symbol(x)`
|
||||||
| `int * int` | `+ int` | reduce `T -> int` |
|
* `Check Scope(x)`
|
||||||
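The five operations above can be sketched as a stack of dictionaries, one per open scope. The notes do not define the exact semantics of each operation, so this sketch assumes that `find_symbol` searches from the innermost scope outward and that `check_scope` only tests the current scope. The class and method names are hypothetical.

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def insert_scope(self):
        """Enter a new nested scope."""
        self.scopes.append({})

    def exit_scope(self):
        """Leave the current scope and drop its symbols."""
        self.scopes.pop()

    def add_symbol(self, name, info):
        """Declare `name` in the current scope."""
        self.scopes[-1][name] = info

    def find_symbol(self, name):
        """Look `name` up from the innermost scope outward."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def check_scope(self, name):
        """Return True if `name` is declared in the current scope."""
        return name in self.scopes[-1]

table = SymbolTable()
table.add_symbol("x", "int")
table.insert_scope()
outer_x = table.find_symbol("x")      # found in the enclosing scope
declared_here = table.check_scope("x")
table.add_symbol("x", "float")        # shadows the outer declaration
inner_x = table.find_symbol("x")
table.exit_scope()
print(outer_x, declared_here, inner_x, table.find_symbol("x"))
```

Under these assumptions, `check_scope` is what a redeclaration check would use, while `find_symbol` is what a use-before-declaration check would use.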
| `int * T` | `+ int` | reduce `T -> int * T` |
|
|
||||||
| `T` | `+ int` | shift |
|
We can build the symbol tables during parsing or after constructing the AST. The symbol tables should be generated before semantic analysis.
|
||||||
| `T +` | `int` | shift |
|
|
||||||
| `T + int` | | reduce `T -> int` |
|
|
||||||
| `T + T` | | reduce `E -> T` |
|
|
||||||
| `T + E` | | reduce `E -> T + E` |
|
|
||||||
| `E` | | |
|
|
||||||
|
|
||||||
|
|
||||||
**Action Selection Problem**
|
### Function Declaration and Usage Types
|
||||||
Which action should we take?
|
|
||||||
when shift, when reduce, which production applied?
|
|
||||||
|
|
||||||
#### LR-Style Grammars
|
* Declare the functions before usage(`cminus`)
|
||||||
|
* Can use functions before declaration(`py`)
|
||||||
|
* Separate body declaration(`C`)
|
||||||
|
|
||||||
* LR(k): left-toright scanning, right most derivation and k symbol lookahead
|
### Scope Analysis
|
||||||
|
|
||||||
* LR(0) Grammar
|
* Generate Symbol Table and do Scope Analysis
|
||||||
|
* Simultaneously
|
||||||
|
|
||||||
LR(0) indicates grammars that can determine actions without any lookahead: there are no reduce-reduce and shift-reduce conflicts when using **only the symbols in the stack**.
|
## Types
|
||||||
|
|
||||||
represent shift-reduce parsing using an **NFA**, whose states are production with separator '`.`' on RHS.
|
* Type checking: a set of rules that ensures the type consistency of the different constructs in the program.
|
||||||
for eample, a production `T -> (E)` has four states: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, `T -> (E.)`.
|
* Type inference: fill in missing type information.
|
||||||
before `.` means already in stack, next item means expecting item.
|
|
||||||
plus an additional dummy production `S' -> S$` for a start and end state.
|
|
||||||
|
|
||||||
there are two types of transitions between the stats
|
### Type Checking
|
||||||
- shift transition
|
|
||||||
- $\epsilon$ transition:
|
|
||||||
|
|
||||||
example:
|
Semantic checking to enforce the type safety of the program.
|
||||||
|
|
||||||
```
|
Languages fall into three categories with respect to typing:
|
||||||
S -> (L) | id
|
* Statically typed
|
||||||
L -> LS | L,S
|
* Dynamically typed
|
||||||
```
|
* Untyped
|
||||||
|
|
||||||
```
|
Static type checking requires no additional type-checking instructions at runtime: it guarantees at compile time that executions are type-safe. In practice, modern languages need both static and dynamic type checking (e.g. for unions and void pointers).
|
||||||
S' -> S$
|
|
||||||
S -> (L) | id
|
|
||||||
L -> S | L,S
|
|
||||||
```
|
|
||||||
|
|
||||||
It can be represented as a NFA:
|
A **type** is a description of a set of values and a set of allowed operations on those values.
|
||||||
|
A **type expression** is a description of the possible types in the program.
|
||||||
|
A **type system** defines types for language constructs, such as the nodes of the AST.
|
||||||
|
|
||||||
```python {cmd matplotlib hide}
|
Languages usually have basic types, also known as primitive types, which are used to build type expressions.
|
||||||
import sys
|
|
||||||
import pymupdf
|
|
||||||
from PIL import Image
|
|
||||||
doc = pymupdf.open("../pdf/L4.pdf")
|
|
||||||
pix = doc[22].get_pixmap(dpi=360)
|
|
||||||
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
|
|
||||||
|
|
||||||
plt.imshow(img)
|
### Type Comparison Implementation
|
||||||
plt.axis('off')
|
|
||||||
plt.tight_layout()
|
|
||||||
plt.show()
|
|
||||||
```
|
|
||||||
|
|
||||||
* SLR(1) Parsing
|
|
||||||
|
|
||||||
* LR(1) Grammar
|
|
||||||
|
|
||||||
|
|
||||||
```python {cmd matplotlib hide}
|
|
||||||
import sys
|
|
||||||
import pymupdf
|
|
||||||
from PIL import Image
|
|
||||||
doc = pymupdf.open("../pdf/L4.pdf")
|
|
||||||
pix = doc[47].get_pixmap(dpi=360)
|
|
||||||
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
|
|
||||||
|
|
||||||
plt.imshow(img)
|
|
||||||
plt.axis('off')
|
|
||||||
plt.tight_layout()
|
|
||||||
plt.show()
|
|
||||||
```
|
|
||||||
|
|
||||||
LR(1) Parsing Table
|
|
||||||
|
|
||||||
is same as LR(0) parsing table construction except for reductions:
|
|
||||||
|
|
||||||
|
|
||||||
* LALR(1) Grammar
|
|
||||||
|
|
||||||
LALR(1) generally has the same number of states as SLR (much less than LR(1))
|
|
||||||
for Pascal language, SLR requires several hundred states, LR(1) requires several thousand states.
|
|
||||||
|
|
||||||
#### Ambiguous Grammar
|
|
||||||
|
|
||||||
Ambiguity is mainly from
|
|
||||||
|
|
||||||
* Precedence
|
|
||||||
* The production at higher levels will have operators with lower priorities (and vice versa).
|
|
||||||
* we can insert non-terminals to enforce precendence.
|
|
||||||
* Associativity
|
|
||||||
* we should determine where to place recursion depending on the associativity
|
|
||||||
|
|
||||||
|
|
||||||
for example: `if-then-else`
|
|
||||||
|
|
||||||
|
|
||||||
**Automatic Disambiguation**
|
|
||||||
|
|
||||||
We can define precedence to use ambiguous grammars w/o shift-reduce conflicts.
|
|
||||||
|
|
||||||
|
|
||||||
## AST
|
|
||||||
|
|
||||||
### AST Construction LL
|
|
||||||
|
|
||||||
```c
|
|
||||||
expr parse_S() {
|
|
||||||
switch(token) {
|
|
||||||
case num:
|
|
||||||
case '(':
|
|
||||||
expr child1 = parse_E();
|
|
||||||
expr child2 = parse_Sp();
|
|
||||||
return new S(child1, child2);
|
|
||||||
default:
|
|
||||||
parseError();
|
|
||||||
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
### AST Construction LR
|
|
||||||
|
|
||||||
|
There are two options:
|
||||||
|
1. Implement a method `Equals(T1, T2)`.
|
||||||
|
It must compare the type trees of T1 and T2. For OOP languages, it also needs to handle sub-types.
|
||||||
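A minimal sketch of the `Equals(T1, T2)` option, assuming a type tree whose nodes carry a constructor name and child types. The `Type` class is hypothetical, and the sketch covers structural equality only; sub-typing is not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Type:
    name: str              # e.g. "int", "array", "function"
    children: tuple = ()   # component types, if any

def equals(t1, t2):
    """Compare two type trees node by node."""
    if t1.name != t2.name or len(t1.children) != len(t2.children):
        return False
    return all(equals(a, b) for a, b in zip(t1.children, t2.children))

int_t = Type("int")
float_t = Type("float")
array_of_int = Type("array", (int_t,))
print(equals(array_of_int, Type("array", (Type("int"),))))
print(equals(array_of_int, Type("array", (float_t,))))
```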
|
|||||||