# Study Guide: Bayesian Networks & Probabilistic Inference

**Date:** 2025.12.01 (Final Lecture)
**Topic:** Bayesian Networks, Probabilistic Inference Examples, Marginalization.

---

### **1. Recap: Directed vs. Undirected Models**

The lecture begins by briefly contrasting the two types of graphical models discussed:

* **Undirected Graphs (MRF):** Use potential functions ($\psi$) defined on maximal cliques, and require a normalization constant (the partition function $Z$) to form a valid probability distribution.
* **Directed Graphs (Bayesian Networks):** Use conditional probability distributions (CPDs). The joint distribution is the product of local conditional probabilities:

$$P(X) = \prod_{i} P(x_i | \text{parents}(x_i))$$

---

### **2. Example 1: The "Alarm" Network (Burglary/Earthquake)**

This is a classic example used to demonstrate inference in Bayesian Networks.

#### **Scenario & Structure**

* **Nodes:**
    * **B:** Burglary (root node, no parents).
    * **E:** Earthquake (root node, no parents).
    * **A:** Alarm (triggered by Burglary or Earthquake).
    * **J:** JohnCalls (triggered by Alarm).
    * **M:** MaryCalls (triggered by Alarm).
* **Dependencies:** $B \rightarrow A \leftarrow E$, $A \rightarrow J$, $A \rightarrow M$.
* **Probabilities (Given):**
    * $P(B) = 0.05$, $P(E) = 0.1$.
    * $P(A|B, E)$: given as a table (e.g., $P(A|B, \neg E) = 0.85$, $P(A|\neg B, \neg E) = 0.05$, etc.).
    * $P(J|A) = 0.7$, $P(M|A) = 0.8$.

#### **Task 1: Calculate a Specific Joint Probability**

Calculate the probability of the event: **Burglary, No Earthquake, Alarm rings, John calls, Mary does not call**.

$$P(B, \neg E, A, J, \neg M)$$

* **Decomposition:** Apply the chain rule according to the graph structure.

$$= P(B) \cdot P(\neg E) \cdot P(A | B, \neg E) \cdot P(J | A) \cdot P(\neg M | A)$$

* **Calculation:**

$$= 0.05 \times 0.9 \times 0.85 \times 0.7 \times 0.2 = 0.005355$$

#### **Task 2: Inference (Conditional Probability)**

Calculate the probability that a **Burglary occurred**, given that **John called** and **Mary did not call**:

$$P(B | J, \neg M)$$

* **Formula (Bayes' Rule):**

$$P(B | J, \neg M) = \frac{P(B, J, \neg M)}{P(J, \neg M)}$$

* **Numerator Calculation ($P(B, J, \neg M)$):** We must **marginalize out** the unobserved variables ($A$ and $E$) from the joint distribution:

$$P(B, J, \neg M) = \sum_{A \in \{T,F\}} \sum_{E \in \{T,F\}} P(B, E, A, J, \neg M)$$

This involves summing 4 terms (all combinations of $A$ and $E$).

* **Denominator Calculation ($P(J, \neg M)$):** We further marginalize out $B$ from the numerator result:

$$P(J, \neg M) = P(B, J, \neg M) + P(\neg B, J, \neg M)$$
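For self-study, Tasks 1 and 2 can be checked numerically by brute-force enumeration. The Python sketch below uses the probabilities quoted above; the CPT entries not quoted in the notes ($P(A|B,E)$, $P(A|\neg B,E)$, $P(J|\neg A)$, and $P(M|\neg A)$) are filled in with illustrative placeholder values, so only the Task 1 result is guaranteed to match the lecture's numbers.

```python
# Sketch: enumeration inference in the alarm network. Values marked "assumed"
# are placeholders for CPT entries not quoted in the lecture notes.

P_B = 0.05
P_E = 0.1

# P(A=1 | b, e); the (1,1) and (0,1) entries are assumed, not from the lecture.
P_A_given = {(1, 1): 0.95, (1, 0): 0.85, (0, 1): 0.30, (0, 0): 0.05}

P_J_given_A = {1: 0.7, 0: 0.05}   # P(J=1 | a); the a=0 entry is assumed
P_M_given_A = {1: 0.8, 0: 0.01}   # P(M=1 | a); the a=0 entry is assumed


def bern(p, x):
    """Return P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain-rule factorization."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A_given[(b, e)], a)
            * bern(P_J_given_A[a], j) * bern(P_M_given_A[a], m))


# Task 1: P(B, not E, A, J, not M) = 0.05 * 0.9 * 0.85 * 0.7 * 0.2 = 0.005355
print(joint(1, 0, 1, 1, 0))

# Task 2: P(B | J, not M) by marginalizing out A and E, then normalizing over B
numerator = sum(joint(1, e, a, 1, 0) for a in (0, 1) for e in (0, 1))
denominator = numerator + sum(joint(0, e, a, 1, 0) for a in (0, 1) for e in (0, 1))
print(numerator / denominator)
```

The nested sums mirror the marginalization formulas above: four terms for the numerator (over $A$ and $E$), and the denominator adds the corresponding four terms with $\neg B$.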
---

### **3. Example 2: 4-Node Tree Structure**

A simpler example to demonstrate how sums simplify during marginalization.

#### **Scenario & Structure**

* **Nodes:** $X_1, X_2, X_3, X_4 \in \{0, 1\}$ (binary).
* **Dependencies:**
    * $X_1 \rightarrow X_2$
    * $X_2 \rightarrow X_3$
    * $X_2 \rightarrow X_4$
* **Decomposition:** $P(X) = P(X_1)P(X_2|X_1)P(X_3|X_2)P(X_4|X_2)$.
* **Given Tables:** Probabilities for all priors and conditionals are provided.

#### **Task: Calculate the Marginal Probability $P(X_3 = 1)$**

We need the probability of $X_3 = 1$ regardless of the values of the other variables.

* **Definition:** Sum the joint probability over all other variables ($X_1, X_2, X_4$):

$$P(X_3=1) = \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1, x_2, x_3=1, x_4)$$

* **Step 1: Expand using the Graph Structure**

$$= \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1)P(x_2|x_1)P(X_3=1|x_2)P(x_4|x_2)$$

* **Step 2: Simplify (Key Insight)**

Push each summation sign as far to the right as possible. The sum over $x_4$ only affects the last factor, $P(x_4|x_2)$:

$$= \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2) \left[ \sum_{x_4} P(x_4|x_2) \right]$$

* **Property:** $\sum_{x_4} P(x_4|x_2) = 1$ (the probabilities of a variable's values, given any fixed condition, always sum to 1).
* Therefore, the $X_4$ factor vanishes. This makes sense intuitively: $X_4$ is a leaf node off the path from $X_1$ to $X_3$, so summing over all of its values cannot change $X_3$'s probability once $X_2$ is accounted for.

* **Step 3: Final Calculation**

We are left with a sum over $X_1$ and $X_2$:

$$= \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2)$$

This expands to 4 terms (the combinations of $x_1 \in \{0,1\}$ and $x_2 \in \{0,1\}$), evaluated using the given tables. (A numerical sketch of this calculation appears at the end of this guide.)

---

### **4. Semester Summary & Conclusion**

The lecture concludes the semester's material.

* **Key Themes Covered:**
    * **Discriminative vs. Generative Methods:** The fundamental difference in approach (modeling the decision boundary vs. modeling the data distribution).
    * **Objective Functions:** Designing loss functions vs. likelihood functions.
    * **Optimization:** Parameter estimation via derivatives (MLE).
    * **Graphical Models:** Reducing parameter complexity using independence assumptions (Bayes Nets, MRFs).
* **Final Exam:** Scheduled for Thursday, December 11th. It will cover the concepts discussed, focusing on understanding the fundamentals (e.g., likelihood, generative principles) rather than rote memorization.
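**Numerical sketch for Example 2.** The lecture's probability tables are not reproduced in this guide, so the Python sketch below uses illustrative placeholder values for $P(X_1)$ and the conditional tables. It computes $P(X_3 = 1)$ both by brute-force summation over $x_1, x_2, x_4$ (the Step 1 expansion) and by the simplified two-variable sum (Step 3), confirming that the two agree because $\sum_{x_4} P(x_4|x_2) = 1$.

```python
# Sketch: checking the Example 2 marginalization numerically.
# All table values below are assumed placeholders, not the lecture's numbers.

P_X1 = 0.6                              # P(X1 = 1), assumed
P_X2_given_X1 = {1: 0.7, 0: 0.2}        # P(X2 = 1 | x1), assumed
P_X3_given_X2 = {1: 0.9, 0: 0.3}        # P(X3 = 1 | x2), assumed
P_X4_given_X2 = {1: 0.5, 0: 0.4}        # P(X4 = 1 | x2), assumed


def bern(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


# Step 1: brute-force sum of the full joint over x1, x2, x4 (8 terms).
brute_force = sum(
    bern(P_X1, x1)
    * bern(P_X2_given_X1[x1], x2)
    * bern(P_X3_given_X2[x2], 1)
    * bern(P_X4_given_X2[x2], x4)
    for x1 in (0, 1) for x2 in (0, 1) for x4 in (0, 1)
)

# Step 3: the sum over x4 equals 1, leaving only x1 and x2 (4 terms).
simplified = sum(
    bern(P_X1, x1) * bern(P_X2_given_X1[x1], x2) * bern(P_X3_given_X2[x2], 1)
    for x1 in (0, 1) for x2 in (0, 1)
)

print(brute_force, simplified)   # the two values agree
```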