# Study Guide: Bayesian Networks & Probabilistic Inference

**Date:** 2025.12.01 (Final Lecture)
**Topic:** Bayesian Networks, Probabilistic Inference Examples, Marginalization.

---

### **1. Recap: Directed vs. Undirected Models**

The lecture begins by briefly contrasting the two types of graphical models discussed:

* **Undirected Graphs (MRF):** Use potential functions ($\psi$) defined on maximal cliques, and require a normalization constant (the partition function $Z$) to form a valid probability distribution.
* **Directed Graphs (Bayesian Networks):** Use conditional probability distributions (CPDs). The joint distribution is the product of local conditional probabilities:

$$P(X) = \prod_{i} P(x_i | \text{parents}(x_i))$$

---

### **2. Example 1: The "Alarm" Network (Burglary/Earthquake)**

This is a classic example used to demonstrate inference in Bayesian Networks.

#### **Scenario & Structure**

* **Nodes:**
    * **B:** Burglary (root node, no parents).
    * **E:** Earthquake (root node, no parents).
    * **A:** Alarm (triggered by Burglary or Earthquake).
    * **J:** JohnCalls (triggered by Alarm).
    * **M:** MaryCalls (triggered by Alarm).
* **Dependencies:** $B \rightarrow A \leftarrow E$, $A \rightarrow J$, $A \rightarrow M$.
* **Probabilities (Given):**
    * $P(B) = 0.05$, $P(E) = 0.1$.
    * $P(A|B, E)$: given as a table (e.g., $P(A|B, \neg E) = 0.85$, $P(A|\neg B, \neg E) = 0.05$, etc.).
    * $P(J|A) = 0.7$, $P(M|A) = 0.8$.

#### **Task 1: Calculate a Specific Joint Probability**

Calculate the probability of the event: **Burglary, No Earthquake, Alarm rings, John calls, Mary does not call**.

$$P(B, \neg E, A, J, \neg M)$$

* **Decomposition:** Apply the chain rule according to the graph structure.

$$= P(B) \cdot P(\neg E) \cdot P(A | B, \neg E) \cdot P(J | A) \cdot P(\neg M | A)$$

* **Calculation:**

$$= 0.05 \times 0.9 \times 0.85 \times 0.7 \times 0.2 = 0.005355$$

#### **Task 2: Inference (Conditional Probability)**

Calculate the probability that a **Burglary occurred**, given that **John called** and **Mary did not call**:

$$P(B | J, \neg M)$$

* **Formula (Bayes' Rule):**

$$P(B | J, \neg M) = \frac{P(B, J, \neg M)}{P(J, \neg M)}$$

* **Numerator Calculation ($P(B, J, \neg M)$):** We must **marginalize out** the unobserved variables ($A$ and $E$) from the joint distribution:

$$P(B, J, \neg M) = \sum_{A \in \{T,F\}} \sum_{E \in \{T,F\}} P(B, E, A, J, \neg M)$$

This involves summing 4 terms (all combinations of $A$ and $E$).

* **Denominator Calculation ($P(J, \neg M)$):** We further marginalize out $B$ from the numerator result:

$$P(J, \neg M) = P(B, J, \neg M) + P(\neg B, J, \neg M)$$
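For self-study, Tasks 1 and 2 can be checked numerically by brute-force enumeration. The Python sketch below uses the probabilities quoted above; the CPT entries not quoted in the notes ($P(A|B,E)$, $P(A|\neg B,E)$, $P(J|\neg A)$, and $P(M|\neg A)$) are filled in with illustrative placeholder values, so only the Task 1 result is guaranteed to match the lecture's numbers.

```python
# Sketch: enumeration inference in the alarm network. Values marked "assumed"
# are placeholders for CPT entries not quoted in the lecture notes.

P_B = 0.05
P_E = 0.1

# P(A=1 | b, e); the (1,1) and (0,1) entries are assumed, not from the lecture.
P_A_given = {(1, 1): 0.95, (1, 0): 0.85, (0, 1): 0.30, (0, 0): 0.05}

P_J_given_A = {1: 0.7, 0: 0.05}   # P(J=1 | a); the a=0 entry is assumed
P_M_given_A = {1: 0.8, 0: 0.01}   # P(M=1 | a); the a=0 entry is assumed


def bern(p, x):
    """Return P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain-rule factorization."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A_given[(b, e)], a)
            * bern(P_J_given_A[a], j) * bern(P_M_given_A[a], m))


# Task 1: P(B, not E, A, J, not M) = 0.05 * 0.9 * 0.85 * 0.7 * 0.2 = 0.005355
print(joint(1, 0, 1, 1, 0))

# Task 2: P(B | J, not M) by marginalizing out A and E, then normalizing over B
numerator = sum(joint(1, e, a, 1, 0) for a in (0, 1) for e in (0, 1))
denominator = numerator + sum(joint(0, e, a, 1, 0) for a in (0, 1) for e in (0, 1))
print(numerator / denominator)
```

The nested sums mirror the marginalization formulas above: four terms for the numerator (over $A$ and $E$), and the denominator adds the corresponding four terms with $\neg B$.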
---

### **3. Example 2: 4-Node Tree Structure**

A simpler example to demonstrate how sums simplify during marginalization.

#### **Scenario & Structure**

* **Nodes:** $X_1, X_2, X_3, X_4 \in \{0, 1\}$ (binary).
* **Dependencies:**
    * $X_1 \rightarrow X_2$
    * $X_2 \rightarrow X_3$
    * $X_2 \rightarrow X_4$
* **Decomposition:** $P(X) = P(X_1)P(X_2|X_1)P(X_3|X_2)P(X_4|X_2)$.
* **Given Tables:** Probabilities for all priors and conditionals are provided.

#### **Task: Calculate the Marginal Probability $P(X_3 = 1)$**

We need the probability of $X_3 = 1$ regardless of the values of the other variables.

* **Definition:** Sum the joint probability over all other variables ($X_1, X_2, X_4$):

$$P(X_3=1) = \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1, x_2, x_3=1, x_4)$$

* **Step 1: Expand using the Graph Structure**

$$= \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1)P(x_2|x_1)P(X_3=1|x_2)P(x_4|x_2)$$

* **Step 2: Simplify (Key Insight)**

Push each summation sign as far to the right as possible. The sum over $x_4$ only affects the last factor, $P(x_4|x_2)$:

$$= \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2) \left[ \sum_{x_4} P(x_4|x_2) \right]$$

* **Property:** $\sum_{x_4} P(x_4|x_2) = 1$ (the probabilities of a variable's values, given any fixed condition, always sum to 1).
* Therefore, the $X_4$ factor vanishes. This makes sense intuitively: $X_4$ is a leaf node off the path from $X_1$ to $X_3$, so summing over all of its values cannot change $X_3$'s probability once $X_2$ is accounted for.

* **Step 3: Final Calculation**

We are left with a sum over $X_1$ and $X_2$:

$$= \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2)$$

This expands to 4 terms (the combinations of $x_1 \in \{0,1\}$ and $x_2 \in \{0,1\}$), evaluated using the given tables. (A numerical sketch of this calculation appears at the end of this guide.)

---

### **4. Semester Summary & Conclusion**

The lecture concludes the semester's material.

* **Key Themes Covered:**
    * **Discriminative vs. Generative Methods:** The fundamental difference in approach (modeling the decision boundary vs. modeling the data distribution).
    * **Objective Functions:** Designing loss functions vs. likelihood functions.
    * **Optimization:** Parameter estimation via derivatives (MLE).
    * **Graphical Models:** Reducing parameter complexity using independence assumptions (Bayes Nets, MRFs).
* **Final Exam:** Scheduled for Thursday, December 11th. It will cover the concepts discussed, focusing on understanding the fundamentals (e.g., likelihood, generative principles) rather than rote memorization.
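**Numerical sketch for Example 2.** The lecture's probability tables are not reproduced in this guide, so the Python sketch below uses illustrative placeholder values for $P(X_1)$ and the conditional tables. It computes $P(X_3 = 1)$ both by brute-force summation over $x_1, x_2, x_4$ (the Step 1 expansion) and by the simplified two-variable sum (Step 3), confirming that the two agree because $\sum_{x_4} P(x_4|x_2) = 1$.

```python
# Sketch: checking the Example 2 marginalization numerically.
# All table values below are assumed placeholders, not the lecture's numbers.

P_X1 = 0.6                              # P(X1 = 1), assumed
P_X2_given_X1 = {1: 0.7, 0: 0.2}        # P(X2 = 1 | x1), assumed
P_X3_given_X2 = {1: 0.9, 0: 0.3}        # P(X3 = 1 | x2), assumed
P_X4_given_X2 = {1: 0.5, 0: 0.4}        # P(X4 = 1 | x2), assumed


def bern(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


# Step 1: brute-force sum of the full joint over x1, x2, x4 (8 terms).
brute_force = sum(
    bern(P_X1, x1)
    * bern(P_X2_given_X1[x1], x2)
    * bern(P_X3_given_X2[x2], 1)
    * bern(P_X4_given_X2[x2], x4)
    for x1 in (0, 1) for x2 in (0, 1) for x4 in (0, 1)
)

# Step 3: the sum over x4 equals 1, leaving only x1 and x2 (4 terms).
simplified = sum(
    bern(P_X1, x1) * bern(P_X2_given_X1[x1], x2) * bern(P_X3_given_X2[x2], 1)
    for x1 in (0, 1) for x2 in (0, 1)
)

print(brute_force, simplified)   # the two values agree
```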