Study Guide: Bayesian Networks & Probabilistic Inference
Date: 2025.12.01 (Final Lecture)
Topic: Bayesian Networks, Probabilistic Inference Examples, Marginalization
1. Recap: Directed vs. Undirected Models
The lecture begins by briefly contrasting the two types of graphical models discussed:
- Undirected Graphs (MRFs): Use potential functions ($\psi$) defined on maximal cliques; a normalization constant (the partition function $Z$) is required to turn the product of potentials into a probability distribution.
- Directed Graphs (Bayesian Networks): Use conditional probability distributions (CPDs). The joint distribution is the product of local conditional probabilities:
P(X) = \prod_{i} P(x_i | \text{parents}(x_i))
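For concreteness, this factorization can be evaluated mechanically as a product of local factors. The sketch below is a minimal Python illustration, not lecture code; the dictionary-based CPD representation and the function name `joint_probability` are assumptions made for the example.

```python
# Minimal sketch (not lecture code): evaluating P(X) = prod_i P(x_i | parents(x_i)).
def joint_probability(assignment, parents, cpds):
    """assignment: dict mapping variable name -> value
    parents:    dict mapping variable name -> tuple of its parent names
    cpds:       dict mapping variable name -> function(value, parent_values) -> probability
    """
    prob = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[var])
        prob *= cpds[var](value, parent_values)  # local factor P(x_i | parents(x_i))
    return prob
```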
2. Example 1: The "Alarm" Network (Burglary/Earthquake)
This is a classic example used to demonstrate inference in Bayesian Networks.
Scenario & Structure
- Nodes:
  - B: Burglary (root node, no parents).
  - E: Earthquake (root node, no parents).
  - A: Alarm (triggered by a Burglary or an Earthquake).
  - J: JohnCalls (triggered by the Alarm).
  - M: MaryCalls (triggered by the Alarm).
- Dependencies: $B \rightarrow A \leftarrow E$, $A \rightarrow J$, $A \rightarrow M$.
- Probabilities (Given):
  - $P(B) = 0.05$, $P(E) = 0.1$.
  - $P(A|B, E)$: full table given (e.g., $P(A|B, \neg E) = 0.85$, $P(A|\neg B, \neg E) = 0.05$, etc.).
  - $P(J|A) = 0.7$, $P(M|A) = 0.8$.
Task 1: Calculate a Specific Joint Probability
Calculate the probability of the event: Burglary, No Earthquake, Alarm rings, John calls, Mary does not call.
P(B, \neg E, A, J, \neg M)
- Decomposition: Apply the chain rule based on the graph structure:
  = P(B) \cdot P(\neg E) \cdot P(A | B, \neg E) \cdot P(J | A) \cdot P(\neg M | A)
- Calculation (using $P(\neg E) = 1 - 0.1 = 0.9$ and $P(\neg M|A) = 1 - 0.8 = 0.2$):
  = 0.05 \times 0.9 \times 0.85 \times 0.7 \times 0.2 = 0.005355
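The same arithmetic as a short Python check; the variable names are mine, and every number is one given above (the complements are computed as 1 minus the given value):

```python
# Task 1: P(B, ~E, A, J, ~M) using only the probabilities listed above.
p_B = 0.05                 # P(Burglary)
p_E = 0.10                 # P(Earthquake)
p_A_given_B_notE = 0.85    # P(A | B, ~E), from the given CPT
p_J_given_A = 0.7          # P(JohnCalls | A)
p_M_given_A = 0.8          # P(MaryCalls | A)

# P(B, ~E, A, J, ~M) = P(B) * P(~E) * P(A|B,~E) * P(J|A) * P(~M|A)
joint = p_B * (1 - p_E) * p_A_given_B_notE * p_J_given_A * (1 - p_M_given_A)
print(joint)  # ~0.005355
```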
Task 2: Inference (Conditional Probability)
Calculate the probability that a Burglary occurred, given that John called and Mary did not call.
P(B | J, \neg M)
- Formula (Bayes' Rule):
  P(B | J, \neg M) = \frac{P(B, J, \neg M)}{P(J, \neg M)}
- Numerator Calculation ($P(B, J, \neg M)$): Marginalize out the unknown variables ($A$ and $E$) from the joint distribution:
  P(B, J, \neg M) = \sum_{A \in \{T,F\}} \sum_{E \in \{T,F\}} P(B, E, A, J, \neg M)
  This involves summing 4 terms (the combinations of $A$ and $E$).
- Denominator Calculation ($P(J, \neg M)$): Further marginalize out $B$ from the numerator expression (see the enumeration sketch after this list):
  P(J, \neg M) = P(B, J, \neg M) + P(\neg B, J, \neg M)
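A sketch of the full enumeration in Python. The notes list only part of the alarm CPT and only $P(J|A)$, $P(M|A)$ for $A$ = true, so the entries marked ASSUMED below are illustrative placeholders, not values from the lecture:

```python
from itertools import product

# Enumeration sketch for P(B | J, ~M). Entries marked ASSUMED are illustrative
# placeholders; only the values listed in the notes above are taken as given.
p_B, p_E = 0.05, 0.10
p_A = {  # P(A = True | B, E)
    (True,  True):  0.95,   # ASSUMED (not listed in the notes)
    (True,  False): 0.85,   # given
    (False, True):  0.30,   # ASSUMED (not listed in the notes)
    (False, False): 0.05,   # given
}
p_J = {True: 0.7, False: 0.1}   # P(J = True | A); the A = False entry is ASSUMED
p_M = {True: 0.8, False: 0.1}   # P(M = True | A); the A = False entry is ASSUMED

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain-rule factorization."""
    pb = p_B if b else 1 - p_B
    pe = p_E if e else 1 - p_E
    pa = p_A[(b, e)] if a else 1 - p_A[(b, e)]
    pj = p_J[a] if j else 1 - p_J[a]
    pm = p_M[a] if m else 1 - p_M[a]
    return pb * pe * pa * pj * pm

hidden = list(product([True, False], repeat=2))  # values of (A, E)
numerator = sum(joint(True, e, a, True, False) for a, e in hidden)                 # P(B, J, ~M)
denominator = numerator + sum(joint(False, e, a, True, False) for a, e in hidden)  # P(J, ~M)
print(numerator / denominator)  # P(B | J, ~M)
```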
3. Example 2: 4-Node Tree Structure
A simpler example to demonstrate how sums simplify during marginalization.
Scenario & Structure
- Nodes: $X_1, X_2, X_3, X_4 \in \{0, 1\}$ (binary).
- Dependencies: $X_1 \rightarrow X_2$, $X_2 \rightarrow X_3$, $X_2 \rightarrow X_4$.
- Decomposition: $P(X) = P(X_1)P(X_2|X_1)P(X_3|X_2)P(X_4|X_2)$.
- Given Tables: Probabilities for all priors and conditionals are provided.
Task: Calculate Marginal Probability $P(X_3 = 1)$
We need to find the probability that $X_3 = 1$, regardless of the values of the other variables.
- Definition: Sum the joint probability over all other variables ($X_1, X_2, X_4$):
  P(X_3=1) = \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1, x_2, X_3=1, x_4)
- Step 1: Expand using the graph structure:
  = \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1)P(x_2|x_1)P(X_3=1|x_2)P(x_4|x_2)
- Step 2: Simplify (Key Insight): Push each summation sign as far right as possible. The sum over $x_4$ affects only the last term $P(x_4|x_2)$:
  = \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2) \left[ \sum_{x_4} P(x_4|x_2) \right]
  - Property: $\sum_{x_4} P(x_4|x_2) = 1$ (a conditional distribution always sums to 1 over its own variable).
  - Therefore, the $X_4$ term vanishes. Intuitively, $X_4$ is a leaf node separate from $X_3$, so it contributes nothing to $X_3$'s marginal once $X_2$ is accounted for.
- Step 3: Final Calculation: We are left with a sum over $x_1$ and $x_2$:
  = \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2)
  This expands to 4 terms (the combinations of $x_1 \in \{0,1\}$ and $x_2 \in \{0,1\}$). A small enumeration sketch follows below.
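To see the simplification numerically, here is a small Python sketch. The lecture provides the CPTs, but the notes do not record them, so the table values below are ASSUMED placeholders; the point is only that the brute-force sum and the simplified sum agree:

```python
from itertools import product

# Marginal P(X3 = 1) for the tree X1 -> X2 -> {X3, X4}.
# All table values below are ASSUMED placeholders (the real tables were given in lecture).
p_x1 = {0: 0.6, 1: 0.4}                              # P(X1)        (assumed)
p_x2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # P(X2 | X1)   (assumed)
p_x3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # P(X3 | X2)   (assumed)
p_x4 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.4, 1: 0.6}}    # P(X4 | X2)   (assumed)

# Brute force: sum the full joint over x1, x2, x4 with X3 fixed to 1.
brute = sum(p_x1[x1] * p_x2[x1][x2] * p_x3[x2][1] * p_x4[x2][x4]
            for x1, x2, x4 in product([0, 1], repeat=3))

# Simplified: sum_{x4} P(x4 | x2) = 1, so the X4 factor drops out entirely.
simplified = sum(p_x1[x1] * p_x2[x1][x2] * p_x3[x2][1]
                 for x1, x2 in product([0, 1], repeat=2))

print(brute, simplified)  # identical values, confirming the X4 term vanishes
```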
4. Semester Summary & Conclusion
The lecture concludes the semester's material.
- Key Themes Covered:
- Discriminative vs. Generative Methods: The fundamental difference in approach (modeling a decision boundary vs. modeling the data distribution).
- Objective Functions: Designing Loss functions vs. Likelihood functions.
- Optimization: Parameter estimation via derivatives (MLE).
- Graphical Models: Reducing parameter complexity using independence assumptions (Bayes Nets, MRFs).
- Final Exam: Scheduled for Thursday, December 11th. It will cover the concepts discussed, focusing on understanding the fundamentals (e.g., Likelihood, Generative principles) rather than rote memorization.