Study Guide: Undirected Graphical Models (Markov Random Fields)
Date: 2025.11.27 Topic: Potential Functions, Partition Function, and Conditional Independence in MRFs.
1. Recap: Decomposition in Undirected Graphs
Unlike Directed Graphical Models (Bayesian Networks), which factorize the joint distribution into conditional probabilities, Undirected Graphical Models (Markov Random Fields, MRFs) cannot use conditional probabilities directly because edges have no direction or causal interpretation. Instead, they decompose the joint distribution based on Maximal Cliques.
- Cliques: Subsets of nodes where every node is connected to every other node.
- Maximal Clique: A clique that cannot be extended by adding any other node (e.g., in the example graph, the maximal cliques together cover the whole graph).
- Decomposition Rule: The joint distribution is the product of functions defined over these maximal cliques.
2. Potential Functions (\psi)
- Definition: For each maximal clique C, we define a Potential Function \psi_C(x_C) (often denoted as \phi or \psi).
  - It is a non-negative function (\psi_C(x_C) \ge 0) mapping the state of the clique variables to a real number.
  - It represents the "compatibility" or "energy" of that configuration.
- Key Distinction: A potential function is NOT a probability. It does not sum to 1; it is just a score (a non-negative function).
- Example: \psi_{12}(x_1, x_2) scores the interaction between x_1 and x_2.
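As a concrete illustration, here is a minimal sketch of such a potential table for a two-variable binary clique (the specific values are made up; any non-negative numbers are allowed):

```python
# Hypothetical potential table psi_12 over the clique {x1, x2}.
# Keys are (x1, x2) states; values are non-negative "compatibility" scores.
psi_12 = {
    (0, 0): 2.0,  # both off: highly compatible
    (0, 1): 0.5,
    (1, 0): 0.5,
    (1, 1): 2.0,  # both on: highly compatible
}

# Unlike a probability table, the entries need not sum to 1.
print(sum(psi_12.values()))  # 5.0 -- a score table, not a distribution
```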
3. The Partition Function (Z)
Since the product of potential functions is not a probability distribution (it doesn't sum to 1), we must normalize it.
- Definition: The normalization constant is called the Partition Function (Z):
  Z = \sum_{x} \prod_{C} \psi_C(x_C)
- Role: It ensures that the resulting distribution sums to 1, making it a valid probability distribution:
  P(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)
- Calculation: To find Z, we must sum the product of potentials over all possible states (combinations) of the random variables. This summation is often computationally expensive, since the number of states grows exponentially with the number of variables.
Example Calculation
The lecture walks through a simple example with 4 binary variables and two cliques: \{x_1, x_2, x_3\} and \{x_3, x_4\}.
- Step 1: Define potential tables for \psi_{123} and \psi_{34}.
- Step 2: Calculate the score (product of the two potentials) for every combination of (x_1, x_2, x_3, x_4).
- Step 3: Sum all scores to get Z. In the example, Z = 10.
- Step 4: The probability of any specific state, e.g., P(1, 0, 0, 0), is its specific score divided by Z (e.g., (1 \times 3)/10 or similar, depending on the table values).
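The sketch below reproduces the structure of this calculation in Python. The table values are placeholders, not the lecture's (so Z here will not be 10); only the procedure — enumerate all states, multiply clique potentials, sum to get Z, then normalize — matches the steps above.

```python
import itertools

# Hypothetical potential tables (placeholder values, not the lecture's).
psi_123 = {s: 1.0 for s in itertools.product([0, 1], repeat=3)}
psi_123[(1, 1, 1)] = 3.0  # make one configuration more "compatible"
psi_34 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def score(x):
    """Unnormalized score: product of clique potentials (Steps 1-2)."""
    x1, x2, x3, x4 = x
    return psi_123[(x1, x2, x3)] * psi_34[(x3, x4)]

# Step 3: sum the score over all 2^4 = 16 joint states to get Z.
states = list(itertools.product([0, 1], repeat=4))
Z = sum(score(x) for x in states)

# Step 4: probability of a specific state = its score / Z.
p_1000 = score((1, 0, 0, 0)) / Z
print(f"Z = {Z}, P(1,0,0,0) = {p_1000:.4f}")
assert abs(sum(score(x) for x in states) / Z - 1.0) < 1e-12  # sums to 1
```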
4. Parameter Estimation
- Discrete Case: If variables are discrete (like the email spam example), the parameters are the entries in the potential tables. We estimate these values from data to maximize the likelihood.
- Continuous Case: If variables are continuous, potential functions are typically Gaussian distributions. We estimate means and covariances.
- Reduction: Just like in Bayesian Networks, using the graph structure reduces the number of parameters.
  - Without Graph: A full joint table for 4 binary variables needs 2^4 = 16 entries.
  - With Graph: We only need tables for the cliques (here 2^3 + 2^2 = 12 entries), significantly reducing complexity as graphs grow; a quick count is sketched below.
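A quick sanity check of these counts, assuming the two cliques \{x_1, x_2, x_3\} and \{x_3, x_4\} from the earlier example:

```python
n_vars = 4
full_joint = 2 ** n_vars                        # 16 entries without the graph
clique_sizes = [3, 2]                           # cliques {x1,x2,x3} and {x3,x4}
with_graph = sum(2 ** k for k in clique_sizes)  # 2^3 + 2^2 = 12 entries
print(full_joint, with_graph)
```

The saving looks modest for 4 variables, but the gap widens rapidly: with, say, 20 binary variables, the full joint table needs 2^{20} entries while small clique tables stay tiny.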
5. Verifying Conditional Independence
The lecture demonstrates analytically that the potential function formulation preserves the conditional independence properties of the graph.
- Scenario: Graph with chain structure x_1 - x_2 - x_3 - x_4.
  - Question: Is x_4 independent of x_1 given x_3?
- Analytical Check:
  - We calculate P(x_4=1 | x_1=0, x_2=1, x_3=1).
  - We also calculate P(x_4=1 | x_1=0, x_2=0, x_3=1).
- Result: The calculation shows that as long as x_3 is fixed (given), the values of x_1 and x_2 cancel out: the potentials over the other cliques appear in both numerator and denominator and divide out. Writing \phi_k = \psi_{34}(x_3, x_4=k), we are left with
  P(x_4=1 | x_1, x_2, x_3) = \frac{\phi_1}{\phi_1 + \phi_0}
  which depends only on the potential involving x_4 and x_3.
- Conclusion: This confirms that x_4 \perp \{x_1, x_2\} \mid x_3. The formulation correctly encodes the global Markov property.
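A brute-force numeric check of this conclusion, as a sketch: the pairwise potential values below are invented for illustration, but for any non-negative tables the conditional probability of x_4 comes out identical whatever x_1 and x_2 are, as long as x_3 is fixed.

```python
import itertools

# Hypothetical pairwise potentials for the chain x1 - x2 - x3 - x4.
# Maximal cliques are the edges {x1,x2}, {x2,x3}, {x3,x4}; one table is
# reused for all three edges purely for brevity (values are illustrative).
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
edges = [(0, 1), (1, 2), (2, 3)]

def score(x):
    """Unnormalized score: product of edge potentials."""
    result = 1.0
    for i, j in edges:
        result *= psi[(x[i], x[j])]
    return result

def p_x4_given_rest(x1, x2, x3, x4=1):
    """P(x4 | x1, x2, x3): Z cancels, so only a local sum over x4 remains."""
    num = score((x1, x2, x3, x4))
    den = sum(score((x1, x2, x3, v)) for v in (0, 1))
    return num / den

# Fix x3 = 1 and vary x1, x2: all four values agree, confirming
# x4 _|_ {x1, x2} | x3 for this parameterization.
for x1, x2 in itertools.product([0, 1], repeat=2):
    print(x1, x2, p_x4_given_rest(x1, x2, 1))
```

With these tables, each printed value equals \phi_1 / (\phi_1 + \phi_0) = 3/(3+1) = 0.75, matching the closed form in the Result above.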