Study Guide: Undirected Graphical Models (Markov Random Fields)
Date: 2025.11.27 Topic: Potential Functions, Partition Function, and Conditional Independence in MRFs.
1. Recap: Decomposition in Undirected Graphs
Unlike Directed Graphical Models (Bayesian Networks), which factorize the joint distribution into conditional probabilities, Undirected Graphical Models (Markov Random Fields, MRFs) cannot use conditional probabilities directly because edges have no direction or causal interpretation. Instead, they decompose the joint distribution based on Maximal Cliques.
- Cliques: Subsets of nodes where every node is connected to every other node.
- Maximal Clique: A clique that cannot be extended by adding any other node (e.g., in the example graph, the maximal cliques together cover the whole graph).
- Decomposition Rule: The joint distribution is the product of functions defined over these maximal cliques.
2. Potential Functions (\psi)
- Definition: For each maximal clique C, we define a Potential Function \psi_C(x_C) (often denoted as \phi or \psi).
  - It is a non-negative function (\psi_C(x_C) \ge 0) mapping the state of the clique variables to a real number.
  - It represents the "compatibility" or "energy" of that configuration.
- Key Distinction: A potential function is NOT a probability. It does not sum to 1; it is just a score (a non-negative function).
- Example: \psi_{12}(x_1, x_2) scores the interaction between x_1 and x_2.
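As a concrete illustration, here is a minimal sketch of such a potential table for a two-variable binary clique (the specific values are made up; any non-negative numbers are allowed):

```python
# Hypothetical potential table psi_12 over the clique {x1, x2}.
# Keys are (x1, x2) states; values are non-negative "compatibility" scores.
psi_12 = {
    (0, 0): 2.0,  # both off: highly compatible
    (0, 1): 0.5,
    (1, 0): 0.5,
    (1, 1): 2.0,  # both on: highly compatible
}

# Unlike a probability table, the entries need not sum to 1.
print(sum(psi_12.values()))  # 5.0 -- a score table, not a distribution
```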
3. The Partition Function (Z)
Since the product of potential functions is not a probability distribution (it doesn't sum to 1), we must normalize it.
- Definition: The normalization constant is called the Partition Function (Z):
  Z = \sum_{x} \prod_{C} \psi_C(x_C)
- Role: It ensures that the resulting distribution sums to 1, making it a valid probability distribution:
  P(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)
- Calculation: To find Z, we must sum the product of potentials over all possible states (combinations) of the random variables. This summation is often computationally expensive, since the number of states grows exponentially with the number of variables.
Example Calculation
The lecture walks through a simple example with 4 binary variables and two cliques: \{x_1, x_2, x_3\} and \{x_3, x_4\}.
- Step 1: Define potential tables for \psi_{123} and \psi_{34}.
- Step 2: Calculate the score (product of the two potentials) for every combination of (x_1, x_2, x_3, x_4).
- Step 3: Sum all scores to get Z. In the example, Z = 10.
- Step 4: The probability of any specific state, e.g., P(1, 0, 0, 0), is its specific score divided by Z (e.g., (1 \times 3)/10 or similar, depending on the table values).
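The sketch below reproduces the structure of this calculation in Python. The table values are placeholders, not the lecture's (so Z here will not be 10); only the procedure — enumerate all states, multiply clique potentials, sum to get Z, then normalize — matches the steps above.

```python
import itertools

# Hypothetical potential tables (placeholder values, not the lecture's).
psi_123 = {s: 1.0 for s in itertools.product([0, 1], repeat=3)}
psi_123[(1, 1, 1)] = 3.0  # make one configuration more "compatible"
psi_34 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def score(x):
    """Unnormalized score: product of clique potentials (Steps 1-2)."""
    x1, x2, x3, x4 = x
    return psi_123[(x1, x2, x3)] * psi_34[(x3, x4)]

# Step 3: sum the score over all 2^4 = 16 joint states to get Z.
states = list(itertools.product([0, 1], repeat=4))
Z = sum(score(x) for x in states)

# Step 4: probability of a specific state = its score / Z.
p_1000 = score((1, 0, 0, 0)) / Z
print(f"Z = {Z}, P(1,0,0,0) = {p_1000:.4f}")
assert abs(sum(score(x) for x in states) / Z - 1.0) < 1e-12  # sums to 1
```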
4. Parameter Estimation
- Discrete Case: If variables are discrete (like the email spam example), the parameters are the entries in the potential tables. We estimate these values from data to maximize the likelihood.
- Continuous Case: If variables are continuous, potential functions are typically Gaussian distributions. We estimate means and covariances.
- Reduction: Just like in Bayesian Networks, using the graph structure reduces the number of parameters.
  - Without Graph: A full joint table for 4 binary variables needs 2^4 = 16 entries.
  - With Graph: We only need tables for the cliques (here 2^3 + 2^2 = 12 entries), significantly reducing complexity as graphs grow; a quick count is sketched below.
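A quick sanity check of these counts, assuming the two cliques \{x_1, x_2, x_3\} and \{x_3, x_4\} from the earlier example:

```python
n_vars = 4
full_joint = 2 ** n_vars                        # 16 entries without the graph
clique_sizes = [3, 2]                           # cliques {x1,x2,x3} and {x3,x4}
with_graph = sum(2 ** k for k in clique_sizes)  # 2^3 + 2^2 = 12 entries
print(full_joint, with_graph)
```

The saving looks modest for 4 variables, but the gap widens rapidly: with, say, 20 binary variables, the full joint table needs 2^{20} entries while small clique tables stay tiny.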
5. Verifying Conditional Independence
The lecture demonstrates analytically that the potential function formulation preserves the conditional independence properties of the graph.
- Scenario: Graph with chain structure x_1 - x_2 - x_3 - x_4.
  - Question: Is x_4 independent of x_1 given x_3?
- Analytical Check:
  - We calculate P(x_4=1 | x_1=0, x_2=1, x_3=1).
  - We also calculate P(x_4=1 | x_1=0, x_2=0, x_3=1).
- Result: The calculation shows that as long as x_3 is fixed (given), the values of x_1 and x_2 cancel out: the potentials over the other cliques appear in both numerator and denominator and divide out. Writing \phi_k = \psi_{34}(x_3, x_4=k), we are left with
  P(x_4=1 | x_1, x_2, x_3) = \frac{\phi_1}{\phi_1 + \phi_0}
  which depends only on the potential involving x_4 and x_3.
- Conclusion: This confirms that x_4 \perp \{x_1, x_2\} \mid x_3. The formulation correctly encodes the global Markov property.
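A brute-force numeric check of this conclusion, as a sketch: the pairwise potential values below are invented for illustration, but for any non-negative tables the conditional probability of x_4 comes out identical whatever x_1 and x_2 are, as long as x_3 is fixed.

```python
import itertools

# Hypothetical pairwise potentials for the chain x1 - x2 - x3 - x4.
# Maximal cliques are the edges {x1,x2}, {x2,x3}, {x3,x4}; one table is
# reused for all three edges purely for brevity (values are illustrative).
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
edges = [(0, 1), (1, 2), (2, 3)]

def score(x):
    """Unnormalized score: product of edge potentials."""
    result = 1.0
    for i, j in edges:
        result *= psi[(x[i], x[j])]
    return result

def p_x4_given_rest(x1, x2, x3, x4=1):
    """P(x4 | x1, x2, x3): Z cancels, so only a local sum over x4 remains."""
    num = score((x1, x2, x3, x4))
    den = sum(score((x1, x2, x3, v)) for v in (0, 1))
    return num / den

# Fix x3 = 1 and vary x1, x2: all four values agree, confirming
# x4 _|_ {x1, x2} | x3 for this parameterization.
for x1, x2 in itertools.product([0, 1], repeat=2):
    print(x1, x2, p_x4_given_rest(x1, x2, 1))
```

With these tables, each printed value equals \phi_1 / (\phi_1 + \phi_0) = 3/(3+1) = 0.75, matching the closed form in the Result above.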