Study Guide: Generative Methods & Multivariate Gaussian Distributions
Date: 2025.12.01 Topic: Generative vs. Discriminative Models, Multivariate Gaussian Properties, Conditional and Marginal Distributions.
1. Generative vs. Discriminative Methods
The lecture begins by contrasting the new topic (Generative Methods) with previous topics (Discriminative Methods like Linear Regression, Logistic Regression, and SVM).
- Discriminative Methods (Separating):
- These methods focus on finding a boundary (separating line or hyperplane) between classes.
- Limitation: They cannot generate new data samples because they do not model the data distribution; they only know the boundary.
- Hypothesis: They assume a linear function (a line or hyperplane) as the hypothesis that separates the data.
- Generative Methods (Inferring Distribution):
- Goal: To infer the underlying distribution (the rule or pattern) from which the data samples were drawn.
- Assumption: Data is not arbitrary; samples are drawn from an underlying probability distribution with a specific structure.
- Capabilities: Once the Joint Probability Distribution (underlying distribution) is known:
- Classification: Can be performed using Bayes' Rule (see the worked form after this list).
- Generation: New samples can be created that follow the same patterns as the training data (e.g., generating new images or text).
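Concretely, once the joint is modeled as P(x, y) = P(x \mid y)P(y), Bayes' Rule gives the class posterior (a standard identity, stated here for reference):
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')}
The classifier predicts the class y that maximizes this posterior.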
2. The Gaussian (Normal) Distribution
The Gaussian distribution is the most popular choice for modeling the "hypothesis" of the underlying distribution in generative models.
Why Gaussian?
- Simplicity: Defined entirely by two parameters: Mean (\mu) and Covariance (\Sigma).
- Central Limit Theorem: Sums of independent random events tend to follow a Gaussian distribution.
- Mathematical "Closure": The most critical reason for its use in AI is that Conditional and Marginal distributions of a Multivariate Gaussian are also Gaussian.
Multivariate Gaussian Definition
For a $D$-dimensional vector x:
P(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)
- \mu: Mean vector ($D$-dimensional).
- \Sigma: Covariance matrix ($D \times D$).
[Image of multivariate gaussian distribution 3d plot]
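As a sanity check on this definition, here is a minimal NumPy/SciPy sketch (the example \mu, \Sigma, and x are illustrative values, not from the lecture) that evaluates the density directly from the formula and compares it against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density, evaluated directly from the formula."""
    D = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])

print(gaussian_pdf(x, mu, Sigma))             # direct formula
print(multivariate_normal(mu, Sigma).pdf(x))  # SciPy reference; should match
```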
Properties of the Covariance Matrix (\Sigma)
- Symmetric: \Sigma_{ij} = \Sigma_{ji}.
- Positive Definite: All eigenvalues are positive.
- Diagonal Terms: Represent the variance of individual variables.
- Off-Diagonal Terms: Represent the correlation (covariance) between variables.
- If \sigma_{12} = 0, the variables are independent (this implication holds specifically for Gaussians).
- The matrix shape determines the geometry of the distribution contours (spherical vs. elliptical).
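These properties are easy to verify numerically; a quick NumPy check (the matrix is the same illustrative \Sigma as above):

```python
import numpy as np

# An illustrative covariance matrix.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

print(np.allclose(Sigma, Sigma.T))  # symmetric -> True
print(np.linalg.eigvalsh(Sigma))    # eigenvalues: all positive for a valid Sigma
print(np.diag(Sigma))               # diagonal terms: variances of x1 and x2
print(Sigma[0, 1])                  # off-diagonal term: covariance between x1 and x2
```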
3. Independence and Factorization
If the Covariance Matrix is diagonal (all off-diagonal elements are 0), the variables are independent.
- Mathematically, the inverse matrix \Sigma^{-1} is also diagonal.
- The joint probability factorizes into the product of marginals: P(x_1, x_2) = P(x_1)P(x_2).
- The "quadratic form" inside the exponential splits into a sum of separate squared terms.
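To see this explicitly, consider the two-dimensional case with diagonal \Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2). The exponential splits into a product:
\exp\left(-\frac{1}{2}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right) = \exp\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}\right) \exp\left(-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)
Since |\Sigma| = \sigma_1^2 \sigma_2^2, the normalization constant factorizes the same way, which yields P(x_1, x_2) = P(x_1)P(x_2).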
4. Conditional Gaussian Distribution
The lecture derives what happens when we observe a subset of variables (e.g., x_2) and want to determine the distribution of the remaining variables (x_1). This is P(x_1 | x_2).
- Concept: Visually, this is equivalent to "slicing" the joint distribution at a specific value of x_2 (a fixed constant).
- Result: The resulting cross-section is also a Gaussian distribution.
- Parameters: If we partition x, \mu, and \Sigma into subsets, the conditional mean (\mu_{1|2}) and covariance (\Sigma_{1|2}) are given by:
  - Mean: \mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)
  - Covariance: \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
- (Note: The derivation involves completing the square to identify the Gaussian form; a numerical sketch follows below.)
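A minimal NumPy sketch of these conditioning formulas (the partition sizes, example \mu, \Sigma, and observed x_2 are illustrative):

```python
import numpy as np

# Joint parameters, partitioned as x = (x1, x2) with one variable per block.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x2 = np.array([1.5])  # observed value of x2

# Conditional parameters from the block formulas above.
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)
print(mu_cond)     # mu_{1|2}
print(Sigma_cond)  # Sigma_{1|2} (the Schur complement of Sigma_{22})
```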
5. Marginal Gaussian Distribution
The lecture explains how to find the distribution of a subset of variables (x_1) by ignoring the others (x_2). This is P(x_1).
- Concept: This is equivalent to integrating out the unobserved variables: P(x_1) = \int P(x_1, x_2) \, dx_2
- Result: The marginal distribution is also a Gaussian distribution.
- Parameters: Unlike the conditional case, calculating the marginal parameters is trivial. You simply select the corresponding sub-vector and sub-matrix from the joint parameters:
  - Mean: \mu_1
  - Covariance: \Sigma_{11}
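The "just slice it out" claim is easy to check empirically (same illustrative \mu and \Sigma as in the earlier sketches): the first coordinate of samples drawn from the joint should behave like a Gaussian with mean \mu_1 and variance \Sigma_{11}.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Marginal of x1: simply slice out the corresponding blocks.
mu1, Sigma11 = mu[:1], Sigma[:1, :1]

# Empirical check against joint samples.
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
print(mu1[0], samples[:, 0].mean())        # both near 0.0
print(Sigma11[0, 0], samples[:, 0].var())  # both near 2.0
```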
Summary Table
| Distribution | Type | Parameters Derived From Joint (\mu, \Sigma) |
|---|---|---|
| Joint P(x) | Gaussian | Given as \mu, \Sigma |
| Conditional P(x_1 \| x_2) | Gaussian | Complex formula (involves matrix inversion of \Sigma_{22}) |
| Marginal P(x_1) | Gaussian | Simple subset (extract \mu_1 and \Sigma_{11}) |
The lecture concludes by emphasizing that understanding these Gaussian properties is essential for the second half of the semester, as they form the basis for probabilistic generative models.