# Lecture Summary: Generative Methods & Probability Review

**Date:** 2025.11.06
**Topic:** Discriminative vs. Generative Models, Probability Theory, Probabilistic Inference, and Gaussian Distributions.

---

### 1. Classification Approaches: Discriminative vs. Generative

The lecture begins by distinguishing between two fundamental approaches to machine learning classification, specifically for binary problems (labels 0 or 1).

#### **Discriminative Methods (e.g., Logistic Regression)**

* **Goal:** Directly model the decision boundary or the conditional probability $P(y|x)$.
* **Mechanism:** Focuses on distinguishing classes. It learns a function that maps inputs $x$ directly to class labels $y$.
* **Limitation:** It does not model the underlying distribution of the data itself.

#### **Generative Methods**

* **Goal:** Model the joint probability, i.e., the class-conditional density $P(x|y)$ together with the class prior $P(y)$.
* **Mechanism:** It learns "how the data is generated" for each class.
* **Classification:** To classify a new point, it uses **Bayes' Rule** to invert the probabilities:
  $$P(y|x) = \frac{P(x|y)P(y)}{P(x)}$$
* **Advantage:** If you know the generative model, you can solve the classification problem *and* generate new data samples.

---

### 2. Probability Theory Review

To understand generative methods, a strong foundation in probability is required.

#### **Random Variables**

* **Definition:** A random variable is technically a **function** (mapping) that assigns a real number to an outcome ($\omega$) in the sample space $\Omega$.
* **Example:** Tossing a coin 4 times. An outcome might be "HHTH", and the random variable $X(\omega)$ could be "number of heads" (which here equals 3).

#### **Probability vs. Probability Density Function (PDF)**

The lecture emphasizes distinguishing between discrete probability ($P$) and continuous density ($p$).

* **Discrete Probability ($P$):** Defined as a ratio of cardinalities (counts) or of set areas (as in Venn diagrams).
* **Probability Density Function ($p$):** Used for continuous variables.
  * **Properties:** $p(x) \ge 0$ for all $x$, and $\int p(x)\,dx = 1$.
  * **Relationship:** The probability of $x$ falling within a range is the **integral** (area under the curve) of the PDF over that range. The probability of any single point, $P(x = x_0)$, is 0.

#### **Key Statistics**

* **Expectation ($E[x]$):** The mean or weighted average of a random variable.
  $$E[x] = \int x\, p(x)\, dx$$
* **Covariance:** Measures how the components of the data spread and co-vary around the mean. For vector-valued $x$, this results in a covariance matrix.
  $$\mathrm{Cov}[x] = E[(x - \mu)(x - \mu)^T]$$

---

### 3. The Trinity of Distributions: Joint, Conditional, and Marginal

Understanding the relationship between these three is crucial for probabilistic modeling.

#### **Joint PDF ($P(x_1, x_2)$)**

* This represents the probability of $x_1$ and $x_2$ occurring together.
* **Importance:** If you know the joint PDF, you know *everything* about the system. You can derive all other probabilities (marginal, conditional) from it.

#### **Conditional PDF ($P(x_1 | x_2)$)**

* Represents the probability of $x_1$ given that $x_2$ is fixed to a specific value.
* Visually, this is like taking a "slice" of the joint distribution's 3D surface at $x_2 = a$.

#### **Marginal PDF ($P(x_1)$)**

* Represents the probability of $x_1$ regardless of $x_2$.
* **Calculation:** You "marginalize out" (integrate or sum over) the other variables, as illustrated in the sketch below.
  * Continuous: $P(x_1) = \int P(x_1, x_2)\, dx_2$.
  * Discrete: Summing rows or columns in a probability table.
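As a minimal illustrative sketch (not from the lecture itself), the snippet below builds a small, made-up discrete joint table with NumPy and recovers marginals, a conditional, and a Bayes' rule identity from it; all names (`joint`, `p_x1`, ...) are placeholders chosen here.

```python
import numpy as np

# Hypothetical 2x3 joint table P(x1, x2); rows index x1, columns index x2.
# The entries are made up for illustration and sum to 1.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
assert np.isclose(joint.sum(), 1.0)

# Marginals: "marginalize out" the other variable by summing over it.
p_x1 = joint.sum(axis=1)   # P(x1), shape (2,)
p_x2 = joint.sum(axis=0)   # P(x2), shape (3,)

# Conditional: fix x2 = 1 (a "slice" of the joint table), then renormalize.
p_x1_given_x2 = joint[:, 1] / p_x2[1]   # P(x1 | x2 = 1)

# Bayes' rule check: P(x1 | x2) P(x2) = P(x2 | x1) P(x1) = P(x1, x2).
p_x2_given_x1 = joint / p_x1[:, None]   # P(x2 | x1), row-wise
assert np.allclose(p_x1_given_x2 * p_x2[1],
                   p_x2_given_x1[:, 1] * p_x1)

print("P(x1):", p_x1)
print("P(x2):", p_x2)
print("P(x1 | x2=1):", p_x1_given_x2)
```

The same slicing-and-renormalizing view carries over to continuous densities, with sums replaced by integrals.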
---

### 4. Probabilistic Inference

**Inference** is defined as calculating a desired probability (e.g., a prediction) starting from the joint probability function, using rules such as Bayes' theorem and marginalization.

#### **Handling Missing Data**

A major practical benefit of generative models (joint PDF modeling) over discriminative models (like logistic regression) is robust handling of missing data.

* **Scenario:** You have a model predicting disease ($y$) based on Age ($x_1$), Blood Pressure ($x_2$), and Oxygen ($x_3$).
* **Problem:** A patient arrives, but you cannot measure Age ($x_1$). A discriminative model might fail or require imputation (guessing average values).
* **Probabilistic Solution:** You integrate (marginalize) out the missing variable $x_1$ from the joint distribution to get the probability based only on the observed data:
  $$P(y \mid x_2, x_3) = \frac{\int p(x_1, x_2, x_3, y)\, dx_1}{P(x_2, x_3)}$$

---

### 5. The Gaussian Distribution

The lecture concludes with a review of the Gaussian (Normal) distribution, described as the most important distribution in AI/ML.

* **Univariate Gaussian:** Defined by mean $\mu$ and variance $\sigma^2$.
* **Multivariate Gaussian:** Defined for a vector $x \in \mathbb{R}^D$:
  $$p(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
* **Parameters:**
  * $\mu$: Mean vector ($D$-dimensional).
  * $\Sigma$: Covariance matrix ($D \times D$). It must be **symmetric** and **positive definite**.
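As a small sketch of the multivariate Gaussian formula above (parameters below are made up for illustration), the density can be evaluated directly in NumPy and cross-checked against SciPy's reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density p(x) from the formula above."""
    D = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad)

# Made-up 2-D example; Sigma is symmetric and positive definite.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.5])

print(gaussian_pdf(x, mu, Sigma))                      # ~0.0904
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should match
```

Using `np.linalg.solve` instead of explicitly inverting $\Sigma$ is the usual numerically safer choice for the quadratic form.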