# Lecture Summary: Generative Methods & Probability Review

**Date:** 2025.11.06
**Topic:** Discriminative vs. Generative Models, Probability Theory, Probabilistic Inference, and Gaussian Distributions.

---

### 1. Classification Approaches: Discriminative vs. Generative

The lecture begins by distinguishing between two fundamental approaches to machine learning classification, specifically for binary problems (labels 0 or 1).

#### **Discriminative Methods (e.g., Logistic Regression)**

* **Goal:** Directly model the decision boundary or the conditional probability $P(y|x)$.
* **Mechanism:** Focuses on distinguishing classes. It learns a function that maps inputs $x$ directly to class labels $y$.
* **Limitation:** It does not model the underlying distribution of the data itself.

#### **Generative Methods**

* **Goal:** Model the joint probability, i.e., the class-conditional density $P(x|y)$ together with the class prior $P(y)$.
* **Mechanism:** It learns "how the data is generated" for each class.
* **Classification:** To classify a new point, it uses **Bayes' Rule** to invert the probabilities:
  $$P(y|x) = \frac{P(x|y)P(y)}{P(x)}$$
* **Advantage:** If you know the generative model, you can solve the classification problem *and* generate new data samples.

---

### 2. Probability Theory Review

To understand generative methods, a strong foundation in probability is required.

#### **Random Variables**

* **Definition:** A random variable is technically a **function** (mapping) that assigns a real number to an outcome ($\omega$) in the sample space $\Omega$.
* **Example:** Tossing a coin 4 times. An outcome might be "HHTH", and the random variable $X(\omega)$ could be "number of heads" (which here equals 3).

#### **Probability vs. Probability Density Function (PDF)**

The lecture emphasizes distinguishing between discrete probability ($P$) and continuous density ($p$).

* **Discrete Probability ($P$):** Defined as a ratio of cardinalities (counts) or of set areas (as in Venn diagrams).
* **Probability Density Function ($p$):** Used for continuous variables.
  * **Properties:** $p(x) \ge 0$ for all $x$, and $\int p(x)\,dx = 1$.
  * **Relationship:** The probability of $x$ falling within a range is the **integral** (area under the curve) of the PDF over that range. The probability of any single point, $P(x = x_0)$, is 0.

#### **Key Statistics**

* **Expectation ($E[x]$):** The mean or weighted average of a random variable.
  $$E[x] = \int x\, p(x)\, dx$$
* **Covariance:** Measures how the components of the data spread and co-vary around the mean. For vector-valued $x$, this results in a covariance matrix.
  $$\mathrm{Cov}[x] = E[(x - \mu)(x - \mu)^T]$$

---

### 3. The Trinity of Distributions: Joint, Conditional, and Marginal

Understanding the relationship between these three is crucial for probabilistic modeling.

#### **Joint PDF ($P(x_1, x_2)$)**

* This represents the probability of $x_1$ and $x_2$ occurring together.
* **Importance:** If you know the joint PDF, you know *everything* about the system. You can derive all other probabilities (marginal, conditional) from it.

#### **Conditional PDF ($P(x_1 | x_2)$)**

* Represents the probability of $x_1$ given that $x_2$ is fixed to a specific value.
* Visually, this is like taking a "slice" of the joint distribution's 3D surface at $x_2 = a$.

#### **Marginal PDF ($P(x_1)$)**

* Represents the probability of $x_1$ regardless of $x_2$.
* **Calculation:** You "marginalize out" (integrate or sum over) the other variables, as illustrated in the sketch below.
  * Continuous: $P(x_1) = \int P(x_1, x_2)\, dx_2$.
  * Discrete: Summing rows or columns in a probability table.
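As a minimal illustrative sketch (not from the lecture itself), the snippet below builds a small, made-up discrete joint table with NumPy and recovers marginals, a conditional, and a Bayes' rule identity from it; all names (`joint`, `p_x1`, ...) are placeholders chosen here.

```python
import numpy as np

# Hypothetical 2x3 joint table P(x1, x2); rows index x1, columns index x2.
# The entries are made up for illustration and sum to 1.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
assert np.isclose(joint.sum(), 1.0)

# Marginals: "marginalize out" the other variable by summing over it.
p_x1 = joint.sum(axis=1)   # P(x1), shape (2,)
p_x2 = joint.sum(axis=0)   # P(x2), shape (3,)

# Conditional: fix x2 = 1 (a "slice" of the joint table), then renormalize.
p_x1_given_x2 = joint[:, 1] / p_x2[1]   # P(x1 | x2 = 1)

# Bayes' rule check: P(x1 | x2) P(x2) = P(x2 | x1) P(x1) = P(x1, x2).
p_x2_given_x1 = joint / p_x1[:, None]   # P(x2 | x1), row-wise
assert np.allclose(p_x1_given_x2 * p_x2[1],
                   p_x2_given_x1[:, 1] * p_x1)

print("P(x1):", p_x1)
print("P(x2):", p_x2)
print("P(x1 | x2=1):", p_x1_given_x2)
```

The same slicing-and-renormalizing view carries over to continuous densities, with sums replaced by integrals.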
---

### 4. Probabilistic Inference

**Inference** is defined as calculating a desired probability (e.g., a prediction) starting from the joint probability function, using rules such as Bayes' theorem and marginalization.

#### **Handling Missing Data**

A major practical benefit of generative models (joint PDF modeling) over discriminative models (like logistic regression) is robust handling of missing data.

* **Scenario:** You have a model predicting disease ($y$) based on Age ($x_1$), Blood Pressure ($x_2$), and Oxygen ($x_3$).
* **Problem:** A patient arrives, but you cannot measure Age ($x_1$). A discriminative model might fail or require imputation (guessing average values).
* **Probabilistic Solution:** You integrate (marginalize) out the missing variable $x_1$ from the joint distribution to get the probability based only on the observed data:
  $$P(y \mid x_2, x_3) = \frac{\int p(x_1, x_2, x_3, y)\, dx_1}{P(x_2, x_3)}$$

---

### 5. The Gaussian Distribution

The lecture concludes with a review of the Gaussian (Normal) distribution, described as the most important distribution in AI/ML.

* **Univariate Gaussian:** Defined by mean $\mu$ and variance $\sigma^2$.
* **Multivariate Gaussian:** Defined for a vector $x \in \mathbb{R}^D$:
  $$p(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
* **Parameters:**
  * $\mu$: Mean vector ($D$-dimensional).
  * $\Sigma$: Covariance matrix ($D \times D$). It must be **symmetric** and **positive definite**.
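As a small sketch of the multivariate Gaussian formula above (parameters below are made up for illustration), the density can be evaluated directly in NumPy and cross-checked against SciPy's reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density p(x) from the formula above."""
    D = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad)

# Made-up 2-D example; Sigma is symmetric and positive definite.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.5])

print(gaussian_pdf(x, mu, Sigma))                      # ~0.0904
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should match
```

Using `np.linalg.solve` instead of explicitly inverting $\Sigma$ is the usual numerically safer choice for the quadratic form.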