# Large Margin Classifiers and Optimization

**Date:** 2025.10.27

**Topic:** Large Margin Classifiers, Optimization, Margin Definition

---

### 1. Introduction to Robust Classification

The lecture begins by shifting focus from generative methods to discriminative methods, specifically within a **linearly separable setting**.

* **Problem Setting:** The goal is to classify data that can be perfectly separated by a linear boundary (hyperplane).
* **Robustness:** While infinitely many linear classifiers may separate the data, the objective is to find the "best" one. The best classifier is defined as the one that is most **robust**, meaning it generalizes well to new test data and handles potential outliers effectively.
* **Intuition:** A robust classifier places the decision boundary in the middle of the gap between the classes, maximizing the distance to the nearest data points.

### 2. Defining the Margin

The concept of the **margin** is introduced to mathematically define robustness.

* **Definition:** The margin is the distance between the decision hyperplane and the closest data points.
* **Hyperplane Equation:** The decision boundary is defined as $w^T x - b = 0$.
* **Support Lines:** To define the margin, we establish two parallel lines passing through the closest data points (a quick numerical check follows this list):
    * $w^T x - b = 1$ (for class $+1$).
    * $w^T x - b = -1$ (for class $-1$).
    * The region between these lines contains no data points.

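As a quick numerical check of these definitions (not part of the lecture itself), the sketch below evaluates $w^T x - b$ for a few hand-picked 2D points under an assumed weight vector $w$ and offset $b$: points on the support lines return exactly $\pm 1$, and points outside the margin return values of larger magnitude.

```python
import numpy as np

# Assumed toy parameters (illustrative only, not from the lecture):
# decision boundary w^T x - b = 0 with w = (1, 1), b = 0.
w = np.array([1.0, 1.0])
b = 0.0

# Hand-picked points: the first two lie exactly on the support lines
# w^T x - b = +1 and w^T x - b = -1; the others lie outside the margin.
points = np.array([
    [0.5, 0.5],    # on the +1 support line
    [-0.5, -0.5],  # on the -1 support line
    [2.0, 1.0],    # class +1, outside the margin
    [-1.0, -2.0],  # class -1, outside the margin
])

for x in points:
    score = w @ x - b                   # signed decision value w^T x - b
    side = "+1" if score >= 0 else "-1"
    print(f"x = {x}, w^T x - b = {score:+.2f}, predicted class {side}")
```
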
### 3. Calculating the Margin Width

The lecture derives the mathematical expression for the margin width using vector projection.

* **Vector Projection:** The margin is calculated by projecting the vector connecting a point on the boundary ($x_0$) to a support vector ($x$) onto the normal vector $w$.
* **Derivation:**
    * The distance is the projection of the vector $(x - x_0)$ onto the unit normal vector $\frac{w}{||w||}$.
    * Using the constraints $w^T x - b = 1$ and $w^T x_0 - b = 0$, this projection is $\frac{w^T(x - x_0)}{||w||} = \frac{(w^T x - b) - (w^T x_0 - b)}{||w||} = \frac{1 - 0}{||w||}$, so the margin distance is $\frac{1}{||w||}$.
* **Conclusion:** Maximizing the margin is equivalent to **minimizing the norm of the weight vector $||w||$** (a numerical verification is sketched below).

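To make the $\frac{1}{||w||}$ result concrete, here is a small verification sketch (my own example, with an arbitrarily assumed $w$ and $b$): it projects $(x - x_0)$ onto the unit normal and compares the result with $1/||w||$.

```python
import numpy as np

# Assumed parameters for illustration (not from the lecture).
w = np.array([3.0, 4.0])   # normal vector, ||w|| = 5
b = 2.0

# x0: a point on the decision boundary, w^T x0 - b = 0.
x0 = np.array([2.0 / 3.0, 0.0])   # 3*(2/3) + 4*0 = 2 = b
# x: a point on the support line, w^T x - b = 1.
x = np.array([1.0, 0.0])          # 3*1 + 4*0 - 2 = 1

# Projection of (x - x0) onto the unit normal w / ||w||.
projection = (x - x0) @ w / np.linalg.norm(w)

print(projection)                 # 0.2
print(1.0 / np.linalg.norm(w))    # 0.2  -> matches 1 / ||w||
```
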
### 4. The Optimization Problem

The task of finding the best classifier is formulated as a constrained optimization problem.

* **Objective Function:**

    $$\min_{w, b} ||w||^2$$

    (Note: Minimizing $||w||$ and minimizing $||w||^2$ have the same minimizer; the squared form is used because it is smooth and quadratic, which makes the optimization easier.)

* **Constraints:** All data points must be correctly classified and lie outside the margin. This is formalized as:
    * $w^T x_i - b \ge 1$ for $y_i = 1$.
    * $w^T x_i - b \le -1$ for $y_i = -1$.
    * **Combined Constraint:** $y_i (w^T x_i - b) \ge 1$ for all $i$ (a numerical sketch of this problem follows).

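The following sketch (not from the lecture) solves this constrained problem numerically on a tiny linearly separable dataset using `scipy.optimize.minimize` with the combined constraint $y_i(w^T x_i - b) \ge 1$; the dataset and starting point are my own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2D dataset (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Parameter vector theta = (w1, w2, b).
def objective(theta):
    w = theta[:2]
    return w @ w                      # ||w||^2

def margin_constraints(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w - b) - 1        # must be >= 0 for every point

result = minimize(
    objective,
    x0=np.array([1.0, 1.0, 0.0]),     # arbitrary feasible starting point
    constraints=[{"type": "ineq", "fun": margin_constraints}],
    method="SLSQP",
)

w_opt, b_opt = result.x[:2], result.x[2]
print("w =", w_opt, "b =", b_opt)
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w_opt))
```
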
### 5. Optimization with Constraints (Lagrange Multipliers)

The lecture explains how to solve this optimization problem using **Lagrange Multipliers**, starting with a general example.

* **Problem Setup:** Minimize an objective function $L(x)$ subject to a constraint $g(x) \ge 0$.
* **Lagrangian:** A new objective function is defined by combining the original loss and the constraint with a multiplier $\lambda$:

    $$L'(x) = L(x) - \lambda g(x)$$

    (Note: The transcript discusses combining the two components; the sign convention depends on whether the problem is posed as a maximization or a minimization.)

* **Solution Cases:** The solution involves setting the derivative $\frac{dL'}{dx} = 0$ and considering two cases (summarized compactly after this list):
    1. **Feasible Region ($\lambda = 0$):** The unconstrained minimum of $L(x)$ already satisfies the constraint ($g(x) > 0$). In this case, the constraint is inactive.
    2. **Boundary Case ($\lambda > 0$):** The unconstrained minimum violates the constraint, so the optimal solution lies *on* the boundary, where $g(x) = 0$.

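These two cases can be written as one compact set of conditions (my restatement, not notation used in the transcript):

$$\frac{dL'}{dx} = 0, \qquad \lambda \ge 0, \qquad g(x) \ge 0, \qquad \lambda \, g(x) = 0$$

The final condition forces either $\lambda = 0$ (Case 1, the constraint is inactive) or $g(x) = 0$ (Case 2, the solution lies on the boundary); in the optimization literature this pair of cases is known as complementary slackness.
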
### 6. Example: Constrained Minimization

A specific mathematical example is worked through to demonstrate the method.

* **Objective:** Minimize $x_1^2 + x_2^2$ (the squared distance from the origin).
* **Constraint:** $x_2 - x_1^2 - 1 \ge 0$ (the point must lie above a parabola).
* **Solving** (a symbolic verification is sketched below):
    * The Lagrangian is set up: $L' = x_1^2 + x_2^2 - \lambda(x_2 - x_1^2 - 1)$.
    * **Case 1 ($\lambda = 0$):** Leads to $x_1 = 0, x_2 = 0$, which violates the constraint ($0 - 0 - 1 = -1 \not\ge 0$). This solution is discarded.
    * **Case 2 (Boundary, $\lambda \ne 0$):** The solution must lie on $x_2 - x_1^2 - 1 = 0$. Solving the resulting system of equations yields the valid minimum at $x_1 = 0, x_2 = 1$.

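As a check on Case 2 (not part of the lecture), the sketch below uses `sympy` to solve the stationarity equations of the Lagrangian together with the active constraint; the real solution it recovers is $x_1 = 0, x_2 = 1$ (with $\lambda = 2$).

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lambda", real=True)

# Lagrangian from the worked example: L' = x1^2 + x2^2 - lambda * (x2 - x1^2 - 1)
L = x1**2 + x2**2 - lam * (x2 - x1**2 - 1)

# Case 2: the constraint is active, so solve
#   dL'/dx1 = 0,  dL'/dx2 = 0,  x2 - x1^2 - 1 = 0
equations = [sp.diff(L, x1), sp.diff(L, x2), x2 - x1**2 - 1]
solutions = sp.solve(equations, [x1, x2, lam], dict=True)

print(solutions)   # real solution: x1 = 0, x2 = 1, lambda = 2
```
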
### 7. Next Steps: Support Vector Machines

The lecture concludes by linking this optimization framework back to the classifier.

* **Support Vectors:** The data points that lie exactly on the margin boundary ($g(x) = 0$) are called "support vectors" (a small identification sketch follows).
* **Future Topic:** This foundation leads into the **Support Vector Machine (SVM)** algorithm, which will be discussed in the next session to handle data that is not linearly separable.

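As a closing illustration (my own addition, reusing the assumed dataset from the Section 4 sketch), support vectors can be identified numerically as the training points whose margin constraint is active, i.e. $y_i(w^T x_i - b) \approx 1$:

```python
import numpy as np

# Assumed max-margin solution for the Section 4 toy dataset:
# boundary x1 + x2 = 0, scaled so the closest points satisfy w^T x - b = +/- 1.
w = np.array([0.25, 0.25])
b = 0.0

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

margins = y * (X @ w - b)                  # y_i (w^T x_i - b), >= 1 for all points
support_mask = np.isclose(margins, 1.0)    # active constraints -> support vectors

print("margins:", margins)
print("support vectors:\n", X[support_mask])   # expected: [2, 2] and [-2, -2]
```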