# Large Margin Classifiers and Optimization

**Date:** 2025.10.27

**Topic:** Large Margin Classifiers, Optimization, Margin Definition

---

### 1. Introduction to Robust Classification

The lecture begins by shifting focus from generative methods to discriminative methods, specifically within a **linearly separable setting**.

* **Problem Setting:** The goal is to classify data that can be perfectly separated by a linear boundary (hyperplane).
* **Robustness:** While infinitely many linear classifiers may separate the data, the objective is to find the "best" one. The best classifier is defined as the one that is most **robust**, meaning it generalizes well to new test data and handles potential outliers effectively.
* **Intuition:** A robust classifier places the decision boundary in the middle of the gap between the classes, maximizing the distance to the nearest data points.

### 2. Defining the Margin

The concept of the **margin** is introduced to mathematically define robustness.

* **Definition:** The margin is the distance between the decision hyperplane and the closest data points.
* **Hyperplane Equation:** The decision boundary is defined as $w^T x - b = 0$.
* **Support Lines:** To define the margin, we establish two parallel lines passing through the closest data points (a quick numerical check follows this list):
    * $w^T x - b = 1$ (for class $+1$).
    * $w^T x - b = -1$ (for class $-1$).
    * The region between these lines contains no data points.

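As a quick numerical check of these definitions (not part of the lecture itself), the sketch below evaluates $w^T x - b$ for a few hand-picked 2D points under an assumed weight vector $w$ and offset $b$: points on the support lines return exactly $\pm 1$, and points outside the margin return values of larger magnitude.

```python
import numpy as np

# Assumed toy parameters (illustrative only, not from the lecture):
# decision boundary w^T x - b = 0 with w = (1, 1), b = 0.
w = np.array([1.0, 1.0])
b = 0.0

# Hand-picked points: the first two lie exactly on the support lines
# w^T x - b = +1 and w^T x - b = -1; the others lie outside the margin.
points = np.array([
    [0.5, 0.5],    # on the +1 support line
    [-0.5, -0.5],  # on the -1 support line
    [2.0, 1.0],    # class +1, outside the margin
    [-1.0, -2.0],  # class -1, outside the margin
])

for x in points:
    score = w @ x - b                   # signed decision value w^T x - b
    side = "+1" if score >= 0 else "-1"
    print(f"x = {x}, w^T x - b = {score:+.2f}, predicted class {side}")
```
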
### 3. Calculating the Margin Width

The lecture derives the mathematical expression for the margin width using vector projection.

* **Vector Projection:** The margin is calculated by projecting the vector connecting a point on the boundary ($x_0$) to a support vector ($x$) onto the normal vector $w$.
* **Derivation:**
    * The distance is the projection of the vector $(x - x_0)$ onto the unit normal vector $\frac{w}{||w||}$.
    * Using the constraints $w^T x - b = 1$ and $w^T x_0 - b = 0$, this projection is $\frac{w^T(x - x_0)}{||w||} = \frac{(w^T x - b) - (w^T x_0 - b)}{||w||} = \frac{1 - 0}{||w||}$, so the margin distance is $\frac{1}{||w||}$.
* **Conclusion:** Maximizing the margin is equivalent to **minimizing the norm of the weight vector $||w||$** (a numerical verification is sketched below).

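To make the $\frac{1}{||w||}$ result concrete, here is a small verification sketch (my own example, with an arbitrarily assumed $w$ and $b$): it projects $(x - x_0)$ onto the unit normal and compares the result with $1/||w||$.

```python
import numpy as np

# Assumed parameters for illustration (not from the lecture).
w = np.array([3.0, 4.0])   # normal vector, ||w|| = 5
b = 2.0

# x0: a point on the decision boundary, w^T x0 - b = 0.
x0 = np.array([2.0 / 3.0, 0.0])   # 3*(2/3) + 4*0 = 2 = b
# x: a point on the support line, w^T x - b = 1.
x = np.array([1.0, 0.0])          # 3*1 + 4*0 - 2 = 1

# Projection of (x - x0) onto the unit normal w / ||w||.
projection = (x - x0) @ w / np.linalg.norm(w)

print(projection)                 # 0.2
print(1.0 / np.linalg.norm(w))    # 0.2  -> matches 1 / ||w||
```
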
### 4. The Optimization Problem

The task of finding the best classifier is formulated as a constrained optimization problem.

* **Objective Function:**

    $$\min_{w, b} ||w||^2$$

    (Note: Minimizing $||w||$ and minimizing $||w||^2$ have the same minimizer; the squared form is used because it is smooth and quadratic, which makes the optimization easier.)

* **Constraints:** All data points must be correctly classified and lie outside the margin. This is formalized as:
    * $w^T x_i - b \ge 1$ for $y_i = 1$.
    * $w^T x_i - b \le -1$ for $y_i = -1$.
    * **Combined Constraint:** $y_i (w^T x_i - b) \ge 1$ for all $i$ (a numerical sketch of this problem follows).

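The following sketch (not from the lecture) solves this constrained problem numerically on a tiny linearly separable dataset using `scipy.optimize.minimize` with the combined constraint $y_i(w^T x_i - b) \ge 1$; the dataset and starting point are my own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2D dataset (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Parameter vector theta = (w1, w2, b).
def objective(theta):
    w = theta[:2]
    return w @ w                      # ||w||^2

def margin_constraints(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w - b) - 1        # must be >= 0 for every point

result = minimize(
    objective,
    x0=np.array([1.0, 1.0, 0.0]),     # arbitrary feasible starting point
    constraints=[{"type": "ineq", "fun": margin_constraints}],
    method="SLSQP",
)

w_opt, b_opt = result.x[:2], result.x[2]
print("w =", w_opt, "b =", b_opt)
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w_opt))
```
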
### 5. Optimization with Constraints (Lagrange Multipliers)

The lecture explains how to solve this optimization problem using **Lagrange Multipliers**, starting with a general example.

* **Problem Setup:** Minimize an objective function $L(x)$ subject to a constraint $g(x) \ge 0$.
* **Lagrangian:** A new objective function is defined by combining the original loss and the constraint with a multiplier $\lambda$:

    $$L'(x) = L(x) - \lambda g(x)$$

    (Note: The transcript discusses combining the two components; the sign convention depends on whether the problem is posed as a maximization or a minimization.)

* **Solution Cases:** The solution involves setting the derivative $\frac{dL'}{dx} = 0$ and considering two cases (summarized compactly after this list):
    1. **Feasible Region ($\lambda = 0$):** The unconstrained minimum of $L(x)$ already satisfies the constraint ($g(x) > 0$). In this case, the constraint is inactive.
    2. **Boundary Case ($\lambda > 0$):** The unconstrained minimum violates the constraint, so the optimal solution lies *on* the boundary, where $g(x) = 0$.

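These two cases can be written as one compact set of conditions (my restatement, not notation used in the transcript):

$$\frac{dL'}{dx} = 0, \qquad \lambda \ge 0, \qquad g(x) \ge 0, \qquad \lambda \, g(x) = 0$$

The final condition forces either $\lambda = 0$ (Case 1, the constraint is inactive) or $g(x) = 0$ (Case 2, the solution lies on the boundary); in the optimization literature this pair of cases is known as complementary slackness.
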
### 6. Example: Constrained Minimization

A specific mathematical example is worked through to demonstrate the method.

* **Objective:** Minimize $x_1^2 + x_2^2$ (the squared distance from the origin).
* **Constraint:** $x_2 - x_1^2 - 1 \ge 0$ (the point must lie above a parabola).
* **Solving** (a symbolic verification is sketched below):
    * The Lagrangian is set up: $L' = x_1^2 + x_2^2 - \lambda(x_2 - x_1^2 - 1)$.
    * **Case 1 ($\lambda = 0$):** Leads to $x_1 = 0, x_2 = 0$, which violates the constraint ($0 - 0 - 1 = -1 \not\ge 0$). This solution is discarded.
    * **Case 2 (Boundary, $\lambda \ne 0$):** The solution must lie on $x_2 - x_1^2 - 1 = 0$. Solving the resulting system of equations yields the valid minimum at $x_1 = 0, x_2 = 1$.

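As a check on Case 2 (not part of the lecture), the sketch below uses `sympy` to solve the stationarity equations of the Lagrangian together with the active constraint; the real solution it recovers is $x_1 = 0, x_2 = 1$ (with $\lambda = 2$).

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lambda", real=True)

# Lagrangian from the worked example: L' = x1^2 + x2^2 - lambda * (x2 - x1^2 - 1)
L = x1**2 + x2**2 - lam * (x2 - x1**2 - 1)

# Case 2: the constraint is active, so solve
#   dL'/dx1 = 0,  dL'/dx2 = 0,  x2 - x1^2 - 1 = 0
equations = [sp.diff(L, x1), sp.diff(L, x2), x2 - x1**2 - 1]
solutions = sp.solve(equations, [x1, x2, lam], dict=True)

print(solutions)   # real solution: x1 = 0, x2 = 1, lambda = 2
```
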
### 7. Next Steps: Support Vector Machines

The lecture concludes by linking this optimization framework back to the classifier.

* **Support Vectors:** The data points that lie exactly on the margin boundary ($g(x) = 0$) are called "support vectors" (a small identification sketch follows).
* **Future Topic:** This foundation leads into the **Support Vector Machine (SVM)** algorithm, which will be discussed in the next session to handle data that is not linearly separable.

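As a closing illustration (my own addition, reusing the assumed dataset from the Section 4 sketch), support vectors can be identified numerically as the training points whose margin constraint is active, i.e. $y_i(w^T x_i - b) \approx 1$:

```python
import numpy as np

# Assumed max-margin solution for the Section 4 toy dataset:
# boundary x1 + x2 = 0, scaled so the closest points satisfy w^T x - b = +/- 1.
w = np.array([0.25, 0.25])
b = 0.0

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

margins = y * (X @ w - b)                  # y_i (w^T x_i - b), >= 1 for all points
support_mask = np.isclose(margins, 1.0)    # active constraints -> support vectors

print("margins:", margins)
print("support vectors:\n", X[support_mask])   # expected: [2, 2] and [-2, -2]
```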