A Detailed Derivation of the Sigmoid Function's Derivative
- In logistic regression, deriving the sigmoid function's derivative is a key step: it is what lets gradient descent compute gradients efficiently.
1. Definition of the Sigmoid Function
First, recall the definition of the sigmoid function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
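As a quick numeric sanity check on this definition (a minimal sketch; the helper name is our own): $g$ saturates toward 0 for large negative $z$, passes through $0.5$ at $z = 0$, and saturates toward 1 for large positive $z$.

import numpy as np

# Evaluate g(z) = 1 / (1 + e^{-z}) at a few representative points.
def g(z):
    return 1 / (1 + np.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(f"g({z:+.1f}) = {g(z):.4f}")
# g(-5.0) = 0.0067, g(+0.0) = 0.5000, g(+5.0) = 0.9933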
2. Derivation of the Derivative
- Start from the sigmoid function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
- Let $u = 1 + e^{-z}$, so that $g(z) = u^{-1}$.
- Apply the chain rule:
$$\frac{dg}{dz} = \frac{dg}{du} \cdot \frac{du}{dz} = -u^{-2} \cdot (-e^{-z}) = \frac{e^{-z}}{(1 + e^{-z})^2}$$
- Now express the result in terms of $g(z)$, using
$$\frac{e^{-z}}{1 + e^{-z}} = 1 - \frac{1}{1 + e^{-z}} = 1 - g(z)$$
- Therefore:
$$g'(z) = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z) \cdot (1 - g(z))$$
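This identity can be checked numerically against a central finite difference (a minimal sketch; the helper names are our own):

import numpy as np

def g(z):
    return 1 / (1 + np.exp(-z))

# Central finite difference vs. the closed form g'(z) = g(z) * (1 - g(z)).
z = np.linspace(-5, 5, 11)
h = 1e-6
analytic = g(z) * (1 - g(z))
numeric = (g(z + h) - g(z - h)) / (2 * h)
print(np.max(np.abs(analytic - numeric)))  # tiny (on the order of 1e-10), confirming the identity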
3. Code Implementation
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid: g(z) = 1 / (1 + e^{-z})
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Derivative in the compact form derived above: g'(z) = g(z) * (1 - g(z))
def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.linspace(-10, 10, 100)
plt.figure(figsize=(10, 6))
plt.plot(z, sigmoid(z), label="Sigmoid function")
plt.plot(z, sigmoid_derivative(z), label="Sigmoid derivative")
plt.xlabel("z")
plt.ylabel("g(z)")
plt.title("Sigmoid Function and its Derivative")
plt.legend()
plt.grid(True)
plt.show()
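One practical caveat about the implementation above: for large negative inputs (e.g. $z = -1000$), np.exp(-z) overflows and NumPy emits a RuntimeWarning, even though the returned value (0.0) happens to be correct. A numerically stable variant, sketched below under our own naming, branches on the sign of $z$; scipy.special.expit is a ready-made alternative.

import numpy as np

def stable_sigmoid(z):
    # For z >= 0, e^{-z} <= 1, so the textbook formula cannot overflow.
    # For z < 0, rewrite g(z) = e^{z} / (1 + e^{z}) so the exponent stays <= 0.
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1 + exp_z)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]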
4. Properties of the Derivative
- Maximum: the derivative attains its maximum value of $0.25$ when $g(z) = 0.5$ (a short verification follows this list).
- Symmetry: the derivative peaks at $z = 0$ and decays rapidly as $|z|$ grows.
- Positivity: the derivative is strictly positive, since $0 < g(z) < 1$.
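The maximum can be read off directly from the compact form: treating $g'(z) = g(1 - g)$ as a function of $g \in (0, 1)$,
$$\frac{d}{dg}\bigl[g(1 - g)\bigr] = 1 - 2g = 0 \;\Rightarrow\; g = \frac{1}{2} \;\Rightarrow\; g(1 - g) = \frac{1}{4},$$
and since $g(0) = 0.5$, this maximum is reached at $z = 0$.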
5. Why the Derivative Form Matters
- In logistic regression's gradient descent, we need the derivative of the loss function with respect to the parameters. Because the hypothesis involves the sigmoid function, the compact derivative form makes this computation remarkably clean:
$$\frac{\partial}{\partial \theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
- where $h_\theta(x) = g(\theta^T x)$. Without the identity $g' = g(1 - g)$, this gradient would be far messier to derive.
- To derive the partial derivative with respect to $\theta_j$, start from the cross-entropy loss $J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]$ and differentiate:
$$\begin{align*} \frac{\partial}{\partial \theta_j} J(\theta) &= -\frac{1}{m}\sum_{i=1}^m \left(y^{(i)} \frac{1}{h_\theta(x^{(i)})} - (1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\right) \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)}) \\ &= -\frac{1}{m}\sum_{i=1}^m \left(y^{(i)} \frac{1}{g(\theta^T x^{(i)})} - (1-y^{(i)})\frac{1}{1-g(\theta^T x^{(i)})}\right) g(\theta^T x^{(i)})\left(1-g(\theta^T x^{(i)})\right) x_j^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^m \left(y^{(i)}\left(1-g(\theta^T x^{(i)})\right) - (1-y^{(i)})\,g(\theta^T x^{(i)})\right) x_j^{(i)} \\ &= \frac{1}{m}\sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \end{align*}$$
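To illustrate how this compact gradient slots into training, here is a minimal batch gradient-descent sketch on synthetic data (all names, the learning rate, the iteration count, and the data-generation scheme are our own illustrative choices, not from the original):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Synthetic data: m samples, a bias column plus 2 features, labels drawn
# from a known parameter vector so we can eyeball the recovered estimate.
rng = np.random.default_rng(0)
m = 200
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])
true_theta = np.array([-0.5, 2.0, -1.0])
y = (rng.uniform(size=m) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
alpha = 0.1  # learning rate

for _ in range(2000):
    h = sigmoid(X @ theta)        # h_theta(x^(i)) for every sample at once
    grad = X.T @ (h - y) / m      # (1/m) * sum_i (h - y^(i)) x^(i): the derived gradient
    theta -= alpha * grad

print(theta)  # roughly recovers true_theta, up to sampling noise

Note how the entire derivation collapses into the single line computing grad: the sigmoid's derivative never has to appear explicitly, because the $g(1-g)$ factor cancels against the loss terms exactly as shown above.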