LR两种表达
表达一
假设样本集$\brace{(\x^{(i)},y^{(i)})}_{i=1}^m$ ,特征$\x\in \R^n$,类别$y\in\brace{0,1}$,则LR模型:
\[\begin{aligned} p(y=1|\x,\w) &=\sigma(\w\tr\x) &=\frac{\exp(\w\tr\x)}{1+\exp(\w\tr\x)} \\ p(y=0|\x,\w) &=1-\sigma(\w\tr\x) &=\frac1{1+\exp(\w\tr\x)} \end{aligned}\]所谓“几率”(odds),反映了$x$作为正例的相对可能性。对几率取对数则得到“对数几率”(log odds,即logit),是一个线性函数: \(\mathrm{logit}(p) = \log\frac{p}{1-p}=\w\tr\x\)
设 $p(y=1\vert\x,\w)=\pi(\x), \quad p(y=0\vert\x,\w)=1-\pi(x)$,则MLE:
\[\max_\w \prod_{i=1}^m \left[\pi(\x^{(i)})\right]^{y^{(i)}} \left[1-\pi(\x^{(i)})\right]^{1-y^{(i)}}\] \[\begin{aligned} \ell(w) &= \sum_{i=1}^m\left[y^{(i)}\log \pi(\x^{(i)}) + (1-y^{(i)})\log\left(1-\pi(\x^{(i)})\right)\right] \\ &=\sum_{i=1}^m\left[y^{(i)}\log\frac{\pi(\x^{(i)})}{1-\pi(\x^{(i)})} + \log\left(1-\pi(\x^{(i)})\right)\right] \\ &=\sum_{i=1}^m\left[y^{(i)}(\w\tr\x^{(i)}) - \log\left(1+\exp(\w\tr\x^{(i)})\right)\right] \end{aligned}\]单个样本求梯度:
\[\nabla_\w\ell =\left(y-\sigma(\w\tr \x)\right)\x\]CS229第一章P18是这种写法: http://cs229.stanford.edu/notes/cs229-notes1.pdf
\[\nabla_\w\ell=\left(y-h_\w(\x)\right)\x\]表达二
假设样本集$\brace{(\x^{(i)},y^{(i)})}_{i=1}^m$ ,特征$\x\in \R^n$,类别$y\in\brace{-1,1}$,则LR模型:
\[p(y=\pm1 | \x, \w) = \sigma(y\w\tr\x)=\frac1{1+\exp(-y\w\tr\x)}\]MLE:
\[\max_\w \prod_{i=1}^m p(y^{(i)}=\pm1)\] \[\ell(\w) = -\sum_{i=1}^m\log\left(1+\exp(-y^{(i)}\w\tr\x^{(i)})\right)\]单个样本求梯度:
\[\nabla_\w\ell=y\left(\sigma(y\w\tr\x)-1\right)\x\]Spark里的一个LR demo(无正则):
val data = spark.textFile(...).map(readPoint).cache()
var w = Vector.random(D)
for (i <- 1 to ITERATIONS) {
val gradient = data.map(p =>
(1 / (1 + exp(-p.y*(w dot p.x))) - 1) * p.y * p.x
).reduce(_ + _)
w -= gradient
}
println("Final w: " + w)
参考
- 《统计学习方法》李航,第6章 LR与MaxEnt模型
- 证明最大熵模型和LR模型等价