Two Mathematical Formulations of Logistic Regression

\[\newcommand{\R}{\mathbb{R}} \newcommand{\w}{\mathbf{w}} \newcommand{\x}{\mathbf{x}} \newcommand{\tr}{^{\mathsf{T}}}\]

Logistic regression (LR) admits two equivalent formulations, depending on whether the labels are encoded as $\{0,1\}$ or $\{-1,1\}$.

Formulation 1

Given a training set $\{(\x^{(i)},y^{(i)})\}_{i=1}^m$ with features $\x\in \R^n$ and labels $y\in\{0,1\}$, the LR model is:

\[\begin{aligned} p(y=1\mid\x,\w) &=\sigma(\w\tr\x) =\frac{\exp(\w\tr\x)}{1+\exp(\w\tr\x)} \\ p(y=0\mid\x,\w) &=1-\sigma(\w\tr\x) =\frac1{1+\exp(\w\tr\x)} \end{aligned}\]

The odds, $\frac{p}{1-p}$, measure the relative likelihood that $\x$ is a positive example. Taking the logarithm of the odds yields the log-odds (logit), which is a linear function of $\x$: \(\mathrm{logit}(p) = \log\frac{p}{1-p}=\w\tr\x\)
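Inverting the logit recovers the sigmoid form of the model above:

\[\log\frac{p}{1-p}=\w\tr\x \;\Longrightarrow\; \frac{p}{1-p}=\exp(\w\tr\x) \;\Longrightarrow\; p=\frac{\exp(\w\tr\x)}{1+\exp(\w\tr\x)}=\sigma(\w\tr\x)\]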

Let $p(y=1\mid\x,\w)=\pi(\x)$ and $p(y=0\mid\x,\w)=1-\pi(\x)$. The MLE objective is:

\[\max_\w \prod_{i=1}^m \left[\pi(\x^{(i)})\right]^{y^{(i)}} \left[1-\pi(\x^{(i)})\right]^{1-y^{(i)}}\] \[\begin{aligned} \ell(\w) &= \sum_{i=1}^m\left[y^{(i)}\log \pi(\x^{(i)}) + (1-y^{(i)})\log\left(1-\pi(\x^{(i)})\right)\right] \\ &=\sum_{i=1}^m\left[y^{(i)}\log\frac{\pi(\x^{(i)})}{1-\pi(\x^{(i)})} + \log\left(1-\pi(\x^{(i)})\right)\right] \\ &=\sum_{i=1}^m\left[y^{(i)}(\w\tr\x^{(i)}) - \log\left(1+\exp(\w\tr\x^{(i)})\right)\right] \end{aligned}\]
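As a sanity check, here is a minimal Scala sketch of this log-likelihood in its final form; `logLikelihood` and the plain `Array[Double]` representation are illustrative choices for this sketch, not part of the Spark demo below.

import scala.math.{exp, log}

// ell(w) = sum_i [ y_i * (w^T x_i) - log(1 + exp(w^T x_i)) ], with y_i in {0, 1}
def logLikelihood(w: Array[Double],
                  xs: Seq[Array[Double]],
                  ys: Seq[Int]): Double =
  xs.zip(ys).map { case (x, y) =>
    val z = w.zip(x).map { case (wi, xi) => wi * xi }.sum  // w^T x
    y * z - log(1 + exp(z))
  }.sum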

The gradient from a single sample:

\[\nabla_\w\ell =\left(y-\sigma(\w\tr \x)\right)\x\]
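This follows from the last line of $\ell(\w)$ above, using $\frac{d}{dz}\log(1+\exp(z))=\sigma(z)$:

\[\nabla_\w\left[y(\w\tr\x) - \log\left(1+\exp(\w\tr\x)\right)\right] = y\x - \sigma(\w\tr\x)\,\x = \left(y-\sigma(\w\tr\x)\right)\x\]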

CS229 writes it the same way (Lecture Notes 1, p. 18, where $h_\w$ denotes the hypothesis $\sigma(\w\tr\x)$): http://cs229.stanford.edu/notes/cs229-notes1.pdf

\[\nabla_\w\ell=\left(y-h_\w(\x)\right)\x\]

Formulation 2

Given a training set $\{(\x^{(i)},y^{(i)})\}_{i=1}^m$ with features $\x\in \R^n$ and labels $y\in\{-1,1\}$, the LR model can be written compactly as:

\[p(y=\pm1 | \x, \w) = \sigma(y\w\tr\x)=\frac1{1+\exp(-y\w\tr\x)}\]
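Since $\sigma(-z)=1-\sigma(z)$, this single expression covers both classes and agrees with formulation 1:

\[p(y=1\mid\x,\w)=\sigma(\w\tr\x), \qquad p(y=-1\mid\x,\w)=\sigma(-\w\tr\x)=1-\sigma(\w\tr\x)\]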

MLE:

\[\max_\w \prod_{i=1}^m p(y^{(i)}\mid\x^{(i)},\w)\] \[\ell(\w) = -\sum_{i=1}^m\log\left(1+\exp(-y^{(i)}\w\tr\x^{(i)})\right)\]

The gradient from a single sample:

\[\nabla_\w\ell=y\left(1-\sigma(y\w\tr\x)\right)\x\]
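With $z=y\w\tr\x$ and $\frac{d}{dz}\left[-\log(1+\exp(-z))\right]=1-\sigma(z)$, the chain rule gives:

\[\nabla_\w\ell = \left(1-\sigma(y\w\tr\x)\right)\,\nabla_\w(y\w\tr\x) = y\left(1-\sigma(y\w\tr\x)\right)\x\]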

A LR demo in Spark (no regularization). Note that the mapped term is $(\sigma(y\w\tr\x)-1)\,y\,\x = -\nabla_\w\ell$ per sample, i.e. the gradient of the negative log-likelihood, so `w -= gradient` is gradient ascent on $\ell$:

// Load (x, y) points once and cache them in memory across iterations
val data = spark.textFile(...).map(readPoint).cache()
var w = Vector.random(D)  // random initial weight vector of dimension D
for (i <- 1 to ITERATIONS) {
  // Per-sample gradient of the negative log-likelihood,
  // (sigma(y * w.x) - 1) * y * x, summed over the whole dataset
  val gradient = data.map(p =>
    (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient  // descend the loss, i.e. ascend the log-likelihood
}
println("Final w: " + w)
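For a self-contained version, here is a minimal single-machine Scala sketch of the same update; `Point`, `train`, and the explicit step size `eta` are assumptions of this sketch (the Spark demo above implicitly uses a step size of 1):

import scala.math.exp
import scala.util.Random

case class Point(x: Array[Double], y: Double)  // y in {-1, +1}

def train(data: Seq[Point], d: Int, iterations: Int, eta: Double): Array[Double] = {
  var w = Array.fill(d)(Random.nextDouble())  // random initial weights
  for (_ <- 1 to iterations) {
    // Sum of per-sample loss gradients: (sigma(y * w.x) - 1) * y * x
    val gradient = data
      .map { p =>
        val z = w.zip(p.x).map { case (wi, xi) => wi * xi }.sum  // w^T x
        val coef = (1.0 / (1.0 + exp(-p.y * z)) - 1.0) * p.y
        p.x.map(_ * coef)
      }
      .reduce((a, b) => a.zip(b).map { case (u, v) => u + v })
    w = w.zip(gradient).map { case (wi, gi) => wi - eta * gi }
  }
  w
}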

References