PLSA+EM

  1. Let $d_i$ denote a document, $w_j$ a word, and $z_k$ a latent topic. With the latent variable included, the joint and conditional probabilities are:
    $$p\left(d_{i}, z_{k}, w_{j}\right)=p\left(d_{i}\right) p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)$$
    $$\begin{array}{c} P\left(w_{j} | d_{i}\right)=\sum_{k=1}^{K} P\left(z_{k} | d_{i}\right) P\left(w_{j} | z_{k}\right) \\ P\left(d_{i}, w_{j}\right)=P\left(d_{i}\right) \sum_{k=1}^{K} P\left(w_{j} | z_{k}\right) P\left(z_{k} | d_{i}\right) \end{array}$$
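To make the mixture in step 1 concrete, here is a minimal sketch in plain Python. The toy parameter tables and the function name `word_given_doc` are hypothetical and chosen only for illustration; they are not part of the original derivation.

```python
# Hypothetical toy parameters: N=2 documents, K=2 topics, M=3 words.
p_z_given_d = [[0.9, 0.1],        # P(z_k | d_0)
               [0.2, 0.8]]        # P(z_k | d_1)
p_w_given_z = [[0.5, 0.5, 0.0],   # P(w_j | z_0)
               [0.0, 0.5, 0.5]]   # P(w_j | z_1)


def word_given_doc(p_z_given_d, p_w_given_z):
    """Return P(w_j | d_i) = sum_k P(z_k | d_i) * P(w_j | z_k) as a nested list."""
    n_topics = len(p_w_given_z)
    n_words = len(p_w_given_z[0])
    return [
        [sum(doc_topics[k] * p_w_given_z[k][j] for k in range(n_topics))
         for j in range(n_words)]
        for doc_topics in p_z_given_d
    ]


p_w_given_d = word_given_doc(p_z_given_d, p_w_given_z)
```

Because each row of both tables sums to 1, each row of `p_w_given_d` is again a probability distribution over words.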

  2. This gives the log-likelihood, where $n\left(d_{i}, w_{j}\right)$ is the number of times word $w_j$ occurs in document $d_i$, $N$ is the number of documents, and $M$ is the vocabulary size:
    $$L=\sum_{i=1}^{N} \sum_{j=1}^{M}\left[n\left(d_{i}, w_{j}\right) \log P\left(d_{i}\right)+n\left(d_{i}, w_{j}\right) \log \sum_{k=1}^{K} P\left(w_{j} | z_{k}\right) P\left(z_{k} | d_{i}\right)\right]$$
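The log-likelihood in step 2 can be evaluated directly from a count matrix. The sketch below is illustrative only: the function name `log_likelihood`, the toy counts, and the choice of $P(d_i)$ as each document's share of all tokens are assumptions, not something the derivation prescribes.

```python
import math


def log_likelihood(counts, p_d, p_z_given_d, p_w_given_z):
    """Sum n(d_i, w_j) * [log P(d_i) + log sum_k P(w_j | z_k) P(z_k | d_i)]."""
    n_topics = len(p_w_given_z)
    total = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # pairs that never occur contribute nothing
            mix = sum(p_z_given_d[i][k] * p_w_given_z[k][j] for k in range(n_topics))
            total += n_ij * (math.log(p_d[i]) + math.log(mix))
    return total


# Hypothetical toy data: counts[i][j] = n(d_i, w_j).
counts = [[2, 1, 0],
          [0, 1, 2]]
p_d = [0.5, 0.5]                  # assumed: each document holds half of the tokens
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

ll = log_likelihood(counts, p_d, p_z_given_d, p_w_given_z)
```

Note that `math.log` raises `ValueError` if an observed pair has mixture probability zero, which corresponds to a log-likelihood of negative infinity.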

  3. E-step: compute the posterior probability of the latent variable. From the factorization in step 1:
    $$\gamma\left(z_{i j k}\right)=p\left(z_{k} | d_{i}, w_{j}\right)=\frac{p\left(d_{i}\right) p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)}{\sum_{k'=1}^{K} p\left(d_{i}\right) p\left(z_{k'} | d_{i}\right) p\left(w_{j} | z_{k'}\right)}$$
    The factor $p\left(d_{i}\right)$ does not depend on the topic index, so it cancels between numerator and denominator:
    $$\gamma\left(z_{i j k}\right)=\frac{p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)}{\sum_{k'=1}^{K} p\left(z_{k'} | d_{i}\right) p\left(w_{j} | z_{k'}\right)}$$
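As a rough sketch of the E-step, the function below fills a table `gamma[i][j][k]` with the posterior $p(z_k | d_i, w_j)$. The name `e_step`, the toy parameters, and the uniform fallback for pairs whose mixture probability is zero are assumptions made for this example.

```python
def e_step(p_z_given_d, p_w_given_z):
    """Return gamma[i][j][k] = P(z_k | d_i, w_j) for every document/word pair."""
    n_topics = len(p_w_given_z)
    n_words = len(p_w_given_z[0])
    gamma = []
    for doc_topics in p_z_given_d:
        doc_rows = []
        for j in range(n_words):
            # Unnormalized posterior: P(z_k | d_i) * P(w_j | z_k); P(d_i) has cancelled.
            joint = [doc_topics[k] * p_w_given_z[k][j] for k in range(n_topics)]
            norm = sum(joint)
            if norm > 0:
                doc_rows.append([v / norm for v in joint])
            else:
                # Assumption: fall back to a uniform posterior when the pair is impossible.
                doc_rows.append([1.0 / n_topics] * n_topics)
        gamma.append(doc_rows)
    return gamma


# Hypothetical toy parameters: 2 documents, 2 topics, 3 words.
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
gamma = e_step(p_z_given_d, p_w_given_z)
```

In this toy setting the middle word is shared by both topics, so its posterior simply mirrors each document's topic mixture, while the other two words are assigned to a single topic with probability 1.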

  4. M-step
    (1) Build the Q function. For a single $(d_i, w_j)$ pair, the expected complete-data log-likelihood under the posterior is:
    $$\sum_{k=1}^{K} \gamma\left(z_{i j k}\right) \log p\left(d_{i}, z_{k}, w_{j}\right)=\sum_{k=1}^{K} \gamma\left(z_{i j k}\right)\left(\log p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)+\log p\left(d_{i}\right)\right)$$
    The term $\log p\left(d_{i}\right)$ does not depend on the parameters $p\left(z_{k} | d_{i}\right)$ and $p\left(w_{j} | z_{k}\right)$ being optimized, so it can be dropped, leaving:
    $$\sum_{k=1}^{K} \gamma\left(z_{i j k}\right)\left(\log p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)\right)$$
    (2) Over all samples, weighting each pair by its count:

$$Q=\sum_{i=1}^{N} \sum_{j=1}^{M} n\left(d_{i}, w_{j}\right) \sum_{k=1}^{K} \gamma\left(z_{i j k}\right)\left(\log p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)\right)$$

(3) Maximize the Q function subject to the constraints $\sum_{k=1}^{K} p\left(z_{k} | d\right)=1$ and $\sum_{w \in V} p\left(w | z_{k}\right)=1$. The two parameter sets are solved for separately as follows.

1) For $p\left(z_{k} | d_{i}\right)$, apply the method of Lagrange multipliers (with one multiplier $\lambda$ for each document $d_i$):
$$Lg=Q\left(\theta, \theta^{old}\right)+\lambda\left(\sum_{k=1}^{K} p\left(z_{k} | d_{i}\right)-1\right)$$
2) Setting the partial derivative with respect to $p\left(z_{k} | d_{i}\right)$ to zero gives $\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right) / p\left(z_{k} | d_{i}\right)+\lambda=0$, that is:
$$-\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)=\lambda p\left(z_{k} | d_{i}\right)$$
3) Summing both sides over $k$ and using $\sum_{k=1}^{K}\gamma\left(z_{i j k}\right)=1$ and $\sum_{k=1}^{K}p\left(z_{k} | d_{i}\right)=1$ gives:

$$\lambda=-\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right)$$
4) Substituting $\lambda$ back into the equation from 2) yields the update for $p\left(z_{k} | d_{i}\right)$:
$$p\left(z_{k} | d_{i}\right)=\frac{\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)}{\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right)}$$
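The update for $p(z_k | d_i)$ is a count-weighted average of the posteriors within one document. A minimal sketch, assuming every document contains at least one token; the function name `update_p_z_given_d` and the toy values are hypothetical.

```python
def update_p_z_given_d(counts, gamma):
    """M-step: P(z_k | d_i) = sum_j n(d_i, w_j) * gamma[i][j][k] / sum_j n(d_i, w_j)."""
    n_topics = len(gamma[0][0])
    updated = []
    for i, row in enumerate(counts):
        doc_length = sum(row)  # assumed > 0: every document has at least one token
        updated.append([
            sum(row[j] * gamma[i][j][k] for j in range(len(row))) / doc_length
            for k in range(n_topics)
        ])
    return updated


# Hypothetical toy inputs: 2 documents, 3 words, 2 topics.
counts = [[2, 1, 0],
          [0, 1, 2]]
gamma = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],   # posteriors for document 0
    [[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]],   # posteriors for document 1
]
new_p_z_given_d = update_p_z_given_d(counts, gamma)
```

Each output row sums to 1 because the posteriors for every pair sum to 1 over topics, which is exactly the role the multiplier $\lambda$ plays in the derivation.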

Similarly, the method of Lagrange multipliers gives the update for $p\left(w_{j} | z_{k}\right)$. The constraint here sums over the words $j$ for a fixed topic $z_k$ (with one multiplier $\lambda$ per topic):
1) The Lagrangian:
$$Lg=Q\left(\theta, \theta^{\text{old}}\right)+\lambda\left(\sum_{j=1}^{M} p\left(w_{j} | z_{k}\right)-1\right)$$
2) Setting the partial derivative with respect to $p\left(w_{j} | z_{k}\right)$ to zero gives:
$$-\sum_{i=1}^{N} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)=\lambda p\left(w_{j} | z_{k}\right)$$
3) Summing both sides over the words $j$ and using $\sum_{j=1}^{M} p\left(w_{j} | z_{k}\right)=1$ gives:
$$\lambda=-\sum_{i=1}^{N} \sum_{j=1}^{M} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)$$
4) Substituting back into the equation from 2) yields:
$$p\left(w_{j} | z_{k}\right)=\frac{\sum_{i=1}^{N} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)}{\sum_{i=1}^{N} \sum_{j'=1}^{M} n\left(d_{i}, w_{j'}\right) \gamma\left(z_{i j' k}\right)}$$
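The update for $p(w_j | z_k)$ pools the posterior-weighted counts of each word across all documents and normalizes over the vocabulary. A minimal sketch, assuming every topic receives some posterior mass; the function name `update_p_w_given_z` and the toy values are hypothetical.

```python
def update_p_w_given_z(counts, gamma):
    """M-step: P(w_j | z_k) is proportional to sum_i n(d_i, w_j) * gamma[i][j][k]."""
    n_docs = len(counts)
    n_words = len(counts[0])
    n_topics = len(gamma[0][0])
    updated = []
    for k in range(n_topics):
        # Expected number of times each word was generated by topic k.
        weights = [
            sum(counts[i][j] * gamma[i][j][k] for i in range(n_docs))
            for j in range(n_words)
        ]
        total = sum(weights)  # assumed > 0: topic k explains at least some tokens
        updated.append([w / total for w in weights])
    return updated


# Hypothetical toy inputs: 2 documents, 3 words, 2 topics.
counts = [[2, 1, 0],
          [0, 1, 2]]
gamma = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],   # posteriors for document 0
    [[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]],   # posteriors for document 1
]
new_p_w_given_z = update_p_w_given_z(counts, gamma)
```

The denominator `total` is the same double sum over documents and words that appears as $-\lambda$ in the derivation.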

  5. In summary, each iteration of the optimization consists of:
    E-step, computing the posterior:
    $$\gamma\left(z_{i j k}\right)=\frac{p\left(z_{k} | d_{i}\right) p\left(w_{j} | z_{k}\right)}{\sum_{k'=1}^{K} p\left(z_{k'} | d_{i}\right) p\left(w_{j} | z_{k'}\right)}$$
    M-step, updating the parameters:
    $$p\left(z_{k} | d_{i}\right)=\frac{\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)}{\sum_{j=1}^{M} n\left(d_{i}, w_{j}\right)}$$

    $$p\left(w_{j} | z_{k}\right)=\frac{\sum_{i=1}^{N} n\left(d_{i}, w_{j}\right) \gamma\left(z_{i j k}\right)}{\sum_{i=1}^{N} \sum_{j'=1}^{M} n\left(d_{i}, w_{j'}\right) \gamma\left(z_{i j' k}\right)}$$
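Putting the E-step and M-step together, the whole procedure can be sketched as a short loop in plain Python. Everything named here (`plsa_em`, `normalize`, the random initialization, the fixed iteration count, the toy counts) is an assumption for illustration. The tracked log-likelihood omits the $\log P(d_i)$ term, since it does not depend on the parameters being updated.

```python
import math
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total <= 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Run EM on a document-word count matrix; returns P(z|d), P(w|z), log-likelihoods."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Assumed initialization: random strictly positive tables, row-normalized.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iters):
        # E-step: gamma[i][j][k] = P(z_k | d_i, w_j).
        gamma = [[normalize([p_z_d[i][k] * p_w_z[k][j] for k in range(n_topics)])
                  for j in range(n_words)]
                 for i in range(n_docs)]
        # M-step: re-estimate both tables from posterior-weighted counts.
        p_z_d = [normalize([sum(counts[i][j] * gamma[i][j][k] for j in range(n_words))
                            for k in range(n_topics)])
                 for i in range(n_docs)]
        p_w_z = [normalize([sum(counts[i][j] * gamma[i][j][k] for i in range(n_docs))
                            for j in range(n_words)])
                 for k in range(n_topics)]
        # Log-likelihood without the constant log P(d_i) term.
        history.append(sum(
            counts[i][j] * math.log(sum(p_z_d[i][k] * p_w_z[k][j]
                                        for k in range(n_topics)))
            for i in range(n_docs) for j in range(n_words) if counts[i][j] > 0))
    return p_z_d, p_w_z, history


# Hypothetical toy corpus: 4 documents over a 4-word vocabulary.
counts = [[4, 3, 0, 0],
          [3, 4, 1, 0],
          [0, 1, 4, 3],
          [0, 0, 3, 4]]
p_z_d, p_w_z, history = plsa_em(counts, n_topics=2)
```

A useful sanity check on any implementation is that `history` never decreases from one iteration to the next, which is the standard guarantee of EM. In practice a convergence threshold on the change in log-likelihood would usually replace the fixed iteration count.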
