Table of Contents
- 1. Basic Concepts
- 1.1 Definition of the HMM
- 1.2 The Box-and-Ball Model
- 1.3 Generating an Observation Sequence
- 1.4 The 3 Basic Problems of HMMs
- 2. The Probability Computation Problem
- 2.1 Direct Computation
- 2.2 The Forward Algorithm
- 2.2.1 Proof of the Forward Formulas
- 2.2.2 Box-and-Ball Example
- 2.2.3 Forward Algorithm in Python
- 2.3 The Backward Algorithm
- 2.3.1 Proof of the Backward Formulas
- 2.3.2 Backward Algorithm in Python
- 2.4 Some Probabilities and Expectations
- 3. Learning Algorithms
- 3.1 Supervised Learning Method
- 3.2 Unsupervised Baum-Welch Algorithm
- 4. Prediction Algorithms
- 4.1 Approximate Algorithm
- 4.2 Viterbi Algorithm
- 4.3 Viterbi Algorithm in Python
- 5. Practice: POS Tagging
The hidden Markov model (HMM) is a statistical learning model that can be used for tagging problems. It describes the process in which a hidden Markov chain randomly generates an observation sequence, and it belongs to the family of generative models. HMMs are widely used in speech recognition, natural language processing, bioinformatics, pattern recognition, and other fields.
Parts of this article are drawn from Li Hang, Statistical Learning Methods (李航《统计学习方法》).
1. Basic Concepts
1.1 Definition of the HMM
- A hidden Markov model is a probabilistic model of time series. It describes the process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then generates an observation, producing a random sequence of observations.
- The sequence of states randomly generated by the hidden Markov chain is called the state sequence.
- Each state generates one observation, and the resulting random sequence of observations is called the observation sequence.
- Each position in the sequence can be viewed as a time step.
A not-necessarily-perfect example: you only know whether a person went for a walk, went shopping, or stayed home cleaning today, and you want to infer today's weather. On a rainy day this person might clean the house, but might also walk in the rain; on a sunny day they might go shopping, go for a walk, or clean.
Hidden states Y (not visible): rainy, sunny
Observed states X (visible): walk, shop, clean
Suppose the observed sequence is $X=(x_1,x_2,x_3,\ldots,x_n)$ and the hidden state sequence is $Y=(y_1,y_2,y_3,\ldots,y_n)$.
Given the observation sequence $X$, we want the probability that the hidden state sequence is $Y$.
- Bayes' rule: $P(B|A)=\frac {P(AB)}{P(A)}=\frac{P(A|B)P(B)}{P(A)}$
What we want is $P(Y|X)$:
$$
\begin{aligned} P(Y|X) & = P(y_1,y_2,y_3,\cdots,y_n|x_1,x_2,x_3,\cdots,x_n)\\ & =\frac{P(x_1,x_2,x_3,\cdots,x_n|y_1,y_2,y_3,\cdots,y_n)P(y_1,y_2,y_3,\cdots,y_n)}{P(x_1,x_2,x_3,\cdots,x_n)}\quad\quad(0)\\ P(y_1,y_2,y_3,\cdots,y_n) & =P(y_1){\color{red}P(y_2,y_3,y_4,\cdots,y_n|y_1)}\\ & =P(y_1)P(y_2|y_1){\color{red}P(y_3,y_4,\cdots,y_n|y_1,y_2)}\\ & =P(y_1)P(y_2|y_1)P(y_3|y_1,y_2){\color{red}P(y_4,\cdots,y_n|y_1,y_2,y_3)}\\ & \quad\vdots\\ & =P(y_1)P(y_2|y_1)P(y_3|y_1,y_2)\cdots P(y_{n-1}|y_1,y_2,\cdots,y_{n-2})P(y_n|y_1,y_2,\cdots,y_{n-1})\\ & =P(y_1)\prod_{i=2}^{n}P(y_i|y_1,y_2,y_3,\cdots,y_{i-1})\\ & ={\color{blue}P(y_1)\prod_{i=2}^{n}P(y_i|y_{i-1})}\quad\quad\quad(1)\\ \end{aligned}
$$
The two assumptions of the hidden Markov model:
- The last step above uses the homogeneity (Markov) assumption: the current state $y_i$ depends only on the previous state $y_{i-1}$.
- The last step of the equation below uses the observation independence assumption: an observation $x_i$ depends only on its hidden state $y_i$.
$$
\begin{aligned} &P(x_1,x_2,x_3,\cdots,x_n|y_1,y_2,y_3,\cdots,y_n)\\ =&\prod_{i=1}^{n}P(x_i|x_1,x_2,x_3,\cdots,x_{i-1},y_1,y_2,y_3,\cdots,y_n)\\ =&{\color{blue}\prod_{i=1}^{n}P(x_i|y_i)}\quad\quad\quad(2)\\ \end{aligned}
$$
- From (1) and (2), and ignoring the denominator of (0):
$$
\begin{aligned} P(Y|X) =&P(y_1,y_2,y_3,\cdots,y_n|x_1,x_2,x_3,\cdots,x_n)\\ =&\frac{P(x_1,x_2,x_3,\cdots,x_n|y_1,y_2,y_3,\cdots,y_n)P(y_1,y_2,y_3,\cdots,y_n)}{P(x_1,x_2,x_3,\cdots,x_n)}\\ \propto & \;{\color{blue}P(y_1)\prod_{i=2}^{n}P(y_i|y_{i-1})}\prod_{i=1}^{n}P(x_i|y_i) \end{aligned}
$$
A hidden Markov model is determined by three elements:
- the initial state probability vector $\pi$ (the probability of starting in each hidden state $y_i$);
- the state transition probability matrix $A$ (the matrix formed by the terms $P(y_i|y_{i-1})$ in equation (1));
- the observation probability matrix $B$ (the matrix formed by the terms $P(x_i|y_i)$ in equation (2)).
$\pi$ and $A$ determine the state sequence; $B$ determines the observation sequence. A hidden Markov model $\lambda$ can thus be written as the triple $\lambda=(A,B,\pi)$.
- The transition matrix $A$ and the initial vector $\pi$ determine the hidden Markov chain and generate the unobservable state sequence.
- The observation matrix $B$ determines how an observation $x_i$ is generated from a hidden state $y_i$; together with the state sequence, it determines how the observation sequence is produced.
1.2 The Box-and-Ball Model
Suppose there are 4 boxes, each containing red and white balls; the number of red and white balls in each box is listed in a table (see the book).
Balls are drawn as follows, producing an observation sequence $X$ of ball colors:
- Pick one box uniformly at random, draw one ball from it, record its color, and put it back.
- Move from the current box to the next box by the rule: from box 1, always move to box 2; from box 2 or 3, move to the left or right neighbor with probability 0.4 and 0.6 respectively; from box 4, stay at box 4 or move to box 3, each with probability 0.5.
- After moving, draw one ball from the new box, record its color, and put it back.
- Repeat 5 times, yielding an observation sequence of ball colors.
- Initial probability of each hidden state, $P(y_i)$: $\pi = (0.25,0.25,0.25,0.25)^T$
- Transition probabilities between hidden states, $P(y_i|y_{i-1})$:
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0 & 0.5 & 0.5 \\ \end{bmatrix}$$
- Observation (emission) probabilities, $P(x_i|y_i)$ (rows are boxes, columns are red/white; written out as numpy arrays below):
$$B = \begin{bmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.8 & 0.2 \\ \end{bmatrix}$$
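For concreteness, here are the three elements of this box-and-ball model as numpy arrays — a minimal sketch, with variable names of my own choosing:

```python
import numpy as np

pi = np.array([0.25, 0.25, 0.25, 0.25])  # initial state distribution over the 4 boxes
A = np.array([[0.0, 1.0, 0.0, 0.0],      # transition matrix: A[i][j] = P(next box j | current box i)
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],                # emission matrix: rows = boxes, columns = (red, white)
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])
```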
1.3 Generating an Observation Sequence
- Input: HMM $\lambda=(A,B,\pi)$ and the observation sequence length $T$ (5 in this example)
- Output: observation sequence $X = (x_1,x_2,x_3,x_4,x_5)$ (red/white balls)
- Step 1: according to $\pi$, randomly generate an initial state $y_i$ (a box); set the sequence length t = 0.
- Step 2: in state $y_i$, generate an element $x_i$ of the observation sequence (a red or white ball) according to the corresponding row of the observation matrix $B$; increment t.
- Step 3: according to the row of the transition matrix $A$ for $y_i$, generate the next state $y_{i+1}$.
While t < T, repeat steps 2 and 3 (see the sampling sketch below).
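A minimal sampling sketch of this generation procedure, assuming the `pi`, `A`, `B` arrays defined above (the function name is my own):

```python
import numpy as np

def generate_observations(pi, A, B, T, rng=np.random.default_rng()):
    """Sample an observation sequence of length T from the HMM (pi, A, B)."""
    N = len(pi)
    X = []
    y = rng.choice(N, p=pi)                        # step 1: initial state drawn from pi
    for _ in range(T):
        X.append(rng.choice(B.shape[1], p=B[y]))   # step 2: emit using row y of B
        y = rng.choice(N, p=A[y])                  # step 3: next state using row y of A
    return X  # e.g. [0, 1, 1, 0, 0] with 0 = red, 1 = white

print(generate_observations(pi, A, B, 5))
```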
1.4 The 3 Basic Problems of HMMs
- Probability computation: given the model $\lambda=(A,B,\pi)$ and an observation sequence $X = (x_1,x_2,\ldots,x_n)$, compute the probability $P(X|\lambda)$ of observing $X$ under the model.
- Learning: given an observation sequence $X = (x_1,x_2,\ldots,x_n)$, estimate the parameters of the model $\lambda=(A,B,\pi)$ so that the observation probability $P(X|\lambda)$ is maximized, i.e., estimate the parameters by maximum likelihood.
- Prediction, also called decoding: given the model $\lambda=(A,B,\pi)$ and an observation sequence $X = (x_1,x_2,\ldots,x_n)$, find the state sequence $Y = (y_1,y_2,\ldots,y_n)$ that maximizes the conditional probability $P(Y|X)$; that is, given $X$, find the most likely hidden state sequence $Y$.
2. The Probability Computation Problem
Given the model $\lambda=(A,B,\pi)$ and an observation sequence $X = (x_1,x_2,\ldots,x_n)$, compute the probability $P(X|\lambda)$ of observing $X$ under the model.
2.1 Direct Computation
- Enumerate all state sequences $Y = (y_1,y_2,\ldots,y_t)$ of length $T$.
- For each state sequence $Y$, compute the joint probability $P(X,Y|\lambda)$ with the observation sequence $X = (x_1,x_2,\ldots,x_t)$.
- Sum over all state sequences to obtain $P(X|\lambda)$.
- The probability of a state sequence $Y = (y_1,y_2,\ldots,y_t)$ is: $P(Y|\lambda)=\pi_{y_1}A_{y_1,y_2}A_{y_2,y_3}\cdots A_{y_{t-1},y_t}$
- For a fixed state sequence $Y$, the probability of the observation sequence $X = (x_1,x_2,\ldots,x_t)$ is: $P(X|Y,\lambda)=B_{y_1,x_1}B_{y_2,x_2}\cdots B_{y_t,x_t}$
- The joint probability of $X$ and $Y$ is: $P(X,Y|\lambda)=P(X|Y,\lambda)P(Y|\lambda)=\pi_{y_1}B_{y_1,x_1}A_{y_1,y_2}B_{y_2,x_2}\cdots A_{y_{t-1},y_t}B_{y_t,x_t}$
- Summing over all possible state sequences $Y$ gives $P(X|\lambda)=\sum_{Y} P(X|Y,\lambda)P(Y|\lambda)$
But this computation is very expensive: the complexity is $O(TN^T)$, which is infeasible (a brute-force sketch follows for illustration only).
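A sketch of the direct method, assuming `pi`, `A`, `B` are numpy arrays as above; `obs` is a list of observation indices (e.g. 0 = red, 1 = white), and the function name is my own:

```python
import itertools
import numpy as np

def brute_force_prob(pi, A, B, obs):
    """Sum P(X, Y | lambda) over all N**T state sequences Y: O(T * N**T)."""
    N = len(pi)
    total = 0.0
    for Y in itertools.product(range(N), repeat=len(obs)):
        p = pi[Y[0]] * B[Y[0], obs[0]]                # pi_{y1} * B_{y1,x1}
        for t in range(1, len(obs)):
            p *= A[Y[t - 1], Y[t]] * B[Y[t], obs[t]]  # A_{y_{t-1},y_t} * B_{y_t,x_t}
        total += p
    return total
```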
2.2 The Forward Algorithm
- Forward probability, definition:
Given an HMM $\lambda$, the forward probability is the probability of the partial observation sequence $x_1,x_2,\ldots,x_t$ up to time t together with the state at time t being $q_i$, written $a_t(i)=P(x_1,x_2,...,x_t,y_t=q_i | \lambda)$.
- Recursively compute the forward probabilities $a_t(i)$ and the observation sequence probability $P(X|\lambda)$:
Input: HMM $\lambda$, observation sequence $X$
Output: observation sequence probability $P(X|\lambda)$
- Initialization: $a_1(i) = \pi_iB_{i,x_1},\quad i = 1,2,\ldots,N$, where $N$ is the number of boxes (states).
- Recursion: for $t=1,2,\ldots,T-1$, $\quad a_{t+1}(i)= \left[\sum_{j=1}^N a_t(j)A_{j,i}\right]B_{i,x_{t+1}},\quad i = 1,2,\ldots,N$
- Termination: $P(X|\lambda) = \sum_{i=1}^N a_T(i)$ (see the definition of the forward probability; this is the law of total probability)
Explanation of the algorithm:
- At the initial time $t=1$ the observation is $x_1$; the possible states are the box numbers $i = 1,2,\ldots,N$, each with probability $\pi_i$; box $i$ emits a ball of color $x_1$ with probability $B_{i,x_1}$, so $a_1(i) = \pi_iB_{i,x_1}$.
- Recursion: any of the N states at time $t$ may transition to state $q_i$ at time $t+1$. Multiply each forward probability by the corresponding transition probability and sum; this gives the probability of reaching state $q_i$. Multiplying by the emission probability $B_{i,x_{t+1}}$ then gives the forward probability at time $t+1$.
- Summing all forward probabilities at the last time step $T$ gives $P(X|\lambda) = \sum_{i=1}^N a_T(i)$.
- The forward algorithm is path-based: at time $t+1$ it directly reuses the results of time $t$, so the time complexity is $O(TN^2)$.
2.2.1 Proof of the Forward Formulas
First, the chain rule for joint probabilities: $P(ABC) = P(A)P(B|A)P(C|AB)$, which holds for any number of terms.
Proof of the recursion:
$$
\begin{aligned} a_t(j)A_{j,i} &= P(x_1,x_2,...,x_t,y_t=q_j|\lambda)P(y_{t+1}=q_i|y_t=q_j,\lambda)\\ &={\color{red}P(x_1,x_2,...,x_t|y_t=q_j,\lambda)}P(y_t=q_j|\lambda)P(y_{t+1}=q_i|y_t=q_j,\lambda)\\ &= {\color{red}P(x_1,x_2,...,x_t|y_t=q_j,{\color{blue}y_{t+1}=q_i},\lambda)}P(y_t=q_j,y_{t+1}=q_i|\lambda)\\ &=P(x_1,x_2,...,x_t,y_t=q_j,y_{t+1}=q_i|\lambda)\\ \left\{\sum_{j=1}^N a_t(j)A_{j,i}\right\} B_{i,x_{t+1}} &= P(x_1,x_2,...,x_t,y_{t+1}=q_i|\lambda)P(x_{t+1}|y_{t+1}=q_i,\lambda)\\ &={\color{red}P(x_1,x_2,...,x_t|y_{t+1}=q_i,\lambda)}P(y_{t+1}=q_i|\lambda)P(x_{t+1}|y_{t+1}=q_i,\lambda)\\ &={\color{red}P(x_1,x_2,...,x_t|y_{t+1}=q_i,{\color{blue}x_{t+1}},\lambda)}P(x_{t+1},y_{t+1}=q_i|\lambda)\\ &=P(x_1,x_2,...,x_t,x_{t+1},y_{t+1}=q_i|\lambda)=a_{t+1}(i) \end{aligned}
$$
First blue term: given $y_t$, the first $t$ observations are clearly independent of the state $y_{t+1}$ at time $t+1$; second blue term: observation independence.
Proof of the termination formula (law of total probability):
$$
\begin{aligned} \sum_{i=1}^N a_T(i)&=P(x_1,x_2,...,x_T,y_T=q_1|\lambda)+P(x_1,x_2,...,x_T,y_T=q_2|\lambda)+\cdots+P(x_1,x_2,...,x_T,y_T=q_N|\lambda)\\ &= P(x_1,x_2,...,x_T|\lambda)=P(X|\lambda) \end{aligned}
$$
2.2.2 Box-and-Ball Example
Consider a box-and-ball model $\lambda = (A,B,\pi)$ with state set (box numbers) $Q=\{1,2,3\}$ and observation set $V=\{红,白\}$ (red, white):
$$\pi=\begin{bmatrix} 0.2 \\ 0.4 \\ 0.4 \\ \end{bmatrix}, \quad A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \\ \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \\ \end{bmatrix}$$
Let the total length be $T=3$ and the observation sequence $X=(红,白,红)$; compute $P(X|\lambda)$ with the forward algorithm.
Solution:
- Initial forward probabilities: $a_1(i) = \pi_iB_{i,x_1}$
$a_1(1) = \pi_1B_{1,1}=0.2*0.5=0.10$ (box 1 with a red ball (first column of B) at time 1)
$a_1(2) = \pi_2B_{2,1}=0.4*0.4=0.16$ (box 2 with a red ball at time 1)
$a_1(3) = \pi_3B_{3,1}=0.4*0.7=0.28$ (box 3 with a red ball at time 1)
- Recursion: for $t=1,2,\ldots,T-1$, $\quad a_{t+1}(i)= \left[\sum_{j=1}^N a_t(j)A_{j,i}\right]B_{i,x_{t+1}},\quad i = 1,2,\ldots,N$
$a_2(1)=\{ \sum_{j=1}^3 a_1(j)A_{j,1} \} B_{1,2}=(0.10*0.5+0.16*0.3+0.28*0.2)*0.5=0.077$
(probability of the observations so far with a white ball from box 1 at time 2)
$a_2(2)=\{ \sum_{j=1}^3 a_1(j)A_{j,2} \} B_{2,2}=(0.10*0.2+0.16*0.5+0.28*0.3)*0.6=0.1104$
(probability of the observations so far with a white ball from box 2 at time 2)
$a_2(3)=\{ \sum_{j=1}^3 a_1(j)A_{j,3} \} B_{3,2}=(0.10*0.3+0.16*0.2+0.28*0.5)*0.3=0.0606$
(probability of the observations so far with a white ball from box 3 at time 2)
$a_3(1)=\{ \sum_{j=1}^3 a_2(j)A_{j,1} \} B_{1,1}=(0.077*0.5+0.1104*0.3+0.0606*0.2)*0.5=0.04187$
(probability of the observations so far with a red ball from box 1 at time 3)
$a_3(2)=\{ \sum_{j=1}^3 a_2(j)A_{j,2} \} B_{2,1}=(0.077*0.2+0.1104*0.5+0.0606*0.3)*0.4=0.035512$
(probability of the observations so far with a red ball from box 2 at time 3)
$a_3(3)=\{ \sum_{j=1}^3 a_2(j)A_{j,3} \} B_{3,1}=(0.077*0.3+0.1104*0.2+0.0606*0.5)*0.7=0.052836$
(probability of the observations so far with a red ball from box 3 at time 3)
- Termination: $P(X|\lambda) = \sum_{i=1}^N a_T(i) = \sum_{i=1}^3 a_3(i)=0.04187+0.035512+0.052836=0.130218$
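These numbers can be double-checked with a few lines of vectorized numpy — a minimal sketch; the full step-by-step implementation follows in 2.2.3:

```python
import numpy as np

pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
obs = [0, 1, 0]  # 红, 白, 红 as column indices into B

alpha = pi * B[:, obs[0]]          # a_1 = [0.1, 0.16, 0.28]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]  # a_{t+1}(i) = [sum_j a_t(j) A_{j,i}] B_{i,x_{t+1}}
print(alpha.sum())                 # 0.130218
```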
2.2.3 Forward Algorithm in Python
- Adapted from someone else's code; retyped by hand to understand it.
```python
# -*- coding:utf-8 -*-
# python3.7
# @Time: 2019/12/17 22:33
# @Author: Michael Ming
# @Website: https://michael.blog.csdn.net/
# @File: hiddenMarkov.py
import numpy as np


class HiddenMarkov:
    def forward(self, Q, V, A, B, X, PI):  # forward algorithm
        N = len(Q)  # number of hidden states
        M = len(X)  # length of the observation sequence
        alphas = np.zeros((N, M))  # matrix of forward probabilities
        T = M  # number of time steps = length of the observation sequence
        for t in range(T):  # compute the forward probabilities step by step
            indexOfXi = V.index(X[t])  # index in V of the observation X[t]
            for i in range(N):  # loop over the states
                if t == 0:  # initialization
                    alphas[i][t] = PI[t][i] * B[i][indexOfXi]  # a1(i) = pi_i * B(i, x1)
                    print("alphas1(%d)=p%dB%d(x1)=%f" % (i, i, i, alphas[i][t]))
                else:
                    alphas[i][t] = np.dot([alpha[t - 1] for alpha in alphas],  # column t-1 of alphas
                                          [a[i] for a in A]) * B[i][indexOfXi]  # recursion formula
                    print("alpha%d(%d)=[sigma alpha%d(i)ai%d]B%d(x%d)=%f"
                          % (t, i, t - 1, i, i, t, alphas[i][t]))
                    print(alphas)
        P = sum(alpha[M - 1] for alpha in alphas)  # sum of the last column
        print("P=%f" % P)


Q = [1, 2, 3]
V = ['红', '白']
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
# X = ['红', '白', '红', '红', '白', '红', '白', '白']
X = ['红', '白', '红']  # the example from the book
PI = [[0.2, 0.4, 0.4]]

hmm = HiddenMarkov()
hmm.forward(Q, V, A, B, X, PI)
```
Some numpy operations used above:
```python
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> b
array([[ 7,  8],
       [ 9, 10],
       [11, 12]])
>>> np.dot(a[0], [B[1] for B in b])
64
>>> np.dot([A[0] for A in a], [B[1] for B in b])
132
>>> np.multiply(a[0], [B[1] for B in b])
array([ 8, 20, 36])
>>> np.multiply([A[0] for A in a], [B[1] for B in b])
array([ 8, 40, 84])
```
Output (consistent with the calculation above):
```
alphas1(0)=p0B0(x1)=0.100000
alphas1(1)=p1B1(x1)=0.160000
alphas1(2)=p2B2(x1)=0.280000
alpha1(0)=[sigma alpha0(i)ai0]B0(x1)=0.077000
[[0.1    0.077  0.    ]
 [0.16   0.     0.    ]
 [0.28   0.     0.    ]]
alpha1(1)=[sigma alpha0(i)ai1]B1(x1)=0.110400
[[0.1    0.077  0.    ]
 [0.16   0.1104 0.    ]
 [0.28   0.     0.    ]]
alpha1(2)=[sigma alpha0(i)ai2]B2(x1)=0.060600
[[0.1    0.077  0.    ]
 [0.16   0.1104 0.    ]
 [0.28   0.0606 0.    ]]
alpha2(0)=[sigma alpha1(i)ai0]B0(x2)=0.041870
[[0.1     0.077   0.04187]
 [0.16    0.1104  0.     ]
 [0.28    0.0606  0.     ]]
alpha2(1)=[sigma alpha1(i)ai1]B1(x2)=0.035512
[[0.1      0.077    0.04187 ]
 [0.16     0.1104   0.035512]
 [0.28     0.0606   0.      ]]
alpha2(2)=[sigma alpha1(i)ai2]B2(x2)=0.052836
[[0.1      0.077    0.04187 ]
 [0.16     0.1104   0.035512]
 [0.28     0.0606   0.052836]]
P=0.130218
```
2.3 The Backward Algorithm
- Backward probability, definition:
Given an HMM $\lambda$, the backward probability at time $t$ is the probability of the partial observation sequence from $t+1$ to $T$, $x_{t+1},x_{t+2},\ldots,x_T$, given that the state at time $t$ is $q_i$; it is written $\beta_t(i)=P(x_{t+1},x_{t+2},\ldots,x_T|y_t=q_i , \lambda)$.
- Recursively compute the backward probabilities $\beta_t(i)$ and the observation sequence probability $P(X|\lambda)$:
Input: HMM $\lambda$, observation sequence $X$
Output: observation sequence probability $P(X|\lambda)$
- Initialization: $\beta_T(i) = 1,\quad i = 1,2,\ldots,N$, where $N$ is the number of boxes (states).
- Recursion: for $t=T-1,T-2,\ldots,1$, $\quad \beta_{t}(i)=\sum_{j=1}^N A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j),\quad i = 1,2,\ldots,N$
- Termination: $P(X|\lambda) = \sum_{i=1}^N \pi_iB_{i,x_1}\beta_1(i)$
Explanation of the algorithm: it mirrors the forward algorithm. At time $T$ there are no further observations to account for, so $\beta_T(i)=1$; the recursion runs backwards, at each step summing over the next state $j$ (transition $A_{i,j}$, emission $B_{j,x_{t+1}}$, and the remaining observations $\beta_{t+1}(j)$); termination folds in the initial distribution and the first observation.
2.3.1 Proof of the Backward Formulas
Proof of the recursion:
$$
\begin{aligned} \sum_{j=1}^N A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j) &= \sum_{j=1}^N A_{i,j}{\color{red}P(x_{t+1}|y_{t+1}=q_j,\lambda)}P(x_{t+2},...,x_T|y_{t+1}=q_j,\lambda)\\ &= \sum_{j=1}^N A_{i,j}{\color{red}P(x_{t+1}|{\color{blue}x_{t+2},...,x_T},y_{t+1}=q_j,\lambda)}P(x_{t+2},...,x_T|y_{t+1}=q_j,\lambda)\\ &= \sum_{j=1}^N A_{i,j}P(x_{t+1},x_{t+2},...,x_T|{\color{red}y_{t+1}=q_j},\lambda)\\ &= \sum_{j=1}^N A_{i,j}P(x_{t+1},x_{t+2},...,x_T|{\color{red}y_{t+1}=q_j},{\color{blue}y_t=q_i},\lambda)\\ &= \sum_{j=1}^N P(y_{t+1}=q_j|y_t=q_i,\lambda)P(x_{t+1},x_{t+2},...,x_T|{\color{red}y_{t+1}=q_j},{\color{blue}y_t=q_i},\lambda)\\ &= \sum_{j=1}^N P(x_{t+1},x_{t+2},...,x_T,{\color{red}y_{t+1}=q_j}|{\color{blue}y_t=q_i},\lambda)\\ &= P(x_{t+1},x_{t+2},...,x_T|{\color{blue}y_t=q_i},\lambda)=\beta_{t}(i)\\ \end{aligned}
$$
First blue term: observation independence; second blue term: observation independence again ($x_{t+1},\ldots,x_T$ are all independent of $y_t$ given $y_{t+1}$).
Proof of the termination formula:
$$
\begin{aligned} \pi_iB_{i,x_1}\beta_1(i)&=P(y_1=q_i|\lambda)P(x_1|y_1=q_i,\lambda){\color{red}P(x_2,x_3,...,x_T|y_1=q_i,\lambda)}\\ &=P(x_1,y_1=q_i|\lambda){\color{red}P(x_2,x_3,...,x_T|{\color{blue}{x_1}},y_1=q_i,\lambda)}\\ &=P(x_1,x_2,...,x_T,y_1=q_i|\lambda)\\ \sum_{i=1}^N \pi_iB_{i,x_1}\beta_1(i) &= \sum_{i=1}^N P(x_1,x_2,...,x_T,y_1=q_i|\lambda)=P(x_1,x_2,...,x_T|\lambda)=P(X|\lambda)\\ \end{aligned}
$$
Using the definitions of the forward and backward probabilities, the observation sequence probability $P(X|\lambda)$ can also be written as:
$$P(X|\lambda)=\sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j),\quad t= 1,2,...,T-1$$
Proof:
$$
\begin{aligned} \sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j) &= \sum_{i=1}^N\sum_{j=1}^N P(x_1,...,x_t,y_t=q_i|\lambda)P(y_{t+1}=q_j|y_t=q_i,\lambda)\\&\qquad * P(x_{t+1}|y_{t+1}=q_j,\lambda)P(x_{t+2},...,x_T|y_{t+1}=q_j,\lambda)\\ &= \sum_{i=1}^N\sum_{j=1}^N P(x_1,...,x_t|y_t=q_i,\lambda)P(y_t=q_i|\lambda)P(y_{t+1}=q_j|y_t=q_i,\lambda)\\&\qquad * P(x_{t+1}|{\color{red}x_{t+2},...,x_{T}},y_{t+1}=q_j,\lambda)P(x_{t+2},...,x_T|y_{t+1}=q_j,\lambda)\\ &= \sum_{i=1}^N\sum_{j=1}^N P(x_1,...,x_t|y_t=q_i,{\color{blue}y_{t+1}=q_j},\lambda)P(y_t=q_i|\lambda)P(y_{t+1}=q_j|y_t=q_i,\lambda)\\&\qquad * P(x_{t+1},x_{t+2},...,x_{T}|y_{t+1}=q_j,\lambda)\\ &= \sum_{i=1}^N\sum_{j=1}^N P(x_1,...,x_t,{\color{blue}y_{t+1}=q_j}|y_t=q_i,\lambda)P(y_t=q_i|\lambda)\\&\qquad * P(x_{t+1},...,x_{T}|{\color{red}x_1,...,x_t},y_{t+1}=q_j,{\color{blue}y_t=q_i},\lambda)\\ &= \sum_{i=1}^N\sum_{j=1}^N P(y_t=q_i|\lambda)P(x_1,...,x_t,y_{t+1}=q_j,x_{t+1},...,x_{T}|y_t=q_i,\lambda)\\ &= \sum_{i=1}^N P(y_t=q_i|\lambda)P(x_1,...,x_T|y_t=q_i,\lambda)\\ &= \sum_{i=1}^N P(x_1,...,x_T,y_t=q_i|\lambda)\\ &= P(x_1,x_2,...,x_T|\lambda)=P(X|\lambda)\\ \end{aligned}
$$
2.3.2 Backward Algorithm in Python
```python
import numpy as np


class HiddenMarkov:
    def backward(self, Q, V, A, B, X, PI):  # backward algorithm
        N = len(Q)  # number of hidden states
        M = len(X)  # length of the observation sequence
        betas = np.ones((N, M))  # matrix of backward probabilities
        for i in range(N):
            print("beta%d(%d)=1" % (M, i))  # backward probabilities initialized to 1
        for t in range(M - 2, -1, -1):
            indexOfXi = V.index(X[t + 1])  # index in V of the observation x_{t+1}
            for i in range(N):
                betas[i][t] = np.dot(
                    np.multiply(A[i], [b[indexOfXi] for b in B]),  # row i of A times column x_{t+1} of B
                    [beta[t + 1] for beta in betas])  # recursion formula
                realT = t + 1  # actual time index, starting from 1
                realI = i + 1  # state index, starting from 1
                print("beta%d(%d)=[sigma A%djBj(x%d)beta%d(j)]=("
                      % (realT, realI, realI, realT + 1, realT + 1), end='')  # end='' keeps printing on the same line
                for j in range(N):
                    print("%.2f*%.2f*%.2f+" % (A[i][j], B[j][indexOfXi], betas[j][t + 1]), end=' ')
                print("0)=%.3f" % betas[i][t])
        print(betas)
        indexOfXi = V.index(X[0])
        P = np.dot(np.multiply(PI, [b[indexOfXi] for b in B]),
                   [beta[0] for beta in betas])
        print("P(X|lambda)=", end=" ")
        for i in range(N):
            print("%.1f*%.1f*%.5f+" % (PI[0][i], B[i][indexOfXi], betas[i][0]), end=" ")
        print("0=%f" % P)


Q = [1, 2, 3]
V = ['红', '白']
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
# X = ['红', '白', '红', '红', '白', '红', '白', '白']
X = ['红', '白', '红']  # the example from the book
PI = [[0.2, 0.4, 0.4]]

hmm = HiddenMarkov()
hmm.backward(Q, V, A, B, X, PI)
```
Output:
```
beta3(0)=1
beta3(1)=1
beta3(2)=1
beta2(1)=[sigma A1jBj(x3)beta3(j)]=(0.50*0.50*1.00+ 0.20*0.40*1.00+ 0.30*0.70*1.00+ 0)=0.540
beta2(2)=[sigma A2jBj(x3)beta3(j)]=(0.30*0.50*1.00+ 0.50*0.40*1.00+ 0.20*0.70*1.00+ 0)=0.490
beta2(3)=[sigma A3jBj(x3)beta3(j)]=(0.20*0.50*1.00+ 0.30*0.40*1.00+ 0.50*0.70*1.00+ 0)=0.570
beta1(1)=[sigma A1jBj(x2)beta2(j)]=(0.50*0.50*0.54+ 0.20*0.60*0.49+ 0.30*0.30*0.57+ 0)=0.245
beta1(2)=[sigma A2jBj(x2)beta2(j)]=(0.30*0.50*0.54+ 0.50*0.60*0.49+ 0.20*0.30*0.57+ 0)=0.262
beta1(3)=[sigma A3jBj(x2)beta2(j)]=(0.20*0.50*0.54+ 0.30*0.60*0.49+ 0.50*0.30*0.57+ 0)=0.228
[[0.2451 0.54   1.    ]
 [0.2622 0.49   1.    ]
 [0.2277 0.57   1.    ]]
P(X|lambda)= 0.2*0.5*0.24510+ 0.4*0.4*0.26220+ 0.4*0.7*0.22770+ 0=0.130218
```
The probability matches the result of the forward algorithm.
2.4 Some Probabilities and Expectations
Result 1: $P(X,y_t=q_i|\lambda)=\alpha_t(i)\beta_t(i)$
Proof:
$$
\begin{aligned} P(X,y_t=q_i|\lambda)&=P(x_1,x_2,...,x_t,x_{t+1},...,x_T|y_t=q_i,\lambda)P(y_t=q_i|\lambda)\\ &=P(x_1,x_2,...,x_t|y_t=q_i,\lambda)P(x_{t+1},...,x_T|y_t=q_i,\lambda)P(y_t=q_i|\lambda)\\ &=P(x_1,x_2,...,x_t,y_t=q_i | \lambda)P(x_{t+1},...,x_T|y_t=q_i,\lambda)\\ &= \alpha_t(i)\beta_t(i) \end{aligned}
$$
Result 2:
Given the model $\lambda$ and the observation sequence $X$, the probability of being in state $q_i$ at time $t$ is:
$$\gamma_t(i)=P(y_t=q_i|X,\lambda)=\frac{ \alpha_t(i)\beta_t(i)}{\sum_{i=1}^N \alpha_t(i)\beta_t(i)}$$
Proof:
$$
\begin{aligned} \gamma_t(i)&=P(y_t=q_i|X,\lambda)=\frac{P(y_t=q_i,X|\lambda)}{P(X|\lambda)}\\ &=\frac{ \alpha_t(i)\beta_t(i)}{\sum_{i=1}^N P(y_t=q_i,X|\lambda)}\\ &=\frac{ \alpha_t(i)\beta_t(i)}{\sum_{i=1}^N \alpha_t(i)\beta_t(i)} \end{aligned}
$$
Result 3:
Given the model $\lambda$ and the observation sequence $X$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is:
$$\xi_t(i,j)=P(y_t=q_i,y_{t+1}=q_j |X, \lambda)=\frac{\alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j)}{\sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j)}$$
Proof:
From the results above: $P(X|\lambda)=\sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j)$
and $\alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j)=P(X,y_t=q_i,y_{t+1}=q_j | \lambda)$, so
$$
\begin{aligned} \xi_t(i,j)&=P(y_t=q_i,y_{t+1}=q_j |X, \lambda)\\ &=\frac{P(X,y_t=q_i,y_{t+1}=q_j | \lambda)}{P(X|\lambda)}\\ &=\frac{\alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j)}{\sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)A_{i,j}B_{j,x_{t+1}}\beta_{t+1}(j)} \end{aligned}
$$
Result 4:
- Expected number of times state $q_i$ appears under observation sequence $X$: $\sum_{t=1}^T \gamma_t(i)$
- Expected number of transitions out of state $q_i$ under observation sequence $X$: $\sum_{t=1}^{T-1} \gamma_t(i)$
- Expected number of transitions from state $q_i$ to state $q_j$ under observation sequence $X$: $\sum_{t=1}^{T-1} \xi_t(i,j)$ (a sketch for computing $\gamma$ and $\xi$ follows)
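These quantities are the building blocks of Baum-Welch. A minimal numpy sketch, assuming `alphas` and `betas` are the (N, T) matrices built by the forward/backward code above (they would need to be returned rather than only printed); the function name is my own:

```python
import numpy as np

def gamma_xi(alphas, betas, A, B, V, X):
    """Compute gamma_t(i) and xi_t(i,j) from forward/backward matrices.

    Returns gamma with shape (N, T) and xi with shape (T-1, N, N).
    """
    N, T = alphas.shape
    gamma = alphas * betas
    gamma /= gamma.sum(axis=0, keepdims=True)  # each column sums to P(X|lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        obs = V.index(X[t + 1])  # index of observation x_{t+1}
        for i in range(N):
            for j in range(N):
                xi[t, i, j] = alphas[i, t] * A[i][j] * B[j][obs] * betas[j, t + 1]
        xi[t] /= xi[t].sum()  # normalize by P(X|lambda)
    return gamma, xi
```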
3. Learning Algorithms
Learning for hidden Markov models splits according to the available training data:

| Training data | Learning method |
| --- | --- |
| observation sequence $X$ + corresponding state sequence $Y$ | supervised learning |
| observation sequence $X$ only | unsupervised learning (Baum-Welch / EM algorithm) |
3.1 Supervised Learning Method
- Transition probability $A$, $i \to j$: frequency of transitions from state $i$ to state $j$ ÷ frequency of transitions from state $i$ to any state.
- Observation probability $B$, state $i$ observing $k$: frequency of observing $k$ in state $i$ ÷ frequency of all observations made in state $i$.
- Initial state probability $\pi$: the frequency with which $y_i$ appears as the initial state in the samples.
Since supervised learning requires labeled training data, and manual labeling is often expensive, unsupervised methods are needed. A counting sketch of the supervised estimates is given below.
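Because the supervised estimates are just normalized counts, they fit in a few lines. A sketch — the input format (a list of sequences of (state, observation) tuples) and all names are my own illustration:

```python
import numpy as np

def _normalize_rows(M):
    s = M.sum(axis=1, keepdims=True)
    # leave all-zero rows as zeros instead of dividing by zero
    return np.divide(M, s, out=np.zeros_like(M), where=s > 0)

def supervised_estimate(sequences, Q, V):
    """Estimate (pi, A, B) by frequency counting from labeled sequences.

    sequences: list of labeled sequences, each a list of (state, observation) tuples.
    Q: list of states; V: list of observation symbols.
    """
    N, M = len(Q), len(V)
    pi = np.zeros(N)
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    for seq in sequences:
        pi[Q.index(seq[0][0])] += 1  # count the first state of each sequence
        for (y_prev, _), (y_next, _) in zip(seq, seq[1:]):
            A[Q.index(y_prev), Q.index(y_next)] += 1  # transition counts
        for y, x in seq:
            B[Q.index(y), V.index(x)] += 1  # emission counts
    return pi / pi.sum(), _normalize_rows(A), _normalize_rows(B)
```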
3.2 Unsupervised Baum-Welch Algorithm
(Skipped for now.)
4. Prediction Algorithms
4.1 Approximate Algorithm
By Result 2 of section 2.4, the probability of being in state $q_i$ at each time t can be computed.
The most likely state at each time t is then simply the state with the largest probability (see the sketch below).
- Advantage: the computation is simple.
- Drawback: there is no guarantee that the predicted state sequence is the most likely sequence as a whole, because the predicted sequence may contain parts that cannot actually occur. In fact, the resulting sequence may contain adjacent states whose transition probability is 0. Even so, the approximate algorithm is still useful.
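A sketch of the approximate algorithm, assuming a `gamma` matrix as computed in the sketch of section 2.4 (function name mine):

```python
import numpy as np

def approximate_decode(gamma, Q):
    """Pick the individually most likely state at each time step.

    gamma: (N, T) matrix of gamma_t(i). Note: the resulting sequence may
    contain adjacent states whose transition probability is 0.
    """
    return [Q[i] for i in np.argmax(gamma, axis=0)]
```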
4.2 Viterbi Algorithm
The Viterbi algorithm solves the HMM prediction problem with dynamic programming: it finds the path of maximum probability (the optimal path), where each path corresponds to a state sequence.
Two definitions:
- The maximum probability over all single paths that are in state i at time t:
$$\delta_t(i) = \max \limits_{y_1,y_2,...,y_{t-1}}P(y_t=i,y_{t-1},...,y_1,x_t,...,x_1|\lambda)$$
- From this definition, the recursion for $\delta$:
$$\begin{aligned} \delta_{t+1}(i) &= \max \limits_{y_1,y_2,...,y_{t}}P(y_{t+1}=i,y_{t},...,y_1,x_{t+1},...,x_1|\lambda)\\ &= \max \limits_{1 \leq j \leq N }[\delta_t(j)A_{j,i}]B_i(x_{t+1}) \end{aligned}$$
- The (t-1)-th node of the maximum-probability path that is in state i at time t:
$$\Psi_t(i)=\arg \max \limits_{1 \leq j \leq N }[\delta_{t-1}(j)A_{j,i}]$$
The Viterbi algorithm:
- Input: model $\lambda=(A,B,\pi)$, observation sequence $X=(x_1,x_2,...,x_T)$
- Output: optimal path $Y^*=(y_1^*,y_2^*,...,y_T^*)$
- Initialization:
$\delta_1(i)=\pi_iB_i(x_1)$; $\quad\Psi_1(i)=0$
- Recursion: for $t=2,3,...,T$,
$$\begin{aligned} \delta_{t}(i) &= \max \limits_{1 \leq j \leq N }[\delta_{t-1}(j)A_{j,i}]B_i(x_{t})\\ \Psi_t(i)&=\arg \max \limits_{1 \leq j \leq N }[\delta_{t-1}(j)A_{j,i}] \end{aligned}$$
- Termination: $P^* = \max \limits_{1 \leq i \leq N}\delta_T(i)$, $\quad y_T^*=\arg \max \limits_{1 \leq i \leq N}[\delta_T(i)]$
- Backtrack the optimal path: for $t=T-1,T-2,...,1$,
$$y_t^*=\Psi_{t+1}(y_{t+1}^*)$$
Example:
Consider the box-and-ball model $\lambda = (A,B,\pi)$ with state set (box numbers) $Q=\{1,2,3\}$ and observation set $V=\{红,白\}$:
$$\pi=\begin{bmatrix} 0.2 \\ 0.4 \\ 0.4 \\ \end{bmatrix}, \quad A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \\ \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \\ \end{bmatrix}$$
Given the observation sequence $X=(红,白,红)$, find the optimal state sequence $Y^*$.
Solution:
(1) Initialization
At time $t=1$, for each state i, i = 1,2,3, compute the probability of being in state i and observing $x_1$ = red, written $\delta_1(i)$:
$\delta_1(i)=\pi_iB_i(x_1)=\pi_iB_i(红),\quad i=1,2,3$
$\delta_1(1)=0.2*0.5=0.10$
$\delta_1(2)=0.4*0.4=0.16$
$\delta_1(3)=0.4*0.7=0.28$
$\Psi_1(i)=0,\quad i=1,2,3$
(2) At time $t=2$, for each state i, i = 1,2,3, find the maximum probability over paths that are in some state $j$ observing red at $t=1$ and in state i observing $x_2$ = white at $t=2$; denote this maximum probability by $\delta_2(i)$:
$$\delta_2(i)=\max \limits_{1 \leq j \leq 3 }[\delta_{1}(j)A_{j,i}]B_i(x_{2})$$
At the same time, for each state $i, i=1,2,3,$ record the previous state $j$ on the maximum-probability path:
$$\Psi_2(i) = \arg \max \limits_{1 \leq j \leq 3}[\delta_1(j)A_{j,i}],\quad i=1,2,3$$
Computation:
$\delta_2(1)=\max \limits_{1 \leq j \leq 3 }[\delta_{1}(j)A_{j,1}]B_1(x_{2}) = \max \limits_j \{0.10*0.5, 0.16*0.3, 0.28*0.2\}*0.5=0.056*0.5=0.028$
$\Psi_2(1) = 3$
$\delta_2(2)=\max \limits_{1 \leq j \leq 3 }[\delta_{1}(j)A_{j,2}]B_2(x_{2}) = \max \limits_j \{0.10*0.2, 0.16*0.5, 0.28*0.3\}*0.6=0.084*0.6=0.0504$
$\Psi_2(2) = 3$
$\delta_2(3)=\max \limits_{1 \leq j \leq 3 }[\delta_{1}(j)A_{j,3}]B_3(x_{2}) = \max \limits_j \{0.10*0.3, 0.16*0.2, 0.28*0.5\}*0.3=0.14*0.3=0.042$
$\Psi_2(3) = 3$
At time $t=3$:
$$\delta_3(i)=\max \limits_{1 \leq j \leq 3 }[\delta_{2}(j)A_{j,i}]B_i(x_{3}),\qquad \Psi_3(i) = \arg \max \limits_{1 \leq j \leq 3 } [\delta_2(j)A_{j,i}]$$
$\delta_3(1)=\max \limits_{1 \leq j \leq 3 }[\delta_{2}(j)A_{j,1}]B_1(x_{3}) = \max \limits_j \{0.028*0.5, 0.0504*0.3, 0.042*0.2\}*0.5=0.01512*0.5=0.00756$
$\Psi_3(1) = 2$
$\delta_3(2)=\max \limits_{1 \leq j \leq 3 }[\delta_{2}(j)A_{j,2}]B_2(x_{3}) = \max \limits_j \{0.028*0.2, 0.0504*0.5, 0.042*0.3\}*0.4=0.0252*0.4=0.01008$
$\Psi_3(2) = 2$
$\delta_3(3)=\max \limits_{1 \leq j \leq 3 }[\delta_{2}(j)A_{j,3}]B_3(x_{3}) = \max \limits_j \{0.028*0.3, 0.0504*0.2, 0.042*0.5\}*0.7=0.021*0.7=0.0147$
$\Psi_3(3) = 3$
(3) Let $P^*$ denote the probability of the optimal path; then
$P^* = \max \limits_{1 \leq i \leq 3}\delta_3(i)=0.0147$
The endpoint of the optimal path is $i_3^*=\arg \max \limits_i [\delta_3(i)]=3$
(4) From the endpoint $i_3^*$, backtrack along the path:
$t=2, \quad i_2^* = \Psi_3(i_3^*)=\Psi_3(3)=3$
$t=1, \quad i_1^* = \Psi_2(i_2^*)=\Psi_2(3)=3$
This gives the optimal path, i.e. the optimal state sequence $Y^*=(i_1^*,i_2^*,i_3^*)=(3,3,3)$.
4.3 Viterbi Algorithm in Python
```python
import numpy as np


class HiddenMarkov:
    def viterbi(self, Q, V, A, B, X, PI):  # Viterbi algorithm: find the optimal state sequence
        N = len(Q)  # number of hidden states
        M = len(X)  # length of the observation sequence
        deltas = np.zeros((N, M))
        psis = np.zeros((N, M))
        Y = np.zeros((1, M))  # optimal (highest-probability) state sequence
        for t in range(M):
            realT = t + 1  # actual time index, starting from 1
            indexOfXi = V.index(X[t])  # index in V of the observation X[t]
            for i in range(N):
                realI = i + 1  # state index, starting from 1
                if t == 0:
                    deltas[i][t] = PI[0][i] * B[i][indexOfXi]  # initialization
                    psis[i][t] = 0
                    print("delta1(%d)=pi_%d * B%d(x1)=%.2f * %.2f=%.2f"
                          % (realI, realI, realI, PI[0][i], B[i][indexOfXi], deltas[i][t]))
                    print('psis1(%d)=0' % realI)
                else:
                    deltas[i][t] = np.max(np.multiply([delta[t - 1] for delta in deltas],
                                                      [a[i] for a in A])) * B[i][indexOfXi]  # recursion formula
                    print("delta%d(%d)=max[delta%d(j)Aj%d]B%d(x%d)=%.2f*%.2f=%.5f"
                          % (realT, realI, realT - 1, realI, realI, realT,
                             np.max(np.multiply([delta[t - 1] for delta in deltas], [a[i] for a in A])),
                             B[i][indexOfXi], deltas[i][t]))
                    psis[i][t] = np.argmax(np.multiply([delta[t - 1] for delta in deltas],
                                                       [a[i] for a in A])) + 1  # indices start from 1
                    print("psis%d(%d)=argmax[delta%d(j)Aj%d]=%d" % (realT, realI, realT - 1, realI, psis[i][t]))
        print(deltas)
        print(psis)
        Y[0][M - 1] = np.argmax([delta[M - 1] for delta in deltas]) + 1  # indices start from 1
        print("Y%d=argmax[deltaT(i)]=%d" % (M, Y[0][M - 1]))  # endpoint of the optimal path
        for t in range(M - 2, -1, -1):  # backtrack the earlier path in reverse
            Y[0][t] = psis[int(Y[0][t + 1]) - 1][t + 1]  # previous node on the path
            print("Y%d=psis%d(Y%d)=%d" % (t + 1, t + 2, t + 2, Y[0][t]))
        print("最大概率的状态序列Y是: ", Y)


Q = [1, 2, 3]
V = ['红', '白']
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
# X = ['红', '白', '红', '红', '白', '红', '白', '白']
X = ['红', '白', '红']  # the example from the book
PI = [[0.2, 0.4, 0.4]]

hmm = HiddenMarkov()
hmm.viterbi(Q, V, A, B, X, PI)
```
Output (consistent with the calculation above):
```
delta1(1)=pi_1 * B1(x1)=0.20 * 0.50=0.10
psis1(1)=0
delta1(2)=pi_2 * B2(x1)=0.40 * 0.40=0.16
psis1(2)=0
delta1(3)=pi_3 * B3(x1)=0.40 * 0.70=0.28
psis1(3)=0
delta2(1)=max[delta1(j)Aj1]B1(x2)=0.06*0.50=0.02800
psis2(1)=argmax[delta1(j)Aj1]=3
delta2(2)=max[delta1(j)Aj2]B2(x2)=0.08*0.60=0.05040
psis2(2)=argmax[delta1(j)Aj2]=3
delta2(3)=max[delta1(j)Aj3]B3(x2)=0.14*0.30=0.04200
psis2(3)=argmax[delta1(j)Aj3]=3
delta3(1)=max[delta2(j)Aj1]B1(x3)=0.02*0.50=0.00756
psis3(1)=argmax[delta2(j)Aj1]=2
delta3(2)=max[delta2(j)Aj2]B2(x3)=0.03*0.40=0.01008
psis3(2)=argmax[delta2(j)Aj2]=2
delta3(3)=max[delta2(j)Aj3]B3(x3)=0.02*0.70=0.01470
psis3(3)=argmax[delta2(j)Aj3]=3
[[0.1     0.028   0.00756]
 [0.16    0.0504  0.01008]
 [0.28    0.042   0.0147 ]]
[[0. 3. 2.]
 [0. 3. 2.]
 [0. 3. 3.]]
Y3=argmax[deltaT(i)]=3
Y2=psis3(Y3)=3
Y1=psis2(Y2)=3
最大概率的状态序列Y是:  [[3. 3. 3.]]
```
- Full code
5. Practice: POS Tagging
HMM-based Chinese part-of-speech tagging (POS tagging): the hidden states are the POS tags and the observations are the words; a toy sketch follows.
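As a taste of the setup, here is a minimal, made-up sketch of how the HMM pieces map onto POS tagging. The toy corpus, the tag set, and the reuse of `supervised_estimate` from section 3.1 are my own illustration, not the practice article's code:

```python
import numpy as np

# Each training sentence is a list of (tag, word) pairs -- a made-up toy corpus.
corpus = [
    [('r', '我'), ('v', '爱'), ('n', '北京')],
    [('n', '天气'), ('d', '很'), ('a', '好')],
]
Q = sorted({y for sent in corpus for y, _ in sent})  # POS tags = hidden states
V = sorted({x for sent in corpus for _, x in sent})  # words = observations

# Estimate (pi, A, B) by counting, as in section 3.1. Decoding a new
# sentence is then just Viterbi over (pi, A, B); unseen words and tags
# with no outgoing transitions need smoothing in practice.
pi, A, B = supervised_estimate(corpus, Q, V)
print(Q, '\n', np.round(A, 2), '\n', np.round(B, 2))
```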