LSTM Input and Output Dimensions
CLASS torch.nn.LSTM(*args, **kwargs)
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
where:
- $h_t$: the hidden state at time step $t$
- $c_t$: the cell state at time step $t$
- $x_t$: the input at time step $t$
- $h_{t-1}$: the hidden state at time step $t-1$, or the initial hidden state at time step 0
- $i_t$, $f_t$, $g_t$, $o_t$: the input, forget, cell, and output gates, respectively
- $\sigma$: the sigmoid function
- $\odot$: the Hadamard (element-wise) product
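To make the equations concrete, here is a minimal sketch of a single LSTM time step transcribed directly from the formulas above. The weight and bias names mirror the subscripts in the equations and are hypothetical for this sketch; this illustrates the math, it is not how nn.LSTM is implemented internally (PyTorch fuses the four gates into larger matrices).

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step, transcribed from the equations above.
    W and b are dicts keyed by the subscripts in the formulas,
    e.g. W["ii"], W["hi"], b["ii"], b["hi"] (hypothetical names for this sketch)."""
    i_t = torch.sigmoid(x_t @ W["ii"].T + b["ii"] + h_prev @ W["hi"].T + b["hi"])  # input gate
    f_t = torch.sigmoid(x_t @ W["if"].T + b["if"] + h_prev @ W["hf"].T + b["hf"])  # forget gate
    g_t = torch.tanh(x_t @ W["ig"].T + b["ig"] + h_prev @ W["hg"].T + b["hg"])     # cell gate
    o_t = torch.sigmoid(x_t @ W["io"].T + b["io"] + h_prev @ W["ho"].T + b["ho"])  # output gate
    c_t = f_t * c_prev + i_t * g_t          # * is the Hadamard product
    h_t = o_t * torch.tanh(c_t)
    return h_t, c_t
```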
Constructor parameters:
- input_size: dimensionality of the input
- hidden_size: dimensionality of the hidden state h
- num_layers: number of stacked LSTM layers; default 1
- bias: whether to use bias terms; default True
- batch_first: if True, input and output are shaped (batch, seq, input_size); default False, i.e. (seq_len, batch, input_size)
- bidirectional: if True, the LSTM is bidirectional; default False
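As a quick check of the batch_first convention (a minimal sketch, with arbitrary sizes): it changes the layout of input and output, but not of h_n and c_n.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=20, batch_first=True)
x = torch.rand(50, 5, 100)   # (batch, seq_len, input_size) because batch_first=True
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([50, 5, 20])  -- (batch, seq_len, hidden_size)
print(h_n.shape)     # torch.Size([1, 50, 20])  -- h_n/c_n keep (layers*directions, batch, hidden)
```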
Inputs
Inputs: input, (h_0, c_0)
- input: shape (seq_len, batch, input_size), i.e. (number of tokens in the sequence, batch size, length of each token vector)
- h_0: shape (num_layers * num_directions, batch, hidden_size), i.e. (number of layers × number of LSTM directions (1 for unidirectional, 2 for bidirectional), batch size, hidden vector dimensionality)
- c_0: shape (num_layers * num_directions, batch, hidden_size), the same layout as h_0
- If (h_0, c_0) is not provided, both h_0 and c_0 default to zeros.
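A small sketch of passing an explicit initial state (sizes arbitrary); since the default is all zeros, passing zero tensors explicitly gives the same result as omitting the tuple:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=2)
x = torch.rand(5, 50, 100)                        # (seq_len, batch, input_size)
h_0 = torch.zeros(2, 50, 20)                      # (num_layers * num_directions, batch, hidden_size)
c_0 = torch.zeros(2, 50, 20)
out_explicit, _ = lstm(x, (h_0, c_0))
out_default, _ = lstm(x)                          # omitting (h_0, c_0) uses zeros
print(torch.allclose(out_explicit, out_default))  # True
```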
Outputs
Outputs: output, (h_n, c_n)
- output: shape (seq_len, batch, num_directions * hidden_size), i.e. (number of tokens in the sequence, batch size, number of directions × hidden dimensionality); it contains the hidden state h_t of the last layer at every time step
- h_n: shape (num_layers * num_directions, batch, hidden_size), the hidden state at the final time step for every layer and direction
- c_n: shape (num_layers * num_directions, batch, hidden_size), the cell state at the final time step
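The two outputs overlap at one point: output keeps every time step of the last layer, while h_n keeps the last time step of every layer. A small sketch of that relation for the single-layer, unidirectional case:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=20)  # num_layers=1, unidirectional
x = torch.rand(5, 50, 100)
output, (h_n, c_n) = lstm(x)
# output[t] is h_t of the last (here: only) layer; h_n[0] is h_T of that layer.
print(torch.allclose(output[-1], h_n[0]))       # True for 1 layer, 1 direction
```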
Examples
- num_layers = 2
```python
import torch
import torch.nn as nn

x = torch.rand(5, 50, 100)             # (seq_len, batch, input_size)
lstm = nn.LSTM(100, 20, num_layers=2)  # input_size=100, hidden_size=20, 2 stacked layers
output, (hidden, cell) = lstm(x)
print("output size:{} \nhidden size:{} \ncell size:{}".format(output.size(), hidden.size(), cell.size()))
```
Output:
```
output size:torch.Size([5, 50, 20])
hidden size:torch.Size([2, 50, 20])
cell size:torch.Size([2, 50, 20])
```
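Note the leading dimension of hidden and cell: it equals num_layers * num_directions = 2 * 1 here, one slice per layer, while output keeps only the last layer's hidden states, so its last dimension stays hidden_size = 20.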
- bidirectional = True
```python
import torch
import torch.nn as nn

x = torch.rand(5, 50, 100)                   # (seq_len, batch, input_size)
lstm = nn.LSTM(100, 20, bidirectional=True)  # 1 layer, 2 directions
output, (hidden, cell) = lstm(x)
print("output size:{} \nhidden size:{} \ncell size:{}".format(output.size(), hidden.size(), cell.size()))
```
Output:
```
output size:torch.Size([5, 50, 40])
hidden size:torch.Size([2, 50, 20])
cell size:torch.Size([2, 50, 20])
```
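Here output's last dimension doubles to num_directions * hidden_size = 40 because the forward and backward hidden states are concatenated, while hidden and cell get a leading dimension of num_layers * num_directions = 2. The two directions can be split apart with a view, as sketched below (same shapes as the example above):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(100, 20, bidirectional=True)
x = torch.rand(5, 50, 100)
output, (hidden, cell) = lstm(x)        # output: (5, 50, 40)
directions = output.view(5, 50, 2, 20)  # (seq_len, batch, num_directions, hidden_size)
forward, backward = directions[:, :, 0], directions[:, :, 1]
print(forward.shape, backward.shape)    # torch.Size([5, 50, 20]) each
```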