Linear Transformations in PyTorch: nn.Parameter vs nn.Linear

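self.weight = nn.Parameter(torch.randn(in_channels, out_channels)) and self.linear = nn.Linear(in_channels, out_channels) are not exactly equivalent. Both can implement a linear transformation (a fully connected layer), but they differ in how you use them and in how they are implemented internally.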

`nn.Parameter`

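When you create an nn.Parameter by hand, you are explicitly defining the weight matrix, and you are responsible for managing that parameter and for deciding how it enters the computation. For example: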

self.weight = nn.Parameter(torch.randn(in_channels, out_channels))

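Here, self.weight is a learnable parameter that is registered as part of the model, and in the forward pass you multiply it with the input yourself. Given an input x, the output can be computed as: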

output = torch.matmul(x, self.weight)

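Note that the formula here is $Y = XW$, where $X$ is the input matrix (one sample per row) and $W$ is the weight matrix. With a bias term $b$ it becomes $Y = XW + b$, in which case you must also define and initialize a separate self.bias parameter yourself.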
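What makes an nn.Parameter "part of the model" is that assigning it to an attribute of an nn.Module registers it with that module. The minimal sketch below illustrates this; the class WithParameter and its plain-tensor attribute scale are hypothetical names chosen for illustration. Only the attribute wrapped in nn.Parameter shows up in named_parameters() and state_dict(), so only it would be seen by an optimizer or saved with the model.

```python
import torch
import torch.nn as nn


class WithParameter(nn.Module):  # hypothetical example module
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Wrapped in nn.Parameter: registered as a learnable parameter of the module
        self.weight = nn.Parameter(torch.randn(in_channels, out_channels))
        # Plain tensor attribute (hypothetical): NOT registered, so
        # parameters() and state_dict() ignore it
        self.scale = torch.ones(out_channels)

    def forward(self, x):
        return torch.matmul(x, self.weight) * self.scale


module = WithParameter(3, 2)
names = [name for name, _ in module.named_parameters()]
print(names)                             # ['weight']
print(module.weight.requires_grad)       # True
print(list(module.state_dict().keys()))  # ['weight']
```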

Example 1: Implementing a custom linear layer

import torch
import torch.nn as nn
class CustomLinear(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(CustomLinear, self).__init__()
        # Initialize the weight, shape (in_channels, out_channels)
        self.weight = nn.Parameter(torch.randn(in_channels, out_channels))
        # Initialize the bias, shape (out_channels,)
        self.bias = nn.Parameter(torch.randn(out_channels))
    def forward(self, x):
        # Linear transformation: Y = XW + b
        return torch.matmul(x, self.weight) + self.bias
# Create the custom linear layer
custom_linear = CustomLinear(in_channels=3, out_channels=2)
# Print the weights and bias
print("Weights:", custom_linear.weight)
print("Bias:", custom_linear.bias)
# Input data: 4 samples, each with 3 features
input_data = torch.randn(4, 3)
# Forward pass
output = custom_linear(input_data)
print("Output:", output)

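In this example, we build a custom linear layer, CustomLinear, that uses nn.Parameter to define its weight and bias, and the forward method computes the linear transformation Y = XW + b by hand. Functionally it is similar to nn.Linear, but it makes the management of the weight and bias explicit. Two practical differences remain: the weight is stored as (in_channels, out_channels) rather than nn.Linear's (out_channels, in_channels), and torch.randn initializes from a standard normal distribution, whereas nn.Linear uses a scaled uniform initialization by default.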
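Because the weight and bias are nn.Parameter objects, they are returned by parameters(), receive gradients from backward(), and can be updated by a standard optimizer, exactly as nn.Linear's parameters would be. The sketch below runs one SGD step on the CustomLinear layer from Example 1 (redefined so the block runs on its own); the random regression target and the learning rate of 0.1 are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn


class CustomLinear(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_channels, out_channels))
        self.bias = nn.Parameter(torch.randn(out_channels))

    def forward(self, x):
        return torch.matmul(x, self.weight) + self.bias


torch.manual_seed(0)
layer = CustomLinear(in_channels=3, out_channels=2)
# nn.Parameter attributes are collected by parameters(), so the optimizer sees them
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(4, 3)
target = torch.randn(4, 2)  # hypothetical regression targets

weight_before = layer.weight.detach().clone()
loss = ((layer(x) - target) ** 2).mean()
optimizer.zero_grad()
loss.backward()   # fills layer.weight.grad and layer.bias.grad
print(layer.weight.grad.shape)  # torch.Size([3, 2])
optimizer.step()  # plain SGD: w <- w - lr * grad
print(torch.allclose(layer.weight, weight_before - 0.1 * layer.weight.grad))  # True
```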

`nn.Linear`

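nn.Linear, on the other hand, is a ready-made module provided by PyTorch for performing linear transformations. It holds the weight matrix and also handles the bias term automatically (unless you explicitly pass bias=False). For example: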

self.linear = nn.Linear(in_channels, out_channels)

When you call self.linear(x), it effectively performs the following computation:

output = torch.matmul(x, self.linear.weight.t()) + self.linear.bias

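Here, self.linear.weight has shape (out_channels, in_channels) rather than (in_channels, out_channels), so it must be transposed (with the t() method) before the matrix multiplication. The formula is therefore $Y = XW^T + b$, where $W^T$ is the transpose of the weight matrix.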
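The equivalence described above can be checked directly. The sketch below compares nn.Linear's output with the manual $XW^T + b$ computation and with torch.nn.functional.linear, which computes the same affine map; the small absolute tolerance allows for floating-point rounding differences between kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
linear = nn.Linear(in_features=3, out_features=2)
x = torch.randn(4, 3)

# nn.Linear stores its weight as (out_features, in_features)
print(linear.weight.shape)  # torch.Size([2, 3])
print(linear.bias.shape)    # torch.Size([2])

# Manual computation: Y = X W^T + b
manual = torch.matmul(x, linear.weight.t()) + linear.bias
# The functional form computes the same thing
functional = F.linear(x, linear.weight, linear.bias)

print(torch.allclose(linear(x), manual, atol=1e-6))      # True
print(torch.allclose(linear(x), functional, atol=1e-6))  # True
```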

Example 2: Using `nn.Linear`

import torch
import torch.nn as nn
# Define a linear layer
linear_layer = nn.Linear(in_features=3, out_features=2)
# Print the weights and bias
print("Weights:", linear_layer.weight)
print("Bias:", linear_layer.bias)
# Input data: 4 samples, each with 3 features
input_data = torch.randn(4, 3)
# Forward pass
output = linear_layer(input_data)
print("Output:", output)

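In this example, we create a linear layer that maps input data of shape [4, 3] to output data of shape [4, 2]. linear_layer.weight (shape [2, 3]) and linear_layer.bias (shape [2]) are initialized automatically when the layer is constructed.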
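By default, nn.Linear draws both its weight and its bias from a uniform distribution on $[-1/\sqrt{\text{in\_features}}, 1/\sqrt{\text{in\_features}}]$, and passing bias=False removes the bias parameter entirely. The sketch below checks both behaviours; the 1e-6 slack in the bound comparison is an assumption added only to absorb float32 rounding.

```python
import math

import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)

# Default init: weight and bias lie in [-1/sqrt(in_features), 1/sqrt(in_features)]
bound = 1 / math.sqrt(layer.in_features)
print(layer.weight.abs().max().item() <= bound + 1e-6)  # True (slack for float32 rounding)
print(layer.bias.abs().max().item() <= bound + 1e-6)    # True

# With bias=False, no bias parameter is created
no_bias = nn.Linear(in_features=3, out_features=2, bias=False)
print(no_bias.bias)  # None
print([name for name, _ in no_bias.named_parameters()])  # ['weight']
```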

Comparing the math

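For a manually defined nn.Parameter: if the input $X$ has shape $[N, \text{in\_channels}]$ and the weight $W$ has shape $[\text{in\_channels}, \text{out\_channels}]$, the output $Y$ is computed as $Y = XW$ (or $Y = XW + b$ with a separately defined bias $b$).
For nn.Linear: with the same input $X$ of shape $[N, \text{in\_channels}]$ but a weight $W'$ of shape $[\text{out\_channels}, \text{in\_channels}]$, the output $Y$ is computed as $Y = X(W')^T + b$. The two are therefore equivalent whenever $W' = W^T$.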
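To make the correspondence between the two layouts concrete, the sketch below copies a manually defined $W$ of shape (in_channels, out_channels) into an nn.Linear as $W' = W^T$ and checks that both produce the same output. The sizes 3 and 2 and the random seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_channels, out_channels = 3, 2

# Manual layout: W has shape (in_channels, out_channels), Y = XW + b
W = nn.Parameter(torch.randn(in_channels, out_channels))
b = nn.Parameter(torch.randn(out_channels))

# nn.Linear layout: W' has shape (out_channels, in_channels), Y = X (W')^T + b
linear = nn.Linear(in_channels, out_channels)
with torch.no_grad():
    linear.weight.copy_(W.t())  # W' = W^T
    linear.bias.copy_(b)

x = torch.randn(4, in_channels)
y_manual = torch.matmul(x, W) + b
y_linear = linear(x)
print(torch.allclose(y_manual, y_linear, atol=1e-6))  # True
```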