Naive Bayes Classifier
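Naive Bayes is a classification algorithm based on density estimation that uses Bayes' theorem to make predictions. Its core assumption is that the features are conditionally independent given the class. Although this assumption usually does not hold in practice, naive Bayes classifiers still tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior probability is close to the decision boundary (0.5).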
A naive Bayes classifier assigns each observation to the most probable class using the maximum a posteriori (MAP) decision rule.
The procedure is as follows:
- Density estimation: estimate the density of each feature within each class.
- Posterior modeling: compute the posterior probabilities using Bayes' rule. For all classes $k = 1, \ldots, K$,
$$\widehat{P}(Y = k \mid X_1, \ldots, X_d) = \frac{P(Y = k) \prod_{j=1}^{d} P(X_j \mid Y = k)}{\sum_{k'=1}^{K} P(Y = k') \prod_{j=1}^{d} P(X_j \mid Y = k')},$$
where:
- $Y$ is the random variable giving the class of an observation.
- $X_1, \ldots, X_d$ are the feature variables of an observation.
- $P(Y = k)$ is the prior probability of class $k$.
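- Classification: compare the posterior probabilities across classes and assign the observation to the class with the largest posterior probability.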
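The posterior and classification steps can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function names `posterior` and `map_class` and the toy numbers are assumptions made for the example, and the class-conditional values P(x_j | Y = k) are taken as already estimated.

```python
from math import prod


def posterior(priors, likelihoods):
    """Return the posterior P(Y = k | x) for every class k.

    priors[k] is P(Y = k); likelihoods[k][j] is P(x_j | Y = k) evaluated
    at the observed value of feature j (hypothetical inputs).
    """
    # Numerator of Bayes' rule under the conditional-independence assumption.
    joint = [p * prod(lk) for p, lk in zip(priors, likelihoods)]
    # Denominator: the same quantity summed over all classes.
    evidence = sum(joint)
    return [j / evidence for j in joint]


def map_class(priors, likelihoods):
    """Index of the class with the largest posterior (MAP decision rule)."""
    post = posterior(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)


# Toy example: two classes, two features (made-up numbers).
priors = [0.6, 0.4]
likelihoods = [[0.2, 0.5], [0.7, 0.3]]
post = posterior(priors, likelihoods)
predicted = map_class(priors, likelihoods)
```

With many features the product of small densities can underflow, so practical implementations usually sum log-densities instead; the direct product is kept here only to mirror the formula.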
Two Density Estimation Methods
Normal (Gaussian) Distribution
The ‘normal’ distribution (specify using ‘normal’) is appropriate for predictors that have normal distributions in each class. For each predictor you model with a normal distribution, the naive Bayes classifier estimates a separate normal distribution for each class by computing the mean and standard deviation of the training data in that class.
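As a rough sketch of what fitting a separate normal distribution per class involves, the Python snippet below computes the mean and standard deviation of one predictor within each class and evaluates the resulting density. The helper names and the toy data are assumptions made for illustration, and the use of the sample standard deviation (n - 1 denominator) is also an assumption rather than a statement about any particular software.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev


def fit_normal_per_class(values, labels):
    """Estimate (mean, standard deviation) of one predictor for each class."""
    params = {}
    for k in set(labels):
        xs = [v for v, y in zip(values, labels) if y == k]
        # Assumption: sample standard deviation (n - 1 in the denominator).
        params[k] = (mean(xs), stdev(xs))
    return params


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


# Toy training data for a single predictor (made-up values).
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]
params = fit_normal_per_class(values, labels)

# Class-conditional density of a new observation under each class.
x_new = 2.5
density_a = normal_pdf(x_new, *params["a"])
density_b = normal_pdf(x_new, *params["b"])
```

Here `density_a` and `density_b` play the role of P(X_j | Y = k) in the posterior formula for the classes "a" and "b".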
Kernel Distribution
The ‘kernel’ distribution (specify using ‘kernel’) is appropriate for predictors that have a continuous distribution. It does not require a strong assumption such as a normal distribution and you can use it in cases where the distribution of a predictor may be skewed or have multiple peaks or modes. It requires more computing time and more memory than the normal distribution. For each predictor you model with a kernel distribution, the naive Bayes classifier computes a separate kernel density estimate for each class based on the training data for that class. By default the kernel is the normal kernel, and the classifier selects a width automatically for each class and predictor. The software supports specifying different kernels for each predictor, and different widths for each predictor or class.
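To make the idea of a per-class kernel density estimate concrete, here is a minimal Python sketch using a normal kernel. The function name `kernel_density`, the bimodal toy sample, and the hand-picked width of 0.5 are all assumptions made for illustration; automatic width selection, which the text mentions, is not reproduced here.

```python
from math import exp, pi, sqrt


def kernel_density(x, samples, width):
    """Normal-kernel density estimate at x from one class's training samples.

    Each training point contributes a normal bump with standard deviation
    `width`; the estimate is the average of those bumps.
    """
    n = len(samples)
    total = sum(exp(-0.5 * ((x - s) / width) ** 2) for s in samples)
    return total / (n * width * sqrt(2 * pi))


# Toy bimodal sample for one predictor within one class (made-up values).
samples = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
width = 0.5  # hand-picked for the example, not selected automatically

density_at_mode = kernel_density(2.0, samples, width)
density_at_gap = kernel_density(0.0, samples, width)
```

The estimate is high near both clusters and low in the gap between them, which a single fitted normal distribution could not capture. Each evaluation loops over every training sample, which is one reason this approach costs more time and memory than the normal fit.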
Theorem. Let $(X_1, X_2, \cdots, X_n)$ be an $n$-dimensional continuous random vector with joint probability density function $f(x_1, x_2, \cdots, x_n)$, and let $f_{X_i}(x_i)$ be the marginal probability density function of $X_i$ ($i = 1, 2, \cdots, n$). Then the random variables $X_1, X_2, \cdots, X_n$ are mutually independent if and only if
$$f(x_1, x_2, \cdots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i),$$
where $(x_1, x_2, \cdots, x_n)$ is an arbitrary tuple of real numbers.
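The factorization can be checked numerically on a simple case. The Python sketch below uses a bivariate normal distribution with standard normal marginals, chosen purely as an illustrative assumption: with correlation 0 the joint density equals the product of the marginals, while with a nonzero correlation it does not. The function names and the evaluation point are hypothetical.

```python
from math import exp, pi, sqrt


def standard_normal_pdf(x):
    """Marginal density of a standard normal variable."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def bivariate_normal_pdf(x1, x2, rho):
    """Joint density of two standard normal variables with correlation rho."""
    z = (x1 * x1 - 2 * rho * x1 * x2 + x2 * x2) / (1 - rho * rho)
    return exp(-0.5 * z) / (2 * pi * sqrt(1 - rho * rho))


point = (0.5, -1.0)  # arbitrary evaluation point
product_of_marginals = standard_normal_pdf(point[0]) * standard_normal_pdf(point[1])

# Independent case: the joint density factorizes.
joint_independent = bivariate_normal_pdf(*point, rho=0.0)
# Dependent case: same marginals, but the joint density no longer factorizes.
joint_correlated = bivariate_normal_pdf(*point, rho=0.8)
```

In the naive Bayes setting the same factorization is applied to the class-conditional density of the features, which is what allows each feature's density to be estimated separately.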