Part 1: Classification Tree Implementation (Based on the CART Algorithm)
Dataset: MATLAB's built-in Fisher iris dataset (fisheriris), used for a three-class classification task.
Core steps: data loading → feature standardization → model training → visualization → performance evaluation.
%% 1. Data loading and preprocessing
load fisheriris;              % load the built-in iris dataset
X = meas;                     % feature matrix (4 features)
Y = species;                  % class labels (cell array of strings)

% Standardize features (removes scale differences between features)
[X_scaled, mu, sigma] = zscore(X);

% Split into training and test sets (70% train, 30% test)
cv = cvpartition(Y, 'HoldOut', 0.3);
X_train = X_scaled(cv.training,:);
Y_train = Y(cv.training);
X_test  = X_scaled(cv.test,:);
Y_test  = Y(cv.test);
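A side note, not part of the original listing: zscore above is fitted on the full dataset before the split, so the test rows influence mu and sigma. If you want to avoid that, a minimal alternative using the same cv partition is:

% Alternative (sketch): fit standardization on the training rows only,
% then apply the training mean/std to the test rows
[X_train, mu, sigma] = zscore(meas(cv.training,:));
X_test = (meas(cv.test,:) - mu) ./ sigma;   % uses implicit expansion (R2016b+)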
%% 2. Build the CART classification tree
tree = fitctree(X_train, Y_train, ...
    'PredictorNames', {'SL','SW','PL','PW'}, ... % abbreviated feature names (sepal/petal length and width)
    'MaxNumSplits', 10, ...                      % maximum number of splits
    'MinLeafSize', 5);                           % minimum number of samples per leaf node

%% 3. Visualize the decision tree structure
figure;
view(tree, 'Mode', 'graph'); % draw the fitted tree as a graph (opens the tree viewer)
title('Decision tree structure');

%% 4. Prediction and evaluation
Y_pred = predict(tree, X_test);
accuracy = sum(strcmp(Y_pred, Y_test)) / numel(Y_test);
fprintf('Classification accuracy: %.2f%%\n', accuracy*100);

%% 5. Performance analysis (confusion matrix)
confMat = confusionmat(Y_test, Y_pred);
cm = confusionchart(confMat);
cm.Title = 'Confusion matrix';
cm.XLabel = 'Predicted class';
cm.YLabel = 'True class';
Part 2: Regression Tree Implementation (Hand-Written Algorithm)
Dataset: synthetically generated nonlinear data (quadratic and sinusoidal terms).
Core steps: data generation → model training → prediction → error analysis → visualization.
%% 1. Generate simulated data
rng(42);                          % fix the random seed for reproducibility
n_samples = 500;
X1 = 8*rand(n_samples,1) + 2;     % feature 1, range 2-10
X2 = 6*rand(n_samples,1) + 1;     % feature 2, range 1-7
y = 20 + 5*X1 + 3*X2 + 2*X1.^2 + 1.5*sin(2*X2) + 3*randn(n_samples,1); % target with Gaussian noise

% Visualize the raw data
figure;
scatter3(X1,X2,y,40,y,'filled');
xlabel('Feature 1'); ylabel('Feature 2'); zlabel('Target value');
title('Raw data distribution');

%% 2. Hand-written regression tree
X = [X1, X2];                                  % assemble the feature matrix
cv = cvpartition(n_samples, 'HoldOut', 0.3);   % 70/30 train/test split
X_train = X(cv.training,:);  y_train = y(cv.training);
X_test  = X(cv.test,:);      Y_test  = y(cv.test);

tree = build_tree(X_train, y_train, 0, 5, 10, 5); % max_depth=5, min_samples_split=10, min_samples_leaf=5

%% 3. Prediction
Y_pred = predict_batch(tree, X_test);          % predict_batch/predict_tree are defined in Part 3

%% 4. Error analysis
mse = mean((Y_test - Y_pred).^2);
rmse = sqrt(mse);
fprintf('Test-set RMSE: %.3f\n', rmse);

%% 5. Visualize the predictions
figure;
plot(Y_test, 'b', 'LineWidth', 1.5); hold on;
plot(Y_pred, 'r--', 'LineWidth', 1.5);
legend('True values', 'Predicted values');
title('Regression predictions vs. ground truth');
xlabel('Sample index'); ylabel('Target value');

%% 6. Visualize the prediction surface (two features)
figure;
[X_mesh, Y_mesh] = meshgrid(linspace(2,10,50), linspace(1,7,50)); % grid over the two feature ranges
Z_pred = zeros(size(X_mesh));
for i = 1:numel(X_mesh)
    Z_pred(i) = predict_tree(tree, [X_mesh(i), Y_mesh(i)]);
end
surf(X_mesh, Y_mesh, Z_pred, 'EdgeColor', 'none');
hold on;
scatter3(X1,X2,y,40,y,'filled');
title('Prediction surface and data distribution');
xlabel('Feature 1'); ylabel('Feature 2'); zlabel('Predicted value');
Part 3: Key Function Implementations (Hand-Written Decision Tree)
function node = build_tree(X, y, depth, max_depth, min_samples_split, min_samples_leaf)
    % Recursively grow one node of the regression tree
    node = struct();
    node.is_leaf = false;
    node.prediction = mean(y);       % prediction stored at every node (mean target)
    node.samples = length(y);

    % Stopping conditions: max depth reached, too few samples, or (near-)constant target
    if depth >= max_depth || length(y) < min_samples_split || var(y) < 1e-6
        node.is_leaf = true;
        return;
    end

    % Find the best split feature and threshold
    [best_feature, best_threshold] = find_best_split(X, y);
    if isempty(best_feature)         % no valid split found: make this a leaf
        node.is_leaf = true;
        return;
    end

    % Enforce the minimum leaf size; otherwise make this a leaf
    left_idx  = X(:, best_feature) <= best_threshold;
    right_idx = ~left_idx;
    if sum(left_idx) < min_samples_leaf || sum(right_idx) < min_samples_leaf
        node.is_leaf = true;
        return;
    end

    % Recursively build the child subtrees
    node.feature = best_feature;
    node.threshold = best_threshold;
    node.left  = build_tree(X(left_idx,:),  y(left_idx),  depth+1, max_depth, min_samples_split, min_samples_leaf);
    node.right = build_tree(X(right_idx,:), y(right_idx), depth+1, max_depth, min_samples_split, min_samples_leaf);
end
function [best_feature, best_threshold] = find_best_split(X, y)
    % Exhaustive search over features and thresholds for the minimum weighted MSE
    n_features = size(X, 2);
    best_mse = inf;
    best_feature = [];               % stays empty if no valid split exists
    best_threshold = [];
    for f = 1:n_features
        thresholds = unique(X(:,f));
        for t = thresholds'
            left = X(:,f) <= t;
            right = ~left;
            if sum(left) == 0 || sum(right) == 0
                continue;
            end
            % Weighted within-node variance after the candidate split
            mse = (mean((y(left)  - mean(y(left))).^2)  * sum(left) + ...
                   mean((y(right) - mean(y(right))).^2) * sum(right)) / length(y);
            if mse < best_mse
                best_mse = mse;
                best_feature = f;
                best_threshold = t;
            end
        end
    end
end
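The regression script in Part 2 calls predict_batch and predict_tree, which the original listing does not define. A minimal sketch consistent with the node struct produced by build_tree above:

function y_pred = predict_batch(tree, X)
    % Predict each row of X by walking the tree
    n = size(X, 1);
    y_pred = zeros(n, 1);
    for i = 1:n
        y_pred(i) = predict_tree(tree, X(i,:));
    end
end

function y = predict_tree(node, x)
    % Descend from the root to a leaf following the split thresholds
    while ~node.is_leaf
        if x(node.feature) <= node.threshold
            node = node.left;
        else
            node = node.right;
        end
    end
    y = node.prediction;
end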
Part 4: Extending to Other Application Scenarios
- Medical diagnosis. Input: patient physiological indicators (age, blood pressure, blood glucose, etc.). Output: disease classification (sick/healthy). Code change: replace fisheriris with the medical dataset.
- Industrial fault prediction. Input: sensor time-series data (vibration, temperature, current). Output: fault type (bearing wear / motor overheating). Code change: adapt build_tree to handle time-series features.
- Financial risk control. Input: customer credit data (income, debt, credit history). Output: loan default prediction (0/1). Code change: add feature engineering (e.g., a debt-to-income ratio), as sketched after this list.
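A rough illustration of the feature-engineering change mentioned for the credit-risk case (the file name and column names are hypothetical, not from the original):

% Hypothetical credit table with columns Income, Debt, CreditHistory, Default
T = readtable('credit_data.csv');                 % assumed file name
T.DebtToIncome = T.Debt ./ max(T.Income, eps);    % engineered ratio feature
X = [T.Income, T.Debt, T.CreditHistory, T.DebtToIncome];
Y = T.Default;                                    % 0/1 default label
tree = fitctree(X, Y, 'MinLeafSize', 20);         % same CART workflow as in Part 1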
Part 5: Common Problems and Solutions
| Symptom | Fix |
|---|---|
| Severe overfitting | Increase MinLeafSize or enable pruning |
| Large prediction bias | Check feature correlations; add interaction terms |
| Training takes too long | Reduce MaxNumSplits or enable parallel training |
| Imbalanced samples in leaf nodes | Set SplitCriterion='gdi' |
The code framework above is a quick path to decision-tree classification and regression. In practice, tune the parameters to the characteristics of your data and use cross-validation to optimize model performance, as sketched below.
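To make the "enable pruning" and cross-validation suggestions concrete, here is a minimal sketch (assuming the classification tree and training data from Part 1 are still in the workspace):

% 5-fold cross-validated misclassification rate of the trained tree
cvTree = crossval(tree, 'KFold', 5);
fprintf('5-fold CV error: %.3f\n', kfoldLoss(cvTree));

% Cost-complexity pruning: choose the pruning level with the lowest CV error
[~, ~, ~, bestLevel] = cvloss(tree, 'SubTrees', 'all', 'TreeSize', 'min');
prunedTree = prune(tree, 'Level', bestLevel);
view(prunedTree, 'Mode', 'graph');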