Machine Learning Linear Regression Case Walkthrough_09 Machine Learning in Practice: Simple Linear Regression

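Basic Concepts

1. Introduction:

Regression: the Y variable is a continuous numerical variable

e.g. house prices, head counts, rainfall

Classification: the Y variable is a categorical variable

e.g. color category, computer brand, whether someone is creditworthy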

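2. Simple Linear Regression

2.1 Many decisions are made based on the relationship between two or more variables

2.3 Regression analysis builds an equation that models how two or more variables are related

2.4 The variable being predicted is called the dependent variable, y, or the output

2.5 The variable used to make the prediction is called the independent variable, x, or the input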

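3. Introduction to simple linear regression

3.1 Simple linear regression involves exactly one independent variable (x) and one dependent variable (y)

3.2 The relationship between these two variables is modeled by a straight line

3.3 With two or more independent variables, the analysis is called multiple regression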

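4. The simple linear regression model

4.1 The equation that describes how the dependent variable (y) relates to the independent variable (x) and the error is called the regression model

4.2 The simple linear regression model is:

y = β0+β1x+ε

where β0 and β1 are the parameters and ε is the error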

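5. The simple linear regression equation

E(y) = β0+β1x

The graph of this equation is a straight line, called the regression line

where β0 is the intercept of the regression line

β1 is the slope of the regression line

E(y) is the expected value (mean) of y for a given value of x

ε follows a normal distribution with mean 0, so it drops out when the expectation of y is taken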
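
A minimal sketch of the model y = β0+β1x+ε and its regression equation. The values β0 = 10, β1 = 5, the error standard deviation of 2, and the names `beta0`, `beta1`, `sigma` and `simulate_y` are assumptions chosen for illustration, not taken from the original. Drawing many values of y at one fixed x and averaging them should land close to E(y) = β0+β1x, because the error averages out to 0.

```python
import numpy as np

beta0, beta1, sigma = 10.0, 5.0, 2.0  # assumed values, for illustration only
rng = np.random.default_rng(0)        # fixed seed so the run is repeatable


def simulate_y(x, size):
    """Draw `size` observations of y = beta0 + beta1*x + eps at a fixed x."""
    eps = rng.normal(loc=0.0, scale=sigma, size=size)  # error term with mean 0
    return beta0 + beta1 * x + eps


x_fixed = 3.0
samples = simulate_y(x_fixed, size=100_000)
expected_y = beta0 + beta1 * x_fixed  # E(y) at x = 3

# The sample mean of y should be close to E(y)
print(expected_y, samples.mean())
```

Individual draws scatter around the line, but their mean tracks it, which is what E(y) describes.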

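6. Positive linear relationship: the regression line slopes upward (β1 > 0)

7. Negative linear relationship: the regression line slopes downward (β1 < 0)

8. No relationship: the regression line is horizontal (β1 = 0)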
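
The three cases can be told apart by the sign of the fitted slope. The sketch below uses three small made-up data sets (the arrays and the helper `slope` are hypothetical, introduced only for this illustration) and computes the least-squares slope for each.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope of y on x."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return x_dev.dot(y_dev) / x_dev.dot(x_dev)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 5.0, 4.0, 6.0])    # tends to rise with x
y_down = np.array([6.0, 4.0, 5.0, 4.0, 2.0])  # tends to fall with x
y_flat = np.array([3.0, 1.0, 4.0, 1.0, 3.0])  # no linear trend

for name, y in [("positive", y_up), ("negative", y_down), ("none", y_flat)]:
    print(name, slope(x, y))
```

A slope near zero only says there is no linear trend; the points can still vary a lot, as `y_flat` does.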

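9. The estimated simple linear regression equation

ŷ=b0+b1x

This equation is called the estimated regression line

where b0 is the y-intercept of the estimated regression line

b1 is the slope of the estimated regression line

ŷ is the estimated value of y when the independent variable x takes a given value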

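10. The linear regression analysis workflow

11. Assumptions about the error ε

11.1 ε is a random variable with mean 0

11.2 The variance of ε is the same for every value of the independent variable x

11.3 The values of ε are independent of one another

11.4 ε is normally distributed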
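
To make these assumptions concrete, the sketch below draws errors that satisfy them and checks the first two numerically. The standard deviation of 1.5, the three x values and the sample size are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
sigma = 1.5                     # assumed common standard deviation of the error

# 50,000 observations at each of three x values
x = np.repeat([1.0, 2.0, 3.0], 50_000)

# Independent normal draws with mean 0 and the same variance for every x
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)

for x_value in (1.0, 2.0, 3.0):
    group = eps[x == x_value]
    # Mean should be near 0 and variance near sigma**2 in every group
    print(x_value, group.mean(), group.var())
```

On real data the errors are not observed, so in practice the residuals y - ŷ of the fitted line are inspected in a similar way.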

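Example

A simple linear regression example:

Number of TV ads run by a car dealer versus number of cars sold (the five observations used in the calculation below are x = 1, 3, 2, 1, 3 ads and y = 14, 24, 18, 17, 27 cars, so x̄ = 2 and ȳ = 20):

Step 1: How do we find the best-fitting regression line for the simple linear regression model?

Minimize the sum of squares, that is, the sum of squared differences between each observed y and its estimate ŷ

Step 2: Compute

b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2, b0 = ȳ - b1*x̄

numerator = (1-2)(14-20)+(3-2)(24-20)+(2-2)(18-20)+(1-2)(17-20)+(3-2)(27-20)

= 6 + 4 + 0 + 3 + 7

= 20

denominator = (1-2)^2 + (3-2)^2 + (2-2)^2 + (1-2)^2 + (3-2)^2

= 1 + 1 + 0 + 1 + 1

= 4

b1 = 20/4 = 5

b0 = 20 - 5*2 = 20 - 10 = 10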
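
The hand calculation can be checked with a few lines of NumPy. The data are read off the terms of the calculation above; the names `ads`, `cars` and `sse` are introduced here for illustration. The sketch also compares the sum of squares of the fitted line with two slightly different lines, to show that the fitted one is smaller.

```python
import numpy as np

ads = np.array([1, 3, 2, 1, 3], dtype=float)        # x: number of TV ads
cars = np.array([14, 24, 18, 17, 27], dtype=float)  # y: cars sold

x_dev = ads - ads.mean()
y_dev = cars - cars.mean()
b1 = x_dev.dot(y_dev) / x_dev.dot(x_dev)  # slope
b0 = cars.mean() - b1 * ads.mean()        # intercept


def sse(intercept, slope):
    """Sum of squared differences between observed and estimated y."""
    return float(np.sum((cars - (intercept + slope * ads)) ** 2))


print(b1, b0)
# Fitted line versus a line with a larger slope and one with a larger intercept
print(sse(b0, b1), sse(b0, b1 + 1), sse(b0 + 1, b1))
```

Any other choice of slope or intercept gives a larger sum of squares on this data, which is what "best" means in Step 1.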

Derivation
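
The steps can be sketched with the standard least-squares argument. This is a reconstruction under that assumption, not necessarily the author's original steps: choose b0 and b1 to minimize the sum of squared differences between each observed y_i and its estimate.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0 \\
\text{substituting } b_0: \quad
0 &= \sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right) \\
b_1 &= \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
     = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

The last equality holds because the deviations from a mean sum to zero, so subtracting x̄ times the sum of (y_i - ȳ) from the top and x̄ times the sum of (x_i - x̄) from the bottom changes nothing.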

Code implementation

In [3]:

import numpy as np
import matplotlib.pyplot as plt

In [4]:

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 3, 5])

In [10]:

plt.scatter(x, y)
plt.axis([0, 6, 0, 6])
plt.show()

In [11]:

x_mean = np.mean(x)
y_mean = np.mean(y)

In [12]:

numerator = 0.0  # running sum of (x_i - x_mean) * (y_i - y_mean)
denominator = 0.0  # running sum of (x_i - x_mean) ** 2

In [13]:

for x_i, y_i in zip(x, y):
    numerator += (x_i - x_mean) * (y_i - y_mean)
    denominator += (x_i - x_mean) ** 2

In [14]:

a = numerator / denominator  # slope
b = y_mean - a * x_mean  # intercept

In [15]:

a

Out[15]:

0.8

In [16]:

b

Out[16]:

0.39999999999999947

In [17]:

y_hat = a * x + b

In [19]:

plt.scatter(x, y)
plt.plot(x, y_hat, color='r')
plt.axis([0, 6, 0, 6])
plt.show()

In [20]:

x_predict = 6
y_predict = a * x_predict + b
y_predict

Out[20]:

5.2

In [28]:

from ml09simpleLinearRegression1 import SimpleLinearRegression1

In [29]:

reg1 = SimpleLinearRegression1()
reg1.fit(x, y)

Out[29]:

SimpleLinearRegression()

In [30]:

reg1.predict(np.array([x_predict]))

Out[30]:

array([5.2])

In [31]:

reg1.a_

Out[31]:

0.8

In [32]:

reg1.b_

Out[32]:

0.39999999999999947

In [33]:

y_hat1 = reg1.predict(x)

In [34]:

plt.scatter(x, y)
plt.plot(x, y_hat1, color='r')
plt.axis([0, 6, 0, 6])
plt.show()

The ml09simpleLinearRegression1 module imported above contains:

import numpy as np


class SimpleLinearRegression1:

    def __init__(self):
        """Initialize the Simple Linear Regression model"""
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        """Train the Simple Linear Regression model on the training set x_train, y_train"""
        assert x_train.ndim == 1, \
            "Simple Linear Regressor can only solve single feature training data."
        assert len(x_train) == len(y_train), \
            "the size of x_train must be equal to the size of y_train"

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)

        # numerator = 0.0
        # denominator = 0.0
        # for x_i, y_i in zip(x_train, y_train):
        #     numerator += (x_i - x_mean) * (y_i - y_mean)
        #     denominator += (x_i - x_mean) ** 2
        # self.a_ = numerator / denominator
        # self.b_ = y_mean - self.a_ * x_mean

        # Use vector dot products in place of the for loop above
        self.a_ = (x_train - x_mean).dot(y_train - y_mean) / (x_train - x_mean).dot(x_train - x_mean)
        self.b_ = y_mean - self.a_ * x_mean

        return self

    def predict(self, x_predict):
        """Given the data set x_predict, return a vector of predictions for it"""
        assert x_predict.ndim == 1, \
            "Simple Linear Regressor can only solve single feature training data."
        assert self.a_ is not None and self.b_ is not None, \
            "must fit before predict!"

        return np.array([self._predict(x) for x in x_predict])

    def _predict(self, x_single):
        """Given a single value x_single, return its predicted value"""
        return self.a_ * x_single + self.b_

    def __repr__(self):
        return "SimpleLinearRegression()"
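
As a sanity check on the dot-product formula, the result can be compared with NumPy's own degree-1 polynomial fit. This is a minimal sketch that repeats the closed-form computation inline rather than importing the class; using `np.polyfit` as the reference is a choice made here, not something the original does.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 3.0, 5.0])

# Closed-form slope and intercept, computed the same way as in fit()
x_dev = x - x.mean()
a = x_dev.dot(y - y.mean()) / x_dev.dot(x_dev)
b = y.mean() - a * x.mean()

# With deg=1, np.polyfit returns the coefficients highest power first:
# [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

print(np.isclose(a, slope), np.isclose(b, intercept))
```

The comparison uses `np.isclose` rather than `==` because the two routes round differently, as the 0.39999999999999947 intercept above already hints.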
