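Just to point this out: by default, the statsmodels least squares fit does not include a constant. If we remove the constant from R's fit, we get results very similar to the Python implementation; conversely, if we add a constant to the statsmodels fit, we get results similar to R's.
Removing the constant in R's lm call: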
summary(lm(Temp ~ . - 1, data = train[,3:NCOL(train)]))
Call:
lm(formula = Temp ~ . - 1, data = train[, 3:NCOL(train)])
Residuals:
Min 1Q Median 3Q Max
-0.221940 -0.032347 0.002071 0.037048 0.167294
Coefficients:
Estimate Std. Error t value Pr(>|t|)
MEI 0.036076 0.027983 1.289 0.2079
CO2 0.004640 0.008945 0.519 0.6080
CH4 -0.002328 0.002132 -1.092 0.2843
N2O -0.014115 0.079452 -0.178 0.8603
`CFC-11` -0.031232 0.096693 -0.323 0.7491
`CFC-12` 0.035760 0.103574 0.345 0.7325
TSI -0.003283 0.036861 -0.089 0.9297
Aerosols 69.968040 33.093275 2.114 0.0435 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07457 on 28 degrees of freedom
Multiple R-squared: 0.9724, Adjusted R-squared: 0.9645
F-statistic: 123.1 on 8 and 28 DF, p-value: < 2.2e-16
Now let's add a constant to the statsmodels call:
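import statsmodels.api as sm

# X and y are the predictors and the Temp response from the question
X_with_constant = sm.add_constant(X)  # prepend a column of ones (the intercept)
model = sm.OLS(y, X_with_constant).fit()
model.summary()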
This gives us the same result as R's default fit, which includes the intercept:
OLS Regression Results
Dep. Variable: Temp R-squared: 0.535
Model: OLS Adj. R-squared: 0.397
Method: Least Squares F-statistic: 3.877
Date: Tue, 02 Oct 2018 Prob (F-statistic): 0.00372
Time: 10:14:03 Log-Likelihood: 46.899
No. Observations: 36 AIC: -75.80
Df Residuals: 27 BIC: -61.55
Df Model: 8
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -17.8663 563.008 -0.032 0.975 -1173.064 1137.332
MEI 0.0361 0.029 1.265 0.217 -0.022 0.095
CO2 0.0048 0.011 0.451 0.656 -0.017 0.027
CH4 -0.0024 0.002 -0.950 0.351 -0.007 0.003
N2O -0.0130 0.088 -0.148 0.884 -0.194 0.168
CFC-11 -0.0332 0.116 -0.285 0.777 -0.272 0.205
CFC-12 0.0378 0.123 0.307 0.761 -0.215 0.290
TSI 0.0091 0.392 0.023 0.982 -0.795 0.813
Aerosols 70.4633 37.139 1.897 0.069 -5.739 146.666
Omnibus: 8.316 Durbin-Watson: 1.488
Prob(Omnibus): 0.016 Jarque-Bera (JB): 10.432
Skew: -0.535 Prob(JB): 0.00543
Kurtosis: 5.410 Cond. No. 1.06e+08