Outliers should be handled carefully, because they can distort the fitted model and the conclusions drawn from it. One option is to replace an outlier with the mean or another summary statistic (for example the median).
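A minimal sketch of one way to do the replacement. The 1.5 × IQR fence, the use of the inlier median as the fill value, and the function name `replace_outliers_iqr` are all assumptions for illustration, not something these notes prescribe.

```python
import statistics

def replace_outliers_iqr(values, k=1.5):
    """Replace points outside [Q1 - k*IQR, Q3 + k*IQR] with the inlier median."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr             # fences (assumed rule)
    inliers = [v for v in values if lo <= v <= hi]
    fill = statistics.median(inliers)               # assumed replacement value
    return [v if lo <= v <= hi else fill for v in values]

data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]     # 55.0 is the suspicious point
cleaned = replace_outliers_iqr(data)
```

Whether replacing, removing, or keeping an outlier is appropriate depends on why it is there; the sketch only shows the mechanics.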
Quantile-quantile (Q-Q) plot: plot the quantiles of the data against the corresponding quantiles of the distribution being examined (or the quantiles of two samples against each other). If both follow the same distribution, the points should lie close to the line $y = x$.
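The points of a Q-Q plot can be computed without any plotting library. The sketch below assumes the reference distribution is a normal fitted to the sample's own mean and standard deviation, and uses the plotting positions $(i - 0.5)/n$; both choices, and the name `qq_points`, are illustrative assumptions.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical, empirical) quantile pairs against a fitted normal."""
    xs = sorted(sample)                      # empirical quantiles
    n = len(xs)
    ref = NormalDist(mean(xs), stdev(xs))    # assumed reference distribution
    # Plotting positions (i - 0.5) / n avoid the undefined 0 and 1 quantiles.
    theo = [ref.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, xs))

points = qq_points([2.0, 1.0, 3.0, 5.0, 4.0])
```

Scattering these pairs and overlaying the line $y = x$ gives the plot described above; systematic curvature away from the line suggests the sample does not follow the reference distribution.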
Testing the overall significance of the model (the overall F-test): $H_0$: all $\beta_i = 0$, versus $H_1$: at least one $\beta_i \ne 0$.
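As a sketch of how this test statistic is computed, assuming an ordinary least squares fit with an intercept and $k$ slope coefficients, the F statistic is $F = \frac{(TSS - RSS)/k}{RSS/(n-k-1)}$. The function name `overall_f_statistic` and the simulated data are hypothetical.

```python
import numpy as np

def overall_f_statistic(X, y):
    """F statistic for H0: all slope coefficients are zero.

    X holds the predictors only (no intercept column); one is added here.
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS fit
    rss = np.sum((y - A @ beta) ** 2)              # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
    return ((tss - rss) / k) / (rss / (n - k - 1))

# Simulated example: the first predictor matters, the second does not.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)
F = overall_f_statistic(X, y)
```

Under $H_0$ (and the usual normal-error assumptions) the statistic follows an $F(k, n-k-1)$ distribution, so a p-value can be obtained from its upper tail, for instance with `scipy.stats.f.sf(F, k, n - k - 1)` if SciPy is available.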
Why does linear regression use MSE?
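One common justification, sketched under the assumption (not stated in these notes) that $y_i = x_i^\top \beta + \varepsilon_i$ with i.i.d. errors $\varepsilon_i \sim N(0, \sigma^2)$: the log-likelihood is

$$
\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 ,
$$

and only the second term depends on $\beta$, so

$$
\arg\max_{\beta}\, \ell(\beta, \sigma^2) = \arg\min_{\beta}\, \frac{1}{n}\sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 .
$$

Under that assumption, minimising the MSE is the same as maximum likelihood estimation of $\beta$.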
Mallows's $C_p$ is a criterion for assessing the adequacy of a linear regression model. It compares a candidate model that uses a subset of the predictors with the full model that includes all available predictors.
The statistic is based on the residual sum of squares (RSS) of the candidate model, the number of estimated parameters, and the sample size. It is defined as:
$C_p = \frac{RSS_p}{s^2} - n + 2p$
where $RSS_p$ is the residual sum of squares of the candidate model, $s^2$ is the estimate of the error variance (usually taken from the full model), $n$ is the sample size, and $p$ is the number of estimated parameters in the candidate model (its predictors plus the intercept).
**The $C_p$ statistic measures the trade-off between model complexity and goodness of fit. If a model with $p$ parameters has negligible bias, the expected value of $C_p$ is approximately $p$, so a $C_p$ value close to $p$ indicates a good balance. A $C_p$ value substantially greater than $p$ suggests that the model is biased because it omits important predictors (it is too simple and could benefit from additional predictors). A $C_p$ value below $p$ usually reflects sampling variation in $s^2$ rather than a problem with the model.**
Mallows's $C_p$ is commonly used in regression model selection to identify the best subset of predictors to include in a model. The goal is to choose a model with the fewest predictors that still adequately explains the variation in the response variable.
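A rough sketch of best-subset selection with this criterion. It assumes that $p$ counts all estimated coefficients including the intercept and that $s^2$ comes from the full model (common conventions, stated here as assumptions); the helper names and the simulated data are hypothetical.

```python
import numpy as np
from itertools import combinations

def rss(A, y):
    """Residual sum of squares of the OLS fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def mallows_cp(X, y):
    """Return {tuple of predictor indices: Cp} for every subset of predictors."""
    n, k = X.shape
    ones = np.ones((n, 1))
    s2 = rss(np.hstack([ones, X]), y) / (n - k - 1)   # error variance, full model
    out = {}
    for size in range(k + 1):
        for cols in combinations(range(k), size):
            A = np.hstack([ones, X[:, np.array(cols, dtype=int)]])
            p = A.shape[1]                            # parameters incl. intercept
            out[cols] = rss(A, y) / s2 - n + 2 * p
    return out

# Simulated example: predictors 0 and 1 matter, predictor 2 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
cp = mallows_cp(X, y)
```

By construction the full model always has $C_p = p$ exactly, so the criterion is only informative for proper subsets. Enumerating every subset costs $2^k$ fits, which is practical only for a small number of predictors.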
AIC
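Compared with the least absolute deviations method, least squares has the advantages of a unique optimal solution, ease of computation, and good analytical properties; its drawback is that it is strongly affected by outliers.
Least squares is not always the best method. For different kinds of data and different modelling needs, you should be able to choose a suitable modelling approach yourself.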
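The contrast can be seen in the simplest possible case, a constant-only model, where the least-squares fit is the sample mean and the least-absolute-deviations fit is the sample median. The data below are made up for illustration.

```python
from statistics import mean, median

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = clean[:-1] + [500.0]        # one point replaced by an outlier

# How far each fit moves when the outlier is introduced.
ls_shift = abs(mean(contaminated) - mean(clean))        # least squares (mean)
lad_shift = abs(median(contaminated) - median(clean))   # least absolute deviations (median)

def sad(c, xs):
    """Sum of absolute deviations of xs from the constant c."""
    return sum(abs(x - c) for x in xs)

# With an even number of points the LAD minimiser is not unique:
# every c between the two middle values gives the same objective.
even = [1.0, 2.0, 3.0, 4.0]
lad_objectives = [sad(c, even) for c in (2.0, 2.5, 3.0)]
```

Here the mean moves by 99 while the median does not move at all, and the three LAD objective values are identical, which illustrates both the robustness of least absolute deviations and its possible non-uniqueness.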
Generalised least squares (GLS)
[Introduction to Econometrics 05: Heteroskedasticity - 这个XD很懒 - 博客园 (cnblogs)](<https://www.cnblogs.com/lixddd/p/14367772.html>)
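For reference, a minimal sketch of the GLS estimator $\hat\beta = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} y$, assuming the error covariance matrix $\Omega$ is known. In practice it usually has to be estimated, which this sketch does not cover. The function name `gls` and the heteroskedastic example are hypothetical.

```python
import numpy as np

def gls(X, y, omega):
    """GLS estimate (X' Ω^-1 X)^-1 X' Ω^-1 y for a known error covariance Ω."""
    omega_inv_X = np.linalg.solve(omega, X)   # Ω^-1 X without forming Ω^-1
    omega_inv_y = np.linalg.solve(omega, y)   # Ω^-1 y
    return np.linalg.solve(X.T @ omega_inv_X, X.T @ omega_inv_y)

# Simulated heteroskedastic data: the error standard deviation grows with x.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])     # intercept and slope columns
y = 1.0 + 2.0 * x + rng.normal(scale=x)       # Var(error_i) = x_i^2
omega = np.diag(x ** 2)                       # assumed known here
beta_gls = gls(X, y, omega)
```

With $\Omega = I$ the estimator reduces to ordinary least squares; with a diagonal $\Omega$, as in this example, it is weighted least squares with weights $1/\sigma_i^2$.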
Cross-sectional data rarely exhibit autocorrelation, whereas autocorrelation is fairly common in time-series data: