Regression assumptions for normal people

The assumptions of linear regression are widely documented, and essentially every regression textbook explains how to recognize violations of these assumptions. Yet the problems that such violations cause for the estimation of regression coefficients are documented far less well. The goal of this post is to explain (mainly to myself) why the regression assumptions actually matter.
Categories: statistics, regression

Author: Thom Benjamin Volker

Published: September 30, 2025

L <- 100         # number of time points
x <- numeric(L)  # x[1] starts at 0
# simulate a random walk: each value equals the previous one plus Gaussian noise
for (i in 2:L) x[i] <- x[i - 1] + rnorm(1, 0, 0.2)

cor(x, dplyr::lag(x), use = "complete.obs")
[1] 0.9824951
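The lag-1 correlation near 1 reflects how persistent a random walk is: the autocorrelation decays only slowly across lags. A minimal sketch of checking this with base R's `acf()` (the seed is an assumption added here for reproducibility; the run above was unseeded, so the exact numbers will differ):

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
L <- 100
x <- numeric(L)
for (i in 2:L) x[i] <- x[i - 1] + rnorm(1, 0, 0.2)

# autocorrelations at lags 0 through 5; for a random walk these stay
# close to 1 rather than dropping off quickly
drop(acf(x, lag.max = 5, plot = FALSE)$acf)
```

The first value (lag 0) is always exactly 1; the interesting part is how slowly the remaining values shrink.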
summary(lm(x ~ dplyr::lag(x, 1)))

Call:
lm(formula = x ~ dplyr::lag(x, 1))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.67645 -0.12118  0.00309  0.11186  0.41306 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.03738    0.02898    1.29      0.2    
dplyr::lag(x, 1)  0.99437    0.01914   51.94   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1957 on 97 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9653,    Adjusted R-squared:  0.9649 
F-statistic:  2698 on 1 and 97 DF,  p-value: < 2.2e-16
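A classic illustration of why this kind of dependence is dangerous (not shown in the output above, but a standard demonstration) is spurious regression: regressing one random walk on a completely independent one tends to produce "significant" coefficients far more often than the nominal 5% rate. A minimal sketch, where `cumsum(rnorm(...))` builds the same kind of walk as the loop above and the seed is arbitrary:

```r
set.seed(123)  # arbitrary seed, chosen only for illustration
n <- 100

# two *independent* random walks: neither has any relation to the other
y1 <- cumsum(rnorm(n, 0, 0.2))
y2 <- cumsum(rnorm(n, 0, 0.2))

fit <- lm(y1 ~ y2)
summary(fit)$coefficients
# repeating this over many seeds, the slope on y2 comes out "significant"
# much more often than 5% of the time, even though the series are unrelated
```

The usual OLS standard errors assume independent errors; with random walks the residuals are strongly autocorrelated, so those standard errors are far too small and the t-statistics are wildly optimistic.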