Understanding the dataset
# vector y has missing values that we need to impute
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(11, 12, 18, 14, 17, NA, NA, 19, NA, 27)
z <- c(19, 11, 2, 14, 20, 4, 9, 10, 18, 1)
w <- c(1, 4, 7, 10, 3, 5, 7, 6, 6, 9)
# create a data frame for the data set
data <- data.frame(x, y, z, w)
data
#     x  y  z  w
# 1   1 11 19  1
# 2   2 12 11  4
# 3   3 18  2  7
# 4   4 14 14 10
# 5   5 17 20  3
# 6   6 NA  4  5
# 7   7 NA  9  7
# 8   8 19 10  6
# 9   9 NA 18  6
# 10 10 27  1  9
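Before fitting anything, it can help to confirm where the gaps are. The snippet below is a minimal sketch that rebuilds the same data frame and counts the missing values per column; the names na_counts and missing_rows are illustrative and do not appear in the original walkthrough.

```r
# Rebuild the example data so this snippet runs on its own
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(11, 12, 18, 14, 17, NA, NA, 19, NA, 27)
z <- c(19, 11, 2, 14, 20, 4, 9, 10, 18, 1)
w <- c(1, 4, 7, 10, 3, 5, 7, 6, 6, 9)
data <- data.frame(x, y, z, w)

# number of missing values in each column (illustrative name)
na_counts <- colSums(is.na(data))
na_counts

# row indices where y is missing (illustrative name)
missing_rows <- which(is.na(data$y))
missing_rows
```

Only y contains NA values, in rows 6, 7 and 9, so those are the rows that need imputed values.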
Fit the model
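Since y is highly correlated with x (r is about 0.91 on the complete rows), we use the formula y ~ x to fit the linear model. By default, lm() drops the rows where y is NA, so the fit uses only the seven complete observations.
# fit a linear regression model of y on x
lrm <- lm(y ~ x, data = data)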
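As a quick sanity check on the fit, the sketch below recomputes the correlation between x and y on the complete rows and inspects the fitted coefficients. It rebuilds only the x and y columns so that it runs on its own; n_used and r_xy are illustrative names, not part of the original code.

```r
# Rebuild the two columns used by the model
data <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  y = c(11, 12, 18, 14, 17, NA, NA, 19, NA, 27)
)

# lm() omits rows with NA in y under the default na.action
lrm <- lm(y ~ x, data = data)

# number of observations actually used in the fit
n_used <- nobs(lrm)
n_used

# correlation between x and y, using complete rows only
r_xy <- cor(data$x, data$y, use = "complete.obs")
r_xy

# intercept and slope of the fitted line
coef(lrm)
```

The fitted line has an intercept of roughly 9.74 and a slope of roughly 1.51, estimated from the seven rows where y is observed.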
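Predict y values using the linear model
Calling predict() with newdata = data returns a fitted value for every row, including rows 6, 7 and 9 where y is missing. These values can be used to impute the missing y values.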
# predict y using the linear model
y_pred <- predict(lrm, newdata = data)
# compare the predicted values with the original values
data_compare <- data.frame(y_pred, y)
data_compare
#      y_pred  y
# 1  11.25225 11
# 2  12.76126 12
# 3  14.27027 18
# 4  15.77928 14
# 5  17.28829 17
# 6  18.79730 NA
# 7  20.30631 NA
# 8  21.81532 19
# 9  23.32432 NA
# 10 24.83333 27
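The comparison above shows the predictions but does not yet fill the gaps. One way to complete the imputation is sketched below: copy the data frame and overwrite only the missing y entries with the model's predictions, leaving the observed values untouched. The names data_imputed and missing are illustrative and are not part of the original code.

```r
# Rebuild the example data and model so this snippet runs on its own
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(11, 12, 18, 14, 17, NA, NA, 19, NA, 27)
z <- c(19, 11, 2, 14, 20, 4, 9, 10, 18, 1)
w <- c(1, 4, 7, 10, 3, 5, 7, 6, 6, 9)
data <- data.frame(x, y, z, w)
lrm <- lm(y ~ x, data = data)

# work on a copy so the original data stays available (illustrative name)
data_imputed <- data

# logical index of the rows where y is missing (illustrative name)
missing <- is.na(data$y)

# replace only the missing entries with predictions for those rows
data_imputed$y[missing] <- predict(lrm, newdata = data[missing, ])
data_imputed
```

Rows 6, 7 and 9 now hold the predicted values from the comparison table (about 18.80, 20.31 and 23.32). Note that values imputed this way lie exactly on the fitted line, so the filled-in column shows less spread than fully observed data would.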