The data set used here is cars_luxury.csv, with the following variables:
price: Price of the car in dollars.
mileage: Car mileage.
luxury: Whether the car is a luxury car: “\(\texttt{yes}\)” or “\(\texttt{no}\)”.
badge: Badge indicating whether the car is considered some type of deal: “\(\texttt{Good Deal}\)”, “\(\texttt{Great Deal}\)”, “\(\texttt{No Badge}\)” or “\(\texttt{Fair Price}\)”.
We now add a categorical variable, luxury, into the multiple regression model, which then relates mileage, price and luxury.
luxury is a categorical variable (\(\texttt{"yes"}\) or \(\texttt{"no"}\) in this data set). To include it in the model, R converts luxury into a quantitative variable where \(\texttt{1 = "yes"}\), \(\texttt{0 = "no"}\).
# Remove scientific notation
options(scipen = 999)
# Regression Model
lm1 = lm(price ~ mileage + luxury, data = cars_luxury)
summary(lm1)
Call:
lm(formula = price ~ mileage + luxury, data = cars_luxury)

Residuals:
   Min     1Q Median     3Q    Max
-24018  -6204  -1919   3727  78453

Coefficients:
                Estimate   Std. Error t value            Pr(>|t|)
(Intercept) 25422.756210   508.681485   49.98 <0.0000000000000002 ***
mileage        -0.185784     0.008688  -21.39 <0.0000000000000002 ***
luxuryyes   12986.388662   569.304402   22.81 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11010 on 2085 degrees of freedom
Multiple R-squared: 0.3439, Adjusted R-squared: 0.3433
F-statistic: 546.4 on 2 and 2085 DF, p-value: < 0.00000000000000022
\[ \texttt{price} = 25,423 - 0.19\times \texttt{mileage} + 12,986 \times \texttt{luxury} \]
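The 0/1 coding of luxury can be inspected directly. The following is a minimal sketch, assuming a small hypothetical data frame named toy (its values are invented for illustration and are not taken from cars_luxury.csv); model.matrix() displays the design matrix that lm() builds from a formula.

```r
# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  mileage = c(10000, 50000, 80000, 20000),
  luxury  = factor(c("no", "yes", "no", "yes"))
)

# Design matrix for the same right-hand side as price ~ mileage + luxury
X <- model.matrix(~ mileage + luxury, data = toy)
print(X)

# The column "luxuryyes" is the indicator: 1 for "yes", 0 for "no"
luxury_dummy <- unname(X[, "luxuryyes"])
print(luxury_dummy)
```

Because "no" comes first alphabetically, R uses it as the reference level, which is why the coefficient in the summary output is labelled luxuryyes.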
How can we interpret the coefficients?
intercept: For a non-luxury car (luxury = \(\texttt{"no"}\) = 0) with zero mileage, the average selling price is US$ 25,423.
mileage: For cars of the same type (luxury or non-luxury), each additional mile is associated with an average decrease of US$ 0.19 in the price of the car.
luxury: For cars with the same mileage, a luxury car (luxury = \(\texttt{"yes"}\) = 1) sells, on average, for US$ 12,986 more than a non-luxury car.
Important: When we add a categorical variable to the regression model, the intercept is also referred to as the baseline. The effect of the categorical variable is also known as the offset.
With a categorical variable in the model, we can also read the fit as a separate regression line for each group, here one line for luxury cars and one for non-luxury cars.
To do so, we add the effect of the categorical variable to the intercept.
luxury = \(\texttt{"yes"}\) = 1: \[
\begin{align}
\texttt{price} &= 25,423 - 0.19\times \texttt{mileage} + 12,986 \times (1) \\
&= (25,423 + 12,986) - 0.19\times \texttt{mileage} \\
&= 38,409 - 0.19\times \texttt{mileage} \\
\end{align}
\]
luxury = \(\texttt{"no"}\) = 0: \[
\begin{align}
\texttt{price} &= 25,423 - 0.19\times \texttt{mileage} + 12,986 \times (0) \\
&= 25,423 - 0.19\times \texttt{mileage} \\
\end{align}
\]
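The two equations above describe parallel lines. As a rough sketch, the group intercepts can be recomputed from the coefficients printed in the summary output; the variable names b0, b_mileage and b_luxury and the helper price_hat() are ours, introduced only for this illustration.

```r
# Coefficients copied from the summary(lm1) output
b0        <- 25422.756210
b_mileage <- -0.185784
b_luxury  <- 12986.388662

# Intercept of each group's line; the slope is shared
intercept_no  <- b0
intercept_yes <- b0 + b_luxury

# Hypothetical helper: fitted price for a given mileage and 0/1 luxury code
price_hat <- function(mileage, luxury) {
  b0 + b_mileage * mileage + b_luxury * luxury
}

# The gap between the two lines is the same at any mileage
gap_at_0     <- price_hat(0, 1) - price_hat(0, 0)
gap_at_60000 <- price_hat(60000, 1) - price_hat(60000, 0)

print(c(intercept_no = intercept_no, intercept_yes = intercept_yes))
print(c(gap_at_0 = gap_at_0, gap_at_60000 = gap_at_60000))
```

Since there is no interaction term in this model, the luxury offset shifts the whole line up without changing how price falls with mileage.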
                    2.5 %        97.5 %
(Intercept) 24425.1797220 26420.3326977
mileage        -0.2028214    -0.1687463
luxuryyes   11869.9244244 14102.8528996
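Yes. The 95% confidence interval for the luxuryyes coefficient (US$ 11,870 to US$ 14,103) does not contain zero, so with 95% confidence we can conclude that, for the same mileage, the average price of a luxury car is different from that of a non-luxury one.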
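The interval for luxuryyes can be reproduced by hand as estimate ± critical t value × standard error. A minimal sketch, using the estimate, standard error and residual degrees of freedom printed in the model summary (the variable names are ours):

```r
est <- 12986.388662   # luxuryyes estimate from the summary output
se  <- 569.304402     # its standard error
df  <- 2085           # residual degrees of freedom

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df)

ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 2))

# TRUE when zero lies outside the interval
excludes_zero <- ci["lower"] > 0 || ci["upper"] < 0
print(excludes_zero)
```

The result should agree with the luxuryyes row of the confidence-interval table up to rounding of the inputs.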
What is the estimated price of a luxury vehicle with a mileage of 50,000?
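One way to answer this is sketched below with the coefficients from the summary output of the model price ~ mileage + luxury (the variable names are ours). With the fitted object available, predict(lm1, newdata = data.frame(mileage = 50000, luxury = "yes")) should return the same value.

```r
# Coefficients copied from the summary(lm1) output
b0        <- 25422.756210
b_mileage <- -0.185784
b_luxury  <- 12986.388662

mileage <- 50000
luxury  <- 1   # luxury = "yes"

price_luxury_50k <- b0 + b_mileage * mileage + b_luxury * luxury
print(round(price_luxury_50k, 2))
```

This gives about US$ 29,120. The rounded equation gives 38,409 - 0.19 × 50,000 = 28,909 instead; the difference comes from rounding the slope to two decimals before multiplying by a large mileage.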
badge contains four groups or levels: “\(\texttt{Good Deal}\)”, “\(\texttt{Great Deal}\)”, “\(\texttt{No Badge}\)” or “\(\texttt{Fair Price}\)”.
Call:
lm(formula = price ~ mileage + badge, data = cars_luxury)

Residuals:
   Min     1Q Median     3Q    Max
-19961  -6981  -2395   3629  82508

Coefficients:
                    Estimate   Std. Error t value             Pr(>|t|)
(Intercept)     35931.481715  1140.032009  31.518 < 0.0000000000000002 ***
mileage            -0.209568     0.009527 -21.997 < 0.0000000000000002 ***
badgeGood Deal  -3556.561624  1057.385699  -3.364             0.000783 ***
badgeGreat Deal -8988.415770  1062.334934  -8.461 < 0.0000000000000002 ***
badgeNo Badge   -9930.896296  1143.713386  -8.683 < 0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11860 on 2083 degrees of freedom
Multiple R-squared: 0.2388, Adjusted R-squared: 0.2374
F-statistic: 163.4 on 4 and 2083 DF, p-value: < 0.00000000000000022
intercept (baseline): For a car with zero mileage and a fair price badge (\(\texttt{Good Deal} = 0\), \(\texttt{Great Deal} = 0\), \(\texttt{No Badge} = 0\)), the average selling price is US$ 35,931.
mileage: For cars with the same badge, each additional mile is associated with an average decrease of US$ 0.21 in the price of the car.
\(\texttt{Good Deal} = 1\), remaining levels equal to zero: For cars with the same mileage, a car with a good deal badge sells, on average, for US$ 3,557 less than the baseline, that is, a car with a fair price badge.
\(\texttt{Great Deal} = 1\), remaining levels equal to zero: For cars with the same mileage, a car with a great deal badge sells, on average, for US$ 8,988 less than the baseline, that is, a car with a fair price badge.
\(\texttt{No Badge} = 1\), remaining levels equal to zero: For cars with the same mileage, a car with no badge sells, on average, for US$ 9,931 less than the baseline, that is, a car with a fair price badge.
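Fair Price acts as the baseline because R orders factor levels alphabetically and uses the first level as the reference. A short sketch, assuming badge is stored as text with these four labels; the object names are ours, and the intercepts are computed from the coefficients in the summary output of price ~ mileage + badge.

```r
# The four badge labels from the data description
badge <- factor(c("Good Deal", "Great Deal", "No Badge", "Fair Price"))
print(levels(badge))   # the first level is the reference (baseline)

# Coefficients copied from the summary output of price ~ mileage + badge
b0 <- 35931.481715
offsets <- c("Fair Price" = 0,
             "Good Deal"  = -3556.561624,
             "Great Deal" = -8988.415770,
             "No Badge"   = -9930.896296)

# Intercept of each badge's regression line (baseline + offset)
badge_intercepts <- b0 + offsets
print(round(badge_intercepts))

# relevel() changes the baseline, e.g. to compare against "No Badge"
badge_nb <- relevel(badge, ref = "No Badge")
print(levels(badge_nb))
```

Refitting the model with a releveled factor would change which comparisons the coefficients report, but not the fitted prices.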
\[ \texttt{price} = \beta_0 + \beta_1\texttt{mileage} + \beta_2\texttt{luxury} + \beta_3 (\texttt{luxury} \times \texttt{mileage}) + e \]
If we have a non-luxury car, then luxury = \(\texttt{"no"} = 0\), so the \(\beta_2\) and \(\beta_3\) terms are multiplied by zero and drop out: \[
\texttt{price} = \beta_0 + \beta_1\texttt{mileage} + e
\]
If we have a luxury car, then luxury = \(\texttt{"yes"} = 1\), so we get both a different intercept and a different slope for mileage: \[
\texttt{price} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \texttt{mileage} + e
\]
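In R's formula syntax, mileage * luxury expands to mileage + luxury + mileage:luxury, which matches the three terms of the interaction equation. A minimal sketch on a hypothetical toy data frame (values invented for illustration, not taken from cars_luxury.csv) shows the extra column that the interaction adds to the design matrix:

```r
# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  mileage = c(10000, 50000, 80000, 20000),
  luxury  = factor(c("no", "yes", "no", "yes"))
)

# Design matrix for the right-hand side of price ~ mileage * luxury
X <- model.matrix(~ mileage * luxury, data = toy)
print(colnames(X))

# The interaction column equals mileage times the 0/1 luxury indicator
interaction_col <- unname(X[, "mileage:luxuryyes"])
print(interaction_col)
```

For non-luxury rows the interaction column is zero, which is why those cars keep the baseline slope.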
Call:
lm(formula = price ~ mileage * luxury, data = cars_luxury)

Residuals:
   Min     1Q Median     3Q    Max
-25662  -6055  -2066   3563  83626

Coefficients:
                      Estimate   Std. Error t value             Pr(>|t|)
(Intercept)       23893.601384   545.040269  43.838 < 0.0000000000000002 ***
mileage              -0.154697     0.009595 -16.122 < 0.0000000000000002 ***
luxuryyes         19772.433662  1092.529243  18.098 < 0.0000000000000002 ***
mileage:luxuryyes    -0.155457     0.021457  -7.245    0.000000000000606 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10880 on 2084 degrees of freedom
Multiple R-squared: 0.36, Adjusted R-squared: 0.3591
F-statistic: 390.8 on 3 and 2084 DF, p-value: < 0.00000000000000022
How do we interpret this model?
intercept (baseline), luxury = \(\texttt{"no"}\) = 0: For a non-luxury car with zero mileage, the average selling price is US$ 23,894.
Now we have two cases:
luxury = \(\texttt{"no"}\) = 0:
mileage: Each additional mile is associated with an average decrease of US$ 0.15 in the price of non-luxury cars.
luxury = \(\texttt{"yes"}\) = 1:
mileage: Each additional mile is associated with an average decrease of US$ 0.16 in the price of luxury cars on top of the US$ 0.15 decrease for non-luxury cars, for a total decrease of about US$ 0.31 per mile.
We can also write out the regression line for each group:
luxury = \(\texttt{"no"}\) = 0: \[
\begin{align}
\texttt{price} &= 23,894 - 0.15\times \texttt{mileage} + 19,772 (0) - 0.16\times \texttt{mileage} (0) \\
&= 23,894 - 0.15\times \texttt{mileage}
\end{align}
\]
luxury = \(\texttt{"yes"}\) = 1: \[
\begin{align}
\texttt{price} &= 23,894 - 0.15\times \texttt{mileage} + 19,772 (1) - 0.16\times \texttt{mileage} (1) \\
&= (23,894 + 19,772) - (0.15 + 0.16) \times \texttt{mileage} \\
&= 43,666 - 0.31 \times \texttt{mileage} \\
\end{align}
\]
With the interaction term, not only the intercept changes between the two groups but also the slope.
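The intercept and slope of each group's line can be recomputed from the unrounded coefficients in the summary output of price ~ mileage * luxury. A small sketch (the variable names are ours):

```r
# Coefficients copied from the summary output of price ~ mileage * luxury
b0        <- 23893.601384
b_mileage <- -0.154697
b_luxury  <- 19772.433662
b_inter   <- -0.155457   # mileage:luxuryyes

# Non-luxury cars keep the baseline intercept and slope
intercept_no <- b0
slope_no     <- b_mileage

# Luxury cars add the offset to the intercept and the interaction to the slope
intercept_yes <- b0 + b_luxury
slope_yes     <- b_mileage + b_inter

lines_by_group <- data.frame(
  luxury    = c("no", "yes"),
  intercept = c(intercept_no, intercept_yes),
  slope     = c(slope_no, slope_yes)
)
print(lines_by_group)
```

Rounded, these values give the two lines derived by hand: 23,894 - 0.15 × mileage and 43,666 - 0.31 × mileage.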
                          2.5 %        97.5 %
(Intercept)       22824.7212986 24962.4814687
mileage              -0.1735141    -0.1358795
luxuryyes         17629.8713295 21914.9959940
mileage:luxuryyes    -0.1975365    -0.1133770
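Yes. The 95% confidence interval for the interaction coefficient mileage:luxuryyes (-0.198 to -0.113) lies entirely below zero, so with 95% confidence we can conclude that the price of a luxury car depreciates faster with mileage than that of a non-luxury one.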
What is the estimated price of a luxury vehicle with a mileage of 50,000?
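As a worked sketch, plug a mileage of 50,000 into the luxury-car line of the interaction model. The first line uses the rounded equation; the second uses the unrounded coefficients from the summary output, so the two answers differ slightly.
\[
\begin{align}
\texttt{price} &= 43,666 - 0.31 \times 50,000 = 43,666 - 15,500 = 28,166 \\
\texttt{price} &\approx 43,666.04 - 0.310154 \times 50,000 = 43,666.04 - 15,507.70 \approx 28,158
\end{align}
\]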