cookies | happiness |
---|---|
1 | 0.1 |
2 | 2 |
3 | 1 |
4 | 2.5 |
5 | 3 |
6 | 1.3 |
7 | 1.9 |
8 | 2.4 |
9 | 1.8 |
10 | 3 |
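As a minimal sketch, the table above can be entered in R and fitted with a simple linear regression of `happiness` on `cookies`. The names `cookie_data` and `cookie_lm` are hypothetical, chosen only for illustration; they do not appear in the original notes.

```r
# Cookie data copied from the table above
cookie_data <- data.frame(
  cookies   = 1:10,
  happiness = c(0.1, 2, 1, 2.5, 3, 1.3, 1.9, 2.4, 1.8, 3)
)

# Simple linear regression: happiness = intercept + slope * cookies + e
cookie_lm <- lm(happiness ~ cookies, data = cookie_data)

# Least-squares estimates of the intercept and the slope
coef(cookie_lm)
```

For these ten rows the least-squares estimates work out to an intercept of 0.96 and a slope of about 0.171.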
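`happiness` is the response variable (\(Y\)) and `cookies` is the predictor variable (\(X\)), so `happiness` is modeled as a function of the `cookies` variable.
We now turn to the movie data (`movie_1990_data.csv`).
The predictor variable is `Adj_Budget`.
The response variable is `Adj_Revenue`.
First we load `tidyverse`, then we use the `ggplot` function to plot `Adj_Revenue` against `Adj_Budget`.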
We encode the model below in R.
\[ \texttt{Adj_Revenue} = \beta_0 + \beta_1\cdot\texttt{Adj_Budget} + e \]
# The model
# Revenue = intercept + slope*Budget + e
lm1 <- lm(Adj_Revenue ~ Adj_Budget, data = movie_1990_data)
summary(lm1)
Call:
lm(formula = Adj_Revenue ~ Adj_Budget, data = movie_1990_data)
Residuals:
Min 1Q Median 3Q Max
-262.40 -38.01 -16.39 19.24 619.23
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.66095 3.15001 7.194 1.03e-12 ***
Adj_Budget 1.11043 0.03738 29.709 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 79.78 on 1366 degrees of freedom
Multiple R-squared: 0.3925, Adjusted R-squared: 0.3921
F-statistic: 882.6 on 1 and 1366 DF, p-value: < 2.2e-16
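The coefficient table above can be read directly: each t value is the estimate divided by its standard error, and an approximate 95% confidence interval for the slope is the estimate plus or minus about 1.96 standard errors. The multiplier 1.96 is an assumption here (the normal approximation, which is reasonable with 1366 degrees of freedom), so the bounds below are a rough check rather than the exact output of `confint()`.

\[ t = \frac{1.11043}{0.03738} \approx 29.71, \qquad 1.11043 \pm 1.96 \cdot 0.03738 \approx (1.037,\ 1.184) \]

Read this way, each additional unit of `Adj_Budget` is associated with roughly 1.04 to 1.18 additional units of `Adj_Revenue`.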
Let’s visualize this model.
From the confidence interval we have that:
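Predictions from a fitted model are obtained with the `predict` function in R.
The fitted model is summarized with the `summary` function.
We now add the variable `imdbRating`, which encodes the different IMDB ratings in the data (1-10).
# The model
# Revenue = intercept + slope1*Budget + slope2*Rating + e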
lm2 <- lm(Adj_Revenue ~ Adj_Budget + imdbRating, data = movie_1990_data)
summary(lm2)
Call:
lm(formula = Adj_Revenue ~ Adj_Budget + imdbRating, data = movie_1990_data)
Residuals:
Min 1Q Median 3Q Max
-256.79 -41.25 -14.97 26.55 598.53
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -136.50696 14.64962 -9.318 <2e-16 ***
Adj_Budget 1.09103 0.03585 30.433 <2e-16 ***
imdbRating 24.09935 2.17050 11.103 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 76.43 on 1365 degrees of freedom
Multiple R-squared: 0.4428, Adjusted R-squared: 0.442
F-statistic: 542.5 on 2 and 1365 DF, p-value: < 2.2e-16
\[ \texttt{Adj_Revenue} = -136.507 + 1.091 \cdot\texttt{Adj_Budget} + 24.099\cdot\texttt{imdbRating} \]
The residual standard error went down to \(\texttt{76.43}\) (from \(\texttt{79.78}\) in the model with `Adj_Budget` alone).
# We use the confint() function to get the confidence interval
confint(lm2)
2.5 % 97.5 %
(Intercept) -165.245169 -107.768758
Adj_Budget 1.020706 1.161364
imdbRating 19.841475 28.357234
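The interval bounds reported for `imdbRating` can be checked against the coefficient table using only numbers from the output. The interval is centered on the estimate, and its half-width is about 1.96 standard errors, as expected for a 95% interval with this many degrees of freedom:

\[ \frac{19.841475 + 28.357234}{2} \approx 24.099, \qquad \frac{28.357234 - 19.841475}{2} \approx 4.258, \qquad \frac{4.258}{2.17050} \approx 1.962 \]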
predict(lm2, list(Adj_Budget = 25, imdbRating = 5.4))
1
20.90542
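The prediction can be reproduced by hand by plugging the inputs `Adj_Budget = 25` and `imdbRating = 5.4` into the fitted equation. This sketch uses the coefficients as printed in the summary, so it matches the `predict` output only up to rounding:

\[ -136.50696 + 1.09103 \cdot 25 + 24.09935 \cdot 5.4 \approx -136.507 + 27.276 + 130.136 \approx 20.905 \]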