---
title: "Motor Trend MPG Data Analysis"
author: "Matthew Bender"
date: "September 19, 2015"
output: html_document
---
*Created with knitr*
#### Executive Summary
This report analyzes the relationship between transmission type (manual or
automatic) and miles per gallon (MPG) to determine which transmission type
yields a higher MPG. The analysis uses the `mtcars` dataset. A t-test comparing
automatic and manual transmission vehicles shows that manual transmission
vehicles average 7.245 MPG more than automatic transmission vehicles. After
fitting multiple linear regressions that adjust for other variables, the
estimated contribution of a manual transmission shrinks to an improvement of
only 1.81 MPG. Weight, horsepower, and number of cylinders contribute more
significantly to the overall MPG of vehicles.
#### Load Data
Load the dataset and convert categorical variables to factors.
```{r results='hide', message=FALSE}
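library(ggplot2)
data(mtcars)
head(mtcars, n = 3)
dim(mtcars)
# Convert the categorical variables to factors so lm() treats them as groups
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$am <- factor(mtcars$am)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
# Attach so later chunks can refer to columns (mpg, am, ...) by bare name
attach(mtcars)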
```
#### Exploratory Analysis
**See Appendix Figure I** for an exploratory box plot comparing the MPG of
automatic and manual transmission vehicles. The plot suggests a substantial
increase in MPG for vehicles with a manual transmission over automatic.
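A numeric summary can complement the box plot. The chunk below is a minimal sketch, not part of the original analysis: it works on a fresh copy of the data (`mt_raw`, a name chosen here for illustration) so it does not depend on the attached, factor-converted `mtcars`.

```{r}
# Fresh, unmodified copy of the dataset (am: 0 = automatic, 1 = manual)
mt_raw <- datasets::mtcars

# Count, median, and mean of MPG for each transmission type
mpg_by_am <- do.call(
  rbind,
  lapply(split(mt_raw$mpg, mt_raw$am), function(x) {
    c(n = length(x), median = median(x), mean = mean(x))
  })
)
mpg_by_am
```

The rows are labelled `0` (automatic) and `1` (manual), matching the coding used in the box plot's axis label.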
##### Statistical Inference
Welch two-sample t-test of MPG by transmission type
```{r}
testResults <- t.test(mpg ~ am)
testResults$p.value
```
The p-value is below 0.05, so the t-test rejects the null hypothesis that the
difference in mean MPG between transmission types is 0.
```{r}
testResults$estimate
```
The estimated difference in mean MPG between the 2 transmission types is
7.24494 MPG in favor of manual.
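The difference quoted above is the manual group mean minus the automatic group mean. As a hedged sketch (the object names `mt_raw`, `welch`, and `mean_diff` are illustrative, not from the original report), it can be recomputed together with its confidence interval:

```{r}
# Fresh copy so the chunk does not rely on attach() or earlier conversions
mt_raw <- datasets::mtcars

# t.test() defaults to the Welch test (unequal variances)
welch <- t.test(mpg ~ am, data = mt_raw)

# Estimates are ordered automatic (am = 0), then manual (am = 1)
group_means <- unname(welch$estimate)
mean_diff <- group_means[2] - group_means[1]
mean_diff

# Confidence interval for (automatic mean - manual mean); both bounds
# negative means the manual mean is higher
welch$conf.int
```

Passing `data =` explicitly also avoids any ambiguity with the `mpg` dataset that ggplot2 exports.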
##### Regression Analysis
Fit the full model, regressing MPG on every other variable in the data
```{r results='hide'}
fullModelFit <- lm(mpg ~ ., data = mtcars)
summary(fullModelFit) # results hidden
summary(fullModelFit)$coeff # results hidden
```
Since none of the coefficients has a p-value less than 0.05, the full model
cannot tell us which variables are statistically significant.
Use backward selection to determine which variables are most statistically
significant
```{r results='hide'}
stepFit <- step(fullModelFit)
summary(stepFit) # results hidden
summary(stepFit)$coeff # results hidden
```
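With no `scope` argument, `step()` runs backward elimination and drops terms while doing so lowers the AIC. The chunk below is a self-contained sketch of that step under the same factor conversions. The names `mt`, `full_fit`, and `step_fit` are illustrative, and `trace = 0` is added only to silence the per-step log.

```{r}
# Rebuild the factor-converted data on a fresh copy
mt <- datasets::mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
mt[factor_cols] <- lapply(mt[factor_cols], factor)

# Full model, then backward elimination by AIC
full_fit <- lm(mpg ~ ., data = mt)
step_fit <- step(full_fit, direction = "backward", trace = 0)

# Terms kept by the selection, and the AIC of both models
formula(step_fit)
AIC(full_fit, step_fit)
```

Comparing the two AIC values shows how much the reduced model gains by dropping the weaker predictors.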
The new model has 4 variables (cylinders, horsepower, weight, transmission). The
R-squared value of 0.8659 indicates that this model explains about 87% of the
variance in MPG. The coefficients for 6 cylinders, horsepower, and weight are
statistically significant, with p-values less than 0.05. The coefficients for
8 cylinders (p of about 0.35) and manual transmission (p of about 0.21) are not.
Holding the other variables fixed, going from 4 to 6 cylinders decreases MPG by
3.03, and going from 4 to 8 cylinders decreases MPG by 2.16 (both relative to
the 4-cylinder baseline). Each additional 100 horsepower decreases MPG by 3.21.
Each additional 1000 lbs of weight decreases MPG by 2.5. A manual transmission
improves MPG by 1.81.
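The per-unit effects quoted above come from rescaling the fitted coefficients. The following is a minimal sketch that refits the selected model directly, assuming the selection keeps `cyl`, `hp`, `wt`, and `am` as stated. The names `mt`, `fit`, and `effect_sizes` are illustrative.

```{r}
# Refit the selected model on a fresh copy with cyl and am as factors
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am)
fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)
coefs <- summary(fit)$coefficients

# Express each coefficient in the units used in the text
effect_sizes <- c(
  cyl6_vs_cyl4        = coefs["cyl6", "Estimate"],
  cyl8_vs_cyl4        = coefs["cyl8", "Estimate"],
  per_100_hp          = 100 * coefs["hp", "Estimate"],  # hp is per 1 horsepower
  per_1000_lbs        = coefs["wt", "Estimate"],        # wt is already in 1000 lbs
  manual_vs_automatic = coefs["am1", "Estimate"]
)
round(effect_sizes, 2)

# p-value for each coefficient, and the model R-squared
round(coefs[, "Pr(>|t|)"], 4)
summary(fit)$r.squared
```

Both cylinder coefficients are contrasts against the 4-cylinder baseline, which is why the 8-cylinder effect is read relative to 4 cylinders rather than added to the 6-cylinder effect.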
#### Residuals & Diagnostics
Residual Plots
**See Appendix Figure II**
The plots suggest:
1. The random scatter in the Residuals vs. Fitted plot supports the assumption
of independence
2. The points of the Normal Q-Q plot lie close to the line, indicating that
the residuals are approximately normally distributed
3. The random scatter in the Scale-Location plot supports the constant variance
assumption
4. Since all points fall within the 0.5 Cook's distance contours, the Residuals
vs. Leverage plot shows no unduly influential points
```{r}
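# Count dfbetas values above 1 in absolute value, a common flag for
# observations that strongly influence a coefficient
sum(abs(dfbetas(stepFit)) > 1)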
```
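The dfbetas count can be read alongside leverage and Cook's distance, the quantities behind the Residuals vs. Leverage plot. This is a hedged sketch rather than part of the original analysis: it refits the selected model on a fresh copy, and `mt`, `fit`, and `influence_summary` are illustrative names.

```{r}
# Refit the selected model on a fresh copy with cyl and am as factors
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am)
fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)

# Leverage (hat values) and Cook's distance for every car
influence_summary <- data.frame(
  leverage = hatvalues(fit),
  cooks_d  = cooks.distance(fit)
)

# The three cars with the largest Cook's distance
head(influence_summary[order(-influence_summary$cooks_d), ], 3)

# Largest absolute dfbetas value across all cars and coefficients
max(abs(dfbetas(fit)))
```

Sorting by Cook's distance names the cars closest to the contours in the plot, which the plot itself does not always label clearly.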
#### Conclusion
There is a difference in MPG based on transmission type. After adjusting for the
other variables, a manual transmission gives only a slight boost of 1.81 MPG.
Weight, horsepower, and number of cylinders are more statistically significant
in determining MPG.
### Appendix Figures
#### I
```{r echo=FALSE}
boxplot(mpg ~ am,
        xlab = "Transmission Type (0 = Automatic, 1 = Manual)",
        ylab = "MPG",
        main = "MPG by Transmission Type")
```
#### II
```{r echo=FALSE}
par(mfrow = c(2, 2))
plot(stepFit)
```