Just need help with number 7. thank you.

7. (15+2 for Excel work) Consider the following data for the average cost of various fuels and electricity for a twelve year period. The data measure the following:Electricity Residential rate per kilowatt hourNatural Gas Residential natural gas per 1000 cubic feetFuel Oil Residential fuel oil per gallonGasoline Regular gasoline per gallonYour model will attempt to predict residential electricity costs using the cost of the other fuels.a) (1) State whether you would expect a positive or negative relationship between electricity costs and each of the independent variables and explain why.b) (2) Use EXCEL, to create a correlation using all independent variables provided. Discuss all aspects of the correlation matrix, as it compares to your expectations in part (a) and what it tells you about the potential results of a regression analysis.c) (3) Use EXCEL to run a multiple regression to estimate electricity costs using all independent variables provided. Test if the overall regression is significant at the 5% level.d) (3) Test for the significance of each slope coefficient at the 5% level. (Incorporate all tests intoONE five step answer as in the slides.)e) (2) Based on your correlation matrix and the results of your regression do you suspect multicollinearity? If yes, which independent variable(s) to you think could be responsible?f) (2) Explain how you test for multicollinearity but do not test.g) (2) How could you rerun the regression taking into account the issues in the first regression?Which variable or variables would you keep as an explanatory variable(s), and why?

ECON 2202 ? Statistical Methods in Economics and Business II
EXCEL ASSIGNMENT
WINTER 2016
Due: by END OF CLASS (11:30) Thursday April 7, 2016
NO LATE ASSIGNMENTS ARE ACCEPTED.
IT MUST BE SUBMITTED TO COMPLETE THE COURSE REQUIREMENTS

1. Late assignments receive a grade of 0. Failure to submit this assignment leads to an FND in
the course.
2. You must use EXCEL in the questions where indicated for this assignment. Copy and paste the
tables into EXCEL from the text below. You must hand in all EXCEL output where indicated. .
3. The assignment otherwise is hand written. Please label the Excel outputs with the question
numbers.
4. Students may work together on the assignment, but each student MUST submit the
assignment independently.
5. Keep a copy of this assignment to study from, solutions will be posted.
6. All hypothesis tests MUST use the five steps. Formulae are necessary in all cases unless
otherwise noted. For this assignment ONLY you can follow these examples to keep the
document smaller and save time!! The conclusions MUST be complete IN CONTEXT as always.
F-test:
1.
Ho: regression is not significant, Ha: regression is significant
2.
?=0.05, one-tailed. F?,(k, n-k-1) = F0.05,(2,7) = 6.542
3.
Reject Ho if F0 &gt;F?
4.
F = MSR/MSE = 43.3648
5.
Since F0 &gt; F? (43.3648&gt; 6.542), reject Ho. The regression equation is significant; the relationship
between apples and oranges is significant.
OR if using p-values
P-value approach
1.
Ho: regression is not significant, Ha: regression is significant
2.
?=0.05, one-tailed
3.
Reject Ho if p-value &lt; ? = 0.05
4.
p-value = 0.0001
5.
Since p-value = 0.0001 &lt; ? = 0.05, reject Ho. The regression equation is significant; the
relationship between apples and oranges is significant.
1

1. (16 +3 for Excel work) A company is reviewing the performance of two different suppliers of
toner ink as their intention is to compare them and possible switch to the alternate one. Using
the following data for hours of toner use before empty and alpha = 0.05 in all cases answer the
following.
CURRENT
SUPPLIER
A
486

ALTERNATE
SUPPLIER
B
489

490
491
491
494
494
496
498
498

489
491
492
492
492
492
493
493

498
502
504
505
506
507
508
510
514
515
527

494
495
496
497
497
497
498
499
502
503
505

a. (5) Determine if the standard deviation of the Alternate Supplier B is significantly less than 5.
Use alpha = 0.05. (No EXCEL)
b. (3) Determine if the variation of the hours of toner use of Current Supplier A is greater than
that of Alternate Supplier B. Use EXCEL, include printout of result and DO NOT USE p-values
in test write up.
c. (3) Determine if there is any difference in the variation of the hours of toner use between the
two suppliers. Use EXCEL, include printout of result and DO NOT USE p-values in test write
up.
d. (3) Using the findings of part (c), test to determine if Current Supplier A has higher average
hours of toner use than Alternate Supplier B. Use EXCEL, include printout of result and use pvalues in test.
e. (2) What is your recommendation to the company with respect to the supplier to select?
Explain.
2. (3+1 for Excel work) Public health in a local hospital decided to test whether the length of time
that nurses wash their hands will alter the amount of bacteria remaining on them. To ensure
2

consistency in the testing nine nurses were selected and each washed their hands, in the first
trial for 2.5 seconds and in the second for 15 seconds. The amount of remaining bacteria was
measured in each case. Test the hypothesis that longer washing leads to less bacteria
remaining. Use alpha = 0.05.
CULTURE
1
66
132
120
187
190
17
33
92
1000

CULTURE
2
78
115
93
48
77
3
12
12
146

3

3. (15+1 for Excel work) Grading assignments is a real problem, taking a great deal of time and
while many students do a poor job, many copy and gain no benefit. An instructor who taught
three sections of a course carried out a test. One section had no assignments, one had
were reviewed after the first midterm. Assume each population is normally distributed, and
samples are independent random samples. Using EXCEL where possible perform ALL ANOVA
tests necessary to determine which approach if any led to higher grades. Use alpha = 0.05 in all
tests. Use EXCEL for the ANOVA test and use the p values in testing. For those tests for which
you cannot use EXCEL, use the standard write up as used in the practice assignments.
No Assignments
69
69
92
84
79
84
76
63
76
82
89
72
72
65
73
47
92
71
83
81
92
80
64
72
84
79
74
81

73
63
68
79
57
68
72
74
49
84
79
71
80
74
71
63
88
83
89
82
69
92
79
81
76
81
81
75

83
97
72
79
84
76
91
76
83
88
91
96
68
99
89
80
79
91
83
83
76
90
79
67
86
86
82
84

4

4. (9+1 for Excel work) The instructor in the previous example teaches the same sections of the
course four times a year and wanted to investigate if there was not only a difference in the way
assignments were treated but also in the time of year the course was given. The following data was
collected. Assume all populations are normally distributed, observations are independent, and
population variances are equal. Using EXCEL perform all remaining tests necessary to determine
whether the grades were influenced by assignment method and/or semester. Use alpha = 0.05 in
all tests. No post tests are required.
TERM
FALL

WINTER

EARLY SUMMER

LATE SUMMER

No Assignments
69
69
92
84
79
84
76
63
76
82
89
72
72
65
73
47
92
71
83
81
92
80
64
72
84
79
74
81

Assignments, Not Collected
73
63
68
79
57
68
72
74
49
84
79
71
80
74
71
63
88
83
89
82
69
92
79
81
76
81
81
75

Assignments, Collected
83
97
72
79
84
76
91
76
83
88
91
96
68
99
89
80
79
91
83
83
76
90
79
67
86
86
82
84

5

5. (16) Is the type of beverage ordered with lunch at a restaurant independent of the age of the
customer? A random sample of 309 lunch customers was taken and the results are as follows.
Using alpha = 0.01, test to determine if the two factors are related. Please show intermediate
calculation steps for part marks. (No Excel in this question.)

AGE
21-34
35-55
OVER 55

COFFEE/TE
A
26
41
24
91

SOFT
DRINK
95
40
13
148

OTHER
18
20
32
70

139
101
69
309

6. (13 +5 for Excel work) Can sales of major fast food chains be explained by the number of
individual locations the chain has? Consider the following data where sales are measured in \$
billions and locations are in thousands.

MCDONALD'S
BURGER KING
TACO BELL
PIZZA HUT
WENDY'S
KFC
SUBWAY
DAIRY QUEEN
A&amp;W

SALES
17.1
7.9
4.8
4.7
4.6
4
2.9
2.7
2.7

LOCATION
S
12.4
7.5
6.8
8.7
4.6
5.1
11.2
5.1
2.9

Use EXCEL, to create a scatter diagram, correlation matrix, and regression model. Conduct all
tests at the 5% significance level.
a) (2) Reviewing the data and the scatter diagram, does the correlation matrix result support the
relationship you would expect? Are there any other considerations or concerns that arise from
this?
b) (5) Conduct a test to determine whether the population correlation coefficient is significantly
different from zero. Do not use p values.
c) (4) Is there an observation which is impacting the relationship? Given your findings in parts (a)
and (b), explain what you might consider doing to further investigate this relationship? Rerun
the correlation matrix and the equation and compare the two models with respect to the
correlation of the variables, the fit (R2), and the significance of the slope. You do not need to
write out the tests just use the p values to explain what happens and why when comparing the
significance of the model or slope.
6

d) (2) Which model would you select to examine the relationship? Explain your choice.
7. (15+2 for Excel work) Consider the following data for the average cost of various fuels and
electricity for a twelve year period. The data measure the following:
Electricity
Natural Gas
Fuel Oil
Gasoline

Residential rate per kilowatt hour
Residential natural gas per 1000 cubic feet
Residential fuel oil per gallon
Regular gasoline per gallon

Your model will attempt to predict residential electricity costs using the cost of the other fuels.
a) (1) State whether you would expect a positive or negative relationship between electricity costs
and each of the independent variables and explain why.
b) (2) Use EXCEL, to create a correlation using all independent variables provided. Discuss all
aspects of the correlation matrix, as it compares to your expectations in part (a) and what it tells
you about the potential results of a regression analysis.
c) (3) Use EXCEL to run a multiple regression to estimate electricity costs using all independent
variables provided. Test if the overall regression is significant at the 5% level.
d) (3) Test for the significance of each slope coefficient at the 5% level. (Incorporate all tests into
ONE five step answer as in the slides.)
e) (2) Based on your correlation matrix and the results of your regression do you suspect
multicollinearity? If yes, which independent variable(s) to you think could be responsible?
f) (2) Explain how you test for multicollinearity but do not test.
g) (2) How could you rerun the regression taking into account the issues in the first regression?
Which variable or variables would you keep as an explanatory variable(s), and why?

Electricit Natural
Gasolin Fuel
y
Gas
e
Oil
2.54
1.29
0.39
0.21
3.51
1.71
0.57
0.31
4.64
2.98
0.86
0.44
5.36
3.68
1.19
0.61
6.2
4.29
1.31
0.76
6.86
5.17
1.22
0.68
7.18
6.06
1.16
0.65
7.54
6.12
1.13
0.69
7.79
6.12
1.12
0.61
7.41
5.83
0.86
0.34
7.41
5.54
0.9
0.42
7.49
4.49
0.9
0.33

7