<\body> ||<\author-address> >> <\problem> A traffic engineering study on traffic delay was conducted at intersections with signals on urban streets. Three types of traffic signals were utilized in the study: <\enumerate-numeric> pretimed; semi-actuated; fully-actuated. Five intersections were used for each type of signal. The measure of traffic delay used in the study was the average stopped time per vehicle at each of the intersections (seconds/vehicle). The data follow:
<big-table|<tabular|<tformat|<table|<row|<cell|Pretimed>|<cell|Semi-actuated>|<cell|Fully-actuated>>|<row|<cell|36.6>|<cell|17.5>|<cell|15.0>>|<row|<cell|39.2>|<cell|30.6>|<cell|10.4>>|<row|<cell|30.4>|<cell|18.7>|<cell|18.9>>|<row|<cell|37.1>|<cell|25.7>|<cell|10.5>>|<row|<cell|34.1>|<cell|22.0>|<cell|15.2>>>>>|Source: W. Reilly, C. Gardner, and J. Kell (1976). A technique for measurement of delay at intersections. Technical Report FHWA-RD-76-135, Federal Highway Administration, Office of R & D, Washington, DC.>
<\enumerate-alpha> Write the linear statistical model for this study, and explain the model components. State the assumptions necessary for an analysis of variance of the data. Compute the analysis of variance for the data. Compute the least squares means of the traffic delay and their standard errors for each signal type. Compute the confidence interval estimates of the signal type means.
Since the emphasis of our solution is the R implementation, we define the model in R in several session fragments, interspersed with minimal comments. There is one factor: the type of traffic signal. The factor has three levels. Let us define this factor in R:
<\input| >> TrafficLightType \<less\>- as.factor(c("Pretimed", "Semi-actuated", "Fully activated"))
<\input| >> TrafficLightType
<\output>
[1] Pretimed \ \ \ \ \ \ \ Semi-actuated \ \ Fully activated
Levels: Fully activated Pretimed Semi-actuated
<\input| >> \; >
The numerical response is the average stopping time. We define the response directly in R. There are multiple ways to achieve this step, and we chose one using just the most primitive operations: <verbatim|c>, <verbatim|rep>, and <verbatim|data.frame>.
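Among the other ways alluded to above, a more compact construction can be sketched with the each argument of rep and an explicit call to factor. This is an illustrative alternative, not the route taken in this solution, and the name StudyDataAlt is hypothetical.

```r
# Hypothetical alternative construction of the study data (illustration only).
# rep(..., each = 5) repeats every label five times in turn, and factor()
# makes the grouping variable a factor regardless of the R version in use.
StudyDataAlt <- data.frame(
  TrafficLightType = factor(rep(c("Pretimed", "Semi-activated", "Fully activated"),
                                each = 5)),
  AverageStoppedTime = c(36.6, 39.2, 30.4, 37.1, 34.1,   # pretimed
                         17.5, 30.6, 18.7, 25.7, 22.0,   # semi-actuated
                         15.0, 10.4, 18.9, 10.5, 15.2)   # fully-actuated
)

# Inspect the structure: 15 rows, one factor column, one numeric column.
str(StudyDataAlt)
print(table(StudyDataAlt$TrafficLightType))
```

The result has the same two columns and the same level order as the data frame built in the session that follows.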
<\input| >> Pretimed \<less\>- c(36.6, 39.2, 30.4, 37.1, 34.1)
<\input| >> SemiActivated \<less\>- c(17.5, 30.6, 18.7, 25.7, 22.0)
<\input| >> FullyActivated \<less\>- c(15.0, 10.4, 18.9, 10.5, 15.2)
<\input| >> AverageStoppedTime \<less\>- c(Pretimed, SemiActivated, FullyActivated)
<\input| >> N \<less\>- length(Pretimed)
<\input| >> TrafficLightType \<less\>- factor(c(rep("Pretimed", N), rep("Semi-activated", N), rep("Fully activated", N)))
<\input| >> StudyData \<less\>- data.frame(TrafficLightType, AverageStoppedTime)
<\input| >> StudyData
<\output>
\ \ \ TrafficLightType AverageStoppedTime
1 \ \ \ \ \ \ \ \ \ Pretimed \ \ \ \ \ \ \ \ \ \ \ \ \ \ 36.6
2 \ \ \ \ \ \ \ \ \ Pretimed \ \ \ \ \ \ \ \ \ \ \ \ \ \ 39.2
3 \ \ \ \ \ \ \ \ \ Pretimed \ \ \ \ \ \ \ \ \ \ \ \ \ \ 30.4
4 \ \ \ \ \ \ \ \ \ Pretimed \ \ \ \ \ \ \ \ \ \ \ \ \ \ 37.1
5 \ \ \ \ \ \ \ \ \ Pretimed \ \ \ \ \ \ \ \ \ \ \ \ \ \ 34.1
6 \ \ \ Semi-activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 17.5
7 \ \ \ Semi-activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 30.6
8 \ \ \ Semi-activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 18.7
9 \ \ \ Semi-activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 25.7
10 \ \ Semi-activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 22.0
11 \ Fully activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 15.0
12 \ Fully activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 10.4
13 \ Fully activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 18.9
14 \ Fully activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 10.5
15 \ Fully activated \ \ \ \ \ \ \ \ \ \ \ \ \ \ 15.2
<\input| >> \; >
Thus, the statistical model for this study is:
<\equation*> AverageStoppedTime = \<mu\><rsub|TrafficLightType>+Error
Here \<mu\><rsub|TrafficLightType> denotes the mean delay for the given signal type, and Error is the random deviation of an individual intersection from that mean. In R the model is expressed as a formula:
<\input| >> fmla \<less\>- AverageStoppedTime ~ TrafficLightType
<\input| >> fmla
<\output>
AverageStoppedTime ~ TrafficLightType
<\input| >> \; >
TeXmacs allows for easy integration of graphics with the document.
This is how it is done:
<\input| >> X11(pointsize=6);plot(StudyData);v()
<\output>
[Figure: plot of StudyData produced by the command above]
<\input| >> \; >
We note that the basic mechanics of incorporating R graphics in documents is to do the plotting as usual, except that at the end we need to call the function <verbatim|v()>. This function is not part of the standard R distribution, but is added by TeXmacs when it launches an R session. The call to <verbatim|X11(pointsize=6)> is a minor adjustment which reduces the font size used in the plot. Without it, the graphics appear truncated.
The usual assumptions of analysis of variance apply. Hence, <\enumerate-roman>
We assume that the experimental design is a completely randomized design.
We assume that the average stopping time is normally distributed. This, of course, can only be approximately true, because time is always positive. We note that the stopping time for an individual vehicle arriving at an intersection at a random time would be modeled by a uniform distribution, perhaps with some Gaussian noise added. Hence, the sample of cars used to compute each individual mean in the table should be large enough to ensure that the Central Limit Theorem is applicable. We note that the average of 6 uniform random variables is approximately normal for most practical applications. As we do not know the sample size, we can only hope that the researchers used a reasonable sample size.
We assume that each measurement is the mean over a larger number of cars passing through an intersection, and that a pre-determined number of cars was used in each experimental unit to ensure parity of information.
We assume that the variance of the average stopping time does not depend on the intersection or traffic light type.
We use the R function <verbatim|aov> to compute the analysis of variance table for this example, using the defaults for all arguments except the first two (formula and data).
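As an aside, the normality and equal-variance assumptions above can be probed informally once a model is fitted. The following is a minimal sketch, not part of the original solution, which re-enters the data and applies shapiro.test to the residuals and bartlett.test to the groups. The names fit, normality, homogeneity, and group_var are illustrative. With only five observations per group both tests have little power, so they can flag only gross violations.

```r
# Data re-entered so that the sketch is self-contained.
StudyData <- data.frame(
  TrafficLightType = factor(rep(c("Pretimed", "Semi-activated", "Fully activated"),
                                each = 5)),
  AverageStoppedTime = c(36.6, 39.2, 30.4, 37.1, 34.1,
                         17.5, 30.6, 18.7, 25.7, 22.0,
                         15.0, 10.4, 18.9, 10.5, 15.2)
)
fit <- aov(AverageStoppedTime ~ TrafficLightType, data = StudyData)

# Shapiro-Wilk test applied to the residuals (normality of the errors).
normality <- shapiro.test(residuals(fit))

# Bartlett test of equal variances across the three signal types.
homogeneity <- bartlett.test(AverageStoppedTime ~ TrafficLightType,
                             data = StudyData)

# Sample variance within each signal type, for direct comparison.
group_var <- tapply(StudyData$AverageStoppedTime,
                    StudyData$TrafficLightType, var)

print(normality)
print(homogeneity)
print(group_var)
```

Neither test addresses the design-based assumptions (randomization and comparable sample sizes behind each average), which have to be taken from the description of the study.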
<\input| >> aov \<less\>- aov(fmla, StudyData)
<\input| >> summary(aov)
<\output>
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Df \ Sum Sq Mean Sq F value \ \ \ Pr(\<gtr\>F)
TrafficLightType \ 2 1164.76 \ 582.38 \ 32.992 1.328e-05 ***
Residuals \ \ \ \ \ \ \ 12 \ 211.83 \ \ 17.65
---
Signif. codes: \ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
<\input| >> \; >
We see that we should reject the null hypothesis of equal means and accept the alternative hypothesis that the means are different; the p-value is approximately 1.3e-05.
The object returned by <verbatim|aov> contains both answers.
<\input| >> aov$coefficients
<\output>
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Intercept) \ \ \ \ \ \ TrafficLightTypePretimed
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 14.00 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 21.48
TrafficLightTypeSemi-activated
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 8.90
<\input| >> levels(StudyData$TrafficLightType)
<\output>
[1] "Fully activated" "Pretimed" \ \ \ \ \ \ \ "Semi-activated"
<\input| >> coefficients(aov)
<\output>
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (Intercept) \ \ \ \ \ \ TrafficLightTypePretimed
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 14.00 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 21.48
TrafficLightTypeSemi-activated
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 8.90
<\input| >> \; \; >
We note that these coefficients determine the means of the treatment groups. However, we should look carefully at the order of the levels of the factor <verbatim|TrafficLightType>. R auto-ordered them alphabetically, starting with <verbatim|Fully activated>. The value named <verbatim|(Intercept)> is the mean of the group corresponding to the first treatment level. The value named <verbatim|TrafficLightTypePretimed> plus the intercept is the group mean of the second treatment group, <verbatim|Pretimed> (14.00 + 21.48 = 35.48), and finally, the third value plus the intercept is the group mean for the level <verbatim|Semi-activated> (14.00 + 8.90 = 22.90).
We note that the generic function <verbatim|coefficients> can be used to extract the coefficients from the structure returned by <verbatim|aov>. We note that the generic function <verbatim|model.tables> can also be used to compute the treatment means.
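The relationship between the coefficients and the treatment means described above can be verified numerically, and the standard error of each mean follows from the residual mean square. The following is a minimal sketch, assuming R's default treatment contrasts and re-entering the data so that it runs on its own. The names fit, means_from_coef, group_means, mse, and se_mean are illustrative and do not appear in the original session.

```r
# Data re-entered so that the sketch is self-contained.
StudyData <- data.frame(
  TrafficLightType = factor(rep(c("Pretimed", "Semi-activated", "Fully activated"),
                                each = 5)),
  AverageStoppedTime = c(36.6, 39.2, 30.4, 37.1, 34.1,
                         17.5, 30.6, 18.7, 25.7, 22.0,
                         15.0, 10.4, 18.9, 10.5, 15.2)
)
fit <- aov(AverageStoppedTime ~ TrafficLightType, data = StudyData)

# Under treatment contrasts the intercept is the mean of the first level,
# and every other coefficient is a difference from that first level.
b <- coefficients(fit)
means_from_coef <- c(b[1], b[1] + b[-1])
names(means_from_coef) <- levels(StudyData$TrafficLightType)

# Direct computation of the group means, for comparison.
group_means <- tapply(StudyData$AverageStoppedTime,
                      StudyData$TrafficLightType, mean)

# Residual mean square, and the standard error of each group mean.
mse <- deviance(fit) / df.residual(fit)
n_per_group <- as.vector(table(StudyData$TrafficLightType))
se_mean <- sqrt(mse / n_per_group)
names(se_mean) <- levels(StudyData$TrafficLightType)

print(rbind(from_coefficients = means_from_coef, direct = group_means))
print(round(se_mean, 3))
```

Because the design is balanced, the three standard errors coincide, at roughly 1.88 seconds per vehicle.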
The effects are simply differences between the group means and the overall mean.
<\input| >> tbl \<less\>- model.tables(aov, type="means", se=TRUE)
<\input| >> tbl
<\output>
Tables of means
Grand mean
24.12667
\;
\ TrafficLightType
TrafficLightType
Fully activated \ \ \ \ \ \ \ Pretimed \ Semi-activated
\ \ \ \ \ \ \ \ \ \ 14.00 \ \ \ \ \ \ \ \ \ \ 35.48 \ \ \ \ \ \ \ \ \ \ 22.90
\;
Standard errors for differences of means
\ \ \ \ \ \ \ \ TrafficLightType
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 2.657
replic. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 5
<\input| >> \; >
Note that the table reports the standard error of a difference of two means, 2.657. The standard error of a single mean is smaller by a factor of the square root of 2, i.e. about 1.88.
We note that the value of <verbatim|aov$coefficients> depends on the choice of contrasts. This solution assumes that the default setting for the contrasts is used. The default setting is a treatment contrast. Since contrasts appear in Chapter 3, we discuss the subject of contrasts no further.
All essential components returned by <verbatim|aov> (and some of the less essential ones) are listed by a call to <verbatim|attributes>:
<\input| >> attributes(aov)
<\output>
$names
\ [1] "coefficients" \ "residuals" \ \ \ \ "effects" \ \ \ \ \ \ "rank"
\ [5] "fitted.values" "assign" \ \ \ \ \ \ \ "qr" \ \ \ \ \ \ \ \ \ \ \ "df.residual"
\ [9] "contrasts" \ \ \ \ "xlevels" \ \ \ \ \ \ "call" \ \ \ \ \ \ \ \ \ "terms"
[13] "model"
\;
$class
[1] "aov" "lm"
<\input| >> \; >
For many of these components, there exists a generic function to extract or analyze them. The name of the generic either coincides with or contains the name of the component, and thus it can be located with the standard R help facilities.
Confidence intervals can be computed using the generic function <verbatim|confint>. Let us review the methods of this generic: \;
<\input| >> methods(confint)
<\output>
[1] confint.default confint.glm* \ \ \ confint.lm* \ \ \ \ confint.nls*
\;
\ \ \ Non-visible functions are asterisked
<\input| >> \; >
As we can see, any model fitted with <verbatim|glm>, <verbatim|lm>, or <verbatim|nls> can be passed as an argument to <verbatim|confint>.
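Because the confint methods listed above operate on model coefficients, one hedged way to obtain intervals for the signal type means themselves is to refit the model without an intercept, so that each coefficient is a group mean. The refit below is an illustration and not part of the original session; fit_means and ci_means are hypothetical names, and the data are re-entered so that the sketch is self-contained.

```r
# Data re-entered so that the sketch is self-contained.
StudyData <- data.frame(
  TrafficLightType = factor(rep(c("Pretimed", "Semi-activated", "Fully activated"),
                                each = 5)),
  AverageStoppedTime = c(36.6, 39.2, 30.4, 37.1, 34.1,
                         17.5, 30.6, 18.7, 25.7, 22.0,
                         15.0, 10.4, 18.9, 10.5, 15.2)
)

# Removing the intercept ("- 1") makes every coefficient a group mean.
fit_means <- lm(AverageStoppedTime ~ TrafficLightType - 1, data = StudyData)

# 95% confidence intervals, now one row per signal type mean.
ci_means <- confint(fit_means, level = 0.95)

# Strip the factor name from the row labels for readability.
rownames(ci_means) <- sub("^TrafficLightType", "", rownames(ci_means))
print(round(ci_means, 2))
```

The residual degrees of freedom and the residual mean square are the same as in the model with an intercept, so the row for the first level reproduces the interval that confint reports for the intercept of the original fit.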
In particular, the model originated by <verbatim|aov> can be used as an argument, since its class vector includes <verbatim|lm>.
We conclude the solution of our exercise by making a call to <verbatim|confint>. The problem requests confidence intervals for the means, and thus we pass the extra argument <verbatim|level=.95>. The argument has a default value of 0.95 and is not strictly required in our situation.
<\input| >> ivals \<less\>- confint(aov, level=.95)
<\input| >> ivals
<\output>
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 2.5 % \ \ 97.5 %
(Intercept) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 9.906112 18.09389
TrafficLightTypePretimed \ \ \ \ \ \ 15.690368 27.26963
TrafficLightTypeSemi-activated \ 3.110368 14.68963
<\input| >> attributes(ivals)
<\output>
$dim
[1] 3 2
\;
$dimnames
$dimnames[[1]]
[1] "(Intercept)" \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ "TrafficLightTypePretimed"
[3] "TrafficLightTypeSemi-activated"
\;
$dimnames[[2]]
[1] "2.5 %" \ "97.5 %"
<\input| >> class(ivals)
<\output>
[1] "matrix"
<\input| >> \; >
Again, the row labeled <verbatim|(Intercept)> refers to the first level of the factor, i.e. <verbatim|Fully activated>. The remaining two rows are intervals for the differences between the corresponding level means and the first level mean, not for the means themselves. Since the design is balanced, every mean has the same standard error, so the interval for each mean has the same half-width as the intercept's, about 4.09: roughly 31.39 to 39.57 for <verbatim|Pretimed> and 18.81 to 26.99 for <verbatim|Semi-activated>. We note that the class of the answer is <verbatim|matrix> and the rows of the matrix are named after the levels of the applicable factor: <verbatim|TrafficLightType>.
<\enumerate-alpha> \; \; <\initial> <\collection> <\references> <\collection> > > > > > > > > > <\auxiliary> <\collection> <\associate|table> |Technical Report FHWA-RD-76-135>, Federal Highway Administration, Office of R & D, Washington, DC.|> <\associate|toc> |math-font-series||1Formulation of the problem> |.>>>>|> |math-font-series||2The statistical model> |.>>>>|> |math-font-series||3Plot of the data> |.>>>>|> |math-font-series||4Assumptions> |.>>>>|> |math-font-series||5Computation of analysis of variance (ANOVA)> |.>>>>|> |math-font-series||6Computation of the least square means and standard errors> |.>>>>|> |6.1The least squares means |.>>>>|> > |math-font-series||7Confidence intervals of the means> |.>>>>|>