##################### # QERM 598 # # Lab 3 1.22.2007 # ##################### # 4 Things to do: # 1. Discussion of t-tests # 2. Introduce non-parametric two-sample test. # 3. Introduce paired-sample t-test. # 4. Presentations of HW problems ######################### # Discussion of t-tests # ######################### # What are the assumptions in a t-test? # Data are normally distributed. # Data are independent. # Data are identically distributed within each sample. # NB: This is a short list # 1. Using R for t-tests. (is really easy) x<-rnorm(20) y<-rnorm(20) t.test(x,y,alternative="two.sided",var.equal=T,conf.level=.95) # Do it out explicity var.pool<- (var(x)*19 + var(y)*19)/38*(1/20+1/20) sd.pool<-sqrt(var.pool) (mean(x)-mean(y))/sd.pool # 2. Is alpha accurate in that test? num.sims<-5000 num.rejections<-0 for(sim in 1:num.sims){ x<-rnorm(20) y<-rnorm(20) my.test<-t.test(x,y,alternative="two.sided", var.equal=T,conf.level=.95) if(my.test$p.value<.05) num.rejections<-num.rejections+1 } num.rejections/num.sims # near alpha? # 3. Consider another test and examine its power. # Now we will draw x and y from distributions having # different means. x<-rnorm(20,mean=0) y<-rnorm(20,mean=1) t.test(x,y,alternative="two.sided",var.equal=T,conf.level=.95) # How powerful is this test? num.sims<-5000 num.rejections<-0 for(sim in 1:num.sims){ x<-rnorm(20,mean=0) y<-rnorm(20,mean=1) my.test<-t.test(x,y,alternative="two.sided",var.equal=T,conf.level=.95) if(my.test$p.value<.05) num.rejections<-num.rejections+1 } num.rejections/num.sims # what is beta? (near high 80%'s?) # 4. Here is a classic example from Gossett. data(sleep) # load a data frame from R's library help(sleep) # see the details about this data set. t.test(extra ~ group, data = sleep, var.equal=T) # the above code also shows some of the use of the data.frame # object in R. Here is a breakdown of the above code. x<-sleep$extra[sleep$group==1] y<-sleep$extra[sleep$group==2] t.test(x,y,data=sleep,var.equal=T) # # How can we check some of our assumptions? # Independence: Should come from experimental design # Identically distributed: Should come from process knowledge, # but can also look at histograms and other covariates. # Normality: can look at some plots. par(mfrow=c(1,2)) # This plot compares the sample quantiles of x and y to # the theoretical quantiles of a standard normal. # Our assessment involves seeing if the points plotted fall # roughly along the plotted line. qqnorm(x) qqline(x) qqnorm(y) qqline(y) # We see some deviation, especially for x, but with such # small sample size, the normality assumption seems acceptable. # Other things we've not discussed: # balanced design (What happens when n_1 != n_2?) # pooled variance estimation (How can we use both samples to improve our # estimate of sigma^2?) # one-sided vs. two-sided tests (You might be able to figure this out # on your own in the R help file. # Fisher-Behrens problem (What happens when sigma_1 != sigma_2?) ################################################# # Non-parametric tests equivalents to t-test # # Mann-Whitney test (a.k.a. Wilcoxon Rank-Sum) # ################################################# # What if we're *NOT* willing to assume normality? # We can use the Mann-Whitney test, which makes a different set of # assumptions. The notes you have for homework describe how this # test works and discusses briefly the difference between # parametric and non-parametric tests. # All I'm showing you here is some code. # # The following data sets give the number of arson-related deaths in # the U.S. from 1996-1999 and from 2000-2005. The data for 2001 exclude # the deaths from the 9-11 attacks. Do these data provide evidence # that the number of arson-related deats have declined since Y2K? # Test this at the alpha = .1 level # Data available at: http://www.usfa.dhs.gov/statistics/arson/index.shtm late.nineties<-c(520,445,470,370) early.2000.s<-c(505,330,350,320,315) wilcox.test(late.nineties,early.2000.s,alternative="less",conf.level=.9) # NB: Here, the alternative refers to x's relationship to y, i.e. # "less" means the alternative states that the first data points given, in # this case late.nineties, is less than the second data points given, in # this case early.2000.s. # NB: The t-test is robust to some deviation from the normality # assumption, so it's often worth comparing your non-parametric # result to the t-test result. If the inferences disagree, it # can be helpful to understand what it is in your data that # causes the disparity, e.g. data being bounded somehow, or # very heavy-tailed, or something else. t.test(late.nineties,early.2000.s,var.equal=T,conf.level=.1) norm.90s<-(late.nineties-mean(late.nineties))/sd(late.nineties) norm.00s<-(early.2000.s-mean(early.2000.s))/sd(early.2000.s) # Is the t-test appropriate here? par(mfrow=c(1,2)) qqnorm(norm.90s); qqline(norm.90s) qqnorm(norm.00s); qqline(norm.00s) # Consider this cautionary note from the R help file on wilcox.test(): # # "The literature is not unanimous about the definitions of the # Wilcoxon rank sum and Mann-Whitney tests. The two most common # definitions correspond to the sum of the ranks of the first sample # with the minimum value subtracted or not: R subtracts and S-PLUS # does not, giving a value which is larger by m(m+1)/2 for a first # sample of size m. (It seems Wilcoxon's original paper used the # unadjusted sum of the ranks but subsequent tables subtracted the # minimum.)" # # In light of this, if you end up using the Mann-Whitney test, I recommend # doing some more reading about the topic and specifying your procedure. ######################### # Paired sample t-tests # ######################### # The following data from Statistics for Experimenters by # George E. P. Box (1978), are the amount of wear of the soles of # shoes worn by 10 boys. Each boy wore a special pair of shoes, # the sole of one shoe was made with material A and the sole of the # other with material B, which were assigned to the left and right shoes # randomly. We want to test whether there was no difference # between materials A and B. # A B # 1 14.0 13.2 # 2 8.8 8.2 # 3 11.2 10.9 # 4 14.2 14.3 # 5 11.8 10.7 # 6 6.4 6.6 # 7 9.8 9.5 # 8 11.3 10.8 # 9 9.3 8.8 # 10 13.6 13.3 # Why do you think this experiment was run this way? Why not # just assign the two different materials randomly among the # ten boys? What is gained by this design? # # How do we adjust our analysis to reflect this pairing in the data? # We us a paired-sample t-test, which is basically a one-sample # t-test on the between-foot differences for each boy. # NB: to do this kind of test, we need some kind of pairing to # exist in how the data are collected. A<-c(14,8.8,11.2,14.2,11.8,6.4,9.8,11.3,9.3,13.6) B<-c(13.2,8.2,10.9,14.3,10.7,6.6,9.5,10.8,8.8,13.3) t.test(A,B,paired=T) t.test(A,B,var.equal=T) cor(A,B) plot(A,B) # How comfortable are we with our t-test assumptions? d<-A-B d.norm<-(d-mean(d))/sd(d) qqnorm(d.norm) qqline(d.norm) # In your homework, you will learn about a non-parametric test # that can be used for paired sample data like these.