#################
# QERM 598 HW 2 #
# due 1.22.2007 #
#################

# 1. Read .pdf file of lecture notes found on the website under
# Hypothesis_Notes.pdf. The R code necessary for the randomization
# follows:

# read in needle length data for each aspect
length.north<-c(3.6,4.5,6.9,7.2,8.2,8.3,8.4,8.5,8.8,
                8.8,9.4,9.7,10.0,11.3,11.4)
length.south<-c(7.1,8.0,8.4,8.6,8.7,9.6,10.6,10.8,10.9,
                11.1,11.5,11.7,12.3,12.5,12.6)

# combine length vectors into one vector
length<-c(length.north,length.south)

# read in vectors of aspect labels N=north, S=south
aspect.north<-rep("N",15)
aspect.south<-rep("S",15)

# combine aspect vectors into one vector
aspect<-c(aspect.north,aspect.south)

# combine both vectors into one data frame.
# (this step isn't really necessary, but can be useful
# down the line.)
whitepine.data<-data.frame(aspect,length)
whitepine.data

# what is the difference in means?
x.bar.north<-mean(length.north)
x.bar.south<-mean(length.south)

# form our test statistic t0 = difference in sample means
t.0<-x.bar.south-x.bar.north
t.0 # should be 1.96

# Create a null distribution for testing.
num.samples<-999

# create a storage vector for differences in means
null.dist<-rep(NA,num.samples)

# create a loop to generate 999 samples from the null distribution.
for(i in 1:num.samples){
  # randomly mix up the lengths
  scramble<-sample(length)
  # calculate mean for the "south" aspect data
  fake.south<-mean(scramble[1:15])
  # calculate mean for the "north" aspect data
  fake.north<-mean(scramble[16:30])
  # calculate difference in sample means and store it
  null.dist[i]<-fake.south-fake.north
}

# combine observed difference in means with random
# samples to create 1000 realizations
full.null<-c(null.dist,t.0)
hist(full.null,breaks=50)

# mark off the observed value in red
abline(v=t.0,col="red",lwd=2)

# how many values from the null distribution are
# greater than or equal to the observed value?
length(full.null[full.null>=t.0])

# What is the p-value?
p.value<-length(full.null[full.null>=t.0])/1000
p.value

# let's mark off the rejection regions if we use
# a level alpha=.05 test
alpha<-.05
# threshold for lowest 2.5% of realizations
low.threshold<-quantile(full.null,alpha/2)
# threshold for highest 2.5% of realizations
high.threshold<-quantile(full.null,1-alpha/2)
abline(v=low.threshold,col="blue",lwd=2)
abline(v=high.threshold,col="blue",lwd=2)

# 2. Get on top of the jargon. You should practice using the following
# terms: Null and alternative hypotheses, alpha-level, power, p-value,
# test statistic, null distribution, rejection region, two-sided test, decision rule,
# the phrase "under the null", Type I and II error, one-tailed and two-tailed tests.
# There is nothing to turn in for this exercise, but you should spend some
# time practicing with these terms. I recommend reading about them in
# Casella and Berger (2000) to see how they are expressed in mathematical
# terms and also in an elementary statistical text to see how they are
# used in a more applied way.

# 3. Read about two-sample t-tests in any standard introductory statistics book,
# e.g. Zar's _Biostatistical Analysis_ or Freund's _Modern Elementary Statistics_.
# I recommend that each student try a different book and compare what you
# understand from the different texts.
#
# Conduct a two-sided hypothesis test at the alpha=.05 level
# using a two-sample t-test for the white-pine data given above.
# Do the calculations by hand and find your p-value using the pt()
# command. Find your rejection threshold using the qt() command.
#
# Now run the following code using the t.test() function and the
# data read in above. Do your results agree?
whitepine.t.test<-t.test(length.north,length.south,alternative="two.sided",
                         var.equal=TRUE,conf.level=.95)
whitepine.t.test

# 4. Use R to illustrate some concept from this week's Stat 513 homework
# in a way that is helpful to you. Be prepared to discuss your results
# next week in class.
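
# Addendum to exercise 1 (a sketch only, not part of the assigned code).
# The p-value in exercise 1 counts null values >= t.0, which is a
# one-sided count, while the blue rejection thresholds are two-sided.
# One common way to get a two-sided randomization p-value, ASSUMING the
# null distribution is roughly symmetric about zero, is to compare
# absolute values. The names north, south, pooled, obs.diff, n.perm,
# perm.diffs, full.diffs, and p.two.sided are made up for this
# illustration, and the seed is arbitrary. The data are repeated so the
# chunk can be run on its own.
set.seed(598)  # arbitrary seed, only so the result is reproducible
north<-c(3.6,4.5,6.9,7.2,8.2,8.3,8.4,8.5,8.8,
         8.8,9.4,9.7,10.0,11.3,11.4)
south<-c(7.1,8.0,8.4,8.6,8.7,9.6,10.6,10.8,10.9,
         11.1,11.5,11.7,12.3,12.5,12.6)
pooled<-c(north,south)

# observed difference in sample means (south minus north)
obs.diff<-mean(south)-mean(north)

# 999 scrambled differences, computed as in the loop in exercise 1
n.perm<-999
perm.diffs<-replicate(n.perm,{
  scramble<-sample(pooled)
  mean(scramble[1:15])-mean(scramble[16:30])
})

# include the observed value to give 1000 realizations
full.diffs<-c(perm.diffs,obs.diff)

# proportion of realizations at least as extreme in either direction
p.two.sided<-mean(abs(full.diffs)>=abs(obs.diff))
p.two.sided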