[R] Passing data to t.test in loop

Tue Dec 2 10:16:40 CET 2014

Dear All

First post so sorry for any breaches of etiquette.
I have a CSV file containing the results of a series of experiments, each of which records the time taken for a run of a given size.

"run_id","size","time"
1,100,1.00
2,200,2.100
3,100,1.100
4,200,2.100
5,200,1.900
6,300,4.00
7,200,2.5
...

I read the data set, extract the results for each "size" and return various statistics.
The only problem is that I would like to iterate over the distinct sizes and run a t.test on each.
My code has a section commented #manual t.test, which works, but I have had no luck with the attempt labelled #attempt to automate t.test.
I'm assuming the problem lies in how I pass the data as an argument to t.test().

Any pointers gratefully accepted, but as I'm a learner, hints rather than a solution are preferred.

Cheers Paul

getwd()
setwd("c:/work/R/experiment1")
# read raw experimental data from results file
data <- read.csv("data1.csv", header = TRUE)
data
#create a new dataframe which has space for a record for each unique size of experiment
# this is to collect collated statistics for each experiment

var_list <- c("num_obs", "size_run", "sample_mean","sample_var","std_dev","se")
var_list_length <- length(var_list)
num_experiments <- length(unique(data$size))
# create the dataframe
df <- data.frame(matrix(vector(), num_experiments, var_list_length, dimnames = list(c(), var_list)), stringsAsFactors = FALSE)
# it should contain only NA values at this point
df
# insert the experiment size
df$size_run <- unique(data$size)
# now only the size_run column should be filled
df$size_run
df

# loop over the distinct experiment sizes and fill in the statistics for each

for (i in df$size_run)
{
  # observations recorded for this size
  x <- subset(data$time, data$size == i)
  # row of df for this size (the column is size_run; df$size only works via partial matching)
  this_row <- df$size_run == i

  # calculate the sample variance of observations on a particular size
  df$sample_var[this_row] <- var(x)

  # calculate the mean of the returned values for all experiments of the same size
  df$sample_mean[this_row] <- mean(x)

  # calculate the number of observations on a particular size
  df$num_obs[this_row] <- length(x)

  # calculate the sd of the data
  df$std_dev[this_row] <- sd(x)

  # calculate the standard error
  df$se[this_row] <- sd(x) / sqrt(length(x))
}

df
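
The per-size statistics collected in the loop above could probably also be obtained without an explicit loop. What follows is only a rough sketch, not the code used in the post: it assumes the column names "size" and "time" from the CSV header, uses just the seven sample rows shown at the top as illustrative data, and the names by_size and summary_df are made up for the example.

```r
# Illustrative data only: the seven sample rows listed in the post
data <- data.frame(
  size = c(100, 200, 100, 200, 200, 300, 200),
  time = c(1.0, 2.1, 1.1, 2.1, 1.9, 4.0, 2.5)
)

# split the times into one numeric vector per distinct size
by_size <- split(data$time, data$size)

# apply each summary function to every group
summary_df <- data.frame(
  size_run    = as.numeric(names(by_size)),
  num_obs     = sapply(by_size, length),
  sample_mean = sapply(by_size, mean),
  sample_var  = sapply(by_size, var),   # NA when a size has a single observation
  std_dev     = sapply(by_size, sd)
)
# standard error of the mean for each size
summary_df$se <- summary_df$std_dev / sqrt(summary_df$num_obs)

summary_df
```

With only one observation for a size (as for 300 in the sample rows), var() and sd() return NA, so the corresponding se is NA as well.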

#manual t.test
print("t.test for size  = 100")
t.test(subset(data$time, data$size == 100))

print("t.test for size  = 200")
t.test(subset(data$time, data$size == 200))

print("t.test for size  = 300")
t.test(subset(data$time, data$size == 300))

#attempt to automate t.test
for (i in df$size_run)
{
print(i)
a <- subset(data$time, data$size == i)
print(a)
t.test(a)
}
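
One thing that may be worth checking in the loop above: at the top level R prints the value of an expression automatically, but inside a for loop it does not, so the value returned by t.test(a) would be computed and then silently discarded. The sketch below is a hedged illustration under that assumption, not a confirmed diagnosis. It uses only the seven sample rows from the top of the post as illustrative data, and the list called results is a hypothetical name introduced for the example.

```r
# Illustrative data only: the seven sample rows listed in the post
data <- data.frame(
  run_id = 1:7,
  size   = c(100, 200, 100, 200, 200, 300, 200),
  time   = c(1.0, 2.1, 1.1, 2.1, 1.9, 4.0, 2.5)
)

results <- list()   # hypothetical container: one t.test result per size

for (i in unique(data$size)) {
  a <- data$time[data$size == i]

  # t.test() stops with an error when given fewer than two observations
  if (length(a) < 2) next

  # keep the result so it can be inspected after the loop
  results[[as.character(i)]] <- t.test(a)

  # values are not auto-printed inside a loop, so print explicitly
  print(results[[as.character(i)]])
}
```

Storing the results in a list also makes it possible to pull out individual pieces afterwards, for example results[["200"]]$conf.int.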
