[R] Large Dataset

Edwin Sendjaja edwin7 at web.de
Tue Jan 6 18:48:06 CET 2009


Below, you can see the R code.

But it gets stuck already at the first line (read.table...).

I don't know how to calculate this and write the result into a new table.


Edwin





# read the whole file into a data frame
data <- read.table("test.data")

# drop the rows where Zusatz is 60 or 0
data <- subset(data, Zusatz != "60" & Zusatz != "0")



# split Zusatz into groups, one per (limit, PE_ID, run) combination
split.data <- with(data, split(Zusatz,
    list(EndpointKeepAliveTimeOutIntervalLimit, PE_ID, Run)))

# find the min, max, standard deviation and mean of each group
mins  <- sapply(split.data, min)
maxs  <- sapply(split.data, max)
devs  <- sapply(split.data, sd)
means <- sapply(split.data, mean)  # "means", not "mean": don't mask base::mean


# the group names have the form "limit.pe_id.run"; split them apart again
name.list <- strsplit(names(split.data), "\\.")

endpointkeepalivetimeoutintervallimit <- as.numeric(sapply(name.list,
    function(x) x[[1]]))
pe_id <- sapply(name.list, function(x) x[[2]])
run   <- sapply(name.list, function(x) x[[3]])

# construct a new data frame from these values
output <- data.frame(
    EndpointKeepAliveTimeOutIntervalLimit = endpointkeepalivetimeoutintervallimit,
    PE_ID = pe_id, Run = run,
    Min = mins, Max = maxs, Standardabweichung = devs, Mean = means)


# drop the groups that were empty (min() of an empty group is Inf)
output <- subset(output, is.finite(Min))

# sort by the keep-alive limit, then by PE_ID within each limit
output <- output[order(output$EndpointKeepAliveTimeOutIntervalLimit,
                       output$PE_ID), ]
rownames(output) <- seq_len(nrow(output))


# write the result table to the file named by the "filepdf" env variable
write.table(output, file = Sys.getenv("filepdf"), quote = FALSE)





> For the mean, min, max and standard deviance (deviation, I suppose) you
> don't need to store all the data in memory; you can calculate them
> incrementally. Read the file line by line (if it is a text file).
>
> G.
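
A minimal sketch of that incremental approach, for what it's worth. It
assumes a whitespace-separated text file with a header line and that Zusatz
is the fourth field; the chunk size and the field index are guesses to
adjust:

con <- file("test.data", open = "r")
readLines(con, n = 1)                          # skip the header line
n <- 0; s <- 0; ss <- 0; mn <- Inf; mx <- -Inf
repeat {
    lines <- readLines(con, n = 100000)        # next chunk of raw lines
    if (length(lines) == 0) break
    x <- as.numeric(sapply(strsplit(lines, "[ \t]+"), `[`, 4))
    x <- x[x != 60 & x != 0]                   # same filter as in the script
    n  <- n + length(x)
    s  <- s + sum(x)
    ss <- ss + sum(x^2)
    mn <- min(mn, x)
    mx <- max(mx, x)
}
close(con)
mean.x <- s / n
sd.x   <- sqrt((ss - n * mean.x^2) / (n - 1))  # one-pass sd formula

Only the running totals ever live in memory, so the file size stops
mattering; the price is that the one-pass sd formula is numerically less
stable than sd().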
>
> On Tue, Jan 6, 2009 at 6:10 PM, Edwin Sendjaja <edwin7 at web.de> wrote:
> > Hi Ben,
> >
> > Using colClasses doesn't improve the performance much.
> >
> > With the data, I will calculate the mean, min, max, and standard
> > deviance.
> >
> > I have also failed to import the data into a MySQL database. I don't
> > have much knowledge of MySQL.
> >
> > Edwin
> >
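
For reference, the usual shape of a colClasses call; the types and the
column count below are guesses for this file, and nrows/comment.char are
two more read.table arguments that sometimes help:

# one class per column, in file order; adjust these guesses to the file
classes <- c("numeric", "character", "character", "character",
             "numeric", "integer", "numeric")
data <- read.table("test.data", header = TRUE,
                   colClasses = classes,
                   nrows = 5000000,    # an upper bound on rows, if known
                   comment.char = "")  # disable comment scanning
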
> >> Edwin Sendjaja <edwin7 <at> web.de> writes:
> >> > Hi Simon,
> >> >
> >> > My RAM is only 3.2 GB (actually it should be 4 GB, but my motherboard
> >> > doesn't support it).
> >> >
> >> > R uses almost all of my RAM and half of my swap. I think memory.limit
> >> > will not solve my problem. It seems that I need more RAM.
> >> >
> >> > Unfortunately, I can't buy more RAM.
> >> >
> >> > Why is R slow at reading a big data set?
> >> >
> >> > Edwin
> >>
> >>   Start with FAQ 7.28,
> >> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-is-read_002etable_0028_0029-so-inefficient_003f
> >>
> >>   However, I think you're going to have much bigger problems
> >> if you have a 3.1G data set and a total of 3.2G of RAM: what do
> >> you expect to be able to do with this data set once you've read
> >> it in?  Have you considered storing it in a database and accessing
> >> just the bits you need at any one time?
> >>
> >>   Ben Bolker
> >>
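
A sketch of that database route, but with SQLite via the RSQLite package
instead of MySQL, since SQLite needs no server setup; the table name, the
header handling and the chunk size here are all assumptions:

library(RSQLite)
db <- dbConnect(SQLite(), dbname = "test.db")

# load the file once, in chunks, so it never has to fit into RAM
input <- file("test.data", open = "r")
cols <- scan(input, what = "", nlines = 1, quiet = TRUE)  # header names
repeat {
    chunk <- tryCatch(read.table(input, header = FALSE,
                                 col.names = cols, nrows = 100000),
                      error = function(e) NULL)           # no lines left
    if (is.null(chunk)) break
    if (dbExistsTable(db, "measurements")) {
        dbWriteTable(db, "measurements", chunk, append = TRUE)
    } else {
        dbWriteTable(db, "measurements", chunk)
    }
}
close(input)

# afterwards, query only the slice needed at any one time
res <- dbGetQuery(db,
    "SELECT Zusatz FROM measurements WHERE Zusatz NOT IN (0, 60)")
dbDisconnect(db)

Everything lives in a single test.db file, which avoids the MySQL
administration that caused trouble.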



