[R] Very slow read.table on Linux, compared to Win2000
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Jun 28 14:28:42 CEST 2006
<davidek at zla-ryba.cz> writes:
> Dear all,
>
> I read.table a 17 MB tab-separated table with 483 variables (mostly numeric) and 15000
> observations into R. This takes a few seconds with R 2.3.1 on Windows 2000, but it takes
> several minutes on my Linux machine. The Linux machine is Ubuntu 6.06, 256 MB RAM,
> Athlon 1600 processor. The Windows hardware is better (Pentium 4, 512 MB RAM), but it
> shouldn't make such a difference.
>
> The strange thing is that even doing something with the data (say, a histogram of a
> variable, or transforming integers into a factor) takes a really long time on the Linux
> box, and the computer seems to work the hard disk extensively.
> Could this be caused by swapping? Can I increase the memory allocated to R somehow?
> I have checked the manual, but the memory options available for Linux don't seem to
> help me (I may be doing it wrong, though ...)
>
> The code I run:
>
> TBO <- read.table(file="TBO.dat", sep="\t", header=TRUE, dec=",")  # this takes forever
> TBO$sexe <- factor(TBO$sexe, labels=c("man","vrouw"))  # even this takes like 30 seconds,
>                                                        # compared to nothing on Win2000
>
> I'd be grateful for any suggestions,
Almost surely, the fix is to insert more RAM. 256 MB leaves you very
little room for actual work these days, and a 17 MB file will be
expanded to several times its original size during reading and data
manipulation, so the heavy disk activity you see is almost certainly
swapping. Using a lightweight window manager can free a little memory,
but you usually regret the switch for other reasons.
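If more RAM is not an immediate option, the usual read.table tuning
helps somewhat: pre-declaring the column types and row count lets it
allocate once instead of guessing types and growing as it reads, which
also reduces memory churn. A minimal sketch, assuming all 483 columns
are numeric (the post says only "mostly numeric", so override the
exceptions in colClasses):

    ## Values (file name, 483 columns, 15000 rows) are taken from the
    ## post above; the all-numeric colClasses is an assumption.
    cc <- rep("numeric", 483)            # fix up any entries that are really character
    TBO <- read.table("TBO.dat", sep = "\t", header = TRUE, dec = ",",
                      nrows = 15000,     # row count known in advance
                      colClasses = cc,   # skip per-column type guessing
                      comment.char = "") # don't scan for comment characters

    ## Check how much memory the object itself occupies:
    as.numeric(object.size(TBO)) / 1024^2   # approximate size in megabytes
    gc()                                    # R's own memory accounting

If gc() shows the data frame alone approaching physical RAM, no
read.table option will save you, and more memory really is the only
fix.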
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907