[R] Re: large survey data

David Scott d.scott at auckland.ac.nz
Wed Jul 11 23:59:42 CEST 2001

On 11 Jul 2001, Douglas Bates wrote:

> Michał Bojanowski <bojanr at wp.pl> writes:
> > Recently I came across a problem. I have to analyze a large survey
> > dataset of about 600 columns and 10000 rows (a tab-delimited file
> > with names in the header). I was able to import the data into an
> > object, but then there was no memory left.
> >
> > Is there a way to import the data column by column? I have to analyze
> > the whole data, but only two variables at a time.
> You will probably need to do the data manipulation externally.
> Two possible solutions are to use a scripting language like Python or
> Perl or to store the data in a relational database like PostgreSQL or
> MySQL.  For data of this size I would recommend the relational
> database approach.
> R has packages to connect to PostgreSQL or to MySQL.
> If you want to use Python instead, the code is fairly easy to write.
> To extract the first two fields (for which the slice really is
> written 0:2, not 0:1 or 1:2 as one might expect), you could use
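> #!/usr/bin/env python3
> import fileinput
> # Print the first two tab-separated fields of every input line
> for line in fileinput.input():
>     # strip the newline first so the last field does not carry it
>     flds = line.rstrip("\n").split("\t")
>     print("\t".join(flds[0:2]))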
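
Bates's relational-database route can be sketched with Python's standard sqlite3 module. SQLite is swapped in here only because it needs no server; it is an assumption for illustration, not the PostgreSQL or MySQL setup he describes. The table name "survey" and the helper names are hypothetical. Every column is stored as TEXT, and only the two requested variables are ever fetched.

```python
import csv
import sqlite3

def quote_ident(name):
    """Quote a table or column name for SQL; survey headers may contain spaces."""
    return '"' + name.replace('"', '""') + '"'

def load_tsv(conn, lines, table="survey"):
    """Load tab-delimited lines (header row first) into an SQLite table.

    'lines' may be an open file or any iterable of strings. All columns are
    stored as TEXT; type conversion is left to the analysis step.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    cols = ", ".join(quote_ident(h) + " TEXT" for h in header)
    conn.execute("CREATE TABLE %s (%s)" % (quote_ident(table), cols))
    marks = ", ".join("?" * len(header))  # "?, ?, ..., ?" one per column
    # executemany accepts an iterator, so rows are inserted as they are read
    conn.executemany(
        "INSERT INTO %s VALUES (%s)" % (quote_ident(table), marks), reader)
    conn.commit()
    return header

def two_columns(conn, a, b, table="survey"):
    """Return only variables a and b, in the order the rows were loaded."""
    sql = "SELECT %s, %s FROM %s ORDER BY rowid" % (
        quote_ident(a), quote_ident(b), quote_ident(table))
    return conn.execute(sql).fetchall()
```

For the real file one would pass an open file object, opened with newline="", and use an on-disk database such as sqlite3.connect("survey.db") (a placeholder name), so that the 600 columns live on disk and each analysis pulls just two of them.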

If you are on a Unix box and you have a tab-delimited file, 'cut' will
easily cut out fields from the file; tab is its default delimiter, so
'cut -f 3,7' extracts the third and seventh columns. To automate it, use a
shell program to produce all the pairs you want. That is a 1980s solution,
but it should work just fine.
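The cut-plus-loop idea translates directly into Python. This is a rough sketch, not Scott's script: cut_fields mimics 'cut -f' on tab-delimited lines, and write_pair_files writes one small file per requested pair of column numbers. The function names and the pair_I_J.txt naming scheme are hypothetical. Unlike 'cut', which always emits fields in file order, cut_fields returns them in the order requested.

```python
from itertools import combinations

def cut_fields(lines, fields):
    """Yield the selected 1-based fields of each tab-delimited line.

    Roughly what 'cut -f' does. Assumes every line has at least
    max(fields) columns (cut would silently print nothing instead).
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        yield [parts[k - 1] for k in fields]

def column_pairs(columns):
    """Every unordered pair drawn from the given column numbers."""
    return list(combinations(columns, 2))

def write_pair_files(path, pairs):
    """Write each requested pair of columns to its own file, pair_I_J.txt.

    The input is re-read once per pair, so only one line of it is in
    memory at a time. The header line is copied too, so each output
    file keeps the two variable names.
    """
    for i, j in pairs:
        with open(path) as src, open("pair_%d_%d.txt" % (i, j), "w") as dst:
            for parts in cut_fields(src, (i, j)):
                dst.write("\t".join(parts) + "\n")
```

With 600 columns, column_pairs(range(1, 601)) gives 179,700 pairs, each of which re-reads the whole file, so in practice one would pass only the pairs actually needed, for example [(3, 7), (3, 12)].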

David Scott

David Scott     Department of Statistics
                Tamaki Campus
                The University of Auckland, PB 92019
                Auckland        NEW ZEALAND
Phone: +64 9 373 7599 ext 6830     Fax: +64 9 373 7000
Email:  d.scott at Auckland.ac.nz

President, New Zealand Statistical Association

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch