[R] Request for aid in first R script
S Ellison
S@Elli@on @ending from LGCGroup@com
Mon Nov 19 12:46:00 CET 2018
Pointers inline below:
> > Since I'm a newbie on R, I was wondering if you could help me to achieve a
> > small project that I think is possible with R (I can't seem to
> > find a similar tool).
> >
> > I have a data file with about 2000 lines of values, organized like this:
> >
> > x;y;z;j;
> > ...
> >
> > I want to find different correlations (linear regression with
> > Levenberg–Marquardt or least squares) between the x values and a y or z
> > pair. For instance, between x and y.
> >
> > So, what I'm trying to do is:
> >
> > 1) Load the file (is there a limit on the load size? If yes, can I load it
> > in sequence by parts?)
See ?read.table and note that you can define a separator; read.table() with sep=";" should work.
The load limit is available memory; I have read 800,000 lines on a 4 GB system.
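A minimal sketch of the read, assuming the file has no header line and that every line ends with a trailing ";" as in the example above. That trailing separator makes read.table() see a fifth, empty (all-NA) column, which is dropped here. The file is a simulated stand-in written to a temporary location; if the real file does start with a header line, add header = TRUE and the same column selection still works.

```r
# Hypothetical stand-in for the real data file: no header line,
# four numeric fields per line, each line ending in a trailing ";".
datafile <- tempfile(fileext = ".txt")
writeLines(c("1.0;2.1;3.2;4.3;",
             "1.5;2.6;3.7;4.8;",
             "2.0;3.1;4.2;5.3;"), datafile)

# sep = ";" splits on semicolons; the trailing ";" yields a 5th, empty
# column (all NA), so keep only the first four and name them.
dat <- read.table(datafile, sep = ";", header = FALSE)[, 1:4]
names(dat) <- c("x", "y", "z", "j")

str(dat)  # a data frame with 3 rows and 4 numeric columns
```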
> > 2) Define 100 sets of 20 values each (also in sequence, from x1 to xn: first
> > from x1 to x20, next from x21 to x40, etc.) or process one set at a time
> > in case of file limits in 1)
You can say something like
mydata[i:(i+19), ]
to get a 20-row slice of your data, but an R user would perhaps consider setting up an ancillary variable using
mydata$chunks <- gl(100,20)
and using a variant of aggregate() or ddply() (from the plyr package) to apply a function to each subset.
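A sketch of the chunking idea using only base R, with simulated data standing in for the real file (the contents of mydata are an assumption). gl(100, 20) has length 2000, so it only matches a data frame with exactly 2000 rows; the integer-division variant below works for any row count. split() then returns one data frame per chunk for lapply() or sapply(), and aggregate() gives a per-chunk summary directly.

```r
set.seed(1)
# Simulated stand-in for the real data (assumption): 2000 rows, columns x and y.
mydata <- data.frame(x = seq_len(2000))
mydata$y <- 3 + 0.5 * mydata$x + rnorm(2000, sd = 0.5)

# gl(100, 20): a factor with levels 1..100, each repeated 20 times,
# so rows 1-20 get level 1, rows 21-40 level 2, and so on.
mydata$chunks <- gl(100, 20)

# Equivalent labels for any number of rows (the last chunk may be short).
mydata$chunks2 <- factor((seq_len(nrow(mydata)) - 1) %/% 20 + 1)

# One data frame per chunk, in row order.
pieces <- split(mydata, mydata$chunks)

# Per-chunk summary in one call, here the mean of y in each chunk.
chunk_means <- aggregate(y ~ chunks, data = mydata, FUN = mean)

length(pieces)                      # 100
range(which(mydata$chunks == "2"))  # 21 40
```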
> > 3) Define a fitting function
er... anything you can write, either as an expression or a function.
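For example, a two-parameter exponential model can be written either directly as an expression in an nls() formula or as an R function that the formula calls; both give the same fit. The model, the parameter names a and b, the simulated data and the start values are all illustrative assumptions.

```r
set.seed(42)
# Simulated data from y = 2 * exp(0.3 * x) plus a little noise (illustrative only).
d <- data.frame(x = seq(0, 5, length.out = 20))
d$y <- 2 * exp(0.3 * d$x) + rnorm(20, sd = 0.05)

# Form 1: the model written as an expression inside the formula.
fit1 <- nls(y ~ a * exp(b * x), data = d, start = list(a = 1.5, b = 0.2))

# Form 2: the same model as a function (hypothetical name fitfun),
# called from the formula; a and b are still the parameters nls() estimates.
fitfun <- function(x, a, b) a * exp(b * x)
fit2 <- nls(y ~ fitfun(x, a, b), data = d, start = list(a = 1.5, b = 0.2))

coef(fit1)                         # a near 2, b near 0.3
all.equal(coef(fit1), coef(fit2))  # TRUE
```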
> > 4) Use the same function model to find the best fit for each set
Look at, for example, lm for linear models (including polynomials), nls or nlm for non-linear models, and a decent book on R for a much, much, much wider range, including splines, generalised additive models, generalised linear models, mixed effects models (linear and otherwise) ...
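A minimal illustration on one simulated 20-point chunk (the data are made up): lm() handles straight lines and polynomials, since a polynomial is still linear in its coefficients. Base nls() uses a Gauss-Newton algorithm by default; if Levenberg–Marquardt specifically is wanted for a non-linear model, the contributed minpack.lm package provides nlsLM() with the same formula interface.

```r
set.seed(7)
# One simulated chunk of 20 points with a known straight-line relationship.
chunk <- data.frame(x = 1:20)
chunk$y <- 1.5 + 0.8 * chunk$x + rnorm(20, sd = 0.2)

# Straight line: y = b0 + b1 * x, fitted by ordinary least squares.
fit_lin <- lm(y ~ x, data = chunk)

# Quadratic: still a linear model, because it is linear in b0, b1, b2.
# I() stops x^2 from being read as formula syntax.
fit_quad <- lm(y ~ x + I(x^2), data = chunk)

coef(fit_lin)   # intercept near 1.5, slope near 0.8
coef(fit_quad)  # the x^2 coefficient should be close to 0 here
```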
> > 5) Save in a file, the coefficients of those fits.
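Something like sapply() or ddply() should be able to give you a table of coefficients, especially if you write a wrapper function like
mywrap <- function(chunk) coef(nls(y ~ fitfun(x, a, b), data = chunk, start = list(a = 1, b = 1)))
to return a vector of coefficients from one chunk of data (fitfun, its parameters a and b, and the start values are placeholders for your own model).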
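Putting the pieces together, a sketch that fits the same model to every chunk and saves the coefficient table (step 5). Everything here is illustrative: simulated data stand in for the file, a straight-line lm() fit stands in for whatever model is chosen (an nls() call can be swapped into the wrapper the same way), and the output file name is hypothetical.

```r
set.seed(123)
# Simulated stand-in for the real file: 2000 rows, 100 chunks of 20 rows.
mydata <- data.frame(x = seq_len(2000))
mydata$y <- 3 + 0.5 * mydata$x + rnorm(2000, sd = 0.5)
mydata$chunks <- gl(100, 20)

# Wrapper: fit the model to one chunk and return its coefficients.
# The chunk arrives as `chunk` and is used as the data argument.
mywrap <- function(chunk) coef(lm(y ~ x, data = chunk))

# sapply() returns a 2 x 100 matrix (one column per chunk); t() turns it
# into one row per chunk, with columns "(Intercept)" and "x".
coefs <- t(sapply(split(mydata, mydata$chunks), mywrap))

# Save to a CSV file (hypothetical name, in a temporary directory).
outfile <- file.path(tempdir(), "chunk_coefficients.csv")
write.csv(coefs, outfile)

dim(coefs)  # 100 2
```

With a non-linear model, some chunks may fail to converge; wrapping the nls() call in tryCatch() and returning NA values for those chunks keeps the rest of the table.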
> > Can this be done accurately with R?
Yes; R has well-characterised, numerically stable core functions, which is more than can be said for most spreadsheets.
> > It would save me a lot of programming.
You'll still have to do some programming, but writing it in R will be a lot quicker than writing it in C.
> > The files will soon have about 1
> > million lines, which is a lot to process.
If you can't load it all at once, you can use read.table() with its skip and nrows arguments to read it in blocks.
Or you can push the whole lot into a database and use any of R's database packages to read from that: RMySQL and the like.
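A sketch of block-wise reading under the same format assumptions as before (no header, semicolon-separated, trailing ";"), with a simulated file. Rather than re-reading from the top with an ever larger skip, it opens a connection once and calls read.table() on it repeatedly with nrows, so each call continues where the previous one stopped. The block size and the per-block work (here just counting rows) are placeholders; note that the tryCatch() treats any read error as end of file, so a malformed line would also stop the loop silently.

```r
# Hypothetical large file, simulated here: 1050 lines of "x;y;z;j;".
datafile <- tempfile(fileext = ".txt")
n <- 1050
writeLines(sprintf("%d;%f;%f;%f;", seq_len(n), runif(n), runif(n), runif(n)),
           datafile)

con <- file(datafile, open = "r")   # open once so the read position is kept
block_size <- 200                   # placeholder; choose to suit memory
total <- 0
repeat {
  # read.table() signals an error once no lines remain; treat that as the end.
  block <- tryCatch(read.table(con, sep = ";", nrows = block_size),
                    error = function(e) NULL)
  if (is.null(block) || nrow(block) == 0) break
  block <- block[, 1:4]             # drop the empty column from the trailing ";"
  names(block) <- c("x", "y", "z", "j")
  # ... process this block here (e.g. fit the chunks it contains) ...
  total <- total + nrow(block)
}
close(con)

total  # 1050
```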