[R] qr with missing dependent variables

Richard Mott Richard.Mott at well.ox.ac.uk
Thu Dec 8 17:52:29 CET 2005

Dear R-help

We have a regression problem which could be solved elegantly if we could 
figure out how to get the R residuals() function to accept missing 
dependent variables.

We have ~20000 gene-expression vectors y, each being measured on the 
same set of individuals, but each having a small random number of 
missing values.

For each expression vector we wish to search across the genome looking 
for quantitative trait loci, i.e. chromosomal regions g where the local 
genetic structure, represented by the design matrix X(g), gives a 
significant linear regression relationship. Depending on the complexity 
of the genetic model being investigated, X(g) typically has either 7 or 
32 columns, i.e. it is of non-trivial size. The number of loci g to be 
investigated is ~13000, so we have to do 13000*20000 = 260,000,000 
multiple regressions. Therefore computational efficiency is important.

We thought of one way to do this: for each design matrix X(g), compute 
the qr decomposition once, then work out the residual sum of squares for 
each of the expression phenotypes using residuals() on the qr object 
applied to the expression vector. That way we would only need to do the 
hard part of the linear regression once per locus.
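
For concreteness, here is a minimal sketch of that idea with complete 
data, on simulated inputs. The base R function that takes residuals 
from a qr object is qr.resid(), and it accepts a matrix of right-hand 
sides, so all phenotypes can be handled in one call. The dimensions 
and the names X, Y and rss are made up for illustration; they are not 
our real data.

```r
set.seed(42)
n <- 30   # individuals (hypothetical)
p <- 7    # columns of the design matrix X(g)
k <- 5    # expression phenotypes (hypothetical; ~20000 in practice)

# Simulated stand-ins for one design matrix and the phenotype matrix
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
Y <- matrix(rnorm(n * k), n, k)

# The expensive step: done once per locus
qrX <- qr(X)

# Residuals for all phenotypes at once, then one RSS per column
rss <- colSums(qr.resid(qrX, Y)^2)

# Cross-check against fitting each regression separately with lm()
rss_ref <- apply(Y, 2, function(y) sum(residuals(lm(y ~ X - 1))^2))
```

This only works as written when Y has no missing values, which is 
exactly the difficulty described next.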

The problem with this approach is the missing values, which are not 
allowed by residuals(). Unfortunately we can't just eliminate all rows 
containing a missing value because we would throw away too much data.

Is there a way round this? Can we set the missing values to 0 and then 
sort out the discrepancies in the residual SS? More generally, is it 
consistent to compute a qr decomposition including rows for which there 
are no dependent observations?
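
One possible way to make the "set to 0 and correct" idea precise, 
offered as a sketch rather than a tested solution: deleting the rows 
with missing y gives the same residual SS as zero-filling y and adding 
one indicator column per missing row. Writing r for the residuals of 
the zero-filled y, Q for the orthonormal basis from the qr object and 
m for the missing rows, that works out to 
RSS = sum(r^2) - r[m]' A^{-1} r[m], with A = I - Q[m,] Q[m,]'. 
The helper name rss_missing and the simulated data below are 
hypothetical, and the sketch assumes X(g) remains of full column rank 
after the missing rows are removed.

```r
set.seed(1)
n <- 50
p <- 7
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # simulated X(g)
y <- rnorm(n)
y[c(3, 17, 40)] <- NA                                # a few missing values

qrX <- qr(X)       # once per locus
Q   <- qr.Q(qrX)   # n x p orthonormal basis, also once per locus

# Hypothetical helper: RSS of y on X using only the rows where y is observed,
# computed from the qr decomposition of the full X.
rss_missing <- function(y, qrX, Q) {
  m  <- which(is.na(y))
  y0 <- y
  y0[m] <- 0                       # zero-fill the missing entries
  r   <- qr.resid(qrX, y0)         # residuals from the full-data projection
  rss <- sum(r^2)
  if (length(m) > 0) {
    Qm  <- Q[m, , drop = FALSE]
    A   <- diag(length(m)) - tcrossprod(Qm)   # (I - H) restricted to rows m
    rss <- rss - sum(r[m] * solve(A, r[m]))   # remove the zero-fill contribution
  }
  rss
}

rss_fast <- rss_missing(y, qrX, Q)

# Reference: lm() drops the NA rows by default
rss_ref <- sum(residuals(lm(y ~ X - 1))^2)
```

The correction only needs a small solve whose size is the number of 
missing values, so it stays cheap when each vector has few of them. 
The residual degrees of freedom would be n - length(m) - p, not n - p.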

As far as I can see, this problem has not been addressed in R-help, but 
my apologies if it has!


Richard Mott

Richard Mott       | Wellcome Trust Centre
tel 01865 287588   | for Human Genetics
fax 01865 287697   | Roosevelt Drive, Oxford OX3 7BN
