[R] ape comparative analysis query

Thu May 11 15:06:43 CEST 2006

Thank you very much, that makes things a lot clearer.

The issue with known roots and ancestors in my case is that these taxa
are actually molecules evolved from one another in the lab (with control
over the evolutionary process, though not the traits that emerge), so
the data on the root and intermediate taxa are as good as any of the
tips. 

To follow up your points:

On Thu, 2006-05-11 at 12:02 +1000, Simon Blomberg wrote:
> > 1) Was I misleading myself that my independent contrasts were valid in
> > the first place?
> >   
> I think partly. Independent contrasts were designed for data for tip 
> taxa only, not ancestral state data. One of the steps in the algorithm 
> requires a correction to account for the fact that ordinarily the values 
> for traits at nodes are estimated and not known, basically lengthening 
> the branch length to that node by a certain amount (See Felsenstein's 
> original paper). If you have known ancestral states, then you should not 
> make this correction.

I don't think the the branch lengthening for unknown ancestors is a
problem here- if, for instance one's looking at a contrast between node
B2 and C1 in the tree (in my original email, also below), the branch to
C1 would be extended because it's an internal node, but the amounts it
should extend by (looking at Felsenstein '85) is ij/(i+j) where i and j
are the branch lengths coming from the node, which will be zero when, as
in this case, one of the branches from the node has length zero. 

That's not to say that there's not some other issue- I was under the
impression (immediately I can only find it as an un-referenced assertion
in Halsey et al '06 Am Nat167:276-287) that when run very simply with a
Brownian model (gamma parameter of 1), gls should give the same result
as independent contrasts, which it doesn't seem to here.

> > 2) What is it, if anything, about the root taxon that causes this issue,
> > given that other taxa also have zero branch lengths?
> >   
> When converting a tree to a correlation matrix for use in GLS or GEE, 
> the Brownian motion assumption implies that the covariances among taxa 
> are represented by their shared branch length from the root, and the 
> height of each terminal taxon is equal to the variance of the trait for 
> that taxon. If the tree is ultrametric, then the variances are equal. In 
> your case, the root has zero covariance with each taxon, and zero 
> variance. Now, the error in the gls call is caused by a division by zero 
> in the calculation of the correlation matrix, because the root has zero 
> variance. try:
> 
> vcv.phylo(tree)
> vcv.phylo(tree, cor=TRUE)
> 

So, if I read that correctly, the model assumes that the expected
variance of the traits, not just their covariance, is directly
proportional to their distance from the root, so at the root there is no
variance so an undefined correlation. This seems a little odd. I guess I
could stop the variance in the root being zero by somehow adding a small
variance due to measurement error (which I could probably come up with a
number for)? 

It also seems odd that I can't get round this by putting in a root edge:
vcv.phylo(tree) 
gives exactly the same as:
tree$root.edge<-10
vcv.phylo(tree)

However, perhaps it would be better just to use a different model with a
more appropriate expected variance. Looking at corMartins(), which
depends on distance between taxa, not shared common history, it seems
the expected variance (i.e. where the distance between the taxa, tij is
zero) should equal the gamma parameter rather than the distance from the
root. That seems more sensible here, but I'm not convinced I want to
assume the sort of stabilising selection that was designed for (?).

> > 3) Is there any way of getting around this and including data on the
> > root taxon, or am I better off just dropping it (ultimately I want to
> > work with much larger trees (up to tens of thousands of taxa) where that
> > one piece of information will become relatively less important)
> >   
> I'm surprised that you have known values for the root data. Are you sure 
> that the root taxon is not actually an outgroup? If so, it will have 
> some branch length (variance), and the model should fit OK.
> 

In this particular case that does suggest a way around- I have several
lineages with different starting molecules which are essentially
un-related- I guess I can just make one big tree with an unknown root
and analyse that (the branch lengths from the root will necessarily be a
little arbitrary, but I can probably define something justifiable).
However, perhaps I'd still do better with corMartins()? (I guess I can
try both and look at AIC or similar?)

Many thanks for your help,

Chris

>library(ape)
> tree<-read.tree(text="((B1:3,(((D1:5,C1:0):5,B2:0):4,(B3:3,B4:4):0):0):0,A1:0):0;")
> plot(tree)

-- 
--------------------------------------------------------------
Dr Christopher Knight                      School of Chemistry
chris.knight at manchester.ac.uk     The University of Manchester
Tel: extn 64414                               Faraday Building
     +44 (0)161 3064414                              PO Box 88
Fax: +44 (0)161 3064556                       Sackville Street
                                            Manchester M60 1QD
 ` · . ,,><(((°>                                            UK