[BioC] GWASTools: quasi-/perfect linear separation

Stephanie M. Gogarten sdmorris at u.washington.edu
Wed Sep 10 20:20:53 CEST 2014


Hi Danica,

assocTestRegression will return an error code for SNPs that are 
monomorphic in either cases or controls, but it seems that you have 
found a case that we did not test for.

I consulted with Matt Conomos, who wrote this function, and he said the 
following:

Since AA has a count of 0 in cases in the example given, and an error 
was not returned, I would assume that both AB and BB are non-zero in 
cases, but it would be nice to confirm this.  Also, it would be nice to 
know which allele is the minor allele (the function returns this), since 
a recessive model is being fit.  If the A allele is the minor allele, 
then the recessive model collapses the AB and BB classes, and this could 
lead to the separability issue.  I may need to add in a check for this 
when fitting dominant or recessive models.
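
The kind of check described above can be sketched as follows. This is only an illustration in plain Python, not GWASTools code: the function names `collapse_counts` and `has_empty_cell` are hypothetical, and apart from the AA counts reported in the original message (0 in cases, 170 in controls) all genotype counts are invented. It collapses the three genotype classes into the binary term that a dominant or recessive model fits, then flags the SNP if any cell of the resulting 2x2 table (coded genotype by case status) is empty.

```python
def collapse_counts(counts, minor_allele, model):
    """Collapse AA/AB/BB counts into (n_coded_0, n_coded_1) for a binary genotype term.

    Hypothetical helper for illustration only; not part of GWASTools.
    """
    if minor_allele not in ("A", "B"):
        raise ValueError("minor_allele must be 'A' or 'B'")
    hom_minor = minor_allele * 2                      # "AA" or "BB"
    hom_major = "BB" if minor_allele == "A" else "AA"
    if model == "recessive":
        # Coded 1 only with two copies of the minor allele;
        # the heterozygote and major homozygote are collapsed into 0.
        return counts["AB"] + counts[hom_major], counts[hom_minor]
    if model == "dominant":
        # Coded 1 with at least one copy of the minor allele.
        return counts[hom_major], counts[hom_minor] + counts["AB"]
    raise ValueError("model must be 'recessive' or 'dominant'")


def has_empty_cell(case_counts, control_counts, minor_allele, model):
    """Return True if any cell of the collapsed 2x2 table is zero."""
    cells = (collapse_counts(case_counts, minor_allele, model)
             + collapse_counts(control_counts, minor_allele, model))
    return any(n == 0 for n in cells)


# Only the AA counts come from the report; AB and BB counts are made up.
cases = {"AA": 0, "AB": 300, "BB": 700}
controls = {"AA": 170, "AB": 400, "BB": 430}

print(has_empty_cell(cases, controls, "A", "recessive"))  # True
print(has_empty_cell(cases, controls, "A", "dominant"))   # False
```

With these counts and A as the minor allele, the recessive coding leaves no cases in the coded-1 class, so the SNP is flagged, while the dominant coding of the same SNP is not. An empty cell of this kind is what produces the very large beta estimates.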

Could you please provide the full output of assocTestRegression for the 
SNPs where you see this problem?  Also, include the output of 
sessionInfo() so we know which version of GWASTools you are using.

Stephanie

On 9/4/14, 3:56 AM, Danica [guest] wrote:
> This is not really a question but more of a warning to other users.
>
> I have performed a regression analysis using the assocTestRegression function under three different models (dominant, recessive, additive). My data set contains ~3 million markers, filtered so that only SNPs with MAF >= 10% are included. Please note that this filter was applied to cases and controls together as one data set (i.e. I did not filter cases and controls separately).
>
> When I examined the results of the association test under the recessive model, I noticed very large beta estimates (8-9). When I looked at the genotype counts, I realised that this was because, for some SNPs, there is perfect linear separation. In other words, the AA genotype has a count of 0 in cases and a count of 170 in controls, which leads to inflated estimates.
>
> I was surprised to find that the function neither throws a warning for this nor drops the analysis for SNPs where it occurs.
>
> Regards,
> Danica
>
>
>
>   -- output of sessionInfo():
>
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>


