[BioC] EdgeR GLM
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Dec 10 08:10:14 CET 2011
Dear Shona,
Yes, age.L is the log2-fold-change for increasing age.
However your topTags table is testing for the linear and quadratic terms
together, so it is testing the significance of any differences between the
three age groups. I'd suggest you test whether the quadratic term is
required using
glmLRT(d, glmfit, coef=3)
I'd also generally recommend you go on to estimate the trended and tagwise
dispersions, not just the common dispersion.
Best wishes
Gordon
> Date: Thu, 8 Dec 2011 17:41:07 +0000
> From: "Wood, Shona" <Shona.Wood at liverpool.ac.uk>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] EdgeR GLM
>
> Hi,
>
> I am trying to look at gene expression changes with increasing age. I
> have three time points young, middle age and old. I have been using the
> following, ordered factors and coefficient in order to get a fold change
> and FDR for a linear change in gene expression with increasing age. Can
> you look over my code and see if I have done this right? Is age.L the
> fold change in gene expression with increasing age?
>
>> library(edgeR)
>> library(limma)
>> targets<-read.delim (file="coding_targets.txt")
>> targets$age<-factor(ordered (targets$age))
>> targets$sample<-factor(targets$sample)
>> targets
> X files age sample
> 1 p1 p1_coding_counts.txt a-young 1
> 2 p2 p2_coding_counts.txt a-young 2
> 3 p3 p3_coding_counts.txt a-young 3
> 4 p7 p7_coding_counts.txt b-middle 7
> 5 p8 p8_coding_counts.txt b-middle 8
> 6 p9 p9_coding_counts.txt b-middle 9
> 7 p4 p4_coding_counts.txt c-old 4
> 8 p5 p5_coding_counts.txt c-old 5
> 9 p6 p6_coding_counts.txt c-old 6
>
>> d<-readDGE(targets, skip=1, comment.char='#')
>> colnames(d)<-row.names(targets)
>> d<- calcNormFactors(d)
>> d
> An object of class "DGEList"
> $samples
> X files age sample lib.size norm.factors
> 1 p1 p1_coding_counts.txt a-young 1 3445622 0.9655724
> 2 p2 p2_coding_counts.txt a-young 2 2696547 0.9902573
> 3 p3 p3_coding_counts.txt a-young 3 3308099 0.9787044
> 4 p7 p7_coding_counts.txt b-middle 7 2503479 1.0584660
> 5 p8 p8_coding_counts.txt b-middle 8 2639127 1.0477893
> 6 p9 p9_coding_counts.txt b-middle 9 2696547 0.9902573
> 7 p4 p4_coding_counts.txt c-old 4 3037440 0.9778553
> 8 p5 p5_coding_counts.txt c-old 5 2647915 1.0144109
> 9 p6 p6_coding_counts.txt c-old 6 2475370 0.9809077
>
> $counts
> 1 2 3 4 5 6 7 8 9
> ENSRNOG00000000007 1287 1285 1041 788 968 1285 1092 1009 960
> ENSRNOG00000000008 0 0 0 0 1 0 0 1 0
> ENSRNOG00000000009 0 0 0 3 0 0 1 0 0
> ENSRNOG00000000010 38 405 50 18 105 405 372 42 282
> ENSRNOG00000000012 0 0 0 0 0 0 0 0 0
> 22932 more rows ...
>
>> d<-d[rowSums(d$counts)>9,]
>> design<- model.matrix(~ age, data = targets)
>> design
> (Intercept) age.L age.Q
> 1 1 -7.071068e-01 0.4082483
> 2 1 -7.071068e-01 0.4082483
> 3 1 -7.071068e-01 0.4082483
> 4 1 -7.850462e-17 -0.8164966
> 5 1 -7.850462e-17 -0.8164966
> 6 1 -7.850462e-17 -0.8164966
> 7 1 7.071068e-01 0.4082483
> 8 1 7.071068e-01 0.4082483
> 9 1 7.071068e-01 0.4082483
> attr(,"assign")
> [1] 0 1 1
> attr(,"contrasts")
> attr(,"contrasts")$age
> [1] "contr.poly"
>
>> d<-estimateGLMCommonDisp(d, design)
>> names(d)
> [1] "samples" "counts" "common.dispersion"
>> glmfit<- glmFit(d, design, dispersion=d$common.dispersion)
>> results<- glmLRT(d, glmfit, coef=c(2,3))
>> topTags(results)
> Coefficient: age.L age.Q
> logConc age.L age.Q LR P.Value
> ENSRNOG00000011672 -11.27122 -3.2445886 -1.9642177 109.53641 1.638591e-24
>
> Many Thanks
>
> Shona
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list