[R] Problem with t-test

Tue Aug 6 21:45:46 CEST 2013

Hi Vivek,
I removed the rows with missing values and also the duplicated rows. Now it looks like it is working.

x <- read.table("RP_matrix_FPKM_PGTvsPDGT.txt", header=TRUE, sep="\t")
x1 <- read.table("RP_plaise_FPKM_PGTvsPDGT.txt", header=TRUE, sep="\t")
str(x1)
#'data.frame':    19680 obs. of  6 variables:
# $ ID    : Factor w/ 19678 levels "XLOC_000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num  112.47 13.76 62.13 4.16 0 ...
# $ PGT.0 : num  118.83 14.88 94.29 3.49 0 ...
# $ PGT.2 : num  179.324 22.677 117.368 6.36 0.385 ...
# $ PDGT.0: num  301.154 39.165 242.685 9.119 0.126 ...
# $ PDGT.1: num  144.5 30 161.2 3.5 0 ...
str(x)
#'data.frame':    28599 obs. of  6 variables:
# $ gene  : Factor w/ 28599 levels "XLOC_000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num  71.25 8.71 14.6 1.99 0 ...
# $ PGT.0 : num  68.36 8.16 9.75 2.4 0 ...
# $ PGT.2 : num  108.17 13.35 18.29 3.64 0 ...
# $ PDGT.0: num  195.01 24.76 40.59 5.61 0 ...
# $ PDGT.1: num  93.06 18.88 26.83 2.14 0 ...
length(unique(x[,1]))
#[1] 28599
length(unique(x1[,1]))   # 19679 unique IDs for 19680 rows: one row repeats an ID
#[1] 19679

# drop rows with a repeated ID; !duplicated() is safer than -which(duplicated()),
# which would select zero rows if there were no duplicates at all
x2 <- x1[!duplicated(x1[,1]),]
dim(x2)
#[1] 19679     6

# drop rows containing missing values
x3 <- na.omit(x2)
dim(x3)
#[1] 19678     6

# class labels: the first 3 sample columns are PGT (0), the last 2 are PDGT (1)
cl <- c(rep(0,3), rep(1,2))

# all 5 samples come from a single origin
origin <- rep(1,5)

library(RankProd)
RP.out <- RPadvance(x3[,-1], cl, origin, gene.names=as.character(x3[,1]), num.perm=200)
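A small base-R sketch of why !duplicated() is the safer way to drop repeated IDs. The toy data frame below is made up (its column names only mirror x1) and nothing here touches RankProd: when no ID is repeated, which(duplicated(...)) returns integer(0), and indexing with -integer(0) keeps no rows at all rather than all of them.

```r
# Toy data (made up): one repeated ID and one row with missing values.
x1 <- data.frame(
  ID    = c("XLOC_000001", "XLOC_000002", "XLOC_000002", NA),
  PGT.1 = c(112.47, 13.76, 13.76, NA),
  stringsAsFactors = FALSE
)

# Pitfall: with no duplicates, which() gives integer(0),
# and x[-integer(0), ] selects zero rows instead of all rows.
no_dups <- x1[1:2, ]
bad  <- no_dups[-which(duplicated(no_dups$ID)), ]
good <- no_dups[!duplicated(no_dups$ID), ]
print(nrow(bad))    # 0
print(nrow(good))   # 2

# Same cleaning order as in the message: drop repeated IDs, then rows with NA.
x2 <- x1[!duplicated(x1$ID), ]
x3 <- na.omit(x2)
print(nrow(x3))     # 2
```

With one duplicate, as in the plaise file, both forms give the same result; the difference only shows up on a file that has no duplicates.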

A.K.

________________________________
From: Vivek Das <vd4mmind at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Tuesday, August 6, 2013 9:38 AM
Subject: Re: Problem with t-test

No, I have tried it again on other files and the error does not appear there; it works fine. This is a new file I have created. I am sending you the script and the files I am using. It is a simple script that I have used many times with other files. I am sending you two different input files: with the plaise file it does not work, but with the other input file it works.

library(RankProd)

x <- read.table("RP_matrix_RF_PGTvsPDGT.txt", header=TRUE, sep="\t")

cl <- c(rep(0,3), rep(1,2))

origin <- rep(1,5)

RP.out <- RPadvance(x[,-1], cl, origin, gene.names=x[,1], num.perm=200)

topGene(RP.out, cutoff=0.1)
#plotRP(RP.out, cutoff=0.1)

# named 'top' rather than 'table' to avoid confusion with base::table()
top <- topGene(RP.out, cutoff=0.1, method="pfp")

t1 <- top$Table1
t2 <- top$Table2

# column 4 of each table holds the pfp (estimated percentage of false predictions)
ind1 <- which(t1[,4] < 0.1)

ind2 <- which(t2[,4] < 0.1)

up <- t1[ind1,]

down <- t2[ind2,]

degs <- rbind(up, down)

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek.das at ieo.eu
            vchris_05 at yahoo.co.in
            vd4mmind at gmail.com

On Tue, Aug 6, 2013 at 3:17 PM, arun <smartpink111 at yahoo.com> wrote:

Hi Vivek,
>I have never used RankProd before, so I can't guarantee that I can sort out the problem. But you can send me the file and the script, and I will try it later.
>You mentioned that RankProd worked before: was that on the same file or on a different one? If it was a different file, try running it on that file again and see whether the error repeats.
>
>________________________________
>From: Vivek Das <vd4mmind at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, August 6, 2013 9:09 AM
>
>Subject: Re: Problem with t-test
>
>Yes, I know this, but I am worried about the consistency of the data, since it will remove a lot of observations and the results will not be good; in fact, I tested it and I am not getting the p-values I expected. Anyway, I am now trying another approach, the RankProd package in R, and I am encountering a problem. I have used this package many times but have never faced this. Do you have any idea when we get the error below?
>
>Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
>  duplicate 'row.names' are not allowed
>In addition: Warning message:
>non-unique values when setting 'row.names': ‘’
>(This happens in RankProd.)
>
>
>I do not understand the complaint about duplicate 'row.names': the rows are gene locations with their expression values, and some locations are duplicated 2-3 times or more. I have used such data frames before to compute RankProd and they worked, but now I am getting this error. I can share the script and the file with you if you need them; the RankProd pipeline is very easy to run.
>
>If you could give me some idea about the error, that would be great.
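A base-R sketch of what that message means, assuming (as the error text suggests) that RankProd ends up using gene.names as data-frame row names: any repeated value, including an empty ID "", triggers it. The IDs and values below are hypothetical, and the checks at the end are just one way to find offending IDs before calling RankProd.

```r
# Hypothetical IDs: two rows carry an empty ID "".
ids  <- c("XLOC_000001", "XLOC_000002", "", "")
fpkm <- data.frame(PGT.1 = c(112.47, 13.76, 62.13, 4.16))

# Setting non-unique row names reproduces the error (its warning is muted here).
msg <- tryCatch({
  suppressWarnings(row.names(fpkm) <- ids)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)       # "duplicate 'row.names' are not allowed"

# Pre-flight checks on the ID column.
dup_ids <- unique(ids[duplicated(ids)])
blank   <- which(is.na(ids) | ids == "")
print(dup_ids)   # ""
print(blank)     # 3 4
```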
>
>On Tue, Aug 6, 2013 at 3:01 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi Vivek,
>>No problem.
>>?t.test
>>na.action: a function which indicates what should happen when the data
>>          contain ‘NA’s.  Defaults to ‘getOption("na.action")’.
>>
>>On my system,
>>
>>getOption("na.action")
>>#[1] "na.omit"
>>
>>
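>>So the NAs are removed by default, which reduces the number of observations. (Strictly, na.action applies to the formula interface, t.test(formula, data); the default t.test(x, y) method drops NAs by itself.)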
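A quick base-R check of this point, using row XLOC_000001 of the PGT columns (2, 7, 24) and PDGT columns (6, 8) from this thread: appending an NA to one group leaves the p-value unchanged, because the NA is dropped before the test is computed.

```r
# Row XLOC_000001: PGT = columns 2, 7, 24; PDGT = columns 6, 8.
pgt  <- c(112.474, 118.827, 179.324)
pdgt <- c(301.154, 144.470)

p_clean <- t.test(pdgt, pgt)$p.value
p_na    <- t.test(c(pdgt, NA), pgt)$p.value   # the NA is dropped before testing

print(identical(p_clean, p_na))   # TRUE
```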
>>
>>________________________________
>>From: Vivek Das <vd4mmind at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Tuesday, August 6, 2013 8:52 AM
>>Subject: Re: Problem with t-test
>>
>>Yes, actually I just tested a few conditions and found that there are NaN values, which is why this problem is happening. I cannot proceed with this test and will have to change the pipeline to use some other R package for my analysis. Thanks for your input.
>>
>>On Tue, Aug 6, 2013 at 2:42 PM, arun <smartpink111 at yahoo.com> wrote:
>>
Hi Vivek,
>>>It looks like the numbers of observations in the two groups are 2 (PDGT) and 3 (PGT). Possibly some of the entries are NA, which leaves too few observations and produces the error. It's just a guess, as this is not a reproducible example.
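The guess can be checked with a base-R sketch. The rows below are illustrative: the first is XLOC_000001, and the second is a hypothetical row in which one of the two PDGT values is NaN. That row reproduces the error, and counting non-missing values per row flags every row that would fail before the loop is run.

```r
# Illustrative rows; the NaN in row 2 of PDGT is hypothetical.
PGT  <- matrix(c(112.47, 118.83, 179.32,
                 13.761, 14.883, 22.677), nrow = 2, byrow = TRUE)
PDGT <- matrix(c(301.15, 144.47,
                 39.165, NaN), nrow = 2, byrow = TRUE)

# t.test() drops the NaN, leaving a single 'x' value, and stops.
msg <- tryCatch(t.test(PDGT[2, ], PGT[2, ])$p.value,
                error = function(e) conditionMessage(e))
print(msg)   # "not enough 'x' observations"

# Flag every row with fewer than 2 non-missing values in either group.
too_few <- rowSums(!is.na(PDGT)) < 2 | rowSums(!is.na(PGT)) < 2
print(which(too_few))   # 2
```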
>>>
>>>________________________________
>>>From: Vivek Das <vd4mmind at gmail.com>
>>>To: arun <smartpink111 at yahoo.com>
>>>Sent: Tuesday, August 6, 2013 4:29 AM
>>>Subject: Problem with t-test
>>>
>>>data<- read.table("/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/filteredFPKM_matrix.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>> head(data)
>>>           ID Sample_118p Sample_118rp3 Sample_118rz Sample_118z Sample_132p1 Sample_132p2 Sample_132p3 Sample_132rp1 Sample_132rp3 Sample_132rp4 Sample_132rz1
>>>1 XLOC_000001   112.47400     166.17900     81.52270   44.778700   301.154000    118.82700    144.47000    170.407000    406.899000    189.131000     97.183400
>>>2 XLOC_000002    13.76090      17.76730     11.91100    6.290600    39.164800     14.88320     30.02390     42.717200     88.814600     23.310500     15.440800
>>>3 XLOC_000003    62.13010     102.16200    748.31300  273.520000   242.685000     94.28880    161.22800    225.243000    497.011000    160.376000    896.121000
>>>4 XLOC_000004     4.16261       5.71899      4.55739    2.486340     9.119170      3.49082      3.49611      4.975020     12.598600      6.387530      4.949830
>>>5 XLOC_000010     0.00000       0.00000      0.29217    0.270976     0.126338      0.00000      0.00000      0.464747      0.596984      0.199851      0.892021
>>>6 XLOC_000011     3.59279       9.09855      2.57678    1.593230    16.936300      4.47379      6.87020      6.922430     21.762200      7.461560      4.420570
>>>  Sample_132rz2 Sample_132z Sample_141p1 Sample_141p2 Sample_141p3 Sample_141p4 Sample_141z Sample_183p1 Sample_183p2 Sample_183p3 Sample_183z Sample_91p
>>>1     72.739000   386.81000     86.96600    85.703100     53.01000    158.31400   145.84300   219.667000   240.231000    127.42000    78.58140 179.324000
>>>2      7.475080    40.35110     12.61660    12.737300     10.96970     28.26550    22.65940    27.217700    27.832800     18.21300     7.88030  22.676900
>>>3    465.496000  2330.57000     72.35270    73.962600     71.36860    203.20100  1048.81000   172.241000   183.260000     98.11680   473.46400 117.368000
>>>4      4.818980    18.22750      3.22435     2.074460      1.97518      4.05074     8.86568     5.118540     6.414700      4.65076     4.37495   6.360260
>>>5      0.863341     2.91729      0.00000     0.226087      0.00000      0.00000     2.16320     0.356073     0.655415      0.00000     1.15980   0.385098
>>>6      3.341780    15.43730      5.21231     3.854980      2.53136      6.18972     4.83315     6.908790    12.524200      5.96035     3.40959   8.604070
>>>  Sample_91rp1 Sample_91rp3 Sample_91rp4 Sample_91rz
>>>1   297.395000   203.550000    251.53800  110.898000
>>>2    28.945600    18.749300     22.76070   15.679000
>>>3   174.073000   119.605000    122.66100  754.735000
>>>4     9.227550     6.656250      8.82010    7.172210
>>>5     0.718336     0.187613      0.34955    0.498937
>>>6    15.908700     8.162870      9.35126    6.013790
>>>> PGT<-cbind(data[,2],data[,7],data[,24])
>>>> head(PGT)
>>>          [,1]      [,2]       [,3]
>>>[1,] 112.47400 118.82700 179.324000
>>>[2,]  13.76090  14.88320  22.676900
>>>[3,]  62.13010  94.28880 117.368000
>>>[4,]   4.16261   3.49082   6.360260
>>>[5,]   0.00000   0.00000   0.385098
>>>[6,]   3.59279   4.47379   8.604070
>>>> PDGT<-cbind(data[,6],data[,8])
>>>
>>>pval2<-NULL
>>>> for(i in 1:length(PGT[,1])){
>>>+ pval2<-c(pval2,t.test(as.numeric(PDGT[i,]),as.numeric(PGT[i,]))$p.value)
>>>+ print(i)
>>>+ }
>>>
>>>Error:
>>>Error in t.test.default(as.numeric(PDGT[i, ]), as.numeric(PGT[i, ])) : 
>>>  not enough 'x' observations
>>>
>>>I cannot understand what went wrong with the vector. Can you please tell me? I am not able to figure it out.
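One way to keep the loop from stopping at the first bad row, sketched in base R. The helper safe_t_pvalue() is hypothetical: it returns NA when either group has fewer than two finite values, or when t.test() fails for another reason. vapply() then fills a preallocated vector instead of growing pval2 with c(). The small matrices are made-up stand-ins for PGT and PDGT.

```r
# Hypothetical helper: Welch t-test p-value, or NA when it cannot be computed.
safe_t_pvalue <- function(a, b) {
  a <- a[is.finite(a)]   # keep only usable (finite) values
  b <- b[is.finite(b)]
  if (length(a) < 2 || length(b) < 2) return(NA_real_)
  # tryCatch also covers other failures, such as "data are essentially constant"
  tryCatch(t.test(a, b)$p.value, error = function(e) NA_real_)
}

# Made-up stand-ins: 3 PGT columns, 2 PDGT columns; row 2 has a NaN.
PGT  <- matrix(c(112.47, 118.83, 179.32,
                 13.76,  14.88,  22.68,
                 0,      0,      0.385), ncol = 3, byrow = TRUE)
PDGT <- matrix(c(301.15, 144.47,
                 39.16,  NaN,
                 0.126,  0), ncol = 2, byrow = TRUE)

# One p-value per row; rows that cannot be tested get NA instead of an error.
pval2 <- vapply(seq_len(nrow(PGT)),
                function(i) safe_t_pvalue(PDGT[i, ], PGT[i, ]),
                numeric(1))
print(pval2)
```

Returning NA keeps pval2 aligned with the rows of data, so it can later be matched back to the gene IDs; whether to drop those genes or keep them flagged is an analysis decision.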