[R] Problem with t-test
arun
smartpink111 at yahoo.com
Tue Aug 6 21:45:46 CEST 2013
Hi Vivek,
I removed the rows with missing values and also duplicated rows. Now, it looks like it is working.
x<-read.table("RP_matrix_FPKM_PGTvsPDGT.txt",header=TRUE,sep="\t")
x1<- read.table("RP_plaise_FPKM_PGTvsPDGT.txt",header=TRUE,sep="\t")
str(x1)
#'data.frame': 19680 obs. of 6 variables:
# $ ID : Factor w/ 19678 levels "XLOC_000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num 112.47 13.76 62.13 4.16 0 ...
# $ PGT.0 : num 118.83 14.88 94.29 3.49 0 ...
# $ PGT.2 : num 179.324 22.677 117.368 6.36 0.385 ...
# $ PDGT.0: num 301.154 39.165 242.685 9.119 0.126 ...
# $ PDGT.1: num 144.5 30 161.2 3.5 0 ...
str(x)
#'data.frame': 28599 obs. of 6 variables:
# $ gene : Factor w/ 28599 levels "XLOC_000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num 71.25 8.71 14.6 1.99 0 ...
# $ PGT.0 : num 68.36 8.16 9.75 2.4 0 ...
# $ PGT.2 : num 108.17 13.35 18.29 3.64 0 ...
# $ PDGT.0: num 195.01 24.76 40.59 5.61 0 ...
# $ PDGT.1: num 93.06 18.88 26.83 2.14 0 ...
length(unique(x[,1]))
#[1] 28599
length(unique(x1[,1]))
#[1] 19679
# keep the first row for each ID; unlike x1[-which(...),], this also works when there are no duplicates
x2<- x1[!duplicated(x1[,1]),]
dim(x2)
#[1] 19679 6
x3<- na.omit(x2)
dim(x3)
#[1] 19678 6
cl<-c(rep(0,3),rep(1,2))  # class labels: 3 PGT columns (0), then 2 PDGT columns (1)
origin<-rep(1,5)          # all 5 samples come from the same origin
library(RankProd)
RP.out <- RPadvance(x3[,-1],cl,origin,gene.names=as.character(x3[,1]),num.perm=200)
A.K.
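The clean-up above (drop repeated IDs, then rows with missing values) can be wrapped in a small helper. This is a minimal sketch, not part of RankProd: the function name `clean_expr` and the toy data are hypothetical. It also drops blank IDs, since the warning quoted further down the thread points at empty ('') names. It uses `!duplicated()` because when there are no duplicates `which()` returns `integer(0)`, and `x[-integer(0), ]` silently returns zero rows.

```r
# Hypothetical helper (not part of RankProd): keep the first row for each ID
# in column 1, drop blank/NA IDs, then drop any row that still contains NA/NaN.
clean_expr <- function(df) {
  ids <- as.character(df[[1]])
  keep <- !duplicated(ids) & !is.na(ids) & ids != ""
  na.omit(df[keep, , drop = FALSE])
}

# Toy data standing in for the FPKM table (values are made up)
toy <- data.frame(
  ID     = c("XLOC_1", "XLOC_2", "XLOC_2", "XLOC_3", ""),
  PGT.1  = c(1.5, 2.0, 2.1, NA, 0.3),
  PDGT.0 = c(3.1, 4.2, 4.0, 5.5, 0.1)
)
res <- clean_expr(toy)
print(as.character(res$ID))

# The pitfall: with no duplicates, -which(...) removes every row
nodup <- toy[c(1, 2, 4), ]
print(nrow(nodup[-which(duplicated(nodup$ID)), ]))   # 0 rows
print(nrow(nodup[!duplicated(nodup$ID), ]))          # 3 rows
```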
________________________________
From: Vivek Das <vd4mmind at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, August 6, 2013 9:38 AM
Subject: Re: Problem with t-test
No, I have tried it again on other files and the error is not there; it works fine. This is a new file I have created. I am sending you the script and the files I am using. It is a simple script that I wrote and have run many times with other files. I am sending you two different input files: with one it works and with the other it does not. It fails with the "plaise" file but works with the other input file.
library(RankProd)
x<-read.table("RP_matrix_RF_PGTvsPDGT.txt",header=TRUE,sep="\t")
cl<-c(rep(0,3),rep(1,2))
origin<-rep(1,5)
RP.out <- RPadvance(x[,-1],cl,origin,gene.names=as.character(x[,1]),num.perm=200)
topGene(RP.out,cutoff = 0.1)
#plotRP(RP.out, cutoff = 0.1)
top<-topGene(RP.out,cutoff=0.1,method="pfp")   # avoid masking base::table
t1<-top$Table1
t2<-top$Table2
ind1<-which(t1[,4]<0.1)   # column 4 is assumed to hold the pfp values
ind2<-which(t2[,4]<0.1)
up<-t1[ind1,,drop=FALSE]  # drop=FALSE keeps a single hit as a one-row matrix
down<-t2[ind2,,drop=FALSE]
degs<-rbind(up,down)
----------------------------------------------------------
Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy
emails: vivek.das at ieo.eu
vchris_05 at yahoo.co.in
vd4mmind at gmail.com
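The last few lines of the script filter the two `topGene()` tables on a pfp cutoff. A minimal sketch of the same step as a function: `select_degs` is a hypothetical name, the toy matrices only mimic the shape of `Table1`/`Table2`, and treating column 4 as pfp follows the script rather than being checked here. `drop = FALSE` keeps a one-row selection as a matrix, so a single passing gene keeps its row name through `rbind()`.

```r
# Hypothetical helper: keep rows whose pfp value (column `col`) is below `cutoff`
select_degs <- function(up_tab, down_tab, cutoff = 0.1, col = 4) {
  up   <- up_tab[which(up_tab[, col] < cutoff), , drop = FALSE]
  down <- down_tab[which(down_tab[, col] < cutoff), , drop = FALSE]
  rbind(up, down)
}

# Toy tables with made-up numbers; column 4 plays the role of pfp
t1 <- matrix(c(1, 2, 10, 20, 2.5, 1.8, 0.01, 0.30), nrow = 2,
             dimnames = list(c("XLOC_1", "XLOC_2"), NULL))
t2 <- matrix(c(3, 30, 0.4, 0.05), nrow = 1,
             dimnames = list("XLOC_3", NULL))
degs <- select_degs(t1, t2)
print(degs)
```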
On Tue, Aug 6, 2013 at 3:17 PM, arun <smartpink111 at yahoo.com> wrote:
Hi Vivek,
>I have never used RankProd before, so I can't guarantee that I can sort out the problem. But you can send me the file and the script, and I will try it later.
>You mentioned that RankProd worked before: was that on the same file or on a different file? If it was a different file, try running it on that file again and see whether the error repeats.
>________________________________
>From: Vivek Das <vd4mmind at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, August 6, 2013 9:09 AM
>Subject: Re: Problem with t-test
>
>Yes, I know this, but I am worried about the consistency of the data, since it will remove a lot of observations and the results will not be good. In fact, I tested it and I am not getting the p-values I expected. Anyway, I am trying another test, using the RankProd package in R. I am running into a problem there: I have used this package many times but have never seen this before. Do you have any idea what causes the error below?
>
>Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
>  duplicate 'row.names' are not allowed
>In addition: Warning message:
>non-unique values when setting 'row.names': ‘’
>This happens when running RankProd.
>
>
>I don't understand the duplicate 'row.names' message. The rows are gene locations with expression values, and some locations appear 2-3 times. I have used data frames like this before to compute RankProd and they worked, but now I am getting this error. I can share the script and the file with you if you need them; the RankProd pipeline is very easy to run.
>
>If you can give me some idea about the error, that would be great.
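The message itself comes from base R's data-frame row-name check: any vector with repeated values is refused as row names, and the ‘’ in the warning means the repeated value is an empty string. Whether RankProd sets row names from `gene.names` in exactly this way is an assumption based on the error text. The sketch below uses made-up IDs to reproduce the error and list the offending IDs; `find_bad_ids` is a hypothetical helper.

```r
# Made-up gene IDs: "XLOC_2" is repeated and two IDs are blank
ids <- c("XLOC_1", "XLOC_2", "XLOC_2", "", "")
df  <- data.frame(value = seq_along(ids))

# Reproduce the base-R error (the accompanying warning is suppressed here)
msg <- tryCatch({
  suppressWarnings(rownames(df) <- ids)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical helper: report blank/NA IDs and non-blank IDs that occur more than once
find_bad_ids <- function(ids) {
  ids <- as.character(ids)
  list(blank_or_na = which(is.na(ids) | ids == ""),
       duplicated  = unique(ids[duplicated(ids) & !is.na(ids) & ids != ""]))
}
print(find_bad_ids(ids))
```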
>
>On Tue, Aug 6, 2013 at 3:01 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi Vivek,
>>No problem.
>>?t.test
>>na.action: a function which indicates what should happen when the data
>> contain ‘NA’s. Defaults to ‘getOption("na.action")’.
>>
>>On my system:
>>
>>getOption("na.action")
>>#[1] "na.omit"
>>
>>
>>So it removes the NAs by default and reduces the number of observations.
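A quick check of this behaviour with made-up numbers. Strictly, `na.action` is an argument of the formula interface; for the `t.test(x, y)` form used in the loop, the default method drops NA values (and NaN, since `is.na(NaN)` is TRUE) internally. A two-value group with one missing entry is then left with a single observation, which the default Welch test (`var.equal = FALSE`) rejects.

```r
pdgt_row <- c(301.15, NaN)            # made-up PDGT values, one missing
pgt_row  <- c(112.47, 118.83, 179.32) # made-up PGT values

# is.na() is TRUE for both NA and NaN, so both are dropped before testing
print(is.na(c(NA, NaN, 1)))

err <- tryCatch(t.test(pdgt_row, pgt_row)$p.value,
                error = function(e) conditionMessage(e))
print(err)

# With two usable values in each group the same call works
p <- t.test(c(301.15, 144.47), pgt_row)$p.value
print(p)
```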
>>
>>________________________________
>>From: Vivek Das <vd4mmind at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Tuesday, August 6, 2013 8:52 AM
>>Subject: Re: Problem with t-test
>>
>>Yes, actually I just tested a few conditions and found that there are NaN values, which is why this problem is happening. I cannot proceed with this test and will have to change the pipeline to use some other R package for my analysis. Thanks for your input.
>>
>>On Tue, Aug 6, 2013 at 2:42 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi Vivek,
>>>It looks like the number of observations in each test is 2 (PDGT) and 3 (PGT), respectively. It is possible that some of the entries are NA, which leaves too few observations and produces the error. It's just a guess, as this is not a reproducible example.
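This guess can be checked directly by counting the usable (non-missing) values in each row of both groups before running any test. A minimal sketch: the names mirror the original `PGT`/`PDGT` objects, but the 3-gene matrices and their numbers are invented.

```r
# Made-up stand-ins for the PGT (3 columns) and PDGT (2 columns) matrices
PGT  <- cbind(c(112.5, 13.8, NA), c(118.8, 14.9, 0.2), c(179.3, 22.7, 0.4))
PDGT <- cbind(c(301.2, NaN, 0.1), c(144.5, 30.0, NA))

# Number of non-missing values per row (NA and NaN both count as missing)
n_pgt  <- rowSums(!is.na(PGT))
n_pdgt <- rowSums(!is.na(PDGT))

# Rows where a default (Welch) t-test would fail: fewer than 2 values in a group
too_few <- which(n_pgt < 2 | n_pdgt < 2)
print(too_few)
```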
>>>
>>>________________________________
>>>From: Vivek Das <vd4mmind at gmail.com>
>>>To: arun <smartpink111 at yahoo.com>
>>>Sent: Tuesday, August 6, 2013 4:29 AM
>>>Subject: Problem with t-test
>>>
>>>data<- read.table("/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/filteredFPKM_matrix.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>> head(data)
>>> ID Sample_118p Sample_118rp3 Sample_118rz Sample_118z Sample_132p1 Sample_132p2 Sample_132p3 Sample_132rp1 Sample_132rp3 Sample_132rp4 Sample_132rz1
>>>1 XLOC_000001 112.47400 166.17900 81.52270 44.778700 301.154000 118.82700 144.47000 170.407000 406.899000 189.131000 97.183400
>>>2 XLOC_000002 13.76090 17.76730 11.91100 6.290600 39.164800 14.88320 30.02390 42.717200 88.814600 23.310500 15.440800
>>>3 XLOC_000003 62.13010 102.16200 748.31300 273.520000 242.685000 94.28880 161.22800 225.243000 497.011000 160.376000 896.121000
>>>4 XLOC_000004 4.16261 5.71899 4.55739 2.486340 9.119170 3.49082 3.49611 4.975020 12.598600 6.387530 4.949830
>>>5 XLOC_000010 0.00000 0.00000 0.29217 0.270976 0.126338 0.00000 0.00000 0.464747 0.596984 0.199851 0.892021
>>>6 XLOC_000011 3.59279 9.09855 2.57678 1.593230 16.936300 4.47379 6.87020 6.922430 21.762200 7.461560 4.420570
>>> Sample_132rz2 Sample_132z Sample_141p1 Sample_141p2 Sample_141p3 Sample_141p4 Sample_141z Sample_183p1 Sample_183p2 Sample_183p3 Sample_183z Sample_91p
>>>1 72.739000 386.81000 86.96600 85.703100 53.01000 158.31400 145.84300 219.667000 240.231000 127.42000 78.58140 179.324000
>>>2 7.475080 40.35110 12.61660 12.737300 10.96970 28.26550 22.65940 27.217700 27.832800 18.21300 7.88030 22.676900
>>>3 465.496000 2330.57000 72.35270 73.962600 71.36860 203.20100 1048.81000 172.241000 183.260000 98.11680 473.46400 117.368000
>>>4 4.818980 18.22750 3.22435 2.074460 1.97518 4.05074 8.86568 5.118540 6.414700 4.65076 4.37495 6.360260
>>>5 0.863341 2.91729 0.00000 0.226087 0.00000 0.00000 2.16320 0.356073 0.655415 0.00000 1.15980 0.385098
>>>6 3.341780 15.43730 5.21231 3.854980 2.53136 6.18972 4.83315 6.908790 12.524200 5.96035 3.40959 8.604070
>>> Sample_91rp1 Sample_91rp3 Sample_91rp4 Sample_91rz
>>>1 297.395000 203.550000 251.53800 110.898000
>>>2 28.945600 18.749300 22.76070 15.679000
>>>3 174.073000 119.605000 122.66100 754.735000
>>>4 9.227550 6.656250 8.82010 7.172210
>>>5 0.718336 0.187613 0.34955 0.498937
>>>6 15.908700 8.162870 9.35126 6.013790
>>>> PGT<-cbind(data[,2],data[,7],data[,24])
>>>> head(PGT)
>>> [,1] [,2] [,3]
>>>[1,] 112.47400 118.82700 179.324000
>>>[2,] 13.76090 14.88320 22.676900
>>>[3,] 62.13010 94.28880 117.368000
>>>[4,] 4.16261 3.49082 6.360260
>>>[5,] 0.00000 0.00000 0.385098
>>>[6,] 3.59279 4.47379 8.604070
>>>> PDGT<-cbind(data[,6],data[,8])
>>>
>>>pval2<-NULL
>>>> for(i in 1:length(PGT[,1])){
>>>+ pval2<-c(pval2,t.test(as.numeric(PDGT[i,]),as.numeric(PGT[i,]))$p.value)
>>>+ print(i)
>>>+ }
>>>
>>>Error:
>>>Error in t.test.default(as.numeric(PDGT[i, ]), as.numeric(PGT[i, ])) :
>>> not enough 'x' observations
>>>
>>>I cannot understand what went wrong with the vector. Can you please tell me? I am not able to figure it out.
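One way to keep the loop going past such rows is to preallocate the result and return `NA` for any row the test cannot handle. This sketch uses base R's `vapply()` and `tryCatch()`; the helper name `row_pvalues` and the small matrices are hypothetical, and returning `NA_real_` (rather than dropping the gene) is a design choice, not part of the original script.

```r
# Hypothetical helper: Welch t-test p-value per row, NA where the test fails
row_pvalues <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))
  vapply(seq_len(nrow(a)), function(i) {
    tryCatch(t.test(as.numeric(a[i, ]), as.numeric(b[i, ]))$p.value,
             error = function(e) NA_real_)
  }, numeric(1))
}

# Made-up data: row 2 of PDGT has only one usable value
PGT  <- cbind(c(112.5, 13.8), c(118.8, 14.9), c(179.3, 22.7))
PDGT <- cbind(c(301.2, 39.2), c(144.5, NA))
pval2 <- row_pvalues(PDGT, PGT)
print(pval2)
```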
More information about the R-help mailing list