[R] reading data
arun
smartpink111 at yahoo.com
Tue Feb 26 19:51:49 CET 2013
Hi,
res10[is.na(res10)] <- 0
apply(res10[,c(4,6)],1,t.test)
#Error in t.test.default(newX[, i], ...) : data are essentially constant
To work around this, you could use something like:
#Return the p-value, or NA when t.test() fails (e.g. on constant data)
t.test.p.value <- function(...) {
  obj<-try(t.test(...), silent=TRUE)
  if (inherits(obj, "try-error")) return(NA) else return(obj$p.value)
}
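As a quick check of the wrapper idea, here is a minimal sketch on a made-up matrix. The name `safe.p` and the values in `m` are assumptions for illustration and are not part of the data above. Note that t.test() applied to a single two-element vector is a one-sample test against a mean of 0, so a row such as (0, 1) gives p = 0.5, while a constant non-zero row such as (1, 1) raises the error and is mapped to NA.

```r
# Hypothetical helper: same idea as t.test.p.value, written with tryCatch().
safe.p <- function(...) {
  tryCatch(t.test(...)$p.value, error = function(e) NA_real_)
}

# Made-up counts: one row per peptide, one column per group.
m <- rbind(c(0, 1), c(1, 1), c(3, 1))

# Row-wise p-values; the constant row (1, 1) yields NA instead of stopping.
p <- apply(m, 1, safe.p)
print(p)
```

With only two values per row the test has a single degree of freedom, so the p-values should be read with caution.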
In this particular case, you have only two groups, 'c' and 't'.
#Making up one more group ('d') to illustrate the pairwise comparisons
res11<- res10
set.seed(45)
res11$d1<- sample(0:4, 22,replace=TRUE)
resNew<-do.call(cbind,lapply(split(names(res11)[4:7],gsub("[0-9]","",names(res11)[4:7])), function(i) {
  x<-if(ncol(res11[i])>1) rowSums(res11[i]) else res11[i]
  colnames(x)<-NULL
  x
}))
indx<-combn(names(resNew),2)
resPval<-do.call(cbind,lapply(seq_len(ncol(indx)),function(i) {
  x<-as.data.frame(apply(resNew[,indx[,i]],1,t.test.p.value))
  colnames(x)<-paste("Pvalue",paste(indx[,i],collapse=""),sep="_")
  x
}))
resF<-cbind(res11,resPval)
head(resF,3) #####head
# Seq Mod z c2 c3 t2 d1 Pvalue_cd Pvalue_ct Pvalue_dt
#1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 0 0 1 3 0.5 0.5 0.2951672
#2 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2 0 0 1 1 0.5 0.5 NA
#3 aAAAAAAAAAGAAGGR 1-n_acPro/ 2 1 0 1 1 NA NA NA
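The earlier request in this thread also asked to drop the non-significant rows. Here is a minimal sketch of that step. It assumes that a row is kept when at least one of its pairwise p-values is below 0.05, and that NA p-values count as not significant. Both are assumptions, and `toy` is a made-up stand-in for `resF`.

```r
# Made-up stand-in for resF; the values are for illustration only.
toy <- data.frame(
  Seq = c("pepA", "pepB", "pepC"),
  Pvalue_cd = c(0.50, 0.01, NA),
  Pvalue_ct = c(0.50, 0.20, NA),
  stringsAsFactors = FALSE
)

# Pick the p-value columns by their name prefix.
pcols <- grep("^Pvalue_", names(toy), value = TRUE)

# TRUE when at least one p-value in the row is below 0.05; NA is ignored.
keep <- apply(toy[, pcols, drop = FALSE] < 0.05, 1,
              function(r) any(r, na.rm = TRUE))

# Keep only the rows flagged as significant.
sig <- toy[keep, , drop = FALSE]
print(sig)
```

The same lines could be applied to `resF` in place of `toy`. Whether "any comparison significant" is the right rule depends on the analysis.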
A.K.
___________________________
From: Vera Costa <veracosta.rt at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, February 26, 2013 12:05 PM
Subject: Re: reading data
I think I didn't understand your question...
For each row, I need to "compare" groups. The groups are c, t, a, ...
I'm thinking that we can sum by group and apply a t-test to compare the means...
2013/2/26 arun <smartpink111 at yahoo.com>
Just one question.
>
>If you want to compare by rows,
>did you mean to add the numbers in the rows of c2 and c3 and compare the sum with t2 (and, if t had one more column, add t2 and t3)?
>
>A.K.
>
>
>
>
>
>________________________________
>From: Vera Costa <veracosta.rt at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, February 26, 2013 10:52 AM
>Subject: Re: reading data
>
>
>May I ask about a new problem, a continuation of this one (if you could help a little bit more)?
>
>I inserted the line
>
>res10[is.na(res10)] <- 0
>
>in your code. After that, I need to apply a statistical test (for example chisq.test) to every row to compare groups.
>I will try to explain it clearly.
>For example, for the first row, I need a new column with the p-values that compare the groups (in this case c and t, but there can be more):
> Seq Mod z c2 c3 t2 p-value
>1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 0 0 1 """"""
>
>
>
>and afterwards remove all non-significant rows (keeping only those with p-values < 0.05).
>
>Thank you again (I think this problem finishes here :-)).
>
>Vera
>
>
>
>2013/2/26 arun <smartpink111 at yahoo.com>
>
>No problem.
>>Arun
>>
>>
>>
>>
>>________________________________
>> From: Vera Costa <veracosta.rt at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Tuesday, February 26, 2013 9:47 AM
>>Subject: Re: reading data
>>
>>
>>Ah OK, no problem :-)
>>
>>I'm looking at the code.
>>Thank you very much for all your help.
>>
>>
>>
>>2013/2/26 arun <smartpink111 at yahoo.com>
>>
>>I used head(res10,3)
>>> res10
>>> Seq Mod z c2 c3 t2
>>>1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 NA NA 1
>>>2 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2 NA NA 1
>>>3 aAAAAAAAAAGAAGGR 1-n_acPro/ 2 1 NA 1
>>>4 AAAAAAALQAK 2 NA 1 1
>>>5 aAAAAAGAGPEMVR 1-n_acPro/ 2 2 NA 2
>>>6 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 2 NA NA 1
>>>7 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 3 NA NA 1
>>>8 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/ 2 1 NA NA
>>>9 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/ 3 2 2 1
>>>10 AAAAAPGTAEK 2 1 NA NA
>>>11 aAAAASAPQQLSDEELFSQLR 1-n_acPro/ 2 NA NA 1
>>>12 aAAAAVGNAVPCGAR 1-n_acPro/ 2 1 1 1
>>>13 AAAAAWEEPSSGNGTAR 2 1 1 1
>>>14 aAAAELSLLEK 1-n_acPro/ 1 NA NA 1
>>>15 AAAAEVLGLILR 2 1 1 1
>>>16 aAAAGAAAAAAAEGEAPAEMGALLLEK 1-n_acPro/ 3 1 1 1
>>>17 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-<_Carbamoylation/ 3 NA 1 NA
>>>18 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-n_acPro/ 3 NA NA 1
>>>19 aAAANSGSSLPLFDCPTWAGKPPPGLHLDVVK 1-n_acPro/ 3 NA NA 1
>>>20 aAAAVGAGHGAGGPGAASSSGGAR 1-n_acPro/ 2 1 1 NA
>>>21 aAAAVGAGHGAGGPGAASSSGGAR 1-n_acPro/ 3 NA 1 NA
>>>22 aAADGDDSLYPIAVLIDELR 1-n_acPro/ 2 NA 1 NA
>>>
>>>In the folder you gave to me, there was only a1, which was deleted according to what you said.
>>>
>>>res4[[1]]
>>>$a1
>>> Seq Mod z spec
>>>1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 11833
>>>2 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2 11833
>>>3 aAAAAAAAAAGAAGGR 1-n_acPro/ 2 13103
>>>4 AAAAAAALQAK 2 3084
>>>5 aAAAAAGAGPEMVR 1-n_acPro/ 2 9646,9821
>>>6 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 2 33650
>>>7 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 3 33607
>>>9 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/ 3 33769
>>>11 aAAAASAPQQLSDEELFSQLR 1-n_acPro/ 2 20602
>>>12 aAAAAVGNAVPCGAR 1-n_acPro/ 2 10018
>>>13 AAAAAWEEPSSGNGTAR 2 5576
>>>14 aAAAELSLLEK 1-n_acPro/ 1 19662
>>>16 AAAAEVLGLILR 2 22857
>>>17 aAAAGAAAAAAAEGEAPAEMGALLLEK 1-n_acPro/ 3 26060
>>>18 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-n_acPro/ 3 21479
>>>19 aAAANSGSSLPLFDCPTWAGKPPPGLHLDVVK 1-n_acPro/ 3 21159
>>>
>>>
>>>
>>>A.K.
>>>
>>>
>>>
>>>________________________________
>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>To: arun <smartpink111 at yahoo.com>
>>>Sent: Tuesday, February 26, 2013 9:30 AM
>>>Subject: Re: reading data
>>>
>>>
>>>Hi, thank you.
>>>
>>>But there is a small thing that I didn't understand: why does this output have only 3 rows? a2, for example, has a lot of rows. Didn't you use the last attachment?
>>>
>>>But if you used the first one, I think we would have more rows...
>>>
>>>Vera
>>>
>>>
>>>
>>>2013/2/26 arun <smartpink111 at yahoo.com>
>>>
>>>
>>>>
>>>>Hi,
>>>>Try this:
>>>>
>>>>files<-paste("MSMS_",23,"PepInfo.txt",sep="")
>>>>read.data<-function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,sep = "\t",stringsAsFactors=FALSE,fill=TRUE))}
>>>>lista<-do.call("c",lapply(list.files(recursive=T)[grep(files,list.files(recursive=T))],read.data))
>>>>names(lista)<-paste("group_",gsub("\\d+","",names(lista)),sep="")
>>>>res2<-split(lista,names(lista))
>>>>res3<- lapply(res2,function(x) {names(x)<-paste(gsub(".*_","",names(x)),1:length(x),sep="");x})
>>>>#Freq FDR<0.01
>>>>res4<-lapply(seq_along(res3),function(i) lapply(res3[[i]],function(x) x[x[["FDR"]]<0.01,c("Seq","Mod","z","spec")]))
>>>>names(res4)<- names(res2)
>>>> res4New<-lapply(res4,function(x) lapply(names(x),function(i) do.call(rbind,lapply(x[i],function(x) cbind(folder_name=i,x))) ))
>>>>
>>>>res5<- lapply(res4New,function(x) if(length(x)>1) tail(x,-1) else NULL)
>>>>library(plyr)
>>>>library(data.table)
>>>>res6<- lapply(res5,function(x) lapply(x,function(x1) {x1<-data.table(x1); x1[,spec:=paste(spec,collapse=","),by=c("Seq","Mod","z")]}))
>>>> res7<-lapply(res6,function(x) lapply(x,function(x1) {x1$counts<-sapply(x1$spec, function(x2) length(gsub("\\s", "", unlist(strsplit(x2, ",")))));x3<-as.data.frame(x1);names(x3)[6]<- as.character(unique(x3$folder_name));x3[,-c(1,5)]}))
>>>> res8<-lapply(res7,function(x) Reduce(function(...) merge(...,by=c("Seq","Mod","z"),all=TRUE),x))
>>>> res9<-res8[lapply(res8,length)!=0]
>>>> res10<- Reduce(function(...) merge(...,by=c("Seq","Mod","z"),all=TRUE),res9)
>>>>head(res10,3)
>>>> # Seq Mod z c2 c3 t2
>>>>#1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 NA NA 1
>>>>#2 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2 NA NA 1
>>>>#3 aAAAAAAAAAGAAGGR 1-n_acPro/ 2 1 NA 1
>>>>A.K.
>>>>________________________________
>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>To: arun <smartpink111 at yahoo.com>
>>>>Sent: Tuesday, February 26, 2013 5:15 AM
>>>>Subject: Re: reading data
>>>>
>>>>
>>>>Sorry, I only saw your last email now.
>>>>
>>>>I have 8 folders at the moment, but I may have more. I need it to work in general.
>>>>
>>>>Thank you
>>>>
>>>>
>>>>
>>>>2013/2/25 arun <smartpink111 at yahoo.com>
>>>>
>>>>I sent the solution. But I need to know how many folders you have for the analysis, because I manually inserted the names at the end. That works if there are not many folders; otherwise, the names need to be generated in the program.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>________________________________
>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>Sent: Monday, February 25, 2013 10:01 AM
>>>>>Subject: Re: reading data
>>>>>
>>>>>
>>>>>Hi.
>>>>>
>>>>>It is from the attached dataset, but without a1.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>2013/2/25 arun <smartpink111 at yahoo.com>
>>>>>
>>>>>
>>>>>>
>>>>>>Hi,
>>>>>>Are you sure that the output is from the attached dataset?
>>>>>>
>>>>>>I am getting this result for aa, with 111 rows:
>>>>>> aa
>>>>>> Seq Mod
>>>>>>1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/
>>>>>>2 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/
>>>>>>3 aAAAAAAAAAGAAGGR 1-n_acPro/
>>>>>>4 aAAAAAAAGAAGGRGSGPGRR 1-n_acPro/
>>>>>>5 AAAAAAAkAAK 8-K_ac/
>>>>>>6 AAAAAAALQAK
>>>>>>7 aAAAAAGAGPEMVR 1-n_acPro/
>>>>>>8 aAAAAATAAAAASIR 1-n_acPro/
>>>>>>9 AAAAAEQQQFyLLLGNLLSPDNVVR 11-Y_ph/
>>>>>>10 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/
>>>>>>11 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/
>>>>>>12 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/
>>>>>>13 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/
>>>>>>14 AAAAAPGTAEK
>>>>>>15 aAAAAQGGGGGEPR 1-n_acPro/
>>>>>>16 aAAAASAPQQLSDEELFSQLR 1-n_acPro/
>>>>>>17 aAAAAVGNAVPCGAR 1-n_acPro/
>>>>>>18 AAAAAWEEPSSGNGTAR
>>>>>>19 aAAAELSLLEK 1-n_acPro/
>>>>>>20 aAAAELSLLEK 1-n_acPro/
>>>>>>21 AAAAEVLGLILR
>>>>>>22 aAAAGAAAAAAAEGEAPAEMGALLLEK 1-n_acPro/
>>>>>>23 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-<_Carbamoylation/
>>>>>>24 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-n_acPro/
>>>>>>25 aAAAKPNNLSLVVHGPGDLR 1-n_acPro/
>>>>>>26 aAAANSGSSLPLFDCPTWAGKPPPGLHLDVVK 1-n_acPro/
>>>>>>27 aAAAVGAGHGAGGPGAASSSGGAR 1-n_acPro/
>>>>>>28 aAAAVGAGHGAGGPGAASSSGGAR 1-n_acPro/
>>>>>>29 aAAAVQGGR 1-n_acPro/
>>>>>>30 aAAAVVEFQR 1-<_Carbamoylation/
>>>>>>31 aAAAVVEFQR 1-n_acPro/
>>>>>>32 aAAAVVVPAEWIK 1-n_acPro/
>>>>>>33 aAADGDDSLYPIAVLIDELR 1-n_acPro/
>>>>>>34 aAADGDDSLYPIAVLIDELR 1-n_acPro/
>>>>>>35 AAADLMAYCEAHAK
>>>>>>36 AAADLMAYCEAHAK
>>>>>>37 aAAEAANCIMEVSCGQAESSEKPNAEDMTSK 1-n_acPro/
>>>>>>38 AAAEIYEEFLAAFEGSDGNK
>>>>>>39 AAAEVAGQFVIK
>>>>>>40 AAAIGIDLGTTYSCVGVFQHGK
>>>>>>41 AAAIGIDLGTTYSCVGVFQHGK
>>>>>>42 AAALATVNAWAEQTGMK
>>>>>>43 AAAMANNLQK
>>>>>>44 AAAPAPEEEMDECEQALAAEPK
>>>>>>45 AAAQLLQSQAQQSGAQQTK
>>>>>>46 AAATPESQEPQAK
>>>>>>47 aAAVAAAGAGEPQSPDELLPK 1-n_acPro/
>>>>>>48 aAAVLSGPSAGSAAGVPGGTGGLSAVSSGPR 1-n_acPro/
>>>>>>49 AAAVVGInSETIMKPASISEEELLNLINK 8-N_Deamidation/
>>>>>>50 AAAYNLVQHGITNLCVIGGDGSLTGANIFR
>>>>>>51 aADTQVSETLKR 1-n_acPro/
>>>>>>52 aAEAADLGLGAAVPVELR 1-n_acPro/
>>>>>>53 AAEDDEDDDVDTKK
>>>>>>54 AAEEPSKVEEK
>>>>>>55 AAEEPSKVEEK
>>>>>>56 AAEGGLSSPEFSELCIWLGSQIK
>>>>>>57 AAELIANSLATAGDGLIELR
>>>>>>58 AAELIANSLATAGDGLIELR
>>>>>>59 AAELLMSCFR
>>>>>>60 aAEPNKTEIQTLFK 1-n_acPro/
>>>>>>61 AAEQILEDMITIDVENVMEDICSK
>>>>>>62 AAEsETPGKSPEKKPK 4-S_ph/
>>>>>>63 AAEsETPGKSPEKKPK 4-S_ph/
>>>>>>64 AAESLADPTEYENLFPGLK
>>>>>>65 AAFDDAIAELDTLSEESYK
>>>>>>66 AAFDDAIAELDTLSEESYK
>>>>>>67 AAFECMYTLLDSCLDR
>>>>>>68 AAGAGLPESVIWAVNAGGEAHVDVHGIHFR
>>>>>>69 aAGGDGAEAPAKKDVK 1-n_acPro/
>>>>>>70 AAGGGAGSSEDDAQSR
>>>>>>71 AAGHPGDPESQQR
>>>>>>72 AAGHPGDPESQQR
>>>>>>73 AAGkFK 4-K_me/
>>>>>>74 AAGLATMISTMRPDIDNMDEYVR
>>>>>>75 aAGTAAALAFLSQESR 1-n_acPro/
>>>>>>76 aAGTLYTYPENWR 1-n_acPro/
>>>>>>77 aAGTSSYWEDLRK 1-n_acPro/
>>>>>>78 AAGTVFTTVEDLGSK
>>>>>>79 aAGVEAAAEVAATEIK 1-n_acPro/
>>>>>>80 AAGVGDMVMATVK
>>>>>>81 AAGVNVEPFWPGLFAK
>>>>>>82 AAGVVLEMIR
>>>>>>83 AAHIFFTDTCPEPLFSELGR
>>>>>>84 AAHLCAEAALR
>>>>>>85 AAHnKDVLR 4-N_Deamidation/
>>>>>>86 AAHVEYSTAAR
>>>>>>87 AAHVEYSTAAR
>>>>>>88 AAIAQALAGEVSVVPPSR
>>>>>>89 AAIISAEGDSK
>>>>>>90 AALAAEVKkPAAAAAPGTAEkLSPkATTASQAk 9-K_me2/21-K_me2/25-K_me2/33-K_me/
>>>>>>91 AALAFGFLDLLK
>>>>>>92 AALAGGTTMIIDHVVPEPGTSLLAAFDQWR
>>>>>>93 AALAHSEEVTASQVAATK
>>>>>>94 AALCHFCIDMLNAK
>>>>>>95 aALDSLSLFTSLGLSEQK 1-n_acPro/
>>>>>>96 AALEALGSCLNNK
>>>>>>97 AALEAQNALHNMK
>>>>>>98 AALETDENLLLCAPTGAGK
>>>>>>99 AALGPLVTGLYDVQAFK
>>>>>>100 aALGVLESDLPSAVTLLK 1-n_acPro/
>>>>>>101 AALLETLSLLLAK
>>>>>>102 AALPGILSELDVDVnEGSLMELQGHIGR 15-N_Deamidation/
>>>>>>103 AALPSHVVTMLDNFPTNLHPMSQLSAAVTALNSESNFAR
>>>>>>104 AALPSHVVTMLDNFPTnLHPMSQLSAAVTALNSESNFAR 17-N_Deamidation/
>>>>>>105 AALSALESFLK
>>>>>>106 AALSEEELEKK
>>>>>>107 aALTAEHFAALQSLLK 1-n_acPro/
>>>>>>108 AAMADTFLEHMCR
>>>>>>109 AAMEALVVEVTK
>>>>>>110 AAMFTAGSNFNHVVQNEK
>>>>>>111 aANATTNPSQLLPLELVDK 1-n_acPro/
>>>>>> z counts.x.x counts.y.x counts counts.x.y counts.y.y
>>>>>>1 2 NA NA NA NA 1
>>>>>>2 2 NA NA NA NA 1
>>>>>>3 2 1 1 1 1 1
>>>>>>4 2 1 1 NA NA NA
>>>>>>5 2 NA 1 NA NA NA
>>>>>>6 2 NA NA 1 NA 1
>>>>>>7 2 1 2 1 1 2
>>>>>>8 2 1 1 NA 1 NA
>>>>>>9 3 NA NA NA NA 1
>>>>>>10 2 NA NA NA NA 1
>>>>>>11 3 NA NA NA NA 1
>>>>>>12 2 NA 1 1 NA NA
>>>>>>13 3 1 2 2 NA 1
>>>>>>14 2 1 1 NA NA 1
>>>>>>15 2 NA NA NA 1 NA
>>>>>>16 2 NA NA NA NA 1
>>>>>>17 2 NA 1 1 NA 1
>>>>>>18 2 NA 1 1 NA 1
>>>>>>19 1 NA NA NA NA 1
>>>>>>20 2 1 1 1 1 1
>>>>>>21 2 1 1 1 1 1
>>>>>>22 3 NA 1 1 NA 1
>>>>>>23 3 NA NA 1 NA NA
>>>>>>24 3 1 NA NA NA 1
>>>>>>25 3 NA 1 NA NA NA
>>>>>>26 3 NA NA NA NA 1
>>>>>>27 2 NA 1 1 1 NA
>>>>>>28 3 NA NA 1 1 NA
>>>>>>29 2 NA NA 1 NA NA
>>>>>>30 2 NA NA 1 NA NA
>>>>>>31 2 NA NA 1 NA NA
>>>>>>32 2 1 NA NA 1 NA
>>>>>>33 2 1 NA 1 1 NA
>>>>>>34 3 1 NA NA 1 NA
>>>>>>35 2 1 NA NA NA NA
>>>>>>36 3 1 NA NA 1 NA
>>>>>>37 3 1 NA NA 1 NA
>>>>>>38 2 1 NA NA 1 NA
>>>>>>39 2 1 NA NA NA NA
>>>>>>40 2 1 NA NA NA NA
>>>>>>41 3 1 NA NA NA NA
>>>>>>42 2 1 NA NA NA NA
>>>>>>43 2 1 NA NA NA NA
>>>>>>44 2 1 NA NA NA NA
>>>>>>45 2 1 NA NA NA NA
>>>>>>46 2 1 NA NA NA NA
>>>>>>47 2 1 NA NA NA NA
>>>>>>48 3 1 NA NA NA NA
>>>>>>49 3 1 NA NA NA NA
>>>>>>50 3 1 NA NA NA NA
>>>>>>51 2 1 NA NA NA NA
>>>>>>52 2 1 NA NA NA NA
>>>>>>53 2 1 NA NA NA NA
>>>>>>54 2 1 NA NA NA NA
>>>>>>55 3 1 NA NA NA NA
>>>>>>56 2 1 NA NA NA NA
>>>>>>57 2 1 NA NA NA NA
>>>>>>58 3 1 NA NA NA NA
>>>>>>59 2 1 NA NA NA NA
>>>>>>60 2 1 NA NA NA NA
>>>>>>61 3 1 NA NA NA NA
>>>>>>62 3 1 NA NA NA NA
>>>>>>63 4 1 NA NA NA NA
>>>>>>64 2 1 NA NA NA NA
>>>>>>65 2 1 NA NA NA NA
>>>>>>66 3 1 NA NA NA NA
>>>>>>67 2 1 NA NA NA NA
>>>>>>68 5 1 NA NA NA NA
>>>>>>69 3 1 NA NA NA NA
>>>>>>70 2 1 NA NA NA NA
>>>>>>71 2 1 NA NA NA NA
>>>>>>72 3 2 NA NA NA NA
>>>>>>73 1 1 NA NA NA NA
>>>>>>74 2 1 NA NA NA NA
>>>>>>75 2 1 NA NA NA NA
>>>>>>76 2 1 NA NA NA NA
>>>>>>77 2 1 NA NA NA NA
>>>>>>78 2 1 NA NA NA NA
>>>>>>79 2 11 NA NA NA NA
>>>>>>80 2 1 NA NA NA NA
>>>>>>81 2 1 NA NA NA NA
>>>>>>82 2 1 NA NA NA NA
>>>>>>83 2 1 NA NA NA NA
>>>>>>84 2 1 NA NA NA NA
>>>>>>85 2 1 NA NA NA NA
>>>>>>86 2 1 NA NA NA NA
>>>>>>87 3 1 NA NA NA NA
>>>>>>88 2 1 NA NA NA NA
>>>>>>89 2 1 NA NA NA NA
>>>>>>90 5 1 NA NA NA NA
>>>>>>91 2 1 NA NA NA NA
>>>>>>92 3 1 NA NA NA NA
>>>>>>93 3 1 NA NA NA NA
>>>>>>94 2 1 NA NA NA NA
>>>>>>95 2 1 NA NA NA NA
>>>>>>96 2 1 NA NA NA NA
>>>>>>97 2 1 NA NA NA NA
>>>>>>98 2 1 NA NA NA NA
>>>>>>99 2 1 NA NA NA NA
>>>>>>100 2 1 NA NA NA NA
>>>>>>101 2 1 NA NA NA NA
>>>>>>102 3 1 NA NA NA NA
>>>>>>103 3 1 NA NA NA NA
>>>>>>104 4 1 NA NA NA NA
>>>>>>105 2 1 NA NA NA NA
>>>>>>106 2 1 NA NA NA NA
>>>>>>107 2 1 NA NA NA NA
>>>>>>108 3 1 NA NA NA NA
>>>>>>109 2 1 NA NA NA NA
>>>>>>110 3 1 NA NA NA NA
>>>>>>111 2 1 NA NA NA NA
>>>>>>
>>>>>>________________________________
>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>Sent: Monday, February 25, 2013 8:56 AM
>>>>>>Subject: Re: reading data
>>>>>>
>>>>>>
>>>>>>You're correct, but my real data have about 40000 rows, and there can be duplicated rows. I group the spec values when rows have the same Seq, Mod and z.
>>>>>>
>>>>>>For the attached data, if I run this code (only for c and t),
>>>>>>
>>>>>>c1 <- read.table("C:/Users/Vera Costa/Desktop/data.new/c1/MSMS_23PepInfo.txt",header=TRUE, sep = "\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>>>c2 <- read.table("C:/Users/Vera Costa/Desktop/data.new/c2/MSMS_23PepInfo.txt",header=TRUE, sep = "\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>>>c3 <- read.table("C:/Users/Vera Costa/Desktop/data.new/c3/MSMS_23PepInfo.txt",header=TRUE, sep = "\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>>>t1 <- read.table("C:/Users/Vera Costa/Desktop/data.new/t1/MSMS_23PepInfo.txt",header=TRUE, sep = "\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>>>t2 <- read.table("C:/Users/Vera Costa/Desktop/data.new/t2/MSMS_23PepInfo.txt",header=TRUE, sep = "\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>>>dc1<-c1[ifelse(c1$FDR<0.01, TRUE, FALSE),]
>>>>>>dc2<-c2[ifelse(c2$FDR<0.01, TRUE, FALSE),]
>>>>>>dc3<-c3[ifelse(c3$FDR<0.01, TRUE, FALSE),]
>>>>>>dt1<-t1[ifelse(t1$FDR<0.01, TRUE, FALSE),]
>>>>>>dt2<-t2[ifelse(t2$FDR<0.01, TRUE, FALSE),]
>>>>>>bc1<- aggregate(spec ~ Seq + Mod+z, data = dc1, paste, collapse = ",")
>>>>>>bc2<- aggregate(spec ~ Seq + Mod+z, data = dc2, paste, collapse = ",")
>>>>>>bc3<- aggregate(spec ~ Seq + Mod+z, data = dc3, paste, collapse = ",")
>>>>>>bt1<- aggregate(spec ~ Seq + Mod+z, data = dt1, paste, collapse = ",")
>>>>>>bt2<- aggregate(spec ~ Seq + Mod+z, data = dt2, paste, collapse = ",")
>>>>>>bc1$counts <- sapply(bc1$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>bc2$counts <- sapply(bc2$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>bc3$counts <- sapply(bc3$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>bt1$counts <- sapply(bt1$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>bt2$counts <- sapply(bt2$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>bc1<-bc1[,-4]
>>>>>>bc2<-bc2[,-4]
>>>>>>bc3<-bc3[,-4]
>>>>>>bt1<-bt1[,-4]
>>>>>>bt2<-bt2[,-4]
>>>>>>a1<-merge(bc1,bc2,by=c("Seq","Mod","z"),all=TRUE)
>>>>>>a2<-merge(a1,bc3,by=c("Seq","Mod","z"),all=TRUE)
>>>>>>a3<-merge(bt1,bt2,by=c("Seq","Mod","z"),all=TRUE)
>>>>>>aa<-merge(a2,a3,by=c("Seq","Mod","z"),all=TRUE)
>>>>>>aa
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>I get this output:
>>>>>>
>>>>>>
>>>>>> Seq Mod z counts.x.x counts.y.x counts.x.y counts.y.y
>>>>>>1 aAAAAAAAAAGAAGGR 1-n_acPro/ 2 NA 1 1 1
>>>>>>2 aAAAAAAAGAAGGRGSGPGRR 1-n_acPro/ 2 1 NA NA NA
>>>>>>3 aAAAAAGAGPEMVR 1-n_acPro/ 2 NA 2 1 2
>>>>>>4 aAAAAATAAAAASIR 1-n_acPro/ 2 1 NA 1 NA
>>>>>>5 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/ 2 NA 1 NA NA
>>>>>>6 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/ 3 1 2 NA 1
>>>>>>7 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 2 NA NA NA 1
>>>>>>8 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 3 NA NA NA 1
>>>>>>9 AAAAAPGTAEK 2 1 1 NA NA
>>>>>>10 aAAAELSLLEK 1-n_acPro/ 1 NA NA NA 1
>>>>>>11 aAAAELSLLEK 1-n_acPro/ 2 1 NA NA NA
>>>>>>12 AAAAEVLGLILR 2 NA 1 NA 1
>>>>>>13 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-n_acPro/ 3 1 NA NA 1
>>>>>>14 aAAAVVVPAEWIK 1-n_acPro/ 2 1 NA 1 NA
>>>>>>15 aAADGDDSLYPIAVLIDELR 1-n_acPro/ 2 1 NA 1 NA
>>>>>>16 aAADGDDSLYPIAVLIDELR 1-n_acPro/ 3 NA NA 1 NA
>>>>>>17 AAADLMAYCEAHAK 2 1 NA NA NA
>>>>>>18 AAADLMAYCEAHAK 3 1 NA 1 NA
>>>>>>19 aAAEAANCIMEVSCGQAESSEKPNAEDMTSK 1-n_acPro/ 3 1 NA 1 NA
>>>>>>20 AAAEIYEEFLAAFEGSDGNK 2 1 NA 1 NA
>>>>>>21 AAAIGIDLGTTYSCVGVFQHGK 2 1 NA NA NA
>>>>>>22 AAAIGIDLGTTYSCVGVFQHGK 3 1 NA NA NA
>>>>>>23 AAALATVNAWAEQTGMK 2 1 NA NA NA
>>>>>>24 AAAPAPEEEMDECEQALAAEPK 2 1 NA NA NA
>>>>>>25 AAAQLLQSQAQQSGAQQTK 2 1 NA NA NA
>>>>>>26 AAATPESQEPQAK 2 1 NA NA NA
>>>>>>27 aAAVAAAGAGEPQSPDELLPK 1-n_acPro/ 2 1 NA NA NA
>>>>>>28 AAAVVGInSETIMKPASISEEELLNLINK 8-N_Deamidation/ 3 1 NA NA NA
>>>>>>29 aADTQVSETLKR 1-n_acPro/ 2 1 NA NA NA
>>>>>>30 AAEDDEDDDVDTKK 2 1 NA NA NA
>>>>>>31 AAEEPSKVEEK 2 1 NA NA NA
>>>>>>32 AAEEPSKVEEK 3 1 NA NA NA
>>>>>>33 AAEGGLSSPEFSELCIWLGSQIK 2 1 NA NA NA
>>>>>>34 AAELIANSLATAGDGLIELR 2 1 NA NA NA
>>>>>>35 aAEPNKTEIQTLFK 1-n_acPro/ 2 1 NA NA NA
>>>>>>36 AAEQILEDMITIDVENVMEDICSK 3 1 NA NA NA
>>>>>>37 AAESLADPTEYENLFPGLK 2 1 NA NA NA
>>>>>>38 AAFDDAIAELDTLSEESYK 2 1 NA NA NA
>>>>>>39 AAFDDAIAELDTLSEESYK 3 1 NA NA NA
>>>>>>40 AAFECMYTLLDSCLDR 2 1 NA NA NA
>>>>>>41 AAGGGAGSSEDDAQSR 2 1 NA NA NA
>>>>>>42 AAGHPGDPESQQR 2 1 NA NA NA
>>>>>>43 AAGHPGDPESQQR 3 2 NA NA NA
>>>>>>44 AAGLATMISTMRPDIDNMDEYVR 2 1 NA NA NA
>>>>>>45 aAGTLYTYPENWR 1-n_acPro/ 2 1 NA NA NA
>>>>>>46 aAGTSSYWEDLRK 1-n_acPro/ 2 1 NA NA NA
>>>>>>47 aAGVEAAAEVAATEIK 1-n_acPro/ 2 11 NA NA NA
>>>>>>48 AAGVNVEPFWPGLFAK 2 1 NA NA NA
>>>>>>49 AAHLCAEAALR 2 1 NA NA NA
>>>>>>50 AALAGGTTMIIDHVVPEPGTSLLAAFDQWR 3 1 NA NA NA
>>>>>>51 AALCHFCIDMLNAK 2 1 NA NA NA
>>>>>>52 aALDSLSLFTSLGLSEQK 1-n_acPro/ 2 1 NA NA NA
>>>>>>53 AALEALGSCLNNK 2 1 NA NA NA
>>>>>>54 AALEAQNALHNMK 2 1 NA NA NA
>>>>>>55 AALETDENLLLCAPTGAGK 2 1 NA NA NA
>>>>>>56 aALGVLESDLPSAVTLLK 1-n_acPro/ 2 1 NA NA NA
>>>>>>57 AALPGILSELDVDVnEGSLMELQGHIGR 15-N_Deamidation/ 3 1 NA NA NA
>>>>>>58 AALPSHVVTMLDNFPTnLHPMSQLSAAVTALNSESNFAR 17-N_Deamidation/ 4 1 NA NA NA
>>>>>>59 AALPSHVVTMLDNFPTNLHPMSQLSAAVTALNSESNFAR 3 1 NA NA NA
>>>>>>60 aALTAEHFAALQSLLK 1-n_acPro/ 2 1 NA NA NA
>>>>>>61 AAMADTFLEHMCR 3 1 NA NA NA
>>>>>>62 AAMFTAGSNFNHVVQNEK 3 1 NA NA NA
>>>>>>63 aANATTNPSQLLPLELVDK 1-n_acPro/ 2 1 NA NA NA
>>>>>>64 aAAAAVGNAVPCGAR 1-n_acPro/ 2 NA 1 NA 1
>>>>>>65 AAAAAWEEPSSGNGTAR 2 NA 1 NA 1
>>>>>>66 aAAAGAAAAAAAEGEAPAEMGALLLEK 1-n_acPro/ 3 NA 1 NA 1
>>>>>>67 aAAAVGAGHGAGGPGAASSSGGAR 1-n_acPro/ 2 NA 1 1 NA
>>>>>>68 aAAAVGAGHGAGGPGAASSSGGAR 1-n_acPro/ 3 NA NA 1 NA
>>>>>>69 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 NA NA NA 1
>>>>>>70 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2 NA NA NA 1
>>>>>>71 AAAAAAALQAK 2 NA NA NA 1
>>>>>>72 aAAAASAPQQLSDEELFSQLR 1-n_acPro/ 2 NA NA NA 1
>>>>>>73 aAAANSGSSLPLFDCPTWAGKPPPGLHLDVVK 1-n_acPro/ 3 NA NA NA 1
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>2013/2/25 arun <smartpink111 at yahoo.com>
>>>>>>
>>>>>>Hi,
>>>>>>>What I said was: the `spec` column didn't change before and after the aggregate() step. I think you used aggregate() to group it by Seq, Mod and z. In the example you provided, it was already grouped; maybe it is not in your original dataset. Anyway, please email me the output you are getting from your code.
>>>>>>>
>>>>>>>Arun
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>________________________________
>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>Sent: Monday, February 25, 2013 5:36 AM
>>>>>>>
>>>>>>>Subject: Re: reading data
>>>>>>>
>>>>>>>
>>>>>>>Sorry, I don't understand what you said.
>>>>>>>
>>>>>>>I need to
>>>>>>>- read the data (like the code that you wrote)
>>>>>>>- select only rows with FDR<0.01 in all files
>>>>>>>- remove the first file of each group (a1,c1,t1,...)
>>>>>>>- select only the columns Seq, Mod, z, spec in all files
>>>>>>>- for each remaining file, merge rows with the same Seq, Mod and z (grouping the spec values)
>>>>>>>- tabulate the frequencies of spec like:
>>>>>>>
>>>>>>> seq c2 c3 c4 t1 ....
>>>>>>> aaaaA 0 2 5 6 (each entry is the total number of spec values)
>>>>>>>
>>>>>>>
>>>>>>>I think my small code isn't correct...
>>>>>>>
>>>>>>>Thank you
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>2013/2/23 arun <smartpink111 at yahoo.com>
>>>>>>>
>>>>>>>One more thing:
>>>>>>>>The last column 'spec' in the output is already aggregated based on `Seq`, `Mod`, `z` in the data.new directory.
>>>>>>>> res5[[3]][[1]]
>>>>>>>>
>>>>>>>> Seq Mod z spec
>>>>>>>>1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2 11833
>>>>>>>>2 aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2 11833
>>>>>>>>3 aAAAAAAAAAGAAGGR 1-n_acPro/ 2 13103
>>>>>>>>4 AAAAAAALQAK 2 3084
>>>>>>>>5 aAAAAAGAGPEMVR 1-n_acPro/ 2 9646,9821 #################check here
>>>>>>>>
>>>>>>>>6 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 2 33650
>>>>>>>>7 aAAAAEQQQFYLLLGNLLSPDNVVR 1-<_Carbamoylation/ 3 33607
>>>>>>>>9 aAAAAEQQQFYLLLGNLLSPDNVVR 1-n_acPro/ 3 33769
>>>>>>>>11 aAAAASAPQQLSDEELFSQLR 1-n_acPro/ 2 20602
>>>>>>>>12 aAAAAVGNAVPCGAR 1-n_acPro/ 2 10018
>>>>>>>>13 AAAAAWEEPSSGNGTAR 2 5576
>>>>>>>>14 aAAAELSLLEK 1-n_acPro/ 1 19662
>>>>>>>>16 AAAAEVLGLILR 2 22857
>>>>>>>>17 aAAAGAAAAAAAEGEAPAEMGALLLEK 1-n_acPro/ 3 26060
>>>>>>>>18 aAAAGGGGPGTAVGATGSGIAAAAAGLAVYR 1-n_acPro/ 3 21479
>>>>>>>>19 aAAANSGSSLPLFDCPTWAGKPPPGLHLDVVK 1-n_acPro/ 3 21159
>>>>>>>>
>>>>>>>>aggregate() doesn't change anything here, especially in this dataset.
>>>>>>>>In the next line you used sapply(....., ), which gives an output,
>>>>>>>>sapply(res6[[3]][[1]]$spec,function(x) length(gsub("\\s","",unlist(strsplit(x,","))))) # this I believe is not correct
>>>>>>>># 11833 11833 13103 3084 9646,9821 33650 33607 33769 #here you have two `11833` and one `9646,9821`. Not really sure what you want here
>>>>>>>> # 1 1 1 1 2 1 1 1
>>>>>>>> # 20602 10018 5576 19662 22857 26060 21479 21159
>>>>>>>> # 1 1 1 1 1 1 1 1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>If it is:
>>>>>>>> table(unlist(strsplit(res6[[3]][[1]]$spec,","))) #this makes sense
>>>>>>>>
>>>>>>>>#10018 11833 13103 19662 20602 21159 21479 22857 26060 3084 33607 33650 33769
>>>>>>>> # 1 2 1 1 1 1 1 1 1 1 1 1 1
>>>>>>>># 5576 9646 9821
>>>>>>>> # 1 1 1
>>>>>>>>
>>>>>>>>Now coming to the last `merge` section:
>>>>>>>>do you want to merge the counts in each group by "spec" name (in this case "Var1")?
>>>>>>>>
>>>>>>>>$group_c
>>>>>>>>$group_c$c2
>>>>>>>> Var1 Freq
>>>>>>>>1 10039 1
>>>>>>>>2 13200 1
>>>>>>>>3 22929 1
>>>>>>>>4 26117 1
>>>>>>>>5 33712 1
>>>>>>>>6 33774 1
>>>>>>>>7 33867 1
>>>>>>>>8 379 1
>>>>>>>>9 4102 1
>>>>>>>>10 5664 1
>>>>>>>>11 9703 1
>>>>>>>>12 9876 1
>>>>>>>>
>>>>>>>>$group_c$c3
>>>>>>>> Var1 Freq
>>>>>>>>1 10325 1
>>>>>>>>2 21555 1
>>>>>>>>3 22994 1
>>>>>>>>4 26142 1
>>>>>>>>5 3341 1
>>>>>>>>6 33708 1
>>>>>>>>7 33870 1
>>>>>>>>8 34095 1
>>>>>>>>9 4397 1
>>>>>>>>10 4416 1
>>>>>>>>11 5960 1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>A.K.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>________________________________
>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>Sent: Friday, February 22, 2013 8:36 PM
>>>>>>>>
>>>>>>>>Subject: Re: reading data
>>>>>>>>
>>>>>>>>
>>>>>>>>Oh, sorry.
>>>>>>>>I'm on my phone right now. I will send it tomorrow.
>>>>>>>>Thank you
>>>>>>>>On 22 Feb 2013 at 22:06, "arun" <smartpink111 at yahoo.com> wrote:
>>>>>>>>
>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>As I mentioned in my earlier post, seeing the results that you got from your code on the same dataset 'data.new' would be easier for me than figuring out how your code works.
>>>>>>>>>Thanks,
>>>>>>>>>A.K.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>________________________________
>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>Sent: Friday, February 22, 2013 1:13 PM
>>>>>>>>>Subject: Re: reading data
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Hi.
>>>>>>>>>
>>>>>>>>>I used your code and it was a great help. Thank you.
>>>>>>>>>
>>>>>>>>>I'm now doing a new analysis of the data, but I need to optimize my code.
>>>>>>>>>
>>>>>>>>>For the same data, I need to:
>>>>>>>>>
>>>>>>>>>- read the data (like the code that you wrote)
>>>>>>>>>- select only rows with FDR<0.01 in all files
>>>>>>>>>- remove the first file of each group (a1,c1,t1,...)
>>>>>>>>>- select only the columns Seq, Mod, z, spec in all files
>>>>>>>>>- for each remaining file, merge rows with the same Seq, Mod and z (grouping the spec values)
>>>>>>>>>- tabulate the frequencies of spec like:
>>>>>>>>> seq c2 c3 c4 t1 ....
>>>>>>>>> aaaaA 0 2 5 6 (each entry is the total number of spec values)
>>>>>>>>> .....
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>I started writing the code:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>spec <- function(directory,number,Pfdr=0.01) {
>>>>>>>>> setwd(directory)
>>>>>>>>> direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>>>> directT <- direct[grepl("^t", direct)]
>>>>>>>>> directC <- direct[grepl("^c", direct)]
>>>>>>>>>
>>>>>>>>> lista<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>> listaC<-lapply(directC, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>> listaT<-lapply(directT, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>
>>>>>>>>> #boxplots for each run
>>>>>>>>> dcf<-c()
>>>>>>>>> dtf<-c()
>>>>>>>>>
>>>>>>>>> for(i in 1:length(lista)){
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> for (i in 2:length(listaC)) {
>>>>>>>>> dcc1<-listaC[[i]][ifelse(listaC[[i]]$FDR<Pfdr, TRUE, FALSE),]
>>>>>>>>> dcc1<- aggregate(spec ~ Seq + Mod+z, data = dcc1, paste, collapse = ",")
>>>>>>>>> dcc1$counts <- sapply(dcc1$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>>>> dcc1<-dcc1[,-4]
>>>>>>>>> dcf<-list(dcf,dcc1)
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>> print(dcf)
>>>>>>>>>
>>>>>>>>>merg<-merge(dcf[[1]][[2]],dcf[[2]],by=c("Seq","Mod","z"),all=TRUE)
>>>>>>>>>print(merg)
>>>>>>>>> for (i in 2:length(listaT)) {
>>>>>>>>> dct1<-listaT[[i]][ifelse(listaT[[i]]$FDR<Pfdr, TRUE, FALSE),]
>>>>>>>>> dct1<- aggregate(spec ~ Seq + Mod+z, data = dct1, paste, collapse = ",")
>>>>>>>>> dct1$counts <- sapply(dct1$spec, function(x) length(gsub("\\s", "", unlist(strsplit(x, ",")))))
>>>>>>>>> dct1<-dct1[,-4]
>>>>>>>>> dtf<-list(dtf,dct1)
>>>>>>>>> }
>>>>>>>>>}
>>>>>>>>>spec("C:/Users/Vera Costa/Desktop/data.new",23)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>I can write the new code. The problem is that this line takes a long time to run:
>>>>>>>>>dcc1<- aggregate(spec ~ Seq + Mod+z, data = dcc1, paste, collapse = ",")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>I have nearly 40000 rows.
>>>>>>>>>
>>>>>>>>>Could you help me to optimize this?
>>>>>>>>>
>>>>>>>>>Thank you.
>>>>>>>>>Vera
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>2013/2/20 Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>
>>>>>>>>>Thank you very much.
>>>>>>>>>>
>>>>>>>>>>I will try.
>>>>>>>>>>
>>>>>>>>>>thank you
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>2013/2/20 arun <smartpink111 at yahoo.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>
>>>>>>>>>>>You can change `res4` to:
>>>>>>>>>>>lev<-sort(unique(do.call(c,lapply(seq_along(res3),function(i) do.call(c,lapply(res3[[i]],function(x) unique(x$z)))))))
>>>>>>>>>>>res4<-lapply(seq_along(res3),function(i) do.call(rbind,lapply(res3[[i]],function(x) as.data.frame(table(factor(x$z,levels=lev))))))
>>>>>>>>>>>
>>>>>>>>>>>freqs1<-do.call(rbind,lapply(split(freq.f1,gsub("\\d+","",freq.f1$id)),function(x) x[-1,])) #here there is only level for a1. So, it is removed
>>>>>>>>>>> average1<- colMeans(freqs1[,-1])
>>>>>>>>>>> average1
>>>>>>>>>>># 1 2 3
>>>>>>>>>>>#0.3333333 8.0000000 3.6666667
>>>>>>>>>>>pvalues1<-do.call(rbind,lapply(seq_len(nrow(freqs1)),function(x) chisq.test(freqs1[x,-1],average1)))
>>>>>>>>>>> row.names(pvalues1)<- row.names(freqs1)
>>>>>>>>>>> pvalues1
>>>>>>>>>>># [,1]
>>>>>>>>>>>#c.group_c.2 0.7235907
>>>>>>>>>>>#c.group_c.3 0.7963287
>>>>>>>>>>>#t 0.9079200
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>A.K.
>>>>>>>>>>>
>>>>>>>>>>>----- Original Message -----
>>>>>>>>>>>
>>>>>>>>>>>From: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>To: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>Cc: R help <r-help at r-project.org>
>>>>>>>>>>>Sent: Tuesday, February 19, 2013 7:29 PM
>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>Try this:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>files<-paste("MSMS_",23,"PepInfo.txt",sep="")
>>>>>>>>>>>read.data<-function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,sep = "\t",stringsAsFactors=FALSE,fill=TRUE))}
>>>>>>>>>>>lista<-do.call("c",lapply(list.files(recursive=T)[grep(files,list.files(recursive=T))],read.data))
>>>>>>>>>>>names(lista)<-paste("group_",gsub("\\d+","",names(lista)),sep="")
>>>>>>>>>>>res2<-split(lista,names(lista))
>>>>>>>>>>>res3<- lapply(res2,function(x) {names(x)<-paste(gsub(".*_","",names(x)),1:length(x),sep="");x})
>>>>>>>>>>>#Freq whole data
>>>>>>>>>>>res4<-lapply(seq_along(res3),function(i) do.call(rbind,lapply(res3[[i]],function(x) as.data.frame(table(factor(x$z,levels=1:3))))))
>>>>>>>>>>>names(res4)<- names(res2)
>>>>>>>>>>>library(reshape2)
>>>>>>>>>>>freq.i1<-do.call(rbind,lapply(res4,function(x) dcast(melt(data.frame(id=gsub("\\..*","",row.names(x)),x),id.var=c("id","Var1")),id~Var1,value.var="value")))
>>>>>>>>>>>freq.i1
>>>>>>>>>>># id 1 2 3
>>>>>>>>>>>#group_a a1 1 12 6
>>>>>>>>>>>#group_c.1 c1 0 10 3
>>>>>>>>>>>#group_c.2 c2 0 12 3
>>>>>>>>>>>#group_c.3 c3 0 13 4
>>>>>>>>>>>#group_t.1 t1 0 10 4
>>>>>>>>>>>#group_t.2 t2 1 12 6
>>>>>>>>>>>
>>>>>>>>>>>freq.rel.i1<- as.matrix(freq.i1[,-1]/rowSums(freq.i1[,-1]) )
>>>>>>>>>>> freq.rel.i1
>>>>>>>>>>> # 1 2 3
>>>>>>>>>>>#group_a 0.05263158 0.6315789 0.3157895
>>>>>>>>>>>#group_c.1 0.00000000 0.7692308 0.2307692
>>>>>>>>>>>#group_c.2 0.00000000 0.8000000 0.2000000
>>>>>>>>>>>#group_c.3 0.00000000 0.7647059 0.2352941
>>>>>>>>>>>#group_t.1 0.00000000 0.7142857 0.2857143
>>>>>>>>>>>#group_t.2 0.05263158 0.6315789 0.3157895
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>#Freq with FDR< 0.01
>>>>>>>>>>>res5<-lapply(seq_along(res3),function(i) do.call(rbind,lapply(res3[[i]],function(x) as.data.frame(table(factor(x$z[x[["FDR"]]<0.01],levels=1:3))))))
>>>>>>>>>>>names(res5)<- names(res2)
>>>>>>>>>>>
>>>>>>>>>>>freq.f1<- do.call(rbind,lapply(res5,function(x) dcast(melt(data.frame(id=gsub("\\..*","",row.names(x)),x),id.var=c("id","Var1")),id~Var1,value.var="value")))
>>>>>>>>>>>
>>>>>>>>>>> freq.f1
>>>>>>>>>>> # id 1 2 3
>>>>>>>>>>>#group_a a1 1 10 5
>>>>>>>>>>>#group_c.1 c1 0 7 2
>>>>>>>>>>>#group_c.2 c2 0 8 2
>>>>>>>>>>>#group_c.3 c3 0 6 4
>>>>>>>>>>>#group_t.1 t1 0 7 4
>>>>>>>>>>>#group_t.2 t2 1 10 5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>freq.rel.f1<- as.matrix(freq.f1[,-1]/rowSums(freq.f1[,-1]))
>>>>>>>>>>>
>>>>>>>>>>>colour<-sample(rainbow(nrow(freq.rel.i1)))
>>>>>>>>>>>par(mfrow=c(1,2))
>>>>>>>>>>>barplot(freq.rel.i1,beside=T,main=("Sample"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.i1))
>>>>>>>>>>>barplot(freq.rel.f1,beside=T,main=("Sample with FDR<0.01"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.f1))
>>>>>>>>>>>#change the legend position
>>>>>>>>>>>
>>>>>>>>>>>Also, I didn't check the rest of the code from the chi-square test onward.
>>>>>>>>>>>A.K.
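
Two idioms in the code above carry most of the weight: `table(factor(z, levels = 1:3))` keeps a zero count for charge states that a file never contains, and dividing by `rowSums` turns each sample's counts into relative frequencies. A small sketch, using a made-up charge vector `z` (an assumption for illustration only) and the a1 and c1 counts printed in `freq.i1`:

```r
# Made-up charge states for one file (illustrative only)
z <- c(2, 2, 3, 2)
counts <- table(factor(z, levels = 1:3))  # charge 1 is kept with count 0

# Counts for two samples, copied from freq.i1 above
freq <- rbind(a1 = c(1, 12, 6), c1 = c(0, 10, 3))
colnames(freq) <- 1:3

rel  <- freq / rowSums(freq)   # each row now sums to 1
rel2 <- prop.table(freq, 1)    # base R equivalent of the line above
```

Without the fixed `levels`, files that lack a charge state would produce tables of different widths, and `rbind` could not stack them column by column.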
>>>>>>>>>>>________________________________
>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>Sent: Tuesday, February 19, 2013 4:19 PM
>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Here is the code and some outputs.
>>>>>>>>>>>
>>>>>>>>>>>z.plot <- function(directory,number) {
>>>>>>>>>>> #reading data
>>>>>>>>>>> setwd(directory)
>>>>>>>>>>> direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>>>>>> directT <- direct[grepl("^t", direct)]
>>>>>>>>>>> directC <- direct[grepl("^c", direct)]
>>>>>>>>>>>
>>>>>>>>>>> lista<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>> listaC<-lapply(directC, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>> listaT<-lapply(directT, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>
>>>>>>>>>>> #count different z values
>>>>>>>>>>> cab <- vector()
>>>>>>>>>>> for (i in 1:length(lista)) {
>>>>>>>>>>> dc<-lista[[i]][ifelse(lista[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>> dc<-table(dc$z)
>>>>>>>>>>> cab <- c(cab, names(dc))
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> #Relative freqs to construct the graph
>>>>>>>>>>> cab <- unique(cab)
>>>>>>>>>>> print(cab)
>>>>>>>>>>>
>>>>>>>>>>>###[1] "2" "3" "1"
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> d <- matrix(ncol=length(cab))
>>>>>>>>>>> dci<- d[-1,]
>>>>>>>>>>> dcf <- d[-1,]
>>>>>>>>>>> dti <- d[-1,]
>>>>>>>>>>> dtf <- d[-1,]
>>>>>>>>>>>
>>>>>>>>>>> for (i in 1:length(listaC)) {
>>>>>>>>>>>
>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>> dcc<-listaC[[i]]
>>>>>>>>>>> dcc<-table(factor(dcc$z, levels=cab))
>>>>>>>>>>> dci<- rbind(dci, dcc)
>>>>>>>>>>> rownames(dci)<-rownames(1:(nrow(dci)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>> dcc1<-listaC[[i]][ifelse(listaC[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>> dcc1<-table(factor(dcc1$z, levels=cab))
>>>>>>>>>>> dcf<- rbind(dcf,dcc1)
>>>>>>>>>>> rownames(dcf)<-rownames(1:(nrow(dcf)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> for (i in 1:length(listaT)) {
>>>>>>>>>>>
>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>> dct<-listaT[[i]]
>>>>>>>>>>> dct<-table(factor(dct$z, levels=cab))
>>>>>>>>>>> dti<- rbind(dti, dct)
>>>>>>>>>>> rownames(dti)<-rownames(1:(nrow(dti)), do.NULL = FALSE, prefix = "t")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>> dct1<-listaT[[i]][ifelse(listaT[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>> dct1<-table(factor(dct1$z, levels=cab))
>>>>>>>>>>> dtf<- rbind(dtf,dct1)
>>>>>>>>>>> rownames(dtf)<-rownames(1:(nrow(dtf)), do.NULL = FALSE, prefix = "t")
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> freq.i<-rbind(dci,dti)
>>>>>>>>>>> freq.f<-rbind(dcf,dtf)
>>>>>>>>>>> freq.rel.i<-freq.i/apply(freq.i,1,sum)
>>>>>>>>>>> freq.rel.f<-freq.f/apply(freq.f,1,sum)
>>>>>>>>>>>
>>>>>>>>>>> print(freq.i)
>>>>>>>>>>>## 2 3 1
>>>>>>>>>>>#c1 10 3 0
>>>>>>>>>>>#c2 12 3 0
>>>>>>>>>>>#c3 13 4 0
>>>>>>>>>>>#t1 10 4 0
>>>>>>>>>>>#t2 12 6 1
>>>>>>>>>>>
>>>>>>>>>>> print(freq.f)
>>>>>>>>>>> ### 2 3 1
>>>>>>>>>>>#c1 7 2 0
>>>>>>>>>>>#c2 8 2 0
>>>>>>>>>>>#c3 6 4 0
>>>>>>>>>>>#t1 7 4 0
>>>>>>>>>>>#t2 10 5 1
>>>>>>>>>>>
>>>>>>>>>>> print(freq.rel.i)
>>>>>>>>>>>### 2 3 1
>>>>>>>>>>>#c1 0.7692308 0.2307692 0.00000000
>>>>>>>>>>>#c2 0.8000000 0.2000000 0.00000000
>>>>>>>>>>>#c3 0.7647059 0.2352941 0.00000000
>>>>>>>>>>>#t1 0.7142857 0.2857143 0.00000000
>>>>>>>>>>>#t2 0.6315789 0.3157895 0.05263158
>>>>>>>>>>> print(freq.rel.f)
>>>>>>>>>>>
>>>>>>>>>>>### 2 3 1
>>>>>>>>>>>#c1 0.7777778 0.2222222 0.0000
>>>>>>>>>>>#c2 0.8000000 0.2000000 0.0000
>>>>>>>>>>>#c3 0.6000000 0.4000000 0.0000
>>>>>>>>>>>#t1 0.6363636 0.3636364 0.0000
>>>>>>>>>>>#t2 0.6250000 0.3125000 0.0625
>>>>>>>>>>>
>>>>>>>>>>>#Graph plot
>>>>>>>>>>>colour<-sample(rainbow(nrow(freq.rel.i)))
>>>>>>>>>>>par(mfrow=c(1,2))
>>>>>>>>>>>barplot(freq.rel.i,beside=T,main=("Sample"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.i))
>>>>>>>>>>>barplot(freq.rel.f,beside=T,main=("Sample with FDR<0.01"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.f))
>>>>>>>>>>>
>>>>>>>>>>>#average of the group (except c1&t1)
>>>>>>>>>>>freqs<-rbind(dcf[-1,], dtf[-1,])
>>>>>>>>>>>average<-apply(freqs,2,mean)
>>>>>>>>>>>print(average)
>>>>>>>>>>>
>>>>>>>>>>>### 2 3 1
>>>>>>>>>>>#8.0000000 3.6666667 0.3333333
>>>>>>>>>>>
>>>>>>>>>>>#chisquare test function
>>>>>>>>>>>chisq.test<-function(x,y){
>>>>>>>>>>> somax<-sum(x)
>>>>>>>>>>> somay<-sum(y)
>>>>>>>>>>> nj.<-x+y
>>>>>>>>>>> nj<-sum(nj.)
>>>>>>>>>>> ejx<-(nj./nj)*somax
>>>>>>>>>>> ejy<-(nj./nj)*somay
>>>>>>>>>>> ETx<-((x-ejx)^2)/ejx
>>>>>>>>>>> ETy<-((y-ejy)^2)/ejy
>>>>>>>>>>> ETobs<-sum(ETx)+sum(ETy)
>>>>>>>>>>> pvalue<-1-pchisq(c(ETobs),df=length(x|y)-1,lower.tail=TRUE)
>>>>>>>>>>> return(pvalue)
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>#p-values of the chi-square test between each sample and the average (H0: the two samples have the same distribution)
>>>>>>>>>>>pvalues<-c()
>>>>>>>>>>>for (i in 1:(nrow(freqs))){
>>>>>>>>>>>a<-chisq.test(freqs[i,],average)
>>>>>>>>>>>pvalues<-c(pvalues,a)
>>>>>>>>>>>}
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>#data frame with final p-values
>>>>>>>>>>>dataframe<-data.frame(c(rownames(freqs)), c(pvalues))
>>>>>>>>>>>colnames(dataframe)<-c("sample name","pvalue")
>>>>>>>>>>>print(dataframe)
>>>>>>>>>>>
>>>>>>>>>>>### sample name pvalue
>>>>>>>>>>>#1 c2 0.7235907
>>>>>>>>>>>#2 c3 0.7963287
>>>>>>>>>>>#3 0.9079200
>>>>>>>>>>>}
>>>>>>>>>>>z.plot("C:/Users/Vera Costa/Desktop/dados",23)
>>>>>>>>>>>
>>>>>>>>>>>###and two barplots..
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Here, I removed the group a1.
>>>>>>>>>>>
>>>>>>>>>>>Thank you
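
The blank sample name in row 3 of the p-value table above is consistent with how R matrix indexing behaves: with only two t samples, `dtf[-1,]` leaves a single row, which R drops to a plain vector, so `rbind` has no row name to carry over. A minimal sketch using the FDR < 0.01 counts printed above; `drop = FALSE` is a suggested change, not part of the original script:

```r
# FDR < 0.01 counts as printed by print(freq.f) above
dcf <- rbind(c1 = c(7, 2, 0), c2 = c(8, 2, 0), c3 = c(6, 4, 0))
dtf <- rbind(t1 = c(7, 4, 0), t2 = c(10, 5, 1))
colnames(dcf) <- colnames(dtf) <- c("2", "3", "1")

# As in the script: dtf[-1, ] collapses to a vector and loses its row name
lost <- rbind(dcf[-1, ], dtf[-1, ])

# With drop = FALSE the single remaining row stays a 1-row matrix
kept <- rbind(dcf[-1, , drop = FALSE], dtf[-1, , drop = FALSE])

rownames(lost)  # third name is empty
rownames(kept)  # "c2" "c3" "t2"
```

The problem does not show up in the earlier run with four t samples, because three rows remain after removing t1.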
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>2013/2/19 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>Could you send the results for the folder that was sent to me? It will be easier for me.
>>>>>>>>>>>>
>>>>>>>>>>>>Arun
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>________________________________
>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>Sent: Tuesday, February 19, 2013 3:47 PM
>>>>>>>>>>>>
>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>Oh sorry, I changed the folder.
>>>>>>>>>>>>
>>>>>>>>>>>>I will send the results for your folder.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>2013/2/19 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>
>>>>>>>>>>>>Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding the results, are they from the same folder that you sent me?
>>>>>>>>>>>>>I am getting different results when I run your steps.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>direct<- list.files(recursive=TRUE)
>>>>>>>>>>>>> direct
>>>>>>>>>>>>>#[1] "a1/MSMS_23PepInfo.txt" "c1/MSMS_23PepInfo.txt" "c2/MSMS_23PepInfo.txt"
>>>>>>>>>>>>>#[4] "c3/MSMS_23PepInfo.txt" "t1/MSMS_23PepInfo.txt" "t2/MSMS_23PepInfo.txt"
>>>>>>>>>>>>>
>>>>>>>>>>>>> directT<- list.files(recursive=TRUE)[grepl("^t",dir())]
>>>>>>>>>>>>>
>>>>>>>>>>>>>directT
>>>>>>>>>>>>>#[1] "t1/MSMS_23PepInfo.txt" "t2/MSMS_23PepInfo.txt"
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>directC<- list.files(recursive=TRUE)[grepl("^c",dir())]
>>>>>>>>>>>>>
>>>>>>>>>>>>>directC
>>>>>>>>>>>>>#[1] "c1/MSMS_23PepInfo.txt" "c2/MSMS_23PepInfo.txt" "c3/MSMS_23PepInfo.txt"
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>lista<- lapply(direct,function(x) read.table(x,header=TRUE,stringsAsFactors=FALSE,sep="\t",fill=TRUE))
>>>>>>>>>>>>>
>>>>>>>>>>>>>listaT<-lapply(directT, function(x) read.table(x,header=TRUE, sep = "\t",fill=TRUE))
>>>>>>>>>>>>>listaC<-lapply(directC, function(x) read.table(x,header=TRUE, sep = "\t",fill=TRUE))
>>>>>>>>>>>>>
>>>>>>>>>>>>> #count different z values
>>>>>>>>>>>>> cab <- vector()
>>>>>>>>>>>>> for (i in 1:length(lista)) {
>>>>>>>>>>>>> dc<-lista[[i]][ifelse(lista[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>> dc<-table(dc$z)
>>>>>>>>>>>>> cab <- c(cab, names(dc))
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freqs to construct the graph
>>>>>>>>>>>>> cab <- unique(cab)
>>>>>>>>>>>>> print(cab)
>>>>>>>>>>>>>
>>>>>>>>>>>>>#[1] "1" "2" "3" #Here the results do not match yours
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>d <- matrix(ncol=length(cab))
>>>>>>>>>>>>> dci<- d[-1,]
>>>>>>>>>>>>> dcf <- d[-1,]
>>>>>>>>>>>>> dti <- d[-1,]
>>>>>>>>>>>>> dtf <- d[-1,]
>>>>>>>>>>>>>
>>>>>>>>>>>>> for (i in 1:length(listaC)) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>> dcc<-listaC[[i]]
>>>>>>>>>>>>> dcc<-table(factor(dcc$z, levels=cab))
>>>>>>>>>>>>> dci<- rbind(dci, dcc)
>>>>>>>>>>>>> rownames(dci)<-rownames(1:(nrow(dci)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>> dcc1<-listaC[[i]][ifelse(listaC[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>> dcc1<-table(factor(dcc1$z, levels=cab))
>>>>>>>>>>>>> dcf<- rbind(dcf,dcc1)
>>>>>>>>>>>>> rownames(dcf)<-rownames(1:(nrow(dcf)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>>> }
>>>>>>>>>>>>> print(dci) #here too.
>>>>>>>>>>>>>
>>>>>>>>>>>>># 1 2 3
>>>>>>>>>>>>>#c1 0 10 3
>>>>>>>>>>>>>#c2 0 12 3
>>>>>>>>>>>>>#c3 0 13 4
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>It is important to clear this up before I make any changes to the script. Please send me the output for the same data folder so that I can understand what is going on.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Arun
>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>Sent: Tuesday, February 19, 2013 9:24 AM
>>>>>>>>>>>>>
>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Ok.
>>>>>>>>>>>>>
>>>>>>>>>>>>>Here is the code and some outputs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>z.plot <- function(directory,number) {
>>>>>>>>>>>>> #reading data
>>>>>>>>>>>>> setwd(directory)
>>>>>>>>>>>>> direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>>>>>>>> directT <- direct[grepl("^t", direct)]
>>>>>>>>>>>>> directC <- direct[grepl("^c", direct)]
>>>>>>>>>>>>>
>>>>>>>>>>>>> lista<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>>> listaC<-lapply(directC, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>>> listaT<-lapply(directT, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>>>
>>>>>>>>>>>>> #count different z values
>>>>>>>>>>>>> cab <- vector()
>>>>>>>>>>>>> for (i in 1:length(lista)) {
>>>>>>>>>>>>> dc<-lista[[i]][ifelse(lista[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>> dc<-table(dc$z)
>>>>>>>>>>>>> cab <- c(cab, names(dc))
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freqs to construct the graph
>>>>>>>>>>>>> cab <- unique(cab)
>>>>>>>>>>>>> print(cab)
>>>>>>>>>>>>>
>>>>>>>>>>>>>###[1] "1" "2" "3" "4" "5"
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> d <- matrix(ncol=length(cab))
>>>>>>>>>>>>> dci<- d[-1,]
>>>>>>>>>>>>> dcf <- d[-1,]
>>>>>>>>>>>>> dti <- d[-1,]
>>>>>>>>>>>>> dtf <- d[-1,]
>>>>>>>>>>>>>
>>>>>>>>>>>>> for (i in 1:length(listaC)) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>> dcc<-listaC[[i]]
>>>>>>>>>>>>> dcc<-table(factor(dcc$z, levels=cab))
>>>>>>>>>>>>> dci<- rbind(dci, dcc)
>>>>>>>>>>>>> rownames(dci)<-rownames(1:(nrow(dci)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>> dcc1<-listaC[[i]][ifelse(listaC[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>> dcc1<-table(factor(dcc1$z, levels=cab))
>>>>>>>>>>>>> dcf<- rbind(dcf,dcc1)
>>>>>>>>>>>>> rownames(dcf)<-rownames(1:(nrow(dcf)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>>> }
>>>>>>>>>>>>> print(dci)
>>>>>>>>>>>>>
>>>>>>>>>>>>>### 1 2 3 4 5
>>>>>>>>>>>>>#c1 93 8356 3621 450 55
>>>>>>>>>>>>>#c2 108 13513 6859 793 73
>>>>>>>>>>>>>#c3 97 13526 6724 739 82
>>>>>>>>>>>>>#c4 101 13417 6574 761 62
>>>>>>>>>>>>>
>>>>>>>>>>>>> print(dcf)
>>>>>>>>>>>>>
>>>>>>>>>>>>>### 1 2 3 4 5
>>>>>>>>>>>>>#c1 10 4576 2100 199 17
>>>>>>>>>>>>>#c2 7 7831 4039 314 23
>>>>>>>>>>>>>#c3 16 7887 4087 286 22
>>>>>>>>>>>>>#c4 20 7824 4045 311 20
>>>>>>>>>>>>>
>>>>>>>>>>>>> for (i in 1:length(listaT)) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>> dct<-listaT[[i]]
>>>>>>>>>>>>> dct<-table(factor(dct$z, levels=cab))
>>>>>>>>>>>>> dti<- rbind(dti, dct)
>>>>>>>>>>>>> rownames(dti)<-rownames(1:(nrow(dti)), do.NULL = FALSE, prefix = "t")
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>> dct1<-listaT[[i]][ifelse(listaT[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>> dct1<-table(factor(dct1$z, levels=cab))
>>>>>>>>>>>>> dtf<- rbind(dtf,dct1)
>>>>>>>>>>>>> rownames(dtf)<-rownames(1:(nrow(dtf)), do.NULL = FALSE, prefix = "t")
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> print(dti)
>>>>>>>>>>>>>
>>>>>>>>>>>>>### 1 2 3 4 5
>>>>>>>>>>>>>#t1 32 8640 4098 429 36
>>>>>>>>>>>>>#t2 128 13209 6723 788 75
>>>>>>>>>>>>>#t3 85 13043 6691 754 82
>>>>>>>>>>>>>#t4 139 13750 7036 807 84
>>>>>>>>>>>>>
>>>>>>>>>>>>> print(dtf)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>#### 1 2 3 4 5
>>>>>>>>>>>>>#t1 5 4885 2571 196 8
>>>>>>>>>>>>>#t2 12 7752 4209 360 28
>>>>>>>>>>>>>#t3 19 7563 4086 336 18
>>>>>>>>>>>>>#t4 14 8108 4218 312 26
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> freq.i<-rbind(dci,dti)
>>>>>>>>>>>>> freq.f<-rbind(dcf,dtf)
>>>>>>>>>>>>> freq.rel.i<-freq.i/apply(freq.i,1,sum)
>>>>>>>>>>>>> freq.rel.f<-freq.f/apply(freq.f,1,sum)
>>>>>>>>>>>>> print(freq.i)
>>>>>>>>>>>>>## 1 2 3 4 5
>>>>>>>>>>>>>#c1 93 8356 3621 450 55
>>>>>>>>>>>>>#c2 108 13513 6859 793 73
>>>>>>>>>>>>>#c3 97 13526 6724 739 82
>>>>>>>>>>>>>#c4 101 13417 6574 761 62
>>>>>>>>>>>>>#t1 32 8640 4098 429 36
>>>>>>>>>>>>>#t2 128 13209 6723 788 75
>>>>>>>>>>>>>#t3 85 13043 6691 754 82
>>>>>>>>>>>>>#t4 139 13750 7036 807 84
>>>>>>>>>>>>>
>>>>>>>>>>>>> print(freq.f)
>>>>>>>>>>>>> ### 1 2 3 4 5
>>>>>>>>>>>>>#c1 10 4576 2100 199 17
>>>>>>>>>>>>>#c2 7 7831 4039 314 23
>>>>>>>>>>>>>#c3 16 7887 4087 286 22
>>>>>>>>>>>>>#c4 20 7824 4045 311 20
>>>>>>>>>>>>>#t1 5 4885 2571 196 8
>>>>>>>>>>>>>#t2 12 7752 4209 360 28
>>>>>>>>>>>>>#t3 19 7563 4086 336 18
>>>>>>>>>>>>>#t4 14 8108 4218 312 26
>>>>>>>>>>>>>
>>>>>>>>>>>>> print(freq.rel.i)
>>>>>>>>>>>>>### 1 2 3 4 5
>>>>>>>>>>>>>#c1 0.007395626 0.6644930 0.2879523 0.03578529 0.004373757
>>>>>>>>>>>>>#c2 0.005059496 0.6330460 0.3213248 0.03714982 0.003419844
>>>>>>>>>>>>>#c3 0.004582389 0.6389834 0.3176493 0.03491119 0.003873772
>>>>>>>>>>>>>#c4 0.004829070 0.6415013 0.3143199 0.03638537 0.002964380
>>>>>>>>>>>>>#t1 0.002417832 0.6528145 0.3096335 0.03241405 0.002720060
>>>>>>>>>>>>>#t2 0.006117670 0.6313148 0.3213210 0.03766190 0.003584572
>>>>>>>>>>>>>#t3 0.004115226 0.6314694 0.3239409 0.03650448 0.003969983
>>>>>>>>>>>>>#t4 0.006371470 0.6302714 0.3225156 0.03699120 0.003850385
>>>>>>>>>>>>> print(freq.rel.f)
>>>>>>>>>>>>>
>>>>>>>>>>>>>### 1 2 3 4 5
>>>>>>>>>>>>>#c1 0.0014488554 0.6629962 0.3042596 0.02883222 0.002463054
>>>>>>>>>>>>>#c2 0.0005731128 0.6411495 0.3306861 0.02570820 0.001883085
>>>>>>>>>>>>>#c3 0.0013010246 0.6413238 0.3323305 0.02325581 0.001788909
>>>>>>>>>>>>>#c4 0.0016366612 0.6402619 0.3310147 0.02545008 0.001636661
>>>>>>>>>>>>>#t1 0.0006523157 0.6373125 0.3354207 0.02557078 0.001043705
>>>>>>>>>>>>>#t2 0.0009707952 0.6271337 0.3405064 0.02912386 0.002265189
>>>>>>>>>>>>>#t3 0.0015804359 0.6290967 0.3398769 0.02794876 0.001497255
>>>>>>>>>>>>>#t4 0.0011042751 0.6395330 0.3327023 0.02460956 0.002050797
>>>>>>>>>>>>>
>>>>>>>>>>>>>#Graph plot
>>>>>>>>>>>>>colour<-sample(rainbow(nrow(freq.rel.i)))
>>>>>>>>>>>>>par(mfrow=c(1,2))
>>>>>>>>>>>>>barplot(freq.rel.i,beside=T,main=("Sample"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.i))
>>>>>>>>>>>>>barplot(freq.rel.f,beside=T,main=("Sample with FDR<0.01"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.f))
>>>>>>>>>>>>>
>>>>>>>>>>>>>#average of the group (except c1&t1)
>>>>>>>>>>>>>freqs<-rbind(dcf[-1,], dtf[-1,])
>>>>>>>>>>>>>average<-apply(freqs,2,mean)
>>>>>>>>>>>>>print(average)
>>>>>>>>>>>>>
>>>>>>>>>>>>>### 1 2 3 4 5
>>>>>>>>>>>>> # 14.66667 7827.50000 4114.00000 319.83333 22.83333
>>>>>>>>>>>>>
>>>>>>>>>>>>>#chisquare test function
>>>>>>>>>>>>>chisq.test<-function(x,y){
>>>>>>>>>>>>> somax<-sum(x)
>>>>>>>>>>>>> somay<-sum(y)
>>>>>>>>>>>>> nj.<-x+y
>>>>>>>>>>>>> nj<-sum(nj.)
>>>>>>>>>>>>> ejx<-(nj./nj)*somax
>>>>>>>>>>>>> ejy<-(nj./nj)*somay
>>>>>>>>>>>>> ETx<-((x-ejx)^2)/ejx
>>>>>>>>>>>>> ETy<-((y-ejy)^2)/ejy
>>>>>>>>>>>>> ETobs<-sum(ETx)+sum(ETy)
>>>>>>>>>>>>> pvalue<-1-pchisq(c(ETobs),df=length(x|y)-1,lower.tail=TRUE)
>>>>>>>>>>>>> return(pvalue)
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>#p-values of the chi-square test between each sample and the average (H0: the two samples have the same distribution)
>>>>>>>>>>>>>pvalues<-c()
>>>>>>>>>>>>>for (i in 1:(nrow(freqs))){
>>>>>>>>>>>>>a<-chisq.test(freqs[i,],average)
>>>>>>>>>>>>>pvalues<-c(pvalues,a)
>>>>>>>>>>>>>}
>>>>>>>>>>>>>print(pvalues)
>>>>>>>>>>>>>##[1] 0.5307206 0.6849480 0.8332661 0.3474956 0.5546527 0.9387602
>>>>>>>>>>>>>
>>>>>>>>>>>>>#data frame with final p-values
>>>>>>>>>>>>>dataframe<-data.frame(c(rownames(freqs)), c(pvalues))
>>>>>>>>>>>>>colnames(dataframe)<-c("sample name","pvalue")
>>>>>>>>>>>>>print(dataframe)
>>>>>>>>>>>>>
>>>>>>>>>>>>>### sample name pvalue
>>>>>>>>>>>>>#1 c2 0.5307206
>>>>>>>>>>>>>#2 c3 0.6849480
>>>>>>>>>>>>>#3 c4 0.8332661
>>>>>>>>>>>>>#4 t2 0.3474956
>>>>>>>>>>>>>#5 t3 0.5546527
>>>>>>>>>>>>>#6 t4 0.9387602
>>>>>>>>>>>>>}
>>>>>>>>>>>>>z.plot("C:/Users/Vera Costa/Desktop/dados",23)
>>>>>>>>>>>>>
>>>>>>>>>>>>>###and two barplots...
>>>>>>>>>>>>>
>>>>>>>>>>>>>Thank you
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>2013/2/19 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Got it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>So, if I run the code that you sent yesterday, will I get the correct results for relative frequency etc.? It would also be great if you could send me the output generated by your code (on two groups, as you showed yesterday). It will help me check the results much faster than running your code to see whether that is the result (because I have to make some adjustments to your code to run it on Linux, especially the ?dir() call).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>I may only be able to run it later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Arun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>Sent: Tuesday, February 19, 2013 8:53 AM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>I sent it in the second email.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>But I am sending it again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>2013/2/19 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Your attachment didn't come through.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Arun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>Sent: Tuesday, February 19, 2013 8:47 AM
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Sorry for asking so many questions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>I attach a small part of my real data (I have a lot of rows).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>My main objective is to construct two graphs. The first shows the relative frequencies of each sample (c1, c2, c3, ...). The second shows the same frequencies but only for rows with FDR<0.01.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>After that I need to compute the average in each group (but without the first sample of each group: c1, t1, a1, ...) and do the chi-square test to see whether the groups have the same distribution. Do you understand?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>At first I had only two groups, and I wrote the code that I sent you. But I need general code, not for two groups whose names I know, but for all groups (sometimes I can have 7, 8 or 9 groups).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Is my explanation better now? :-)
>>>>>>>>>>>>>>>My English isn't very good either :-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Please do not publish this data on the forum...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Thank you
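
A general version does not need the group names in advance: they can be derived from the sample directory names by stripping the trailing digits, much as the `gsub("\\d+", ...)` step in the later replies does. A minimal sketch, assuming each file sits in a directory named with a letter prefix plus a number; the paths are the ones listed elsewhere in this thread, and nothing is read from disk:

```r
paths <- c("a1/MSMS_23PepInfo.txt", "c1/MSMS_23PepInfo.txt",
           "c2/MSMS_23PepInfo.txt", "c3/MSMS_23PepInfo.txt",
           "t1/MSMS_23PepInfo.txt", "t2/MSMS_23PepInfo.txt")

sample_id <- dirname(paths)                  # "a1" "c1" "c2" "c3" "t1" "t2"
group     <- sub("[0-9]+$", "", sample_id)   # "a"  "c"  "c"  "c"  "t"  "t"
by_group  <- split(sample_id, group)         # one element per group, however many there are

# Drop the first sample of every group before averaging, as described above
without_first <- lapply(by_group, function(s) s[-1])
```

A group with a single sample, such as a here, ends up empty after this step, so code that averages the remaining samples has to allow for that case.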
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>2013/2/18 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>I ran the code to understand what was going on.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>I didn't fully understand it, as you wrote the code for your original dataset and not for the 'data' directory you sent to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>A.K.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>Sent: Monday, February 18, 2013 4:02 PM
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Thank you.
>>>>>>>>>>>>>>>>I don't need exactly the same, just something equivalent. I will try your suggestions.
>>>>>>>>>>>>>>>>Thank you.
>>>>>>>>>>>>>>>>On 18 Feb 2013 at 19:41, "arun" <smartpink111 at yahoo.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>>>>>I am not able to open your graph. I am using linux.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>Also, the code in the function is not reproducible:
>>>>>>>>>>>>>>>>> directT <- direct[grepl("^t", direct)]
>>>>>>>>>>>>>>>>> directC <- direct[grepl("^c", direct)]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>It takes twice as long to work out what is going on.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>dir()
>>>>>>>>>>>>>>>>>#[1] "a1" "a2" "a3" "b1" "b2" "c1"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>direct<- list.files(recursive=TRUE)[grepl("^a|^b",dir())]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> direct
>>>>>>>>>>>>>>>>>#[1] "MSMS_23PepInfo.txt" "MSMS_23PepInfo.txt" "MSMS_23PepInfo.txt"
>>>>>>>>>>>>>>>>>#[4] "MSMS_23PepInfo.txt" "MSMS_23PepInfo.txt"
>>>>>>>>>>>>>>>>>directA<- list.files(recursive=TRUE)[grepl("^a",dir())]
>>>>>>>>>>>>>>>>>directB<- list.files(recursive=TRUE)[grepl("^b",dir())]
>>>>>>>>>>>>>>>>>lista<- lapply(direct,function(x) read.table(x,header=TRUE,stringsAsFactors=FALSE,sep="\t",fill=TRUE))
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>listaA<-lapply(directA, function(x) read.table(x,header=TRUE, sep = "\t",fill=TRUE))
>>>>>>>>>>>>>>>>>listaB<-lapply(directB, function(x) read.table(x,header=TRUE, sep = "\t",fill=TRUE))
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#here I am changing the names listaT, z, etc..
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#count different mm values
>>>>>>>>>>>>>>>>> cab <- vector()
>>>>>>>>>>>>>>>>> for (i in 1:length(lista)) {
>>>>>>>>>>>>>>>>> dc<-lista[[i]][ifelse(lista[[i]]$b<0.01, TRUE, FALSE),]
>>>>>>>>>>>>>>>>> dc<-table(dc$mm)
>>>>>>>>>>>>>>>>> cab <- c(cab, names(dc))
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freqs to construct the graph
>>>>>>>>>>>>>>>>> cab <- unique(cab)
>>>>>>>>>>>>>>>>> d <- matrix(ncol=length(cab))
>>>>>>>>>>>>>>>>> dci<- d[-1,]
>>>>>>>>>>>>>>>>> dcf <- d[-1,]
>>>>>>>>>>>>>>>>> dti <- d[-1,]
>>>>>>>>>>>>>>>>> dtf <- d[-1,]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ########################################
>>>>>>>>>>>>>>>>> for (i in 1:length(listaA)) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>>>>>> dcc<-listaA[[i]]
>>>>>>>>>>>>>>>>> dcc<-table(factor(dcc$mm, levels=cab))
>>>>>>>>>>>>>>>>> dci<- rbind(dci, dcc)
>>>>>>>>>>>>>>>>> rownames(dci)<-rownames(1:(nrow(dci)), do.NULL = FALSE, prefix = "a")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>>>>>> dcc1<-listaA[[i]][ifelse(listaA[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>>>>>> dcc1<-table(factor(dcc1$mm, levels=cab))
>>>>>>>>>>>>>>>>> dcf<- rbind(dcf,dcc1)
>>>>>>>>>>>>>>>>> rownames(dcf)<-rownames(1:(nrow(dcf)), do.NULL = FALSE, prefix = "a")
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for (i in 1:length(listaB)) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>>>>>> dct<-listaB[[i]]
>>>>>>>>>>>>>>>>> dct<-table(factor(dct$mm, levels=cab))
>>>>>>>>>>>>>>>>> dti<- rbind(dti, dct)
>>>>>>>>>>>>>>>>> rownames(dti)<-rownames(1:(nrow(dti)), do.NULL = FALSE, prefix = "b")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>>>>>> dct1<-listaB[[i]][ifelse(listaB[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>>>>>> dct1<-table(factor(dct1$mm, levels=cab))
>>>>>>>>>>>>>>>>> dtf<- rbind(dtf,dct1)
>>>>>>>>>>>>>>>>> rownames(dtf)<-rownames(1:(nrow(dtf)), do.NULL = FALSE, prefix = "b")
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> freq.i<-rbind(dci,dti)
>>>>>>>>>>>>>>>>> freq.f<-rbind(dcf,dtf)
>>>>>>>>>>>>>>>>> freq.rel.i<-freq.i/apply(freq.i,1,sum)
>>>>>>>>>>>>>>>>> freq.rel.f<-freq.f/apply(freq.f,1,sum)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> freq.i
>>>>>>>>>>>>>>>>># 2 3
>>>>>>>>>>>>>>>>>#a1 4 1
>>>>>>>>>>>>>>>>>#a2 4 1
>>>>>>>>>>>>>>>>>#a3 4 1
>>>>>>>>>>>>>>>>>#b1 4 1
>>>>>>>>>>>>>>>>>#b2 4 1
>>>>>>>>>>>>>>>>>#b3 4 1
>>>>>>>>>>>>>>>>>#b4 4 1
>>>>>>>>>>>>>>>>>#result from my code.
>>>>>>>>>>>>>>>>> files<-paste("MSMS_",23,"PepInfo.txt",sep="")
>>>>>>>>>>>>>>>>>read.data<-function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,sep = "\t",stringsAsFactors=FALSE,fill=TRUE))}
>>>>>>>>>>>>>>>>>lista<-do.call("c",lapply(list.files(recursive=T)[grep(files,list.files(recursive=T))],read.data))
>>>>>>>>>>>>>>>>>names(lista)<-paste("group_",gsub("\\d+","",names(lista)),sep="")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>res2<-split(lista,names(lista))
>>>>>>>>>>>>>>>>>res3<- lapply(res2,function(x) {names(x)<-paste(gsub(".*_","",names(x)),1:length(x),sep="");x})
>>>>>>>>>>>>>>>>>res4<-lapply(seq_along(res3),function(i) do.call(rbind,lapply(res3[[i]], function(x) table(x$mm[x[["b"]]<0.01]))))
>>>>>>>>>>>>>>>>> names(res4)<- names(res2)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>res4
>>>>>>>>>>>>>>>>>#$group_a
>>>>>>>>>>>>>>>>># 2 3
>>>>>>>>>>>>>>>>>#a1 3 1
>>>>>>>>>>>>>>>>>#a2 3 1
>>>>>>>>>>>>>>>>>#a3 3 1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#$group_b
>>>>>>>>>>>>>>>>> # 2 3
>>>>>>>>>>>>>>>>>#b1 3 1
>>>>>>>>>>>>>>>>>#b2 3 1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#$group_c
>>>>>>>>>>>>>>>>> # 2 3
>>>>>>>>>>>>>>>>>#c1 3 1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>There is a difference between the output of freq.i and res4. There were only two files under 'group_b'. So, check your code.
>>>>>>>>>>>>>>>>>A.K.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>Sent: Monday, February 18, 2013 10:27 AM
>>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>Hi!!!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>I have a new question.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>I want a function to do my statistics. I started with what you had sent me:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>z.plot <- function(directory,number) {
>>>>>>>>>>>>>>>>> setwd(directory)
>>>>>>>>>>>>>>>>> indx<-gsub("[./]","",list.dirs())
>>>>>>>>>>>>>>>>> indx1<- indx[indx!=""]
>>>>>>>>>>>>>>>>> print(indx1)
>>>>>>>>>>>>>>>>> files<-paste("MSMS_",number,"PepInfo.txt",sep="")
>>>>>>>>>>>>>>>>> read.data<-function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,sep = "\t",stringsAsFactors=FALSE,fill=TRUE))}
>>>>>>>>>>>>>>>>> lista<-do.call("c",lapply(list.files(recursive=T)[grep(files,list.files(recursive=T))],read.data))
>>>>>>>>>>>>>>>>> print(lista)
>>>>>>>>>>>>>>>>> #names(lista)<-paste("group_",gsub("\\d+","",names(lista)),sep="")
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>z.plot("C:/Users/Vera Costa/Desktop/dados.lixo",23)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>In my lista I can't merge rows to form the groups, because the idea is to count, for each file, the frequencies of mm where b<0.01. After that I want a graph like the one attached.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>When I had 2 groups and knew their names, I wrote the code below (but now I have more groups and may not know their names):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>z.plot <- function(directory,number) {
>>>>>>>>>>>>>>>>> #reading data
>>>>>>>>>>>>>>>>> setwd(directory)
>>>>>>>>>>>>>>>>> direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>>>>>>>>>>>> directT <- direct[grepl("^t", direct)]
>>>>>>>>>>>>>>>>> directC <- direct[grepl("^c", direct)]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> lista<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>>>>>>> listaC<-lapply(directC, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>>>>>>> listaT<-lapply(directT, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #count different z values
>>>>>>>>>>>>>>>>> cab <- vector()
>>>>>>>>>>>>>>>>> for (i in 1:length(lista)) {
>>>>>>>>>>>>>>>>> dc<-lista[[i]][ifelse(lista[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>>>>>> dc<-table(dc$z)
>>>>>>>>>>>>>>>>> cab <- c(cab, names(dc))
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freqs to construct the graph
>>>>>>>>>>>>>>>>> cab <- unique(cab)
>>>>>>>>>>>>>>>>> d <- matrix(ncol=length(cab))
>>>>>>>>>>>>>>>>> dci<- d[-1,]
>>>>>>>>>>>>>>>>> dcf <- d[-1,]
>>>>>>>>>>>>>>>>> dti <- d[-1,]
>>>>>>>>>>>>>>>>> dtf <- d[-1,]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for (i in 1:length(listaC)) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>>>>>> dcc<-listaC[[i]]
>>>>>>>>>>>>>>>>> dcc<-table(factor(dcc$z, levels=cab))
>>>>>>>>>>>>>>>>> dci<- rbind(dci, dcc)
>>>>>>>>>>>>>>>>> rownames(dci)<-rownames(1:(nrow(dci)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>>>>>> dcc1<-listaC[[i]][ifelse(listaC[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>>>>>> dcc1<-table(factor(dcc1$z, levels=cab))
>>>>>>>>>>>>>>>>> dcf<- rbind(dcf,dcc1)
>>>>>>>>>>>>>>>>> rownames(dcf)<-rownames(1:(nrow(dcf)), do.NULL = FALSE, prefix = "c")
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for (i in 1:length(listaT)) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of all data
>>>>>>>>>>>>>>>>> dct<-listaT[[i]]
>>>>>>>>>>>>>>>>> dct<-table(factor(dct$z, levels=cab))
>>>>>>>>>>>>>>>>> dti<- rbind(dti, dct)
>>>>>>>>>>>>>>>>> rownames(dti)<-rownames(1:(nrow(dti)), do.NULL = FALSE, prefix = "t")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Relative freq of data with FDR<0.01
>>>>>>>>>>>>>>>>> dct1<-listaT[[i]][ifelse(listaT[[i]]$FDR<0.01, TRUE, FALSE),]
>>>>>>>>>>>>>>>>> dct1<-table(factor(dct1$z, levels=cab))
>>>>>>>>>>>>>>>>> dtf<- rbind(dtf,dct1)
>>>>>>>>>>>>>>>>> rownames(dtf)<-rownames(1:(nrow(dtf)), do.NULL = FALSE, prefix = "t")
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> freq.i<-rbind(dci,dti)
>>>>>>>>>>>>>>>>> freq.f<-rbind(dcf,dtf)
>>>>>>>>>>>>>>>>> freq.rel.i<-freq.i/apply(freq.i,1,sum)
>>>>>>>>>>>>>>>>> freq.rel.f<-freq.f/apply(freq.f,1,sum)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#Graph plot
>>>>>>>>>>>>>>>>>colour<-sample(rainbow(nrow(freq.rel.i)))
>>>>>>>>>>>>>>>>>par(mfrow=c(1,2))
>>>>>>>>>>>>>>>>>barplot(freq.rel.i,beside=T,main=("Sample"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.i))
>>>>>>>>>>>>>>>>>barplot(freq.rel.f,beside=T,main=("Sample with FDR<0.01"),xlab="Charge",ylab="Relative Frequencies",col=colour,legend.text = rownames(freq.rel.f))
>>>>>>>>>>>>>>>>>#average of the group (except c1&t1)
>>>>>>>>>>>>>>>>>freqs<-rbind(dcf[-1,], dtf[-1,])
>>>>>>>>>>>>>>>>>average<-apply(freqs,2,mean)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#chisquare test function
>>>>>>>>>>>>>>>>>chisq.test<-function(x,y){
>>>>>>>>>>>>>>>>> somax<-sum(x)
>>>>>>>>>>>>>>>>> somay<-sum(y)
>>>>>>>>>>>>>>>>> nj.<-x+y
>>>>>>>>>>>>>>>>> nj<-sum(nj.)
>>>>>>>>>>>>>>>>> ejx<-(nj./nj)*somax
>>>>>>>>>>>>>>>>> ejy<-(nj./nj)*somay
>>>>>>>>>>>>>>>>> ETx<-((x-ejx)^2)/ejx
>>>>>>>>>>>>>>>>> ETy<-((y-ejy)^2)/ejy
>>>>>>>>>>>>>>>>> ETobs<-sum(ETx)+sum(ETy)
>>>>>>>>>>>>>>>>> pvalue<-pchisq(ETobs,df=length(x)-1,lower.tail=FALSE)
>>>>>>>>>>>>>>>>> return(pvalue)
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>#p-values of the chi-square test between each sample and the average (H0: the two samples have the same distribution)
>>>>>>>>>>>>>>>>>pvalues<-c()
>>>>>>>>>>>>>>>>>for (i in 1:(nrow(freqs))){
>>>>>>>>>>>>>>>>>a<-chisq.test(freqs[i,],average)
>>>>>>>>>>>>>>>>>pvalues<-c(pvalues,a)
>>>>>>>>>>>>>>>>>}
>>>>>>>>>>>>>>>>>#data frame with final p-values
>>>>>>>>>>>>>>>>>dataframe<-data.frame(c(rownames(freqs)), c(pvalues))
>>>>>>>>>>>>>>>>>colnames(dataframe)<-c("sample name","pvalue")
>>>>>>>>>>>>>>>>>print(dataframe)
>>>>>>>>>>>>>>>>>}
>>>>>>>>>>>>>>>>>z.plot("C:/Users/Vera/Desktop/data",23)
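The hand-written chisq.test function above computes a Pearson chi-square statistic for the 2 x k table formed by x and y, with k - 1 degrees of freedom. As a minimal sketch, the same p-value can be cross-checked against stats::chisq.test. The counts below are made up for illustration, and the name chisq_two_samples is hypothetical; it is used only so that the built-in function is not masked.

```r
# Hypothetical counts per charge state for two samples (illustration only).
x <- c(10, 20, 30, 5)
y <- c(12, 18, 25, 9)

# Same computation as the hand-written function in the thread, under a
# different (hypothetical) name so that stats::chisq.test is not masked.
chisq_two_samples <- function(x, y) {
  col_tot <- x + y
  n <- sum(col_tot)
  ex <- col_tot / n * sum(x)   # expected counts for x
  ey <- col_tot / n * sum(y)   # expected counts for y
  stat <- sum((x - ex)^2 / ex) + sum((y - ey)^2 / ey)
  pchisq(stat, df = length(x) - 1, lower.tail = FALSE)
}

p_manual <- chisq_two_samples(x, y)

# Built-in test on the 2 x k table; correct = FALSE disables the continuity
# correction, which would otherwise apply only in the 2 x 2 case.
p_builtin <- stats::chisq.test(rbind(x, y), correct = FALSE)$p.value
```

If the two values agree, the built-in can replace the custom function, and it also warns when expected counts are small.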
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>Thank you again
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>2013/2/17 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>HI Vera,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>No problem. I am cc:ing to r-help.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>A.K.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>Sent: Sunday, February 17, 2013 5:44 AM
>>>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>Hi. Thank you. It works now:-)
>>>>>>>>>>>>>>>>>>And yes, I use windows.
>>>>>>>>>>>>>>>>>>Thank you very much.
>>>>>>>>>>>>>>>>>>On 17 Feb 2013 at 00:44, "arun" <smartpink111 at yahoo.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>Hi Vera,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>Have you tried the suggestion?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>Are you using Windows?
>>>>>>>>>>>>>>>>>>>Thanks,
>>>>>>>>>>>>>>>>>>>Arun
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>>Sent: Saturday, February 16, 2013 7:10 PM
>>>>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>Thank you.
>>>>>>>>>>>>>>>>>>>On my system I get the error "'what' must be a character string or a function".
>>>>>>>>>>>>>>>>>>>I need to do the equivalent on my system.
>>>>>>>>>>>>>>>>>>>Thank you, and sorry once more.
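One possible cause of this error (an assumption, since the thread does not say what was in the workspace) is that a variable named c holding a non-function value hides the base function when it is passed by value. A minimal sketch, with a hypothetical helper demo_masked_c, showing that passing the name as a string still finds the function:

```r
# Hypothetical reproduction: a local variable named `c` shadows base::c
# when the symbol is evaluated as an ordinary value.
demo_masked_c <- function() {
  c <- 5  # assumed leftover variable; not taken from the thread

  # Here `c` evaluates to 5, so do.call() receives a number, not a function,
  # and signals an error; tryCatch() turns that into the string "error".
  by_value <- tryCatch(do.call(c, list(1, 2)), error = function(e) "error")

  # With a character string, the function is looked up by name, and
  # non-function objects called `c` are skipped during that lookup.
  by_name <- do.call("c", list(1, 2))

  list(by_value = by_value, by_name = by_name)
}

out <- demo_masked_c()
```

Removing the stray variable with rm(c) would be another way out under the same assumption.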
>>>>>>>>>>>>>>>>>>>On 16 Feb 2013 at 23:53, "arun" <smartpink111 at yahoo.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>>>>>>>>You didn't mention what the error message was, or whether you are reading files whose names are not "mmmmm11kk.txt".
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>It is working on my system when I run it again.
>>>>>>>>>>>>>>>>>>>>?c() combine values into a vector or list.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> sessionInfo()
>>>>>>>>>>>>>>>>>>>>R version 2.15.1 (2012-06-22)
>>>>>>>>>>>>>>>>>>>>Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>locale:
>>>>>>>>>>>>>>>>>>>> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
>>>>>>>>>>>>>>>>>>>> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
>>>>>>>>>>>>>>>>>>>> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
>>>>>>>>>>>>>>>>>>>> [7] LC_PAPER=C LC_NAME=C
>>>>>>>>>>>>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>>>>>>>>>>>>>>[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>attached base packages:
>>>>>>>>>>>>>>>>>>>>[1] stats graphics grDevices utils datasets methods base
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>other attached packages:
>>>>>>>>>>>>>>>>>>>>[1] stringr_0.6.2 reshape2_1.2.2
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>loaded via a namespace (and not attached):
>>>>>>>>>>>>>>>>>>>>[1] plyr_1.8
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>#code
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>res<-do.call(c,lapply(list.files(recursive=T)[grep("mmmmm11kk",list.files(recursive=T))],function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,stringsAsFactors=FALSE,fill=TRUE))})) #it seems like one of the rows of your file doesn't have 6 elements, so added fill=TRUE
>>>>>>>>>>>>>>>>>>>> names(res)<-paste("group_",gsub("\\d+","",names(res)),sep="")
>>>>>>>>>>>>>>>>>>>>res2<-split(res,names(res))
>>>>>>>>>>>>>>>>>>>>res3<- lapply(res2,function(x) {names(x)<-paste(gsub(".*_","",names(x)),1:length(x),sep="");x})
>>>>>>>>>>>>>>>>>>>>#result
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>res3
>>>>>>>>>>>>>>>>>>>>#$group_a
>>>>>>>>>>>>>>>>>>>>#$group_a$a1
>>>>>>>>>>>>>>>>>>>> Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>$group_a$a2
>>>>>>>>>>>>>>>>>>>> Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>$group_a$a3
>>>>>>>>>>>>>>>>>>>> Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>$group_b
>>>>>>>>>>>>>>>>>>>>$group_b$b1
>>>>>>>>>>>>>>>>>>>> Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>$group_b$b2
>>>>>>>>>>>>>>>>>>>> Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>$group_c
>>>>>>>>>>>>>>>>>>>>$group_c$c1
>>>>>>>>>>>>>>>>>>>> Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>A.K.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>>>Sent: Saturday, February 16, 2013 6:32 PM
>>>>>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>Sorry again... In:
>>>>>>>>>>>>>>>>>>>>res<-do.call(c,lapply(list.files(recursive=T)[grep("...
>>>>>>>>>>>>>>>>>>>>What is this c in do.call(c, ...)? When I run this line in R, I get an error.
>>>>>>>>>>>>>>>>>>>>Thank you
>>>>>>>>>>>>>>>>>>>>On 15 Feb 2013 at 18:11, "arun" <smartpink111 at yahoo.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>>>>>>>>>No problem.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>BTW, these questions are not stupid..
>>>>>>>>>>>>>>>>>>>>>Arun
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>>>>Sent: Friday, February 15, 2013 1:08 PM
>>>>>>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>Thank you very much.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>I will try to apply and after I tell you if it is ok :-)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>Thank you, and sorry about these questions (sometimes stupid questions).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>2013/2/15 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>HI,
>>>>>>>>>>>>>>>>>>>>>>No problem.
>>>>>>>>>>>>>>>>>>>>>>See ?c: c() concatenates its arguments into a vector or list.
>>>>>>>>>>>>>>>>>>>>>>If I use do.call(cbind,...) or do.call(rbind,...) instead, I get:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>do.call(cbind,lapply(list.files(recursive=T)[grep("mmmmm11kk",list.files(recursive=T))],function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,stringsAsFactors=FALSE,fill=TRUE))}))
>>>>>>>>>>>>>>>>>>>>>># [,1] [,2] [,3] [,4] [,5] [,6]
>>>>>>>>>>>>>>>>>>>>>>#a1 List,11 List,11 List,11 List,11 List,11 List,11
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> do.call(rbind,lapply(list.files(recursive=T)[grep("mmmmm11kk",list.files(recursive=T))],function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,stringsAsFactors=FALSE,fill=TRUE))}))
>>>>>>>>>>>>>>>>>>>>>># a1
>>>>>>>>>>>>>>>>>>>>>>#[1,] List,11
>>>>>>>>>>>>>>>>>>>>>>#[2,] List,11
>>>>>>>>>>>>>>>>>>>>>>#[3,] List,11
>>>>>>>>>>>>>>>>>>>>>>#[4,] List,11
>>>>>>>>>>>>>>>>>>>>>>#[5,] List,11
>>>>>>>>>>>>>>>>>>>>>>#[6,] List,11
>>>>>>>>>>>>>>>>>>>>>>i.e., a list within a list:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> restrial<-lapply(list.files(recursive=T)[grep("mmmmm11kk",list.files(recursive=T))],function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,stringsAsFactors=FALSE,fill=TRUE))})
>>>>>>>>>>>>>>>>>>>>>> str(restrial)
>>>>>>>>>>>>>>>>>>>>>>#List of 6
>>>>>>>>>>>>>>>>>>>>>># $ :List of 1
>>>>>>>>>>>>>>>>>>>>>> # ..$ a1:'data.frame': 6 obs. of 11 variables:
>>>>>>>>>>>>>>>>>>>>>> # .. ..$ Id: chr [1:6] "aAA" "aAAAA" "aA" "aAA" ...
>>>>>>>>>>>>>>>>>>>>>> # .. ..$ M : chr [1:6] "1" "1" "2" "1" ...
>>>>>>>>>>>>>>>>>>>>>> # .. ..$ mm: int [1:6] 2 2 1 2 3 2
>>>>>>>>>>>>>>>>>>>>>> # .. ..$ x : int [1:6] 739 2263 1 1965 3660 1972
>>>>>>>>>>>>>>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>str(res)
>>>>>>>>>>>>>>>>>>>>>>#List of 6
>>>>>>>>>>>>>>>>>>>>>># $ a1:'data.frame': 6 obs. of 11 variables:
>>>>>>>>>>>>>>>>>>>>>> # ..$ Id: chr [1:6] "aAA" "aAAAA" "aA" "aAA" ...
>>>>>>>>>>>>>>>>>>>>>> #..$ M : chr [1:6] "1" "1" "2" "1" ...
>>>>>>>>>>>>>>>>>>>>>> # ..$ mm: int [1:6] 2 2 1 2 3 2
>>>>>>>>>>>>>>>>>>>>>> # ..$ x : int [1:6] 739 2263 1 1965 3660 1972
>>>>>>>>>>>>>>>>>>>>>>-----------------------------------------------------------------
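The difference between the two str() outputs above comes from do.call(c, ...), which joins the inner one-element lists into a single flat, named list. A minimal sketch with made-up data frames (the objects nested and flat are hypothetical stand-ins for restrial and res):

```r
# Two one-element named lists, standing in for the per-file results of lapply().
nested <- list(list(a1 = data.frame(z = 1:2)),
               list(b1 = data.frame(z = 3:4)))

# do.call(c, nested) is the same call as c(nested[[1]], nested[[2]]):
# the inner lists are concatenated into one flat list of data frames.
flat <- do.call(c, nested)

names(nested)  # the outer list carries no names
names(flat)    # the inner names survive the flattening
```

With cbind or rbind in place of c, the inner lists become cells of a matrix instead, which is why those calls print "List,11" entries.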
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>You mentioned naming these "group_a", "group_b", etc.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> names(res)<-paste("group_",gsub("\\d+","",names(res)),sep="")
>>>>>>>>>>>>>>>>>>>>>>res2<-split(res,names(res))
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>res3<- lapply(res2,function(x) {names(x)<-paste(gsub(".*_","",names(x)),1:length(x),sep="");x})
>>>>>>>>>>>>>>>>>>>>>> res3$group_a
>>>>>>>>>>>>>>>>>>>>>>#$a1
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>># Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>#$a2
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>># Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>#$a3
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>A.K.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>________________________________
>>>>>>>>>>>>>>>>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>>>>>Sent: Friday, February 15, 2013 12:39 PM
>>>>>>>>>>>>>>>>>>>>>>Subject: Re: reading data
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>Thank you very much, and sorry for my questions.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>But this code isn't grouping by letter, is it? I mean, a1, a2, a3 are the same group (the first letter gives me the name of the group).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>Another question: in do.call, you wrote do.call(c, ...). What is c?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>Sorry
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>2013/2/15 arun <smartpink111 at yahoo.com>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>HI,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>Just to add:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>res<-do.call(c,lapply(list.files(recursive=T)[grep("mmmmm11kk",list.files(recursive=T))],function(x) {names(x)<-gsub("^(.*)\\/.*","\\1",x); lapply(x,function(y) read.table(y,header=TRUE,stringsAsFactors=FALSE,fill=TRUE))})) #it seems like one of the rows of your file doesn't have 6 elements, so added fill=TRUE
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> names(res)<-paste("group_",gsub("\\d+","",names(res)),sep="")
>>>>>>>>>>>>>>>>>>>>>>>res[grep("group_b",names(res))]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>I am not sure how you want the grouped data to look. If you want something like this:
>>>>>>>>>>>>>>>>>>>>>>>res1<-do.call(rbind,res)
>>>>>>>>>>>>>>>>>>>>>>>res2<-lapply(split(res1,gsub("[.0-9]","",row.names(res1))),function(x) {row.names(x)<-1:nrow(x);x})
>>>>>>>>>>>>>>>>>>>>>>>res2
>>>>>>>>>>>>>>>>>>>>>>>#$group_a
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> # Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>#7 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#8 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#9 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#10 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#11 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#12 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>#13 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#14 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#15 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#16 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#17 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#18 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>#$group_b
>>>>>>>>>>>>>>>>>>>>>>> # Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>#7 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#8 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#9 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#10 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#11 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#12 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>#$group_c
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> # Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>#or if you want it like this:
>>>>>>>>>>>>>>>>>>>>>>>res2<-split(res,names(res))
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>res2[["group_b"]]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>#$group_b
>>>>>>>>>>>>>>>>>>>>>>># Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>#$group_b
>>>>>>>>>>>>>>>>>>>>>>> # Id M mm x b u k j y p v
>>>>>>>>>>>>>>>>>>>>>>>#1 aAA 1 2 739 0.1257000 2 2 AA 2 8867 8926
>>>>>>>>>>>>>>>>>>>>>>>#2 aAAAA 1 2 2263 0.0004000 2 2 AR 4 7640 8926
>>>>>>>>>>>>>>>>>>>>>>>#3 aA 2 1 1 0.0845435 2 AA 2 6790 734,1092 NA
>>>>>>>>>>>>>>>>>>>>>>>#4 aAA 1 2 1965 0.0007000 4 3 AR 2 11616 8926
>>>>>>>>>>>>>>>>>>>>>>>#5 aAAA 1 3 3660 0.0008600 18 3 AA 2 20392 496
>>>>>>>>>>>>>>>>>>>>>>>#6 AA na 2 1972 0.0007000 11 3 AR 25 509 734
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>Hope this helps.
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>A.K.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>----- Original Message -----
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>From: "veracosta.rt at gmail.com" <veracosta.rt at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>To: smartpink111 at yahoo.com
>>>>>>>>>>>>>>>>>>>>>>>Cc:
>>>>>>>>>>>>>>>>>>>>>>>Sent: Friday, February 15, 2013 9:15 AM
>>>>>>>>>>>>>>>>>>>>>>>Subject: reading data
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>>>>>>>>>>>I posted yesterday and you helped me. I have a small problem.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>First of all, I have never worked with regular expressions...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>The code that you gave me is OK, but my files are inside the folders a1, a2, a3. Let me try to explain better.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>I have one folder named "data". Inside this folder I have some other folders named "a1", "a2", "b1", "b2", ... and inside each of those I have some files. I want only the file "mmmmmm.txt" (every folder has one file with this name).
>>>>>>>>>>>>>>>>>>>>>>>The name of the folder gives me the name of the group, but I need to read the file inside. Afterwards I want "group_a", "group_b", ... because I need to work with this data grouped (and know the name of each group).
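The folder layout described above can be sketched as follows. This is only an illustration under stated assumptions: the throwaway directory data_demo, the column values, and the object names (root, paths, folders, tables, by_group) are all hypothetical; only the file name "mmmmmm.txt" and the folder naming scheme come from the message.

```r
# Build a throwaway folder tree: <root>/a1, a2, b1, each holding "mmmmmm.txt".
root <- file.path(tempdir(), "data_demo")
for (d in c("a1", "a2", "b1")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(Id = c("aAA", "aA"), mm = c(2, 1)),
              file.path(root, d, "mmmmmm.txt"),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Find the target file in every subfolder.
paths <- list.files(root, pattern = "^mmmmmm\\.txt$",
                    recursive = TRUE, full.names = TRUE)

# The parent folder name ("a1", "a2", ...) identifies each file.
folders <- basename(dirname(paths))
tables <- lapply(paths, read.table, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)
names(tables) <- folders

# Dropping the digits leaves the group letter; split() then groups the tables
# without the group names having to be known in advance.
groups <- paste0("group_", gsub("[0-9]+", "", folders))
by_group <- split(tables, groups)
```

Using basename(dirname(...)) avoids writing a regular expression for the path; the only pattern left is "[0-9]+", which matches the run of digits in a folder name.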
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>Thank you.
>>>>>>>>>>>>>>>>>>>>>>>
More information about the R-help
mailing list