John Kane
jrkrideau at inbox.com
Tue Jul 24 17:18:45 CEST 2012
I think this does what you want using two packages, plyr and reshape2, that
you may have to install. If so, install.packages(c("plyr", "reshape2")) should
do the trick.
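library(plyr)
library(reshape2)
# using the supplied data frame 'myfile' from below
time0total <- sum(myfile[, 2])     # total Time_zero over all proteins
mydata <- myfile[, 2:10]           # drop the Proteins column
# long format: one row per (Time_zero, position, residue)
md1 <- melt(mydata, id = "Time_zero")
# sum Time_zero within each position/residue pair, divide by the total
ddply(md1, .(variable, value), summarise, ratio = sum(Time_zero) / time0total)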
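For comparison, the same ratios can be computed in base R without installing anything. This is a swapped-in alternative (tapply() over each residue column), not the plyr/reshape2 method above; it is a minimal sketch that re-types the sample data from the dput() output and assumes the residue positions are exactly the columns named X1, X2, and so on.

```r
# Sample data re-typed from the dput() output in this thread
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

time0total <- sum(myfile$Time_zero)

# Assumption: the residue positions are exactly the columns named X1, X2, ...
residue_cols <- grep("^X[0-9]+$", names(myfile), value = TRUE)

# For each position, sum Time_zero within each residue, then divide by the total
ratios <- lapply(myfile[residue_cols], function(res) {
  tapply(myfile$Time_zero, res, sum) / time0total
})

ratios$X1  # ratios of L, R and T at position 1
```

Each element of ratios holds one position, so the ratios within an element sum to 1. The ddply() version returns the same numbers as one long data frame rather than a list.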
John Kane
Kingston ON Canada
From: zj29 at cornell.edu
Sent: Tue, 24 Jul 2012 10:25:21 -0400
Hi John,
Thank you for the tips. My apologies for the unreadable sample data.
Here is the dput() output of the sample data; hopefully it works this time. :)
myfile <- structure(list(
  Proteins = structure(1:4, .Label = c("p1", "p2", "p3", "p4"), class = "factor"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L", "R", "T"), class = "factor"),
  X2 = structure(c(1L, 1L, 2L, 1L), .Label = c("E", "M"), class = "factor"),
  X3 = structure(c(2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
  X4 = structure(c(1L, 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"),
  X5 = structure(c(1L, 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"),
  X6 = structure(c(1L, 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"),
  X7 = structure(c(1L, 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"),
  X8 = structure(c(1L, 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")),
  .Names = c("Proteins", "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6",
             "X7", "X8"),
  row.names = c(NA, 4L), class = "data.frame")
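The X1 to X8 column names above are what read.table() produces by default when a file's header row is 1 2 ... 8, because check.names = TRUE passes the headers through make.names(). A minimal sketch of reading such a file, assuming myfile.txt is whitespace-separated with the layout of the example table in the original question; since the real file is not available, the sketch writes that table to a temporary file first.

```r
# Stand-in for the poster's myfile.txt (assumed whitespace-separated)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Proteins Time_zero 1 2 3 4 5 6 7 8",
  "p1 0.0050723 L E Y I I P D A",
  "p2 0.0002731 T E N L V P G A",
  "p3 9.757E-05 L M Y Q I P E C",
  "p4 0.0002077 R E Y L I S E A"
), path)

# check.names = TRUE (the default) turns the numeric headers 1..8 into X1..X8;
# stringsAsFactors = TRUE makes the residue columns factors, as in the dput()
myfile <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
str(myfile)
```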
And here is my original question:
Basically, I have a bunch of protein sequences composed of different amino
acid residues, and each residue is represented by an uppercase letter. I
want to calculate the ratio of different amino acid residues at each
position of the proteins.
If I name this table as myfile.txt, I have the following script to
calculate the ratio of each amino acid residue at position 1:
# showing levels of the 3rd column, which means the types of residues
myfile[,3]
# calculating the ratio of L
list=c(which(myfile[,3]=="L"))
time0total=sum(myfile[,2])
AA_L=0
for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
ratio_L=AA_L/time0total
So how can I write a script to do the same thing for the other two levels (T
and R) in column 3, and also do this for every column that contains amino
acid residues?
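For a single column, the loop is not needed: tapply() sums one vector within each level of a factor. A minimal sketch, using only the first three columns of the sample data posted in this thread; ratio_L and ratios_pos1 are illustrative names.

```r
# Stand-in re-typed from the thread's sample data (first three columns only)
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1        = factor(c("L", "T", "L", "R"))
)

time0total <- sum(myfile[, 2])

# Equivalent of the loop above for the single level "L"
ratio_L <- sum(myfile[myfile[, 3] == "L", 2]) / time0total

# tapply() groups column 2 by the levels of column 3 and sums each group,
# so one call gives the ratio for L, R and T together
ratios_pos1 <- tapply(myfile[, 2], myfile[, 3], sum) / time0total
ratios_pos1
```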
Thanks a lot!
Regards,
Zhao
The first thing is to supply the data in a usable format. As it is, it is
essentially unreadable. All R beginners do this. :)
Have a look at the dput function (?dput) for a good way to supply sample
data in an email.
If you have a large dataset, a few dozen lines of data should be enough.
Something like dput(head(mydata)) should be fine. Just copy and paste the
output into your email.
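A minimal sketch of that round trip, using a hypothetical three-row data frame: dput() prints R code that rebuilds the object, and whoever receives the email pastes that code back into R. Here capture.output() and eval(parse()) stand in for the copy and paste.

```r
# Hypothetical small data frame, only to show the round trip
mydata <- data.frame(
  Proteins  = factor(c("p1", "p2", "p3")),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05),
  X1        = factor(c("L", "T", "L"))
)

# This printed text is what you paste into the email
dput(head(mydata))

# What the reader does with the pasted text: evaluate it to rebuild the object
pasted  <- capture.output(dput(head(mydata)))
rebuilt <- eval(parse(text = pasted))
all.equal(rebuilt, head(mydata))
```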
Welcome to R. I think you will like it.
John Kane
Kingston ON Canada
> Dear all,
>
> I am an R beginner, and I am looking for a way to do the same thing for
> all levels of a column in a table.
>
> Basically, I have a bunch of protein sequences composed of different amino
> acid residues, and each residue is represented by an uppercase letter. I
> want to calculate the ratio of different amino acid residues at each
> position of the proteins. Here is an example table:
>
> Proteins  Time_zero  1  2  3  4  5  6  7  8
> p1        0.0050723  L  E  Y  I  I  P  D  A
> p2        0.0002731  T  E  N  L  V  P  G  A
> p3        9.757E-05  L  M  Y  Q  I  P  E  C
> p4        0.0002077  R  E  Y  L  I  S  E  A
>
> If I name this table as myfile.txt, I have the following script to
> calculate the ratio of each amino acid residue at position 1:
>
> # showing levels of the 3rd column, which means the types of residues
> myfile[,3]
>
> # calculating the ratio of L
> list=c(which(myfile[,3]=="L"))
> time0total=sum(myfile[,2])
> AA_L=0
> for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> ratio_L=AA_L/time0total
>
> So how can I write a script to do the same thing for the other two levels
> (T and R) in column 3, and also do this for every column that contains
> amino acid residues?
>
> Many thanks for any help you could give me on this topic! :)
>
> Regards,
> Zhao
>
--
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675
